Releases: berops/claudie

v0.9.16

26 Nov 10:55
56e059b

What's Changed

  • The OpenStack provider now uses image names instead of image IDs, because image IDs may be replaced by the provider and would then no longer be valid #1902
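
A minimal sketch of an OpenStack nodepool that references its image by name, reusing the `ovh-1` provider from the v0.9.14 notes. The nodepool name, region, zone, flavor, and image name below are hypothetical placeholders; use the values your OpenStack project actually offers.

```yaml
nodePools:
  dynamic:
    - name: ovh-compute          # hypothetical nodepool name
      providerSpec:
        name: ovh-1              # provider defined in the providers section
        region: GRA9             # hypothetical region
        zone: nova               # hypothetical availability zone
      count: 1
      serverType: b3-8           # hypothetical flavor
      # The image is referenced by name rather than by ID (#1902).
      image: "Ubuntu 24.04"      # hypothetical image name
```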

Bug fixes

  • Fix Cloudflare account ID propagation when updating to newer Claudie versions #1904
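
For reference, a sketch of a Cloudflare secret that carries the account ID required since v0.9.11. The secret name and namespace are hypothetical, and the key names `apitoken` and `accountid` are an assumption based on the Claudie Cloudflare provider documentation; check the docs for your Claudie version before applying.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-secret        # hypothetical name
  namespace: mynamespace         # hypothetical namespace
type: Opaque
stringData:
  # Key names are assumed; verify them against the Claudie docs.
  apitoken: "<cloudflare-api-token>"
  accountid: "<cloudflare-account-id>"   # account the token was created for
```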

v0.9.15

13 Nov 09:41
97232b0

Bug fixes

  • Fix Docker API incompatibility issues in the Ansibler service that resulted in the error reported in #1885

v0.9.14

17 Oct 12:27
e0b73c1

What's Changed

  • Correctly remove taints, annotations, and labels from nodes when they are removed from a NodePool in the InputManifest #1852
  • Removed unnecessary tasks that were spawned in some cases and prolonged the cluster build without any effect #1856
  • Expand the machine spec to include the number of GPUs #1854
    The NodePool specification can now state how many GPUs an instance provides,
    which is used when autoscaling based on GPU workloads.
    - name: autoscaled
      providerSpec:
        name: aws
        region: eu-central-1
        zone: eu-central-1a
      autoscaler:
        min: 0
        max: 20
      # GPU machine type name.
      serverType: g4dn.xlarge
      machineSpec:
        # explicitly specify how many GPUs the instance type provides.
        nvidiaGpu: 1
      image: ami-07eef52105e8a2059
  • Add support for the OpenStack provider, primarily to support the OpenStack offering from OVH #1857
    OpenStack providers can now be used within the InputManifest.
    OpenStack support was added in version v0.9.14 of the Claudie templates.
    - name: ovh-1
      providerType: openstack
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.9.14
        path: "templates/terraformer/openstack"
      secretRef:
        name: ovh-secret
        namespace: e2e-secrets

v0.9.13

13 Sep 15:04
159b095

What's Changed

  • Concurrency limits are now configurable #1838
  • Autoscaled nodepools are now limited to 256 nodes #1839
  • Metadata secret will now be updated after node deletion #1841
  • Builder TTL is now configurable via the BUILDER_TTL environment variable, with a default value of 2 hours #1850

Bug fixes

  • Prometheus metric for currently deleted nodes has been fixed #1849

v0.9.12

19 Aug 09:12
b9efabf

What's Changed

  • Added retries when reading the output from OpenTofu, which could occasionally fail. #1824
  • Increased concurrency limits to decrease the build time of larger clusters. This change also affects Claudie's memory requirements, which should fit within 8 GB. #1819
  • For autoscaled events, Terraformer will now skip refreshing the LoadBalancers and DNS infrastructure, if present. #1830

v0.9.11

04 Aug 16:31
b285ee4

What's Changed

READ ME: This release contains many core changes. Before updating an already deployed Claudie instance, make sure you have working backups of your Kubernetes clusters.

  • The proxy settings in the InputManifest now include a noProxy list of endpoints that bypass the proxy when it is used. #1745
    kubernetes:
      clusters:
        - name: proxy-example
          version: "1.30.0"
          network: 192.168.2.0/24
          installationProxy:
            mode: "on"
            noProxy: ".suse.com"
  • Update kubeone to 1.10 #1749

  • Migrate from Terraform v1.5.7 to OpenTofu v1.6.2 #1755

    READ ME: OpenTofu 1.6.2 is compatible with the previously used Terraform version 1.5.7. Although Claudie takes care of the update, make sure you have working backups before updating an already deployed Claudie instance, in case of a disaster scenario.

  • Add Sprig template functions to all templates used within Claudie #1768

  • The Builder now supports faster termination, waiting only for the task currently being processed instead of the whole workflow #1770

  • Claudie now supports proper HA DNS load balancing #1777

    This feature is available starting with the Claudie templates v0.9.11.

    READ ME: For already deployed Claudie instances that use Cloudflare as a DNS provider, you will need to update your secret to also include the account ID the API token was created for.

  • NGINX was replaced by Envoy on load balancers. #1735

    READ ME: If you update an already deployed Claudie instance, this one-time update introduces a brief downtime of the services while NGINX is being replaced with Envoy.

  • Upgraded all Terraform providers to the latest versions that still support the Claudie templates v0.9.8 #1782

  • Claudie now performs a rollout restart of the NVIDIA GPU operator daemonset as part of the workflow that overwrites /etc/containerd/config.yml. #1790

Bug fixes

  • After an error during deletion, return the partially updated state instead of always defaulting to the current state #1793
  • The Ansible workflow now restarts the SSH session after updating environment variables. Previously, the updated environment variables were not reflected in a reused SSH connection #1792
  • Fixed a memory leak in the autoscaler service. #1787

v0.9.10

09 Apr 15:04
926f566

What's Changed

  • Decrease the number of retries for cleanup of static nodes during deletion from 4 to 2 #1729

Bug fixes

  • Fix panic when deleting clusters with static nodes for which DNS was not built correctly #1724
  • Fix propagation of desired state from operator to manager service #1726
  • Fix multiple HTTP proxy environment variables present in /etc/environment #1727
  • Fix partial DNS apply, which would leave part of the infrastructure untracked #1728

v0.9.9

01 Apr 12:25
530b7a5

What's Changed

  • General maintenance release, updated dependencies used by Claudie #1709

  • Upgrading Longhorn from version 1.7.0 to version 1.8.1 #1709

    After upgrading Longhorn, pods of both the old and new versions will coexist if your cluster uses a PVC with the Longhorn storage class (the default), because existing volumes still reference the old v1.7.0 engine.

    To upgrade the old volumes to the newer version, use the Longhorn UI to set Settings > Concurrent Automatic Engine Upgrade Per Node Limit to a value greater than 0.
    This setting controls how Longhorn automatically upgrades volumes’ engines to the new default engine image after the Longhorn manager is upgraded. For details, see: https://longhorn.io/docs/1.8.1/deploy/upgrade/auto-upgrade-engine/

    Once all volumes have been upgraded to the latest Longhorn version, the old engine image pods and instance manager will be terminated after ~60 minutes of non-use. You can also follow the official Longhorn article on this: https://longhorn.io/kb/troubleshooting-some-old-instance-manager-pods-are-still-running-after-upgrade/
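
As an alternative to the Longhorn UI, the same setting can be changed through Longhorn's `Setting` custom resource. This sketch assumes Longhorn runs in the default `longhorn-system` namespace; the value `3` is only an example of a limit greater than 0. Applying it with `kubectl apply` updates the existing setting object.

```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  # Corresponds to "Concurrent Automatic Engine Upgrade Per Node Limit" in the UI.
  name: concurrent-automatic-engine-upgrade-per-node-limit
  namespace: longhorn-system   # assumed Longhorn namespace
# Any value greater than 0 enables automatic engine upgrades; 3 is an example.
value: "3"
```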

v0.9.8

19 Mar 13:11
72c4533

What's Changed

  • Added support for alternative names for load balancers #1693

       dns:
         dnsZone: example.com
         provider: example
         hostname: main
         alternativeNames:
           - other

    The templates that Claudie uses by default will be updated separately to make use of the alternative names.

Bug fixes

  • If the current state was not built and some of the nodes did not have an assigned IP address, Claudie would fail to correctly determine whether the nodes were reachable. #1691
  • Claudie now raises the fs.inotify limits, because, depending on the workload on each node, reaching the limits would result in an error from which Claudie could not recover. #1696
  • Annotations for static nodepools will now be correctly propagated. #1696

Claudie v0.9.7

12 Mar 13:18
600878b

What's Changed

  • Additional settings were added to LoadBalancer roles. #1685

    Proxy protocol and sticky sessions can now be turned on or off per role.

    stickySessions always forwards traffic from a client to the same node, based on a hash of the client IP.

    proxyProtocol turns on the proxy protocol. If used, the application to which the traffic is redirected must support this protocol.

      loadBalancers:
        roles:
          - name: example-role
            protocol: tcp
            port: 6443
            targetPort: 6443
            targetPools:
              - htz-kube-nodes
            # added
            settings:
              proxyProtocol: false   # default: true
              stickySessions: true   # default: false
    
  • Claudie now pings nodes to check whether they are reachable. If any node becomes unreachable, Claudie reports the problem and does not work on any changes until the connectivity issue is resolved. #1658

    For unreachable nodes within the Kubernetes cluster, you can either resolve the issue or remove the node from the InputManifest or via kubectl. Claudie reports the following issue:

    fix the unreachable nodes by either:
     - fixing the connectivity issue
     - if the connectivity issue cannot be resolved, you can:
       - delete the whole nodepool from the kubernetes cluster in the InputManifest
       - delete the selected unreachable node/s manually from the cluster via 'kubectl'
         - if its a static node you will also need to remove it from the InputManifest
         - if its a dynamic node claudie will replace it.
         NOTE: if the unreachable node is the kube-apiserver, claudie will not be able to recover
               after the deletion.
    

    For unreachable nodes within the load balancer cluster, you can either resolve the issue or remove the nodepool or load balancer from the InputManifest. Claudie reports the following issue:

    fix the unreachable nodes by either:
     - fixing the connectivity issue
     - if the connectivity issue cannot be resolved, you can:
       - delete the whole nodepool from the loadbalancer cluster in the InputManifest
       - delete the whole loadbalancer cluster from the InputManifest
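
As an example of a target that supports the proxy protocol: if the load balancer forwards traffic to an ingress-nginx controller (an assumption; this is not part of the release), the controller must be told to expect the PROXY header while `proxyProtocol` is on. The ConfigMap name and namespace below are the usual defaults when ingress-nginx is installed with the release name `ingress-nginx` and may differ in your installation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed default name; adjust to your install
  namespace: ingress-nginx         # assumed namespace
data:
  # Tell ingress-nginx to parse the PROXY protocol header sent by the load balancer.
  use-proxy-protocol: "true"
```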
    

Bug fixes

  • The cluster-autoscaler image may not share the same version as the Kubernetes version specified in the InputManifest. Claudie now correctly recognizes this and picks the latest available cluster-autoscaler image #1680

  • Claudie will now set the limits of max open file descriptors on each node to 65535 #1679