
Separate CRDs Helm chart for upgradeable CRD lifecycle#208

Open

carlydf wants to merge 7 commits into main from cdf/crd-chart

Conversation


@carlydf carlydf commented Feb 26, 2026

Summary

  • Creates a new `temporal-worker-controller-crds` Helm chart with CRDs in `templates/` (not `crds/`) so they are upgraded on `helm upgrade`
  • Adds `helm.sh/resource-policy: keep` to all CRDs so they survive `helm uninstall` of either chart
  • Removes `crds/` from the main chart; CRDs now live exclusively in the CRDs chart
  • Updates `make manifests` to output to the new chart's `templates/` and inject the annotation via `awk`
  • Updates `make install`, `make uninstall`, and `make deploy` to use the new paths
  • Adds `helm-lint-crds` and `helm-template-crds` jobs to the `helm-validate.yml` CI gate
  • Updates `release.yml` and `helm.yml` to bump both chart versions and push both charts on release
  • Adds `docs/crd-management.md` with the compatibility commitment, install/upgrade/rollback instructions, and a migration guide for existing users
  • Updates the `README.md` installation section to document the two-chart install flow
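
The `awk` annotation-injection step itself is not shown on this page; below is a minimal sketch of how such a step might look, using an illustrative CRD name and temp-file paths (the awk program and file names are assumptions, not the PR's actual Makefile rule):

```shell
set -eu

# Hypothetical input: a freshly generated CRD manifest (name is illustrative)
cat > /tmp/crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: temporalworkerdeployments.example.io
spec: {}
EOF

# Insert the keep annotation immediately after the top-level metadata: key,
# so every generated CRD carries helm.sh/resource-policy: keep
awk '
  /^metadata:/ {
    print
    print "  annotations:"
    print "    helm.sh/resource-policy: keep"
    next
  }
  { print }
' /tmp/crd.yaml > /tmp/crd-annotated.yaml
```

In a Makefile this would typically run in-place over every file the manifest generator writes into the CRDs chart's `templates/` directory.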

Upgrade order

```sh
# Install / upgrade CRDs first, then the controller
helm upgrade temporal-worker-controller-crds ...
helm upgrade temporal-worker-controller ...
```

Rollback order

```sh
# Roll back controller first, CRDs optionally afterward
helm rollback temporal-worker-controller
helm rollback temporal-worker-controller-crds  # usually not needed
```

Migration for existing users (Controller Helm Chart v0.12.0 and earlier)

Helm does not delete CRDs when upgrading to the new chart, so upgrading the main chart is safe. A one-time `helm install temporal-worker-controller-crds` adopts the existing CRDs into the new release.
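
For context on the adoption step: Helm 3 (since v3.2) will only adopt a pre-existing resource into a release if the resource already carries Helm's ownership metadata; otherwise `helm install` fails with an "exists and cannot be imported" error. A sketch of the metadata the existing CRDs would need before the one-time install (the release name and namespace here are assumptions; substitute the actual values used in the install command):

```yaml
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: temporal-worker-controller-crds
    meta.helm.sh/release-namespace: default
    helm.sh/resource-policy: keep
```

These labels/annotations can be applied with `kubectl label` / `kubectl annotate` on the live CRDs prior to running the install.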

Test plan

  • `helm lint --strict helm/temporal-worker-controller-crds` passes
  • `helm template test-release helm/temporal-worker-controller-crds` emits two CRDs with `helm.sh/resource-policy: keep`
  • `helm template test-release helm/temporal-worker-controller` emits zero CRDs
  • CI `helm-validate.yml` passes all 4 jobs (lint, template, lint-crds, template-crds)
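
The "emits zero CRDs" check can be made mechanical by counting `kind: CustomResourceDefinition` documents in the rendered manifests. A sketch using a canned manifest as a stand-in for real `helm template` output (the sample resources are assumptions, not the chart's actual contents):

```shell
set -eu

# Stand-in for `helm template test-release helm/temporal-worker-controller` output
cat > /tmp/rendered.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-worker-controller
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: temporal-worker-controller
EOF

# Count CRD documents; the controller chart should render none.
# grep -c exits non-zero on no matches, so guard it for `set -e`.
crds=$(grep -c '^kind: CustomResourceDefinition$' /tmp/rendered.yaml || true)
test "$crds" -eq 0 && echo "no CRDs rendered by controller chart"
```

In CI the same pipeline would consume the actual `helm template` output instead of the here-doc.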

🤖 Generated with Claude Code

@carlydf carlydf requested review from a team and jlegrone as code owners February 26, 2026 05:00
To provide an explicit, version-tracked CRD upgrade path, the temporal-worker-controller ships CRDs as a separate Helm chart: `temporal-worker-controller-crds`. This is the same pattern used by [Karpenter](https://karpenter.sh/docs/upgrading/upgrade-guide/) (`karpenter-crd`) and [prometheus-operator-crds](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds).

Benefits:
- CRDs survive accidental `helm uninstall` of either chart (`helm.sh/resource-policy: keep`)

This seems at odds with the sentence on line 5 that says:

> and does not delete them on `helm uninstall`

I actually don't think the `helm.sh/resource-policy: keep` thing is a good idea. If someone runs `helm uninstall`, they are doing it on purpose and likely want to clean up their installation entirely. Leaving the (perhaps old-versioned) CRDs installed in the Kubernetes cluster is not the behaviour an operator expects from `helm uninstall`, IMHO...


## Compatibility Commitment

> **CRD chart version N is forward-compatible with controller chart versions N and N−1.**
Is the reason that TWC's Helm chart is currently on a v0 release series that the CRD API version is v1alpha1? Are we planning on coordinating a v1 Helm chart release series when moving the CRD API version to v1?

Comment on lines +31 to +39
The table below defines what must hold true for the commitment to be considered met. "N" is the current release and "N+1" is the next.

| Scenario | CRDs | Controller | Expected outcome |
|---|---|---|---|
| Normal upgrade (recommended order) | N → N+1 first | N → N+1 after | No issues |
| CRDs ahead, controller not yet upgraded | N+1 | N | Safe: new fields are optional, so the older controller ignores unknown fields in reads and doesn't include them in its writes (which is acceptable since they're optional) |
| Wrong order: controller upgraded first | N | N+1 (CRDs not yet upgraded) | Silent feature degradation: new status fields the N+1 controller writes are pruned by the API server (schema still at N). All existing CRs continue reconciling normally. New features depending on those fields don't work until CRDs are upgraded. No crashes, no per-CR errors. |
| Controller rollback (CRDs stay at N+1) | N+1 | N+1 → N | Safe for spec (the controller never writes TWD spec). Status fields added in N+1 will be absent after the first reconciliation cycle — this is acceptable because status is fully reconstituted from live state, not preserved across reconciles. |
| CRD rollback | N+1 → N | N | See [CRD rollback and field pruning](#crd-rollback-and-field-pruning) — this is the dangerous case. |
LOL, I think this is AI improperly using a Markdown table for something that isn't really a tabular concept. I would recommend removing the above section entirely (I think it's more confusing to the average reader than anything else...) and just sticking with the ## Upgrading instructions below.
