Separate CRDs Helm chart for upgradeable CRD lifecycle#208
Separate CRDs Helm chart for upgradeable CRD lifecycle#208
Conversation
…ions are out of sync but within 'compatible' window
…controller into cdf/crd-chart
…controller into cdf/crd-chart
| To provide an explicit, version-tracked CRD upgrade path, the temporal-worker-controller ships CRDs as a separate Helm chart: `temporal-worker-controller-crds`. This is the same pattern used by [Karpenter](https://karpenter.sh/docs/upgrading/upgrade-guide/) (`karpenter-crd`) and [prometheus-operator-crds](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-operator-crds). | ||
|
|
||
| Benefits: | ||
| - CRDs survive accidental `helm uninstall` of either chart (`helm.sh/resource-policy: keep`) |
There was a problem hiding this comment.
This seems at odds with the sentence on line 5 that says:
and does not delete them on
helm uninstall
I actually don't think the helm.sh/resource-policy: keep thing is a good idea. If someone runs helm uninstall, they are doing it on purpose and likely want to clean up their installation of something entirely. Leaving the (perhaps old-versioned) CRDs installed in the Kubernetes cluster is not the behaviour expected when an operator runs helm uninstall, IMHO...
|
|
||
| ## Compatibility Commitment | ||
|
|
||
| > **CRD chart version N is forward-compatible with controller chart versions N and N−1.** |
There was a problem hiding this comment.
is the reason that TWC's Helm chart version is currently on a v0 release series because the CRD API version is v1alpha1? Are we planning on coordinating a v1 Helm Chart release series when moving the CRD API version to v1?
| The table below defines what must hold true for the commitment to be considered met. "N" is the current release and "N+1" is the next. | ||
|
|
||
| | Scenario | CRDs | Controller | Expected outcome | | ||
| |---|---|---|---| | ||
| | Normal upgrade (recommended order) | N → N+1 first | N → N+1 after | No issues | | ||
| | CRDs ahead, controller not yet upgraded | N+1 | N | Safe: new fields are optional, so the older controller ignores unknown fields in reads and doesn't include them in its writes (which is acceptable since they're optional) | | ||
| | Wrong order: controller upgraded first | N | N+1 (CRDs not yet upgraded) | Silent feature degradation: new status fields the N+1 controller writes are pruned by the API server (schema still at N). All existing CRs continue reconciling normally. New features depending on those fields don't work until CRDs are upgraded. No crashes, no per-CR errors. | | ||
| | Controller rollback (CRDs stay at N+1) | N+1 | N+1 → N | Safe for spec (the controller never writes TWD spec). Status fields added in N+1 will be absent after the first reconciliation cycle — this is acceptable because status is fully reconstituted from live state, not preserved across reconciles. | | ||
| | CRD rollback | N+1 → N | N | See [CRD rollback and field pruning](#crd-rollback-and-field-pruning) — this is the dangerous case. | |
There was a problem hiding this comment.
LOL, I think this is AI improperly using a Markdown table for something that isn't really a tabular concept. I would recommend removing the above section entirely (I think it's more confusing to the average reader than anything else...) and just sticking with the ## Upgrading instructions below.
Summary
temporal-worker-controller-crdsHelm chart with CRDs intemplates/(notcrds/) so they are upgraded onhelm upgradehelm.sh/resource-policy: keepto all CRDs so they survivehelm uninstallof either chartcrds/from the main chart; CRDs now live exclusively in the CRDs chartmake manifeststo output to the new chart'stemplates/and inject the annotation viaawkmake install,make uninstall, andmake deployto use the new pathshelm-lint-crdsandhelm-template-crdsjobs tohelm-validate.ymlCI gaterelease.ymlandhelm.ymlto bump both chart versions and push both charts on releasedocs/crd-management.mdwith compatibility commitment, install/upgrade/rollback instructions, and a migration guide for existing usersREADME.mdinstallation section to document the two-chart install flowUpgrade order
Rollback order
Migration for existing users (Controller Helm Chart v0.12.0 and earlier)
Helm does not delete CRDs when upgrading to the new chart, so upgrading the main chart is safe. A one-time
helm install temporal-worker-controller-crdsadopts the existing CRDs into the new release.Test plan
helm lint --strict helm/temporal-worker-controller-crdspasseshelm template test-release helm/temporal-worker-controller-crdsemits two CRDs withhelm.sh/resource-policy: keephelm template test-release helm/temporal-worker-controlleremits zero CRDshelm-validate.ymlpasses all 4 jobs (lint, template, lint-crds, template-crds)🤖 Generated with Claude Code