
feat: add third_party_telemetry_enabled config to disable infra telemetry #181

Merged
stevenolen merged 12 commits into main from third-party-telemetry-enabled
Apr 6, 2026

Conversation

@stevenolen
Collaborator

Add a workload-level third_party_telemetry_enabled config option (default: true) that disables usage reporting and update checks for third-party infrastructure components when set to false.

Components affected:

  • Grafana: analytics reporting, update checks, plugin update checks
  • Loki: analytics reporting
  • Mimir: usage stats
  • Alloy: reporting
  • Traefik: version check and anonymous usage (AWS + Azure)
  • Calico: usage reporting via FelixConfiguration (EKS only)
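A rough sketch of how the single flag could fan out into per-component settings. The setting names (`reporting_enabled`, `check_for_updates`, `check_for_plugin_updates`, `usage_stats.enabled`, `--disable-reporting`, `checkNewVersion`, `sendAnonymousUsage`, `usageReportingEnabled`) are the upstream option names as I understand them. The function name, the top-level keys, and `extra_args` are hypothetical and are not taken from this diff.

```python
from typing import Any


def telemetry_overrides(third_party_telemetry_enabled: bool = True) -> dict[str, dict[str, Any]]:
    """Return per-component overrides; empty when telemetry stays enabled.

    Hypothetical helper: the dict layout is illustrative, not the layout
    used by the real Helm values in this repository.
    """
    if third_party_telemetry_enabled:
        # Default (true): leave every component's upstream behaviour alone.
        return {}
    return {
        # Grafana: [analytics] section of grafana.ini
        "grafana": {
            "grafana.ini": {
                "analytics": {
                    "reporting_enabled": False,
                    "check_for_updates": False,
                    "check_for_plugin_updates": False,
                }
            }
        },
        # Loki: analytics block of the Loki config
        "loki": {"analytics": {"reporting_enabled": False}},
        # Mimir: usage_stats block of the Mimir config
        "mimir": {"usage_stats": {"enabled": False}},
        # Alloy: command-line flag on `alloy run`
        "alloy": {"extra_args": ["--disable-reporting"]},
        # Traefik: global section of the static configuration
        "traefik": {"global": {"checkNewVersion": False, "sendAnonymousUsage": False}},
        # Calico: field on the default FelixConfiguration
        "calico": {"defaultFelixConfiguration": {"usageReportingEnabled": False}},
    }
```

Returning an empty dict for the default case keeps existing workloads byte-for-byte unchanged unless they opt out.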

Also fixes:

  • Tigera Operator chart version bumped 3.26.1 → 3.29.3 to support native defaultFelixConfiguration in Helm values
  • Tigera Operator now uses configurable tigera_operator_version from WorkloadClusterComponentConfig instead of hardcoded version

Going to deploy this to a test workload now to confirm!

Category of change

  • Bug fix (non-breaking change which fixes an issue)
  • Version upgrade (upgrading the version of a service or product)
  • New feature (non-breaking change which adds functionality)
  • Build: a code change that affects the build system or external dependencies
  • Performance: a code change that improves performance
  • Refactor: a code change that neither fixes a bug nor adds a feature
  • Documentation: documentation changes
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

  • I have reviewed my own diff and added inline comments on lines I want reviewers to focus on or that I am uncertain about

@stevenolen stevenolen requested a review from a team as a code owner March 16, 2026 18:26
@stevenolen stevenolen force-pushed the third-party-telemetry-enabled branch from 9f5c4cc to 3931738 Compare March 16, 2026 18:55
Lytol previously approved these changes Mar 16, 2026
Contributor

@Lytol Lytol left a comment

I'll trust you that the options for each component are correct. Looks good!

…etry

Add a workload-level `third_party_telemetry_enabled` config option
(default: true) that disables usage reporting and update checks for
third-party infrastructure components when set to false.

Components affected:
- Grafana: analytics reporting, update checks, plugin update checks
- Loki: analytics reporting
- Mimir: usage stats
- Alloy: reporting
- Traefik: version check and anonymous usage (AWS + Azure)
- Calico: usage reporting via FelixConfiguration (EKS only)

Also fixes:
- Tigera Operator chart version bumped 3.26.1 → 3.29.3 to support
  native defaultFelixConfiguration in Helm values
- Tigera Operator now uses configurable tigera_operator_version from
  WorkloadClusterComponentConfig instead of hardcoded version
@stevenolen stevenolen force-pushed the third-party-telemetry-enabled branch from 3931738 to a496abe Compare March 17, 2026 18:42
force_update_version is a valid parameter on aws.eks.NodeGroup but not
aws.eks.Cluster. It was incorrectly being injected into cluster_args,
causing Cluster creation to fail with an unexpected keyword argument.

Store the value on self for use by node groups instead.
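
For illustration, a minimal sketch of the shape of this fix, using plain-Python stand-ins rather than the real Pulumi resources. `create_cluster`, `create_node_group`, and `EksWorkload` are hypothetical names; only the `force_update_version` parameter and the "store it on self" approach come from the commit message.

```python
from typing import Any, Optional


def create_cluster(name: str, version: str) -> dict[str, Any]:
    """Stand-in for aws.eks.Cluster, which does not accept force_update_version."""
    return {"kind": "Cluster", "name": name, "version": version}


def create_node_group(
    name: str, cluster_name: str, force_update_version: Optional[bool] = None
) -> dict[str, Any]:
    """Stand-in for aws.eks.NodeGroup, which does accept force_update_version."""
    return {
        "kind": "NodeGroup",
        "name": name,
        "cluster_name": cluster_name,
        "force_update_version": force_update_version,
    }


class EksWorkload:
    def __init__(self, name: str, version: str, force_update_version: Optional[bool] = None):
        # Keep cluster_args free of node-group-only parameters.
        cluster_args = {"name": name, "version": version}
        # Store the value on self so node groups can pick it up later.
        self.force_update_version = force_update_version
        self.cluster = create_cluster(**cluster_args)

    def add_node_group(self, name: str) -> dict[str, Any]:
        return create_node_group(
            name,
            cluster_name=self.cluster["name"],
            force_update_version=self.force_update_version,
        )
```

Passing `force_update_version` to `create_cluster` here raises `TypeError`, which mirrors the failure described above.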

When disabling third-party telemetry, the Tigera Helm chart needs to
manage the existing default FelixConfiguration. Add a CustomResourcePatch
to set Helm ownership labels/annotations before the Helm release runs,
allowing Helm to adopt the pre-existing resource.
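
A sketch of what such an adoption patch could carry. The three metadata keys are the ones Helm 3 checks before taking ownership of an existing object. The helper names, the release name and namespace used below, and the `crd.projectcalico.org/v1` API version are assumptions, not values read from this PR.

```python
from typing import Any


def helm_adoption_metadata(release_name: str, release_namespace: str) -> dict[str, dict[str, str]]:
    """Labels and annotations Helm looks for before adopting a resource."""
    return {
        "labels": {"app.kubernetes.io/managed-by": "Helm"},
        "annotations": {
            "meta.helm.sh/release-name": release_name,
            "meta.helm.sh/release-namespace": release_namespace,
        },
    }


def felix_adoption_patch(release_name: str, release_namespace: str) -> dict[str, Any]:
    """Patch body for the pre-existing default FelixConfiguration.

    Assumption: the resource is addressed through the CRD-backed API group.
    """
    return {
        "apiVersion": "crd.projectcalico.org/v1",
        "kind": "FelixConfiguration",
        "metadata": {
            "name": "default",
            **helm_adoption_metadata(release_name, release_namespace),
        },
    }


# Hypothetical release name and namespace, for illustration only.
patch = felix_adoption_patch("tigera-operator", "tigera-operator")
```

The patch only touches metadata, so the existing spec is left for the Helm release to reconcile.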
timtalbot previously approved these changes Mar 18, 2026
Calico 3.29.3 fails with iptables-legacy-save exit status 111 on
Amazon Linux 2023 (kernel 6.12) which only ships nftables. Set
iptablesBackend: NFT in defaultFelixConfiguration unconditionally.

- Switch linuxDataplane from Iptables to Nftables (GA in 3.31, fixes AL2023 kernel 6.12)
- Remove iptablesBackend: NFT (not needed with native Nftables dataplane)
- Remove nonPrivileged: Enabled (no longer supported in 3.31)
- Disable Goldmane and Whisker (new 3.31 components, CRDs not pre-installed)
timtalbot previously approved these changes Mar 18, 2026
The Nftables linuxDataplane value was added in Calico 3.31, but older
CRD schemas only allow Iptables/BPF/VPP. Patch the CRD enum before
the Helm release so the API server accepts the new value, enabling
single-step upgrades from older versions.
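
To make the idea concrete, a small sketch of widening an enum in an OpenAPI-style schema dict. The schema fragment and the path to `linuxDataplane` are simplified assumptions about the CRD layout, and `allow_enum_value` is a hypothetical helper rather than the code from this commit.

```python
import copy
from typing import Any


def allow_enum_value(schema: dict[str, Any], path: list[str], value: str) -> dict[str, Any]:
    """Return a copy of `schema` whose enum at `path` also permits `value`."""
    patched = copy.deepcopy(schema)
    node = patched
    for key in path:
        node = node[key]
    enum = node.setdefault("enum", [])
    if value not in enum:  # idempotent: re-running does not duplicate the entry
        enum.append(value)
    return patched


# Simplified stand-in for the older CRD schema (Iptables/BPF/VPP only).
old_schema = {
    "properties": {
        "spec": {
            "properties": {
                "calicoNetwork": {
                    "properties": {
                        "linuxDataplane": {"type": "string", "enum": ["Iptables", "BPF", "VPP"]}
                    }
                }
            }
        }
    }
}
DATAPLANE_PATH = ["properties", "spec", "properties", "calicoNetwork", "properties", "linuxDataplane"]

new_schema = allow_enum_value(old_schema, DATAPLANE_PATH, "Nftables")
```

Working on a deep copy keeps the original schema available for comparison or rollback.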

3.31.4 with linuxDataplane: Iptables works on AL2023: the operator
uses iptables-nft under the hood. The Nftables dataplane caused
NetworkUnavailable to stick on nodes after the transition. Removing
the CRD patch since it was only needed for the Nftables enum.
@stevenolen
Collaborator Author

stevenolen commented Mar 18, 2026

Update: Tigera Operator upgrade to 3.31.4

This PR now also includes an upgrade of the Tigera Operator from 3.26.1 → 3.31.4, needed for AL2023 compatibility (iptables-legacy-save exit status 111 on kernel 6.12).

What changed:

  • Upgraded Tigera Operator chart to 3.31.4 (default, configurable via tigera_operator_version)
  • Disabled Goldmane and Whisker (new 3.31 components whose CRDs don't exist on older clusters)
  • Removed nonPrivileged: Enabled (no longer supported in 3.31)
  • Added FelixConfiguration adoption patch so Helm can manage the existing default FelixConfiguration (needed for defaultFelixConfiguration.usageReportingEnabled telemetry control)
  • Dataplane stays on linuxDataplane: Iptables — 3.31.4 correctly uses iptables-nft under the hood on AL2023, no need for the Nftables dataplane
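
Roughly, the resulting chart values might look like the sketch below. `kubernetesProvider`, `cni.type`, `linuxDataplane`, and `defaultFelixConfiguration.usageReportingEnabled` are named in this PR. The `goldmane.enabled`, `whisker.enabled`, and `defaultFelixConfiguration.enabled` keys are my assumption about the chart's values layout, and the function name is hypothetical.

```python
from typing import Any

# Default chart version from this PR; configurable via tigera_operator_version.
DEFAULT_TIGERA_OPERATOR_VERSION = "3.31.4"


def tigera_operator_values(third_party_telemetry_enabled: bool = True) -> dict[str, Any]:
    """Build Helm values for the Tigera Operator chart (illustrative only)."""
    values: dict[str, Any] = {
        "installation": {
            "kubernetesProvider": "EKS",
            "cni": {"type": "Calico"},
            # Stays on Iptables; the Nftables attempt was abandoned.
            "calicoNetwork": {"linuxDataplane": "Iptables"},
        },
        # New 3.31 components whose CRDs don't exist on older clusters.
        # Assumed key names.
        "goldmane": {"enabled": False},
        "whisker": {"enabled": False},
    }
    if not third_party_telemetry_enabled:
        # Only manage the default FelixConfiguration when opting out.
        values["defaultFelixConfiguration"] = {
            "enabled": True,
            "usageReportingEnabled": False,
        }
    return values
```

Note that `nonPrivileged` is deliberately absent, since it is no longer supported in 3.31.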

Tested on:

  • ganso01-staging
  • duplicado03-staging

Note: The commit history has some noise from a failed attempt at switching to linuxDataplane: Nftables, which left NetworkUnavailable stuck on all nodes. That approach was abandoned; Iptables on 3.31.4 works cleanly on AL2023.

@stevenolen stevenolen requested a review from timtalbot March 18, 2026 21:28
The Tigera operator auto-detects EKS and defaults cni.type to AmazonVPC
when the field is empty. Due to a race condition during install/upgrade,
the operator can fill this default before Helm writes the user's value.
Once set, Helm's 3-way merge won't revert it.

Add a Pulumi CustomResourcePatch that explicitly sets cni.type=Calico
on the Installation CR after the Helm release, ensuring Calico CNI
overlay networking is always configured regardless of operator behavior.
@stevenolen
Collaborator Author

Update: Fix Calico CNI override on EKS

During the Tigera 3.31.4 upgrade, we discovered that the operator auto-detects kubernetesProvider: EKS and defaults cni.type to AmazonVPC when the field is empty. Due to a race condition during install/upgrade, this default can be written before Helm applies the user's cni.type: Calico value. Once set, Helm's 3-way merge won't revert it.

This caused new nodes on staging clusters to come up NotReady — calico-node wouldn't install the CNI plugin because it thought AWS VPC CNI was handling networking.

Fix: Added a CustomResourcePatch that explicitly sets cni.type=Calico on the Installation CR after the Helm release. Verified on both staging clusters — all nodes recovered to Ready.
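
A simplified illustration of the effect of that patch on the Installation CR. The `merge_patch` helper is a hypothetical stand-in with JSON-merge-patch-like semantics. It is not how Pulumi applies a `CustomResourcePatch`, and the `live` object is a made-up example of the state the operator leaves behind.

```python
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Merge `patch` into `target` without mutating either (merge-patch style)."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Only the field we insist on; everything else on the CR is left alone.
CNI_PATCH = {"spec": {"cni": {"type": "Calico"}}}

# Example state after the operator wins the race and fills in its EKS default.
live = {
    "apiVersion": "operator.tigera.io/v1",
    "kind": "Installation",
    "metadata": {"name": "default"},
    "spec": {"kubernetesProvider": "EKS", "cni": {"type": "AmazonVPC"}},
}

fixed = merge_patch(live, CNI_PATCH)
```

Because the patch names only `spec.cni.type`, it does not fight the operator or Helm over any other field, and running it after the Helm release makes the outcome independent of who wrote the field first.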

The Tigera Operator chart 3.31.4 manages the default FelixConfiguration
on its own when defaultFelixConfiguration is enabled, so the manual Helm
ownership labels/annotations patch is not needed and causes drift on
every run.
@stevenolen stevenolen merged commit 2843209 into main Apr 6, 2026
7 checks passed
@stevenolen stevenolen deleted the third-party-telemetry-enabled branch April 6, 2026 20:02

4 participants