Kubernetes integration testing using Minikube and Argo Workflows in Jenkins #2032

AndreKurait · 2025-12-04T22:25:46Z

Description

This PR enables Kubernetes integration testing for the OpenSearch Migration Assistant using Minikube and Argo Workflows. It introduces a new workflow template (testMigrationWithWorkflowCli) that uses the migration console CLI commands to orchestrate full migrations, along with supporting infrastructure for local development and CI testing.

Key Changes

New Workflow Template - testMigrationWithWorkflowCli

Creates a new Argo WorkflowTemplate that uses migration workflow CLI commands instead of direct Argo/Kubernetes API calls
Adds helper scripts (configureAndSubmit.sh, monitor.sh) for workflow configuration and monitoring
Integrates with the existing fullMigrationWithClusters workflow for end-to-end testing

Kyverno Policies for Dev Environment

Adds zeroResourceRequests policy to zero out pod resource requests in dev environments (allows running on resource-constrained Minikube)
Adds mountLocalAwsCreds policy for mounting host AWS credentials into pods
Policies use post-install hooks with ConfigMap + Job pattern to ensure Kyverno is ready before applying
Configures Kyverno webhooksCleanup to use migrations/migration_console image instead of unavailable bitnami/kubectl:1.30.2

Helm Chart Updates

Simplifies image references to use direct repository:tag format (removes unused _imageHelper.tpl and registryPrefix pattern)
Updates valuesDev.yaml with Kyverno configuration for local development
Fixes Kyverno policy syntax to use patchStrategicMerge instead of invalid JMESPath

Test Automation Improvements

Updates test_runner.py to support registry prefix for individual image repositories
Adds existence check for workflow templates directory
Updates Jenkins pipelines to use Minikube registry
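
As a rough illustration of the registry-prefix handling listed above, the sketch below prepends a registry host to each image repository. The helper name `with_registry_prefix` and the dictionary shape are assumptions made for this example; they are not taken from test_runner.py.

```python
from __future__ import annotations


def with_registry_prefix(repositories: dict[str, str], registry: str) -> dict[str, str]:
    """Prepend a registry host to every image repository (hypothetical helper)."""
    prefix = registry.strip().rstrip("/")
    if not prefix:
        # No registry configured: return the repositories unchanged.
        return dict(repositories)
    return {
        name: f"{prefix}/{repo.lstrip('/')}"
        for name, repo in repositories.items()
    }


# Example using the Minikube registry address that appears in the dev values file.
images = with_registry_prefix(
    {"captureProxy": "migrations/capture_proxy"},
    "localhost:5000",
)
print(images["captureProxy"])
```

Keeping the prefix separate from the repository name means the same repository map can be reused for a different registry by changing one argument.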

Cleanup

Removes legacy workflow templates from migrationConsole/workflows/templates/ (now generated via DSL in orchestrationSpecs)
Consolidates test workflows into migrationConsole/testWorkflows/

Issues Resolved

Enables local Kubernetes integration testing with Minikube for the Migration Assistant.

Testing

All 4 integration tests pass locally on Minikube (ES 5.6 → OS 2.19):
- Test0001SingleDocumentBackfill ✅
- Test0004MultiTypeUnionMigration ✅
- Test0005MultiTypeSplitMigration ✅
- Test0006OpenSearchBenchmarkBackfill ✅
Jest snapshots updated for changed resource configurations
Linting passes

Check List

New functionality includes testing
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2025-12-04T22:27:47Z

Codecov Report

❌ Patch coverage is 54.00000% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.36%. Comparing base (ee8c09d) to head (8a8779c).

Files with missing lines	Patch %	Lines
...b/console_link/console_link/models/argo_service.py	54.00%	23 Missing ⚠️

❌ Your patch check has failed because the patch coverage (54.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2032      +/-   ##
============================================
+ Coverage     77.55%   78.36%   +0.80%     
- Complexity       14       21       +7     
============================================
  Files           603      603              
  Lines         24200    24247      +47     
  Branches       1855     1854       -1     
============================================
+ Hits          18769    19000     +231     
+ Misses         4512     4353     -159     
+ Partials        919      894      -25

Flag	Coverage Δ
gradle	`76.52% <ø> (+1.29%)`	⬆️
node	`90.11% <ø> (ø)`
python	`78.60% <54.00%> (-0.21%)`	⬇️


- Document chart overview and prerequisites
- Add installation instructions
- Include detailed security warnings for Kyverno policies
- Document mountLocalAwsCreds policy risks and alternatives
- Add values reference table for Kyverno configuration

Signed-off-by: Andre Kurait <[email protected]>

- Add clarifying comment for actual-registry label in buildDockerImagesMini.sh
- Remove outdated comment from migrationConsole/Dockerfile
- Add warning headers to test helper scripts about package separation
- Add TODO for removing base64 encoding in testMigrationWithWorkflowCli.ts
- Add detailed comments explaining monitorResult flow and exit codes
- Add TODO to resourceLoader.ts about separating test helpers
- Rename testMigrationWithWorkflowCli to TestMigrationWithWorkflowCli for naming consistency

Signed-off-by: Andre Kurait <[email protected]>

Remove --omit=optional from npm ci command to allow installation of platform-specific @typescript/native-preview binaries required by tsgo. Signed-off-by: Andre Kurait <[email protected]>

…ion workflow

- Add handleWorkflowSuccess, handleWorkflowFailure, handleWorkflowError templates
- Increase retry limit from 13 (~10 min) to 33 (~30 min) for monitor step
- Add conditional steps to handle different workflow outcomes
- Update snapshot tests

Signed-off-by: Andre Kurait <[email protected]>

The configuration cache was failing because Task objects (Jar, StartScripts) were being passed directly to from() and inputs.files() methods, which caused Gradle to attempt serializing them. Fixed by using .map { it.outputs.files } to extract the output files from TaskProvider references instead of passing the task references directly.

This resolves the following configuration cache errors:

- copyArtifact_capture_proxy
- copyArtifact_capture_proxy_es
- copyArtifact_traffic_replayer
- syncArtifact_migration_console_staging

Signed-off-by: Andre Kurait <[email protected]>

- Remove invalid workflow-level parameters using valueFrom.configMapKeyRef with template expressions
- Use valueFrom.configMapKeyRef at step level when calling templates instead
- Templates receive image location and pull policy as input parameters
- Follows the same pattern as full-migration.yaml and other workflow templates

Signed-off-by: Andre Kurait <[email protected]>

When workflow cluster configurations lack authentication settings, the conversion to Python cluster schema was not adding any auth field, causing validation to fail with: 'No values are present from set: [basic_auth, no_auth, sigv4]'. This fix ensures that when no authentication configuration is provided in the workflow config (no authConfig field and no legacy auth fields), the converter defaults to no_auth: None, which is the appropriate setting for clusters that don't require authentication. Added test coverage to verify the fix handles the scenario where workflow configs only contain endpoint, allowInsecure, and version fields without any authentication configuration. Signed-off-by: Andre Kurait <[email protected]>
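
The defaulting behavior described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the converter from the PR: the function name `to_cluster_schema`, the `allow_insecure` output key, and the assumption that auth settings arrive as top-level keys are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# The three auth options named in the validation error quoted above.
AUTH_KEYS = ("basic_auth", "no_auth", "sigv4")


def to_cluster_schema(workflow_cluster: dict[str, Any]) -> dict[str, Any]:
    """Convert a workflow cluster config to a cluster dict (hypothetical shape)."""
    cluster: dict[str, Any] = {
        "endpoint": workflow_cluster["endpoint"],
        "allow_insecure": workflow_cluster.get("allowInsecure", False),
    }
    if "version" in workflow_cluster:
        cluster["version"] = workflow_cluster["version"]

    # Carry over any auth setting that was explicitly provided.
    for key in AUTH_KEYS:
        if key in workflow_cluster:
            cluster[key] = workflow_cluster[key]

    # With no auth configured at all, default to no_auth so that a schema
    # requiring exactly one auth option still validates.
    if not any(key in cluster for key in AUTH_KEYS):
        cluster["no_auth"] = None
    return cluster


print(to_cluster_schema({"endpoint": "http://source:9200", "allowInsecure": True, "version": "ES 5.6"}))
```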

gregschohn · 2025-12-10T22:11:11Z

deployment/k8s/charts/aggregates/migrationAssistantWithArgo/README.md

+
+---
+
+## ⚠️ Security Considerations


Can you summarize this more? More than 2/3 of the README is for an obscure developer-only feature.

gregschohn · 2025-12-10T22:14:20Z

...rts/aggregates/migrationAssistantWithArgo/templates/resources/kyvernoMountLocalAwsCreds.yaml

@@ -0,0 +1,88 @@
+{{- if and .Values.conditionalPackageInstalls.kyverno .Values.kyvernoPolicies.mountLocalAwsCreds }}
+apiVersion: v1
+kind: ConfigMap


Can you file a Jira ticket for this? I'd vote to control this in the installJob script, so that Kyverno is installed before everything else is kicked off: do the helm install for it, set the policies, wait for them to appear, then proceed as normal.

gregschohn · 2025-12-10T22:25:48Z

...rts/aggregates/migrationAssistantWithArgo/templates/resources/kyvernoMountLocalAwsCreds.yaml

+              - resources:
+                  kinds:
+                    - Pod


There are good arguments that pretty much every one of our pods could need those creds... Maybe we should check the service account so that it aligns with what we do for EKS.

In that case, we'd be saying that the rewrite maps service account X to AWS creds profile Y. If there's no mapping, the pod gets no creds at all, which also lets you test multiple profiles/personas.
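
A minimal sketch of that lookup, with "no mapping means no credentials" as the default. The service account and profile names below are placeholders invented for illustration; a real policy would express this in Kyverno rather than Python.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical mapping from Kubernetes service account to AWS credentials profile.
SERVICE_ACCOUNT_PROFILES: dict[str, str] = {
    "migration-console": "dev-admin",
    "argo-workflow-executor": "dev-readonly",
}


def aws_profile_for(
    service_account: str,
    mapping: Optional[dict[str, str]] = None,
) -> Optional[str]:
    """Return the AWS profile for a service account, or None to mount no credentials."""
    profiles = SERVICE_ACCOUNT_PROFILES if mapping is None else mapping
    return profiles.get(service_account)


print(aws_profile_for("migration-console"))
print(aws_profile_for("some-other-account"))
```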

gregschohn · 2025-12-10T22:29:00Z

deployment/k8s/charts/aggregates/migrationAssistantWithArgo/valuesForLocalK8s.yaml

+# Override image pull policy for dev - always pull latest
+images:
+  captureProxy:
+    repository: localhost:5000/migrations/capture_proxy


Can you please update these ports to 5001? I've updated ports to 5001 in a number of spots already.

I'm not sure that is compatible with the Minikube registry addon. The Minikube docs say:

"After the image is pushed, refer to it by localhost:5000/{name} in kubectl specs."

https://minikube.sigs.k8s.io/docs/handbook/registry/

I could instead run `minikube addons enable registry-alias` and then refer to the image as registry.minikube/migrations/capture_proxy.

implementation_plan.md

gregschohn · 2025-12-10T22:54:56Z

migrationConsole/lib/integ_test/testWorkflows/clusterWorkflows.yaml

-            kubectl create secret generic "${CLUSTER_NAME}-aws" \
-              --from-literal=AWS_ACCESS_KEY_ID=test \
-              --from-literal=AWS_SECRET_ACCESS_KEY=test \
-              -n "${NAMESPACE}" \
-              --dry-run=client -o yaml | \
-              kubectl label --local -f - migration-test=true cluster-name=${CLUSTER_NAME} -o yaml | \
-              kubectl apply -f -
+            kubectl apply -f - <<EOF
+            apiVersion: v1
+            kind: Secret
+            metadata:
+              name: ${CLUSTER_NAME}-aws
+              namespace: ${NAMESPACE}
+              labels:
+                migration-test: "true"
+                cluster-name: ${CLUSTER_NAME}
+            type: Opaque
+            stringData:
+              AWS_ACCESS_KEY_ID: test
+              AWS_SECRET_ACCESS_KEY: test
+            EOF


Why did you make this change? Same question for all the other ones too.

gregschohn · 2025-12-10T22:59:04Z

orchestrationSpecs/package.json

    "packages/*"
  ],
  "dependencies": {
+    "@typescript/native-preview": "^7.0.0-dev.20251210.1",


Why has this been lifted up into a general dependency? This was only required for type-checking.

gregschohn · 2025-12-10T23:01:16Z

orchestrationSpecs/packages/migration-workflow-templates/src/resourceLoader.ts

+export const configureAndSubmitScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'configureAndSubmit.sh'), 'utf8');
+export const monitorScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'monitor.sh'), 'utf8');


I had asked you to split these lines into a separate file and have two different resourceLoaders.

gregschohn · 2025-12-10T23:05:47Z

.../packages/migration-workflow-templates/src/workflowTemplates/testMigrationWithWorkflowCli.ts

+            // - "RETRY": Migration still in progress, retry monitoring (exit code 1, triggers retry)
+            // - "ERROR": Unexpected error occurred (exit code 2, fails the step)


I'm pretty sure that neither RETRY nor even ERROR will ever be possible here. Even if we time out, I don't think that Argo will fill in the parameters. IIRC, you only get the details when the task succeeds, in which case you have to disambiguate. That also means that the script is doing extra and misleading work.

gregschohn · 2025-12-10T23:08:42Z

.../packages/migration-workflow-templates/src/workflowTemplates/testMigrationWithWorkflowCli.ts

+    .addTemplate("handleWorkflowSuccess", t => t
+        .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))
+
+        .addContainer(cb => cb
+            .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
+            .addCommand(["/bin/bash", "-c"])
+            .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
+            .addArgs(["echo 'Migration workflow completed successfully'"])
+        )
+    )
+
+    .addTemplate("handleWorkflowFailure", t => t
+        .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))
+
+        .addContainer(cb => cb
+            .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
+            .addCommand(["/bin/bash", "-c"])
+            .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
+            .addArgs(["echo 'Migration workflow failed'"])
+        )
+    )
+
+    .addTemplate("handleWorkflowError", t => t
+        .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))
+
+        .addContainer(cb => cb
+            .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
+            .addCommand(["/bin/bash", "-c"])
+            .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
+            .addArgs(["echo 'Migration workflow encountered an error'"])
+        )
+    )


Is there any value to these? Why not just have one failure task? I don't think that the failure handling is right anyway right now, because it doesn't exit with an error code, so the workflow will just keep going in all of these cases.
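
To make the exit-code concern concrete, here is a small sketch of a single handler that returns a non-zero code for anything other than success, so a failed outcome would actually fail the step. The outcome string "SUCCEEDED" and the function name are assumptions for illustration and do not come from the PR.

```python
def exit_code_for(outcome: str) -> int:
    """Return 0 only for a successful outcome; every other outcome is non-zero."""
    return 0 if outcome.strip().upper() == "SUCCEEDED" else 1


# A wrapper script would pass this value to sys.exit() after logging the outcome,
# instead of echoing a message and implicitly exiting with 0.
for outcome in ("SUCCEEDED", "FAILED", "ERROR"):
    print(outcome, exit_code_for(outcome))
```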

AndreKurait · 2025-12-10T23:28:08Z

Kyverno Jira https://opensearch.atlassian.net/browse/MIGRATIONS-2779

jugal-chauhan and others added 24 commits November 18, 2025 11:21

Initial commit to create a new template which uses workflow commands …

5f184d4

…to run a migration Signed-off-by: Jugal Chauhan <[email protected]>

Initial commit to create a new template which uses workflow commands …

2941c25

…to run a migration Signed-off-by: Jugal Chauhan <[email protected]>

Add and run unit test

c807a6c

Signed-off-by: Jugal Chauhan <[email protected]>

Merge remote-tracking branch 'origin/main' into jenkins-k8s-local-test

c0a83fb

Merge remote-tracking branch 'origin/jenkins-k8s-local-test' into jen…

3badc71

…kins-k8s-local-test

Add RetryParameters and combine scripts

7332491

Signed-off-by: Jugal Chauhan <[email protected]>

create workflowtemplate for full migration with workflow cli commands

758a7d7

Signed-off-by: Jugal Chauhan <[email protected]>

Fixes on structure and syntax after testing

11664f0

Signed-off-by: Jugal Chauhan <[email protected]>

initial squashed commit after merging from main

e1d7b9f

Signed-off-by: Jugal Chauhan <[email protected]>

Resolve merge conflicts

a78e725

Signed-off-by: Jugal Chauhan <[email protected]>

Resolving merge cionflicts again

aca5b2b

Signed-off-by: Jugal Chauhan <[email protected]>

Resolving merge cionflicts again

cdaef2d

Signed-off-by: Jugal Chauhan <[email protected]>

Merge branch 'main' into jenkins-k8s-local-test

31c30b8

Merge remote-tracking branch 'origin/main' into jenkins-k8s-local-test

18ae56f

clean up files

55aa4cf

Signed-off-by: Jugal Chauhan <[email protected]>

refreshing snapshot files

75a9517

Signed-off-by: Jugal Chauhan <[email protected]>

update apply workflows to use the DSL generated ones

c1142b1

Signed-off-by: Jugal Chauhan <[email protected]>

Merge branch 'main' into jenkins-k8s-local-test

ad4709f

Cleanup changes and start to build back up fullMigrationWithClusters.…

e00f6c9

…yaml Signed-off-by: Andre Kurait <[email protected]>

Revert changes from main on kafka authentication and package lock

1b29688

Signed-off-by: Andre Kurait <[email protected]>

Remove change on package-lock

2770fba

Signed-off-by: Andre Kurait <[email protected]>

Restore changes not needed to change

8f360ae

Signed-off-by: Andre Kurait <[email protected]>

Fix output and package-lock

5f597d9

Signed-off-by: Andre Kurait <[email protected]>

Cleanup to move test migration scripts to resources folder

ecd1bb6

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait had a problem deploying to migrations-cicd December 4, 2025 22:25 — with GitHub Actions Failure

AndreKurait temporarily deployed to migrations-cicd December 4, 2025 22:25 — with GitHub Actions Inactive

Fix submit

32aed57

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait temporarily deployed to migrations-cicd December 4, 2025 22:45 — with GitHub Actions Inactive

AndreKurait had a problem deploying to migrations-cicd December 4, 2025 22:45 — with GitHub Actions Failure

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 19:15 — with GitHub Actions Failure

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 19:36 — with GitHub Actions Failure

Fix x86/linux-x64 build failure by allowing optional npm dependencies

92c0520

Remove --omit=optional from npm ci command to allow installation of platform-specific @typescript/native-preview binaries required by tsgo. Signed-off-by: Andre Kurait <[email protected]>

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 20:12 — with GitHub Actions Failure

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 20:13 — with GitHub Actions Failure

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 20:24 — with GitHub Actions Failure

AndreKurait temporarily deployed to migrations-cicd December 10, 2025 20:24 — with GitHub Actions Inactive

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 21:03 — with GitHub Actions Failure

AndreKurait temporarily deployed to migrations-cicd December 10, 2025 21:03 — with GitHub Actions Inactive

AndreKurait force-pushed the k8s-integ branch from 2b17b78 to b65aa8b Compare December 10, 2025 21:03

AndreKurait temporarily deployed to migrations-cicd December 10, 2025 21:03 — with GitHub Actions Inactive

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 21:03 — with GitHub Actions Failure

AndreKurait deployed to migrations-cicd December 10, 2025 21:39 — with GitHub Actions Active

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 21:39 — with GitHub Actions Failure

AndreKurait had a problem deploying to migrations-cicd December 10, 2025 22:52 — with GitHub Actions Failure

gregschohn reviewed Dec 10, 2025

Remove implementation_plan.md

8a8779c

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait requested a deployment to migrations-cicd December 10, 2025 23:24 — with GitHub Actions In progress
