Skip to content

Conversation

@AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Dec 4, 2025

Description

This PR enables Kubernetes integration testing for the OpenSearch Migration Assistant using Minikube and Argo Workflows. It introduces a new workflow template (testMigrationWithWorkflowCli) that uses the migration console CLI commands to orchestrate full migrations, along with supporting infrastructure for local development and CI testing.

Key Changes

New Workflow Template - testMigrationWithWorkflowCli

  • Creates a new Argo WorkflowTemplate that uses migration workflow CLI commands instead of direct Argo/Kubernetes API calls
  • Adds helper scripts (configureAndSubmit.sh, monitor.sh) for workflow configuration and monitoring
  • Integrates with the existing fullMigrationWithClusters workflow for end-to-end testing

Kyverno Policies for Dev Environment

  • Adds zeroResourceRequests policy to zero out pod resource requests in dev environments (allows running on resource-constrained Minikube)
  • Adds mountLocalAwsCreds policy for mounting host AWS credentials into pods
  • Policies use post-install hooks with ConfigMap + Job pattern to ensure Kyverno is ready before applying
  • Configures Kyverno webhooksCleanup to use migrations/migration_console image instead of unavailable bitnami/kubectl:1.30.2

Helm Chart Updates

  • Simplifies image references to use direct repository:tag format (removes unused _imageHelper.tpl and registryPrefix pattern)
  • Updates valuesDev.yaml with Kyverno configuration for local development
  • Fixes Kyverno policy syntax to use patchStrategicMerge instead of invalid JMESPath

Test Automation Improvements

  • Updates test_runner.py to support registry prefix for individual image repositories
  • Adds existence check for workflow templates directory
  • Updates Jenkins pipelines to use Minikube registry

Cleanup

  • Removes legacy workflow templates from migrationConsole/workflows/templates/ (now generated via DSL in orchestrationSpecs)
  • Consolidates test workflows into migrationConsole/testWorkflows/

Issues Resolved

Enables local Kubernetes integration testing with Minikube for the Migration Assistant.

Testing

  • All 4 integration tests pass locally on Minikube (ES 5.6 → OS 2.19):
    • Test0001SingleDocumentBackfill ✅
    • Test0004MultiTypeUnionMigration ✅
    • Test0005MultiTypeSplitMigration ✅
    • Test0006OpenSearchBenchmarkBackfill ✅
  • Jest snapshots updated for changed resource configurations
  • Linting passes

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jugal-chauhan and others added 24 commits November 18, 2025 11:21
Signed-off-by: Jugal Chauhan <[email protected]>
Signed-off-by: Jugal Chauhan <[email protected]>
Signed-off-by: Jugal Chauhan <[email protected]>
Signed-off-by: Jugal Chauhan <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 54.00000% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.36%. Comparing base (ee8c09d) to head (8a8779c).

Files with missing lines Patch % Lines
...b/console_link/console_link/models/argo_service.py 54.00% 23 Missing ⚠️

❌ Your patch check has failed because the patch coverage (54.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2032      +/-   ##
============================================
+ Coverage     77.55%   78.36%   +0.80%     
- Complexity       14       21       +7     
============================================
  Files           603      603              
  Lines         24200    24247      +47     
  Branches       1855     1854       -1     
============================================
+ Hits          18769    19000     +231     
+ Misses         4512     4353     -159     
+ Partials        919      894      -25     
Flag Coverage Δ
gradle 76.52% <ø> (+1.29%) ⬆️
node 90.11% <ø> (ø)
python 78.60% <54.00%> (-0.21%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Andre Kurait <[email protected]>
- Document chart overview and prerequisites
- Add installation instructions
- Include detailed security warnings for Kyverno policies
- Document mountLocalAwsCreds policy risks and alternatives
- Add values reference table for Kyverno configuration

Signed-off-by: Andre Kurait <[email protected]>
- Add clarifying comment for actual-registry label in buildDockerImagesMini.sh
- Remove outdated comment from migrationConsole/Dockerfile
- Add warning headers to test helper scripts about package separation
- Add TODO for removing base64 encoding in testMigrationWithWorkflowCli.ts
- Add detailed comments explaining monitorResult flow and exit codes
- Add TODO to resourceLoader.ts about separating test helpers
- Rename testMigrationWithWorkflowCli to TestMigrationWithWorkflowCli for naming consistency

Signed-off-by: Andre Kurait <[email protected]>
Remove --omit=optional from npm ci command to allow installation of
platform-specific @typescript/native-preview binaries required by tsgo.

Signed-off-by: Andre Kurait <[email protected]>
…ion workflow

- Add handleWorkflowSuccess, handleWorkflowFailure, handleWorkflowError templates
- Increase retry limit from 13 (~10 min) to 33 (~30 min) for monitor step
- Add conditional steps to handle different workflow outcomes
- Update snapshot tests

Signed-off-by: Andre Kurait <[email protected]>
The configuration cache was failing because Task objects (Jar, StartScripts)
were being passed directly to from() and inputs.files() methods, which caused
Gradle to attempt serializing them.

Fixed by using .map { it.outputs.files } to extract the output files from
TaskProvider references instead of passing the task references directly.

This resolves the following configuration cache errors:
- copyArtifact_capture_proxy
- copyArtifact_capture_proxy_es
- copyArtifact_traffic_replayer
- syncArtifact_migration_console_staging

Signed-off-by: Andre Kurait <[email protected]>
- Remove invalid workflow-level parameters using valueFrom.configMapKeyRef with template expressions
- Use valueFrom.configMapKeyRef at step level when calling templates instead
- Templates receive image location and pull policy as input parameters
- Follows the same pattern as full-migration.yaml and other workflow templates

Signed-off-by: Andre Kurait <[email protected]>
When workflow cluster configurations lack authentication settings, the
conversion to Python cluster schema was not adding any auth field,
causing validation to fail with: 'No values are present from set:
[basic_auth, no_auth, sigv4]'.

This fix ensures that when no authentication configuration is provided
in the workflow config (no authConfig field and no legacy auth fields),
the converter defaults to no_auth: None, which is the appropriate
setting for clusters that don't require authentication.

Added test coverage to verify the fix handles the scenario where
workflow configs only contain endpoint, allowInsecure, and version
fields without any authentication configuration.

Signed-off-by: Andre Kurait <[email protected]>

---

## ⚠️ Security Considerations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you summarize this more? > 2/3 of the README is for an obscure developer-only feature.

@@ -0,0 +1,88 @@
{{- if and .Values.conditionalPackageInstalls.kyverno .Values.kyvernoPolicies.mountLocalAwsCreds }}
apiVersion: v1
kind: ConfigMap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file a jira ticket for this. I'd vote to control this in the installJob script. To install kyverno before everything else is kicked off. Do the helm install for it, set the policies, wait for them to appear - then proceed as normal.

Comment on lines +25 to +27
- resources:
kinds:
- Pod
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are good arguments that pretty much every one of our pods could need those creds... Maybe we should check the service account so that it aligns with what we do for EKS.

In that case, saying that my rewrite will map service account X to aws creds profile Y. If there's no mapping, no creds at all - that lets you test multiple profiles/personas too.

# Override image pull policy for dev - always pull latest
images:
captureProxy:
repository: localhost:5000/migrations/capture_proxy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please update these ports to 5001. I've updated ports to 5001 in a number of spots already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that is compatible with the minikube addon

After the image is pushed, refer to it by localhost:5000/{name} in kubectl specs.

https://minikube.sigs.k8s.io/docs/handbook/registry/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use
minikube addons enable registry-alias and then refer to it as registry.minikube/migrations/capture_proxy

Comment on lines -47 to +61
kubectl create secret generic "${CLUSTER_NAME}-aws" \
--from-literal=AWS_ACCESS_KEY_ID=test \
--from-literal=AWS_SECRET_ACCESS_KEY=test \
-n "${NAMESPACE}" \
--dry-run=client -o yaml | \
kubectl label --local -f - migration-test=true cluster-name=${CLUSTER_NAME} -o yaml | \
kubectl apply -f -
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
name: ${CLUSTER_NAME}-aws
namespace: ${NAMESPACE}
labels:
migration-test: "true"
cluster-name: ${CLUSTER_NAME}
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: test
AWS_SECRET_ACCESS_KEY: test
EOF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you make this change? Same for all the other ones too.

"packages/*"
],
"dependencies": {
"@typescript/native-preview": "^7.0.0-dev.20251210.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this been lifted up into a general dependency? This was only required for type-checking.

Comment on lines +12 to +13
export const configureAndSubmitScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'configureAndSubmit.sh'), 'utf8');
export const monitorScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'monitor.sh'), 'utf8');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had asked you to split these lines into a separate file and have two different resourceLoaders

Comment on lines +51 to +52
// - "RETRY": Migration still in progress, retry monitoring (exit code 1, triggers retry)
// - "ERROR": Unexpected error occurred (exit code 2, fails the step)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that RETRY - nor even ERROR will ever be possible. Even if we time out, I don't think that argo will fill in the parameters. IIRC, you'll only get the details when the task succeeds - in which case you have to disambiguate. That also means that the script is doing extra and misleading work.

Comment on lines +73 to +104
.addTemplate("handleWorkflowSuccess", t => t
.addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))

.addContainer(cb => cb
.addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
.addCommand(["/bin/bash", "-c"])
.addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
.addArgs(["echo 'Migration workflow completed successfully'"])
)
)

.addTemplate("handleWorkflowFailure", t => t
.addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))

.addContainer(cb => cb
.addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
.addCommand(["/bin/bash", "-c"])
.addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
.addArgs(["echo 'Migration workflow failed'"])
)
)

.addTemplate("handleWorkflowError", t => t
.addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))

.addContainer(cb => cb
.addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
.addCommand(["/bin/bash", "-c"])
.addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
.addArgs(["echo 'Migration workflow encountered an error'"])
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value to these? Why not just have one failure task... I don't think that the failure is right anyway right now because it doesn't exit with an error code - so the workflow will just keep going in all of these cases

Signed-off-by: Andre Kurait <[email protected]>
@AndreKurait
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants