Kubernetes integration testing using Minikube and Argo Workflows in Jenkins #2032
Conversation
…to run a migration Signed-off-by: Jugal Chauhan <[email protected]>
…kins-k8s-local-test Signed-off-by: Jugal Chauhan <[email protected]>
…yaml Signed-off-by: Andre Kurait <[email protected]>
Codecov Report
❌ Your patch check has failed because the patch coverage (54.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
##               main    #2032      +/-   ##
============================================
+ Coverage     77.55%   78.36%    +0.80%
- Complexity       14       21        +7
============================================
  Files           603      603
  Lines         24200    24247      +47
  Branches       1855     1854       -1
============================================
+ Hits          18769    19000     +231
+ Misses         4512     4353     -159
+ Partials        919      894      -25
- Document chart overview and prerequisites
- Add installation instructions
- Include detailed security warnings for Kyverno policies
- Document mountLocalAwsCreds policy risks and alternatives
- Add values reference table for Kyverno configuration
Signed-off-by: Andre Kurait <[email protected]>
- Add clarifying comment for actual-registry label in buildDockerImagesMini.sh
- Remove outdated comment from migrationConsole/Dockerfile
- Add warning headers to test helper scripts about package separation
- Add TODO for removing base64 encoding in testMigrationWithWorkflowCli.ts
- Add detailed comments explaining monitorResult flow and exit codes
- Add TODO to resourceLoader.ts about separating test helpers
- Rename testMigrationWithWorkflowCli to TestMigrationWithWorkflowCli for naming consistency
Signed-off-by: Andre Kurait <[email protected]>
Remove --omit=optional from the npm ci command to allow installation of the platform-specific @typescript/native-preview binaries required by tsgo. Signed-off-by: Andre Kurait <[email protected]>
…ion workflow
- Add handleWorkflowSuccess, handleWorkflowFailure, handleWorkflowError templates
- Increase retry limit from 13 (~10 min) to 33 (~30 min) for monitor step
- Add conditional steps to handle different workflow outcomes
- Update snapshot tests
Signed-off-by: Andre Kurait <[email protected]>
The configuration cache was failing because Task objects (Jar, StartScripts)
were being passed directly to from() and inputs.files() methods, which caused
Gradle to attempt serializing them.
Fixed by using .map { it.outputs.files } to extract the output files from
TaskProvider references instead of passing the task references directly.
This resolves the following configuration cache errors:
- copyArtifact_capture_proxy
- copyArtifact_capture_proxy_es
- copyArtifact_traffic_replayer
- syncArtifact_migration_console_staging
Signed-off-by: Andre Kurait <[email protected]>
- Remove invalid workflow-level parameters using valueFrom.configMapKeyRef with template expressions
- Use valueFrom.configMapKeyRef at step level when calling templates instead
- Templates receive image location and pull policy as input parameters
- Follows the same pattern as full-migration.yaml and other workflow templates
Signed-off-by: Andre Kurait <[email protected]>
Force-pushed from 2b17b78 to b65aa8b.
When workflow cluster configurations lack authentication settings, the conversion to the Python cluster schema was not adding any auth field, causing validation to fail with: 'No values are present from set: [basic_auth, no_auth, sigv4]'.

This fix ensures that when no authentication configuration is provided in the workflow config (no authConfig field and no legacy auth fields), the converter defaults to no_auth: None, which is the appropriate setting for clusters that don't require authentication.

Added test coverage to verify the fix handles the scenario where workflow configs only contain endpoint, allowInsecure, and version fields without any authentication configuration.

Signed-off-by: Andre Kurait <[email protected]>
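The defaulting rule is easier to see in code. The converter itself is Python; the following TypeScript sketch only illustrates the behavior described above, and every name in it is hypothetical:

```typescript
// Hypothetical sketch of the auth-defaulting rule from the commit message
// above; the real converter is Python and its names differ.
interface WorkflowClusterConfig {
  endpoint: string;
  allowInsecure?: boolean;
  version?: string;
  authConfig?: { basicAuth?: { username: string; password: string }; sigv4?: object };
  // legacy flat auth fields may also appear on older configs
  username?: string;
  password?: string;
}

type ClusterAuth =
  | { basic_auth: { username: string; password: string } }
  | { sigv4: object }
  | { no_auth: null };

function convertAuth(cfg: WorkflowClusterConfig): ClusterAuth {
  if (cfg.authConfig?.basicAuth || (cfg.username && cfg.password)) {
    const creds = cfg.authConfig?.basicAuth
      ?? { username: cfg.username!, password: cfg.password! };
    return { basic_auth: creds };
  }
  if (cfg.authConfig?.sigv4) {
    return { sigv4: cfg.authConfig.sigv4 };
  }
  // No authConfig and no legacy auth fields: default to no_auth so the
  // schema's "one of [basic_auth, no_auth, sigv4]" check passes.
  return { no_auth: null };
}
```

The last branch is the fix: previously no auth field was emitted at all, which tripped the validation error quoted above.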
## ⚠️ Security Considerations
Can you summarize this more? More than 2/3 of the README is for an obscure developer-only feature.
@@ -0,0 +1,88 @@
{{- if and .Values.conditionalPackageInstalls.kyverno .Values.kyvernoPolicies.mountLocalAwsCreds }}
apiVersion: v1
kind: ConfigMap
Can you file a Jira ticket for this? I'd vote to control this in the installJob script: install Kyverno before everything else is kicked off, do the helm install for it, set the policies, wait for them to appear, then proceed as normal.
- resources:
    kinds:
      - Pod
There are good arguments that pretty much every one of our pods could need those creds... Maybe we should check the service account so that it aligns with what we do for EKS.
In that case, say the rewrite maps service account X to AWS creds profile Y; if there's no mapping, no creds at all. That lets you test multiple profiles/personas too.
# Override image pull policy for dev - always pull latest
images:
  captureProxy:
    repository: localhost:5000/migrations/capture_proxy
Can you please update these ports to 5001? I've updated ports to 5001 in a number of spots already.
I'm not sure that is compatible with the minikube addon, whose docs say: "After the image is pushed, refer to it by localhost:5000/{name} in kubectl specs."
I could use minikube addons enable registry-alias and then refer to it as registry.minikube/migrations/capture_proxy.
kubectl create secret generic "${CLUSTER_NAME}-aws" \
  --from-literal=AWS_ACCESS_KEY_ID=test \
  --from-literal=AWS_SECRET_ACCESS_KEY=test \
  -n "${NAMESPACE}" \
  --dry-run=client -o yaml | \
kubectl label --local -f - migration-test=true cluster-name=${CLUSTER_NAME} -o yaml | \
kubectl apply -f -
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${CLUSTER_NAME}-aws
  namespace: ${NAMESPACE}
  labels:
    migration-test: "true"
    cluster-name: ${CLUSTER_NAME}
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: test
  AWS_SECRET_ACCESS_KEY: test
EOF
Why did you make this change? Same for all the other ones too.
| "packages/*" | ||
| ], | ||
| "dependencies": { | ||
| "@typescript/native-preview": "^7.0.0-dev.20251210.1", |
Why has this been lifted up into a general dependency? This was only required for type-checking.
export const configureAndSubmitScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'configureAndSubmit.sh'), 'utf8');
export const monitorScript = fs.readFileSync(path.join(testMigrationHelpersDir, 'monitor.sh'), 'utf8');
I had asked you to split these lines into a separate file and have two different resourceLoaders.
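A minimal sketch of that split, reusing the two readFileSync lines from the diff; the module name and directory constant are hypothetical:

```typescript
// testResourceLoader.ts (hypothetical): a loader dedicated to test-only
// helper scripts, so the production resourceLoader no longer ships them.
import * as fs from "fs";
import * as path from "path";

// Assumed location of the test helper scripts.
const testMigrationHelpersDir = path.join(__dirname, "testMigrationHelpers");

export const configureAndSubmitScript = fs.readFileSync(
    path.join(testMigrationHelpersDir, "configureAndSubmit.sh"), "utf8");
export const monitorScript = fs.readFileSync(
    path.join(testMigrationHelpersDir, "monitor.sh"), "utf8");
```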
| // - "RETRY": Migration still in progress, retry monitoring (exit code 1, triggers retry) | ||
| // - "ERROR": Unexpected error occurred (exit code 2, fails the step) |
I'm pretty sure that neither RETRY nor even ERROR will ever be possible. Even if we time out, I don't think that Argo will fill in the parameters. IIRC, you'll only get the details when the task succeeds, in which case you have to disambiguate. That also means that the script is doing extra and misleading work.
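For reference, a TypeScript sketch of the result-to-exit-code contract those comments describe; this is illustrative only, and whether RETRY and ERROR are actually reachable is exactly the open question above:

```typescript
// Illustrative mapping of the monitor step's documented outcomes to exit
// codes. Only RETRY (1) and ERROR (2) are stated in the comments; the
// SUCCESS-to-0 mapping is an assumption.
type MonitorResult = "SUCCESS" | "RETRY" | "ERROR";

function exitCodeFor(result: MonitorResult): number {
  switch (result) {
    case "SUCCESS": return 0; // assumed: a clean exit lets the step succeed
    case "RETRY":   return 1; // documented: non-zero exit triggers Argo's retry policy
    case "ERROR":   return 2; // documented: fails the step outright
  }
}
```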
| .addTemplate("handleWorkflowSuccess", t => t | ||
| .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"])) | ||
|
|
||
| .addContainer(cb => cb | ||
| .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy) | ||
| .addCommand(["/bin/bash", "-c"]) | ||
| .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI) | ||
| .addArgs(["echo 'Migration workflow completed successfully'"]) | ||
| ) | ||
| ) | ||
|
|
||
| .addTemplate("handleWorkflowFailure", t => t | ||
| .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"])) | ||
|
|
||
| .addContainer(cb => cb | ||
| .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy) | ||
| .addCommand(["/bin/bash", "-c"]) | ||
| .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI) | ||
| .addArgs(["echo 'Migration workflow failed'"]) | ||
| ) | ||
| ) | ||
|
|
||
| .addTemplate("handleWorkflowError", t => t | ||
| .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"])) | ||
|
|
||
| .addContainer(cb => cb | ||
| .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy) | ||
| .addCommand(["/bin/bash", "-c"]) | ||
| .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI) | ||
| .addArgs(["echo 'Migration workflow encountered an error'"]) | ||
| ) | ||
| ) |
Is there any value to these? Why not just have one failure task? I don't think the failure handling is right anyway, because it doesn't exit with an error code, so the workflow will just keep going in all of these cases.
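A sketch of the consolidation this comment suggests, reusing the builder calls from the diff above; the single template and the trailing exit 1 are the reviewer's proposal, not the PR's current code:

```typescript
// Hypothetical single failure handler: one template instead of three, and
// a non-zero exit so a failed migration actually fails the workflow.
.addTemplate("handleWorkflowFailure", t => t
    .addInputsFromRecord(makeRequiredImageParametersForKeys(["MigrationConsole"]))
    .addContainer(cb => cb
        .addImageInfo(cb.inputs.imageMigrationConsoleLocation, cb.inputs.imageMigrationConsolePullPolicy)
        .addCommand(["/bin/bash", "-c"])
        .addResources(DEFAULT_RESOURCES.MIGRATION_CONSOLE_CLI)
        // exit 1 is the key change: without it the workflow keeps going
        .addArgs(["echo 'Migration workflow failed'; exit 1"])
    )
)
```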
Description
This PR enables Kubernetes integration testing for the OpenSearch Migration Assistant using Minikube and Argo Workflows. It introduces a new workflow template (testMigrationWithWorkflowCli) that uses the migration console CLI commands to orchestrate full migrations, along with supporting infrastructure for local development and CI testing.

Key Changes

New Workflow Template - testMigrationWithWorkflowCli
- Orchestrates the migration workflow via CLI commands instead of direct Argo/Kubernetes API calls
- Helper scripts (configureAndSubmit.sh, monitor.sh) for workflow configuration and monitoring
- Exercises the fullMigrationWithClusters workflow for end-to-end testing

Kyverno Policies for Dev Environment
- zeroResourceRequests policy to zero out pod resource requests in dev environments (allows running on resource-constrained Minikube)
- mountLocalAwsCreds policy for mounting host AWS credentials into pods
- Uses the migrations/migration_console image instead of the unavailable bitnami/kubectl:1.30.2

Helm Chart Updates
- Image references use the repository:tag format (removes the unused _imageHelper.tpl and registryPrefix pattern)
- valuesDev.yaml with Kyverno configuration for local development
- Uses patchStrategicMerge instead of invalid JMESPath

Test Automation Improvements
- Updates test_runner.py to support a registry prefix for individual image repositories

Cleanup
- Removes migrationConsole/workflows/templates/ (now generated via DSL in orchestrationSpecs)
- Removes migrationConsole/testWorkflows/

Issues Resolved
Enables local Kubernetes integration testing with Minikube for the Migration Assistant.
Testing
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.