[ZEPPELIN-6367] Improve testing performance using Docker images #5125

kmularise · 2025-12-04T14:46:26Z

What is this PR for?

Optimize the GitHub Actions CI pipeline by using pre-built Docker images for test environments. Caching the fully configured Python/R environment in GitHub Container Registry (GHCR) reduces conda environment setup time from 20+ minutes to under 1 minute.

What type of PR is it?

Improvement

Todos

Create Dockerfile for Python/R test environment
Add prepare-python-r-env job with GHCR integration
Convert core-modules to container job
Fix hashFiles() issue for container jobs
Upgrade to Debian 12 for MongoDB 8.0 compatibility
Install Temurin JDK 11 from Adoptium repository
Verify core-modules tests pass in container environment
Verify zeppelin-integration-test passes in container environment
Consider extending to other jobs (spark-integration-test, interpreter-test, etc.)

What is the Jira issue?

ZEPPELIN-6367

How should this be tested?

Automated: All existing tests in core-modules job should pass
Manual verification:
1. Check prepare-python-r-env job completes and pushes image to GHCR
2. Verify image is reused on subsequent runs (should show "exists=true" in logs)
3. Compare total workflow time before and after this change
4. Confirm MongoDB tests pass in Debian 12 container environment

Screenshots (if appropriate)

Questions:

Do the license files need to be updated? No
Are there breaking changes for older versions? No
Does this need documentation? No

tbonelee · 2025-12-06T11:37:45Z

.github/workflows/build-docker-images.yml

+      - name: Build and push Docker image
+        if: steps.check.outputs.exists != 'true'
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: .github/docker/python-r-env.Dockerfile
+          push: true
+          tags: |
+            ${{ steps.hash.outputs.image-name }}
+            ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+          labels: |
+            org.opencontainers.image.source=${{ github.event.repository.html_url }}
+            org.opencontainers.image.revision=${{ github.sha }}
+          build-args: |
+            ENV_FILE=testing/env_python_3.9_with_R.yml


https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#workflows-in-forked-repositories

PRs from forked repositories are limited to read-only permissions, which prevents workflows triggered by them from pushing images to the registry.

I suggest separating the build and push steps. Assuming the image is not already in the registry, we should first build the image so it can be tested locally, and then run the push step only if the workflow has write access to GHCR.

kmularise · 2025-12-07

Thanks for the suggestion! I've updated the workflow to separate the build and push steps and to check for write access to GHCR.
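For readers following this thread, here is a minimal sketch of what a split build/push sequence could look like. It is not the code that was committed. The step ids `check` and `hash` are taken from the diff above; the `perm` step, its `can-push` output, and the fork-detection condition are assumptions introduced for illustration. Comparing the PR head repository with `github.repository` is only a heuristic for "this run has write access".

```yaml
      # Sketch only: the `perm` step and `can-push` output are hypothetical names.
      - name: Decide whether this run may push (assumed heuristic)
        id: perm
        run: |
          # Non-PR events and PRs from branches of the same repository get a
          # writable token; PRs from forks are limited to read-only access.
          if [ "${{ github.event_name }}" != "pull_request" ] || \
             [ "${{ github.event.pull_request.head.repo.full_name }}" = "${{ github.repository }}" ]; then
            echo "can-push=true" >> "$GITHUB_OUTPUT"
          else
            echo "can-push=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Build Docker image (no push)
        if: steps.check.outputs.exists != 'true'
        uses: docker/build-push-action@v5
        with:
          context: .
          file: .github/docker/python-r-env.Dockerfile
          load: true    # keep the image in the local Docker daemon for testing
          push: false
          tags: ${{ steps.hash.outputs.image-name }}
          build-args: |
            ENV_FILE=testing/env_python_3.9_with_R.yml

      - name: Log in to GHCR
        if: steps.check.outputs.exists != 'true' && steps.perm.outputs.can-push == 'true'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Push Docker image
        if: steps.check.outputs.exists != 'true' && steps.perm.outputs.can-push == 'true'
        run: docker push "${{ steps.hash.outputs.image-name }}"
```

With this layout a forked PR still builds the image and can run tests against it, and simply skips the login and push steps.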

tbonelee · 2025-12-06T11:48:52Z

.github/workflows/core.yml

+  # ============================================
+  # Job 1: Prepare Docker image
+  # ============================================
+  prepare-python-r-env:
+    runs-on: ubuntu-24.04
+    permissions:
+      contents: read
+      packages: write
+    outputs:
+      image-name: ${{ steps.image.outputs.name }}
+      pom-hash: ${{ steps.hash.outputs.pom }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Calculate pom.xml hash
+        id: hash
+        run: |
+          echo "pom=${{ hashFiles('**/pom.xml') }}" >> $GITHUB_OUTPUT
+
+      - name: Generate image name with hash
+        id: image
+        run: |
+          # Include both environment file AND Dockerfile in hash calculation
+          COMBINED_HASH=$(cat testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile | sha256sum | cut -d' ' -f1 | cut -c1-12)
+          IMAGE_NAME="ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-${COMBINED_HASH}"
+
+          echo "name=${IMAGE_NAME}" >> $GITHUB_OUTPUT
+
+      - name: Check if image exists
+        id: check
+        run: |
+          if docker manifest inspect ${{ steps.image.outputs.name }} >/dev/null 2>&1; then
+            echo "exists=true" >> $GITHUB_OUTPUT
+          else
+            echo "exists=false" >> $GITHUB_OUTPUT
+          fi
+
+      - name: Set up Docker Buildx
+        if: steps.check.outputs.exists != 'true'
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to GHCR
+        if: steps.check.outputs.exists != 'true'
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push if needed
+        if: steps.check.outputs.exists != 'true'
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: .github/docker/python-r-env.Dockerfile
+          push: true
+          tags: |
+            ${{ steps.image.outputs.name }}
+            ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
+          cache-from: type=gha
+          cache-to: type=gha,mode=max


Is the workflow defined here the same as the one in build-docker-images.yml?

If so, I'd suggest converting build-docker-images.yml into a reusable workflow (using on: workflow_call).

That way, you can keep the current separation while allowing jobs in other workflows, such as prepare-python-r-env, to call it via the uses keyword.

For reference, here are the cases I mentioned:

Reusable workflow being called

https://github.com/apache/airflow/blob/5845599cb740b1c6bce3fcb5efc9688a71f4b2a7/.github/workflows/ci-image-build.yml

Workflow that calls another workflow

https://github.com/apache/airflow/blob/5845599cb740b1c6bce3fcb5efc9688a71f4b2a7/.github/workflows/ci-amd-arm.yml#L237
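As a rough illustration of the suggestion, the two files could be shaped as below. This is a sketch under assumptions, not the workflow that was merged: the job id `build`, the output description, and the trimmed step list are placeholders. The file paths, the image naming scheme, and the `image-name` output follow the diffs in this PR.

```yaml
# --- .github/workflows/build-docker-images.yml (callee, sketch) ---
on:
  workflow_call:
    outputs:
      image-name:
        description: Fully qualified tag of the test-environment image
        value: ${{ jobs.build.outputs.image-name }}

jobs:
  build:    # hypothetical job id
    runs-on: ubuntu-24.04
    outputs:
      image-name: ${{ steps.hash.outputs.image-name }}
    steps:
      - uses: actions/checkout@v4
      - name: Generate image name with hash
        id: hash
        run: |
          HASH=$(cat testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile | sha256sum | cut -c1-12)
          echo "image-name=ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-${HASH}" >> "$GITHUB_OUTPUT"
      # The existence check, build, and push steps would follow here.

# --- .github/workflows/core.yml (caller, sketch) ---
# jobs:
#   prepare-python-r-env:
#     permissions:
#       contents: read
#       packages: write
#     uses: ./.github/workflows/build-docker-images.yml
#
#   core-modules:
#     needs: prepare-python-r-env
#     runs-on: ubuntu-24.04
#     container:
#       image: ${{ needs.prepare-python-r-env.outputs.image-name }}
```

The caller half is commented out only so that the block stays a single valid YAML document; in practice it would live in core.yml. A job that calls a reusable workflow has no steps of its own, and the callee's outputs are read through `needs.<job>.outputs`.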

tbonelee · 2025-12-08T09:56:15Z

We have some user permission issues with the npm cache directories in the core-modules job.

kmularise added 18 commits December 2, 2025 15:40

chore: add python-r-env.Dockerfile

514ba86

chore: add github build docker image workflow

fa3ca63

chore: modify core workflow to use docker image

e717078

chore: remove tune-runner-vm step (sudo/ip not available in containers)

417cb53

fix: execute IRkernel by user level

b0e297c

fix: use mamba instead of conda to resolve libsolv solver crashes

334f0b9

fix: reorder ENV

6fadbd2

fix: use bash shell to ensure conda environment PATH is properly loaded

00d2def

fix: add MongoDB runtime dependencies

69613a8

fix: include Dockerfile in image hash calculation for proper cache invalidation

700ee80

fix: include Dockerfile in image hash calculation for proper cache invalidation in core.yml

c3b84b9

fix: upgrade Debian 12

1fccee1

fix: apply changed dependency

a84e670

fix: apply java 11 with Temurin 11

18ca00d

fix: modify hash key by github.sha

39d21fa

Revert "fix: modify hash key by github.sha"

dbe370f

This reverts commit 39d21fa.

fix: calculate pom.xml hash in prepare job for container job cache

d7c63a8

hashFiles() fails in container jobs when evaluated before checkout. Pre-calculate hash in prepare-python-r-env job and pass it as output to core-modules job for Maven cache key.

fix: temporailiy remove Mongodb dependencies in Debian 12

1bee4e5

kmularise changed the title ~~Zeppelin 6367~~ [ZEPPELIN-6367] Improve testing performance using Docker images Dec 4, 2025

fix: revert zeppelin-integration-test to vm

2c519d3

tbonelee reviewed Dec 6, 2025

kmularise added 3 commits December 7, 2025 13:36

fix: seperate build and push steps and convert to reusable workflow

7c949f9

fix: test by local docker image when workflow is executed by forked PR

b881316

fix: add node js and ca-certificates

c3ab327

fix: handle npm cache permission

db75c62
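Commit d7c63a8 above works around hashFiles() failing in container jobs by computing the pom.xml hash in prepare-python-r-env and exposing it as a job output. A minimal sketch of how core-modules might consume that output follows. The `image-name` and `pom-hash` output names come from the core.yml diff in this PR; the cache path, the key prefix, and the step order are assumptions for illustration.

```yaml
  core-modules:
    needs: prepare-python-r-env
    runs-on: ubuntu-24.04
    container:
      image: ${{ needs.prepare-python-r-env.outputs.image-name }}
    steps:
      - name: Cache local Maven repository
        uses: actions/cache@v4
        with:
          path: ~/.m2/repository    # assumed cache location
          # The key uses the pre-computed hash, so no hashFiles() call is
          # evaluated inside the container job.
          key: ${{ runner.os }}-zeppelin-${{ needs.prepare-python-r-env.outputs.pom-hash }}
          restore-keys: |
            ${{ runner.os }}-zeppelin-

      - name: Checkout
        uses: actions/checkout@v4
```

Because the key depends only on `needs` context values, it can be resolved before the repository is checked out inside the container.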
