Contributor

@kmularise kmularise commented Dec 4, 2025

What is this PR for?

Optimize GitHub Actions CI pipeline by using pre-built Docker images for test environments. This reduces conda environment setup time from 20+ minutes to under 1 minute by caching the fully configured Python/R environment in GitHub Container Registry (GHCR).

What type of PR is it?

Improvement

Todos

  • Create Dockerfile for Python/R test environment
  • Add prepare-python-r-env job with GHCR integration
  • Convert core-modules to container job
  • Fix hashFiles() issue for container jobs
  • Upgrade to Debian 12 for MongoDB 8.0 compatibility
  • Install Temurin JDK 11 from Adoptium repository
  • Verify core-modules tests pass in container environment
  • Verify zeppelin-integration-test passes in container environment
  • Consider extending to other jobs (spark-integration-test, interpreter-test, etc.)
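
A note on the hashFiles() item: the workflow quoted in the review below computes the pom.xml hash in the non-container prepare-python-r-env job and exposes it as the pom-hash output, presumably so the container job does not have to call hashFiles() itself. A minimal sketch of how a container job might consume that output follows. The cache path, key prefix, and the exact shape of the core-modules job are assumptions for illustration and are not taken from this PR.

```yaml
# Sketch only: consume the pre-computed hash instead of calling hashFiles()
# inside the container job. Cache path and key prefix are assumptions.
core-modules:
  needs: prepare-python-r-env
  runs-on: ubuntu-24.04
  container:
    image: ${{ needs.prepare-python-r-env.outputs.image-name }}
  steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Cache local Maven repository
      uses: actions/cache@v4
      with:
        path: ~/.m2/repository
        # pom-hash was computed by the prepare job on a regular runner
        key: ${{ runner.os }}-maven-${{ needs.prepare-python-r-env.outputs.pom-hash }}
        restore-keys: |
          ${{ runner.os }}-maven-
```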

What is the Jira issue?

ZEPPELIN-6367

How should this be tested?

  • Automated: All existing tests in core-modules job should pass
  • Manual verification:
    1. Check prepare-python-r-env job completes and pushes image to GHCR
    2. Verify image is reused on subsequent runs (should show "exists=true" in logs)
    3. Compare total workflow time before and after this change
    4. Confirm MongoDB tests pass in Debian 12 container environment

Screenshots (if appropriate)

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

@kmularise kmularise changed the title from "Zeppelin 6367" to "[ZEPPELIN-6367] Improve testing performance using Docker images" on Dec 4, 2025
Comment on lines 58 to 74
- name: Build and push Docker image
  if: steps.check.outputs.exists != 'true'
  uses: docker/build-push-action@v5
  with:
    context: .
    file: .github/docker/python-r-env.Dockerfile
    push: true
    tags: |
      ${{ steps.hash.outputs.image-name }}
      ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
    labels: |
      org.opencontainers.image.source=${{ github.event.repository.html_url }}
      org.opencontainers.image.revision=${{ github.sha }}
    build-args: |
      ENV_FILE=testing/env_python_3.9_with_R.yml
Contributor

https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#workflows-in-forked-repositories

PRs from forked repositories are limited to read-only permissions, which prevents workflows triggered by them from pushing images to the registry.

I suggest separating the build and push steps. Assuming the image is not already in the registry, we should build the image first for local testing and then run the push step only if the workflow has write access to GHCR.
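
To make the suggestion concrete, here is a rough sketch of what a split build/push could look like. It is not the change the author later made. It assumes the existing `check` and `hash` step ids from the snippet above, and it uses a fork check on the pull request's head repository as a stand-in for "has write access to GHCR".

```yaml
# Sketch only: build always, push only when the token can write packages.
- name: Build image locally
  if: steps.check.outputs.exists != 'true'
  uses: docker/build-push-action@v5
  with:
    context: .
    file: .github/docker/python-r-env.Dockerfile
    load: true    # keep the result in the local Docker daemon
    push: false
    tags: ${{ steps.hash.outputs.image-name }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

- name: Log in to GHCR
  # Assumption: pull requests from forks are the read-only case to skip.
  if: >-
    steps.check.outputs.exists != 'true' &&
    (github.event_name != 'pull_request' ||
     github.event.pull_request.head.repo.full_name == github.repository)
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

- name: Push image
  if: >-
    steps.check.outputs.exists != 'true' &&
    (github.event_name != 'pull_request' ||
     github.event.pull_request.head.repo.full_name == github.repository)
  run: docker push "${{ steps.hash.outputs.image-name }}"
```

One caveat with this shape: an image loaded into the local daemon is only visible to the job that built it, so a separate container job on another runner still needs the image to exist in the registry.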

Contributor Author

Thanks for the suggestion! I've updated the workflow to separate the build and push steps and check write access to GHCR.

Comment on lines 36 to 98
  # ============================================
  # Job 1: Prepare Docker image
  # ============================================
  prepare-python-r-env:
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      packages: write
    outputs:
      image-name: ${{ steps.image.outputs.name }}
      pom-hash: ${{ steps.hash.outputs.pom }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Calculate pom.xml hash
        id: hash
        run: |
          echo "pom=${{ hashFiles('**/pom.xml') }}" >> $GITHUB_OUTPUT
      - name: Generate image name with hash
        id: image
        run: |
          # Include both environment file AND Dockerfile in hash calculation
          COMBINED_HASH=$(cat testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile | sha256sum | cut -d' ' -f1 | cut -c1-12)
          IMAGE_NAME="ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-${COMBINED_HASH}"
          echo "name=${IMAGE_NAME}" >> $GITHUB_OUTPUT
      - name: Check if image exists
        id: check
        run: |
          if docker manifest inspect ${{ steps.image.outputs.name }} >/dev/null 2>&1; then
            echo "exists=true" >> $GITHUB_OUTPUT
          else
            echo "exists=false" >> $GITHUB_OUTPUT
          fi
      - name: Set up Docker Buildx
        if: steps.check.outputs.exists != 'true'
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        if: steps.check.outputs.exists != 'true'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push if needed
        if: steps.check.outputs.exists != 'true'
        uses: docker/build-push-action@v5
        with:
          context: .
          file: .github/docker/python-r-env.Dockerfile
          push: true
          tags: |
            ${{ steps.image.outputs.name }}
            ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
Contributor

Is the workflow defined here the same as the one in build-docker-image.yml?

If so, I'd suggest converting build-docker-image.yml into a reusable workflow (using on: workflow_call).

That way, you can keep the current separation while allowing other workflows, such as prepare-python-r-env, to call it via the uses keyword.

For reference, here are the cases I mentioned:
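
As a rough illustration of the reusable-workflow suggestion (separate from the referenced cases), the two files might look something like the sketch below. The actual contents of build-docker-image.yml are not shown in this conversation, so the job id `build`, the `image-name` workflow output, and the elided steps are assumptions for illustration only.

```yaml
# .github/workflows/build-docker-image.yml (hypothetical reusable form)
on:
  workflow_call:
    outputs:
      image-name:
        description: Fully qualified tag of the test-environment image
        value: ${{ jobs.build.outputs.image-name }}

jobs:
  build:
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      packages: write
    outputs:
      image-name: ${{ steps.image.outputs.name }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Generate image name with hash
        id: image
        run: |
          COMBINED_HASH=$(cat testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile | sha256sum | cut -d' ' -f1 | cut -c1-12)
          echo "name=ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-${COMBINED_HASH}" >> "$GITHUB_OUTPUT"
      # Existence check, build, and push steps would follow here,
      # as in the prepare-python-r-env job quoted above.
---
# Caller workflow (fragment): the job delegates to the reusable workflow.
jobs:
  prepare-python-r-env:
    uses: ./.github/workflows/build-docker-image.yml
    permissions:
      contents: read
      packages: write

  core-modules:
    needs: prepare-python-r-env
    runs-on: ubuntu-24.04
    container:
      image: ${{ needs.prepare-python-r-env.outputs.image-name }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
```

With this layout the build logic lives in one file, and any other workflow that needs the image can call it the same way.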

Contributor

tbonelee commented Dec 8, 2025

We have some user permission issues with the npm cache directories in the core-modules job.
