Draft
22 commits
10dd91a
feat: add native Delta Lake scan via delta-kernel-rs
schenksj Apr 13, 2026
6980fa1
test: Delta native scan test suites
schenksj Apr 13, 2026
0a21380
bench: Delta benchmarks and TPC runner infrastructure
schenksj Apr 13, 2026
2cef606
ci/docs: Delta CI workflow and documentation
schenksj Apr 13, 2026
c0ada8d
feat: expand kernel predicate pushdown with IN and Cast support
schenksj Apr 13, 2026
2b8dedd
docs: add IN/NOT IN and Cast to supported predicates list
schenksj Apr 13, 2026
64e1f3e
fix: address CI linting and security feedback
schenksj Apr 13, 2026
c97b60e
fix: use Hadoop Path when parsing input file URIs in native Delta scan
schenksj Apr 14, 2026
3e4b6a0
test: add Delta Lake regression suite mirroring the Iceberg pattern
schenksj Apr 14, 2026
29361d6
fix: distinguish CometDeltaNativeScan instances across snapshot versions
schenksj Apr 14, 2026
ee5a375
fix: use correct Delta artifact ID for Spark 3.4 test dependency
schenksj Apr 14, 2026
bf38729
test(delta-regression): robustness fixes + DELTA_JAVA_HOME support
schenksj Apr 14, 2026
db4b6eb
test(delta-regression): make Delta 2.4.0 diff install Comet extensions
schenksj Apr 14, 2026
d8aa2bb
Merge remote-tracking branch 'upstream/main' into delta-kernel-phase-1
schenksj Apr 14, 2026
75e404d
fix: populate InputFileBlockHolder for Delta native scans so Delta ME…
schenksj Apr 14, 2026
0dbb807
test(delta-regression): make spark/test actually run on modern JDKs
schenksj Apr 14, 2026
297d857
feat(delta): close major native-scan coverage gaps + row tracking
schenksj Apr 15, 2026
340a594
chore(delta-regression): drop row-id-lookup targeted test
schenksj Apr 15, 2026
24427b5
fix(delta-regression): force UTC JVM timezone; use rootPaths for sche…
schenksj Apr 16, 2026
ff352e4
docs(delta): refresh support matrix; clarify cloud-fetch guard
schenksj Apr 16, 2026
9e88ac0
fix(delta): translate column-mapping names on the pre-materialised-in…
schenksj Apr 16, 2026
6b276b1
fix(delta): code review hardening + CI workflow registration
schenksj Apr 16, 2026
39 changes: 39 additions & 0 deletions .github/actions/setup-delta-builder/action.yaml
@@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Setup Delta Builder
description: 'Setup Delta Lake to run Spark SQL regression tests with Comet'
inputs:
  delta-version:
    description: 'The Delta Lake version (e.g., 3.3.2) to build'
    required: true
runs:
  using: "composite"
  steps:
    - name: Clone Delta Lake repo
      uses: actions/checkout@v6
      with:
        repository: delta-io/delta
        path: delta-lake
        ref: v${{inputs.delta-version}}
        fetch-depth: 1

    - name: Setup Delta Lake for Comet
      shell: bash
      run: |
        cd delta-lake
        git apply ../dev/diffs/delta/${{inputs.delta-version}}.diff
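Reviewer note: to try this action's two steps outside CI, something like the following may help. It is only a sketch. The function name `setup_delta_commands` and its print-only (dry-run) behaviour are illustrative assumptions, not part of this PR. The repository, tag format, clone depth, and diff path are taken from the action above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): prints the commands that mirror
# the composite action so they can be reviewed before being run by hand.
setup_delta_commands() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: setup_delta_commands <delta-version>" >&2
    return 1
  fi
  # Shallow clone of the release tag, matching `ref: v<version>` and `fetch-depth: 1`.
  echo "git clone --depth 1 --branch v${version} https://github.com/delta-io/delta.git delta-lake"
  # Apply the Comet patch for that Delta version; the path is relative to delta-lake/.
  echo "cd delta-lake && git apply ../dev/diffs/delta/${version}.diff"
}

setup_delta_commands 3.3.2
```

The commands are meant to be run from the root of a Comet checkout, since the diff path is resolved relative to the `delta-lake` directory created there.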
152 changes: 152 additions & 0 deletions .github/workflows/delta_regression_test.yml
@@ -0,0 +1,152 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Delta Lake Regression Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  # manual trigger
  workflow_dispatch:

permissions:
  contents: read

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1

jobs:
  # Build native library once and share with all test jobs
  build-native:
    name: Build Native Library
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

      - name: Build native library
        # Use CI profile for faster builds (no LTO) and to share cache with pr_build_linux.yml.
        run: |
          cd native && cargo build --profile ci
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-delta-regression
          path: native/target/ci/libcomet.so
          retention-days: 1

  delta-spark:
    needs: build-native
    strategy:
      matrix:
        os: [ubuntu-24.04]
        java-version: [17]
        delta-version:
          - {full: '3.3.2', spark-short: '3.5', scala: '2.13', module: 'spark'}
          - {full: '4.0.0', spark-short: '4.0', scala: '2.13', module: 'spark'}
          - {full: '2.4.0', spark-short: '3.4', scala: '2.12', module: 'core'}
      fail-fast: false
    name: delta-regression/${{ matrix.os }}/delta-${{ matrix.delta-version.full }}/java-${{ matrix.java-version }}
    runs-on: ${{ matrix.os }}
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6
      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}
      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-delta-regression
          path: native/target/release/
      - name: Build Comet
        run: |
          ./mvnw install -Prelease -DskipTests -Pspark-${{ matrix.delta-version.spark-short }}
      - name: Setup Delta Lake
        uses: ./.github/actions/setup-delta-builder
        with:
          delta-version: ${{ matrix.delta-version.full }}
      - name: Run Comet smoke test (fail fast)
        # Verify Comet is actually wired into Delta's test SparkSession before
        # running the full suite. Catches silent config drift where the plugin
        # is on the classpath but not applied to query plans.
        run: |
          cd delta-lake
          build/sbt "${{ matrix.delta-version.module }}/testOnly org.apache.spark.sql.delta.CometSmokeTest"
      - name: Run Delta Lake Spark tests
        run: |
          cd delta-lake
          build/sbt "${{ matrix.delta-version.module }}/test"
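Reviewer note: the `delta-version` matrix ties each Delta release to a Comet Spark profile and an sbt module (`core` for 2.4.0, `spark` for the others). A rough sketch of that mapping as a shell lookup is below, in case it helps when reproducing one matrix entry by hand. The function names are hypothetical and the script only prints commands. The version tuples and the command lines are copied from the workflow above.

```shell
#!/usr/bin/env bash
# Hypothetical lookup (not part of the PR) mirroring the delta-version matrix:
# prints "<spark-short> <scala> <sbt-module>" for a Delta release.
delta_matrix_entry() {
  case "$1" in
    3.3.2) echo "3.5 2.13 spark" ;;
    4.0.0) echo "4.0 2.13 spark" ;;
    2.4.0) echo "3.4 2.12 core" ;;
    *) echo "unsupported Delta version: $1" >&2; return 1 ;;
  esac
}

# Print the Comet build, smoke test, and full test commands for one entry.
delta_regression_commands() {
  local version="$1" entry spark_short scala module
  entry="$(delta_matrix_entry "$version")" || return 1
  # Split the three space-separated fields of the matrix entry.
  set -- $entry
  spark_short="$1"; scala="$2"; module="$3"
  echo "# Delta ${version}: Spark ${spark_short}, Scala ${scala}, sbt module ${module}"
  echo "./mvnw install -Prelease -DskipTests -Pspark-${spark_short}"
  echo "(cd delta-lake && build/sbt \"${module}/testOnly org.apache.spark.sql.delta.CometSmokeTest\")"
  echo "(cd delta-lake && build/sbt \"${module}/test\")"
}

delta_regression_commands 2.4.0
```

Running the smoke test before the full suite keeps the same fail-fast ordering as the job.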
136 changes: 136 additions & 0 deletions .github/workflows/delta_spark_test.yml
@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Delta Lake Native Scan Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

permissions:
  contents: read

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/test/scala/org/apache/spark/sql/benchmark/**"
  workflow_dispatch:

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1

jobs:
  build-native:
    name: Build Native Library
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

      - name: Build native library
        run: |
          cd native && cargo build --profile ci
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-delta
          path: native/target/ci/libcomet.so
          retention-days: 1

  delta-native-suite:
    needs: build-native
    strategy:
      matrix:
        os: [ubuntu-24.04]
        java-version: [17]
        spark-version:
          - {short: '3.4', full: '3.4.3'}
          - {short: '3.5', full: '3.5.8'}
          - {short: '4.0', full: '4.0.1'}
      fail-fast: false
    name: delta-native/${{ matrix.os }}/spark-${{ matrix.spark-version.full }}/java-${{ matrix.java-version }}
    runs-on: ${{ matrix.os }}
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6
      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}
      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-delta
          path: native/target/debug/
      - name: Run CometDeltaNativeSuite
        run: |
          ./mvnw -Pspark-${{ matrix.spark-version.short }} -pl spark -am test \
            -Dsuites=org.apache.comet.CometDeltaNativeSuite \
            -Dmaven.gitcommitid.skip
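Reviewer note: this job takes `libcomet.so` built with the `ci` profile and places it under `native/target/debug/` before invoking Maven. A local approximation might look like the sketch below. The helper name `native_suite_commands` is hypothetical, it only prints commands, and it omits the `RUSTFLAGS` setting that the workflow applies during the native build. The paths, profile names, and Maven flags are taken from the workflow above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): prints a local equivalent of the
# delta-native-suite job for one Spark profile from the matrix.
native_suite_commands() {
  local spark_short="$1"
  case "$spark_short" in
    3.4|3.5|4.0) ;;
    *) echo "unsupported Spark profile: ${spark_short}" >&2; return 1 ;;
  esac
  # build-native: compile with the ci profile, as the workflow does.
  echo "(cd native && cargo build --profile ci)"
  # The job downloads the artifact into native/target/debug/, so copy it there.
  echo "mkdir -p native/target/debug && cp native/target/ci/libcomet.so native/target/debug/"
  # Run only the Delta native scan suite.
  echo "./mvnw -Pspark-${spark_short} -pl spark -am test -Dsuites=org.apache.comet.CometDeltaNativeSuite -Dmaven.gitcommitid.skip"
}

native_suite_commands 3.5
```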
5 changes: 5 additions & 0 deletions .github/workflows/pr_build_linux.yml
@@ -290,6 +290,11 @@ jobs:
org.apache.spark.sql.comet.ParquetEncryptionITCase
org.apache.comet.exec.CometNativeReaderSuite
org.apache.comet.CometIcebergNativeSuite
org.apache.comet.CometDeltaNativeSuite
org.apache.comet.CometDeltaColumnMappingSuite
org.apache.comet.CometDeltaAdvancedSuite
org.apache.comet.CometDeltaRowTrackingSuite
org.apache.comet.CometFuzzDeltaSuite
- name: "csv"
value: |
org.apache.comet.csv.CometCsvNativeReadSuite
5 changes: 5 additions & 0 deletions .github/workflows/pr_build_macos.yml
@@ -171,6 +171,11 @@ jobs:
org.apache.spark.sql.comet.ParquetEncryptionITCase
org.apache.comet.exec.CometNativeReaderSuite
org.apache.comet.CometIcebergNativeSuite
org.apache.comet.CometDeltaNativeSuite
org.apache.comet.CometDeltaColumnMappingSuite
org.apache.comet.CometDeltaAdvancedSuite
org.apache.comet.CometDeltaRowTrackingSuite
org.apache.comet.CometFuzzDeltaSuite
- name: "csv"
value: |
org.apache.comet.csv.CometCsvNativeReadSuite