[SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL#56102
[SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL#56102viirya wants to merge 1 commit into
SupportsBranching DSv2 interface and branching DDL#56102Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Could you address two scalastyle check failures?
Scalastyle checks failed at following occurrences:
[error] /__w/spark-1/spark-1/sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsBranchingSuite.scala:154:24:
[error] Are you sure that you want to use toUpperCase or toLowerCase without the root locale? In most cases, you
[error] should use toUpperCase(Locale.ROOT) or toLowerCase(Locale.ROOT) instead.
[error] If you must use toUpperCase or toLowerCase without the root locale, wrap the code block with
[error] // scalastyle:off caselocale
[error] .toUpperCase
[error] .toLowerCase
[error] // scalastyle:on caselocale
[error]
[error] /__w/spark-1/spark-1/sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsBranchingSuite.scala:149:68: Use java.util.Set.of() instead.
[error] Total time: 76 s (0:01:16.0), completed May 25, 2026, 6:42:24 PM
SupportsBranching DSv2 interface and branching DDL
|
|
||
| package org.apache.spark.sql.connector.catalog; | ||
|
|
||
| import java.util.NoSuchElementException; |
There was a problem hiding this comment.
Does this interface need to rely on NoSuchElementException?
There was a problem hiding this comment.
Good catch — relying on the generic JDK NoSuchElementException makes the interface semantically weaker and inconsistent with the rest of the DSv2 API (e.g. NoSuchTableException, NoSuchPartitionException).
Fixed in the latest push: dedicated typed exceptions BranchNotFoundException and InvalidFastForwardException are now defined alongside BranchAlreadyExistsException, and the java.util.NoSuchElementException import is gone.
| /** | ||
| * Thrown when a branch with the requested name already exists. | ||
| */ | ||
| class BranchAlreadyExistsException extends RuntimeException { |
There was a problem hiding this comment.
What about something like BranchNotFound or InvalidFastForward?
There was a problem hiding this comment.
Done. The latest push adds both BranchNotFoundException and InvalidFastForwardException as nested classes alongside BranchAlreadyExistsException, following the same naming pattern (Exception suffix). Each method's @throws javadoc now references the new typed exception.
| override def fastForward(branchName: String, targetBranchName: String): TableBranch = { | ||
| val current = branches.get(branchName) | ||
| if (current == null) { | ||
| throw new java.util.NoSuchElementException(s"Branch not found: $branchName") |
There was a problem hiding this comment.
Agreed — switched to the typed BranchNotFoundException introduced in SupportsBranching (per your comment on the interface).
| } | ||
| val target = branches.get(targetBranchName) | ||
| if (target == null) { | ||
| throw new java.util.NoSuchElementException(s"Branch not found: $targetBranchName") |
There was a problem hiding this comment.
ditto.
Please correct me if this is the standard of DSv2 and Apache Iceberg.
There was a problem hiding this comment.
Same fix — switching to BranchNotFoundException. The IllegalArgumentException for the fast-forward direction check is also now InvalidFastForwardException for the same reason.
On the DSv2 / Iceberg convention question:
- Spark's DSv2 surface uses typed
NoSuch*Exceptionclasses (NoSuchTableException,NoSuchPartitionException,NoSuchNamespaceException) for missing-entity errors; the plain JDKNoSuchElementExceptionis not used. - Iceberg doesn't have a standardized DSv2 interface for branching at all — internally it uses
ValidationException/IllegalArgumentExceptioninManageSnapshots/ branching procedures, but that's an Iceberg-internal choice rather than a DSv2 convention. Since this PR is proposing the DSv2 interface, I think we should follow Spark's DSv2 naming here.
… DDL
Add a SupportsBranching mix-in interface for DSv2 tables so data sources
can expose table branching through standard Spark SQL DDL:
ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name
[AS OF VERSION <id>]
ALTER TABLE t DROP BRANCH [IF EXISTS] name
ALTER TABLE t FAST FORWARD branch TO target
SHOW BRANCHES (FROM|IN) t
The interface defines createBranch / dropBranch / fastForward /
listBranches operations and a TableBranch value type. Reads and writes
against a specific branch are not part of this change.
Includes parser grammar, logical plans, analyzer dispatch through
ResolvedTable, exec nodes, an in-memory implementation on InMemoryTable
for testing, and unit + integration tests.
Co-authored-by: Claude Code
|
Thanks for the review @dongjoon-hyun! Force-pushed
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for updating.
Like TableCatalog interface, please make all newly-added exceptions (e.g. BranchNotFoundException) extend AnalysisException and define them in org.apache.spark.sql.catalyst.analysis package.
| | ALTER TABLE identifierReference | ||
| DROP BRANCH (IF EXISTS)? branch=identifier #dropBranch | ||
| | ALTER TABLE identifierReference | ||
| FAST FORWARD branch=identifier TO target=identifier #fastForwardBranch |
There was a problem hiding this comment.
For SQL syntax:
- When we use this as
verb, this should be one-word,fast-forwardorfastforward. - To be consistent with
CREATE BRANCHandDROP BRNCH, we may need to defineFASTFORWARD BRANCHlike the following.
CREATE BRANCH X
DROP BRNCH X
FASTFORWARD BRANCH X
What changes were proposed in this pull request?
Add a standard DSv2 mix-in interface
SupportsBranching, plus the SQL surface to manage branches:With companion value type
TableBranch(name, snapshotId, creationTimeMs)andSupportsBranching.BranchAlreadyExistsExceptionfor the duplicate-create case.New DDL:
Implementation:
TableBranchvalue type undersql/catalyst/.../connector/catalog/.BRANCH,BRANCHES,FAST,FORWARDas non-reserved keywords; updatedocs/sql-ref-ansi-compliance.md.CreateBranch/DropBranch/FastForwardBranch/ShowBranches) and exec nodes; dispatch throughResolvedTableinDataSourceV2Strategy.DataSourceV2Implicits.asBranchableandQueryCompilationErrors.tableDoesNotSupportBranchingErrorso non-branching tables fail with a clear message.CREATE_BRANCH_WITH_IF_NOT_EXISTS_AND_REPLACEfor the conflicting clauses.SupportsBranchingonInMemoryTablefor testing.Reads and writes against a specific branch (
SELECT ... FOR BRANCH 'x',INSERT ... FOR BRANCH 'x') are out of scope here and are added in the follow-up SPARK-57057.Why are the changes needed?
Apache Iceberg and similar table formats support named branches as a first-class concept, but Spark today only exposes branching through connector-specific SQL extensions (e.g.
IcebergSparkSessionExtensions). A standard DSv2 interface lets any data source expose branching through built-in Spark SQL, the same waySupportsDelete/TruncatableTable/SupportsRowLevelOperationsstandardize their respective capabilities.Does this PR introduce any user-facing change?
Yes. New SQL DDL is recognized:
Data sources that do not implement
SupportsBranchingare unaffected — running the new DDL against them raisesAnalysisException("does not support branching"). Four new non-reserved keywords (BRANCH,BRANCHES,FAST,FORWARD) are added; they remain usable as identifiers in non-DDL contexts.How was this patch tested?
DDLParserSuite: 5 new parser tests covering all four DDL forms and theIF NOT EXISTS/OR REPLACEconflict error.SupportsBranchingSuite: 12 new integration tests exercising the new DDL end-to-end againstInMemoryTable, including positive cases for each operation and negative cases for duplicates, missing branches, fast-forward direction, and non-branching tables.DDLParserSuitestill pass;TableIdentifierParserSuite(7 tests) confirms the new keywords don't regress identifier handling.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)