Skip to content

[SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL#56102

Open
viirya wants to merge 1 commit into
apache:masterfrom
viirya:SPARK-57056
Open

[SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL#56102
viirya wants to merge 1 commit into
apache:masterfrom
viirya:SPARK-57056

Conversation

@viirya
Copy link
Copy Markdown
Member

@viirya viirya commented May 25, 2026

What changes were proposed in this pull request?

Add a standard DSv2 mix-in interface SupportsBranching, plus the SQL surface to manage branches:

public interface SupportsBranching extends Table {
    TableBranch createBranch(String name, OptionalLong sourceSnapshotId);
    default TableBranch replaceBranch(String name, OptionalLong sourceSnapshotId);
    boolean dropBranch(String name);
    TableBranch fastForward(String branch, String targetBranch);
    default TableBranch[] listBranches();
}

With companion value type TableBranch(name, snapshotId, creationTimeMs) and SupportsBranching.BranchAlreadyExistsException for the duplicate-create case.

New DDL:

ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <integer>]
ALTER TABLE t DROP BRANCH [IF EXISTS] name
ALTER TABLE t FAST FORWARD branch TO target
SHOW BRANCHES (FROM | IN) t

Implementation:

  • Define the interface and TableBranch value type under sql/catalyst/.../connector/catalog/.
  • Extend the ANTLR grammar with the four DDL forms; register BRANCH, BRANCHES, FAST, FORWARD as non-reserved keywords; update docs/sql-ref-ansi-compliance.md.
  • Add logical plans (CreateBranch / DropBranch / FastForwardBranch / ShowBranches) and exec nodes; dispatch through ResolvedTable in DataSourceV2Strategy.
  • Add DataSourceV2Implicits.asBranchable and QueryCompilationErrors.tableDoesNotSupportBranchingError so non-branching tables fail with a clear message.
  • Add error condition CREATE_BRANCH_WITH_IF_NOT_EXISTS_AND_REPLACE for the conflicting clauses.
  • Implement SupportsBranching on InMemoryTable for testing.

Reads and writes against a specific branch (SELECT ... FOR BRANCH 'x', INSERT ... FOR BRANCH 'x') are out of scope here and are added in the follow-up SPARK-57057.

Why are the changes needed?

Apache Iceberg and similar table formats support named branches as a first-class concept, but Spark today only exposes branching through connector-specific SQL extensions (e.g. IcebergSparkSessionExtensions). A standard DSv2 interface lets any data source expose branching through built-in Spark SQL, the same way SupportsDelete / TruncatableTable / SupportsRowLevelOperations standardize their respective capabilities.

Does this PR introduce any user-facing change?

Yes. New SQL DDL is recognized:

ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <integer>]
ALTER TABLE t DROP BRANCH [IF EXISTS] name
ALTER TABLE t FAST FORWARD branch TO target
SHOW BRANCHES (FROM | IN) t

Data sources that do not implement SupportsBranching are unaffected — running the new DDL against them raises AnalysisException ("does not support branching"). Four new non-reserved keywords (BRANCH, BRANCHES, FAST, FORWARD) are added; they remain usable as identifiers in non-DDL contexts.

How was this patch tested?

  • DDLParserSuite: 5 new parser tests covering all four DDL forms and the IF NOT EXISTS / OR REPLACE conflict error.
  • SupportsBranchingSuite: 12 new integration tests exercising the new DDL end-to-end against InMemoryTable, including positive cases for each operation and negative cases for duplicates, missing branches, fast-forward direction, and non-branching tables.
  • All 152 existing tests in DDLParserSuite still pass; TableIdentifierParserSuite (7 tests) confirms the new keywords don't regress identifier handling.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you address two scalastyle check failures?

Scalastyle checks failed at following occurrences:
[error] /__w/spark-1/spark-1/sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsBranchingSuite.scala:154:24: 
[error]       Are you sure that you want to use toUpperCase or toLowerCase without the root locale? In most cases, you
[error]       should use toUpperCase(Locale.ROOT) or toLowerCase(Locale.ROOT) instead.
[error]       If you must use toUpperCase or toLowerCase without the root locale, wrap the code block with
[error]       // scalastyle:off caselocale
[error]       .toUpperCase
[error]       .toLowerCase
[error]       // scalastyle:on caselocale
[error]     
[error] /__w/spark-1/spark-1/sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsBranchingSuite.scala:149:68: Use java.util.Set.of() instead.
[error] Total time: 76 s (0:01:16.0), completed May 25, 2026, 6:42:24 PM

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL [SPARK-57056][SQL] Add SupportsBranching DSv2 interface and branching DDL May 26, 2026

package org.apache.spark.sql.connector.catalog;

import java.util.NoSuchElementException;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this interface need to rely on NoSuchElementException?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — relying on the generic JDK NoSuchElementException makes the interface semantically weaker and inconsistent with the rest of the DSv2 API (e.g. NoSuchTableException, NoSuchPartitionException).

Fixed in the latest push: dedicated typed exceptions BranchNotFoundException and InvalidFastForwardException are now defined alongside BranchAlreadyExistsException, and the java.util.NoSuchElementException import is gone.

/**
* Thrown when a branch with the requested name already exists.
*/
class BranchAlreadyExistsException extends RuntimeException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like BranchNotFound or InvalidFastForward?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The latest push adds both BranchNotFoundException and InvalidFastForwardException as nested classes alongside BranchAlreadyExistsException, following the same naming pattern (Exception suffix). Each method's @throws javadoc now references the new typed exception.

override def fastForward(branchName: String, targetBranchName: String): TableBranch = {
val current = branches.get(branchName)
if (current == null) {
throw new java.util.NoSuchElementException(s"Branch not found: $branchName")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wired to me.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — switched to the typed BranchNotFoundException introduced in SupportsBranching (per your comment on the interface).

}
val target = branches.get(targetBranchName)
if (target == null) {
throw new java.util.NoSuchElementException(s"Branch not found: $targetBranchName")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Please correct me if this is the standard of DSv2 and Apache Iceberg.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fix — switching to BranchNotFoundException. The IllegalArgumentException for the fast-forward direction check is also now InvalidFastForwardException for the same reason.

On the DSv2 / Iceberg convention question:

  • Spark's DSv2 surface uses typed NoSuch*Exception classes (NoSuchTableException, NoSuchPartitionException, NoSuchNamespaceException) for missing-entity errors; the plain JDK NoSuchElementException is not used.
  • Iceberg doesn't have a standardized DSv2 interface for branching at all — internally it uses ValidationException / IllegalArgumentException in ManageSnapshots / branching procedures, but that's an Iceberg-internal choice rather than a DSv2 convention. Since this PR is proposing the DSv2 interface, I think we should follow Spark's DSv2 naming here.

… DDL

Add a SupportsBranching mix-in interface for DSv2 tables so data sources
can expose table branching through standard Spark SQL DDL:

  ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name
      [AS OF VERSION <id>]
  ALTER TABLE t DROP BRANCH [IF EXISTS] name
  ALTER TABLE t FAST FORWARD branch TO target
  SHOW BRANCHES (FROM|IN) t

The interface defines createBranch / dropBranch / fastForward /
listBranches operations and a TableBranch value type. Reads and writes
against a specific branch are not part of this change.

Includes parser grammar, logical plans, analyzer dispatch through
ResolvedTable, exec nodes, an in-memory implementation on InMemoryTable
for testing, and unit + integration tests.

Co-authored-by: Claude Code
@viirya
Copy link
Copy Markdown
Member Author

viirya commented May 26, 2026

Thanks for the review @dongjoon-hyun!

Force-pushed a13783190d4 addressing all four review comments:

  • Removed the dependency on java.util.NoSuchElementException; introduced typed BranchNotFoundException and InvalidFastForwardException alongside BranchAlreadyExistsException, following the existing DSv2 NoSuch*Exception naming pattern.
  • Updated InMemoryTable.fastForward and DropBranchExec to throw the new typed exceptions.
  • Fixed the two scalastyle errors (toLowerCase without Locale.ROOT and util.Collections.emptySet()) in SupportsBranchingSuite.
  • Updated the suite to intercept the new typed exceptions.

sql/Test/scalastyle is clean locally and the full SupportsBranchingSuite still passes. PTAL when you get a chance.

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating.

Like TableCatalog interface, please make all newly-added exceptions (e.g. BranchNotFoundException) extend AnalysisException and define them in org.apache.spark.sql.catalyst.analysis package.

import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException;
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;

class NoSuchTableException private (
message: String,
cause: Option[Throwable],
errorClass: Option[String],
messageParameters: Map[String, String])
extends AnalysisException(

| ALTER TABLE identifierReference
DROP BRANCH (IF EXISTS)? branch=identifier #dropBranch
| ALTER TABLE identifierReference
FAST FORWARD branch=identifier TO target=identifier #fastForwardBranch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SQL syntax:

  1. When we use this as verb, this should be one-word, fast-forward or fastforward.
  2. To be consistent with CREATE BRANCH and DROP BRNCH, we may need to define FASTFORWARD BRANCH like the following.
CREATE BRANCH X
DROP BRNCH X
FASTFORWARD BRANCH X

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants