
Spark 4.1: Fix async microbatch plan bugs#15670

Merged
bryanck merged 9 commits into apache:main from RjLi13:fix-async-microbatch-plan-bugs
Apr 1, 2026
Conversation

@RjLi13
Contributor

@RjLi13 RjLi13 commented Mar 17, 2026

This PR contains fixes in response to @yingjianwu98's review. See #15299.

It addresses the following points from that review:

  • Fix the Iceberg snapshot ID comparison (changed to an equality check once we hit the AvailableNow cap) so we do not stop early; Iceberg snapshot IDs are random, so ordering comparisons are unreliable
  • Handle fillQueueFailedThrowable in planFiles to mirror latestOffset

Two more bug fixes:

  • Handle the AvailableNow cap during queue preload so that we do not load more into the queue than the AvailableNowTrigger cap allows.
  • Handle a race condition where the queue can keep filling while we iterate over the copied queueList, yet the code ended up using the live queue's tail instead of the copy's, which can differ.

One improvement:

  • Remove queue synchronization by using LinkedBlockingDeque natively (and remove the tracked tail variable)

Tests are included to cover all of the above.

The async planner feature is disabled by default.
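A rough sketch of the deque change described above (class name and element type are illustrative, not the actual Iceberg code):

```java
import java.util.concurrent.LinkedBlockingDeque;

// LinkedBlockingDeque is thread-safe, so the producer (the async fill thread)
// and the consumer need no external synchronization, and the tail is available
// via peekLast() instead of a separately tracked tail variable.
class TaskQueueSketch {
  private final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();

  void enqueue(String task) {
    queue.addLast(task); // safe to call concurrently from the fill thread
  }

  String tail() {
    return queue.peekLast(); // no lock, no tracked tail field
  }

  String poll() {
    return queue.pollFirst(); // non-blocking; returns null if empty
  }
}
```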

@github-actions github-actions bot added the spark label Mar 17, 2026
@RjLi13
Contributor Author

RjLi13 commented Mar 17, 2026

Regarding @yingjianwu98 comment on #15299 (comment)

Does it make sense to combine refresh and fillQueue in the single thread?

I feel fillQueue will probably want the latest state of the table

I think not: fillQueue should run at least once per refresh, but we want to run it more often, since each refresh may bring in more snapshots that fillQueue works through one by one.

@RjLi13 RjLi13 marked this pull request as ready for review March 20, 2026 17:41
@RjLi13 RjLi13 force-pushed the fix-async-microbatch-plan-bugs branch from 8155a32 to ce9b293 on March 26, 2026 00:58
@RjLi13
Contributor Author

RjLi13 commented Mar 26, 2026

@bryanck @singhpk234 please take a look and review, thank you!

queuedRowCount.get());

// Convert to list for indexed access
List<Pair<StreamingOffset, FileScanTask>> queueList = Lists.newArrayList(queue);
Member

Why are we building this list? We can get the tail now with peekLast()

Member

Nvm, I see the issues with this

Contributor Author

Let me know if there's anything I can do (comments, etc) that could make it easier to understand

Member

I would probably rename this something like "queueSnapshot" since we are basically trying to extract a point in time view of the queue, the indexing isn't important any more (or the list type)
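A hypothetical sketch of the point-in-time copy being discussed (names are illustrative; the real queue holds Pair&lt;StreamingOffset, FileScanTask&gt;):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

class QueueSnapshotSketch {
  // Copying iterates the live deque once, producing a fixed point-in-time
  // view that later addLast() calls from the fill thread cannot change.
  static List<String> snapshotOf(LinkedBlockingDeque<String> queue) {
    return new ArrayList<>(queue);
  }
}
```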

for (FileScanTask task : tasks) {
expectedBatchCount += 1;
}
} catch (IOException e) {
Member

We can just throw; we don't need to catch here

table.refresh();
int expectedBatchCount = 0;
try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
for (FileScanTask task : tasks) {
Member

Could replace this with Iterables.size(tasks)
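For reference, Guava's Iterables.size iterates and counts when the argument is not a Collection; a minimal stdlib equivalent of the loop it would replace:

```java
class IterableSizeSketch {
  // Counts elements by consuming the iterator once, which is what
  // Guava's Iterables.size does for a non-Collection Iterable such as
  // CloseableIterable<FileScanTask>.
  static int size(Iterable<?> iterable) {
    int count = 0;
    for (Object ignored : iterable) {
      count++;
    }
    return count;
  }
}
```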


// Synchronously add data to the queue to meet our initial constraints.
// For Trigger.AvailableNow, constructor-time preload is normally initialized from
// latestOffset(...) with no explicit end offset, so bounded preload must stop at the cap.
Member

I'm not sure I understand this comment; what is the cap?

Contributor Author

The cap here refers to the AvailableNowTrigger limit, which as I understand it prevents reading beyond what's available now, even if more data arrives later. I check to make sure the initial preload doesn't cross that boundary.
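A hedged sketch of what cap-aware preloading might look like (names and the cap mechanism are illustrative; the real code is bounded by the Trigger.AvailableNow end offset rather than a plain count):

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingDeque;

class BoundedPreloadSketch {
  // Preload at most `cap` tasks into the queue, then stop, even if the
  // planner could produce more. Mirrors the idea of not loading past the
  // Trigger.AvailableNow boundary during constructor-time preload.
  static int preload(Iterator<String> plannedTasks,
                     LinkedBlockingDeque<String> queue,
                     int cap) {
    int loaded = 0;
    while (loaded < cap && plannedTasks.hasNext()) {
      queue.addLast(plannedTasks.next());
      loaded++;
    }
    return loaded;
  }
}
```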

@RjLi13 RjLi13 force-pushed the fix-async-microbatch-plan-bugs branch from ce9b293 to 8cde4b2 on March 31, 2026 19:12
@RjLi13
Contributor Author

RjLi13 commented Mar 31, 2026

Thank you @RussellSpitzer for the review! I addressed your comments; please let me know if I missed anything.

@RussellSpitzer
Member

That's it for me, but I really think @bryanck should take a look before we merge

@RussellSpitzer RussellSpitzer requested a review from bryanck March 31, 2026 21:50
@bryanck
Contributor

bryanck commented Mar 31, 2026

This LGTM as well, thanks for the updates @RjLi13 and for the review @RussellSpitzer !

@bryanck bryanck merged commit 8504800 into apache:main Apr 1, 2026
25 checks passed
@bryanck
Contributor

bryanck commented Apr 1, 2026

Also, thanks @yingjianwu98 for your review as well!

@RjLi13 RjLi13 deleted the fix-async-microbatch-plan-bugs branch April 1, 2026 17:49
@RjLi13
Contributor Author

RjLi13 commented Apr 1, 2026

Thank you @bryanck @yingjianwu98 @singhpk234 @RussellSpitzer for the reviews and helping me get this feature in!
