Skip to content

refactor(pyamber): share receiver-batch construction in partitioners#5989

Draft
Ma77Ball wants to merge 1 commit into
apache:mainfrom
Ma77Ball:refactor/partitioner-batch-helpers
Draft

refactor(pyamber): share receiver-batch construction in partitioners#5989
Ma77Ball wants to merge 1 commit into
apache:mainfrom
Ma77Ball:refactor/partitioner-batch-helpers

Conversation

@Ma77Ball

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

  • Added Partitioner.build_receiver_batches(channels) to the base class, the single home for the ordered (receiver, batch) list built via dict.fromkeys (which preserves channel order where a set literal would not).
  • Replaced the three identical inline copies of that construction in RoundRobinPartitioner, HashBasedShufflePartitioner, and RangeBasedShufflePartitioner with a call to the shared helper, dropping the drifted copy-pasted comments.
  • No behavior change: each partitioner still builds the same ordered, deduplicated receiver list.

Any related issues, documentation, discussions?

Closes: #5988

How was this PR tested?

  • Run cd amber && PYTHONPATH=src/main/python python -m pytest src/test/python/core/architecture/sendsemantics/test_partitioners.py -q, expect all 35 tests to pass.
  • The order/dedup behavior owned by the new helper is covered by the existing TestRoundRobinPartitioner::test_init_preserves_channel_order and test_init_dedupes_duplicate_channels_preserving_first_seen_order; confirm they still pass against the refactored construction.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

@github-actions

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Yicong-Huang
    You can notify them by mentioning @Yicong-Huang in a comment.

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.23%. Comparing base (0e0ec11) to head (8f9db91).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #5989   +/-   ##
=========================================
  Coverage     56.23%   56.23%           
  Complexity     2987     2987           
=========================================
  Files          1120     1120           
  Lines         43167    43170    +3     
  Branches       4658     4658           
=========================================
+ Hits          24274    24277    +3     
  Misses        17472    17472           
  Partials       1421     1421           
Flag Coverage Δ *Carryforward flag
access-control-service 70.00% <ø> (ø) Carriedforward from 0e0ec11
agent-service 44.59% <ø> (ø) Carriedforward from 0e0ec11
amber 57.77% <ø> (ø) Carriedforward from 0e0ec11
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from 0e0ec11
config-service 51.56% <ø> (ø) Carriedforward from 0e0ec11
file-service 59.02% <ø> (ø) Carriedforward from 0e0ec11
frontend 49.27% <ø> (ø) Carriedforward from 0e0ec11
notebook-migration-service 78.57% <ø> (ø) Carriedforward from 0e0ec11
pyamber 90.16% <100.00%> (-0.05%) ⬇️
python 90.75% <ø> (-0.01%) ⬇️ Carriedforward from 0e0ec11
workflow-compiling-service 55.14% <ø> (ø) Carriedforward from 0e0ec11

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown
Contributor

✅ No material benchmark regressions detected

🟢 2 better · 🔴 0 worse · ⚪ 13 noise (<±5%) · 0 without baseline

Compared against main 0e0ec11 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🟢 bs=10 sw=10 sl=64 399 0.244 24,549/34,741/34,741 us 🟢 -9.0% / 🔴 +126.6%
bs=100 sw=10 sl=64 783 0.478 125,764/150,060/150,060 us ⚪ within ±5% / 🔴 +35.7%
bs=1000 sw=10 sl=64 917 0.56 1,092,138/1,151,254/1,151,254 us ⚪ within ±5% / 🔴 +8.2%
Baseline details

Latest main 0e0ec11 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 399 tuples/sec 409 tuples/sec 757.16 tuples/sec -2.4% -47.3%
bs=10 sw=10 sl=64 MB/s 0.244 MB/s 0.25 MB/s 0.462 MB/s -2.4% -47.2%
bs=10 sw=10 sl=64 p50 24,549 us 24,286 us 12,971 us +1.1% +89.3%
bs=10 sw=10 sl=64 p95 34,741 us 38,189 us 15,333 us -9.0% +126.6%
bs=10 sw=10 sl=64 p99 34,741 us 38,189 us 18,732 us -9.0% +85.5%
bs=100 sw=10 sl=64 throughput 783 tuples/sec 795 tuples/sec 957.66 tuples/sec -1.5% -18.2%
bs=100 sw=10 sl=64 MB/s 0.478 MB/s 0.485 MB/s 0.585 MB/s -1.4% -18.2%
bs=100 sw=10 sl=64 p50 125,764 us 124,983 us 103,982 us +0.6% +20.9%
bs=100 sw=10 sl=64 p95 150,060 us 154,087 us 110,583 us -2.6% +35.7%
bs=100 sw=10 sl=64 p99 150,060 us 154,087 us 118,369 us -2.6% +26.8%
bs=1000 sw=10 sl=64 throughput 917 tuples/sec 906 tuples/sec 979.6 tuples/sec +1.2% -6.4%
bs=1000 sw=10 sl=64 MB/s 0.56 MB/s 0.553 MB/s 0.598 MB/s +1.3% -6.3%
bs=1000 sw=10 sl=64 p50 1,092,138 us 1,103,034 us 1,024,553 us -1.0% +6.6%
bs=1000 sw=10 sl=64 p95 1,151,254 us 1,147,957 us 1,063,789 us +0.3% +8.2%
bs=1000 sw=10 sl=64 p99 1,151,254 us 1,147,957 us 1,096,239 us +0.3% +5.0%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,500.72,200,128000,399,0.244,24549.34,34741.23,34741.23
1,100,10,64,20,2553.10,2000,1280000,783,0.478,125763.58,150059.92,150059.92
2,1000,10,64,20,21805.97,20000,12800000,917,0.560,1092137.56,1151253.80,1151253.80

@Ma77Ball Ma77Ball marked this pull request as draft June 28, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pyamber refactor Refactor the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deduplicate receiver-batch construction across shuffle partitioners

2 participants