Skip to content

chipingress/batch: compute batch-split sizes incrementally#2106

Draft
pkcll wants to merge 1 commit into
mainfrom
chipingress-batch-split-improvement
Draft

chipingress/batch: compute batch-split sizes incrementally#2106
pkcll wants to merge 1 commit into
mainfrom
chipingress-batch-split-improvement

Conversation

@pkcll
Copy link
Copy Markdown
Contributor

@pkcll pkcll commented May 30, 2026

This branch has two related changes to the chipingress batch client.

1. Compute batch-split sizes incrementally (O(n) instead of O(n²))

splitMessagesByRequestSize re-serialized the entire growing batch on every message (newBatchRequestproto.Size), so splitting an n-event batch walked 1+2+…+n events' worth of bytes — O(n²). This accumulates each event's repeated-field contribution incrementally instead, sizing the batch in O(n).

A CloudEventBatch encodes events as a repeated field, so its serialized size is a fixed base plus the independent per-event contribution (tag + length-delimited message). eventFieldSize computes that via protowire and matches proto.Size exactly. Subsequent split batches also no longer pre-allocate a full-length backing array.

Benchmarks

BenchmarkSplitMessagesByRequestSize (~10 events per sub-batch, forcing many splits):

n before (quadratic) after (linear) speedup
10 2940 ns / 1208 B / 22 allocs 580 ns / 168 B / 3 allocs ~5x
100 30767 ns / 15536 B / 255 allocs 6335 ns / 4160 B / 56 allocs ~4.9x
1000 322229 ns / 157616 B / 2558 allocs 64900 ns / 43040 B / 559 allocs ~5x

The old implementation is kept in the test file as the quadratic benchmark baseline so the comparison is reproducible:

go test ./pkg/chipingress/batch -bench BenchmarkSplitMessagesByRequestSize -benchmem -run '^$'

Tests

  • eventFieldSize accumulation equals proto.Size for the whole batch (guards the additive-size assumption).
  • Every produced sub-batch fits within the limit by real proto.Size, with no events dropped across splits.
  • Existing split tests unchanged and passing (-race).

2. Raise default batchSize from 10 to 100

NewBatchClient's default batchSize was 10; raised to 100 for ~10x more events per PublishBatch under sustained load. This is safe alongside change #1 since larger batches that exceed maxGRPCRequestSize are now split in O(n).

  • batchInterval (100ms) still bounds latency for partial batches, so this only affects per-RPC event count under load, not tail latency.
  • 100 stays below the default messageBuffer (200), so the size trigger still fires under load — no other default needs retuning.
  • Callers can still override via WithBatchSize.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 30, 2026

📊 API Diff Results

No changes detected for module github.com/smartcontractkit/chainlink-common/pkg/chipingress

View full report

@pkcll pkcll changed the title chipingress/batch: split batches in O(n) instead of O(n²) chipingress/batch: compute batch-split sizes incrementally May 30, 2026
…tchSize

splitMessagesByRequestSize re-serialized the entire growing batch on every
message (newBatchRequest -> proto.Size), so splitting an n-event batch walked
1+2+...+n events worth of bytes. Accumulate each event's repeated-field
contribution incrementally instead, sizing the batch in O(n). eventFieldSize
computes the per-event cost via protowire and matches proto.Size exactly;
subsequent split batches no longer pre-allocate a full-length backing array.

Also raise the default NewBatchClient batchSize from 10 to 100 for ~10x more
events per PublishBatch under sustained load. batchInterval (100ms) still bounds
latency for partial batches, 100 stays below the default messageBuffer (200) so
the size trigger still fires, and larger batches that exceed maxGRPCRequestSize
are now split in O(n). Callers can still override via WithBatchSize.

Benchmarks (BenchmarkSplitMessagesByRequestSize, ~10 events/batch):
  n=10    2940ns/1208B/22 allocs -> 580ns/168B/3 allocs
  n=100  30767ns/15536B/255 allocs -> 6335ns/4160B/56 allocs
  n=1000 322229ns/157616B/2558 allocs -> 64900ns/43040B/559 allocs

Tests assert the incremental total equals proto.Size and that every produced
sub-batch fits the limit with no events dropped.
@pkcll pkcll force-pushed the chipingress-batch-split-improvement branch from f781286 to bd237fe Compare May 30, 2026 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant