perf(pg-protocol): encode length-prefixed strings in a single pass #3681

Open
kylecannon wants to merge 2 commits into brianc:master from kylecannon:pr/pg-protocol-single-pass-write


@kylecannon

What

Adds Writer.addInt32PrefixedString, which writes a value's Int32 byte-length prefix immediately followed by its UTF-8 bytes, computing Buffer.byteLength once. The previous addInt32(Buffer.byteLength(s)).addString(s) pairing scanned each string three times (once for the prefix length, again inside addString, then the encode). The new method is used in two places:

  1. Bind parameter values (serializer.writeValues), the common write path for parameterized queries.
  2. The SASL initial response message.
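A minimal sketch of how such a method could look. `SketchWriter` is a hypothetical, simplified stand-in written for this illustration, not pg-protocol's actual `Writer` source; only the method names `addInt32`, `addString`, and `addInt32PrefixedString` come from the description above, and the growth policy in `ensure` is an assumption.

```typescript
// Hypothetical, simplified writer for illustration only.
class SketchWriter {
  private buffer: Buffer
  private offset = 0

  constructor(size = 256) {
    this.buffer = Buffer.allocUnsafe(size)
  }

  // Grow the backing buffer when fewer than `size` bytes remain (assumed policy).
  private ensure(size: number): void {
    if (this.buffer.length - this.offset < size) {
      const old = this.buffer
      this.buffer = Buffer.allocUnsafe(old.length + (old.length >> 1) + size)
      old.copy(this.buffer, 0, 0, this.offset)
    }
  }

  addInt32(num: number): this {
    this.ensure(4)
    this.buffer.writeInt32BE(num, this.offset)
    this.offset += 4
    return this
  }

  // Computes the byte length itself, so pairing it with addInt32 scans twice.
  addString(str: string): this {
    const len = Buffer.byteLength(str)
    this.ensure(len)
    this.buffer.write(str, this.offset, 'utf8')
    this.offset += len
    return this
  }

  // Byte length is computed once and reused for the prefix and the write.
  addInt32PrefixedString(str: string): this {
    const len = Buffer.byteLength(str)
    this.ensure(4 + len)
    const buf = this.buffer // captured only after any growth has happened
    buf.writeInt32BE(len, this.offset)
    buf.write(str, this.offset + 4, len, 'utf8')
    this.offset += 4 + len
    return this
  }

  // Return a copy of the bytes written so far and reset the writer.
  flush(): Buffer {
    const out = Buffer.from(this.buffer.subarray(0, this.offset))
    this.offset = 0
    return out
  }
}

// Usage: both call patterns should produce the same bytes.
const paired = new SketchWriter(8).addInt32(Buffer.byteLength('héllo')).addString('héllo').flush()
const single = new SketchWriter(8).addInt32PrefixedString('héllo').flush()
```

Starting the writer at 8 bytes forces a growth in both paths, so the comparison also exercises `ensure`.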

Why

Bind is on the hot path for every parameterized query, and the cost of the redundant Buffer.byteLength scan grows with parameter size. Computing the byte length once and reusing it for both the Int32 prefix and the UTF-8 write removes one of the three string scans for each string parameter.

Correctness

The serialized wire output is byte-identical to the previous addInt32(Buffer.byteLength(s)).addString(s) sequence:

  • The Int32 length prefix is the UTF-8 byte length (Buffer.byteLength), not the character count, exactly as before.
  • this.ensure(4 + len) is called before capturing the local buffer reference, so a backing-buffer growth cannot write through a stale reference.
  • addInt32 and addString are left unchanged for all other callers.
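Two of the properties above can be illustrated in isolation. The helpers below (`lengthsOf`, `writeThroughStaleReference`) are hypothetical names invented for this sketch and do not appear in pg-protocol; they only show why the prefix must be a byte length and why the buffer reference should be captured after any growth.

```typescript
// Character count and UTF-8 byte length diverge for non-ASCII input.
function lengthsOf(value: string): { chars: number; bytes: number } {
  return { chars: value.length, bytes: Buffer.byteLength(value, 'utf8') }
}

// Simulates the hazard: a reference captured BEFORE the backing buffer is
// replaced keeps pointing at the old allocation, so the write is lost.
function writeThroughStaleReference(): { stale: number; live: number } {
  let backing = Buffer.alloc(4)
  const captured = backing // captured too early
  const grown = Buffer.alloc(16)
  backing.copy(grown)
  backing = grown // "growth" swaps in a new allocation
  captured.writeInt32BE(42, 0) // lands in the old buffer only
  return { stale: captured.readInt32BE(0), live: backing.readInt32BE(0) }
}

const lengths = lengthsOf('héllo') // 5 characters, 6 UTF-8 bytes
const hazard = writeThroughStaleReference() // stale is 42, live stays 0
```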

No public API or behavior changes. The new method is purely additive.

Benchmark

Measured with an in-process serialization microbenchmark (no network), alternating between this branch and base each round to cancel thermal drift:

case                               improvement
bind, 2 small string params        about +16% ops/sec
bind, 10 mixed params              about +23% ops/sec
bind, multi-byte unicode param     about +14% ops/sec
full insert (parse + bind)         about +9% ops/sec

The gain grows with parameter size, since the eliminated scans are proportional to string length.
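For readers who want to try the alternating-sampling idea, here is a rough harness sketch. It is not the benchmark used for the numbers above; `alternatingBench`, `Candidate`, and the two stand-in closures are hypothetical, and swapping the run order each round is an extra assumption on top of what the description states.

```typescript
interface Candidate {
  name: string
  run: () => void
}

// Time both candidates in alternating rounds so slow drift (thermal
// throttling, background load) affects each side roughly equally.
function alternatingBench(
  a: Candidate,
  b: Candidate,
  rounds: number,
  itersPerRound: number
): Record<string, number> {
  const totalNs: Record<string, number> = { [a.name]: 0, [b.name]: 0 }
  for (let round = 0; round < rounds; round++) {
    // Swap the order each round so neither candidate always runs first.
    const order = round % 2 === 0 ? [a, b] : [b, a]
    for (const candidate of order) {
      const start = process.hrtime()
      for (let i = 0; i < itersPerRound; i++) candidate.run()
      const [sec, nsec] = process.hrtime(start)
      totalNs[candidate.name] += sec * 1e9 + nsec
    }
  }
  const opsPerSec: Record<string, number> = {}
  for (const name of [a.name, b.name]) {
    opsPerSec[name] = (rounds * itersPerRound) / (totalNs[name] / 1e9)
  }
  return opsPerSec
}

// Stand-ins for "base" (two byteLength calls) and "branch" (one call).
const param = 'x'.repeat(1024)
const scratch = Buffer.allocUnsafe(4 + Buffer.byteLength(param))
const base: Candidate = {
  name: 'base',
  run: () => {
    scratch.writeInt32BE(Buffer.byteLength(param), 0)
    scratch.write(param, 4, Buffer.byteLength(param), 'utf8')
  },
}
const branch: Candidate = {
  name: 'branch',
  run: () => {
    const len = Buffer.byteLength(param)
    scratch.writeInt32BE(len, 0)
    scratch.write(param, 4, len, 'utf8')
  },
}

const result = alternatingBench(base, branch, 20, 5000)
```

Results from a harness like this depend heavily on the runtime and machine, so they should not be expected to match the figures in the table.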

Testing

The pg-protocol test suite passes. This change adds a multi-byte unicode bind unit test that asserts the Int32 length prefix equals the UTF-8 byte length rather than the character count (the exact property the single-pass write must preserve, which the existing ASCII-only bind tests could not catch).
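The property that test targets can be sketched without the library. `encodePrefixed` and `prefixMatchesPayload` below are hypothetical helpers for illustration, not the test added in this PR; they check that the 4-byte prefix equals the number of payload bytes and that the payload round-trips.

```typescript
// Hypothetical encoder: Int32 big-endian byte length, then the UTF-8 bytes.
function encodePrefixed(value: string): Buffer {
  const len = Buffer.byteLength(value, 'utf8')
  const out = Buffer.allocUnsafe(4 + len)
  out.writeInt32BE(len, 0)
  out.write(value, 4, len, 'utf8')
  return out
}

// True when the prefix equals the payload byte count and the text round-trips.
function prefixMatchesPayload(value: string): boolean {
  const encoded = encodePrefixed(value)
  const prefix = encoded.readInt32BE(0)
  return prefix === encoded.length - 4 && encoded.toString('utf8', 4) === value
}

// ASCII input cannot distinguish byte length from character count;
// multi-byte input can.
const samples = ['hello', 'héllo', '日本語', '😀']
const allOk = samples.every(prefixMatchesPayload)
const kanjiPrefix = encodePrefixed('日本語').readInt32BE(0) // 9 bytes for 3 characters
```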

Add Writer.addInt32PrefixedString, which writes a value's Int32 byte-length prefix
immediately followed by its UTF-8 bytes, computing Buffer.byteLength once. The
previous `addInt32(Buffer.byteLength(s)).addString(s)` pairing scanned each string
three times. Used for Bind parameter values and the SASL initial response. Wire
output is byte-identical; addInt32/addString are unchanged for other callers.

Benchmark (packages/pg-protocol/bench/write-bench.js, alternating-sampled vs base):
  bind(2 small)    +16%
  bind(10 mixed)   +23%
  bind(unicode)    +14%
  full insert seq   +9%

Adds a multi-byte unicode bind unit test asserting the Int32 prefix equals the
UTF-8 byte length, not the char length.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/pg-protocol/src/outbound-serializer.test.ts Outdated
@charmander
Collaborator

Other than that, looks good. Can you share the benchmark you used?

Co-authored-by: Charmander <~@charmander.me>
@kylecannon
Author
