Decrement refcount for encoded set items #991

Merged
jcrist merged 3 commits into jcrist:main from ishmandoo:main on Apr 8, 2026

@ishmandoo
Contributor

Summary

This fixes a reference leak in the set encoders used by both JSON and MessagePack.

json_encode_set and mpack_encode_set iterate with PyIter_Next, which returns a new reference for each item. Those references were not decref'd after a successful encode, so each encoded set element leaked one reference and could outlive the set indefinitely.
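One way to observe a leak like this from Python is to compare an item's reference count before and after encoding a set that contains it. The sketch below is illustrative only and is not code from this PR: refcount_delta is a hypothetical helper, it assumes a standard CPython build where sys.getrefcount is meaningful, and the msgspec calls are skipped if the package is not installed.

```python
import sys


def refcount_delta(encode, item):
    """Return how many extra references to `item` remain after encoding {item}.

    `encode` is any callable that accepts a set. `refcount_delta` is a
    hypothetical helper written for illustration, not part of msgspec.
    """
    before = sys.getrefcount(item)
    encode({item})  # the temporary set and the result are dropped right away
    after = sys.getrefcount(item)
    return after - before


if __name__ == "__main__":
    try:
        import msgspec
    except ImportError:
        msgspec = None

    if msgspec is not None:
        # Build the string at runtime so it is not an interned constant.
        probe = "".join(["set-", "item"])
        # A delta of 0 means no reference to the item was left behind.
        print("json:", refcount_delta(msgspec.json.encode, probe))
        print("msgpack:", refcount_delta(msgspec.msgpack.encode, probe))
```

A nonzero delta for either protocol would suggest that the encoder kept a reference to the item it iterated over.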

Changes

  • add missing Py_DECREF(item) in json_encode_set
  • add missing Py_DECREF(item) in mpack_encode_set
  • preserve correct cleanup on encode failure
  • add a regression test covering both msgspec.json and msgspec.msgpack

Testing

  • rebuilt the extension locally
  • ran a focused regression script confirming set items are collectable after encoding for both protocols
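The focused regression script itself is not shown in this conversation. A minimal sketch of that kind of check, assuming a frozenset as the set element (it is hashable, supports weak references, and is listed among msgspec's supported types), might look like the following. The helper name item_collectable_after_encode is hypothetical, and this is not the test added in the PR.

```python
import gc
import weakref


def item_collectable_after_encode(encode):
    """Encode a set holding one item and report whether the item is freed.

    `encode` is any callable that accepts a set. This is a hypothetical
    helper for illustration, not the regression test from the PR.
    """
    item = frozenset({"payload"})  # created at runtime, weak-referenceable
    ref = weakref.ref(item)
    container = {item}
    encode(container)
    # Drop every reference this function holds, then force a collection.
    del item, container
    gc.collect()
    # If the encoder leaked a reference, the weakref is still alive.
    return ref() is None


if __name__ == "__main__":
    try:
        import msgspec
    except ImportError:
        msgspec = None

    if msgspec is not None:
        for name, encode in [
            ("json", msgspec.json.encode),
            ("msgpack", msgspec.msgpack.encode),
        ]:
            print(name, "collectable:", item_collectable_after_encode(encode))
```

Checking a weak reference rather than a raw reference count keeps the assertion simple: the item is either gone after encoding or it is not.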

Notes

This script reproduces the bug: repeatedly encoding sets makes memory usage climb quickly to several GB.

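import msgspec

encoder = msgspec.msgpack.Encoder()

# Each iteration encodes a fresh 10,000-element set. Before the fix, every
# element leaked one reference, so the set's items were not freed.
for i in range(50_000):
    encoded = encoder.encode(set(range(10_000)))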

@Siyet
Collaborator

Siyet commented Apr 8, 2026

Ported to the community fork msgspec-arise: PR #29

@jcrist jcrist merged commit b5040d5 into jcrist:main Apr 8, 2026
21 checks passed
@jcrist
Owner

jcrist commented Apr 8, 2026

Thanks, this is in! I pushed a few extra fixups, but this was mostly good-to-go. I plan to merge a few other PRs before cutting a release today.

3 participants