-
Notifications
You must be signed in to change notification settings - Fork 185
Enable window_bits and gzip functions for zlib-rs backend #520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Byron <[email protected]>
977d847 to
d8a67e7
Compare
|
@folkertdev I'd need your help here 🙏. I think this PR shows that |
|
I would have expected this PR to break CI given that zlib-rs does not currently expose the API to implement those features? I'll need to make a quick release exposing some additional things. We have the functionality, just not the API. |
Actually the test doesn't check for set_dictionary and set_level for that reason, but everything else is exposed. Once we have access to these via the Rust API, the test can be adjusted to also validate them. In theory, it could then be copy-pasted for each of the other implementations. In any case, thanks for your help, it's much appreciated! |
|
Some fun kind of undocumented behavior: the native zlib backends count the dictionary length towards |
|
Maybe @jongiddy has more context, as this question is way over my head. Knowing that some models as built into the GH chat window have access to a lot of context and can do some thinking/planning, I thought maybe sharing their output here can be useful. Maybe not, I really can't even decide that 😅. GPT 4.1 MiniShort answer: probably a deliberate bookkeeping choice in flate2 rather than a spec bug in zlib. zlib's C implementation treats the preset dictionary as part of the stream history and increments its internal total_in/total_out counters to reflect that the stream now "contains" those bytes; flate2's counters reflect only the bytes that were actually passed to the compress()/decompress() calls by the caller (i.e., explicit input), and so do not add in the implicit dictionary length. Why that difference exists (summary)
Is it deliberate?
What I recommend
Suggested issue title/body (you can paste/edit this): Title: Body: Repro:
Observed:
Expected / question:
Workaround: If you’d like, I can help you draft a minimal reproducible Rust snippet to include in the issue (or a short test) showing the differing totals for the two backends. Which backend(s) and flate2 version are you testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enables window_bits and gzip-formatted compression/decompression functions for the pure Rust zlib-rs backend by changing feature gates from any_zlib (C-based backends only) to any_impl (all backends including Rust implementations).
Key Changes:
- Feature gate updates for
Compress::new_with_window_bits(),Compress::new_gzip(),Decompress::new_with_window_bits(), andDecompress::new_gzip()functions - New comprehensive test suite validating these functions with the zlib-rs backend
- CI workflow updated to run new tests without mingw exclusion (appropriate for pure Rust implementation)
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/mem.rs | Changed feature gates from any_zlib to any_impl for window_bits and gzip functions, enabling them for zlib-rs backend |
| tests/zlib-rs-capabilities.rs | Added comprehensive test suite validating window_bits and gzip functions with various configurations and boundary conditions |
| Cargo.toml | Configured new test with required-features = ["zlib-rs"] to ensure it only runs with zlib-rs backend |
| .github/workflows/main.yml | Removed mingw exclusion for zlib-rs tests (appropriate for pure Rust) and added explicit test invocation for capabilities |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d8a67e7 to
816bd26
Compare
| "window_bits must be within 9 ..= 15" | ||
| ); | ||
| Decompress { | ||
| inner: Inflate::make(zlib_header, window_bits), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work correctly for miniz_oxide. That implementation ignores the window_bits field:
flate2-rs/src/ffi/miniz_oxide.rs
Lines 64 to 72 in 6e25a3f
| fn make(zlib_header: bool, _window_bits: u8) -> Self { | |
| let format = format_from_bool(zlib_header); | |
| Inflate { | |
| inner: InflateState::new_boxed(format), | |
| total_in: 0, | |
| total_out: 0, | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would act differently if it tries to decompress a deflate stream that requires a larger window size than specified yeah, zlib & co would fail while miniz_oxide would not since it's not something miniz_oxide supports. (Since miniz_oxide relies a fair on the buffers having a known size at compile time to help the compiler avoid bounds checks it would likely not be trivial to add without a lot of work while avoiding unsafe or performance penalties and I don't think that would be worth the effort.)
So whether to keep it for miniz_oxide depends if one is fine with the tradeoff of it behaving differently in this niche use case depending on backend or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting window size in general is a niche use case (if it wasn't already exposed I'd argue for it not being part of the API).
For the decoder, ignoring window_bits only means accepting input that should produce an error. But for the encoder, ignoring window_bits means potentially producing output that cannot be read by decoders which actually adhere to the window_bits setting. That seems bad.
|
Can we split
|
|
Absolutely. I also believe that these From what I see, this should only involve |
|
Hmm well I made the current setup work, and the split would probably not really help that much. |
| @@ -0,0 +1,87 @@ | |||
| //! Validate that certain feature-gated functionality is still available. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that this comment says that the tests check that "functionality is still available". There's actually nothing in any of these tests that ensure the window_bits settings actually work.
Several functions were unnecessarily restricted to C-based zlib backends despite working with pure Rust backends like
zlib-rs.Changes
Changed feature gate from
any_zlibtoany_implfor:Compress::new_with_window_bits()- Custom window size for compressionCompress::new_gzip()- Gzip-formatted compressionDecompress::new_with_window_bits()- Custom window size for decompressionDecompress::new_gzip()- Gzip-formatted decompressionThese functions work across all backends because:
make()methods already acceptwindow_bitsparameterwindow_bits + 16convention supported universallyDictionary functions (
set_dictionary,set_level) remainany_zlib-only as they require direct stream manipulation not exposed in zlib-rs stable API.Tests
Added validation for newly enabled functions:
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.