Native serialization to a stream for FlatIndex#275
Native serialization to a stream for FlatIndex#275razdoburdin wants to merge 9 commits intointel:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new native “save/load to std::(i/o)stream” path for FlatIndex by adding stream-based archiving/serialization primitives and updating the Flat orchestrator to use them instead of temp-directory packing.
Changes:
- Add
svs::lib::ArchiverCRTP base andsvs::lib::StreamArchiverto support stream-friendly serialization of metadata (TOML) and binary payloads. - Add
save_to_stream/load_from_streamAPIs and wireFlatIndex+ Flat orchestrator save/load to use streams. - Refactor native IO writer into CRTP
WriterwithFileWriterandStreamWriterspecializations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| include/svs/orchestrators/exhaustive.h | Switch Flat save/load from temp-dir packing to direct stream-based save/load. |
| include/svs/lib/stream.h | New StreamArchiver for reading/writing TOML tables to/from streams. |
| include/svs/lib/saveload/save.h | Add helpers and public API for saving serialized objects to streams. |
| include/svs/lib/saveload/load.h | Add stream-based deserialization entry points and load_from_stream. |
| include/svs/lib/file.h | Refactor DirectoryArchiver to reuse the new Archiver base; simplify file extraction logic. |
| include/svs/lib/archiver.h | New CRTP base for common archiver operations (sizes/names/stream copy). |
| include/svs/index/flat/flat.h | Add FlatIndex::save(std::ostream&) using the new stream save pipeline. |
| include/svs/core/io/native.h | Refactor writer into CRTP base + FileWriter/StreamWriter. |
| include/svs/core/data/simple.h | Extend dataset serialization to support stream save/load and stream-oriented metadata. |
| include/svs/core/data/io.h | Add dataset save/load helpers operating directly on std::istream/std::ostream. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| inline ContextFreeSerializedObject begin_deserialization(std::istream& stream) { | ||
| lib::StreamArchiver::size_type magic = 0; | ||
| lib::StreamArchiver::read_size(stream, magic); | ||
| if (magic == lib::DirectoryArchiver::magic_number) { | ||
| // Backward compatibility mode for older versions | ||
| lib::StreamArchiver::size_type num_files = 0; | ||
| lib::StreamArchiver::read_size(stream, num_files); | ||
|
|
||
| std::string file_name; | ||
| lib::StreamArchiver::read_name(stream, file_name); | ||
| } else if (magic != lib::StreamArchiver::magic_number) { | ||
| throw ANNEXCEPTION("Invalid magic number in stream deserialization!"); | ||
| } | ||
|
|
||
| if (!stream) { | ||
| throw ANNEXCEPTION("Error reading from stream!"); | ||
| } | ||
|
|
||
| auto table = lib::StreamArchiver::read_table(stream); | ||
| return ContextFreeSerializedObject{std::move(table)}; | ||
| } |
There was a problem hiding this comment.
The “backward compatibility mode” for DirectoryArchiver streams is incomplete/incorrect: it reads num_files and the first file_name, but does not consume the rest of the directory archive (file sizes + contents) nor reconstruct the directory/context needed by the old on-disk format. After this, read_table() will interpret the next bytes (likely a file size) as the TOML table size and fail. Either fully unpack() the directory archive into a temp directory and load via the disk-based path/context, or remove this branch and fail fast with a clear error.
| ///// load_from_stream | ||
| template <typename T, typename... Args> | ||
| T load_from_stream(const Loader<T>& loader, std::istream& stream, Args&&... args) { | ||
| // At this point, we will try the saving/loading framework to load the object. | ||
| // Here we go! | ||
| return lib::load( | ||
| loader, detail::begin_deserialization(stream), stream, SVS_FWD(args)... | ||
| ); | ||
| } | ||
|
|
||
| template <typename T, typename... Args> | ||
| T load_from_stream(std::istream& stream, Args&&... args) { | ||
| return lib::load_from_stream(Loader<T>(), stream, SVS_FWD(args)...); | ||
| } |
There was a problem hiding this comment.
New stream serialization/deserialization entry points (save_to_stream, load_from_stream, begin_deserialization(std::istream&)) are introduced here, but there don’t appear to be corresponding unit tests alongside the existing saveload tests. Please add coverage that round-trips a representative object through a std::stringstream (and, if backward compatibility is intended, a test for loading the legacy DirectoryArchiver-packed stream).
ibhati
left a comment
There was a problem hiding this comment.
This looks good overall! 👍
Could you also add some unit tests to cover the new logic? It would be great to include a few flat-index integration tests as well to confirm that both the existing flows and the new flow continue to behave as expected.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
I have added tests for |
This PR adds native serialization to a stream for
FlatIndex.The core changes are:
svs::io::v1::Writeris now CRTP base class forsvs::io::v1::StreamWriterandsvs::io::v1::FileWriter.svs::lib::Archiverwas added as CRTP base class forsvs::lib::DirectoryArchiverandsvs::lib::StreamArchiver. Each of DirectoryArchiver and StreamArchiver have it's own magicnumber being used to determine if the model was serialized with the old or new format.