Skip to content

Comments

Native serialization to a stream for FlatIndex#275

Open
razdoburdin wants to merge 9 commits intointel:mainfrom
razdoburdin:serialization_flat_index
Open

Native serialization to a stream for FlatIndex#275
razdoburdin wants to merge 9 commits intointel:mainfrom
razdoburdin:serialization_flat_index

Conversation

@razdoburdin
Copy link
Contributor

@razdoburdin razdoburdin commented Feb 16, 2026

This PR adds native serialization to a stream for FlatIndex.

The core changes are:

  1. svs::io::v1::Writer is now CRTP base class for svs::io::v1::StreamWriter and svs::io::v1::FileWriter.
  2. svs::lib::Archiver was added as CRTP base class for svs::lib::DirectoryArchiver and svs::lib::StreamArchiver. Each of DirectoryArchiver and StreamArchiver have it's own magicnumber being used to determine if the model was serialized with the old or new format.
  3. For serialization/deserialization of toml::table intermediate stringstream is used. The related memory overhead looks acceptable as far as size of toml::table is small related to size of data.

@razdoburdin razdoburdin marked this pull request as draft February 16, 2026 12:32
@razdoburdin razdoburdin marked this pull request as ready for review February 17, 2026 07:48
@mihaic mihaic requested a review from Copilot February 17, 2026 20:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new native “save/load to std::(i/o)stream” path for FlatIndex by adding stream-based archiving/serialization primitives and updating the Flat orchestrator to use them instead of temp-directory packing.

Changes:

  • Add svs::lib::Archiver CRTP base and svs::lib::StreamArchiver to support stream-friendly serialization of metadata (TOML) and binary payloads.
  • Add save_to_stream / load_from_stream APIs and wire FlatIndex + Flat orchestrator save/load to use streams.
  • Refactor native IO writer into CRTP Writer with FileWriter and StreamWriter specializations.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
include/svs/orchestrators/exhaustive.h Switch Flat save/load from temp-dir packing to direct stream-based save/load.
include/svs/lib/stream.h New StreamArchiver for reading/writing TOML tables to/from streams.
include/svs/lib/saveload/save.h Add helpers and public API for saving serialized objects to streams.
include/svs/lib/saveload/load.h Add stream-based deserialization entry points and load_from_stream.
include/svs/lib/file.h Refactor DirectoryArchiver to reuse the new Archiver base; simplify file extraction logic.
include/svs/lib/archiver.h New CRTP base for common archiver operations (sizes/names/stream copy).
include/svs/index/flat/flat.h Add FlatIndex::save(std::ostream&) using the new stream save pipeline.
include/svs/core/io/native.h Refactor writer into CRTP base + FileWriter/StreamWriter.
include/svs/core/data/simple.h Extend dataset serialization to support stream save/load and stream-oriented metadata.
include/svs/core/data/io.h Add dataset save/load helpers operating directly on std::istream/std::ostream.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 834 to 854
inline ContextFreeSerializedObject begin_deserialization(std::istream& stream) {
lib::StreamArchiver::size_type magic = 0;
lib::StreamArchiver::read_size(stream, magic);
if (magic == lib::DirectoryArchiver::magic_number) {
// Backward compatibility mode for older versions
lib::StreamArchiver::size_type num_files = 0;
lib::StreamArchiver::read_size(stream, num_files);

std::string file_name;
lib::StreamArchiver::read_name(stream, file_name);
} else if (magic != lib::StreamArchiver::magic_number) {
throw ANNEXCEPTION("Invalid magic number in stream deserialization!");
}

if (!stream) {
throw ANNEXCEPTION("Error reading from stream!");
}

auto table = lib::StreamArchiver::read_table(stream);
return ContextFreeSerializedObject{std::move(table)};
}
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “backward compatibility mode” for DirectoryArchiver streams is incomplete/incorrect: it reads num_files and the first file_name, but does not consume the rest of the directory archive (file sizes + contents) nor reconstruct the directory/context needed by the old on-disk format. After this, read_table() will interpret the next bytes (likely a file size) as the TOML table size and fail. Either fully unpack() the directory archive into a temp directory and load via the disk-based path/context, or remove this branch and fail fast with a clear error.

Copilot uses AI. Check for mistakes.
Comment on lines +905 to +918
///// load_from_stream
template <typename T, typename... Args>
T load_from_stream(const Loader<T>& loader, std::istream& stream, Args&&... args) {
// At this point, we will try the saving/loading framework to load the object.
// Here we go!
return lib::load(
loader, detail::begin_deserialization(stream), stream, SVS_FWD(args)...
);
}

template <typename T, typename... Args>
T load_from_stream(std::istream& stream, Args&&... args) {
return lib::load_from_stream(Loader<T>(), stream, SVS_FWD(args)...);
}
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New stream serialization/deserialization entry points (save_to_stream, load_from_stream, begin_deserialization(std::istream&)) are introduced here, but there don’t appear to be corresponding unit tests alongside the existing saveload tests. Please add coverage that round-trips a representative object through a std::stringstream (and, if backward compatibility is intended, a test for loading the legacy DirectoryArchiver-packed stream).

Copilot uses AI. Check for mistakes.
Copy link
Member

@ibhati ibhati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall! 👍
Could you also add some unit tests to cover the new logic? It would be great to include a few flat-index integration tests as well to confirm that both the existing flows and the new flow continue to behave as expected.

@razdoburdin
Copy link
Contributor Author

This looks good overall! 👍 Could you also add some unit tests to cover the new logic? It would be great to include a few flat-index integration tests as well to confirm that both the existing flows and the new flow continue to behave as expected.

I have added tests for FlatIndex save-load with both native-stream serialization and creating of intermediate files with the following unpacking of the payload to the stream.

@razdoburdin razdoburdin requested a review from ibhati February 20, 2026 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants