[core] Introduce vector-store for data-evolution table #7240
Open
ColdL wants to merge 2 commits into apache:master from
0ef7695 to 5e012b4
5e012b4 to 4976c76
leaves12138 reviewed on Feb 26, 2026
...re/src/main/java/org/apache/paimon/append/dataevolution/DataEvolutionCompactCoordinator.java (outdated, resolved)
leaves12138 reviewed on Feb 26, 2026
paimon-core/src/main/java/org/apache/paimon/operation/DataEvolutionSplitRead.java (outdated, resolved)
leaves12138 reviewed on Feb 26, 2026
            .noDefaultValue()
            .withDescription("Specify the vector store fields.");

    public static final ConfigOption<MemorySize> VECTOR_STORE_TARGET_FILE_SIZE =
ColdL (author): Fixed. Now the config names in code are consistent with the public configuration keys.
637e7d6 to 549c2b3
leaves12138 approved these changes on Feb 27, 2026
LGTM
Thanks, @ColdL. Can you rebase onto the latest master to resolve the conflict?
fe6d1c1 to 900cb36
900cb36 to 1bbc24b
Contributor:
I think the PR statement should clearly state a few things:
Contributor:
You can also create a separate doc in
ColdL (author): @JingsongLi Thanks for the review! I've updated the PR description. Once it is confirmed, I will keep updating it and add the corresponding docs.
Contributor:
@ColdL How about:
Purpose
Linked issue: update #7011
The goal of this PR is to optimize the storage layout for vector workloads in Data Evolution tables. It does this by storing vector columns, and optionally some associated columns, in a separately specified file format.
For example, scalar columns can be stored in Parquet. Vector columns, and columns that may need point lookups during vector search, can be stored in a format such as Lance.
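The column split described above can be illustrated with a minimal Java sketch. `ColumnGroupSplitter` and its methods are hypothetical and are not Paimon classes. The sketch takes the table's column names and the configured vector-store field names. It returns two groups, each kept in table column order:

- one for the table's regular format (such as Parquet);
- one for the vector-store format (such as Lance).

Rejecting configured names that do not exist in the table is an assumed validation choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Paimon API): splits a table's columns into the
// group written with the regular format and the separately stored vector-store group.
final class ColumnGroupSplitter {

    /** Result of the split; both lists keep the table's column order. */
    static final class Split {
        final List<String> regularColumns;
        final List<String> vectorStoreColumns;

        Split(List<String> regularColumns, List<String> vectorStoreColumns) {
            this.regularColumns = regularColumns;
            this.vectorStoreColumns = vectorStoreColumns;
        }
    }

    static Split split(List<String> tableColumns, List<String> vectorFields) {
        // Configured vector-store field names, deduplicated.
        Set<String> wanted = new LinkedHashSet<>(vectorFields);
        List<String> regular = new ArrayList<>();
        List<String> vector = new ArrayList<>();
        for (String column : tableColumns) {
            // remove() returns true only if the column was configured as a vector-store field.
            if (wanted.remove(column)) {
                vector.add(column);
            } else {
                regular.add(column);
            }
        }
        // Anything left over was configured but does not exist in the table.
        if (!wanted.isEmpty()) {
            throw new IllegalArgumentException("Unknown vector-store fields: " + wanted);
        }
        return new Split(regular, vector);
    }

    public static void main(String[] args) {
        Split s = split(
                List.of("id", "title", "price", "embedding", "image_url"),
                List.of("embedding", "image_url"));
        System.out.println("regular (e.g. Parquet): " + s.regularColumns);
        System.out.println("vector-store (e.g. Lance): " + s.vectorStoreColumns);
    }
}
```

Running `main` prints the regular group `[id, title, price]` and the vector-store group `[embedding, image_url]`. The column names are purely illustrative.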
1. Configuration
This PR introduces three new configuration options:
- vector-field: the names of the columns to store separately
- vector.file.format: the file format for those columns
- vector.target-file-size: the file size threshold for rolling
2. Storage Layout
When this feature is enabled, the columns listed in vector-field are stored separately in the file format given by vector.file.format. Their data file paths are marked with .vector-store.
File path pattern:
data-xxx-{count}.vector-store.{file-format}
This design serves two purposes:
- The .vector-store. segment identifies these files as separately stored column groups.
- The {file-format} suffix follows the current convention of using the file format as the file extension.
Note: .vector. may be a better marker than .vector-store.; if that is confirmed, I will update this accordingly. Please see the discussion below for details.
The final storage layout might be:
These vector-store files are associated with regular columns through Row-tracking / Data Evolution.
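The naming and rolling rules above can be sketched in Java under a few stated assumptions:

- The helper is hypothetical and is not Paimon's file path factory.
- The PR does not spell out the `xxx` part of the pattern, so the sketch takes it as an opaque string.
- The regular file name `data-abc-0.parquet` is only an assumed example.
- The sketch assumes the threshold semantics: it rolls once the written size reaches `vector.target-file-size`. That option is a `MemorySize`; here it is passed as plain bytes.

```java
// Hypothetical sketch (not Paimon's implementation): naming, recognizing and
// rolling vector-store data files as described in the PR.
final class VectorStoreFileNames {

    // Segment that marks a file as a separately stored vector column group.
    static final String VECTOR_STORE_SEGMENT = ".vector-store.";

    /** Builds "data-{xxx}-{count}.vector-store.{file-format}"; xxx is opaque here. */
    static String fileName(String xxx, long count, String fileFormat) {
        return "data-" + xxx + "-" + count + VECTOR_STORE_SEGMENT + fileFormat;
    }

    /** True if the file name carries the vector-store segment. */
    static boolean isVectorStoreFile(String fileName) {
        return fileName.contains(VECTOR_STORE_SEGMENT);
    }

    /** The text after the last dot is the file format, as for regular data files. */
    static String fileFormat(String fileName) {
        return fileName.substring(fileName.lastIndexOf('.') + 1);
    }

    /** Assumed rolling rule: start a new file once the current one reaches the target size. */
    static boolean shouldRoll(long bytesWritten, long targetFileSizeBytes) {
        return bytesWritten >= targetFileSizeBytes;
    }

    public static void main(String[] args) {
        String vectorFile = fileName("abc", 0, "lance");
        String regularFile = "data-abc-0.parquet"; // assumed regular file name
        System.out.println(vectorFile + " -> vector-store=" + isVectorStoreFile(vectorFile)
                + ", format=" + fileFormat(vectorFile));
        System.out.println(regularFile + " -> vector-store=" + isVectorStoreFile(regularFile)
                + ", format=" + fileFormat(regularFile));
    }
}
```

Matching on the `.vector-store.` segment rather than on the suffix lets a reader tell vector-store files apart even when both column groups use the same file format.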
Tests
API and Format
Documentation