[fix](broker load) fix wildcard import failure caused by 0-byte metadata files #59486
base: master
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Force-pushed from d7134e0 to 98b984c
run buildall
TPC-H: Total hot run time: 34667 ms
TPC-DS: Total hot run time: 173437 ms
ClickBench: Total hot run time: 26.84 s
BE UT Coverage Report: Increment line coverage, Increment coverage report
Force-pushed from 98b984c to 41ecdde
run buildall
FE UT Coverage Report: Increment line coverage
TPC-H: Total hot run time: 36219 ms
TPC-DS: Total hot run time: 174263 ms
ClickBench: Total hot run time: 26.94 s
FE Regression Coverage Report: Increment line coverage
BE UT Coverage Report: Increment line coverage, Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
List<TBrokerFileStatus> filteredFileStatuses = Lists.newArrayList();
for (TBrokerFileStatus fstatus : fileStatuses) {
    if (fstatus.getSize() == 0 && isBinaryFileFormat) {
        boolean isSuccessFile = fstatus.path.endsWith("/_SUCCESS")
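Could we filter out all files whose names start with an underscore or a dot, similar to this PR (#59398)? If you're concerned that the impact would be too broad, we could add a switch to the broker load options.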
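Now that a single file group supports specifying different file types, skipping the filter here would actually cause no problem. The special filtering of _SUCCESS files was added for now just to be safe.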
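The two filtering policies discussed in this thread can be compared in a minimal Java sketch. It assumes a hypothetical FileStatus record in place of TBrokerFileStatus and a hypothetical filterHiddenFiles switch that models the suggested broker load option. It is not the code in this PR, and it omits the isBinaryFileFormat condition from the excerpt above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LoadFileFilterDemo {
    // Hypothetical minimal file status; the real FE code uses TBrokerFileStatus.
    record FileStatus(String path, long size) {}

    // Last path component, e.g. "/data/_SUCCESS" -> "_SUCCESS".
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // filterHiddenFiles is a hypothetical switch modeling the reviewer's suggestion.
    static boolean keep(FileStatus f, boolean filterHiddenFiles) {
        String name = fileName(f.path());
        if (filterHiddenFiles) {
            // Broader policy: drop every file whose name starts with '_' or '.'.
            return !(name.startsWith("_") || name.startsWith("."));
        }
        // Narrow policy: drop only 0-byte _SUCCESS marker files.
        return !(f.size() == 0 && name.equals("_SUCCESS"));
    }

    // Returns the paths that survive the chosen policy, in listing order.
    static List<String> filter(List<FileStatus> files, boolean filterHiddenFiles) {
        return files.stream()
                .filter(f -> keep(f, filterHiddenFiles))
                .map(FileStatus::path)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileStatus> files = List.of(
                new FileStatus("/data/part-00000.lzo", 1024),
                new FileStatus("/data/_SUCCESS", 0),
                // Hypothetical dot-prefixed side file, for illustration only.
                new FileStatus("/data/.part-00000.lzo.crc", 16));
        System.out.println("narrow: " + filter(files, false));
        System.out.println("broad:  " + filter(files, true));
    }
}
```

Under the narrow policy the dot-prefixed file still reaches the planner. Per the reply above, that is tolerable once each file in a file group carries its own type, which is why the PR keeps only the _SUCCESS special case.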
What problem does this PR solve?
Problem Summary:
When performing a Broker Load with wildcards (e.g., /*), if the directory contains 0-byte metadata files like _SUCCESS, the FE would incorrectly use the format/compression of these metadata files (usually PLAIN) to overwrite the shared parameters for the entire file group. This caused the BE to read compressed data files (like LZO) as plain text, leading to import failures.
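The failure mode above can be reduced to a small Java sketch: a single value shared by the whole file group is overwritten as each file is visited, so a trailing 0-byte _SUCCESS file leaves it at PLAIN, while per-range values keep each file's own type. Compress, Range, and the suffix-based detect() are hypothetical stand-ins (Range loosely mirrors TFileRangeDesc). The last-write-wins loop and the file order are assumptions for illustration, not the actual FE planner logic.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedVsPerRangeDemo {
    // Hypothetical compression type; not Doris's actual enum.
    enum Compress { PLAIN, LZO }

    // Hypothetical per-file range, loosely mirroring TFileRangeDesc.
    record Range(String path, Compress compress) {}

    // Hypothetical suffix-based detection, assumed for illustration only.
    static Compress detect(String path) {
        return path.endsWith(".lzo") ? Compress.LZO : Compress.PLAIN;
    }

    // Buggy pattern: one value for the whole file group, overwritten per file.
    static Compress planShared(List<String> paths) {
        Compress shared = null;
        for (String p : paths) {
            shared = detect(p); // last file wins, even a 0-byte _SUCCESS marker
        }
        return shared;
    }

    // Fixed pattern: each range carries its own compression type.
    static List<Range> planPerRange(List<String> paths) {
        List<Range> ranges = new ArrayList<>();
        for (String p : paths) {
            ranges.add(new Range(p, detect(p)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/data/part-00000.lzo", "/data/_SUCCESS");
        System.out.println("shared: " + planShared(paths));
        for (Range r : planPerRange(paths)) {
            System.out.println(r.path() + " -> " + r.compress());
        }
    }
}
```

Running main prints "shared: PLAIN" for the shared value, while the per-range plan keeps LZO for the data file and PLAIN only for _SUCCESS.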
This PR fixes the issue by:
- Filtering out _SUCCESS files in BrokerLoadPendingTask to avoid processing them as data.
- Setting format_type and compress_type in each TFileRangeDesc instead of the shared TFileScanRangeParams in FE (both legacy and Nereids paths).
- Using the per-range format_type in BE's CsvReader to ensure the correct reader is initialized for each file.
- _SUCCESS filtering

Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)