[chore](thirdparty)(paimon-cpp) reuse Doris Arrow stack and isolate external headers#60946
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Splits out thirdparty-only updates needed for paimon-cpp integration (and its Arrow/Parquet requirements), so thirdparty can merge independently ahead of the larger feature work.
Changes:
- Adds an Apache Arrow 17.0.0 patch with Parquet/Thrift-related fixes used by paimon-cpp.
- Updates thirdparty build/download scripts to build additional Arrow modules (dataset/acero/filesystem/compute) and apply the new Arrow patch.
- Adjusts paimon-cpp thirdparty cache + paimon-cpp patching to support reusing Doris-built Arrow (“external Arrow”) instead of building Arrow inside paimon-cpp.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| thirdparty/patches/paimon-cpp-buildutils-static-deps.patch | Adds “external Arrow” build path and other paimon-cpp build fixes in the paimon-cpp patchset. |
| thirdparty/patches/apache-arrow-17.0.0-paimon.patch | Introduces Arrow 17.0.0 Parquet/Thrift patches required by paimon-cpp behavior. |
| thirdparty/paimon-cpp-cache.cmake | Switches paimon-cpp configuration to reuse Doris Arrow and adjusts reuse messaging. |
| thirdparty/download-thirdparty.sh | Applies the new Arrow 17.0.0 paimon patch during thirdparty source patching. |
| thirdparty/build-thirdparty.sh | Builds Arrow with additional modules and adjusts paimon-cpp install logic for the external-Arrow path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| + if(PAIMON_USE_EXTERNAL_ARROW) | ||
| + set(ARROW_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}/doris_external_arrow_include") | ||
| + file(MAKE_DIRECTORY "${ARROW_INCLUDE_DIR}") | ||
| + if(NOT EXISTS "${ARROW_INCLUDE_DIR}/arrow") | ||
| + execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink | ||
| + "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/arrow" | ||
| + "${ARROW_INCLUDE_DIR}/arrow") | ||
| + endif() | ||
| + if(EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet" | ||
| + AND NOT EXISTS "${ARROW_INCLUDE_DIR}/parquet") | ||
| + execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink | ||
| + "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet" | ||
| + "${ARROW_INCLUDE_DIR}/parquet") | ||
| + endif() | ||
| + | ||
| + if(NOT PAIMON_EXTERNAL_ARROW_INCLUDE_DIR) | ||
| + message(FATAL_ERROR | ||
| + "PAIMON_EXTERNAL_ARROW_INCLUDE_DIR must be set when PAIMON_USE_EXTERNAL_ARROW=ON" | ||
| + ) | ||
| + endif() | ||
| + if(NOT EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}") |
There was a problem hiding this comment.
In the PAIMON_USE_EXTERNAL_ARROW branch, PAIMON_EXTERNAL_ARROW_INCLUDE_DIR is used to create symlinks before it is validated. If the variable is empty, this expands to "/arrow"/"/parquet" and can create incorrect symlinks or fail in confusing ways. Validate that PAIMON_EXTERNAL_ARROW_INCLUDE_DIR is set and exists before the execute_process() symlink creation, and only then create the include-tree links.
| + PAIMON_EXTERNAL_ARROW_ACERO_LIB | ||
| + PAIMON_EXTERNAL_PARQUET_LIB | ||
| + PAIMON_EXTERNAL_ARROW_BUNDLED_DEPS_LIB) | ||
| + if(NOT ${_paimon_external_lib}) |
There was a problem hiding this comment.
if(NOT ${_paimon_external_lib}) is unsafe here: when the cache variable is empty it expands to if(NOT ), which is a CMake parse error (instead of producing the intended FATAL_ERROR message). Use a DEFINED/empty-string check on the named variable (e.g., if(NOT DEFINED ... OR "${...}" STREQUAL "")) to fail cleanly.
| + if(NOT ${_paimon_external_lib}) | |
| + if(NOT DEFINED ${_paimon_external_lib} OR "${${_paimon_external_lib}}" STREQUAL "") |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Summary
Split thirdparty-only changes from #60883 into an independent PR, so
thirdpartycan merge first.Included Files
thirdparty/build-thirdparty.shthirdparty/download-thirdparty.shthirdparty/paimon-cpp-cache.cmakethirdparty/patches/apache-arrow-17.0.0-paimon.patchthirdparty/patches/paimon-cpp-buildutils-static-deps.patchWhy Split
thirdpartyintegration only.Follow-up
master.