[chore](thirdparty)(paimon-cpp) reuse Doris Arrow stack and isolate external headers #60946

Merged
morningman merged 1 commit into apache:master from xylaaaaa:fix/paimoncpp-thirdparty-pr60883
Mar 2, 2026

xylaaaaa (Contributor) commented Mar 2, 2026

Summary

Split thirdparty-only changes from #60883 into an independent PR, so thirdparty can merge first.

Included Files

  • thirdparty/build-thirdparty.sh
  • thirdparty/download-thirdparty.sh
  • thirdparty/paimon-cpp-cache.cmake
  • thirdparty/patches/apache-arrow-17.0.0-paimon.patch
  • thirdparty/patches/paimon-cpp-buildutils-static-deps.patch

Why Split

  • Keep this PR focused on thirdparty integration only.
  • Reduce rebase/conflict risk for the original feature branch.

Follow-up

  1. Merge this PR first.
  2. Rebase the original feature branch on latest master.
  3. Keep non-thirdparty logic in the original PR.

Copilot AI review requested due to automatic review settings March 2, 2026 07:36
Thearas (Contributor) commented Mar 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

xylaaaaa changed the title from "[chore](thirdparty) split paimon-cpp thirdparty changes from pr-60883" to "[chore](thirdparty)(paimon-cpp) reuse Doris Arrow stack and isolate external headers" on Mar 2, 2026
xylaaaaa (Contributor, Author) commented Mar 2, 2026

run buildall

Copilot AI left a comment

Pull request overview

Splits out thirdparty-only updates needed for paimon-cpp integration (and its Arrow/Parquet requirements), so thirdparty can merge independently ahead of the larger feature work.

Changes:

  • Adds an Apache Arrow 17.0.0 patch with Parquet/Thrift-related fixes used by paimon-cpp.
  • Updates thirdparty build/download scripts to build additional Arrow modules (dataset/acero/filesystem/compute) and apply the new Arrow patch.
  • Adjusts paimon-cpp thirdparty cache + paimon-cpp patching to support reusing Doris-built Arrow (“external Arrow”) instead of building Arrow inside paimon-cpp.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
|---|---|
| thirdparty/patches/paimon-cpp-buildutils-static-deps.patch | Adds “external Arrow” build path and other paimon-cpp build fixes in the paimon-cpp patchset. |
| thirdparty/patches/apache-arrow-17.0.0-paimon.patch | Introduces Arrow 17.0.0 Parquet/Thrift patches required by paimon-cpp behavior. |
| thirdparty/paimon-cpp-cache.cmake | Switches paimon-cpp configuration to reuse Doris Arrow and adjusts reuse messaging. |
| thirdparty/download-thirdparty.sh | Applies the new Arrow 17.0.0 paimon patch during thirdparty source patching. |
| thirdparty/build-thirdparty.sh | Builds Arrow with additional modules and adjusts paimon-cpp install logic for the external-Arrow path. |

Comment on lines +128 to +148
+ if(PAIMON_USE_EXTERNAL_ARROW)
+     set(ARROW_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}/doris_external_arrow_include")
+     file(MAKE_DIRECTORY "${ARROW_INCLUDE_DIR}")
+     if(NOT EXISTS "${ARROW_INCLUDE_DIR}/arrow")
+         execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink
+                         "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/arrow"
+                         "${ARROW_INCLUDE_DIR}/arrow")
+     endif()
+     if(EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet"
+        AND NOT EXISTS "${ARROW_INCLUDE_DIR}/parquet")
+         execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink
+                         "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet"
+                         "${ARROW_INCLUDE_DIR}/parquet")
+     endif()
+
+     if(NOT PAIMON_EXTERNAL_ARROW_INCLUDE_DIR)
+         message(FATAL_ERROR
+             "PAIMON_EXTERNAL_ARROW_INCLUDE_DIR must be set when PAIMON_USE_EXTERNAL_ARROW=ON"
+         )
+     endif()
+     if(NOT EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}")

Copilot AI commented Mar 2, 2026

In the PAIMON_USE_EXTERNAL_ARROW branch, PAIMON_EXTERNAL_ARROW_INCLUDE_DIR is used to create symlinks before it is validated. If the variable is empty, this expands to "/arrow"/"/parquet" and can create incorrect symlinks or fail in confusing ways. Validate that PAIMON_EXTERNAL_ARROW_INCLUDE_DIR is set and exists before the execute_process() symlink creation, and only then create the include-tree links.
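
The reordering described in this comment could look roughly like the sketch below. It reuses the variable names from the reviewed hunk, but the second error message is hypothetical wording (the original is not visible in the excerpt) and the `cmake_minimum_required` line is added only so the snippet stands alone. It is an illustration of the validate-then-link order, not the patch that was merged.

```cmake
cmake_minimum_required(VERSION 3.16)

if(PAIMON_USE_EXTERNAL_ARROW)
    # Step 1: validate the input before anything touches the filesystem.
    if(NOT PAIMON_EXTERNAL_ARROW_INCLUDE_DIR)
        message(FATAL_ERROR
            "PAIMON_EXTERNAL_ARROW_INCLUDE_DIR must be set when PAIMON_USE_EXTERNAL_ARROW=ON")
    endif()
    if(NOT EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}")
        # Hypothetical message text, not taken from the patch.
        message(FATAL_ERROR
            "PAIMON_EXTERNAL_ARROW_INCLUDE_DIR does not exist: ${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}")
    endif()

    # Step 2: only now build the isolated include tree.
    set(ARROW_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}/doris_external_arrow_include")
    file(MAKE_DIRECTORY "${ARROW_INCLUDE_DIR}")
    if(NOT EXISTS "${ARROW_INCLUDE_DIR}/arrow")
        execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink
                        "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/arrow"
                        "${ARROW_INCLUDE_DIR}/arrow")
    endif()
    if(EXISTS "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet"
       AND NOT EXISTS "${ARROW_INCLUDE_DIR}/parquet")
        execute_process(COMMAND "${CMAKE_COMMAND}" -E create_symlink
                        "${PAIMON_EXTERNAL_ARROW_INCLUDE_DIR}/parquet"
                        "${ARROW_INCLUDE_DIR}/parquet")
    endif()
endif()
```

With the checks first, an unset variable stops the configure step at the `FATAL_ERROR` and no link pointing at `/arrow` or `/parquet` is ever created.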

+         PAIMON_EXTERNAL_ARROW_ACERO_LIB
+         PAIMON_EXTERNAL_PARQUET_LIB
+         PAIMON_EXTERNAL_ARROW_BUNDLED_DEPS_LIB)
+     if(NOT ${_paimon_external_lib})

Copilot AI commented Mar 2, 2026

if(NOT ${_paimon_external_lib}) is unsafe here: when the cache variable is empty it expands to if(NOT ), which is a CMake parse error (instead of producing the intended FATAL_ERROR message). Use a DEFINED/empty-string check on the named variable (e.g., if(NOT DEFINED ... OR "${...}" STREQUAL "")) to fail cleanly.

Suggested change
- + if(NOT ${_paimon_external_lib})
+ + if(NOT DEFINED ${_paimon_external_lib} OR "${${_paimon_external_lib}}" STREQUAL "")
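
For context, the suggested condition could sit in a loop like the sketch below. It assumes the loop variable `_paimon_external_lib` holds the name of each cache variable, as the excerpt implies, and it lists only the three names visible there; the error message is hypothetical. Under that assumption `${_paimon_external_lib}` expands to a variable name, and the nested `${${_paimon_external_lib}}` reads that variable's value.

```cmake
cmake_minimum_required(VERSION 3.16)

# Only the three names visible in the reviewed hunk; the real list may be longer.
foreach(_paimon_external_lib
        PAIMON_EXTERNAL_ARROW_ACERO_LIB
        PAIMON_EXTERNAL_PARQUET_LIB
        PAIMON_EXTERNAL_ARROW_BUNDLED_DEPS_LIB)
    # Explicit check: the named variable must be defined and non-empty.
    if(NOT DEFINED ${_paimon_external_lib}
       OR "${${_paimon_external_lib}}" STREQUAL "")
        # Hypothetical message text, not taken from the patch.
        message(FATAL_ERROR
            "${_paimon_external_lib} must be set when PAIMON_USE_EXTERNAL_ARROW=ON")
    endif()
endforeach()
```

The explicit form tests only for "defined and non-empty", so it does not depend on CMake's truthiness rules, under which values such as `OFF` or a string ending in `-NOTFOUND` count as false.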

github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Mar 2, 2026
github-actions bot (Contributor) commented Mar 2, 2026

PR approved by at least one committer and no changes requested.

github-actions bot (Contributor) commented Mar 2, 2026

PR approved by anyone and no changes requested.

doris-robot commented

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 52.56% (19626/37339) |
| Line Coverage | 36.18% (183209/506450) |
| Region Coverage | 32.48% (142179/437693) |
| Branch Coverage | 33.43% (61648/184410) |

hello-stephen (Contributor) commented

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 71.52% (26147/36561) |
| Line Coverage | 54.26% (273960/504897) |
| Region Coverage | 51.67% (228293/441830) |
| Branch Coverage | 52.97% (97975/184974) |

morningman merged commit d437efe into apache:master on Mar 2, 2026
39 of 41 checks passed
Labels

  • approved (Indicates a PR has been approved by one committer.)
  • dev/4.1.x
  • reviewed

7 participants