Decode Hive partition values in listing tables#23226

Open
Kevin-Li-2025 wants to merge 1 commit into
apache:mainfrom
Kevin-Li-2025:kevin/percent-decode-hive-partitions
@Kevin-Li-2025

Which issue does this PR close?

Rationale for this change

Hive-style partition values can contain percent-encoded characters in object-store paths, such as %2F for / or %20 for a space. parse_partitions_for_path currently returns those encoded bytes literally, so listing tables expose foo%2Fbar instead of foo/bar.

What changes are included in this PR?

  • Percent-decode extracted partition values in parse_partitions_for_path.
  • Return Cow<str> from the parser so unchanged values keep the borrowed fast path and decoded values can be owned only when needed.
  • Fall back to the original raw partition value if percent decoding does not produce valid UTF-8, rather than dropping the file from listing results.
  • Add helper-level and PartitionedFile conversion tests for decoded partition values.
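As an illustration of the decode-with-fallback behavior described above, here is a minimal std-only sketch. It is not the code in this PR: `decode_partition_value` and `hex_value` are hypothetical names, the decoder is hand-rolled rather than taken from any crate, and passing malformed escapes such as `%zz` through unchanged is an assumption of this sketch.

```rust
use std::borrow::Cow;

/// Convert one ASCII hex digit to its numeric value (hypothetical helper).
fn hex_value(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

/// Percent-decode a single partition value (hypothetical helper).
///
/// Borrows the input when there is nothing to decode, allocates only when
/// at least one `%` is present, and falls back to the raw value when the
/// decoded bytes are not valid UTF-8.
fn decode_partition_value(raw: &str) -> Cow<'_, str> {
    // Fast path: no '%' means no escapes, so keep the borrowed value.
    if !raw.contains('%') {
        return Cow::Borrowed(raw);
    }

    let bytes = raw.as_bytes();
    let mut decoded = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape is '%' followed by exactly two hex digits.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) =
                (hex_value(bytes[i + 1]), hex_value(bytes[i + 2]))
            {
                decoded.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        // Assumption: malformed or truncated escapes are copied through as-is.
        decoded.push(bytes[i]);
        i += 1;
    }

    match String::from_utf8(decoded) {
        Ok(value) => Cow::Owned(value),
        // Invalid UTF-8 after decoding: keep the original raw value.
        Err(_) => Cow::Borrowed(raw),
    }
}
```

With this sketch, `decode_partition_value("foo%2Fbar")` yields `foo/bar`, a value without `%` stays `Cow::Borrowed`, and `bad%FF` is returned unchanged because the byte `0xFF` is not valid UTF-8.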

Are these changes tested?

  • cargo fmt --all --check
  • cargo test -p datafusion-catalog-listing

Signed-off-by: Kevin-Li-2025 <2242139@qq.com>
@github-actions github-actions Bot added the catalog Related to the catalog crate label Jun 28, 2026
@Kevin-Li-2025 Kevin-Li-2025 marked this pull request as ready for review June 29, 2026 06:05
Labels

catalog Related to the catalog crate

Development

Successfully merging this pull request may close these issues.

Partition values are not URL-decoded when extracted from Hive-style paths
