[AURON #2257] Avoid URI reparsing in JNI Hadoop paths #2264

Open
zhtttylz wants to merge 2 commits into apache:master from zhtttylz:fix-hadoop-fs-path-hash

@zhtttylz zhtttylz commented May 12, 2026

Which issue does this PR close?

Closes #2257

Rationale for this change

Auron's JNI Hadoop file wrappers currently reconstruct Hadoop paths with new Path(new URI(path)).
This does not preserve Hadoop Path(String) semantics before the path is passed back to FileSystem.

When a raw Hadoop path string contains a literal #, Java URI parsing treats the suffix after # as a fragment, so the actual Hadoop path is truncated.

For example, the intended path:

hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt

is opened as:

/auron-it-hdfs-rbf-repro/raw
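
The truncation can be reproduced without Hadoop. The sketch below uses only java.net.URI to show how the single-argument constructor splits the example path at the #. The class and method names (UriFragmentDemo, parsedPath, parsedFragment) are made up for illustration and are not part of Auron.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Stand-alone demo: java.net.URI only, no Hadoop or Auron classes involved.
final class UriFragmentDemo {

    // Decoded path component produced by single-argument URI parsing.
    static String parsedPath(String raw) throws URISyntaxException {
        return new URI(raw).getPath();
    }

    // Fragment component, or null when the string contains no '#'.
    static String parsedFragment(String raw) throws URISyntaxException {
        return new URI(raw).getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        String raw = "hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt";
        System.out.println(parsedPath(raw));     // /auron-it-hdfs-rbf-repro/raw
        System.out.println(parsedFragment(raw)); // mini.txt
    }
}
```

The "mini.txt" suffix ends up in the fragment, so any code that later reads only the path component sees the shortened path.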

What changes are included in this PR?

This PR stops reparsing Hadoop path strings through java.net.URI in JniBridge.
The path reconstruction is changed from:

- new Path(new URI(path))
+ new Path(path)

This preserves Hadoop Path(String) semantics.
The PR also adds a regression test for JNI Hadoop file wrapper path handling when the path contains a literal #.
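
For intuition on why the string-based construction keeps the #, here is a rough stdlib-only approximation. It assumes that Path(String) splits the string into scheme, authority, and path and then builds its URI from those components. That is my understanding of Hadoop's behavior, not something verified here. The multi-argument java.net.URI constructor quotes a # that appears in the path component instead of treating it as a fragment delimiter. ComponentUriDemo and fromComponents are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Approximation only: mimics component-wise URI construction with java.net.URI.
// It does not call Hadoop's Path class.
final class ComponentUriDemo {

    // Builds a URI from already-separated components; query and fragment are left unset.
    static URI fromComponents(String scheme, String authority, String path)
            throws URISyntaxException {
        return new URI(scheme, authority, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI u = fromComponents("hdfs", "mycluster", "/auron-it-hdfs-rbf-repro/raw#mini.txt");
        System.out.println(u);               // hdfs://mycluster/auron-it-hdfs-rbf-repro/raw%23mini.txt
        System.out.println(u.getPath());     // /auron-it-hdfs-rbf-repro/raw#mini.txt
        System.out.println(u.getFragment()); // null
    }
}
```

Because the # is percent-encoded inside the path, the decoded path still contains the full file name.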

Are there any user-facing changes?

This fixes a bug where Hadoop paths containing a literal # could be truncated.

No new APIs, configs, or migration steps are required.

How was this patch tested?

Ran the focused Java regression test:

mvn -pl auron-core -am -Pspark-3.5 -Pscala-2.12 -Ppre \
  -DskipBuildNative \
  -Dtest=org.apache.auron.jni.JniBridgeTest \
  test

Result:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
BUILD SUCCESS

@zhtttylz zhtttylz marked this pull request as draft May 12, 2026 12:04
@zhtttylz zhtttylz marked this pull request as ready for review May 14, 2026 08:38
@cxzl25 cxzl25 requested a review from Copilot May 14, 2026 08:58

Copilot AI left a comment

Pull request overview

This PR fixes a bug in Auron’s JNI Hadoop file wrappers where paths containing a literal # could be truncated due to java.net.URI fragment parsing, and adds Java regression coverage to prevent recurrence.

Changes:

  • Adjusted JNI bridge path handling to avoid fragment truncation when # appears in the path string.
  • Added JniBridgeTest regression tests covering literal # handling and percent-encoding behavior.
  • Added a test-scoped Hadoop runtime dependency to support the new unit test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| auron-core/src/main/java/org/apache/auron/jni/JniBridge.java | Changes how input/output paths are converted to Hadoop Path objects to avoid # fragment truncation. |
| auron-core/src/test/java/org/apache/auron/jni/JniBridgeTest.java | Adds regression tests asserting # is preserved and that read/write path encoding behavior is stable. |
| auron-core/pom.xml | Adds hadoop-client-runtime as a test dependency to compile/run the new test. |

Comment on lines 74 to 76
  public static FSDataInputWrapper openFileAsDataInputWrapper(FileSystem fs, String path) throws Exception {
-     // the path is a URI string, so we need to convert it to a URI object
-     return FSDataInputWrapper.wrap(fs.open(new Path(new URI(path))));
+     return FSDataInputWrapper.wrap(fs.open(toInputPath(path)));
  }

Comment on lines 78 to 85
  public static FSDataOutputWrapper createFileAsDataOutputWrapper(FileSystem fs, String path) throws Exception {
-     return FSDataOutputWrapper.wrap(fs.create(new Path(new URI(path))));
+     return FSDataOutputWrapper.wrap(fs.create(new Path(path)));
  }
+
+ private static Path toInputPath(String path) throws URISyntaxException {
+     String safePath = path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;
+     return new Path(new URI(safePath));
+ }
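
As a rough illustration of the escaping step in toInputPath above, the following sketch applies the same # to %23 replacement and inspects the resulting java.net.URI. It stops before the Hadoop Path is built, so it only shows what the URI layer sees. EscapeHashDemo and toInputUri are hypothetical names, not code from this PR.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the escape-then-parse idea, limited to java.net.URI.
final class EscapeHashDemo {

    // Escapes every literal '#' so URI parsing keeps it inside the path component.
    // String.replace is a no-op when there is no '#', so no indexOf guard is needed here.
    static URI toInputUri(String path) throws URISyntaxException {
        return new URI(path.replace("#", "%23"));
    }

    public static void main(String[] args) throws URISyntaxException {
        URI u = toInputUri("hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt");
        System.out.println(u.getRawPath());  // /auron-it-hdfs-rbf-repro/raw%23mini.txt
        System.out.println(u.getPath());     // /auron-it-hdfs-rbf-repro/raw#mini.txt
        System.out.println(u.getFragment()); // null

        // Existing percent-escapes are still decoded by URI parsing on this route.
        System.out.println(toInputUri("hdfs://mycluster/dir/a%20b.txt").getPath()); // /dir/a b.txt
    }
}
```

The last line is worth keeping in mind when comparing the read and write paths: on this route the input is still interpreted as a percent-encoded URI string, whereas new Path(path) on the write side takes the string as given.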

Development

Successfully merging this pull request may close these issues.

JNI Hadoop file wrappers truncate paths containing literal #