@XiaoHongbo-Hope (Contributor) commented on Jan 12, 2026

Purpose

When the REST catalog is used with blob-as-descriptor, there are two FileIO instances:

  1. Table FileIO (RESTTokenFileIO): uses the REST data token.
  2. FileIO for external OSS blob files: uses the user's OSS credentials.

When refreshing the token, RESTTokenFileIO updates catalog_options in place in _initialize_oss_fs(). Because uri_reader_factory was built from the same catalog_options object, this update pollutes the blob reader's configuration. This PR therefore passes a copy of catalog_options to UriReaderFactory.

Java: token updates in RESTTokenFileIO.fileIO() create a new Options object instead of modifying the existing one.
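To make the shared-options problem concrete, here is a minimal Python sketch under stated assumptions. `TokenFileIO`, `ReaderFactory`, `refresh_token`, and the option key `fs.oss.accessKeyId` are hypothetical stand-ins invented for illustration, not the actual pypaimon classes or keys; the only behavior modeled is that the token-refreshing FileIO updates the options dict in place.

```python
from typing import Dict

Options = Dict[str, str]


class TokenFileIO:
    """Hypothetical stand-in for RESTTokenFileIO: mutates the dict it was given."""

    def __init__(self, catalog_options: Options):
        self.catalog_options = catalog_options

    def refresh_token(self, token: Options) -> None:
        # Problem pattern: writes the REST data token into the shared dict in place.
        self.catalog_options.update(token)


class ReaderFactory:
    """Hypothetical stand-in for UriReaderFactory: keeps the options it receives."""

    def __init__(self, catalog_options: Options):
        self.catalog_options = catalog_options


def build(copy_options: bool) -> str:
    # Hypothetical key and values, for illustration only.
    catalog_options = {"fs.oss.accessKeyId": "user-ak"}
    file_io = TokenFileIO(catalog_options)
    reader_options = catalog_options.copy() if copy_options else catalog_options
    factory = ReaderFactory(reader_options)
    file_io.refresh_token({"fs.oss.accessKeyId": "rest-token-ak"})
    # Return the credential the blob reader would now see.
    return factory.catalog_options["fs.oss.accessKeyId"]


print(build(copy_options=False))  # rest-token-ak: blob reader config polluted
print(build(copy_options=True))   # user-ak: blob reader keeps user credentials
```

Handing the reader factory `catalog_options.copy()` gives it its own dict, so later in-place updates by the table FileIO no longer reach it.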

Tests

API and Format

Documentation

@XiaoHongbo-Hope changed the title from "[python] fix rest token and blob-as-descriptor use same token issue and add rest catalog blob-as-descriptor sample" to "[python] fix rest file io and blob-as-descriptor file io use same token issue and add rest catalog blob-as-descriptor sample" on Jan 12, 2026
@XiaoHongbo-Hope changed the title from "[python] fix rest file io and blob-as-descriptor file io use same token issue and add rest catalog blob-as-descriptor sample" to "[python] fix rest file io and blob-as-descriptor file io token merge issue and add rest catalog blob-as-descriptor sample" on Jan 12, 2026
@XiaoHongbo-Hope marked this pull request as ready for review on January 12, 2026 at 08:37

      self.logger = logging.getLogger(__name__)
      scheme, netloc, _ = self.parse_location(path)
    - self.uri_reader_factory = UriReaderFactory(catalog_options)
    + self.uri_reader_factory = UriReaderFactory(catalog_options.copy())

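`dict.copy()` makes a shallow copy. Assuming catalog_options is a flat mapping of string keys to string values (an assumption; the diff does not show its type), that is sufficient, because adding or overwriting top-level keys in one dict does not affect the other. The sketch below, with made-up keys, shows where the limit lies: nested mutable values stay shared unless `copy.deepcopy` is used.

```python
import copy

# Made-up options for illustration; "extra" is a nested dict on purpose.
original = {"fs.oss.endpoint": "oss-a", "extra": {"region": "r1"}}

shallow = original.copy()
shallow["fs.oss.endpoint"] = "oss-b"  # top-level key: independent of original
shallow["extra"]["region"] = "r2"     # nested dict: same object as original["extra"]

print(original["fs.oss.endpoint"])    # oss-a
print(original["extra"]["region"])    # r2

deep = copy.deepcopy(original)
deep["extra"]["region"] = "r3"        # deep copy: nested dict is independent
print(original["extra"]["region"])    # r2
```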
Contributor:

Why do we need a copy? What logic would update it?

@XiaoHongbo-Hope (Contributor, Author):

When the REST catalog is used with blob-as-descriptor, there are two FileIO instances:

  1. Table FileIO (RESTTokenFileIO): uses the REST data token.
  2. FileIO for external OSS blob files: uses the user's OSS credentials.

When refreshing the token, RESTTokenFileIO updates catalog_options in place in _initialize_oss_fs(). Because uri_reader_factory was built from the same catalog_options object, this update pollutes the blob reader's configuration. This PR therefore passes a copy of catalog_options to UriReaderFactory.

Java: token updates in RESTTokenFileIO.fileIO() create a new Options object instead of modifying the existing one.

I've updated the PR description with this information too.

Contributor:
I think RESTTokenFileIO should not update catalog_options in place; it should copy them into a new options object instead. I think Java also does it this way?
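The reviewer's alternative, fixing the side that writes the options rather than the side that reads them, can be sketched as follows. This is a hypothetical Python illustration, not the merged code: `TokenFileIO`, `refresh_token`, and the option key are invented for the example, and the merge order (token values override base options) is an assumption.

```python
from typing import Dict, Mapping

Options = Dict[str, str]


class TokenFileIO:
    """Hypothetical RESTTokenFileIO variant that never mutates caller-owned options."""

    def __init__(self, catalog_options: Mapping[str, str]):
        # Keep a private snapshot; the caller's dict is never written to.
        self._base_options: Options = dict(catalog_options)
        self.options: Options = dict(self._base_options)

    def refresh_token(self, token: Mapping[str, str]) -> None:
        # Build a fresh dict on every refresh (analogous to Java creating a new
        # Options object) instead of updating one that other components may hold.
        self.options = {**self._base_options, **token}


catalog_options = {"fs.oss.accessKeyId": "user-ak"}
file_io = TokenFileIO(catalog_options)
file_io.refresh_token({"fs.oss.accessKeyId": "rest-token-ak"})

print(catalog_options["fs.oss.accessKeyId"])  # user-ak
print(file_io.options["fs.oss.accessKeyId"])  # rest-token-ak
```

With this approach every consumer of catalog_options is protected, not only UriReaderFactory, because the token-refreshing component owns the only dict it ever modifies.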

@JingsongLi (Contributor):

+1

@JingsongLi merged commit f24cf39 into apache:master on Jan 12, 2026
5 checks passed

2 participants