HDDS-15273. Add OIDC AssumeRoleWithWebIdentity support to Ozone STS #10266

Open
paf91 wants to merge 6 commits into apache:HDDS-13323-sts from paf91:HDDS-15273-oidc-webidentity-sts

paf91 commented May 14, 2026

What changes were proposed in this pull request?

This PR adds OIDC/WebIdentity support to Apache Ozone STS by implementing AssumeRoleWithWebIdentity on top of the existing STS runtime.

The feature is disabled by default through ozone.sts.web.identity.enabled=false, so there is no behavior change on upgrade unless explicitly enabled.

This change allows clients to exchange a Keycloak/OIDC JWT for temporary S3 credentials. The returned credentials can then be used with normal AWS SigV4 requests that carry the session token in x-amz-security-token.
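
As an illustration of the exchange described above, here is a minimal client-side sketch that builds the form body for an AssumeRoleWithWebIdentity call. The parameter names follow the AWS STS API; that Ozone STS accepts exactly this set, the Version value, and the example role ARN and session name are assumptions, and the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch only: builds a form-encoded AssumeRoleWithWebIdentity request body. */
final class WebIdentityRequestSketch {

  /** Form-encodes the STS parameters in a stable (insertion) order. */
  static String buildFormBody(String roleArn, String sessionName,
                              String webIdentityToken, int durationSeconds) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("Action", "AssumeRoleWithWebIdentity");
    params.put("Version", "2011-06-15"); // AWS STS API version (assumed accepted)
    params.put("RoleArn", roleArn);
    params.put("RoleSessionName", sessionName);
    params.put("WebIdentityToken", webIdentityToken); // the raw OIDC JWT
    params.put("DurationSeconds", Integer.toString(durationSeconds));
    return params.entrySet().stream()
        .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
        .collect(Collectors.joining("&"));
  }

  private static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Placeholder values; a real caller would pass the JWT issued by the IdP.
    System.out.println(buildFormBody(
        "arn:aws:iam::123456789012:role/example-role",
        "demo-session",
        "header.payload.signature",
        3600));
  }
}
```

The resulting body would be sent to the S3 Gateway /sts path; host, port, and HTTP method depend on the deployment and are not shown here.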

Architecture

The change extends the existing Ozone STS temporary credential path instead of introducing a parallel S3 authentication model:

  • Keycloak/OIDC authenticates the caller by issuing a signed JWT.
  • Ozone STS validates the JWT using issuer, audience, expiry, and JWKS signature checks.
  • OM, not S3G, is the authoritative JWT validator.
  • OM authorizes role assumption through the configured Ozone authorizer / Ranger path.
  • OM issues temporary S3 credentials using the existing STS token infrastructure.
  • Subsequent S3 requests continue to use normal SigV4 plus x-amz-security-token.
  • Subsequent S3 authorization continues through the existing session-policy / Ranger authorization path.

Keycloak groups and roles are treated only as identity attributes. Ranger or the configured Ozone authorizer remains the policy decision point.

This PR does not replace Kerberos daemon authentication, does not add OFS OIDC login, does not add CLI device-code login, does not add daemon-to-daemon OIDC authentication, and does not use Keycloak Authorization Services as the Ozone policy engine.

Backward compatibility

  • Existing AssumeRole flow is unchanged.
  • Existing permanent S3 secret flow is unchanged.
  • Existing S3 SigV4 behavior is unchanged.
  • Previously-issued AssumeRole tokens remain valid.
  • Tokens serialized without AuthType deserialize as ASSUME_ROLE, preserving compatibility with previously-issued tokens.
  • The feature is disabled by default.
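
The AuthType compatibility rule above can be sketched as follows. This is not the PR's code: the class name, the WEB_IDENTITY constant, and the string-based representation of the serialized field are assumptions made for illustration; only the ASSUME_ROLE default comes from the description.

```java
import java.util.Optional;

/** Sketch only: default a missing auth type to ASSUME_ROLE on deserialization. */
final class AuthTypeCompatSketch {

  /** ASSUME_ROLE is named in the PR; WEB_IDENTITY is a hypothetical name. */
  enum AuthType { ASSUME_ROLE, WEB_IDENTITY }

  /**
   * Resolves the auth type of a deserialized token. Tokens written before
   * this feature carry no auth type, so an absent value maps to ASSUME_ROLE.
   */
  static AuthType resolve(Optional<String> serializedAuthType) {
    return serializedAuthType.map(AuthType::valueOf).orElse(AuthType.ASSUME_ROLE);
  }

  public static void main(String[] args) {
    System.out.println(resolve(Optional.empty()));            // old token
    System.out.println(resolve(Optional.of("WEB_IDENTITY"))); // new token
  }
}
```

Defaulting on absence, rather than rejecting, is what keeps previously-issued AssumeRole tokens valid across the upgrade.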

Main implementation points

  • Adds an STS-focused OIDC/JWT validation module with JWKS caching and refresh.
  • Adds ozone.sts.web.identity.* configuration keys.
  • Adds AssumeRoleWithWebIdentity request/response models and S3G STS XML response handling.
  • Adds a tightly scoped unauthenticated bootstrap bypass only for /sts Action=AssumeRoleWithWebIdentity when explicitly enabled.
  • Validates and strips raw WebIdentityToken in OM preExecute() before the Ratis-applied request is created.
  • The Ratis-applied request contains only sanitized identity/session fields, token expiry metadata, and token fingerprint.
  • Extends STSTokenIdentifier with a backward-compatible auth type for WebIdentity-backed credentials.
  • Keeps existing AssumeRole token compatibility.
  • Adds an authorizer hook for WebIdentity role assumption:
    IAccessAuthorizer.generateAssumeRoleWithWebIdentitySessionPolicy(...).
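
To make the "validate and strip" step concrete, here is a rough sketch of turning a raw token into a sanitized record that keeps only identity, expiry, and a fingerprint. The class and field names are hypothetical, and SHA-256 as the fingerprint algorithm is an assumption; the PR description does not say which digest is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch only: replace a raw JWT with a fingerprint before replication. */
final class SanitizedRequestSketch {

  /** Fields that survive sanitization; the raw JWT is deliberately absent. */
  static final class Sanitized {
    final String subject;
    final long tokenExpiryEpochSeconds;
    final String tokenFingerprint;

    Sanitized(String subject, long tokenExpiryEpochSeconds, String tokenFingerprint) {
      this.subject = subject;
      this.tokenExpiryEpochSeconds = tokenExpiryEpochSeconds;
      this.tokenFingerprint = tokenFingerprint;
    }

    @Override
    public String toString() {
      return "Sanitized{subject=" + subject + ", exp=" + tokenExpiryEpochSeconds
          + ", fingerprint=" + tokenFingerprint + "}";
    }
  }

  /** Hex-encoded SHA-256 of the raw token (digest choice is an assumption). */
  static String fingerprint(String rawJwt) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(rawJwt.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is mandatory on every Java platform, so this is unexpected.
      throw new IllegalStateException(e);
    }
  }

  /** Called after validation has succeeded; the raw token is not retained. */
  static Sanitized sanitize(String rawJwt, String validatedSubject, long expiryEpochSeconds) {
    return new Sanitized(validatedSubject, expiryEpochSeconds, fingerprint(rawJwt));
  }

  public static void main(String[] args) {
    System.out.println(sanitize("header.payload.signature", "alice", 1700000000L));
  }
}
```

Because the fingerprint is a deterministic function of the token, replaying the sanitized request yields the same result on every replica without any replica seeing the JWT.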

Security notes

  • S3G does not become the source of truth for JWT identity.
  • Raw OIDC JWTs are not persisted in the Ratis-applied request, OM metadata, STS tokens, or logs.
  • WebIdentity temporary credentials require the existing STS session token validation path.
  • SecretAccessKey and temporary credential material follow the existing STS credential handling model.
  • Operators must still protect OM/Ratis logs and metadata files.
  • alg=none is rejected.
  • Issuer, audience, exp, nbf, and iat are validated.
  • JWKS fetch is bounded by connect timeout, read timeout, and size limit.
  • Unknown-kid refresh is debounced and fails closed.
  • Insecure HTTP issuer/JWKS usage is test-only and emits an OM startup warning when explicitly enabled.
  • Deployments without a WebIdentity-capable Ranger/Ozone authorizer override fail closed with NOT_SUPPORTED_OPERATION.
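
The claim checks listed above (alg, issuer, audience, exp, nbf, iat) might look roughly like the sketch below. It is not the PR's validator, which uses Nimbus JOSE + JWT and also verifies the signature against the JWKS; the class names, the rejection strings, and the way clock skew is applied are assumptions.

```java
import java.util.List;
import java.util.Optional;

/** Sketch only: header and claim checks, without signature verification. */
final class JwtClaimCheckSketch {

  /** Already-parsed header and claims; times are epoch seconds. */
  static final class Claims {
    final String alg;
    final String iss;
    final List<String> aud;
    final long exp;
    final long nbf;
    final long iat;

    Claims(String alg, String iss, List<String> aud, long exp, long nbf, long iat) {
      this.alg = alg;
      this.iss = iss;
      this.aud = aud;
      this.exp = exp;
      this.nbf = nbf;
      this.iat = iat;
    }
  }

  /** Returns a rejection reason, or empty when every check passes. */
  static Optional<String> reject(Claims c, String expectedIssuer,
                                 String expectedAudience, long now, long skew) {
    if (c.alg == null || "none".equalsIgnoreCase(c.alg)) {
      return Optional.of("unsigned token");
    }
    if (!expectedIssuer.equals(c.iss)) {
      return Optional.of("issuer mismatch");
    }
    if (c.aud == null || !c.aud.contains(expectedAudience)) {
      return Optional.of("audience mismatch");
    }
    if (now - skew >= c.exp) {
      return Optional.of("expired");
    }
    if (now + skew < c.nbf) {
      return Optional.of("not yet valid");
    }
    if (now + skew < c.iat) {
      return Optional.of("issued in the future");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Claims ok = new Claims("RS256", "https://idp.example.com/realms/ozone",
        List.of("ozone-sts"), 2000L, 900L, 900L);
    System.out.println(reject(ok, "https://idp.example.com/realms/ozone",
        "ozone-sts", 1000L, 60L).orElse("accepted"));
  }
}
```

Every branch fails closed: a token is accepted only when no check produces a reason.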

Configuration keys

  • ozone.sts.web.identity.enabled
  • ozone.sts.web.identity.issuer.uri
  • ozone.sts.web.identity.jwks.uri
  • ozone.sts.web.identity.audience
  • ozone.sts.web.identity.username.claim
  • ozone.sts.web.identity.subject.claim
  • ozone.sts.web.identity.groups.claim
  • ozone.sts.web.identity.roles.claim
  • ozone.sts.web.identity.clock.skew
  • ozone.sts.web.identity.jwks.refresh.interval
  • ozone.sts.web.identity.jwks.connect.timeout
  • ozone.sts.web.identity.jwks.read.timeout
  • ozone.sts.web.identity.jwks.size.limit
  • ozone.sts.web.identity.require.https
  • ozone.sts.web.identity.allow.insecure.http.for.tests
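
One way the HTTPS-related keys could interact at startup is sketched below. Only the key names come from this PR; the default value, the exception type, the example URLs, and the decision to model the test-only flag alone (leaving out ozone.sts.web.identity.require.https) are assumptions, and the class name is hypothetical.

```java
import java.net.URI;
import java.util.Map;

/** Sketch only: refuse plain-HTTP issuer/JWKS endpoints unless test mode is on. */
final class EndpointSchemeSketch {

  static final String PREFIX = "ozone.sts.web.identity.";

  /**
   * Returns true when a plain-HTTP endpoint was accepted, so the caller
   * should log a startup warning; throws when an endpoint is not allowed.
   */
  static boolean checkEndpoints(Map<String, String> conf) {
    // Assumed default: the test-only escape hatch is off unless set to "true".
    boolean allowInsecure = Boolean.parseBoolean(
        conf.getOrDefault(PREFIX + "allow.insecure.http.for.tests", "false"));
    boolean sawInsecure = false;
    for (String suffix : new String[] {"issuer.uri", "jwks.uri"}) {
      String key = PREFIX + suffix;
      String value = conf.get(key);
      if (value == null) {
        throw new IllegalStateException(key + " must be set");
      }
      String scheme = URI.create(value).getScheme();
      if ("https".equalsIgnoreCase(scheme)) {
        continue;
      }
      if (!"http".equalsIgnoreCase(scheme) || !allowInsecure) {
        throw new IllegalStateException(key + " must use https: " + value);
      }
      sawInsecure = true;
    }
    return sawInsecure;
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(
        PREFIX + "issuer.uri", "https://idp.example.com/realms/ozone",
        PREFIX + "jwks.uri", "https://idp.example.com/realms/ozone/certs");
    System.out.println("warn about insecure HTTP: " + checkEndpoints(conf));
  }
}
```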

Dependency

  • Uses Nimbus JOSE + JWT for OIDC/JWT/JWKS validation.

Design / user docs added

  • hadoop-hdds/docs/content/design/oidc-assume-role-with-web-identity.md
  • hadoop-hdds/docs/content/security/OzoneSTSWebIdentityKeycloakRanger.md

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15273

How was this patch tested?

The patch was tested with focused unit, S3 Gateway, OM, mini-cluster, and Keycloak Testcontainers coverage.

Unit and component tests

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/common \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestOidcJwtIdentityProvider,TestAssumeRoleWithWebIdentityRequest,TestAssumeRoleResponseInfo' \
  test

Result:

Tests run: 36, Failures: 0, Errors: 0, Skipped: 0

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/s3gateway \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestS3STSEndpoint,TestS3STSWebIdentityAuthBypassFilter,TestAuthorizationFilter' \
  test

Result:

Tests run: 44, Failures: 0, Errors: 0, Skipped: 0

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/ozone-manager \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestSTSTokenSecretManager,TestSTSSecurityUtil,TestS3AssumeRoleWithWebIdentityRequest,TestS3AssumeRoleRequest,TestS3AssumeRoleResponse,TestSTSTokenIdentifier' \
  test

Result:

Tests run: 77, Failures: 0, Errors: 0, Skipped: 0

Mini-cluster E2E test

The mini-cluster E2E test verifies:

  • generated JWT + local JWKS;
  • AssumeRoleWithWebIdentity;
  • temporary AccessKeyId, SecretAccessKey, and SessionToken;
  • real AWS SDK v2 S3 SigV4 request with session token;
  • allowed bucket succeeds;
  • denied bucket fails;
  • wrong/missing session credentials fail.

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityEndToEnd \
  test

Result:

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

Keycloak Testcontainers integration test

The Keycloak IT starts a real Keycloak container, imports a test realm, obtains a real Keycloak JWT, exchanges it through Ozone STS, and uses the returned temporary credentials for S3 operations.

env DOCKER_HOST=unix:///var/run/docker.sock \
  TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock \
  mvn -Dapi.version=1.44 \
  -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityKeycloakIT \
  test

Result:

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

The -Dapi.version=1.44 parameter is used for Docker API compatibility in the local Docker environment.

Static check

git diff --check origin/HDDS-13323-sts..HEAD

Result:

No output / clean

paf91 added 4 commits May 14, 2026 09:14
Includes:
- STS-focused OIDC/JWKS validator
- OIDC config keys
- AssumeRoleWithWebIdentity authorizer request shape
- fail-closed default authorizer hook
- S3G STS bootstrap auth bypass only for Action=AssumeRoleWithWebIdentity
- design doc
- unit tests

Includes:
- OM protocol/client path for AssumeRoleWithWebIdentity
- OM-side JWT validation in preExecute()
- sanitized Ratis request without raw WebIdentityToken
- WebIdentity-backed STS token identity model
- backward-compatible STSTokenIdentifier authType
- reuse of existing STS validation path for subsequent S3 requests
- S3G STS XML response/routing for WebIdentity
- tests proving no raw JWT persistence and replay determinism

peterxcli self-requested a review May 14, 2026 07:17
@adoroszlai
Contributor

Thanks @paf91 for the patch. As quick initial feedback: there is a compile error that prevents further test runs.

@paf91
Author

paf91 commented May 14, 2026

> Thanks @paf91 for the patch. As quick initial feedback: there is a compile error that prevents further test runs.

My bad. The earlier validation focused on selected unit/E2E tests and did not run the exact CI compile lane for the PR from a clean state.

@adoroszlai
Contributor

Thanks for the update. Please make sure to check results, there are checkstyle/pmd errors.

https://github.com/paf91/ozone/actions/runs/25856843411

@paf91
Author

paf91 commented May 14, 2026

> Thanks for the update. Please make sure to check results, there are checkstyle/pmd errors.
>
> https://github.com/paf91/ozone/actions/runs/25856843411

Thanks, I reran the relevant checks locally and fixed the errors. CI now looks green.

@peterxcli
Member

Do you plan to add smoke tests that use the new Docker Compose cluster, including Keycloak or another IdP container?

@paf91
Author

paf91 commented May 14, 2026

> Do you plan to add smoke tests that use the new Docker Compose cluster, including Keycloak or another IdP container?

Not in this PR. I kept this focused on STS runtime + Java integration coverage.

There is already a Keycloak Testcontainers IT with a real Keycloak container and real JWT exchange.

I agree a Docker Compose smoke test would be useful for packaged config/deployment coverage. If you think it is a blocker, I can add a small one here; otherwise I’d prefer a follow-up JIRA/PR.

@peterxcli
Member

Or maybe you can split this into smaller PRs and keep this one for the design-review discussion and as a reference for the basic implementation/MVP.

btw, personally I really like this proposal because it makes Ozone more usable in modern cloud environments. Actually, I was trying to design this myself this morning, haha.

@paf91
Author

paf91 commented May 14, 2026

> btw, personally I really like this proposal because it makes Ozone more usable in modern cloud environments. Actually, I was trying to design this myself this morning, haha.

Thanks, that is exactly the motivation: make Ozone STS usable in OIDC/cloud-native environments while keeping Ranger/Ozone authorizer as the PDP.

I can split it if you think that would make review easier. The split would look like this:

  1. OIDC/JWKS + config + design doc
  2. AssumeRoleWithWebIdentity runtime
  3. E2E + Keycloak IT + docs + compose smoke test

My preference is to keep this PR together for now (I am lazy haha), since the pieces are connected already and the current PR already shows the full MVP flow end-to-end.

@paf91
Author

paf91 commented May 16, 2026

I can resolve conflicts here if needed @peterxcli

@peterxcli
Member

Up to you, or we can just leave it as a discussion.
