HDDS-15273. Add OIDC AssumeRoleWithWebIdentity support to Ozone STS#10266
paf91 wants to merge 6 commits into
Includes:
- STS-focused OIDC/JWKS validator
- OIDC config keys
- AssumeRoleWithWebIdentity authorizer request shape
- fail-closed default authorizer hook
- S3G STS bootstrap auth bypass only for Action=AssumeRoleWithWebIdentity
- design doc
- unit tests
Includes:
- OM protocol/client path for AssumeRoleWithWebIdentity
- OM-side JWT validation in preExecute()
- sanitized Ratis request without raw WebIdentityToken
- WebIdentity-backed STS token identity model
- backward-compatible STSTokenIdentifier authType
- reuse of existing STS validation path for subsequent S3 requests
- S3G STS XML response/routing for WebIdentity
- tests proving no raw JWT persistence and replay determinism
Thanks @paf91 for the patch. As quick initial feedback: there is a compile error that prevents further test runs.
My bad, the earlier validation was focused on selected unit/E2E tests and did not run the exact PR compile lane from CI from a clean state.
Thanks for the update. Please make sure to check results, there are checkstyle/pmd errors.
Thanks, reran the relevant checks locally and fixed it. Now CI looks green.
Do you plan to add smoke tests that use the new Docker Compose cluster, including Keycloak or another IdP container?
Not in this PR. I kept this focused on STS runtime + Java integration coverage. There is already a Keycloak Testcontainers IT with a real Keycloak container and real JWT exchange. I agree a Docker Compose smoke test would be useful for packaged config/deployment coverage. If you think it is a blocker, I can add a small one here; otherwise I’d prefer a follow-up JIRA/PR.
Or maybe you can split this into smaller PRs and keep this one for the design-review discussion and as a reference for the basic implementation/MVP. By the way, I personally really like this proposal because it makes Ozone more usable in modern cloud environments. I was actually trying to design this myself this morning, haha.
Thanks, that is exactly the motivation: make Ozone STS usable in OIDC/cloud-native environments while keeping Ranger/Ozone authorizer as the PDP. I can split it if you think that would make review easier. The split would be like:
My preference is to keep this PR together for now (I am lazy, haha), since the pieces are already connected and the current PR already shows the full MVP flow end to end.
I can resolve conflicts here if needed @peterxcli
Up to you, or we can just leave it as a discussion.
What changes were proposed in this pull request?
This PR adds OIDC/WebIdentity support to Apache Ozone STS by implementing AssumeRoleWithWebIdentity on top of the existing STS runtime.
The feature is disabled by default through ozone.sts.web.identity.enabled=false, so there is no behavior change on upgrade unless explicitly enabled.
This change allows clients to exchange a Keycloak/OIDC JWT for temporary S3 credentials. The returned credentials can then be used with normal AWS SigV4 requests and x-amz-security-token.
Architecture
The change extends the existing Ozone STS temporary credential path instead of introducing a parallel S3 authentication model:
x-amz-security-token.
Keycloak groups and roles are treated only as identity attributes. Ranger or the configured Ozone authorizer remains the policy decision point.
This PR does not replace Kerberos daemon authentication, does not add OFS OIDC login, does not add CLI device-code login, does not add daemon-to-daemon OIDC authentication, and does not use Keycloak Authorization Services as the Ozone policy engine.
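As a rough illustration of the exchange described in this section, the sketch below builds the form body a client might send to the STS endpoint. It assumes the endpoint accepts the same query parameters as the AWS STS AssumeRoleWithWebIdentity API (Action, Version, RoleArn, RoleSessionName, WebIdentityToken, DurationSeconds); the class name, role ARN, session name, and token value are hypothetical placeholders, and how the body is actually sent (host, port, POST versus GET) is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: builds the request body only.
final class WebIdentityRequestSketch {

  /** Builds an application/x-www-form-urlencoded body for the STS call. */
  static String buildFormBody(String roleArn, String sessionName,
                              String jwt, int durationSeconds) {
    // LinkedHashMap keeps the parameters in insertion order.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("Action", "AssumeRoleWithWebIdentity");
    params.put("Version", "2011-06-15"); // AWS STS API version, assumed here
    params.put("RoleArn", roleArn);
    params.put("RoleSessionName", sessionName);
    params.put("WebIdentityToken", jwt);
    params.put("DurationSeconds", Integer.toString(durationSeconds));
    return params.entrySet().stream()
        .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
        .collect(Collectors.joining("&"));
  }

  private static String enc(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Placeholder values; a real JWT would come from the IdP.
    System.out.println(buildFormBody(
        "arn:aws:iam::123456789012:role/example-role",
        "demo-session",
        "header.payload.signature",
        3600));
  }
}
```

The AccessKeyId, SecretAccessKey, and SessionToken returned by such a call would then be used for SigV4 signing, with the session token sent in x-amz-security-token.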
Backward compatibility
- AssumeRole flow is unchanged.
- AssumeRole tokens remain valid.
- AuthType deserialize as ASSUME_ROLE, preserving compatibility with previously-issued tokens.
Main implementation points
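The backward-compatibility rule for the token auth type (a missing AuthType deserializes as ASSUME_ROLE) can be illustrated with a minimal sketch. This is not the PR's STSTokenIdentifier code: the class name, the WEB_IDENTITY constant, and the use of Optional to model an absent field are assumptions made for illustration.

```java
import java.util.Optional;

// Hypothetical sketch of defaulting an absent auth type on deserialization.
final class AuthTypeCompatSketch {

  // ASSUME_ROLE comes from the PR description; WEB_IDENTITY is an assumed name.
  enum AuthType { ASSUME_ROLE, WEB_IDENTITY }

  /**
   * Tokens issued before the change carry no auth type, so an absent
   * value maps to ASSUME_ROLE instead of failing.
   */
  static AuthType fromSerialized(Optional<String> raw) {
    return raw.map(AuthType::valueOf).orElse(AuthType.ASSUME_ROLE);
  }

  public static void main(String[] args) {
    System.out.println(fromSerialized(Optional.empty()));
    System.out.println(fromSerialized(Optional.of("WEB_IDENTITY")));
  }
}
```

Defaulting on read, rather than rewriting stored tokens, means previously issued credentials need no migration.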
- ozone.sts.web.identity.* configuration keys.
- AssumeRoleWithWebIdentity request/response models and S3G STS XML response handling.
- /sts Action=AssumeRoleWithWebIdentity when explicitly enabled.
- WebIdentityToken in OM preExecute() before the Ratis-applied request is created.
- STSTokenIdentifier with a backward-compatible auth type for WebIdentity-backed credentials.
- AssumeRole token compatibility.
- IAccessAuthorizer.generateAssumeRoleWithWebIdentitySessionPolicy(...).
Security notes
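Two of the token checks named in this section (alg=none is rejected; exp, nbf, and iat are validated) can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's validator: the class and method names are hypothetical, exp is assumed to be mandatory, the skew handling is one plausible reading of the clock.skew key, and signature verification against the JWKS is omitted entirely.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; a real validator must also verify the signature.
final class JwtChecksSketch {

  /** Rejects unsigned tokens: a missing alg or alg=none is never accepted. */
  static void rejectUnsecuredAlg(String alg) {
    if (alg == null || alg.equalsIgnoreCase("none")) {
      throw new SecurityException("unsigned JWT rejected");
    }
  }

  /** Checks exp, nbf, and iat against 'now', tolerating 'skew' of clock drift. */
  static boolean timeClaimsValid(Instant now, Duration skew,
                                 Instant exp, Instant nbf, Instant iat) {
    if (exp == null || !now.minus(skew).isBefore(exp)) {
      return false; // missing exp (assumed mandatory) or already expired
    }
    if (nbf != null && now.plus(skew).isBefore(nbf)) {
      return false; // not valid yet
    }
    if (iat != null && now.plus(skew).isBefore(iat)) {
      return false; // issued in the future
    }
    return true;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    Duration skew = Duration.ofSeconds(30);
    rejectUnsecuredAlg("RS256");
    System.out.println(timeClaimsValid(now, skew, now.plusSeconds(300), now, now));
  }
}
```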
- SecretAccessKey and temporary credential material follow the existing STS credential handling model.
- alg=none is rejected.
- exp, nbf, and iat are validated.
- NOT_SUPPORTED_OPERATION.
Configuration keys
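To make the keys in this section more concrete, here is a hedged sketch of a minimal set of values and of how the two HTTPS-related keys might interact. The key names come from this PR; everything else is an assumption: the sample values, the defaults (require.https=true, allow.insecure.http.for.tests=false), the rule that the test-only flag overrides the HTTPS requirement, and the use of a plain Map in place of Ozone's configuration classes. The JWKS path follows Keycloak's usual realm layout.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical sketch; the PR's actual validation logic may differ.
final class WebIdentityConfigSketch {
  static final String PREFIX = "ozone.sts.web.identity.";

  /** Example values only; hosts, realm, and audience are placeholders. */
  static final Map<String, String> SAMPLE = Map.of(
      PREFIX + "enabled", "true",
      PREFIX + "issuer.uri", "https://keycloak.example.com/realms/ozone",
      PREFIX + "jwks.uri",
      "https://keycloak.example.com/realms/ozone/protocol/openid-connect/certs",
      PREFIX + "audience", "ozone-s3");

  /** Assumed rule: plain-HTTP URIs are refused unless the test-only flag is set. */
  static void checkUri(Map<String, String> conf, String key) {
    URI uri = URI.create(conf.get(key));
    boolean requireHttps = Boolean.parseBoolean(
        conf.getOrDefault(PREFIX + "require.https", "true"));
    boolean allowHttpForTests = Boolean.parseBoolean(
        conf.getOrDefault(PREFIX + "allow.insecure.http.for.tests", "false"));
    boolean isHttps = "https".equalsIgnoreCase(uri.getScheme());
    if (!isHttps && requireHttps && !allowHttpForTests) {
      throw new IllegalArgumentException(key + " must use https");
    }
  }

  public static void main(String[] args) {
    checkUri(SAMPLE, PREFIX + "issuer.uri");
    checkUri(SAMPLE, PREFIX + "jwks.uri");
    System.out.println("sample configuration accepted");
  }
}
```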
- ozone.sts.web.identity.enabled
- ozone.sts.web.identity.issuer.uri
- ozone.sts.web.identity.jwks.uri
- ozone.sts.web.identity.audience
- ozone.sts.web.identity.username.claim
- ozone.sts.web.identity.subject.claim
- ozone.sts.web.identity.groups.claim
- ozone.sts.web.identity.roles.claim
- ozone.sts.web.identity.clock.skew
- ozone.sts.web.identity.jwks.refresh.interval
- ozone.sts.web.identity.jwks.connect.timeout
- ozone.sts.web.identity.jwks.read.timeout
- ozone.sts.web.identity.jwks.size.limit
- ozone.sts.web.identity.require.https
- ozone.sts.web.identity.allow.insecure.http.for.tests
Dependency
Design / user docs added
- hadoop-hdds/docs/content/design/oidc-assume-role-with-web-identity.md
- hadoop-hdds/docs/content/security/OzoneSTSWebIdentityKeycloakRanger.md
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15273
How was this patch tested?
The patch was tested with focused unit, S3 Gateway, OM, mini-cluster, and Keycloak Testcontainers coverage.
Unit and component tests
Result:
Result:
Result:
Mini-cluster E2E test
The mini-cluster E2E test verifies:
- AssumeRoleWithWebIdentity;
- AccessKeyId, SecretAccessKey, and SessionToken;
mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityEndToEnd \
  test
Result:
Keycloak Testcontainers integration test
The Keycloak IT starts a real Keycloak container, imports a test realm, obtains a real Keycloak JWT, exchanges it through Ozone STS, and uses the returned temporary credentials for S3 operations.
env DOCKER_HOST=unix:///var/run/docker.sock \
  TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock \
  mvn -Dapi.version=1.44 \
  -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityKeycloakIT \
  test
Result:
The -Dapi.version=1.44 parameter is used for Docker API compatibility in the local Docker environment.
Static check
Result: