T keithyao/batch purge Adds ABAC support for purge command with batched token requests#577
T keithyao/batch purge Adds ABAC support for purge command with batched token requests#577keithy1012 wants to merge 15 commits intoAzure:mainfrom
Conversation
|
@keithy1012 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
cmd/acr/purge.go
Outdated
|
|
||
| // Request a token that covers all repositories in this batch | ||
| // The token scope will include: repository:repo1:pull repository:repo1:delete ... for each repo | ||
| if err := acrClient.RefreshTokenForAbac(ctx, batch); err != nil { |
There was a problem hiding this comment.
The token is only refreshed once at the start of the requests. This seems like it could be problematic as I presume these tokens time out. You should have some strategy to rotate this as time goes on.
There was a problem hiding this comment.
Now the token is refreshed per batch. In practice we want to generally avoid doing the refreshes upfront and just refresh as needed on the token acquisition portion. So for example if you look at the getManifest calls you will see we call refreshAcrCLIClientToken first on those. What you could do is modify that function to refresh the token you have if its necessary dynamically at that stage.
lizMSFT
left a comment
There was a problem hiding this comment.
You could follow https://github.com/Azure/acr-cli/tree/main?tab=readme-ov-file#installation to build the binary or check https://github.com/Azure/acr-cli/blob/main/scripts/experimental/README-purge-testing.md#1-test-purge-allsh-consolidated-test-suite and test the changes locally first.
estebanreyl
left a comment
There was a problem hiding this comment.
Looking good overall. I would recommend adding some test to the cli experimental tests as well. Just some minor things left to address
cmd/acr/purge.go
Outdated
|
|
||
| // Request a token that covers all repositories in this batch | ||
| // The token scope will include: repository:repo1:pull repository:repo1:delete ... for each repo | ||
| if err := acrClient.RefreshTokenForAbac(ctx, batch); err != nil { |
There was a problem hiding this comment.
Now the token is refreshed per batch. In practice we want to generally avoid doing the refreshes upfront and just refresh as needed on the token acquisition portion. So for example if you look at the getManifest calls you will see we call refreshAcrCLIClientToken first on those. What you could do is modify that function to refresh the token you have if its necessary dynamically at that stage.
cmd/acr/purge.go
Outdated
| // Process all repositories in this batch | ||
| for j, repoName := range batch { | ||
| // For ABAC registries, check if token expired and refresh for remaining repos in batch | ||
| if acrClient.IsAbac() && acrClient.IsTokenExpired() { |
There was a problem hiding this comment.
The token can fail at any moment, for example let's say there are a lot of images inside of a particular repository that need to be scanned. We could have the token expire mid iteration and thus that repo listing and exploration would fail. My suggestion would be to give the acquirer the ability to refresh itself, so for example I see that you are not storing the token anwyhere in this part of the code, instead its maintained inside of acrClient,
There was a problem hiding this comment.
I'm thinking about updating acrsdk.go to handle single repo token refresh:
// In acrsdk.go - modify refreshAcrCLIClientToken
func refreshAcrCLIClientToken(ctx context.Context, c *AcrCLIClient, repoName string) error {
var scope string
if c.isAbac && repoName != "" {
scope = fmt.Sprintf("repository:%s:pull,delete,metadata_read,metadata_write", repoName)
} else {
scope = "repository:*:*"
}
// ... refresh logic
}
This allows SDK methods like GetAcrTags() to auto-refresh with single repo scope when the token expires mid-operation. But we should still keep the batch-level token refresh logic in purge.go for efficiency. The batch refresh at the start of each batch reduces the number of token requests when processing multiple repositories. And to coordinate between these two, we could add a flag (e.g., NeedsBatchRefresh()) that gets set to true when a single repo refresh occurs. Then in purge.go, after processing each repository, we can check this flag and proactively refresh for the remaining repositories in the batch if needed. What do you think?
There was a problem hiding this comment.
Would this approach require 2 token refreshes (one for the current repoName, one for the rest of the batch)? Also, I think this would increase complexity for state management since we require another flag like NeedsBatchRefresh. I think the current implementation is less complex and accomplishes the same task. Does that make sense?
|
|
||
| // GetAcrTags list the tags of a repository with their attributes. | ||
| func (c *AcrCLIClient) GetAcrTags(ctx context.Context, repoName string, orderBy string, last string) (*acrapi.RepositoryTagsType, error) { | ||
| if c.isExpired() { |
There was a problem hiding this comment.
This block also needs to be updated. Otherwise, when GetAcrTags detects that the token has expired, it will obtain an RBAC access token instead. This also applies to DeleteAcrTag
There was a problem hiding this comment.
Additional potential ABAC expiry edge case:
refreshAcrCLIClientToken() seems to request wildcard scope (repository:*:*), and this expiry path seems to be called by SDK methods like GetAcrTags/DeleteAcrTag/GetAcrManifests.
Even with ABAC batch refresh in purge(), if expiry is hit during these SDK operations, it seems like this path could still fall back to wildcard refresh. On ABAC registries, wildcard refresh fallback attempts will likely fail.
Perhaps, you could make the SDK expiry refresh ABAC-aware (repo-scoped) as well? A targeted test in internal/api/acrsdk_test.go for this expiry path will likely help prevent regressions.
|
|
||
| // GetAcrTags list the tags of a repository with their attributes. | ||
| func (c *AcrCLIClient) GetAcrTags(ctx context.Context, repoName string, orderBy string, last string) (*acrapi.RepositoryTagsType, error) { | ||
| if c.isExpired() { |
There was a problem hiding this comment.
Additional potential ABAC expiry edge case:
refreshAcrCLIClientToken() seems to request wildcard scope (repository:*:*), and this expiry path seems to be called by SDK methods like GetAcrTags/DeleteAcrTag/GetAcrManifests.
Even with ABAC batch refresh in purge(), if expiry is hit during these SDK operations, it seems like this path could still fall back to wildcard refresh. On ABAC registries, wildcard refresh fallback attempts will likely fail.
Perhaps, you could make the SDK expiry refresh ABAC-aware (repo-scoped) as well? A targeted test in internal/api/acrsdk_test.go for this expiry path will likely help prevent regressions.
Purpose of the PR
Fixes #