Adding CCC based autoscaler files #109

Open
pulasthi wants to merge 3 commits into AI-Hypercomputer:tpu7x-auto from pulasthi:tpu7x-auto

@pulasthi pulasthi commented Feb 9, 2026

Adding scripts for automated microbenchmark runs with CCC

Signed-off-by: pulasthi <pulasthi@google.com>
@hylin2002 hylin2002 self-requested a review February 10, 2026 02:02
@linamy85 linamy85 self-requested a review February 10, 2026 08:05
@linamy85
Collaborator

Thanks @pulasthi! One question: when running automation_launch.sh, I see a lot of warnings such as:

Warning: Key 'cloud.google.com/gke-tpu-topology' is not recommended with node selector; Consider using Custom Compute Classes mechanisms, simultaneous use of both may lead to unexpected behavior, use with caution.
Warning: Key 'cloud.google.com/gke-tpu-accelerator' is not recommended with node selector; Consider using Custom Compute Classes mechanisms, simultaneous use of both may lead to unexpected behavior, use with caution.

Should we remove them from the YAML?

@junjieqian
Collaborator

We also need to add a README and update https://github.com/AI-Hypercomputer/accelerator-microbenchmarks/blob/tpu7x-auto/Ironwood/guides/automation/README.md to include the CCC solution.
Thanks

@pulasthi
Author

> Thanks @pulasthi! One question: when running automation_launch.sh, I see a lot of warnings such as:
>
> Warning: Key 'cloud.google.com/gke-tpu-topology' is not recommended with node selector; Consider using Custom Compute Classes mechanisms, simultaneous use of both may lead to unexpected behavior, use with caution.
> Warning: Key 'cloud.google.com/gke-tpu-accelerator' is not recommended with node selector; Consider using Custom Compute Classes mechanisms, simultaneous use of both may lead to unexpected behavior, use with caution.
>
> Should we remove them from the YAML?

This is a limitation in CCC when it is used in conjunction with Kueue. When CCC is used without Kueue, the only node selector needed is the one pointing to the CCC class:

    cloud.google.com/compute-class: tpuv7-2x2x1-class

However, with just this selector, Kueue does not create the workload correctly, and hence the normal node selectors of

    cloud.google.com/compute-class: tpuv7-2x2x1-class
    cloud.google.com/gke-tpu-accelerator: tpu7x

need to be added to get past Kueue. However, CCC logs warnings for these, since this information is already present in the CCC template. I will file a bug about this and follow up with the CCC team, but for now we need to keep them.
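
For reference, a minimal sketch of how the combined selectors might look in a pod template when Kueue is in use. It assumes the topology key named in the warnings is set alongside the compute class and accelerator keys; the `2x2x1` value is an assumption inferred from the class name and is not taken from the YAML files in this PR.

```yaml
# Sketch only: nodeSelector fragment for a Kueue-managed workload on a CCC-backed node pool.
nodeSelector:
  # The only selector CCC itself needs.
  cloud.google.com/compute-class: tpuv7-2x2x1-class
  # Kept so that Kueue creates the workload; CCC logs a warning for this key.
  cloud.google.com/gke-tpu-accelerator: tpu7x
  # Assumed value (inferred from the class name); CCC logs a warning for this key too.
  cloud.google.com/gke-tpu-topology: 2x2x1
```

Without Kueue, only the `cloud.google.com/compute-class` entry would be needed and the warnings should not appear.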

@pulasthi
Author

> We also need to add a README and update https://github.com/AI-Hypercomputer/accelerator-microbenchmarks/blob/tpu7x-auto/Ironwood/guides/automation/README.md to include the CCC solution. Thanks

Hi @junjieqian, I added the updated README file. Most of the content is the same as the current README file.
