[DO NOT MERGE] increase number of blocks on spyre cards #556
base: main
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Or this can be done with Now you are good to go 🚀
Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1 I posted this on our internal issue as well, but we need some way to check the version of the Spyre runtime stack so that we can set these values appropriately. We wouldn't want a newer version of vllm-spyre to set these expanded limits when it's installed alongside an older version of the Spyre runtime that doesn't support them. (This would go away if we actually had APIs to call to get this data, which we originally thought we would, but here we are 🤷)
Yeah, this makes sense! I did not mean to merge this as is, but rather to have a branch to test this.
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Note: the tkv x batch size constraint will not be increased. The number of blocks will be increased (for prefix caching).
tests/models/test_granite.py
Outdated
      )

    - assert granite_3_8b_config.cache_config.num_gpu_blocks_override == 2080
    + assert granite_3_8b_config.cache_config.num_gpu_blocks_override == 8192
As Travis mentioned, you could check the torch_sendnn version to see whether it already supports 8K. The version can be found at torch_sendnn._version.__version__. But I think the code should use 8192 as the default and downgrade to 2080 only if an old version of torch_sendnn is found, rather than the other way around, because if someone is hacking on a local environment with editable installs directly from local git repos, the version information might be wrong.
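A rough sketch of what that default-then-downgrade check could look like. This is only an illustration: the helper names and the `MIN_VERSION_FOR_8K` threshold are hypothetical (the thread does not say which torch_sendnn release first supports 8K blocks), and the only attribute taken from the discussion is `torch_sendnn._version.__version__`.

```python
import importlib
import re
from typing import Optional, Tuple

DEFAULT_NUM_BLOCKS = 8192  # expanded limit proposed in this PR
LEGACY_NUM_BLOCKS = 2080   # previous limit

# Hypothetical placeholder: the first torch_sendnn release with 8K support
# is not stated in this thread, so this value is an assumption.
MIN_VERSION_FOR_8K: Tuple[int, ...] = (1, 0, 0)


def parse_version(version: Optional[str]) -> Optional[Tuple[int, ...]]:
    """Return the leading numeric release components, or None if absent."""
    if not version:
        return None
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def installed_torch_sendnn_version() -> Optional[str]:
    """Read torch_sendnn._version.__version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module("torch_sendnn._version")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def select_num_blocks(
    version: Optional[str],
    min_version: Tuple[int, ...] = MIN_VERSION_FOR_8K,
) -> int:
    """Default to the expanded limit; downgrade only for a clearly older version."""
    parsed = parse_version(version)
    if parsed is None:
        # Missing or unparseable version: keep the default.
        return DEFAULT_NUM_BLOCKS
    # Pad both tuples to the same length so "1.0" compares equal to "1.0.0".
    width = max(len(parsed), len(min_version))
    padded = parsed + (0,) * (width - len(parsed))
    required = min_version + (0,) * (width - len(min_version))
    return LEGACY_NUM_BLOCKS if padded < required else DEFAULT_NUM_BLOCKS
```

Usage would be something like `select_num_blocks(installed_torch_sendnn_version())`. One caveat with this sketch: an editable install that reports a low numeric placeholder version (for example something starting with `0`) would still be treated as old, so a real implementation might need to special-case such strings.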
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
changes:
Note: do not merge yet until...