[Serve][3/n] Add router queue latency #59233
Conversation
Signed-off-by: abrar <[email protected]>
Code Review
This pull request introduces a new metric, serve_queue_wait_time_ms, to measure the time requests spend in the router queue. The implementation adds logic to record this latency in RequestRouter and FIFOMixin, and includes a new test case to verify the metric's correctness.
My review focuses on improving maintainability by removing duplication and making the metric tagging consistent. I also found a bug in the new test case: its assertion compares time values in mismatched units.
Key feedback points:
- Refactor duplicated metric recording logic into a helper method.
- Ensure consistent metric tagging for the new histogram.
- Correct the assertion in the new test to use the correct time unit (milliseconds).
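To illustrate the first feedback point, here is a minimal sketch of pulling the duplicated recording logic into a single helper that also converts to milliseconds in one place. The names (`QueueWaitRecorder`, `record_queue_wait`) are hypothetical for illustration, not the actual `RequestRouter` API:

```python
import time


class QueueWaitRecorder:
    """Hypothetical helper: record router queue wait time once, in
    milliseconds, so callers cannot disagree on the time unit."""

    def __init__(self):
        self.recorded_ms = []

    def record_queue_wait(self, enqueue_time_s: float) -> float:
        # Convert the elapsed wait from seconds to milliseconds before
        # recording; tests can then assert directly in ms.
        wait_ms = (time.monotonic() - enqueue_time_s) * 1000.0
        self.recorded_ms.append(wait_ms)
        return wait_ms
```

Both `RequestRouter` and `FIFOMixin` call sites would then delegate to this one method instead of each doing their own timing arithmetic.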
the comment from cursor seems legit, else LGTM
fixes ray-project#59218

```python
import asyncio

from ray import serve


@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    async def __call__(self) -> int:
        await asyncio.sleep(0.01)
        return 1


app = MyDeployment.bind()
```

Testing with 100 users:

| Metric | With Change | Master | Δ (Master – With Change) |
| -- | -- | -- | -- |
| Requests | 128,197 | 128,136 | –61 |
| Fails | 1,273 | 1,363 | +90 |
| Median (ms) | 170 | 170 | 0 |
| 95%ile (ms) | 260 | 260 | 0 |
| 99%ile (ms) | 310 | 320 | +10 ms |
| Average (ms) | 178.66 | 178.99 | +0.33 ms |
| Min (ms) | 9 | 8 | –1 ms |
| Max (ms) | 1,171 | 2,995 | +1,824 ms |
| Average size (bytes) | 0.99 | 0.99 | 0 |
| Current RPS | 538.8 | 558.2 | +19.4 |
| Current Failures/s | 0 | 0 | 0 |

Signed-off-by: abrar <[email protected]>
Signed-off-by: kriyanshii <[email protected]>
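The numbers above came from an external load-test tool. As a rough, self-contained sketch of the methodology, the following fans out concurrent callers against a stand-in handler and reports the median latency in milliseconds. The handler only mimics the 10 ms deployment above; it does not start Ray Serve, and `run_load_test` is a hypothetical name:

```python
import asyncio
import time


async def worker(n_requests, handler, latencies):
    # Each simulated user issues n_requests sequentially, timing each one.
    for _ in range(n_requests):
        start = time.monotonic()
        await handler()  # stand-in for an HTTP call to the Serve app
        latencies.append((time.monotonic() - start) * 1000.0)


async def run_load_test(users=100, requests_per_user=10):
    async def handler():
        # Mimics the deployment's 10 ms of work.
        await asyncio.sleep(0.01)

    latencies = []
    await asyncio.gather(
        *(worker(requests_per_user, handler, latencies) for _ in range(users))
    )
    latencies.sort()
    return latencies[len(latencies) // 2]  # median latency in ms
```

With real traffic one would record percentiles (95th, 99th) and failure counts per second as in the table above, not just the median.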