Description

Autoscaling & Capacity

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Target Replicas | ray_serve_deployment_target_replicas | The target number of replicas the autoscaler wants to reach | Critical for understanding autoscaling lag. "Why aren't we at target?" is unanswerable today. |
| Autoscaling Decision | ray_serve_autoscaling_decision_replicas | The raw decision from the autoscaling policy before bounds are applied | Debug why the autoscaler chose a certain number; identify policy misconfiguration |
| Total Requests (Autoscaler View) | ray_serve_autoscaling_total_requests | Total requests as seen by the autoscaler | Verify that the autoscaler's input matches the expected load |
| Replica Autoscaling Metrics Delay | ray_serve_autoscaling_replica_metrics_delay_ms | Time taken for replica metrics to be reported to the controller | Check whether the controller is busy |
| Handle Autoscaling Metrics Delay | ray_serve_autoscaling_handle_metrics_delay_ms | Time taken for handle metrics to be reported to the controller | Check whether the controller is busy |
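
To illustrate how the raw autoscaling decision and the target replica count could differ, here is a minimal sketch that clamps a policy decision to replica bounds. The helper `apply_bounds` and the example values are hypothetical and are not taken from Ray Serve internals; the sketch only assumes that "bounds" means a minimum and maximum replica count.

```python
def apply_bounds(raw_decision: int, min_replicas: int, max_replicas: int) -> int:
    """Clamp a raw policy decision into [min_replicas, max_replicas].

    Hypothetical helper for illustration; not a Ray Serve API.
    """
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return max(min_replicas, min(raw_decision, max_replicas))


# Hypothetical values: the policy asks for 12 replicas, but the cap is 8.
raw_decision = 12  # would be exported as ray_serve_autoscaling_decision_replicas
target = apply_bounds(raw_decision, min_replicas=1, max_replicas=8)
# `target` would be exported as ray_serve_deployment_target_replicas
print(raw_decision, target)
```

Exporting both values would make it visible when a deployment is pinned at a bound rather than following the policy's raw output.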

Request Batching

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Batch Wait Time | ray_serve_batch_wait_time_ms | Time requests waited for the batch to fill | Debug latency caused by waiting for batches |
| Batch Queue Length | ray_serve_batch_queue_length | Number of requests waiting in the batch queue | Distinguish a batching bottleneck from a processing bottleneck |
| Batch Utilization | ray_serve_batch_utilization_percent | actual_batch_size / max_batch_size * 100 | Tune the max_batch_size parameter; low utilization suggests the batch timeout is too aggressive |
| Batches Processed | ray_serve_batches_processed_total | Counter of batches executed | Measure batching throughput separately from request throughput |
| Batch Execution Time | ray_serve_batch_execution_time_ms | | |
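
The batch utilization formula in the table can be sketched as a small function. The function name and the example sizes are illustrative assumptions; only the formula `actual_batch_size / max_batch_size * 100` comes from the proposal.

```python
def batch_utilization_percent(actual_batch_size: int, max_batch_size: int) -> float:
    """Return how full a batch was, as a percentage of max_batch_size.

    Hypothetical helper for illustration; not a Ray Serve API.
    """
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    if not 0 <= actual_batch_size <= max_batch_size:
        raise ValueError("actual_batch_size must be in [0, max_batch_size]")
    return actual_batch_size / max_batch_size * 100


# Hypothetical example: a batch of 6 requests with max_batch_size=8.
print(batch_utilization_percent(6, 8))  # 75.0
```

A value that stays well below 100 under load would point at the batch timeout firing before batches fill, which is the tuning signal the table describes.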

Latency Breakdown

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Queue Wait Time | ray_serve_queue_wait_time_ms | Time a request spent waiting in the queue before assignment | Critical: separates queueing delay from processing delay |

Replica Health & Lifecycle

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Replica Startup Latency | ray_serve_replica_startup_latency_ms | Time from replica creation to ready state | Debug slow cold starts; model loading time |
| Replica Initialization Latency | serve_replica_initialization_latency_ms | | |
| Replica Reconfigure Latency | ray_serve_replica_reconfigure_latency_ms | Time for the replica to complete reconfiguration | Debug slow reconfiguration; model loading time |
| Health Check Latency | ray_serve_health_check_latency_ms | Duration of health check calls | Identify slow health checks blocking scaling |
| Health Check Failures | ray_serve_health_check_failures_total | Count of failed health checks | Early warning before a replica is marked unhealthy |
| Replica Shutdown Duration | ray_serve_replica_shutdown_duration_ms | Time from shutdown signal to replica fully stopped | Debug slow draining during scale-down or rolling updates |

Proxy Health

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Proxy Healthy | ray_serve_proxy_healthy | Total number of healthy proxies in system. Tags: node_id, node_ip_address | Proxy availability |
| Proxy Draining State | ray_serve_proxy_draining | Whether proxy is draining (1=draining, 0=not). Tags: node_id, node_ip_address | Visibility during rolling updates |
| Routing Stats Delay | ray_serve_routing_stats_delay_ms | Time taken for the routing stats to get from replica to controller | Controller performance |

State Timeline

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Deployment Status | ray_serve_deployment_status | Numeric status of deployment (0=DEPLOY_FAILED, 1=UNHEALTHY, 2=UPDATING, 3=UPSCALING, 4=DOWNSCALING, 5=HEALTHY). Tags: deployment, application | State Timeline visualization; deployment lifecycle debugging |
| Application Status | ray_serve_application_status | Numeric status of application (0=NOT_STARTED, 1=DEPLOYING, 2=DEPLOY_FAILED, 3=RUNNING, 4=UNHEALTHY, 5=DELETING). Tags: application | State Timeline visualization; application lifecycle debugging |
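
The numeric encodings proposed for the two status gauges could be captured as enums so that dashboards and alerts decode values consistently. This is a minimal sketch: the class names `DeploymentStatusValue` and `ApplicationStatusValue` and the helper `decode_status` are hypothetical, and only the number-to-name mappings come from the table.

```python
from enum import IntEnum


class DeploymentStatusValue(IntEnum):
    """Proposed encoding for ray_serve_deployment_status (hypothetical class)."""
    DEPLOY_FAILED = 0
    UNHEALTHY = 1
    UPDATING = 2
    UPSCALING = 3
    DOWNSCALING = 4
    HEALTHY = 5


class ApplicationStatusValue(IntEnum):
    """Proposed encoding for ray_serve_application_status (hypothetical class)."""
    NOT_STARTED = 0
    DEPLOYING = 1
    DEPLOY_FAILED = 2
    RUNNING = 3
    UNHEALTHY = 4
    DELETING = 5


def decode_status(enum_cls: type[IntEnum], gauge_value: float) -> str:
    """Map a scraped gauge value back to its status name."""
    return enum_cls(int(gauge_value)).name


# Hypothetical scraped samples.
print(decode_status(DeploymentStatusValue, 3.0))
print(decode_status(ApplicationStatusValue, 3.0))
```

Note that the same number means different things in the two gauges (for example, 2 is UPDATING for a deployment but DEPLOY_FAILED for an application), so each gauge needs its own decoding table.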

Long Poll

| Missing Metric | Prometheus Name (Proposed) | Description | Reason/Debugging Value |
| --- | --- | --- | --- |
| Long Poll Latency | ray_serve_long_poll_latency_ms | Time for updates to propagate from controller to clients | Debug slow config propagation; impacts autoscaling response time |
| Long Poll Pending Clients | ray_serve_long_poll_pending_clients | Number of clients waiting for updates per namespace | Identify backpressure in notification system |