Description
What would you like to be added:
We would like to request support for adding custom volumes and volumeMounts to the EPP container in the inferencepool Helm chart template.
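For reference, Kubernetes declares volumes at the Pod level, while volumeMounts belong to an individual container and refer to a volume by name. The sketch below uses Python dicts rather than a Helm template to show the shape a rendered EPP pod spec could take. The value names `extraVolumes` / `extraVolumeMounts`, the image name, and the `/data/tokenizers` path are hypothetical and are not existing chart options.

```python
import copy
import json


def apply_extra_volumes(pod_spec, extra_volumes, extra_volume_mounts, container="epp"):
    """Return a copy of pod_spec with extra volumes and volumeMounts added.

    extra_volumes and extra_volume_mounts stand in for hypothetical chart
    values (e.g. extraVolumes / extraVolumeMounts); they are not existing
    options of the inferencepool chart.
    """
    spec = copy.deepcopy(pod_spec)
    # Volumes live at the Pod level; mounts reference them by name.
    declared = {v["name"] for v in spec.get("volumes", []) + extra_volumes}
    for mount in extra_volume_mounts:
        if mount["name"] not in declared:
            raise ValueError(f"volumeMount {mount['name']!r} has no matching volume")
    spec.setdefault("volumes", []).extend(extra_volumes)
    for c in spec["containers"]:
        if c["name"] == container:
            # volumeMounts are per container, so only the EPP container gets them.
            c.setdefault("volumeMounts", []).extend(extra_volume_mounts)
            break
    else:
        raise ValueError(f"container {container!r} not found")
    return spec


# Hypothetical base spec and values, for illustration only.
base = {"containers": [{"name": "epp", "image": "example/epp:latest"}]}
spec = apply_extra_volumes(
    base,
    extra_volumes=[{"name": "tokenizers", "hostPath": {"path": "/data/tokenizers"}}],
    extra_volume_mounts=[{"name": "tokenizers", "mountPath": "/tokenizers", "readOnly": True}],
)
print(json.dumps(spec, indent=2))
```

In the chart itself this would presumably be expressed with Helm template directives over values, but the resulting structure is the same.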
Why is this needed:
Currently, the llm-d-kv-cache-manager project, on which precise prefix-cache-aware routing relies, has proposed two approaches for tokenizing requests:
- Attach a Python-based sidecar to the EPP (Endpoint Picker). The sidecar communicates with the EPP over a UDS (Unix Domain Socket), enabling more flexible and compatible tokenization. For more context, see: feat: Add UDS-based external tokenizer service llm-d/llm-d-kv-cache-manager#137.
- Have the EPP look for cached tokenizer configurations in a local path, so that it does not download them from Hugging Face every time it starts. For more context, see: Add support for local tokenizer files llm-d/llm-d-kv-cache-manager#142.
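The local-path approach boils down to checking a mounted directory before falling back to a download. A minimal sketch, assuming a hypothetical layout `<base_dir>/<org>/<model>/tokenizer.json`; the layout actually used in llm-d/llm-d-kv-cache-manager#142 may differ, and the download step itself is left out.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_local_tokenizer(model_name: str, base_dir: str) -> Optional[Path]:
    """Return the cached tokenizer.json for model_name under base_dir, or None.

    The <base_dir>/<org>/<model>/tokenizer.json layout is a hypothetical
    convention for this sketch. None means the caller should fall back to
    downloading the tokenizer (e.g. from Hugging Face).
    """
    candidate = Path(base_dir).joinpath(*model_name.split("/"), "tokenizer.json")
    return candidate if candidate.is_file() else None


# Simulate a mounted volume that already holds one pre-downloaded tokenizer.
with tempfile.TemporaryDirectory() as mount_dir:
    model_dir = Path(mount_dir, "example-org", "example-model")
    model_dir.mkdir(parents=True)
    (model_dir / "tokenizer.json").write_text("{}")
    print(find_local_tokenizer("example-org/example-model", mount_dir))  # the cached path
    print(find_local_tokenizer("example-org/other-model", mount_dir))  # None, so download instead
```

For this to work in a cluster, the directory has to be mounted into the EPP container, which is exactly what custom volumes and volumeMounts would allow.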
Both approaches require adding custom volumes and volumeMounts for new EPP plugins, and they illustrate two use cases for this requirement:
- An EPP plugin may use some mounted data to perform its tasks (e.g., pre-downloaded tokenizer weights).
- An EPP plugin can share an emptyDir volume with the EPP sidecar, enabling communication between them via UDS.
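The shared-emptyDir pattern only requires both containers to agree on a socket path inside a common directory. The Python sketch below plays both roles in one process for brevity: a toy sidecar server and a client standing in for the EPP plugin (the EPP itself is written in Go). The socket name `tokenizer.sock`, the newline-delimited JSON framing, and the whitespace "tokenizer" are assumptions for illustration, not the protocol proposed in llm-d/llm-d-kv-cache-manager#137.

```python
import json
import os
import socket
import tempfile
import threading


def serve_once(sock_path: str, ready: threading.Event) -> None:
    """Toy tokenizer sidecar: answer a single request on a Unix domain socket.

    The whitespace split stands in for a real tokenizer, and the
    newline-delimited JSON framing is an assumption for this sketch.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen(1)
        ready.set()  # the socket file now exists in the shared directory
        conn, _ = srv.accept()
        with conn, conn.makefile("rw") as stream:
            request = json.loads(stream.readline())
            stream.write(json.dumps({"tokens": request["text"].split()}) + "\n")
            stream.flush()


def tokenize_via_uds(sock_path: str, text: str) -> list:
    """Client side, playing the EPP plugin's role: send text, read tokens back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(sock_path)
        with cli.makefile("rw") as stream:
            stream.write(json.dumps({"text": text}) + "\n")
            stream.flush()
            return json.loads(stream.readline())["tokens"]


# A temporary directory stands in for the emptyDir mounted into both containers.
shared_dir = tempfile.mkdtemp()
sock_path = os.path.join(shared_dir, "tokenizer.sock")
ready = threading.Event()
server = threading.Thread(target=serve_once, args=(sock_path, ready))
server.start()
ready.wait()
tokens = tokenize_via_uds(sock_path, "hello from the epp")
server.join()
os.unlink(sock_path)
print(tokens)  # ['hello', 'from', 'the', 'epp']
```

Because a Unix domain socket is addressed by a file path, the two containers only need to mount the same volume at a known location; no Service or network port is involved.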
Additionally, there is a related requirement: making the EPP sidecar ConfigMap in the current epp-config optional. This would make EPP sidecar deployments more versatile, especially as different EPP plugin functionalities increasingly need to be supported through different EPP sidecars.
cc @vMaroon