
feat: offload idle encoder from VRAM to RAM after configurable timeout #227

Merged
lstein merged 1 commit into master from feat/encoder-idle-timeout on May 11, 2026

@lstein lstein commented May 9, 2026

Summary

  • The cached search encoder pinned multiple GB of VRAM indefinitely after the first query, blocking other GPU workloads from sharing the device while PhotoMap idled.
  • After encoder_idle_timeout_seconds (default 30, set to 0 to disable) of search inactivity, a background watcher moves the model from VRAM to system RAM. The next query auto-reloads it onto the GPU in well under a second — no Hub fetch, no tensor re-allocation.
  • New base-class ImageTextEncoder.offload() / _ensure_on_device() are guarded by a per-instance RLock so the watcher can't yank weights out from under an in-flight encode_* call.
  • Watcher is started/stopped via FastAPI lifespan, so CLI tools and tests aren't burdened with a background thread they don't need. Indexing path is unchanged — it already builds and closes its own short-lived encoder.
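The locking arrangement described in the third bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `SketchEncoder`, `_move`, and the device strings are hypothetical stand-ins, and no real model is loaded. Only the method names `offload()` and `_ensure_on_device()` and the use of a per-instance `RLock` come from the description above.

```python
import threading


class SketchEncoder:
    """Hypothetical stand-in for ImageTextEncoder; no real model is loaded."""

    def __init__(self, device: str = "cuda") -> None:
        self._target_device = device    # where encode calls need the weights
        self._current_device = device   # where the weights are right now
        self._lock = threading.RLock()  # per-instance and re-entrant

    def _move(self, device: str) -> None:
        # Assumption: a real encoder would move its model tensors here.
        self._current_device = device

    def _ensure_on_device(self) -> None:
        with self._lock:
            if self._current_device != self._target_device:
                self._move(self._target_device)

    def offload(self) -> bool:
        """Move weights to system RAM; return True if a move happened."""
        with self._lock:
            if self._current_device == "cpu":
                return False
            self._move("cpu")
            return True

    def encode_text(self, text: str) -> tuple[str, int]:
        # The lock is held for the whole call, so a concurrent offload()
        # blocks until encoding has finished.
        with self._lock:
            self._ensure_on_device()  # second acquire by the same thread
            return (self._current_device, len(text))
```

A re-entrant lock matters here because `encode_text` already holds the lock when it calls `_ensure_on_device()`; with a plain `threading.Lock` that nested acquire would deadlock.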

Test plan

  • pytest tests/backend — 207 passed (12 new tests covering offload/reload, watcher behavior, timeout=0 short-circuit, cache-timestamp refresh, config field default + YAML round-trip)
  • ruff check photomap tests — clean
  • Manual: run the server, perform a search, confirm via nvidia-smi that VRAM usage drops ~30s after the last query and is reclaimed by the next query
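For readers who want a picture of the watcher behavior these tests exercise (idle offload, the timeout=0 short-circuit, lifespan start/stop), here is a rough standard-library sketch. `IdleWatcher`, `touch`, `make_lifespan`, and `poll_seconds` are hypothetical names chosen for illustration; the PR's actual watcher may be structured differently.

```python
import threading
import time
from contextlib import asynccontextmanager


class IdleWatcher:
    """Hypothetical watcher: calls offload_fn once the idle timeout elapses."""

    def __init__(self, offload_fn, timeout_seconds: float, poll_seconds: float = 1.0):
        # offload_fn is assumed to be a cheap no-op when already offloaded,
        # because it is called on every poll while the encoder stays idle.
        self._offload_fn = offload_fn
        self._timeout = timeout_seconds
        self._poll = poll_seconds
        self._last_used = time.monotonic()
        self._stop = threading.Event()
        self._thread = None

    def touch(self) -> None:
        """Record search activity; intended to be called on every query."""
        self._last_used = time.monotonic()

    def start(self) -> None:
        if self._timeout <= 0:  # 0 disables offloading: no thread is created
            return
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._poll):
            if time.monotonic() - self._last_used >= self._timeout:
                self._offload_fn()


def make_lifespan(watcher: IdleWatcher):
    """Build an async context manager that starts and stops the watcher."""

    @asynccontextmanager
    async def lifespan(app):
        watcher.start()
        try:
            yield
        finally:
            watcher.stop()

    return lifespan
```

FastAPI accepts such a function through its `lifespan=` constructor argument, so the thread exists only while the server is running; code that imports the encoder without starting the app never creates it.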

🤖 Generated with Claude Code

The cached search encoder pinned multiple GB of VRAM indefinitely, blocking
other GPU workloads from sharing the device while PhotoMap idled. Move the
model to system RAM after a period of inactivity so the next query can
reload it in well under a second instead of paying the Hub fetch / tensor
allocation cost of a fresh build.

- New ``encoder_idle_timeout_seconds`` Config field (default 30s, 0 disables)
- ``ImageTextEncoder.offload()`` / ``_ensure_on_device()`` with a per-instance
  RLock so the watcher can't yank weights mid-encode
- Daemon-thread watcher started/stopped via FastAPI lifespan; CLI tools and
  tests aren't burdened with it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lstein lstein merged commit 9b7677c into master May 11, 2026
5 checks passed
@lstein lstein deleted the feat/encoder-idle-timeout branch May 11, 2026 02:45