feat: offload idle encoder from VRAM to RAM after configurable timeout by lstein · Pull Request #227 · lstein/PhotoMapAI

lstein · 2026-05-09T22:40:17Z

Summary

The cached search encoder pinned multiple GB of VRAM indefinitely after the first query, blocking other GPU workloads from sharing the device while PhotoMap idled.
After encoder_idle_timeout_seconds (default 30; set to 0 to disable) of search inactivity, a background watcher moves the model from VRAM to system RAM. The next query automatically moves it back onto the GPU in well under a second, with no Hub fetch and none of the tensor-allocation cost of building a fresh encoder.
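
The PR does not show the watcher's source. The following is a minimal sketch of the idle-timeout logic described above, assuming a monotonic clock and a fixed poll interval. The class name IdleOffloadWatcher, its touch()/check_once() methods, and the poll_interval_s and clock parameters are all hypothetical. touch() plays the role of the "cache-timestamp refresh" on each query, and a timeout of 0 short-circuits so no thread is ever started.

```python
import threading
import time
from typing import Callable, Optional


class IdleOffloadWatcher:
    """Hypothetical sketch: call `offload` once a model has been idle too long."""

    def __init__(
        self,
        offload: Callable[[], None],
        timeout_s: float,
        poll_interval_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._offload = offload          # e.g. a bound encoder.offload
        self._timeout_s = timeout_s      # mirrors encoder_idle_timeout_seconds
        self._poll_interval_s = poll_interval_s
        self._clock = clock              # injectable for deterministic tests
        self._last_used = clock()
        self._offloaded = False
        self._lock = threading.Lock()    # guards _last_used and _offloaded
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def touch(self) -> None:
        """Refresh the last-used timestamp; call on every search query."""
        with self._lock:
            self._last_used = self._clock()
            self._offloaded = False

    def check_once(self) -> bool:
        """Offload if idle for at least the timeout; return True if it did."""
        if self._timeout_s <= 0:
            return False  # timeout=0 short-circuit: offloading disabled
        with self._lock:
            idle = self._clock() - self._last_used
            if self._offloaded or idle < self._timeout_s:
                return False
            self._offloaded = True
        # Called outside our lock, so a query that calls touch() while holding
        # the encoder's own lock cannot deadlock against this thread.
        self._offload()
        return True

    def _run(self) -> None:
        # Event.wait() is an interruptible sleep, so stop() returns promptly.
        while not self._stop.wait(self._poll_interval_s):
            self.check_once()

    def start(self) -> None:
        if self._timeout_s <= 0 or self._thread is not None:
            return  # disabled, or already running
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

With a 30 s timeout and a 1 s poll, the offload fires between 30 and 31 s after the last touch(). It fires only once per idle period, because the _offloaded flag stays set until the next query.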
New base-class methods ImageTextEncoder.offload() and _ensure_on_device() are guarded by a per-instance RLock, so the watcher cannot pull the weights out from under an in-flight encode_* call.
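
The locking pattern can be sketched without a GPU. In this sketch, ImageTextEncoderSketch, its _move_to() hook, and encode_text() are hypothetical stand-ins: a real subclass would move a torch model, presumably with model.to(device). Only the method names offload() and _ensure_on_device() come from the PR.

```python
import threading


class ImageTextEncoderSketch:
    """Hypothetical stand-in for the ImageTextEncoder base class.

    A real subclass would move a torch model; here _move_to() only records
    the device so the locking pattern runs without a GPU.
    """

    def __init__(self, device: str = "cuda") -> None:
        self.device = device              # where inference should run
        self.current_device = device      # where the weights live right now
        self._lock = threading.RLock()    # per-instance, re-entrant

    def _move_to(self, device: str) -> None:
        # Assumption: a torch subclass would do self.model.to(device) here.
        self.current_device = device

    def offload(self) -> None:
        """Move weights to system RAM; waits for any in-flight encode."""
        with self._lock:
            if self.current_device != "cpu":
                self._move_to("cpu")

    def _ensure_on_device(self) -> None:
        """Move weights back to the target device if they were offloaded."""
        with self._lock:
            if self.current_device != self.device:
                self._move_to(self.device)

    def encode_text(self, text: str) -> list[float]:
        # Hold the lock for the whole call so offload() cannot run mid-encode.
        with self._lock:
            self._ensure_on_device()      # nested acquire: fine with an RLock
            return [float(len(text))]     # placeholder for the real forward pass
```

The re-entrant lock matters here. encode_text() already holds the lock when it calls _ensure_on_device(), which acquires it again. A plain threading.Lock would deadlock at that point, while an RLock lets the owning thread re-enter.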
The watcher is started and stopped by the FastAPI lifespan handler, so CLI tools and tests aren't burdened with a background thread they don't need. The indexing path is unchanged: it already builds and closes its own short-lived encoder.
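
Tying the watcher to the lifespan might look roughly like the sketch below. _WatcherStub and the module-level watcher are hypothetical placeholders for whatever object the PR actually starts. FastAPI accepts an async context manager factory like this through FastAPI(lifespan=lifespan). The sketch avoids importing FastAPI, so it can be driven by hand.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator


class _WatcherStub:
    """Hypothetical stand-in for the idle watcher's start/stop interface."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


watcher = _WatcherStub()


@asynccontextmanager
async def lifespan(app: Any) -> AsyncIterator[None]:
    watcher.start()        # runs once at server startup
    try:
        yield              # the app serves requests while suspended here
    finally:
        watcher.stop()     # runs once at shutdown, even after an error


async def _demo() -> None:
    # Drive the lifespan by hand; FastAPI does this itself when serving.
    async with lifespan(app=None):
        print("serving, watcher running:", watcher.running)
    print("shut down, watcher running:", watcher.running)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The thread is only created inside lifespan(). A CLI tool or test that merely imports the module therefore never starts it.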

Test plan

pytest tests/backend: 207 passed, including 12 new tests covering offload/reload, watcher behavior, the timeout=0 short-circuit, cache-timestamp refresh, and the config field's default value and YAML round-trip
ruff check photomap tests: clean
Manual: run the server, perform a search, and confirm with nvidia-smi that VRAM usage drops about 30 s after the last query and is reclaimed by the next query

🤖 Generated with Claude Code

The cached search encoder pinned multiple GB of VRAM indefinitely, blocking other GPU workloads from sharing the device while PhotoMap idled. Move the model to system RAM after a period of inactivity so the next query can reload it in well under a second instead of paying the Hub fetch / tensor allocation cost of a fresh build.

- New ``encoder_idle_timeout_seconds`` Config field (default 30s, 0 disables)
- ``ImageTextEncoder.offload()`` / ``_ensure_on_device()`` with a per-instance RLock so the watcher can't yank weights mid-encode
- Daemon-thread watcher started/stopped via FastAPI lifespan; CLI tools and tests aren't burdened with it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
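
The PR does not show how the Config class is declared. The following is a hedged sketch of the new field as a plain dataclass. EncoderIdleConfig and offload_enabled are hypothetical names. Rejecting negative values is an added assumption that the PR does not mention.

```python
from dataclasses import asdict, dataclass


@dataclass
class EncoderIdleConfig:
    """Hypothetical slice of the PhotoMap Config showing only the new field."""

    # Seconds of search inactivity before the encoder is offloaded; 0 disables.
    encoder_idle_timeout_seconds: float = 30.0

    def __post_init__(self) -> None:
        # Assumption: negative values are rejected (the PR does not say).
        if self.encoder_idle_timeout_seconds < 0:
            raise ValueError("encoder_idle_timeout_seconds must be >= 0")

    @property
    def offload_enabled(self) -> bool:
        return self.encoder_idle_timeout_seconds > 0


def round_trip(cfg: EncoderIdleConfig) -> EncoderIdleConfig:
    # A YAML loader such as PyYAML's yaml.safe_load yields a plain dict of
    # this shape, which can be splatted back into the constructor.
    return EncoderIdleConfig(**asdict(cfg))
```

A default-constructed instance has offloading enabled at 30 s. Passing 0 keeps the field valid but turns the feature off, matching the "0 disables" rule above.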

lstein merged commit 9b7677c into master May 11, 2026
5 checks passed

lstein deleted the feat/encoder-idle-timeout branch May 11, 2026 02:45


lstein merged 1 commit into master from feat/encoder-idle-timeout


1 participant
