Drop cache_loaded_ flag; gate init on metadata.empty() by doggeral · Pull Request #20245 · pytorch/executorch

doggeral · 2026-06-12T20:43:12Z

Summary:
Follow-up to D106717093. The cache_loaded_ flag was used as the gate
in initialize_for_runtime to decide between the CACHE_LOADED
fast-path (reopen fd only) and the load-or-fresh-write slow path.
But it was only set true by load_packed_cache(), not by the
fresh-write path. After a fresh-write + save_packed_index closed
the fd, the next init found cache_loaded_=false → re-entered
load_packed_cache → push a second whole-file mmap region on top
of the per-entry regions from reserve_space. Same file mapped
twice; redundant VM and bookkeeping.

Replace the flag with !name_to_packed_data_metadata_.empty() as
the gate. Metadata is non-empty after BOTH paths (load and
fresh-write), so the gate is correct for both. Removes 1 field,
3 writes, and the latent double-mmap on the fresh-write-then-save
path. The from_load entry comment is updated to drop the
now-stale reference to cache_loaded_ auto-invalidation.

Differential Revision: D108178967

Summary: Add cross-load reuse + multi-PTE safety to the file-backed packed weight cache (D106673663). The first PTE in a session calls `save_packed_index()` to append a trailer; subsequent process launches mmap the file and pre-populate `name_to_packed_data_metadata_` so `look_up()` hits for every saved weight and `xnn_create_runtime` skips packing entirely. ## Cache file format ``` [packed data regions] (written by reserve_space) [index entries] (written by save_packed_index) each: name_len(4B) | name(N) | file_offset(8B) | data_size(8B) [footer: 20 bytes] index_start(8B) | entry_count(4B) | magic "XPWC"(4B) | version(4B) ``` ## Lifecycle invariants - `cache_loaded_` gate: `load_packed_cache()` runs at most once per process per path. Subsequent PTE inits for the same path reopen the write fd without re-reading the trailer. - `from_load` flag: persistent entries (loaded from trailer or promoted on save) skip `delete_packed_data` cleanup. This keeps the mmap region and metadata alive across PTE unload/reload, so the next init hits the cache instead of repacking. Without this, every PTE destroy/recreate cycle appended a fresh copy to the file (~450 MB per cycle). - No-op save short-circuit: `save_packed_index` returns early when no new `reserve_space` happened since the last save, avoiding the mtime churn that previously made the cache file look modified on every model load. ## Multi-PTE behavior - Multiple PTEs (or methods that don't share weights) in the same model load share one cache file. Each PTE's `reserve_space` extends the file; `finalize_for_runtime` msyncs only newly added regions; `save_packed_index` writes one trailer covering all PTEs at the end of the load. - Sibling PTEs that opt out of the mmap path (caller passes empty `packed_cache_path`) early-return from `initialize_for_runtime` and fall through to heap allocation, without touching the singleton's PLLM state. - Cross-model coexistence relies on caller-side discipline: only models that opt in set a non-empty cache path. Setting different non-empty paths concurrently is not supported by this singleton design. ## Caller change `XNNPACKBackend::init` always calls `set_packed_cache_path` (with empty string for non-opted-in PTEs). This keeps the singleton path in sync with the current PTE instead of inheriting a sibling's path. ## Test Plan ``` buck2 test fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache # 5 pass buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backendApple buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend buck2 build fbcode//executorch/backends/xnnpack:xnnpack_backend ``` On device (iOS Stella build, PLLM + Llama3 runner): - Cold start: load `(1184 entries)` from cache, `reserve_mmap=0` for cached weights - Cache file size stable at ~593 MB across PLLM unload/reload cycles - `app_peak ~700 MB` (vs ~2.5 GB pre-fix) - `compressed ~100 MB` (vs ~1.7 GB pre-fix) Differential Revision: D106717093

Summary: XNNPACK exposes `xnn_weights_cache_look_up_key.seed` — a per-ukernel value that XNNPACK guarantees is consistent across runs of the same ukernel and changes whenever a ukernel implementation changes. Store this seed per cache entry so a stale cached packing produced by an old XNNPACK ukernel is rejected after upgrade, instead of being handed back to a newer ukernel that expects a different layout. Changes: - `PackedDataMeta` gains `uint32_t seed{0}`. - `look_up` rejects (returns `SIZE_MAX`) when a name hit has a stored seed that doesn't match `cache_key->seed`. This forces `look_up_or_insert` to re-pack with the current ukernel and avoids the slow `memcmp` path catching it later. - `look_up_or_insert` records `cache_key->seed` on insert. - On-disk index entry layout extended to `[name_len:u32][name][file_offset:u64][data_size:u64][seed:u32]` (was 16 bytes after the name, now 20). - `load_packed_cache` reads the per-entry seed and bumps the trailing bytes bound check accordingly. - `kCacheVersion` bumped 1 → 2 so existing v1 files (which carry no seed) are rejected at load instead of being loaded with `seed=0` and mismatching every fresh `look_up`. Cleanup of orphaned in-memory and on-disk entries left by an invalidated look-up is a follow-up — this diff only adds the detection. Differential Revision: D108082431

Summary: Follow-up to D106717093. The cache_loaded_ flag was used as the gate in initialize_for_runtime to decide between the CACHE_LOADED fast-path (reopen fd only) and the load-or-fresh-write slow path. But it was only set true by load_packed_cache(), not by the fresh-write path. After a fresh-write + save_packed_index closed the fd, the next init found cache_loaded_=false → re-entered load_packed_cache → push a second whole-file mmap region on top of the per-entry regions from reserve_space. Same file mapped twice; redundant VM and bookkeeping. Replace the flag with !name_to_packed_data_metadata_.empty() as the gate. Metadata is non-empty after BOTH paths (load and fresh-write), so the gate is correct for both. Removes 1 field, 3 writes, and the latent double-mmap on the fresh-write-then-save path. The from_load entry comment is updated to drop the now-stale reference to cache_loaded_ auto-invalidation. Differential Revision: D108178967

pytorch-bot · 2026-06-12T20:43:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20245

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

❌ 4 New Failures, 1 Pending

As of commit 7889e6b with merge base 0cbece4 ():

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / hifi-build / hifi4 (gh)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh)
Input required and not supplied: aws-region
Lint / lintrunner (gh)
>>> Lint for backends/xnnpack/test/runtime/test_xnn_weights_cache.cpp:
pull / test-qnn-models-linux (dl3) / linux-job (gh)
RuntimeError: Command docker exec -t 6dd52175ebf31f5cff42b3c07ad3254a3bd69fd5b029701a15fb3de728777ea0 /exec failed with exit code 92

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-06-12T20:43:44Z

@doggeral has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108178967.

github-actions · 2026-06-12T20:44:39Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

doggeral added 3 commits June 10, 2026 07:22

doggeral requested a review from digantdesai as a code owner June 12, 2026 20:43

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 12, 2026

doggeral had a problem deploying to cadence June 12, 2026 20:43 — with GitHub Actions Failure

meta-codesync Bot added the meta-exported label Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Drop cache_loaded_ flag; gate init on metadata.empty()#20245

Drop cache_loaded_ flag; gate init on metadata.empty()#20245
doggeral wants to merge 3 commits into
pytorch:mainfrom
doggeral:export-D108178967

doggeral commented Jun 12, 2026

Uh oh!

pytorch-bot Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

meta-codesync Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

doggeral commented Jun 12, 2026

Uh oh!

pytorch-bot Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20245

❗ 1 Active SEVs

❌ 4 New Failures, 1 Pending

Uh oh!

meta-codesync Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

This PR needs a release notes: label

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pytorch-bot Bot commented Jun 12, 2026 •

edited

Loading

This PR needs a `release notes:` label