Implement direct serialization of .bpte:s #20217
This patch is heavily inspired by pytorch#17333.

Similarly to the old serialization of .pte files, bundled program serialization went through a program -> JSON string -> flatbuffer path, which is highly inefficient. As in the PR above, generate Python flatbuffer bindings from the fbs schema and use them to cut out the JSON step.

Coherence between the checked-in flatbuffer bindings and the fbs schema is enforced by the validate_flatbuffer_gen job.

e6586533af

Testing:
Tested by devtools/bundled_program/test/test_end2end.py. Additionally, ran the following benchmark locally:

/usr/bin/time -v -- aot_arm_compiler.py --delegate --bundleio -t vgf [args]

Time metric: Elapsed (wall clock) time
Memory metric: Maximum resident set size

| Model | Baseline (s / MiB) | With-patch (s / MiB) | Speed-up | Relative memory | Same .bpte | .bpte size (MiB) |
| --------- | ------------------- | --------------------- | -------- | --------------- | -------- | --------- |
| mv2 | 31.36 / 1349.7 | 22.37 / 845.5 | 1.40x | 0.63x (-37.4%) | yes | 13.91 |
| resnet50 | 128.57 / 10380.9 | 28.24 / 2003.4 | 4.55x | 0.19x (-80.7%) | yes | 97.99 |
| w2l | 730.23 / 12102.9 | 15.85 / 2376.2 | 46.07x | 0.20x (-80.4%) | yes | 131.45 |
| deit_tiny | 87.57 / 2669.5 | 70.23 / 1037.6 | 1.25x | 0.39x (-61.1%) | yes | 22.88 |

Note that peak memory consumption with the patch is still roughly 20-60x the final artifact size.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
Change-Id: I98d24ad8e3d2537567e8f4b873d5aacae619eaf5
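For readers who want to re-derive the speed-up and relative-memory columns, here is a minimal sketch that recomputes them from the raw timings and peak RSS values reported in the description above. The names `RESULTS`, `derive`, and `DERIVED` are illustrative only and are not part of the patch. Note that the peak RSS was measured over the whole `aot_arm_compiler.py` run, so the peak-to-artifact ratio is an upper bound on what serialization alone costs.

```python
# Benchmark figures copied from the PR description.
# Each entry: (baseline_s, baseline_mib, patched_s, patched_mib, bpte_mib)
RESULTS = {
    "mv2": (31.36, 1349.7, 22.37, 845.5, 13.91),
    "resnet50": (128.57, 10380.9, 28.24, 2003.4, 97.99),
    "w2l": (730.23, 12102.9, 15.85, 2376.2, 131.45),
    "deit_tiny": (87.57, 2669.5, 70.23, 1037.6, 22.88),
}


def derive(baseline_s, baseline_mib, patched_s, patched_mib, bpte_mib):
    """Return (speed_up, relative_memory, peak_to_artifact) for one model."""
    speed_up = baseline_s / patched_s            # wall-clock ratio, higher is better
    relative_memory = patched_mib / baseline_mib  # peak RSS ratio, lower is better
    peak_to_artifact = patched_mib / bpte_mib     # peak RSS vs. final .bpte size
    return speed_up, relative_memory, peak_to_artifact


# Hypothetical helper table, keyed by model name.
DERIVED = {name: derive(*row) for name, row in RESULTS.items()}

for name, (speed_up, rel_mem, ratio) in DERIVED.items():
    print(
        f"{name:10s} speed-up {speed_up:6.2f}x  "
        f"memory {rel_mem:.2f}x ({(rel_mem - 1) * 100:+.1f}%)  "
        f"peak/artifact {ratio:5.1f}x"
    )
```

The last column printed is the ratio behind the closing remark of the description: even with the patch, peak memory is tens of times larger than the `.bpte` that ends up on disk.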
@claude review this PR.
Claude finished @Erik-Lundell's task in 3m 59s. PR Review: Direct serialization of
cc @Gasoonjia @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani