🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18734
Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures
As of commit df72cc1 with merge base fcccda3:
- NEW FAILURE: the following job has failed
- FLAKY: the following job failed but was likely due to flakiness present on trunk
- BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 0ad610d to c745a39.
```
--qlinear_encoder_group_size 32 \
--qlinear 8da4w \
--qlinear_group_size 32 \
--vulkan_force_fp16 \
```
Can't we use the `--dtype` flag?
`--dtype fp16`: inputs and outputs are also cast to fp16. From the caller's perspective, input/output is fp16.

`--vulkan_force_fp16`: inputs and outputs are still fp32. The Vulkan backend automatically converts inputs to fp16 within the delegate and converts outputs back to fp32. From the caller's perspective, input/output is fp32.

`--vulkan_force_fp16` is a bit simpler for client code since callers don't have to handle the conversion to/from fp32, so I defaulted to that.
Another thing: for the export_parakeet_tdt.py script, `--dtype fp16` hits a guard:

export_parakeet_tdt.py: error: fp16 is not yet supported

I wasn't sure whether this was because the runner binary doesn't handle fp16 input/output yet, so I opted for `--vulkan_force_fp16` instead.
Would you prefer enabling usage of the --dtype flag for fp16 inference?
I also updated the text to clarify the special properties of the `--vulkan_force_fp16` flag that aren't covered by `--dtype`.
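To make the caller-visible difference concrete, here is a small sketch (plain NumPy, not ExecuTorch code) of how the two flags change what the client sees. The function names and the `x * 2` "model" are hypothetical stand-ins; only the dtype behavior mirrors the description above.

```python
import numpy as np

def run_with_dtype_fp16(x_fp32: np.ndarray) -> np.ndarray:
    # With --dtype fp16 the exported graph takes and returns fp16 tensors,
    # so the caller must cast inputs and deal with fp16 outputs itself.
    x = x_fp32.astype(np.float16)
    y = x * np.float16(2)            # stand-in for model inference
    return y                         # fp16 from the caller's perspective

def run_with_vulkan_force_fp16(x_fp32: np.ndarray) -> np.ndarray:
    # With --vulkan_force_fp16 the graph I/O stays fp32; the delegate
    # casts to fp16 internally and back to fp32 on the way out.
    x = x_fp32.astype(np.float16)    # happens inside the delegate
    y = (x * np.float16(2)).astype(np.float32)  # delegate returns fp32
    return y                         # fp32 from the caller's perspective

x = np.ones(4, dtype=np.float32)
print(run_with_dtype_fp16(x).dtype)         # float16
print(run_with_vulkan_force_fp16(x).dtype)  # float32
```

So with `--vulkan_force_fp16` the client code is unchanged relative to the fp32 model; the precision trade-off lives entirely inside the delegate.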
Summary:

Add Vulkan backend documentation to the Parakeet README covering export commands, quantization options, build instructions, and runner examples.

Guard `quantized_ops_lib` and `custom_ops` link targets with `if(TARGET ...)` in CMakeLists.txt. These targets don't exist in Vulkan-only or XNNPACK-only builds, causing a hard CMake configure error from `target_link_options()`. This matches the existing pattern used for `optimized_native_cpu_ops_lib`.

Validated on Samsung S24 (Adreno 750), 8da4w quantization, test_audio.wav (7.2s):

| Metric        | XNNPACK (686 MB) | Vulkan (781 MB) | Vulkan fp16 (550 MB) |
|---------------|------------------|-----------------|----------------------|
| Inference     | 0.56s            | 0.46s           | 0.32s                |
| Encoder speed | 188 tok/s        | 275 tok/s       | 360 tok/s            |
| Decoder speed | 657 tok/s        | 373 tok/s       | 746 tok/s            |

Authored by Claude (Anthropic)
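The `if(TARGET ...)` guard described above might look roughly like the following. This is a sketch, not the actual CMakeLists.txt diff: the target names come from the summary, but the exact `target_link_options()` arguments used in the real file are assumed here.

```cmake
# Sketch: only apply link options to op libraries that exist in this
# build configuration, so Vulkan-only or XNNPACK-only builds don't fail
# at configure time. Mirrors the existing optimized_native_cpu_ops_lib
# pattern described in the PR summary.
if(TARGET quantized_ops_lib)
  list(APPEND runner_link_libraries quantized_ops_lib)
endif()

if(TARGET custom_ops)
  list(APPEND runner_link_libraries custom_ops)
endif()
```

Without the guard, `target_link_options()` on a nonexistent target is a hard CMake configure error rather than a warning, which is why the check has to happen at configure time.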