
Qualcomm AI Engine Direct - [Multimodal] granite-speech-3.3-2b #18740

Open
DannyYuyang-quic wants to merge 1 commit into pytorch:main from CodeLinaro:dev1/danny/support_audio-language_models



@DannyYuyang-quic
Contributor

Summary

  • Support granite-speech-3.3-2b
  • Extend Audio modality in QNNMultimodal AOT flow
  • Extend Audio modality in QNNMultimodal runner
  • Support encoder model sharding

Test plan

CI

python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_static_asr --model_name granite_speech_3_3-2b build-android --executorch_root . -a . -m SM8750 -s ${SERIAL_NUM}

Script

python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m SM8750 --decoder_model granite_speech_3_3-2b --model_mode hybrid --prefill_ar_len 128 --max_seq_len 1024 --prompt "can you transcribe the speech into a written format?" --audio_path "https://huggingface.co/ibm-granite/granite-speech-3.3-2b/resolve/main/10226_10111_000000.wav?download=true"

Audio file: https://huggingface.co/ibm-granite/granite-speech-3.3-2b/resolve/main/10226_10111_000000.wav?download=true
Prompt: "can you transcribe the speech into a written format?"
Result

I 00:00:16.333997 executorch:multimodal_runner.cpp:542] RSS after finishing text generation: 614.941406 MiB (0 if unsupported)
I 00:00:16.334231 executorch:stats.h:161] 	Prompt Tokens: 212    Generated Tokens: 201
I 00:00:16.334356 executorch:stats.h:167] 	Model Load Time:		1.460000 (seconds)
I 00:00:16.334419 executorch:stats.h:177] 	Total inference time:		14.871000 (seconds)		 Rate: 	13.516240 (tokens/second)
I 00:00:16.334480 executorch:stats.h:185] 		Prompt evaluation:	0.798000 (seconds)		 Rate: 	265.664160 (tokens/second)
I 00:00:16.334541 executorch:stats.h:196] 		Generated 201 tokens:	14.073000 (seconds)		 Rate: 	14.282669 (tokens/second)
I 00:00:16.334629 executorch:stats.h:204] 	Time to first generated token:	0.798000 (seconds)
I 00:00:16.334688 executorch:stats.h:211] 	Sampling time over 413 tokens:	0.479000 (seconds)
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device

PyTorchObserver {"prefill_token_per_sec":265.664,"decode_token_per_sec":14.2827,"prompt_tokens":212,"generated_tokens":201,"model_load_start_ms":1744743525724,"model_load_end_ms":1744743527184,"inference_start_ms":1744743527186,"inference_end_ms":1744743542057,"prompt_eval_end_ms":1744743527984,"first_token_ms":1744743527984,"aggregate_sampling_time_ms":479,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
/data/local/tmp/yuyazhua/executorch/static_llm/outputs/outputs.txt: 1 file pulled. 0.9 MB/s (1170 bytes in 0.001s)
/data/local/tmp/yuyazhua/executorch/static_llm/outputs/inference_speed.txt: 1 file pulled. 0.0 MB/s (7 bytes in 0.002s)
[INFO 2026-04-08 00:22:11,849 llama.py:243] Device Inference Results[0]:
<|start_of_role|>system<|end_of_role|>You are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>can you transcribe the speech into a written format?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>It appears you've provided a fragment of a sentence, possibly from a poem or text, and you're asking for a transcription or translation into written format. However, without the complete context or original text, it's challenging to accurately transcribe or translate it.

If we were to proceed with a hypothetical example, here's a possible continuation of the sentence in a written format:

"After his nap, Timothy leisurely stretched his foot, first one then the other, carefully selecting the choicest bits. Turning over the food, he methodically picked out the desired portions, meticulously choosing what was to be included in his meal."

This continuation assumes a narrative style, where Timothy is taking care of food preparation. The original sentence seems to be a playful or poetic exploration of a character's actions, possibly related to food preparation or a cooking process.<|end_of_text|>
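
For anyone cross-checking the numbers, the rates in the log can be recomputed from the timestamps in the `PyTorchObserver` line. The sketch below is a minimal example, assuming the rates are simply token counts divided by the corresponding timestamp deltas; that relationship is inferred from the logged values, not taken from `stats.h`. The function name `derive_stats` and the output key names are hypothetical.

```python
import json

# PyTorchObserver payload copied verbatim from the device log above.
OBSERVER_LINE = (
    '{"prefill_token_per_sec":265.664,"decode_token_per_sec":14.2827,'
    '"prompt_tokens":212,"generated_tokens":201,'
    '"model_load_start_ms":1744743525724,"model_load_end_ms":1744743527184,'
    '"inference_start_ms":1744743527186,"inference_end_ms":1744743542057,'
    '"prompt_eval_end_ms":1744743527984,"first_token_ms":1744743527984,'
    '"aggregate_sampling_time_ms":479,"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)


def derive_stats(payload: str) -> dict:
    """Recompute timings and rates from the raw timestamps (assumed formulas)."""
    s = json.loads(payload)
    scale = s["SCALING_FACTOR_UNITS_PER_SECOND"]  # 1000 units per second, i.e. ms

    # Durations in seconds, taken as differences between logged timestamps.
    prefill_s = (s["prompt_eval_end_ms"] - s["inference_start_ms"]) / scale
    decode_s = (s["inference_end_ms"] - s["prompt_eval_end_ms"]) / scale
    total_s = (s["inference_end_ms"] - s["inference_start_ms"]) / scale

    return {
        "model_load_s": (s["model_load_end_ms"] - s["model_load_start_ms"]) / scale,
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": total_s,
        # Assumption: each rate is a token count over the matching duration.
        "prefill_tok_per_s": s["prompt_tokens"] / prefill_s,
        "decode_tok_per_s": s["generated_tokens"] / decode_s,
        "overall_tok_per_s": s["generated_tokens"] / total_s,
        # Matches the "Sampling time over 413 tokens" line in the log.
        "sampled_tokens": s["prompt_tokens"] + s["generated_tokens"],
    }


stats = derive_stats(OBSERVER_LINE)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the recomputed values agree with the `stats.h` lines: the overall rate divides only the generated tokens by the total inference time, and the sampling count is the sum of prompt and generated tokens.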

cc: @abhinaykukkadapu, @cccclai, @haowhsu-quic

@pytorch-bot

pytorch-bot bot commented Apr 7, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18740

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit b01dff2 with merge base fcccda3 (image):

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the "CLA Signed" label Apr 7, 2026
@DannyYuyang-quic
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the "release notes: qualcomm" label Apr 7, 2026
@DannyYuyang-quic DannyYuyang-quic changed the title from "Qualcomm AI Engine Direct - [Multimodal] granite-3.3-2b-instruct" to "Qualcomm AI Engine Direct - [Multimodal] granite-speech-3.3-2b" Apr 9, 2026