
How to handle mixed quantized/FP paths when discarding ops (softmax/rms_norm) in QnnQuantizer QAT? #16775

@ujma1234

Description

I’m doing QNN quantization with QAT using QnnQuantizer for a custom model that I train and then lower to the ExecuTorch QNN backend.

During QAT, if I quantize the model “normally”, training does not proceed because the backward pass produces NaN gradients. The issue seems to be triggered by Softmax inside attention and by RMSNorm (layer_norm / group_norm show similar behavior).

from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype

quantizer = QnnQuantizer()
quantizer.set_default_quant_config(
    quant_dtype=QuantDtype.use_8a8w,
    is_qat=True,
    is_conv_per_channel=False,
    is_linear_per_channel=False,
)

With this setup, I see NaNs during backward when softmax / rms_norm are quantized.
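For reference, the surrounding QAT flow is roughly the standard PT2E one (model and example_inputs here are placeholders for my actual setup):

import torch
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e

# Export for training, then let the quantizer insert fake-quant observers.
exported = torch.export.export_for_training(model, example_inputs).module()
prepared = prepare_qat_pt2e(exported, quantizer)

# ... QAT training loop runs on `prepared` ...

# Fold observers into quantize/dequantize pairs before lowering.
converted = convert_pt2e(prepared)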

To make training stable, I discarded the softmax-related ops, the normalization ops, and the surrounding elementwise ops from the quantizer:

quantizer.add_discard_ops([
    # softmax / attention
    torch.ops.aten.softmax.int,
    torch.ops.aten._softmax.default,
    torch.ops.aten._safe_softmax.default,
    torch.ops.aten.log_softmax.int,
    torch.ops.aten.scaled_dot_product_attention.default,

    # normalization
    torch.ops.aten.rms_norm.default,
    torch.ops.aten.layer_norm.default,
    torch.ops.aten.group_norm.default,

    # elementwise ops on the affected paths
    torch.ops.aten.add.Tensor,
    torch.ops.aten.add_.Tensor,
    torch.ops.aten.mul.Tensor,
    torch.ops.aten.mul_.Tensor,
    torch.ops.aten.sub.Tensor,
    torch.ops.aten.div.Tensor,
])

With this discard list, QAT training works fine (no NaNs).
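The lowering/partitioning step that produces the failures below looks roughly like this (the SoC is a placeholder for my target; converted is the convert_pt2e output from above):

import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qc_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)

backend_options = generate_htp_compiler_spec(use_fp16=False)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # placeholder; I pass my actual SoC here
    backend_options=backend_options,
)

edge = to_edge_transform_and_lower(
    torch.export.export(converted, example_inputs),
    partitioner=[QnnPartitioner(compiler_specs)],
)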

However, when I lower/partition to QNN, op validation fails for many nodes with datatype-mismatch errors:

...
[ERROR] [Qnn ExecuTorch]:  <E> Failed to validate op aten_rms_norm_default_72 with error 0xc26
[WARNING] [Qnn ExecuTorch]: Qnn Backend op validation failed with error: 3110
[QNN Partitioner Op Support]: aten.rms_norm.default | False
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.squeeze_copy.dims | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.convolution.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[ERROR] [Qnn ExecuTorch]: Tensor 0 and 0 have mismatching datatypes. 0x232 != 0x408.
[ERROR] [Qnn ExecuTorch]: Tensor 0 and 0 have mismatching datatypes. 0x232 != 0x408.
[ERROR] [Qnn ExecuTorch]: Op specific validation failed.
[ERROR] [Qnn ExecuTorch]:  <E> validateNativeOps master op validator aten_mul_tensor_125:qti.aisw:ElementWiseMultiply failed 3110
[ERROR] [Qnn ExecuTorch]:  <E> QnnBackend_validateOpConfig failed 3110
[ERROR] [Qnn ExecuTorch]:  <E> Failed to validate op aten_mul_tensor_125 with error 0xc26
[WARNING] [Qnn ExecuTorch]: Qnn Backend op validation failed with error: 3110
[QNN Partitioner Op Support]: aten.mul.Tensor | False
[QNN Partitioner Op Support]: aten.squeeze_copy.dims | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.convolution.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[ERROR] [Qnn ExecuTorch]: Tensor 0 and 0 have mismatching datatypes. 0x408 != 0x232.
[ERROR] [Qnn ExecuTorch]: Tensor 0 and 0 have mismatching datatypes. 0x408 != 0x232.
[ERROR] [Qnn ExecuTorch]: Op specific validation failed.
[ERROR] [Qnn ExecuTorch]:  <E> validateNativeOps master op validator aten_mul_tensor_124:qti.aisw:ElementWiseMultiply failed 3110
[ERROR] [Qnn ExecuTorch]:  <E> QnnBackend_validateOpConfig failed 3110
[ERROR] [Qnn ExecuTorch]:  <E> Failed to validate op aten_mul_tensor_124 with error 0xc26
[WARNING] [Qnn ExecuTorch]: Qnn Backend op validation failed with error: 3110
...

It looks like even though the quantizer “discarded” these ops, some branches/paths still receive quantized tensors as inputs, so later elementwise ops (mul/add/etc.) end up with one quantized and one float operand. QNN then rejects them at validation time (the mismatching datatype codes 0x232 vs. 0x408 appear to be float32 vs. 8-bit fixed-point).
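As a rough way to confirm this, I walk the prepared graph and flag nodes whose input producers mix annotated (to-be-quantized) and unannotated (float) nodes. quantization_annotation is the node.meta key that PT2E quantizers write; the helper itself is just illustrative:

# Illustrative sketch: report call_function nodes whose producers mix
# quantize-annotated and unannotated (float) nodes.
def find_mixed_boundaries(graph_module):
    def is_annotated(node):
        ann = node.meta.get("quantization_annotation")
        return ann is not None and getattr(ann, "_annotated", False)

    for node in graph_module.graph.nodes:
        if node.op != "call_function":
            continue
        flags = {p.name: is_annotated(p) for p in node.all_input_nodes}
        if len(set(flags.values())) > 1:  # mix of True and False
            print(f"mixed inputs at {node.name} ({node.target}): {flags}")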

Is there a recommended way to handle this situation?

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin
