
[executorch] Propagate device metadata from partitioner result onto TensorSpecs#18078

Open
Gasoonjia wants to merge 13 commits into gh/gasoonjia/135/base from gh/gasoonjia/135/head

Conversation

Contributor

@Gasoonjia commented Mar 10, 2026

Stack from ghstack (oldest at bottom):

Add end-to-end device type annotation support from export to runtime. Currently we only support one device per graph.

The overall pipeline is:
a. The partitioner uses compile_spec to determine which device the partitioned blob runs on.
b. After the partitioned graph is lowered to the backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device.

Differential Revision: D95842511
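The two-step pipeline above can be sketched as follows. This is a hypothetical, simplified stand-in, not the actual ExecuTorch pass: the real pass walks an exported FX graph, and the `CompileSpec`/`TensorSpec`/`DelegateBlob` classes and the `"device"` spec key below are illustrative assumptions so the flow is runnable on its own.

```python
# Minimal sketch of the propagate_device_pass flow (stand-in classes;
# the real pass operates on an exported FX graph).
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CompileSpec:
    key: str
    value: bytes


@dataclass
class TensorSpec:
    name: str
    device: str = "cpu"   # assumed default
    device_index: int = 0


@dataclass
class DelegateBlob:
    compile_specs: List[CompileSpec]
    input_specs: List[TensorSpec]
    output_specs: List[TensorSpec]


def _get_target_device(blob: DelegateBlob) -> Optional[Tuple[str, int]]:
    # Step (a): the partitioner recorded the target device in a compile spec.
    for spec in blob.compile_specs:
        if spec.key == "device":
            device_type, _, index = spec.value.decode().partition(":")
            return device_type, int(index) if index else 0
    return None


def propagate_device_pass(blob: DelegateBlob) -> None:
    # Step (b): annotate the delegate's input/output TensorSpecs.
    result = _get_target_device(blob)
    if result is None:
        return  # no device spec: leave the defaults untouched
    device_type, device_index = result
    for spec in blob.input_specs + blob.output_specs:
        spec.device = device_type
        spec.device_index = device_index
```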


pytorch-bot bot commented Mar 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18078

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 4708adb with merge base b5ae0b9:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

if lowered_module is None:
continue

result = _get_target_device_from_compile_specs(lowered_module)
Contributor

@digantdesai commented Mar 11, 2026


This effectively assumes that we know the device 'name' AoT. In theory we could have a multi-device delegate; the runtime might then interpret this name differently, which can cause some confusion, e.g. a cuda:0 device on Metal.

I am not sure about using generic names like 'gpu', but I am also not sure about following PyTorch's eager/JIT-style naming convention, where you won't switch devices underneath.

Contributor Author


May I have your suggestions on the ExecuTorch device name?

Currently we set the device name AOT and intentionally decouple our device attribute from the pytorch/pytorch device concept; we created an enum in the etensor schema for all devices we currently support. This way we can support as many devices as we want.

For the situation you mentioned, if another backend like Vulkan needs its own GPU device, it should add a new entry to the enum. We should avoid using generic names like 'gpu'.
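An enum of that shape might look like the sketch below. The names and values are purely illustrative; the actual entries in the etensor schema may differ, and `parse_device` is a hypothetical helper showing how a generic name like 'gpu' gets rejected rather than silently mapped.

```python
# Illustrative device enum; actual names/values in the etensor schema may differ.
from enum import Enum


class DeviceType(Enum):
    CPU = 0
    CUDA = 1
    METAL = 2
    # A backend like Vulkan adds its own entry rather than reusing
    # a generic 'gpu' name:
    VULKAN = 3


def parse_device(name: str) -> DeviceType:
    """Map a serialized device name onto the enum, rejecting unknown names."""
    try:
        return DeviceType[name.upper()]
    except KeyError:
        raise ValueError(f"Unknown device name: {name!r}")
```

Keeping the set closed means a blob annotated for one backend can never be silently reinterpreted as another backend's device.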

Contributor


Multi-device graph serialization will necessitate multiple graphs. We can maybe make an exception for input tensors, but for any intermediate, the runtime needs to know which device it is loading intermediates onto.

Device is fixed at export (AOT). If you want some generic shader-style library where the GPU type is decided lazily, then you will have to use a generic key like gpu.

Gasoonjia added a commit that referenced this pull request Mar 13, 2026
… onto TensorSpecs

Pull Request resolved: #18078

Annotate the delegate's input and output tensors with a specific device type.

The overall pipeline is:
a. The partitioner uses `compile_spec` to determine which device the partitioned blob runs on.
b. After the partitioned graph is lowered to the backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device.
ghstack-source-id: 352045003
@exported-using-ghexport

Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
Gasoonjia added a commit that referenced this pull request Apr 6, 2026
… onto TensorSpecs

Pull Request resolved: #18078

Annotate the delegate's input and output tensors with a specific device type.

The overall pipeline is:
a. The partitioner uses `compile_spec` to determine which device the partitioned blob runs on.
b. After the partitioned graph is lowered to the backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device and the correct device index.
ghstack-source-id: 363318415
@exported-using-ghexport

Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
return device_type, device_index


def _get_lowered_module(
Contributor


nit: this type of util really should be placed in a single spot. There are other things like this in the passes. Let's take it as a follow-up to have Claude search for generic utils like this and centralize them.

device_index: int = 0,
) -> None:
"""Set the device attribute on a TensorSpec."""
spec.device = device_type
Contributor


Are these fields already in the TensorSpec class definition? Are they initialized to just cpu and 0?

for node in graph_module.graph.nodes:
if node.op == "call_function" and node.target == executorch_call_delegate:
lowered_module = _get_lowered_module(graph_module, node)
if lowered_module is None:
Contributor


We should throw here, no?

continue

result = _get_target_device_from_compile_specs(lowered_module)
if result is None:
Contributor


Why does it not return cpu by default?


# Second pass: propagate device through getitem nodes that extract
# individual outputs from a delegate call.
for node in graph_module.graph.nodes:
Contributor


Can we just do one pass? You can look at the users of the delegate node to find the getitem nodes.
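The single-pass suggestion can be sketched like this. It uses minimal stand-in `Node` objects rather than real torch.fx nodes, and the `"executorch_call_delegate"`/`"getitem"` string targets and `meta` dict are simplifications for illustration; the point is that getitem nodes are reachable as direct users of the delegate call, so no second sweep over the graph is needed.

```python
# One-pass device annotation: annotate each delegate call and, via its
# `users` list, the getitem nodes extracting its individual outputs.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    op: str
    target: str
    meta: Dict[str, str] = field(default_factory=dict)
    users: List["Node"] = field(default_factory=list)


def annotate_delegate_and_getitems(nodes: List[Node], device: str) -> None:
    for node in nodes:
        if node.op == "call_function" and node.target == "executorch_call_delegate":
            node.meta["device"] = device
            # Getitem nodes are direct users of the delegate call, so the
            # second graph traversal collapses into this inner loop.
            for user in node.users:
                if user.target == "getitem":
                    user.meta["device"] = device
```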


Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants