Add Perfetto tracing and a test output directory option #2742

Merged
jstone-lucasfilm merged 43 commits into AcademySoftwareFoundation:main from autodesk-forks:ppenenko/perfetto_and_output_dir
Feb 12, 2026

@ppenenko
Contributor

@ppenenko ppenenko commented Jan 8, 2026

Summary

This PR adds a new MaterialXTrace module with optional Perfetto tracing infrastructure, plus a configurable test output directory, laying the groundwork for performance analysis and optimization work.

Background and motivation

MaterialX doesn't currently offer a tracing/instrumentation infrastructure sufficient for the following goals:

  • Profiling shader generation and optimization passes
  • Measuring the impact of proposed optimizations (like those in Shader Generator ND_mix_bsdf optimization #2499), and in particular per-material measurements
  • Correlating MaterialX performance with USD pipeline traces
  • Generating detailed flame graphs for performance analysis

In comparison, OpenUSD supports tracing in the Chrome Tracing format—the original format used by Chrome DevTools, viewable via chrome://tracing. It's JSON-based, and OpenUSD implements serialization without external dependencies.

Perfetto is the successor to Chrome Tracing and uses a more efficient binary, Protobuf-based serialization format. Its web UI at https://ui.perfetto.dev/ can open both Perfetto captures and legacy JSON Chrome Tracing captures. It performs better than chrome://tracing, can open larger captures, and supports more advanced analysis workflows, e.g. SQL queries.

[Image: Perfetto UI showing MaterialX trace]

I've chosen Perfetto for the implementation in this PR mostly because of the format's superior performance. For context, using new scene index workflows with Hydra Storm, it's possible to generate multi-gigabyte Chrome Tracing captures that are too big to load in either profiler UI. The implementation relies on the Perfetto SDK.

At the same time, the design remains open for a potential integration of MaterialX tracing with USD tracing, by abstracting out the tracing implementation behind an interface. Such an integration would make it possible to analyze MaterialX shader gen performance in Hydra Storm holistically, in the same profiler UI session or script.

The Lobe Pruning optimization effort (#2680) is the immediate application for this infrastructure.

Features

1. Perfetto Tracing (MATERIALX_BUILD_TRACING)

  • Dedicated MaterialXTrace module – keeps tracing infrastructure separate from MaterialXCore
  • Zero overhead when disabled – entire module excluded from build when MATERIALX_BUILD_TRACING=OFF
  • Abstract sink interface (Tracing::Sink) – designed for future USD TraceCollector integration
  • Enum-based categories (Tracing::Category) for type safety and efficient dispatch:
    • Render – rendering operations
    • ShaderGen – shader generation
    • Optimize – optimization passes
    • Material – material identity markers
  • Template-based scope guard (Tracing::Scope<Category>) – zero stack overhead for category storage
  • RAII helpers (MX_TRACE_FUNCTION, MX_TRACE_SCOPE) and Dispatcher::ShutdownGuard
  • Instrumentation in shader generation tests (GenShaderUtil.cpp) and render tests (RenderGlsl.cpp) demonstrating per-material trace markers
  • Perfetto SDK fetched via CMake FetchContent (v43.0)
  • CI integration – extended builds (nightly/manual) enable tracing and upload .perfetto-trace artifacts

2. Test Output Directory (outputDirectory in _options.mtlx)

  • Redirects all test artifacts (logs, images, shaders, traces) to a user-specified directory
  • Prevents artifacts from different test runs overwriting each other
  • Prints effective output path at end of test run for easy IDE navigation

3. Runtime Tracing Toggle (enableTracing in _options.mtlx)

  • Enable/disable tracing without rebuilding (requires MATERIALX_BUILD_TRACING=ON at build time)
  • Default is false to avoid overhead when not profiling

Usage

Enable Tracing (Build Time)

cmake -DMATERIALX_BUILD_TRACING=ON ...

Enable Tracing (Runtime)

In resources/Materials/TestSuite/_options.mtlx:

<input name="enableTracing" type="boolean" value="true" />

Configure Output Directory

<input name="outputDirectory" type="string" value="/path/to/test/output" />

Analyze Traces

Open .perfetto-trace files at https://ui.perfetto.dev

For CI builds, download trace artifacts from the GitHub Actions run page (extended builds only).

Related


Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

Add Perfetto tracing infrastructure to MaterialXCore

- Add MATERIALX_BUILD_TRACING CMake option (OFF by default)
- Add abstract MxTraceBackend interface for pluggable tracing backends
- Add MxTraceCollector singleton (similar to USD's TraceCollector)
- Add MxTraceScope RAII helper and MX_TRACE_* macros
- Add MxPerfettoBackend implementation using Perfetto SDK
- Initialize/shutdown Perfetto in MaterialXTest Main.cpp
- Add trace instrumentation to RenderGlsl test

The abstract interface allows USD/Hydra to inject their own tracing
backend when calling MaterialX, enabling unified trace visualization.

Usage:
  cmake -DMATERIALX_BUILD_TRACING=ON ...
  # Run tests, then open materialx_test_trace.perfetto-trace at ui.perfetto.dev

Exclude MxTrace.cpp when tracing disabled (zero overhead)

Fix Perfetto SDK build on Windows

- Add NOMINMAX and WIN32_LEAN_AND_MEAN to prevent min/max macro conflicts
- Add /bigobj flag for perfetto.cc (too many sections for MSVC default)
- Use BUILD_INTERFACE generator expression for include path
- Fix MxTracePerfetto.cpp: add missing <fstream> include and simplify
  Perfetto API usage (use fixed category with dynamic event names)

Improve tracing API: constexpr categories, material names, timestamps

- Replace macro category definitions with constexpr namespace constants
  (MxTraceCategory::Render, ::ShaderGen, ::Optimize, ::Material)
- Keep legacy macros for backward compatibility
- Fix MX_TRACE_SCOPE macro to properly expand __LINE__ via helper macros
- Add MX_TRACE_CAT_MATERIAL category for material/shader identity markers
- Add material name trace scope in RenderGlsl for per-material tracking
- Add timestamp to trace output filename (format: YYYYMMDD_HHMMSS)
- Use MX_TRACE_FUNCTION macro for automatic function name extraction
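
The `__LINE__` fix mentioned above relies on the usual two-level token-pasting idiom. The sketch below is only an illustration of that idiom, not the actual MaterialX macro: `DemoScope`, `DEMO_CONCAT`, `DEMO_CONCAT_IMPL` and `DEMO_TRACE_SCOPE` are hypothetical names, and the "trace" is just a vector of strings.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a trace scope: records begin/end events in a log.
struct DemoScope
{
    DemoScope(std::vector<std::string>& log, const std::string& name) :
        _log(log), _name(name)
    {
        _log.push_back("begin " + _name);
    }
    ~DemoScope() { _log.push_back("end " + _name); }

    std::vector<std::string>& _log;
    std::string _name;
};

// Operands of ## are not macro-expanded, so pasting __LINE__ directly would
// produce the literal token `demoScope___LINE__`. The extra level of
// indirection forces __LINE__ to expand to a number before pasting.
#define DEMO_CONCAT_IMPL(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_IMPL(a, b)
#define DEMO_TRACE_SCOPE(log, name) DemoScope DEMO_CONCAT(demoScope_, __LINE__)(log, name)

// Two scopes in the same block compile only because each variable gets a
// distinct, line-based name.
inline std::vector<std::string> runNestedScopes()
{
    std::vector<std::string> log;
    {
        DEMO_TRACE_SCOPE(log, "outer");
        DEMO_TRACE_SCOPE(log, "inner");
    }
    return log;
}
```

Calling `runNestedScopes()` yields begin/end events that nest in reverse order of construction, which is the property a flame graph depends on.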

Skip Perfetto tracing during test discovery mode

The Catch2 Test Adapter for Visual Studio runs MaterialXTest.exe with
--list-test-names-only to discover tests. Perfetto initialization was
outputting to stderr, causing the adapter to fail with exit code 89.

Now we detect --list-* arguments and skip tracing initialization,
allowing VS Test Explorer to properly discover and debug tests.
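
A check of this kind could look roughly like the following. This is a sketch under assumptions: `isTestDiscoveryRun` is a hypothetical name, and the real code in Main.cpp may inspect `argv` differently.

```cpp
#include <string>
#include <vector>

// Returns true if any argument starts with "--list-", e.g. the
// "--list-test-names-only" flag used by the Visual Studio test adapter.
inline bool isTestDiscoveryRun(const std::vector<std::string>& args)
{
    const std::string prefix = "--list-";
    for (const std::string& arg : args)
    {
        // compare() on the first prefix.size() characters acts as starts_with.
        if (arg.compare(0, prefix.size(), prefix) == 0)
        {
            return true;
        }
    }
    return false;
}
```

A caller would build the vector from `argv` and skip tracing setup when the function returns true.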

Add outputDirectory option for test artifact redirection

Add optional outputDirectory setting to _options.mtlx that redirects all
test artifacts (logs, shaders, images, traces) to a user-specified directory.
This allows different test runs to be isolated without overwriting each other.

Changes:
- _options.mtlx: Add outputDirectory input (empty = default behavior)
- TestSuiteOptions: Add outputDirectory member and resolveOutputPath() helpers
- GenShaderUtil.cpp: Use outputDirectory for shader generation logs and dumps
- RenderUtil.cpp: Use outputDirectory for render logs, images, and traces
- Main.cpp: Remove tracing code (moved to RenderUtil.cpp)

Tracing improvements:
- Move Perfetto init/shutdown to ShaderRenderTester::validate()
- Each render test now produces its own trace file (e.g., genglsl_render_trace.perfetto-trace)
- Trace filenames now follow the same pattern as log files
- Removed timestamps from trace filenames for consistency

Usage:
  <input name="outputDirectory" type="string" value="C:/test_results/my_run" />
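
To make the "empty = default behavior" rule concrete, here is a minimal sketch of a path-resolution helper. It is an assumption about the behavior, not the actual `resolveOutputPath()` implementation: the name `resolveOutputPathSketch`, its signature, and the choice to keep only the file name are all hypothetical.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumed behavior: an empty outputDirectory leaves the default path
// untouched; otherwise the file name is re-rooted under outputDirectory.
inline fs::path resolveOutputPathSketch(const fs::path& outputDirectory,
                                        const fs::path& defaultPath)
{
    if (outputDirectory.empty())
    {
        return defaultPath;
    }
    return outputDirectory / defaultPath.filename();
}
```

With this sketch, a trace named `genglsl_render_trace.perfetto-trace` would land directly inside the configured directory, so two runs with different directories cannot overwrite each other.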

Print output directory at end of test run for easy access

When outputDirectory is set, print the path at the end of test stdout.
This makes it clickable in terminals like Cursor/VS Code for quick access.

Improve Perfetto tracing integration and suppress SDK warnings

- Suppress MSVC warnings from Perfetto SDK templates (C4127, C4146, C4369)
- Replace MX_TRACE_CAT_* macros with 'namespace cat = mx::MxTraceCategory'
  for cleaner code and better IDE support
- Update RenderGlsl.cpp to use the new category alias pattern
@ppenenko ppenenko force-pushed the ppenenko/perfetto_and_output_dir branch from 36ae774 to fb056e3 Compare January 8, 2026 21:52
Rename tracing classes to follow MaterialX conventions

Rename files and classes to better align with MaterialX naming conventions
and improve API clarity:

Files:
- MxTrace.h/cpp -> Tracing.h/cpp
- MxTracePerfetto.h/cpp -> PerfettoSink.h/cpp

Classes (now in MaterialX::Tracing namespace):
- MxTraceBackend -> Tracing::Sink
- MxTraceCollector -> Tracing::Dispatcher
- MxTraceScope -> Tracing::Scope
- MxTraceCategory -> Tracing::Category
- MxPerfettoBackend -> Tracing::PerfettoSink

Rationale:
- 'Sink' is common terminology in logging/tracing for data destinations
- 'Dispatcher' better describes the routing behavior (vs 'Collector')
- Nested Tracing:: namespace groups related types cleanly
- File names match primary class names (PerfettoSink.h)
- Macros keep MX_ prefix for collision safety

Usage example:
  namespace trace = mx::Tracing;
  auto sink = trace::PerfettoSink::create();
  trace::Dispatcher::getInstance().setSink(sink);
  MX_TRACE_SCOPE(trace::Category::Render, "MyEvent");
- Explain amalgamated source compilation is Google's official recommended
  approach per https://perfetto.dev/docs/instrumentation/tracing-sdk
- Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC)
Dispatcher:
- Takes ownership of sink via unique_ptr (setSink)
- Explicit shutdownSink() destroys sink and writes output
- Asserts on double-set to catch programming errors

PerfettoSink:
- Constructor takes output path, starts session immediately
- Destructor writes trace file (true RAII)
- Uses std::call_once for thread-safe global Perfetto init
- Removed pimpl - simpler since only setup code includes this header

Scope:
- Removed _enabled caching - Dispatcher checks internally
- One less bool on the stack per trace scope

Usage:
- Added TracingGuard scope guard in RenderUtil.cpp for exception safety
- Added const to immutable members (_outputPath, _category)

The design now follows proper RAII patterns with clear ownership.
Move the scope guard pattern into the Dispatcher class for reusability.
Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of
defining their own local struct.
- New 'enableTracing' boolean option (default: false)
- Parsed in TestSuiteOptions::readOptions()
- RenderUtil.cpp conditionally initializes PerfettoSink based on this option
- Uses std::optional<ShutdownGuard> for conditional scope guard
- Allows profiling to be enabled/disabled without rebuilding
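
The ownership and guard pattern described in the commits above can be sketched as follows. Every type here (`DemoSink`, `DemoMemorySink`, `DemoDispatcher`) is a hypothetical stand-in rather than the real MaterialXTrace API, and the sink flushes to an in-memory vector instead of writing a `.perfetto-trace` file.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink interface; the real Tracing::Sink may differ.
class DemoSink
{
  public:
    virtual ~DemoSink() = default;
    virtual void beginEvent(const std::string& name) = 0;
    virtual void endEvent() = 0;
};

// Flushes its events on destruction, mimicking a sink whose destructor
// writes the trace output.
class DemoMemorySink : public DemoSink
{
  public:
    explicit DemoMemorySink(std::vector<std::string>& output) : _output(output) {}
    ~DemoMemorySink() override { _output.insert(_output.end(), _events.begin(), _events.end()); }
    void beginEvent(const std::string& name) override { _events.push_back("B:" + name); }
    void endEvent() override { _events.push_back("E"); }

  private:
    std::vector<std::string>& _output;
    std::vector<std::string> _events;
};

class DemoDispatcher
{
  public:
    static DemoDispatcher& getInstance()
    {
        static DemoDispatcher instance;
        return instance;
    }

    // Takes ownership; setting a sink twice is treated as a programming error.
    void setSink(std::unique_ptr<DemoSink> sink)
    {
        assert(!_sink && "sink already set");
        _sink = std::move(sink);
    }

    // Destroys the sink, which triggers its flush.
    void shutdownSink() { _sink.reset(); }

    bool isEnabled() const { return _sink != nullptr; }
    void beginEvent(const std::string& name) { if (_sink) _sink->beginEvent(name); }
    void endEvent() { if (_sink) _sink->endEvent(); }

    // Calls shutdownSink() on scope exit, including during stack unwinding.
    struct ShutdownGuard
    {
        ShutdownGuard() = default;
        ShutdownGuard(const ShutdownGuard&) = delete;
        ShutdownGuard& operator=(const ShutdownGuard&) = delete;
        ~ShutdownGuard() { DemoDispatcher::getInstance().shutdownSink(); }
    };

  private:
    std::unique_ptr<DemoSink> _sink;
};

// Runs one traced "test"; tracing is only set up when enableTracing is true.
inline std::vector<std::string> runWithOptionalTracing(bool enableTracing)
{
    std::vector<std::string> written;
    {
        std::optional<DemoDispatcher::ShutdownGuard> guard;
        if (enableTracing)
        {
            DemoDispatcher::getInstance().setSink(std::make_unique<DemoMemorySink>(written));
            guard.emplace();
        }
        DemoDispatcher::getInstance().beginEvent("render");
        DemoDispatcher::getInstance().endEvent();
    }
    return written;
}
```

The `std::optional` keeps the guard's lifetime tied to the enclosing scope while leaving it unconstructed when tracing is off, so the disabled path never touches the sink.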
- Replace namespace Category with constexpr strings -> enum class Category
- Template-based Scope<Category> avoids storing category on stack
- PerfettoSink uses switch statement (compiler optimizes to jump table)
- Removed categoryMatches() string comparison overhead
- Type-safe: can only pass valid Category enum values

The enum-based design is simpler, more efficient, and equally
compatible with a future USD TraceCollector sink (which uses
integer category IDs, not strings).
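
As a rough illustration of the enum-plus-template design, the sketch below encodes the category in the scope's type and maps it to a label with a switch. The names `DemoCategory`, `categoryLabel` and `DemoCategoryScope`, the label strings, and the string-vector "trace" are all assumptions made for the example; the real `Tracing::Scope` forwards to the Dispatcher instead.

```cpp
#include <string>
#include <vector>

enum class DemoCategory { Render, ShaderGen, Optimize, Material };

// Maps a category to a label with a switch, as a sink might do when
// forwarding events to its backend.
inline const char* categoryLabel(DemoCategory category)
{
    switch (category)
    {
        case DemoCategory::Render:    return "render";
        case DemoCategory::ShaderGen: return "shadergen";
        case DemoCategory::Optimize:  return "optimize";
        case DemoCategory::Material:  return "material";
    }
    return "unknown";
}

// The category is a template parameter, so it is part of the type and is
// not stored as a data member of the scope object.
template <DemoCategory C>
class DemoCategoryScope
{
  public:
    DemoCategoryScope(std::vector<std::string>& log, const std::string& name) : _log(log)
    {
        _log.push_back(std::string(categoryLabel(C)) + ":" + name);
    }
    ~DemoCategoryScope() { _log.push_back(std::string(categoryLabel(C)) + ":end"); }

  private:
    std::vector<std::string>& _log;
};

// Emits a material marker with a nested shader generation scope.
inline std::vector<std::string> traceOneMaterial()
{
    std::vector<std::string> log;
    {
        DemoCategoryScope<DemoCategory::Material> material(log, "example_material");
        DemoCategoryScope<DemoCategory::ShaderGen> gen(log, "generate");
    }
    return log;
}
```

Passing anything other than a `DemoCategory` enumerator as the template argument fails to compile, which is the type-safety benefit the commit message refers to.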
@ppenenko ppenenko marked this pull request as ready for review January 12, 2026 20:44
Comment thread source/MaterialXCore/PerfettoSink.cpp Outdated
@jstone-lucasfilm
Member

@ppenenko I have a few additional questions, as I'd like to better understand the advantages of the approach that you're proposing.

Historically, we've used sampling profilers for performance analysis of the MaterialX project, and here's an example profile of the MaterialX Viewer using the built-in sampling profiler in Visual Studio:

[Image: MaterialXView_SamplingProfiler]

Can you give a sense of the advantages provided by the instrumented profiling in your PR, in order to better understand the benefits that might be associated with this additional infrastructure in MaterialX?

And when profiling the results of shader generation in hardware shading languages (e.g. GLSL, MSL, Slang), I would think of GPU profilers such as RenderDoc as the natural approach for recording and comparing performance data.

Is your proposed approach in this PR intended as a replacement for this form of traditional GPU profiling, or is it purely focused on CPU profiling? If it supports GPU profiling, what are its benefits over RenderDoc?

@ashwinbhat
Contributor

ashwinbhat commented Jan 14, 2026

The current Catch2-based benchmark test in MaterialX is very simple, and a more robust tracing system that integrates well with the libraries and applications that use MaterialX would be valuable. For example, when MaterialX is integrated into USD, we can enable both USDTrace and MaterialX tracing to get captures with a more detailed view.

@jstone-lucasfilm would this PR be more acceptable if we created a new MaterialXTrace module that brings in Perfetto? This would be similar to USD tracing, but simpler. Here is some additional info about USD Trace: https://github.com/PixarAnimationStudios/OpenUSD/tree/dev/pxr/base/trace

@jstone-lucasfilm
Member

@ashwinbhat For adding an instrumented trace concept to MaterialX, I agree a MaterialXTrace module would be better than adding new source code to MaterialXCore, and I'm open to this approach.

I'd still like to learn more, though, about the advantages offered by instrumented profilers such as Perfetto over the existing sampled profilers we've used previously in MaterialX performance evaluation, which don't require additional code to be inserted into applications at all. So far, I haven't encountered cases where the precision of sampled profilers was insufficient to fully understand the impact of a proposed optimization or codebase change, and I'd love to learn more about the cases that other teams have run into where sampling profilers were not up to the task.

Also, I'd like to better understand the relationship of Perfetto to GPU profiling, which is just as important (or perhaps more important) in evaluating the impact of a ShaderGen change on the real-time rendering performance of MaterialX content.

@ppenenko
Contributor Author

@jstone-lucasfilm I agree that sampling profilers are of course useful tools in their own right, but instrumentation adds a different useful perspective. Some advantages:

  1. Most importantly, we can instrument with arbitrary strings that matter to us, e.g. material and shader names, not just C++ symbols.
  2. Cross-platform, while sampling profilers are typically not. Same trace format recorded on any platform.
  3. Sampling profilers can miss, under-represent or over-represent events due to temporal aliasing.

@ppenenko
Contributor Author

Some benefits of Perfetto specifically:

  1. Captures can be analyzed in a web browser on any platform, without the need to install any specialized tools. Thus, easy to share with anyone.
  2. Rich analysis functionality with SQL and a Python API.
  3. Potential for IPC backends.
  4. Overall, feature-rich and battle-tested by Chrome.

@ppenenko
Contributor Author

ppenenko commented Jan 14, 2026

Good point about CPU vs. GPU traces @jstone-lucasfilm . AFAIK Perfetto doesn't offer facilities for GPU profiling out of the box, but the application itself can perform GPU perf queries and report the results via Perfetto. In fact, this is what I'm working on in another branch. At the same time, CPU and GPU events can be analyzed in the same UI in Perfetto, which is convenient. Besides, MaterialX has no CPU-side tracing system anyway, so integrating Perfetto will help many more CPU performance experiments in the future.

Of course, RenderDoc would offer a much more detailed picture of a GPU frame, but it's designed for profiling at a lower level. Its captures would contain GPU resources, which would have a much higher overhead (capture file size) than the per-material measurements I'm adding with Perfetto. So, I would suggest identifying the materials that are slow to render or benefit from a certain optimization by analyzing the existing render test across 180+ materials with Perfetto (I can already confirm that this is working well) and then, if necessary, profiling specific materials in RenderDoc, Nsight etc.

By the way, if we wanted to see material and shader names in RenderDoc captures, we'd have to instrument our code either way. So the instrumentation macros I'm proposing could have a RenderDoc implementation in the future.

@ppenenko
Contributor Author

Great idea about MaterialXTrace @ashwinbhat - it does feel better encapsulated. It would be a very small library though. Anyway, I'll refactor this aspect.

@jstone-lucasfilm
Member

@ppenenko @ashwinbhat Pivoting for a moment, let's focus on the "steelman" arguments for why a MaterialXTrace module providing instrumented build support might be a really good idea.

Would MaterialXTrace allow us to integrate an instrumented run of MaterialXTest into the GitHub CI for MaterialX, with the instrumented build support providing consistent profile results from run to run, independent of the actual time each run of MaterialXTest took on the GitHub CI virtual machine?

That would be a really compelling argument, if true, and it's something that a sampling profiler couldn't hope to provide, as the distribution of time samples is profoundly dependent on the performance of the virtual machines on which our GitHub CI workflows are run.

@ppenenko
Contributor Author

ppenenko commented Jan 14, 2026

@jstone-lucasfilm Perfetto tracing is completely orthogonal to how a VM's configuration and load affect timings - it just records wall-clock times, so unfortunately it wouldn't improve on that. When testing on the same machine and avoiding heavy parallel loads (e.g. running Maya in parallel), the timings should be reliable. E.g. I can measure the expected speedups of @ld-kerley 's #2499 reliably.

But of course, we could record traces in CI/CD. At the very least, these would give us a kind of rich log, with meaningful events stacked and grouped by thread. We can be sure, for example, that this or that shader graph optimization pass has been applied to the given material. And optimization passes such as #2499 should show similar relative improvements across different configurations.

A related use case would be if a user experiences performance issues with MaterialX - they can easily record a capture and share it with the maintainers. And, as I mentioned, it could one day be a USD trace including MaterialX events.

@jstone-lucasfilm
Member

It does sound interesting to integrate MaterialXTrace into our GitHub CI in the future, even if it doesn't produce perfectly consistent results between runs.

It seems slightly surprising that you would want to use Perfetto to measure GPU timing improvements from shader generation optimizations, e.g. the hardware shader optimizations found in #2499.

Am I wrong to think that this category of optimization would normally be evaluated with either RenderDoc or a real-time, smoothed frame timer like the one we use in MaterialXView?

@ppenenko
Contributor Author

ppenenko commented Jan 15, 2026

@jstone-lucasfilm could we please separate the concerns in this code review a bit?

First of all, can we all agree that a tracing mechanism would be an essential addition to MaterialX? I stated the motivations above, and the most immediate and straightforward one is to measure how long a particular operation took for a particular MaterialX entity - material, node, shader - which is not identified uniquely by a C++ symbol, and therefore can't be captured by a sampling profiler. Let's set GPU timings aside for a moment, because there are a few types of CPU events that we care about in the context of shader graph optimizations: shader codegen timings and shader compilation timings. So even if we disagree on GPU profiling, I would argue that such a mechanism is a must-have. Chrome and USD corroborate this argument.

If we do agree on the above, let's discuss if Perfetto is the right implementation choice. I would argue that it is, due to its exceptional maturity and widespread adoption.

As for GPU instrumentation - my main argument is that if we have a tracing mechanism anyway, for all the other reasons, recording GPU timings there is simple and convenient. Taking the frame duration with OpenGL queries is about a dozen lines of C++, and then the result can be recorded via Perfetto.

As for RenderDoc - it's not well-suited for recording the durations of hundreds of frames because:

  1. Its number one use case is debugging and profiling a single frame.
  2. It records complete graphics API (e.g. OpenGL) call traces and GPU resources, for every frame - something we don't need for our kind of test and something that would amount to huge amounts of data across hundreds of frames.
  3. Its main use case is frame debugging, not frame profiling. For profiling, GPU hardware-specific tools (Nsight etc.) are more useful but they're even less cross-platform.
  4. GPU frame debuggers/profilers in general are known for compatibility issues across platforms, graphics APIs and application usage patterns.
  5. For measuring the duration of a frame or a draw call, RenderDoc would use the same graphics API calls that we can use - it doesn't have special hooks into the driver, unlike HW vendor tools. So our measurements can be as precise.

Now, regarding MaterialXView.

  1. It can only measure one material at a time.
  2. For some reason, the reported frame duration is truncated to integer milliseconds. E.g. 1.9f would be truncated to 1.
  3. It doesn't measure GPU durations. It also doesn't seem to do glFinish or glFlush, so even CPU frame durations are questionable. The only CPU/GPU synchronization it seems to have is spamming the video driver with frames until it starts blocking.
  4. It does do averaging, but we don't have to do it at run time. With a recorded trace, we can examine multiple events per material after the fact, either in a GUI or with Python. Perfetto UI can do all sorts of statistic analyses, and a Python script can bring the data into Pandas and Matplotlib. Having the complete data offline makes it possible to analyze distributions as well - e.g. when the first frame per material is much longer due to shader compilation.
  5. Running an interactive app and jotting down a single millisecond value is less reproducible and shareable than running a test suite that records timings with nanosecond precision. The trace file can be shared with others, and the test suite can be run by others.

Use recursive glob build/**/*.perfetto-trace since traces are written
to current working directory (build/) not build/bin/
- Set up Perfetto sink in ShaderGeneratorTester::validate() when enableTracing=true
- Add MX_TRACE_SCOPE around generateCode() calls with element names
- Generates <target>_gen_trace.perfetto-trace files (e.g. genglsl_gen_trace.perfetto-trace)
- These tests run on all CI platforms, providing trace artifacts for download
Comment thread .github/workflows/main.yml
Comment thread source/MaterialXTrace/Tracing.h
@ppenenko
Contributor Author

Refactoring done - MaterialXTrace is a separate module. GLSL render tests aren't enabled in CI/CD, so I added instrumentation in codegen tests. An extended build with trace artifacts can be found at https://github.com/autodesk-forks/MaterialX/actions/runs/21080475833/job/60632726674 (a Windows step is still running, taking a while to install the MDL SDK, but otherwise it looks good).

@ppenenko
Contributor Author

FYI @jstone-lucasfilm https://github.com/autodesk-forks/MaterialX/actions/runs/21080475833/ is green except for JavaScript which always fails on forks, and the traces are downloadable as artifacts.

ppenenko and others added 3 commits January 22, 2026 16:14
When outputDirectory is specified and shaderInterfaces=REDUCED, the render
tests were failing to create the parent material directory before attempting
to create the /reduced subdirectory, causing shader dumps to fail.

The fix ensures createDirectory() is called on the parent outputPath first,
then on outputPath/reduced if in REDUCED mode. Both calls are now within
the same ScopedTimer scope for consistent profiling.

Fixed in all four render test implementations:
- RenderGlsl.cpp
- RenderOsl.cpp
- RenderMsl.mm
- RenderSlang.cpp
Member

@jstone-lucasfilm jstone-lucasfilm left a comment

Following up on our last MaterialX TSC meeting, it's clear that we should move forward with this new work, so I've started writing up more detailed recommendations for improving it.

Comment thread source/MaterialXTest/MaterialXGenShader/GenShaderUtil.h
Comment thread source/MaterialXTest/MaterialXRenderGlsl/RenderGlsl.cpp Outdated
Comment thread CMakeLists.txt Outdated
Comment thread .github/workflows/main.yml Outdated
Address review: remove `using Cat` alias and spell out the full
mx::Tracing::Category names for clarity while the Tracing library
is still new.
Address review: use a Perfetto-specific flag name to leave a clear
path for future extensions to Tracy and other profiling backends.
Address review: instead of enabling tracing in all extended builds,
add a matrix.extended_build_perfetto flag and set it on one
non-Windows build to produce a single clear tracing artifact.
Only attempt to upload trace artifacts from the build that actually
has Perfetto enabled, avoiding no-op upload steps in other builds.
Comment thread .github/workflows/main.yml
Member

@jstone-lucasfilm jstone-lucasfilm left a comment

This looks great to me, @ppenenko, and thanks for the contribution to MaterialX! Once we start seeing the new tracing artifacts in nightly builds, we can augment this with additional improvements.

@jstone-lucasfilm jstone-lucasfilm merged commit 0a1b924 into AcademySoftwareFoundation:main Feb 12, 2026
33 checks passed
@ppenenko ppenenko deleted the ppenenko/perfetto_and_output_dir branch February 12, 2026 22:16
ppenenko added a commit to ppenenko/MaterialX that referenced this pull request Feb 13, 2026
- Add GpuTimerQuery helper class and getCurrentTimeNs() for GPU timing
  via GL_TIME_ELAPSED queries (guarded by MATERIALX_BUILD_TRACING)
- Add multi-frame render loop using framesPerMaterial for statistical
  validity, emitting MX_TRACE_ASYNC events on the GPU track
- Replace stale Cat:: alias with mx::Tracing::Category:: (the alias
  was removed in PR AcademySoftwareFoundation#2742)
ppenenko added a commit to ppenenko/MaterialX that referenced this pull request Feb 13, 2026
The CMake flag was renamed from MATERIALX_BUILD_TRACING to
MATERIALX_BUILD_PERFETTO_TRACING in PR AcademySoftwareFoundation#2742. Update all #ifdef
guards in RenderGlsl.cpp to match.
ppenenko added a commit to autodesk-forks/MaterialX that referenced this pull request Feb 13, 2026
The previous commit accidentally set enableTracing to false, which
would prevent CI from producing Perfetto traces. Restore to true
to match the behavior established in PR AcademySoftwareFoundation#2742.
ppenenko added a commit to ppenenko/MaterialX that referenced this pull request Feb 19, 2026
The applyOptimizationFlags lambda in getGenerationOptions() was lost
when main (PR AcademySoftwareFoundation#2742) was merged into proto2 on Feb 13. This silently
left optEarlyPruning stuck at false regardless of _options.mtlx.
Also adds the missing optEarlyPruning XML input to _options.mtlx.

Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>
ppenenko added a commit to autodesk-forks/MaterialX that referenced this pull request Mar 31, 2026
* add additional stage to optimize combination of 3 bsdfs that have been mixed with 2 nodes.

Add possible refactor of gltf_pbr that allows it to match the existing optimization rule

Add checkbox to MaterialXView to control bsdf mix optimization

add GenOptions option to control the optimization

remove trailing underscores from node names - because these still make variables with double underscores.

Fix asan crash.

remove debug print

remove double underscores from node names - appears to be illegal.

Refactor OpenPBR nodegraph optimization into a ShaderGenerator optimization step for ND_mix_bsdf.

Replace the mix with an add and modify the upstream weight inputs accordingly to perform the same math as a mix.  This allows the BSDF nodes to early out if their contribution is not necessary in the mix.
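
The arithmetic behind the mix-to-add rewrite can be checked with a toy model. The sketch assumes the mix blend is `fg * m + bg * (1 - m)` and that a BSDF's response scales linearly with its weight input. `ToyBsdf`, `evalMix` and `evalPremultipliedAdd` are hypothetical names, and scalars stand in for real closures.

```cpp
#include <cmath>

// Toy BSDF: the response is assumed to scale linearly with its weight, and
// evaluation is skipped entirely when the weight is zero.
struct ToyBsdf
{
    float weight;
    float base;

    float eval(int& evalCount) const
    {
        if (weight == 0.0f)
        {
            return 0.0f; // early out: no contribution
        }
        ++evalCount;
        return weight * base;
    }
};

// Original graph: mix(bg, fg, m) = fg * m + bg * (1 - m).
// Both inputs are always evaluated, even when m is 0 or 1.
inline float evalMix(const ToyBsdf& bg, const ToyBsdf& fg, float m, int& evalCount)
{
    return fg.eval(evalCount) * m + bg.eval(evalCount) * (1.0f - m);
}

// Rewritten graph: fold the mix factor into the upstream weights, then add.
inline float evalPremultipliedAdd(const ToyBsdf& bg, const ToyBsdf& fg, float m, int& evalCount)
{
    const ToyBsdf fgScaled{ fg.weight * m, fg.base };
    const ToyBsdf bgScaled{ bg.weight * (1.0f - m), bg.base };
    return fgScaled.eval(evalCount) + bgScaled.eval(evalCount);
}
```

Both functions return the same value for any `m`, but with `m == 0` the rewritten form evaluates only the background lobe, which is the early-out opportunity the commit message describes.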

* Disconnect outputs when removing a node

* Revert other attempted fixes

* Fix dangling node crash by iterating nodes by name

* Prototype of a GraphViz writer

* Fix a missing include

* Hook up GraphViz writing before and after the optimization

* Stub out optimization pass infrastructure

* Simplify pass enablement

* Specific types instead of `auto`

* Add gen option `optMaxPassIterations`

* Hook up the dot dumping option to the viewer GUI

* Add optimizeMixMixBsdf to PremultipliedAddPass for cascaded mix nodes

- Add optimizeMixMixBsdf() to handle pattern: mix(mix(A,B), C)
  where A, B, C are primitive BSDFs with weight inputs
- Refactor shared helpers into anonymous namespace:
  - redirectInput(), connectNodes(), replaceMixWithAdd()
  - PremultNodeDefs struct and getPremultNodeDefs()
- PremultipliedAddPass::run() now tries MixMix first, falls back to simple Mix
- Enables optimization for standard_surface and similar hierarchical materials

* Add Perfetto tracing infrastructure to MaterialXCore

- Add MATERIALX_BUILD_TRACING CMake option (OFF by default)
- Add abstract MxTraceBackend interface for pluggable tracing backends
- Add MxTraceCollector singleton (similar to USD's TraceCollector)
- Add MxTraceScope RAII helper and MX_TRACE_* macros
- Add MxPerfettoBackend implementation using Perfetto SDK
- Initialize/shutdown Perfetto in MaterialXTest Main.cpp
- Add trace instrumentation to RenderGlsl test

The abstract interface allows USD/Hydra to inject their own tracing
backend when calling MaterialX, enabling unified trace visualization.

Usage:
  cmake -DMATERIALX_BUILD_TRACING=ON ...
  # Run tests, then open materialx_test_trace.perfetto-trace at ui.perfetto.dev

* Exclude MxTrace.cpp when tracing disabled (zero overhead)

* Fix Perfetto SDK build on Windows

- Add NOMINMAX and WIN32_LEAN_AND_MEAN to prevent min/max macro conflicts
- Add /bigobj flag for perfetto.cc (too many sections for MSVC default)
- Use BUILD_INTERFACE generator expression for include path
- Fix MxTracePerfetto.cpp: add missing <fstream> include and simplify
  Perfetto API usage (use fixed category with dynamic event names)

* Improve tracing API: constexpr categories, material names, timestamps

- Replace macro category definitions with constexpr namespace constants
  (MxTraceCategory::Render, ::ShaderGen, ::Optimize, ::Material)
- Keep legacy macros for backward compatibility
- Fix MX_TRACE_SCOPE macro to properly expand __LINE__ via helper macros
- Add MX_TRACE_CAT_MATERIAL category for material/shader identity markers
- Add material name trace scope in RenderGlsl for per-material tracking
- Add timestamp to trace output filename (format: YYYYMMDD_HHMMSS)
- Use MX_TRACE_FUNCTION macro for automatic function name extraction

* Skip Perfetto tracing during test discovery mode

The Catch2 Test Adapter for Visual Studio runs MaterialXTest.exe with
--list-test-names-only to discover tests. Perfetto initialization was
outputting to stderr, causing the adapter to fail with exit code 89.

Now we detect --list-* arguments and skip tracing initialization,
allowing VS Test Explorer to properly discover and debug tests.
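
A minimal sketch of the --list-* detection described above. The function name isTestDiscoveryRun is an assumption made for illustration; the actual check in the test runner may be structured differently.

```cpp
#include <cassert>
#include <cstring>

// Returns true when any command-line argument starts with "--list-",
// such as the --list-test-names-only flag used for test discovery.
inline bool isTestDiscoveryRun(int argc, const char* const* argv)
{
    assert(argc >= 0);
    static const char prefix[] = "--list-";
    for (int i = 1; i < argc; ++i) // argv[0] is the executable name.
    {
        if (argv[i] && std::strncmp(argv[i], prefix, sizeof(prefix) - 1) == 0)
        {
            return true;
        }
    }
    return false;
}
```

A caller would skip tracing initialization, and anything else that writes to stderr, whenever this returns true.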

* Add outputDirectory option for test artifact redirection

Add optional outputDirectory setting to _options.mtlx that redirects all
test artifacts (logs, shaders, images, traces) to a user-specified directory.
This allows different test runs to be isolated without overwriting each other.

Changes:
- _options.mtlx: Add outputDirectory input (empty = default behavior)
- TestSuiteOptions: Add outputDirectory member and resolveOutputPath() helpers
- GenShaderUtil.cpp: Use outputDirectory for shader generation logs and dumps
- RenderUtil.cpp: Use outputDirectory for render logs, images, and traces
- Main.cpp: Remove tracing code (moved to RenderUtil.cpp)
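
To illustrate the "empty = default behavior" rule, here is a rough sketch of an output-path helper. The name resolveOutputPathSketch, its signature, and the choice to keep only the file name are assumptions; the real resolveOutputPath() helpers may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper. With an empty outputDirectory the default path is
// returned unchanged; otherwise the file name of the default path is placed
// under outputDirectory.
inline std::string resolveOutputPathSketch(const std::string& outputDirectory,
                                           const std::string& defaultPath)
{
    if (outputDirectory.empty())
    {
        return defaultPath;
    }
    const std::size_t slash = defaultPath.find_last_of("/\\");
    const std::string fileName =
        (slash == std::string::npos) ? defaultPath : defaultPath.substr(slash + 1);
    return outputDirectory + "/" + fileName;
}
```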

Tracing improvements:
- Move Perfetto init/shutdown to ShaderRenderTester::validate()
- Each render test now produces its own trace file (e.g., genglsl_render_trace.perfetto-trace)
- Trace filenames now follow the same pattern as log files
- Removed timestamps from trace filenames for consistency

Usage:
  <input name="outputDirectory" type="string" value="C:/test_results/my_run" />

* Print output directory at end of test run for easy access

When outputDirectory is set, print the path at the end of test stdout.
This makes it clickable in terminals like Cursor/VS Code for quick access.

* Improve Perfetto tracing integration and suppress SDK warnings

- Suppress MSVC warnings from Perfetto SDK templates (C4127, C4146, C4369)
- Replace MX_TRACE_CAT_* macros with 'namespace cat = mx::MxTraceCategory'
  for cleaner code and better IDE support
- Update RenderGlsl.cpp to use the new category alias pattern

* Rename tracing classes to follow MaterialX conventions

Rename files and classes to better align with MaterialX naming conventions
and improve API clarity:

Files:
- MxTrace.h/cpp -> Tracing.h/cpp
- MxTracePerfetto.h/cpp -> PerfettoSink.h/cpp

Classes (now in MaterialX::Tracing namespace):
- MxTraceBackend -> Tracing::Sink
- MxTraceCollector -> Tracing::Dispatcher
- MxTraceScope -> Tracing::Scope
- MxTraceCategory -> Tracing::Category
- MxPerfettoBackend -> Tracing::PerfettoSink

Rationale:
- 'Sink' is common terminology in logging/tracing for data destinations
- 'Dispatcher' better describes the routing behavior (vs 'Collector')
- Nested Tracing:: namespace groups related types cleanly
- File names match primary class names (PerfettoSink.h)
- Macros keep MX_ prefix for collision safety

Usage example:
  namespace trace = mx::Tracing;
  auto sink = trace::PerfettoSink::create();
  trace::Dispatcher::getInstance().setSink(sink);
  MX_TRACE_SCOPE(trace::Category::Render, "MyEvent");

* Add clarifying comments for Perfetto SDK integration

- Explain amalgamated source compilation is Google's official recommended
  approach per https://perfetto.dev/docs/instrumentation/tracing-sdk
- Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC)

* Refactor tracing API for cleaner design

Dispatcher:
- Takes ownership of sink via unique_ptr (setSink)
- Explicit shutdownSink() destroys sink and writes output
- Asserts on double-set to catch programming errors

PerfettoSink:
- Constructor takes output path, starts session immediately
- Destructor writes trace file (true RAII)
- Uses std::call_once for thread-safe global Perfetto init
- Removed pimpl - simpler since only setup code includes this header

Scope:
- Removed _enabled caching - Dispatcher checks internally
- One less bool on the stack per trace scope

Usage:
- Added TracingGuard scope guard in RenderUtil.cpp for exception safety
- Added const to immutable members (_outputPath, _category)

The design now follows proper RAII patterns with clear ownership.

* Add Dispatcher::ShutdownGuard nested struct

Move the scope guard pattern into the Dispatcher class for reusability.
Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of
defining their own local struct.
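
The ownership model from the two commits above can be sketched roughly as follows. All *Sketch names and LogSink are hypothetical stand-ins, not the MaterialXTrace classes; the sink's destructor stands in for "write the trace file".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the abstract sink interface.
class SinkSketch
{
  public:
    virtual ~SinkSketch() = default;
    virtual void beginEvent(const std::string& name) = 0;
    virtual void endEvent() = 0;
};

// Appends to a caller-owned log; the destructor plays the role of flushing output.
class LogSink : public SinkSketch
{
  public:
    explicit LogSink(std::vector<std::string>& log) : _log(log) {}
    ~LogSink() override { _log.push_back("flush"); }
    void beginEvent(const std::string& name) override { _log.push_back("begin:" + name); }
    void endEvent() override { _log.push_back("end"); }

  private:
    std::vector<std::string>& _log;
};

class DispatcherSketch
{
  public:
    static DispatcherSketch& getInstance()
    {
        static DispatcherSketch instance;
        return instance;
    }

    // Takes ownership; setting a sink twice without a shutdown is a programming error.
    void setSink(std::unique_ptr<SinkSketch> sink)
    {
        assert(!_sink && "sink already set");
        _sink = std::move(sink);
    }

    // Destroys the sink, which triggers its output.
    void shutdownSink() { _sink.reset(); }

    bool isEnabled() const { return _sink != nullptr; }
    void beginEvent(const std::string& name) { if (_sink) _sink->beginEvent(name); }
    void endEvent() { if (_sink) _sink->endEvent(); }

    // Scope guard: shuts the sink down even when an exception unwinds the stack.
    struct ShutdownGuard
    {
        ShutdownGuard() = default;
        ~ShutdownGuard() { DispatcherSketch::getInstance().shutdownSink(); }
        ShutdownGuard(const ShutdownGuard&) = delete;
        ShutdownGuard& operator=(const ShutdownGuard&) = delete;
    };

  private:
    DispatcherSketch() = default;
    std::unique_ptr<SinkSketch> _sink;
};
```

A caller sets the sink, declares a ShutdownGuard on the next line, and runs the traced work; the output is flushed when the guard goes out of scope.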

* Add enableTracing runtime option in _options.mtlx

- New 'enableTracing' boolean option (default: false)
- Parsed in TestSuiteOptions::readOptions()
- RenderUtil.cpp conditionally initializes PerfettoSink based on this option
- Uses std::optional<ShutdownGuard> for conditional scope guard
- Allows profiling to be enabled/disabled without rebuilding

* Refactor tracing categories to use enum class

- Replace the Category namespace of constexpr strings with enum class Category
- Template-based Scope<Category> avoids storing category on stack
- PerfettoSink uses switch statement (compiler optimizes to jump table)
- Removed categoryMatches() string comparison overhead
- Type-safe: can only pass valid Category enum values

The enum-based design is simpler, more efficient, and equally
compatible with a future USD TraceCollector sink (which uses
integer category IDs, not strings).
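
A small sketch of the enum-based design, assuming hypothetical names (CategorySketch, ScopeSketch, categoryLabel) and made-up label strings. It shows the category carried as a template parameter, so the scope object stores nothing, and a switch replacing string comparison.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

enum class CategorySketch { Render, ShaderGen, Optimize, Material };

// Maps each enumerator to a label with a switch; no string comparison is involved.
// The label strings are invented for this sketch.
inline const char* categoryLabel(CategorySketch category)
{
    switch (category)
    {
        case CategorySketch::Render:    return "render";
        case CategorySketch::ShaderGen: return "shadergen";
        case CategorySketch::Optimize:  return "optimize";
        case CategorySketch::Material:  return "material";
    }
    return "unknown";
}

// Hypothetical event log standing in for a sink.
inline std::vector<std::string>& categoryLog()
{
    static std::vector<std::string> log;
    return log;
}

// The category is a compile-time template argument, so the scope has no data members.
template <CategorySketch C>
class ScopeSketch
{
  public:
    explicit ScopeSketch(const char* name)
    {
        categoryLog().push_back(std::string(categoryLabel(C)) + ":" + name);
    }
    ~ScopeSketch() { categoryLog().push_back(std::string(categoryLabel(C)) + ":end"); }
};

static_assert(std::is_empty<ScopeSketch<CategorySketch::Render>>::value,
              "the scope stores no category on the stack");
```

Passing anything other than a CategorySketch enumerator as the template argument fails to compile, which is the type-safety point made above.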

* Add category dispatch to PerfettoSink::counter()

* Remove unused resolveOutputPathWithSubdir method

* Add GPU timing support with async trace events

- Add AsyncTrack enum and asyncEvent() to Tracing::Sink interface
- Implement in PerfettoSink with dedicated GPU track (stable ID 1001)
- Track descriptor initialized thread-safely in std::call_once block
- Add GL_TIME_ELAPSED timer query wrapper in RenderGlsl.cpp
- Add glFinish() after render for accurate CPU trace scope timing
- Emit GPU render duration on separate 'GPU' track via MX_TRACE_ASYNC

The GPU track shows async slices with accurate GPU render duration,
aligned approximately with CPU submission time. This enables visual
comparison of CPU vs GPU work in Perfetto UI.

* Add framesPerMaterial option for multi-frame GPU timing

- Add framesPerMaterial to TestSuiteOptions (unsigned int, default 1)
- Parse from _options.mtlx with bounds checking
- Render loop in RenderGlsl.cpp iterates framesPerMaterial times
- Each frame emits a separate GPU trace event for Python analysis
- Enables warmup frame detection and statistical averaging offline

* Add optReplaceBsdfMixWithLinearCombination to test suite options

- Expose Lee's mix_bsdf optimization via _options.mtlx
- Parse and apply to GenOptions in getGenerationOptions()
- Allows benchmarking optimization impact without recompiling

* Move tracing to dedicated MaterialXTrace module

Addresses PR feedback to create a dedicated module for tracing rather
than adding it to MaterialXCore.

Changes:
- Create source/MaterialXTrace with Export.h, CMakeLists.txt
- Move Tracing.h/cpp and PerfettoSink.h/cpp to MaterialXTrace
- Update export macros from MX_CORE_API to MX_TRACE_API
- Update include paths in test code
- Link MaterialXTest against MaterialXTrace

The MaterialXTrace library depends on MaterialXCore and contains all
tracing infrastructure (Sink interface, Dispatcher singleton, PerfettoSink).
This keeps MaterialXCore minimal for users who don't need tracing.

* Only build MaterialXTrace when tracing is enabled

Skip add_subdirectory(MaterialXTrace) and test linking when
MATERIALX_BUILD_TRACING is OFF. This ensures zero overhead -
no extra library built or linked when tracing is disabled.

* Enable tracing by default and in nightly CI

- Set enableTracing default to true in _options.mtlx
- Enable MATERIALX_BUILD_TRACING=ON for nightly/extended builds in GHA

When MATERIALX_BUILD_TRACING is ON at build time and enableTracing is
true at runtime, trace files will be generated during test runs.

* Add Perfetto trace artifact upload for extended builds

Upload .perfetto-trace files from build/bin/ as artifacts when running
extended builds (nightly/workflow_dispatch). This allows retrieving
trace files for performance analysis.

* Fix an ambiguous comment

* Fix Linux build: link pthread for Perfetto SDK

Perfetto SDK requires pthread on Linux/Unix for thread-local storage
and synchronization. Add Threads::Threads dependency via CMake's
FindThreads module.

* Fix MaterialXTrace CMake: use MTLX_MODULES instead of LIBRARIES

The mx_add_library function expects MTLX_MODULES for dependencies,
not LIBRARIES.

* Suppress warnings-as-errors for Perfetto SDK on Unix

Perfetto SDK generates warnings that fail builds when
MATERIALX_WARNINGS_AS_ERRORS is ON. Add -Wno-error for
perfetto.cc on GCC/Clang.

* Fix tracing build on GCC/Clang and iOS

- Suppress warnings-as-errors for Perfetto SDK on Unix (GCC/Clang
  generate warnings that fail builds with MATERIALX_WARNINGS_AS_ERRORS)
- Skip pthread linking on iOS (threads are built-in to iOS SDK)

* Fix CMake: use COMPILE_FLAGS instead of COMPILE_OPTIONS for source files

COMPILE_OPTIONS is a target property. For set_source_files_properties,
use COMPILE_FLAGS instead. This fixes /bigobj on Windows and -Wno-error
on Unix.

* Fix Perfetto symbol visibility for shared library builds

- Define PERFETTO_COMPONENT_EXPORT for shared libs (required by Perfetto)
- Add -fvisibility=default for perfetto.cc on Unix (fixes GCC 10 monolithic build)

* Isolate Perfetto SDK from DLL boundaries

- Add createPerfettoSink() factory to Tracing.h (exported)
- PerfettoSink class in PerfettoSink.h not exported (internal header)
- Consumers include only Tracing.h and use factory function
- Perfetto types never cross DLL boundaries - only abstract Sink does
- Remove unnecessary PERFETTO_COMPONENT_EXPORT from CMakeLists.txt

* Fix Perfetto SDK compile flags for all build configurations

- Define MATERIALX_PERFETTO_COMPILE_DEFINITIONS and MATERIALX_PERFETTO_COMPILE_FLAGS
  as CACHE INTERNAL variables in root CMakeLists.txt
- Apply flags in both root (for monolithic builds) and MaterialXTrace (for non-monolithic)
- set_source_files_properties is directory-scoped, so must be called in both places
- Fixes -Werror=shadow on GCC (kDevNull shadowing)
- Fixes min/max macro conflicts on Windows (NOMINMAX)

* Fix Perfetto compile flags for monolithic builds

set_source_files_properties is directory-scoped, so:
- Root CMakeLists.txt: defines MATERIALX_PERFETTO_* variables
- source/CMakeLists.txt: applies flags for monolithic builds
- MaterialXTrace/CMakeLists.txt: applies flags for non-monolithic builds

* Exclude Perfetto SDK from coverage analysis

perfetto.cc is ~160k lines and generates gcov output that gcovr cannot parse.
Exclude it from coverage since it's third-party code anyway.

* Exclude Perfetto SDK from coverage, static analysis, and suppress MSVC warnings

- Add --exclude .*perfetto.* to gcovr (coverage) - perfetto.cc is ~160k
  lines and generates gcov output that gcovr cannot parse
- Add --suppress=*:*perfetto* to cppcheck (static analysis) - cppcheck
  doesn't define OS macros needed by Perfetto SDK's platform detection
- Use /W0 to disable all warnings for Perfetto SDK on MSVC (third-party code
  triggers C4146, C4369, C4996, C4459, C4065, C4244, C4267, C4293)

* Add /WX- to disable warnings-as-errors for Perfetto SDK on MSVC

/W0 alone doesn't override project-level /WX. Need /WX- explicitly to
prevent STL header warnings (from tuple, etc.) being promoted to errors
when compiling perfetto.cc.

* Fix Perfetto trace artifact upload path

Use recursive glob build/**/*.perfetto-trace since traces are written
to current working directory (build/) not build/bin/

* Add tracing instrumentation to shader generation tests

- Set up Perfetto sink in ShaderGeneratorTester::validate() when enableTracing=true
- Add MX_TRACE_SCOPE around generateCode() calls with element names
- Generates <target>_gen_trace.perfetto-trace files (e.g. genglsl_gen_trace.perfetto-trace)
- These tests run on all CI platforms, providing trace artifacts for download

* Add comparetraces.py for Perfetto trace analysis

* Polish comparetraces.py: optional deps, rename --output to --chart

* Fix chart layout: best improvements at top, speedup bars go right

* Polish comparetraces.py chart: delta labels, threshold display

- Filter by absolute delta threshold (--min-delta-ms) instead of baseline duration
- Show negative deltas for improvements (-XXms = time saved)
- Y-axis labels show both absolute delta and percentage
- Chart title displays the min delta filter threshold
- Duration labels to the right of each bar for readability

* Update comparetraces.py README with new options

* Add tracing for optimization passes

- Emit trace events when optimization passes run (wraps pass->run())
- Pass getName() returns GenOptions field name (e.g., optReplaceBsdfMixWithLinearCombination)
- Link MaterialXGenShader against MaterialXTrace when tracing enabled
- Trace macros expand to nothing when MATERIALX_BUILD_TRACING is off

* Emit optimization trace markers only when graph is modified

- Trace event fires only when pass->run() returns true (graph changed)
- Use isolated scope for near-zero duration marker event
- Avoids measuring dumpDot or other unrelated code

* Add --show-opt to highlight materials affected by optimization

- Query grandparent slice for material name (hierarchy: opt → GenerateShader → Material)
- Add baseline validation (error if baseline has optimization events)
- Update README with --show-opt and --min-delta-ms options

* Revert _options.mtlx to defaults after testing

* Add MixBsdfPruningPass for mix_bsdf constant factor pruning

- New optPruneMixBsdf option to enable mix factor pruning
- When mix=0, forwards background; when mix=1, forwards foreground
- Eliminates dead branches in shader graph
- Renamed from LobePruningPass to match option name
- Removed half-implemented BSDF weight pruning (future work)
- Use lambda for option flag replication in test harness
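
The pruning rule (mix=0 forwards background, mix=1 forwards foreground) can be sketched as below. MixNodeSketch and forwardedInput are hypothetical; the real pass operates on the shader graph rather than on a plain struct.

```cpp
#include <cassert>
#include <string>

// Hypothetical flattened view of a mix_bsdf node.
struct MixNodeSketch
{
    std::string foreground; // Name of the node feeding the "fg" input.
    std::string background; // Name of the node feeding the "bg" input.
    bool mixIsConstant;     // False when "mix" is connected to an upstream node.
    float mix;              // Constant mix factor; only meaningful when mixIsConstant.
};

// Returns the name of the input that can replace the mix node,
// or an empty string when the node has to be kept.
inline std::string forwardedInput(const MixNodeSketch& node)
{
    if (!node.mixIsConstant)
    {
        return std::string();
    }
    if (node.mix == 0.0f)
    {
        return node.background; // The foreground branch is dead.
    }
    if (node.mix == 1.0f)
    {
        return node.foreground; // The background branch is dead.
    }
    return std::string();
}
```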

* Fix /reduced subdirectory creation when outputDirectory is set

When outputDirectory is specified and shaderInterfaces=REDUCED, the render
tests were failing to create the parent material directory before attempting
to create the /reduced subdirectory, causing shader dumps to fail.

The fix ensures createDirectory() is called on the parent outputPath first,
then on outputPath/reduced if in REDUCED mode. Both calls are now within
the same ScopedTimer scope for consistent profiling.

Fixed in all four render test implementations:
- RenderGlsl.cpp
- RenderOsl.cpp
- RenderMsl.mm
- RenderSlang.cpp

* Fix FILENAME inputs in SHADER_INTERFACE_REDUCED mode

In REDUCED mode, unconnected FILENAME inputs on internal nodes were
being inlined as literal path strings, causing GLSL compilation errors.
Real-time shading languages cannot handle file paths as literals.

The fix publishes FILENAME node inputs to graph input sockets even in
REDUCED mode, ensuring they become uniforms with proper variable names
instead of being inlined.

* Additional tracing markers in shader codegen

* Add trace markers for emit and graph traversal functions

Added MX_TRACE markers to key codegen functions for profiling:
- ShaderGraph: createConnectedNodes, addUpstreamDependencies, createNode
- ShaderGenerator: getImplementation, emitFunctionDefinitions, emitFunctionCalls
- CompoundNode: initialize, emitFunctionDefinition, emitFunctionCall

* Add NodeGraphTopology class for permutation analysis

Analyzes NodeGraphs to identify "topological" inputs that can gate
branches when constant (0 or 1). Used for permutation-aware caching
of CompoundNode implementations.

Key features:
- Detects mix, multiply, and PBR node inputs with uimin=0/uimax=1
- Caches analysis per NodeGraph definition (thread-safe)
- Computes permutation keys from constant input values
- Identifies nodes to skip for each permutation

* Hook up NodeGraphTopology for permutation-aware caching

- Add skip nodes methods to GenContext for early pruning
- Modify getImplementation() to compute permutation keys
- Use permutation-aware cache keys for NodeGraph implementations
- Add skip node check in createConnectedNodes (trace only for now)

The full skip logic (handling downstream connections) is TODO.
For now, this traces when nodes WOULD be skipped to verify
the topology analysis is working correctly.

* Refactor: rename classes and add getPermutationKey() virtual method

Naming improvements:
- NodeGraphTopology::Analysis -> NodeGraphTopology (the analysis result)
- NodeGraphTopology -> NodeGraphTopologyCache (the singleton cache)

Add virtual getPermutationKey() to ShaderNodeImpl:
- Returns _name by default (no permutation)
- CompoundNode overrides to return topology-based permutation key
- CompoundNode computes and stores _permutationKey during initialize()
- Hash now includes permutation key for better cache differentiation

* Virtualize permutation key computation in ShaderNodeImpl

Move permutation key logic from ShaderGenerator to ShaderNodeImpl:
- Add virtual computePermutationKey(element, context) method
- Default impl returns empty string (no permutation)
- CompoundNode overrides to use NodeGraphTopologyCache analysis
- ShaderGenerator now calls impl->computePermutationKey() polymorphically

This eliminates the NodeGraph-specific type check for permutation key
computation in getImplementation(), making the design more extensible.
The skipNodes computation still uses NodeGraphTopologyCache directly.

* Virtualize skipNodes computation in ShaderNodeImpl

Add virtual computeSkipNodes(element, context) method:
- Default impl returns empty StringSet
- CompoundNode overrides to use NodeGraphTopologyCache

ShaderGenerator now calls impl->computeSkipNodes() polymorphically,
eliminating the last NodeGraph-specific type check for optimization logic.

Remaining isA checks are only for impl creation dispatch (NodeGraph vs
Implementation), which is fundamental type instantiation.

* Add TODOs for instrumentation and API refactoring

- TODO: Add MX_TRACE instrumentation to analyze(), computePermutationKey(),
  and getNodesToSkip() to measure string operation overhead
- TODO: Move computePermutationKey() and getNodesToSkip() from
  NodeGraphTopologyCache to NodeGraphTopology class (better semantics)

* Implement TODOs: instrumentation and API refactoring

1. Add MX_TRACE instrumentation to:
   - NodeGraphTopologyCache::analyze()
   - NodeGraphTopology::computePermutationKey()
   - NodeGraphTopology::getNodesToSkip()

2. Move computePermutationKey() and getNodesToSkip() from
   NodeGraphTopologyCache to NodeGraphTopology class:
   - Methods now operate on 'this' topology data
   - Cleaner API: topology.computePermutationKey(node)
   - Cache only handles caching, topology handles queries

* Refactor: CompoundNode owns permutation via composition

- Add NodeGraphPermutation class as lightweight intermediate object
  representing call-site specific configuration (key + skipNodes)
- CompoundNode now owns permutation via unique_ptr passed at creation
- ShaderGenerator::getImplementation() computes permutation BEFORE
  creating ShaderNodeImpl, avoiding wasteful allocation on cache hits
- Remove virtual computePermutationKey/computeSkipNodes from ShaderNodeImpl
- Remove permutationKey from GenContext (no longer needed as parameter bag)
- Add permutation support to MDL compound nodes (CompoundNodeMdl,
  ClosureCompoundNodeMdl) via inheritance chain
- Clean up API: remove parameterless create() functions, require explicit
  permutation parameter (may be nullptr)
- Remove default parameter from virtual createShaderNodeImplForNodeGraph
  to avoid C++ gotcha with virtual default arguments

Architecture:
  NodeGraphTopology (shared, cached per NG definition)
         ↓
  NodeGraphPermutation (per call-site, computes key + skipNodes)
         ↓ (ownership transferred)
  CompoundNode (owns permutation, uses in initialize())
         ↓
  ShaderNodeImpl cache (keyed by baseName + permutation.key)
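
The "virtual default arguments" gotcha mentioned in the list above is a general C++ rule: default arguments are chosen from the static type of the expression, while the function body is chosen from the dynamic type. A self-contained illustration with hypothetical types:

```cpp
#include <cassert>

struct BaseSketch
{
    virtual ~BaseSketch() = default;
    virtual int value(int x = 1) const { return x; }
};

struct DerivedSketch : BaseSketch
{
    // A different default in the override is legal but misleading.
    int value(int x = 2) const override { return x * 10; }
};

// Calls through a base reference: DerivedSketch::value runs,
// but with BaseSketch's default argument (1).
inline int callThroughBase(const BaseSketch& object)
{
    return object.value();
}
```

With a DerivedSketch argument, callThroughBase returns 10, whereas calling value() directly on a DerivedSketch returns 20. Requiring an explicit argument avoids the mismatch.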

* Refactor: Move topology analysis into NodeGraphTopology constructor

Better encapsulation - NodeGraphTopology now analyzes itself in its
constructor rather than being populated externally by NodeGraphTopologyCache.
The cache now only manages storage/lookup, not analysis logic.

- Add NodeGraphTopology(const NodeGraph&) constructor with analysis
- Move isTopologicalInput, analyzeAffectedNodes, collectUpstreamNodes to NodeGraphTopology
- Change public members to private with getters
- Simplify NodeGraphTopologyCache to pure cache management
- Remove unused buildReverseConnectionMap method

* Add comment explaining thread-safety tradeoff in topology cache

Document why we release the mutex during topology construction and
why the potential race condition is safe and acceptable.
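
One common way to structure such a cache is sketched below. It is an assumption about the general pattern, not the actual NodeGraphTopologyCache code, and all names are hypothetical. The lock is dropped while the value is built; if two threads build the same entry, the first insert wins and the other copy is discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical analysis result; construction stands in for the expensive analysis.
struct TopologySketch
{
    explicit TopologySketch(const std::string& graphName) : name(graphName) {}
    std::string name;
};

class TopologyCacheSketch
{
  public:
    std::shared_ptr<const TopologySketch> get(const std::string& graphName)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            const auto it = _cache.find(graphName);
            if (it != _cache.end())
            {
                return it->second;
            }
        }
        // Built outside the lock so a slow analysis does not block other threads.
        // Two threads may both reach this point for the same key.
        std::shared_ptr<const TopologySketch> built = std::make_shared<TopologySketch>(graphName);

        std::lock_guard<std::mutex> lock(_mutex);
        // emplace() keeps an existing entry, so every caller gets the first inserted value.
        return _cache.emplace(graphName, built).first->second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _cache.size();
    }

  private:
    mutable std::mutex _mutex;
    std::map<std::string, std::shared_ptr<const TopologySketch>> _cache;
};
```

The cost of the race is at most one redundant construction per key, which is acceptable when the result is immutable and deterministic.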

* Refactor: Use const references instead of copying shared_ptr

Change computePermutationKey and NodeGraphPermutation constructor to
accept const Node& instead of ConstNodePtr to avoid unnecessary
shared_ptr copy overhead.

* Refactor: Combine permutation key and skip nodes computation

Add NodeGraphTopology::createPermutation() factory method that computes
key and skip nodes in a single pass, avoiding redundant string parsing.
Make NodeGraphPermutation constructor private with NodeGraphTopology as
friend. Returns nullptr when no optimization is possible.
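
A rough sketch of computing the key and the skip set in one pass. The data layout and the key encoding ('0', '1', or 'x' per topological input) are assumptions made for illustration; the source does not specify the real key format.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one topological input: its name and the
// nodes that become unreachable when the input is the constant 0.
struct TopoInputSketch
{
    std::string name;
    std::vector<std::string> nodesDeadAtZero;
};

struct PermutationSketch
{
    std::string key;
    std::set<std::string> skipNodes;
};

// constantValues holds only inputs with constant values; connected inputs are absent.
// Returns nullptr when no input is a constant 0 or 1, i.e. nothing can be specialized.
inline std::unique_ptr<PermutationSketch> createPermutationSketch(
    const std::vector<TopoInputSketch>& inputs,
    const std::map<std::string, float>& constantValues)
{
    std::unique_ptr<PermutationSketch> result(new PermutationSketch());
    bool anyConstant = false;
    for (const TopoInputSketch& input : inputs)
    {
        char flag = 'x';
        const auto it = constantValues.find(input.name);
        if (it != constantValues.end() && it->second == 0.0f)
        {
            flag = '0';
            result->skipNodes.insert(input.nodesDeadAtZero.begin(), input.nodesDeadAtZero.end());
        }
        else if (it != constantValues.end() && it->second == 1.0f)
        {
            flag = '1';
        }
        anyConstant = anyConstant || (flag != 'x');
        result->key.push_back(flag);
    }
    if (!anyConstant)
    {
        return nullptr;
    }
    return result;
}
```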

* Fix: Restore _options.mtlx to defaults (remove local paths)

* Removed unnecessary members

* Refactor: Pass permutation via function stack instead of GenContext

Remove _skipNodes from GenContext and pass NodeGraphPermutation* directly
through ShaderGraph::create, addUpstreamDependencies, and createConnectedNodes.
This fixes potential bugs with nested CompoundNodes overwriting each other's
skip nodes on the shared GenContext.

* Implement early pruning: skip node creation for pruned branches

When a node is in the skip set, return early from createConnectedNodes
without creating the ShaderNode. The downstream input remains unconnected
and uses its default value (e.g., transparent BSDF for closure types).

* Add optEarlyPruning GenOption to control early pruning feature

- Add optEarlyPruning to GenOptions (defaults to false)
- Wire up option in ShaderGenerator::getImplementation()
- Add test option parsing and propagation to GenOptions
- Early pruning is now opt-in for A/B testing

* Replace comparetraces.py with diff_test_results.py

Unified script that compares baseline vs optimized MaterialX test runs:
- Performance traces: loads Perfetto .perfetto-trace files, computes
  timing deltas, generates bar charts with matplotlib
- Rendered images: pixel-wise RMSE comparison with PIL/numpy

Merges trace comparison with image comparison functionality. Supports
filtering by --min-delta-ms threshold and generates combined reports.

Usage: python diff_test_results.py baseline/ optimized/ [--traces] [--images]

* Use NVIDIA FLIP for perceptual image comparison

Replace numpy RMSE with FLIP (pip install flip-evaluator), which
approximates human perception of differences when flipping between
images. FLIP score: 0 = identical, 1 = maximally different.

- Default threshold changed from 0.001 (RMSE) to 0.05 (FLIP)
- Add --ppd option for pixels-per-degree viewing distance
- Update README with new installation and options

* Add HTML report generation with side-by-side image comparison

New --report flag generates an HTML report containing:
- Summary cards (improvements/regressions/pass/fail counts)
- Performance chart (auto-generated if not specified)
- Side-by-side image grid: Baseline | Optimized | FLIP Heatmap
- Color-coded pass/fail status for each image

FLIP heatmaps are saved to a heatmaps/ subdirectory using the magma
colormap to visualize perceptual differences.

* Use reference counting for early pruning to handle shared dependencies

The previous implementation would incorrectly prune nodes that were
shared between multiple downstream consumers. When one consumer was
pruned (e.g., specular BSDF with weight=0), all upstream dependencies
were also pruned, even if they were still needed by other consumers
(e.g., metal BSDF).

This fix introduces reference counting: each node tracks how many
downstream nodes consume its output. A node is only pruned when its
reference count drops to zero. This correctly preserves shared nodes
like roughness and tangent calculations that feed multiple BSDFs.
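
The reference-counting idea can be sketched on a plain adjacency map. GraphSketch and pruneWithRefCounts are hypothetical and ignore details such as multiple outputs; they only show why a shared upstream node survives when one of its consumers is pruned.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each node to the upstream nodes it consumes.
using GraphSketch = std::map<std::string, std::vector<std::string>>;

// Prunes the given roots, then any upstream node whose consumer count drops to zero.
inline std::set<std::string> pruneWithRefCounts(const GraphSketch& graph,
                                                const std::vector<std::string>& prunedRoots)
{
    // Count how many downstream connections consume each node.
    std::map<std::string, int> refCount;
    for (const auto& entry : graph)
    {
        for (const std::string& upstream : entry.second)
        {
            ++refCount[upstream];
        }
    }

    std::set<std::string> pruned;
    std::vector<std::string> stack(prunedRoots.begin(), prunedRoots.end());
    while (!stack.empty())
    {
        const std::string node = stack.back();
        stack.pop_back();
        if (!pruned.insert(node).second)
        {
            continue; // Already pruned.
        }
        const auto it = graph.find(node);
        if (it == graph.end())
        {
            continue; // Leaf node with no upstream dependencies.
        }
        for (const std::string& upstream : it->second)
        {
            // Release this consumer's reference; prune only when nobody else needs it.
            if (--refCount[upstream] == 0)
            {
                stack.push_back(upstream);
            }
        }
    }
    return pruned;
}
```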

* Expose envSampleCount option in MaterialXTest for IBL sampling control

Add configurable environment radiance sample count to test suite options,
allowing game-representative performance testing (1-16 samples) vs reference
quality (1024+ samples). Lower sample counts make shader complexity a larger
fraction of GPU time, enabling meaningful runtime performance comparisons.

* Fix early pruning to check inherited NodeDef default values

Use getActiveInput() instead of getInput() when checking NodeDef defaults
for topological inputs. This properly resolves inherited input values,
enabling pruning of lobes that default to 0 in parent NodeDefs.

Results: 10-18% improvement in shader generation time, 15-18% reduction
in generated shader code (298-342 lines per shader).

Also removes unused nodeGraph parameter from analyzeAffectedNodes().

* Instrument `GlslProgram::build`

* Instrument `DumpGeneratedCode`

* Extract image comparison into standalone diff_images.py script

Split the monolithic diff_test_results.py into focused scripts:
- New diff_images.py in python/MaterialXTest/diff_test_runs/ for
  FLIP-based image comparison with a simple CLI
- diff_test_results.py is now trace-only (removed image functions,
  FLIP dependency, and image-related CLI flags)

* Rename and move `diff_traces.py`

* Rename references in diff_traces.py after move from Scripts/

Update module docstring, logger name, HTML footer, and cross-references
to reflect the new filename and location.

* Refine diff_traces.py CLI: --slice, --gpu, and -o outputfile

Replace the generic --track and --report/--chart args with two explicit
comparison modes:

  --gpu            GPU render durations per material (hardcodes GPU track)
  --slice NAME ... CPU child-slice durations per material (e.g. GenerateShader)

Multiple --slice names produce multiple charts in one run.  HTML report
is always generated via -o/--outputfile (default: trace_diff.html),
matching the tests_to_html.py convention.  Chart PNGs are auto-derived
from the report filename.  Running with no flags prints help and exits.

* Simplify diff_traces.py: require pandas, remove --csv

Make perfetto and pandas hard requirements instead of optional
dependencies with fallback paths.  This removes ~120 lines of
dual-path code (list-of-dicts vs DataFrame) throughout the script.

Also remove --csv export since the HTML report is always generated
and covers the same data.

* Show directory names and frame counts in diff_traces.py output

Use leaf directory names (e.g. Prism_MtlX_env16_baseline) instead of
generic "Baseline"/"Optimized" in chart legends, table headers, chart
titles, and the HTML report page title.

For GPU comparisons, display the number of frames averaged over in
the chart title (e.g. "averaged over 10 frames").

* Auto-derive report filename from directory names in diff_traces.py

Default -o output is now <baseline>_vs_<optimized>.html instead of
the static trace_diff.html.  Chart PNGs always include the event
name suffix (e.g. _GPU.png, _GenerateShader.png).

* Print absolute report path and auto-open in browser

After generating the HTML report, print its absolute path
prominently and open it in the default browser automatically.

* Show active filter options in chart subtitles and HTML report

Display min-delta-ms and show-opt as a subtitle line on charts
and in the HTML report header when those filters are active.

* Load trace files once and reuse TraceProcessor instances

Instead of creating a new TraceProcessor for each comparison,
load baseline and optimized traces once in main() and pass
the instances through. This cuts trace loads from 2N (for N comparisons) to just 2.

* Switch charts from PNG to inline SVG with searchable text

Save charts as SVG and embed them directly in the HTML report
so material names and values are selectable and searchable via
Ctrl+F. Set svg.fonttype='none' to emit real <text> elements.

* Extract shared reporting into _report.py and add diff_shaders.py

Factor chart generation, table formatting, and HTML report building
out of diff_traces.py into a reusable _report.py module. Add
diff_shaders.py for comparing dumped shader files (LOC, SPIR-V size)
between baseline and optimized test runs using the same reporting
infrastructure.

* Add --warmup-frames to discard initial GPU frames before averaging

Allows discarding a burn-in period of N initial frames per material
when comparing GPU render durations, reducing noise from driver
warm-up and transient system states. Frame ordering is preserved
via ORDER BY slice.ts. Warmup info appears in chart titles and
the filter subtitle.
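
The warm-up discard amounts to dropping the first N samples before averaging. The script itself does this in Python on query results; the sketch below restates the arithmetic with a hypothetical function and assumes the durations are already ordered by timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Averages per-frame durations after discarding the first warmupFrames entries.
// Returns false, leaving meanMs untouched, when no frames remain after the discard.
inline bool meanAfterWarmup(const std::vector<double>& frameDurationsMs,
                            std::size_t warmupFrames,
                            double& meanMs)
{
    if (warmupFrames >= frameDurationsMs.size())
    {
        return false;
    }
    const auto first = frameDurationsMs.begin() + static_cast<std::ptrdiff_t>(warmupFrames);
    const double sum = std::accumulate(first, frameDurationsMs.end(), 0.0);
    meanMs = sum / static_cast<double>(frameDurationsMs.size() - warmupFrames);
    return true;
}
```

For durations of 9, 3, and 5 ms with one warm-up frame, the slow first frame is dropped and the mean of the remaining two is 4 ms.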

* Fix SPIR-V compilation in diff_shaders.py for MaterialX GLSL

Use OpenGL semantics (-G) instead of Vulkan (-V), and
--auto-map-locations for automatic layout assignment, since
MaterialX generates OpenGL GLSL without explicit layout qualifiers.

* Add compilation time and spirv-opt time metrics to diff_shaders.py

Refactor SPIR-V compilation into a single-pass cache so binaries are
compiled once and reused for all downstream metrics (size, compile time,
spirv-opt size, spirv-opt time, AMD RGA VGPRs).

New metrics:
  - glslangValidator compile time (ms per shader)
  - spirv-opt optimisation time (ms per shader)
  - AMD RGA VGPR count (when rga is in PATH)

* Remove RGA support and fix bar color for zero-delta metrics

Drop AMD RGA VGPR analysis for simplicity -- spirv-opt already proves
that optimised binaries are identical, so VGPRs would be too.

Use green bars (not red) when delta is zero, since identical output
is a positive result, not a regression.

* Add diff_test_runs package for comparing MaterialX test outputs

Python package (MaterialXTest.diff_test_runs) with three comparison
scripts and shared reporting utilities:

  diff_images.py  -- perceptual image comparison via NVIDIA FLIP,
                     with HTML side-by-side reports
  diff_traces.py  -- Perfetto trace comparison with per-material
                     CPU slice and GPU render time analysis,
                     multiple --slice filters, --warmup-frames
                     burn-in, inline SVG charts
  diff_shaders.py -- offline shader analysis (LOC, SPIR-V size,
                     compile time, spirv-opt time); auto-discovers
                     Vulkan SDK tools in PATH
  _report.py      -- shared comparison tables, SVG chart generation,
                     and HTML report builder

* Add framesPerMaterial and envSampleCount to _options.mtlx

Declare the new test suite options so the C++ test runner can read
them from the options file.  Also reset enableTracing default to
false (opt-in for profiling runs).

* Revert stale README for python/Scripts

The old monolithic diff_test_results.py script was replaced by the
diff_test_runs package under python/MaterialXTest/. Revert README.md
to its upstream state.

* Remove stale merge conflict marker in RenderGlsl.cpp

A <<<<<<< ours marker from the main merge was left behind.

* Restore missing envSampleCount option in _options.mtlx

The envSampleCount input was lost during the merge of main into
lobe_pruning_proto2. Add it back before the optimization options.

* Add GPU timing infrastructure and fix trace aliases in RenderGlsl.cpp

- Add GpuTimerQuery helper class and getCurrentTimeNs() for GPU timing
  via GL_TIME_ELAPSED queries (guarded by MATERIALX_BUILD_TRACING)
- Add multi-frame render loop using framesPerMaterial for statistical
  validity, emitting MX_TRACE_ASYNC events on the GPU track
- Replace stale Cat:: alias with mx::Tracing::Category:: (the alias
  was removed in PR AcademySoftwareFoundation#2742)

* Link MaterialXTrace to MaterialXGenShader when tracing is enabled

Shader codegen source files (ShaderGraph.cpp, ShaderGenerator.cpp, etc.)
include MaterialXTrace/Tracing.h and use MX_TRACE macros. When Perfetto
tracing is enabled, MaterialXGenShader needs to link against MaterialXTrace
for the trace event symbols.

* Use MATERIALX_BUILD_PERFETTO_TRACING in RenderGlsl.cpp

The CMake flag was renamed from MATERIALX_BUILD_TRACING to
MATERIALX_BUILD_PERFETTO_TRACING in PR AcademySoftwareFoundation#2742. Update all #ifdef
guards in RenderGlsl.cpp to match.

* Fix `MATERIALX_BUILD_PERFETTO_TRACING` renaming

* Fix `MATERIALX_BUILD_PERFETTO_TRACING` in an XML comment

* Add async event support for GPU timing in MaterialXTrace

Port AsyncTrack enum, Sink::asyncEvent(), and MX_TRACE_ASYNC macro
from the pre-merge proto2 branch. This enables GPU timer query results
to be emitted as events on a dedicated "GPU" track in Perfetto traces.

Use std::numeric_limits<uint64_t>::max() for the GPU track ID to
guarantee no collision with OS thread IDs.

* Fix enableTracing default to true for CI trace generation

The previous commit accidentally set enableTracing to false, which
would prevent CI from producing Perfetto traces. Restore to true
to match the behavior established in PR AcademySoftwareFoundation#2742.

* Make matplotlib a hard requirement for diff_test_runs reports

Without matplotlib, the HTML reports are generated with no charts,
making them essentially empty. Fail early with a clear install
message, consistent with how perfetto and pandas are handled.

* Use directory names instead of Baseline/Optimized in diff_images.py

Derive display labels from the actual directory names passed on the
command line, matching the convention already used by diff_traces.py.
Affects console output, HTML report title, and per-image labels.

Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

* Auto-open HTML report in diff_images.py

Use the shared openReport() from _report.py, matching the behavior
already present in diff_traces.py and diff_shaders.py.

Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

* Fix _report import in diff_images.py to use try/except pattern

Use the same relative/fallback import pattern as diff_traces.py and
diff_shaders.py so the script works both as a package and standalone.

Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

* Restore optEarlyPruning propagation lost in merge d3ecbbb

The applyOptimizationFlags lambda in getGenerationOptions() was lost
when main (PR AcademySoftwareFoundation#2742) was merged into proto2 on Feb 13. This silently
left optEarlyPruning stuck at false regardless of _options.mtlx.
Also adds the missing optEarlyPruning XML input to _options.mtlx.

Assisted-by: Claude (Anthropic) via Cursor IDE
Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

* Encapsulate skip-node lookup as NodeGraphPermutation::shouldSkip()

Replace direct getSkipNodes().count() access with a shouldSkip()
query method for better encapsulation. Remove the raw accessor.

* Simplify NodeGraphTopology and rename cache accessor

Throw ExceptionShaderGenError instead of silently returning when
NodeDef is missing -- a half-initialized topology is worse than
a clear error. Remove speculative fallback lookup.
Rename NodeGraphTopologyCache::analyze() to get().

* Remove unused methods and member from NodeGraphTopology

Drop getTopology(), clearCache(), hasOptimizations(), and
_nodeGraphName -- none are called outside their definitions.

* Clarify how the node instance is obtained in getImplementation

* Hide NodeGraphTopology behind NodeGraphTopologyCache

Move NodeGraphTopology class from the header into the .cpp as an
implementation detail.  The cache's public API is now a single
createPermutation(nodeGraph, node) that combines topology lookup
and permutation creation.  The cache remains an explicit singleton
so its lifetime can be controlled in the future.

* Clean up refactoring artifacts

* Minor cleanups in ShaderGenerator and ShaderGraph

Merge cachedImpl/impl into a single variable in getImplementation().
Use existing newNode local instead of redundant getNode() lookup.

* Simplify cache key handling in getImplementation

Use a single 'name' variable instead of separate baseName/cacheKey.
Append permutation suffix inside the NodeGraph branch.

* Nest permutation key append under parent node check

* More nitpicking

* Move NodeGraphTopologyCache from global singleton to GenContext

The topology cache now lives as a value member on GenContext, giving
clients explicit control over its lifetime and invalidation. This
mirrors how GenContext already manages the ShaderNodeImpl cache.

- Expose NodeGraphTopology and TopologicalInput in the header so the
  cache can be held by value without incomplete-type issues.
- Make NodeGraphTopologyCache copyable (copy = fresh empty cache) to
  preserve GenContext's existing deep-copy semantics.
- Add NodeGraphTopologyCache::clear(), called from
  clearNodeImplementations() so all existing invalidation sites
  automatically cover the topology cache.
- Mark GenContext constructor explicit.

* Share immutable NodeGraphTopology objects across cache copies

Use shared_ptr for cached topologies so that copying a
NodeGraphTopologyCache (via GenContext copy) shares the existing
analyses instead of discarding them. The topology objects are
immutable once constructed, so sharing is safe. Copy operations
properly lock the source mutex.
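
The sharing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `Topology`, `TopologyCache`, and `get()` are simplified stand-ins for `NodeGraphTopology` and `NodeGraphTopologyCache`, and the real classes carry far more state.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Stand-in for an analysis result; treated as immutable after construction.
struct Topology
{
    std::string nodeGraphName;
};

// Hypothetical simplified cache. Copies share the cached topologies
// through shared_ptr instead of discarding them.
class TopologyCache
{
  public:
    TopologyCache() = default;

    // Copying locks the source so its map cannot change mid-copy.
    TopologyCache(const TopologyCache& other)
    {
        std::lock_guard<std::mutex> lock(other._mutex);
        _entries = other._entries;
    }

    TopologyCache& operator=(const TopologyCache& other)
    {
        if (this != &other)
        {
            // scoped_lock acquires both mutexes without risking deadlock.
            std::scoped_lock lock(_mutex, other._mutex);
            _entries = other._entries;
        }
        return *this;
    }

    // Return the cached topology, creating it on first request.
    std::shared_ptr<const Topology> get(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        std::shared_ptr<const Topology>& entry = _entries[name];
        if (!entry)
        {
            entry = std::make_shared<Topology>(Topology{ name });
        }
        return entry;
    }

  private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, std::shared_ptr<const Topology>> _entries;
};
```

Because the entries are `shared_ptr<const Topology>`, a copied cache hands out the same object as its source for any key that was already analyzed, while new entries added to either cache afterwards stay private to that cache.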

* Merge PBR node sets into kWeightedPbrNodes and use unordered_set

* Refactor per-node data into NodeInfo struct (AOS instead of SOA)

* Fix downstreamRefCount to increment per unique upstream

Previously, ref counts were incremented per-input, but death
propagation decremented per-unique-upstream (via the StringSet).
This mismatch could prevent pruning when a node had multiple
inputs connected to the same upstream.
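
To make the mismatch concrete, here is a small sketch of the fixed counting scheme. It is only an illustration under simplifying assumptions: the graph is a plain map from node name to the upstream node feeding each input, and `countDownstreamRefs` and `pruneFrom` are hypothetical names, not functions from the PR.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified graph: node name -> upstream node per input.
using InputMap = std::map<std::string, std::vector<std::string>>;

// Count consumers of each upstream node once per unique upstream,
// even when several inputs of one consumer connect to it.
std::map<std::string, int> countDownstreamRefs(const InputMap& graph)
{
    std::map<std::string, int> refCount;
    for (const auto& entry : graph)
    {
        const std::set<std::string> uniqueUpstreams(entry.second.begin(), entry.second.end());
        for (const std::string& upstream : uniqueUpstreams)
        {
            ++refCount[upstream];
        }
    }
    return refCount;
}

// Prune `root`, then propagate: an upstream dies once its last consumer dies.
// Decrements are also per unique upstream, matching the counting above.
std::set<std::string> pruneFrom(const std::string& root, const InputMap& graph,
                                std::map<std::string, int> refCount)
{
    std::set<std::string> dead;
    std::vector<std::string> worklist{ root };
    while (!worklist.empty())
    {
        const std::string node = worklist.back();
        worklist.pop_back();
        if (!dead.insert(node).second)
        {
            continue; // already pruned
        }
        const auto it = graph.find(node);
        if (it == graph.end())
        {
            continue;
        }
        const std::set<std::string> uniqueUpstreams(it->second.begin(), it->second.end());
        for (const std::string& upstream : uniqueUpstreams)
        {
            if (--refCount[upstream] == 0)
            {
                worklist.push_back(upstream);
            }
        }
    }
    return dead;
}
```

With a `mix` node whose two inputs both read from `noise`, per-input counting would leave `noise` with a count of 2 that a single decrement never brings to zero. Counting per unique upstream gives 1, so pruning `mix` also prunes `noise`.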

* Clean up createPermutation: structured bindings, rename key, init-if

* Minor refactoring

* Extract pruneNode, removeConsumer, applyConstantInput lambdas

Collapse four duplicate value==0/value==1 blocks into composing
lambdas. The worklist DCE loop also reuses removeConsumer.

* Refine createPermutation: descriptive names, flatten control flow

Move applyConstantInput inside the loop to absorb value extraction
and flag assignment. Invert connected-node check and use init-if
for nodeDef/defaultInput lookups.

* Explicit lambda captures and extract applyConstantValue

* Rename skip -> prune, explicit unordered_set, formatting cleanup

* Complete skip -> prune rename across pruning code

* Arrays instead of pairs of members

* Turn analyzeAffectedNodes into TopologicalInput constructor

* Rename nodesToPrune/maybeDead to prunableAtValue/potentiallyPrunableAtValue

* Revert gltf_pbr.mtlx to upstream main

This file was modified by Lee Kerley's SG rewriting optimization
to match a new BSDF mix pattern rule. That work is out of scope
for the early pruning PR.

* Revert main.yml to target branch

Restores the adsk_contrib/dev version of the CI workflow.
The diff was a main vs adsk_contrib/dev conflict unrelated to pruning.

* Remove diff_test_runs Python scripts

These comparison utilities (diff_images, diff_traces, diff_shaders)
are covered by a separate infrastructure PR (AcademySoftwareFoundation#2822).

* Remove GPU timing, trace markers, and test options from pruning PR

These changes belong to the separate infrastructure PRs:
- GPU async timing (AsyncTrack, GpuTimerQuery, MX_TRACE_ASYNC) -> AcademySoftwareFoundation#2824
- Trace markers in GlslShaderGenerator -> AcademySoftwareFoundation#2820
- envSampleCount / framesPerMaterial options -> AcademySoftwareFoundation#2821

Also restores the MATERIALX_CONTRIB option from adsk_contrib/dev.

* Remove SG rewriting pass infrastructure

Remove the ShaderGraph optimization pass manager, debug dumping,
and late-stage rewriting passes (PremultipliedAdd, MixBsdfPruning,
ConstantFolding pass, etc.). These are parked pending test coverage
and are out of scope for early pruning.

Restores ShaderGraph::optimize() to upstream main's implementation.

* Remove SG rewriting GenOptions and UI controls

Remove optReplaceBsdfMixWithLinearCombination, optPruneMixBsdf,
optDumpShaderGraphDot, and optMaxPassIterations from GenOptions.
Also remove framesPerMaterial and envSampleCount from TestSuiteOptions
(covered by PR AcademySoftwareFoundation#2821). Revert Viewer.cpp and RenderView.cpp UI
additions for these options.

Keeps only optEarlyPruning, which is the core of this PR.

* Restore MATERIALX_CONTRIB option to upstream

The removal was unrelated to early pruning.

* Remove tracing and envSampleCount from pruning PR

Remove trace markers from HwShaderGenerator and GlslProgram (PR AcademySoftwareFoundation#2820).
Revert RenderMsl.mm envSampleCount to upstream pattern (PR AcademySoftwareFoundation#2821).

* Remove MaterialXTrace dependency from pruning PR

Remove all MX_TRACE macros, #include <MaterialXTrace/Tracing.h>,
and the conditional CMake link from MaterialXGenShader. These trace
markers belong to the instrumentation PR (AcademySoftwareFoundation#2820), not early pruning.

* Revert SG rewriting artifacts from ShaderGraph

Remove changes from Lee's ShaderGraph rewriting work that leaked
into the pruning branch: removeNode(), getDocument(), public bypass/
createNode, finalize() FILENAME/REDUCED refactoring, bypass bounds-
check removal, and #include <cstdlib>.

* Revert enableTracing changes in _options.mtlx

Keep only the optEarlyPruning addition; the tracing comment and
default value changes belong to the infrastructure PRs.

* Rename optEarlyPruning to enableLobePruning

Align with MaterialX naming conventions (enable* prefix) and
the feature's final name: lobe pruning.

* Inline applyOptimizationFlags lambda in RenderUtil

Only one optimization flag (enableLobePruning) exists, so the
lambda indirection is unnecessary.

* Use std::make_shared in CompoundNode factory methods

Replace shared_ptr(new T(...)) with make_shared<T>(...). The
constructors were already public (default) before our changes,
so keep them public with the new permutation parameter.

* Move NodeGraphTopology.h include from CompoundNode header to source

Forward-declare NodeGraphPermutation in the header and add an
explicit destructor so unique_ptr can be destroyed where the
full type is visible.
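
The idiom behind this change can be sketched in a single file as below. `Compound` and `Permutation` are simplified stand-ins for `CompoundNode` and `NodeGraphPermutation`; the member names and the `key()` accessor are assumptions made for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- "Header" part: only a forward declaration is needed here. ----
class Permutation;

class Compound
{
  public:
    explicit Compound(std::string key);
    // Declared here, defined below where Permutation is complete.
    // An implicitly generated destructor would have to destroy the
    // unique_ptr wherever the header is included, with the type incomplete.
    ~Compound();

    const std::string& key() const;

  private:
    std::unique_ptr<Permutation> _permutation;
};

// ---- "Source" part: the full type is visible from here on. ----
class Permutation
{
  public:
    explicit Permutation(std::string key) : _key(std::move(key)) {}
    const std::string& key() const { return _key; }

  private:
    std::string _key;
};

Compound::Compound(std::string key) :
    _permutation(std::make_unique<Permutation>(std::move(key)))
{
}

Compound::~Compound() = default;

const std::string& Compound::key() const
{
    return _permutation->key();
}
```

The benefit is that code including only the header part no longer pulls in the permutation's full definition, which keeps the dependency out of every translation unit that uses the compound node.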

* Remove unused MIX_BSDF classification bit

Leftover from the SG rewriting pass infrastructure; never read.

* Fix shadowed structured binding warning in NodeGraphTopology

Pass topoInput as a lambda parameter (renamed to topo) instead of
capturing it, because capturing structured bindings requires C++20
and MaterialX targets C++17.
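
A minimal sketch of the workaround, assuming a hypothetical `countAffectedNodes` helper and a plain map in place of the real topology data:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical example: sum the number of affected nodes over all inputs.
int countAffectedNodes(const std::map<std::string, std::vector<int>>& topologicalInputs)
{
    int count = 0;
    for (const auto& [inputName, topoInput] : topologicalInputs)
    {
        // In C++17 a capture such as [&topoInput] is ill-formed because
        // topoInput is a structured binding. Passing it as a parameter with a
        // distinct name (topo) avoids both the capture and any shadowing.
        auto addNodes = [&count](const std::vector<int>& topo)
        {
            count += static_cast<int>(topo.size());
        };
        addNodes(topoInput);
    }
    return count;
}
```

Some compilers accept the capture in C++17 mode as an extension, so the parameter form is the portable choice for a C++17 codebase.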

* Fix testUniqueNames to use getName() for ShaderGraph node lookup

ShaderGraph::getNode() now takes a simple name, not a full path.
Update the test to use node1->getName() instead of getNamePath().

* Backward-compatible ShaderGraph::getNode() for name and namePath

Internal ShaderGraph methods now look up _nodeMap directly by simple
name. The public getNode() API also accepts a legacy full namePath
(e.g., "outerGraph/innerGraph/nodeName"): it validates the prefix
against the graph's own cached _namePath and strips it before lookup,
so existing callers are not silently broken.
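
The lookup described above might look roughly like this. It is a sketch under stated assumptions: `Graph` and `Node` are simplified stand-ins for `ShaderGraph` and `ShaderNode`, and returning `nullptr` for a path with a foreign prefix is an assumed behavior, not one confirmed by the commit message.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

struct Node
{
    std::string name;
};

// Hypothetical simplified graph that stores nodes by simple name.
class Graph
{
  public:
    explicit Graph(std::string namePath) : _namePath(std::move(namePath)) {}

    void addNode(const std::string& name)
    {
        _nodeMap[name] = Node{ name };
    }

    // Accepts either a simple name or a full path rooted at this graph.
    const Node* getNode(const std::string& nameOrPath) const
    {
        std::string key = nameOrPath;
        const std::string prefix = _namePath + "/";
        if (key.compare(0, prefix.size(), prefix) == 0)
        {
            key.erase(0, prefix.size()); // strip the graph's own path
        }
        else if (key.find('/') != std::string::npos)
        {
            return nullptr; // a path that belongs to some other graph
        }
        const auto it = _nodeMap.find(key);
        return it != _nodeMap.end() ? &it->second : nullptr;
    }

  private:
    std::string _namePath;
    std::unordered_map<std::string, Node> _nodeMap;
};
```

For a graph whose path is `outerGraph/innerGraph`, both `getNode("nodeName")` and `getNode("outerGraph/innerGraph/nodeName")` resolve to the same node in this sketch.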

* Revert unnecessary name-vs-namePath indexing change in ShaderGraph

The _nodeMap keying by name instead of uniqueId/namePath was not
required by the early pruning optimization. Revert to the original
namePath-based indexing to minimize the PR diff.

---------

Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>
Co-authored-by: Lee Kerley <ld_kerley@icloud.com>
Co-authored-by: Lee Kerley <154285602+ld-kerley@users.noreply.github.com>
jstone-lucasfilm pushed a commit to jstone-lucasfilm/MaterialX that referenced this pull request Apr 2, 2026
…ademySoftwareFoundation#2820)

Building on the Perfetto tracing from AcademySoftwareFoundation#2742, this PR adds CPU trace markers to additional codegen and rendering hot paths. This also required conditionally linking `MaterialXTrace` to `MaterialXGenShader`, which was not needed by AcademySoftwareFoundation#2742 (tracing was only in test code) but is now required for the new codegen instrumentation.