Add Perfetto tracing and a test output directory option by ppenenko · Pull Request #2742 · AcademySoftwareFoundation/MaterialX

ppenenko · 2026-01-08T21:46:44Z

Summary

This PR adds a new MaterialXTrace module with optional Perfetto tracing infrastructure, plus a configurable test output directory, laying the groundwork for performance analysis and optimization work.

Background and motivation

MaterialX doesn't currently offer a tracing/instrumentation infrastructure sufficient for the following goals:

Profiling shader generation and optimization passes
Measuring the impact of proposed optimizations (like those in Shader Generator ND_mix_bsdf optimization #2499), and in particular per-material measurements
Correlating MaterialX performance with USD pipeline traces
Generating detailed flame graphs for performance analysis

In comparison, OpenUSD supports tracing in the Chrome Tracing format—the original format used by Chrome DevTools, viewable via chrome://tracing. It's JSON-based, and OpenUSD implements serialization without external dependencies.

Perfetto is the successor to Chrome Tracing and uses a more compact, binary Protobuf-based serialization format. https://ui.perfetto.dev/ can open both Perfetto captures and legacy JSON Chrome Tracing captures. It performs better than chrome://tracing, can open larger captures, and supports more advanced analysis workflows, e.g. SQL queries.

I've chosen Perfetto for the implementation in this PR mostly because of the format's superior performance. For context, using new scene index workflows with Hydra Storm, it's possible to generate multi-gigabyte Chrome Tracing captures that are too big to load in either profiler UI. The implementation relies on the Perfetto SDK.

At the same time, the design remains open for a potential integration of MaterialX tracing with USD tracing, by abstracting out the tracing implementation behind an interface. Such an integration would make it possible to analyze MaterialX shader gen performance in Hydra Storm holistically, in the same profiler UI session or script.

The Lobe Pruning optimization effort (#2680) is the immediate application for this infrastructure.

Features

1. Perfetto Tracing (`MATERIALX_BUILD_TRACING`)

Dedicated MaterialXTrace module – keeps tracing infrastructure separate from MaterialXCore
Zero overhead when disabled – entire module excluded from build when MATERIALX_BUILD_TRACING=OFF
Abstract sink interface (Tracing::Sink) – designed for future USD TraceCollector integration
Enum-based categories (Tracing::Category) for type safety and efficient dispatch:
- Render – rendering operations
- ShaderGen – shader generation
- Optimize – optimization passes
- Material – material identity markers
Template-based scope guard (Tracing::Scope<Category>) – zero stack overhead for category storage
RAII helpers (MX_TRACE_FUNCTION, MX_TRACE_SCOPE) and Dispatcher::ShutdownGuard
Instrumentation in shader generation tests (GenShaderUtil.cpp) and render tests (RenderGlsl.cpp) demonstrating per-material trace markers
Perfetto SDK fetched via CMake FetchContent (v43.0)
CI integration – extended builds (nightly/manual) enable tracing and upload .perfetto-trace artifacts
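
A minimal sketch of how the abstract sink, the enum categories and the template-based scope guard listed above could fit together. Only the names Sink, Dispatcher, Scope and Category come from the description; the method names (beginEvent, endEvent, sink) and the RecordingSink class are hypothetical and written for illustration, so the actual MaterialXTrace API may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the types named in the description.
namespace Tracing
{

enum class Category { Render, ShaderGen, Optimize, Material };

// Abstract destination for trace events (Perfetto today, possibly USD later).
class Sink
{
  public:
    virtual ~Sink() = default;
    virtual void beginEvent(Category category, const char* name) = 0;
    virtual void endEvent(Category category) = 0;
};

// Routes events to the currently installed sink, if any.
class Dispatcher
{
  public:
    static Dispatcher& getInstance()
    {
        static Dispatcher instance;
        return instance;
    }
    void setSink(std::unique_ptr<Sink> sink) { _sink = std::move(sink); }
    void shutdownSink() { _sink.reset(); }
    Sink* sink() const { return _sink.get(); }

  private:
    std::unique_ptr<Sink> _sink;
};

// The category is a template parameter, so the guard stores no data members.
template <Category C>
class Scope
{
  public:
    explicit Scope(const char* name)
    {
        assert(name != nullptr);
        if (Sink* s = Dispatcher::getInstance().sink())
            s->beginEvent(C, name);
    }
    ~Scope()
    {
        if (Sink* s = Dispatcher::getInstance().sink())
            s->endEvent(C);
    }
    Scope(const Scope&) = delete;
    Scope& operator=(const Scope&) = delete;
};

// Invented in-memory sink, used here only to observe the events.
class RecordingSink : public Sink
{
  public:
    explicit RecordingSink(std::vector<std::string>& log) : _log(log) {}
    void beginEvent(Category, const char* name) override
    {
        _log.push_back(std::string("begin:") + name);
    }
    void endEvent(Category) override { _log.push_back("end"); }

  private:
    std::vector<std::string>& _log;
};

} // namespace Tracing
```

With no sink installed, a Scope in this sketch only performs a null check, which is the kind of low-cost disabled path the runtime toggle relies on.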

2. Test Output Directory (`outputDirectory` in `_options.mtlx`)

Redirects all test artifacts (logs, images, shaders, traces) to a user-specified directory
Prevents artifacts from different test runs overwriting each other
Prints effective output path at end of test run for easy IDE navigation

3. Runtime Tracing Toggle (`enableTracing` in `_options.mtlx`)

Enable/disable tracing without rebuilding (requires MATERIALX_BUILD_TRACING=ON at build time)
Default is false to avoid overhead when not profiling

Usage

Enable Tracing (Build Time)

cmake -DMATERIALX_BUILD_TRACING=ON ...

Enable Tracing (Runtime)

In resources/Materials/TestSuite/_options.mtlx:

<input name="enableTracing" type="boolean" value="true" />

Configure Output Directory

<input name="outputDirectory" type="string" value="/path/to/test/output" />

Analyze Traces

Open .perfetto-trace files at https://ui.perfetto.dev

For CI builds, download trace artifacts from the GitHub Actions run page (extended builds only).

- Add MATERIALX_BUILD_TRACING CMake option (OFF by default)
- Add abstract MxTraceBackend interface for pluggable tracing backends
- Add MxTraceCollector singleton (similar to USD's TraceCollector)
- Add MxTraceScope RAII helper and MX_TRACE_* macros
- Add MxPerfettoBackend implementation using Perfetto SDK
- Initialize/shutdown Perfetto in MaterialXTest Main.cpp
- Add trace instrumentation to RenderGlsl test

The abstract interface allows USD/Hydra to inject their own tracing backend when calling MaterialX, enabling unified trace visualization.

Usage:

cmake -DMATERIALX_BUILD_TRACING=ON ...
# Run tests, then open materialx_test_trace.perfetto-trace at ui.perfetto.dev

- Add NOMINMAX and WIN32_LEAN_AND_MEAN to prevent min/max macro conflicts
- Add /bigobj flag for perfetto.cc (too many sections for MSVC default)
- Use BUILD_INTERFACE generator expression for include path
- Fix MxTracePerfetto.cpp: add missing <fstream> include and simplify Perfetto API usage (use fixed category with dynamic event names)

- Replace macro category definitions with constexpr namespace constants (MxTraceCategory::Render, ::ShaderGen, ::Optimize, ::Material)
- Keep legacy macros for backward compatibility
- Fix MX_TRACE_SCOPE macro to properly expand __LINE__ via helper macros
- Add MX_TRACE_CAT_MATERIAL category for material/shader identity markers
- Add material name trace scope in RenderGlsl for per-material tracking
- Add timestamp to trace output filename (format: YYYYMMDD_HHMMSS)
- Use MX_TRACE_FUNCTION macro for automatic function name extraction
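
The __LINE__ fix mentioned above relies on a standard preprocessor idiom: pasting with ## suppresses expansion of its operands, so a second macro level is needed for __LINE__ to become a number before it is pasted. The sketch below illustrates the idiom only; the SKETCH_* macros and CountingGuard are hypothetical names, not the PR's MX_TRACE_SCOPE implementation.

```cpp
#include <cassert>

// Pasting directly (prefix##__LINE__) would yield the literal token
// "prefix__LINE__", so every scope in a function would share one name.
// The extra level lets __LINE__ expand to a number before it is pasted.
#define SKETCH_CONCAT_IMPL(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_IMPL(a, b)

// Hypothetical guard that counts how many scopes are currently active.
struct CountingGuard
{
    explicit CountingGuard(int& counter) : _counter(counter) { ++_counter; }
    ~CountingGuard()
    {
        assert(_counter > 0);
        --_counter;
    }
    CountingGuard(const CountingGuard&) = delete;
    CountingGuard& operator=(const CountingGuard&) = delete;
    int& _counter;
};

// Declares a guard with a unique, line-based variable name.
#define SKETCH_TRACE_SCOPE(counter) \
    CountingGuard SKETCH_CONCAT(sketchScope_, __LINE__)(counter)
```

Two uses on different lines of the same block then declare distinct variables (for example sketchScope_10 and sketchScope_11) instead of colliding.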

The Catch2 Test Adapter for Visual Studio runs MaterialXTest.exe with --list-test-names-only to discover tests. Perfetto initialization was outputting to stderr, causing the adapter to fail with exit code 89. Now we detect --list-* arguments and skip tracing initialization, allowing VS Test Explorer to properly discover and debug tests.
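
A minimal sketch of the --list-* detection described above, assuming a simple prefix check on each command-line argument. The function name isDiscoveryMode is hypothetical; the check in the PR may be written differently.

```cpp
#include <cassert>
#include <cstring>

// Returns true if any argument starts with "--list-", such as the
// --list-test-names-only option used by the Visual Studio test adapter.
bool isDiscoveryMode(int argc, const char* const* argv)
{
    assert(argc == 0 || argv != nullptr);
    static const char prefix[] = "--list-";
    for (int i = 1; i < argc; ++i)
    {
        if (std::strncmp(argv[i], prefix, sizeof(prefix) - 1) == 0)
            return true;
    }
    return false;
}
```

A test main() could call this before creating any sink and skip tracing setup when it returns true, so nothing is written to stderr during discovery.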

Add optional outputDirectory setting to _options.mtlx that redirects all test artifacts (logs, shaders, images, traces) to a user-specified directory. This allows different test runs to be isolated without overwriting each other.

Changes:
- _options.mtlx: Add outputDirectory input (empty = default behavior)
- TestSuiteOptions: Add outputDirectory member and resolveOutputPath() helpers
- GenShaderUtil.cpp: Use outputDirectory for shader generation logs and dumps
- RenderUtil.cpp: Use outputDirectory for render logs, images, and traces
- Main.cpp: Remove tracing code (moved to RenderUtil.cpp)

Tracing improvements:
- Move Perfetto init/shutdown to ShaderRenderTester::validate()
- Each render test now produces its own trace file (e.g., genglsl_render_trace.perfetto-trace)
- Trace filenames now follow the same pattern as log files
- Removed timestamps from trace filenames for consistency

Usage:

<input name="outputDirectory" type="string" value="C:/test_results/my_run" />
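
A minimal sketch of the "empty = default behavior" rule described above, written with std::filesystem. The signature and the choice to keep only the file name under the redirected directory are assumptions made for illustration; the resolveOutputPath() helpers in TestSuiteOptions may behave differently.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: an empty outputDirectory leaves the default path
// untouched; otherwise the artifact is redirected into outputDirectory,
// which is created on demand.
fs::path resolveOutputPath(const fs::path& outputDirectory, const fs::path& defaultPath)
{
    assert(defaultPath.has_filename());
    if (outputDirectory.empty())
        return defaultPath;
    fs::create_directories(outputDirectory);
    return outputDirectory / defaultPath.filename();
}
```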

When outputDirectory is set, print the path at the end of test stdout. This makes it clickable in terminals like Cursor/VS Code for quick access.

- Suppress MSVC warnings from Perfetto SDK templates (C4127, C4146, C4369)
- Replace MX_TRACE_CAT_* macros with 'namespace cat = mx::MxTraceCategory' for cleaner code and better IDE support
- Update RenderGlsl.cpp to use the new category alias pattern

Rename files and classes to better align with MaterialX naming conventions and improve API clarity:

Files:
- MxTrace.h/cpp -> Tracing.h/cpp
- MxTracePerfetto.h/cpp -> PerfettoSink.h/cpp

Classes (now in MaterialX::Tracing namespace):
- MxTraceBackend -> Tracing::Sink
- MxTraceCollector -> Tracing::Dispatcher
- MxTraceScope -> Tracing::Scope
- MxTraceCategory -> Tracing::Category
- MxPerfettoBackend -> Tracing::PerfettoSink

Rationale:
- 'Sink' is common terminology in logging/tracing for data destinations
- 'Dispatcher' better describes the routing behavior (vs 'Collector')
- Nested Tracing:: namespace groups related types cleanly
- File names match primary class names (PerfettoSink.h)
- Macros keep MX_ prefix for collision safety

Usage example:

namespace trace = mx::Tracing;
auto sink = trace::PerfettoSink::create();
trace::Dispatcher::getInstance().setSink(sink);
MX_TRACE_SCOPE(trace::Category::Render, "MyEvent");

- Explain amalgamated source compilation is Google's official recommended approach per https://perfetto.dev/docs/instrumentation/tracing-sdk
- Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC)

Dispatcher:
- Takes ownership of sink via unique_ptr (setSink)
- Explicit shutdownSink() destroys sink and writes output
- Asserts on double-set to catch programming errors

PerfettoSink:
- Constructor takes output path, starts session immediately
- Destructor writes trace file (true RAII)
- Uses std::call_once for thread-safe global Perfetto init
- Removed pimpl - simpler since only setup code includes this header

Scope:
- Removed _enabled caching - Dispatcher checks internally
- One less bool on the stack per trace scope

Usage:
- Added TracingGuard scope guard in RenderUtil.cpp for exception safety
- Added const to immutable members (_outputPath, _category)

The design now follows proper RAII patterns with clear ownership.
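
The ownership rules above can be sketched without the Perfetto SDK. In the illustration below, FileSink and SinkOwner are hypothetical stand-ins for PerfettoSink and Dispatcher: the sink writes its output in its destructor, the owner holds it through a unique_ptr and asserts on a double set, and a nested guard calls shutdownSink() on scope exit. The plain-text file format is invented for the example, and the std::call_once global initialization is omitted.

```cpp
#include <cassert>
#include <fstream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink: buffers events and writes them when destroyed.
class FileSink
{
  public:
    explicit FileSink(std::string outputPath) : _outputPath(std::move(outputPath)) {}
    ~FileSink()
    {
        std::ofstream out(_outputPath);
        for (const std::string& event : _events)
            out << event << '\n';
    }
    void record(const std::string& event) { _events.push_back(event); }

  private:
    const std::string _outputPath;
    std::vector<std::string> _events;
};

// Hypothetical owner of the active sink.
class SinkOwner
{
  public:
    void setSink(std::unique_ptr<FileSink> sink)
    {
        assert(!_sink && "a sink is already installed");
        _sink = std::move(sink);
    }
    // Destroys the sink, which in turn writes the output file.
    void shutdownSink() { _sink.reset(); }
    FileSink* sink() const { return _sink.get(); }

    // Calls shutdownSink() on scope exit, including during stack unwinding.
    struct ShutdownGuard
    {
        explicit ShutdownGuard(SinkOwner& owner) : _owner(owner) {}
        ~ShutdownGuard() { _owner.shutdownSink(); }
        ShutdownGuard(const ShutdownGuard&) = delete;
        ShutdownGuard& operator=(const ShutdownGuard&) = delete;
        SinkOwner& _owner;
    };

  private:
    std::unique_ptr<FileSink> _sink;
};
```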

Move the scope guard pattern into the Dispatcher class for reusability. Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of defining their own local struct.

- New 'enableTracing' boolean option (default: false)
- Parsed in TestSuiteOptions::readOptions()
- RenderUtil.cpp conditionally initializes PerfettoSink based on this option
- Uses std::optional<ShutdownGuard> for conditional scope guard
- Allows profiling to be enabled/disabled without rebuilding
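
A minimal sketch of the conditional scope guard pattern named above: std::optional lets a non-copyable guard be constructed only when tracing is enabled, while still being destroyed at scope exit. The ShutdownGuard here is a hypothetical counter-based stand-in, and runTests is an invented function used to show the control flow.

```cpp
#include <cassert>
#include <optional>

// Hypothetical guard: increments a counter when it goes out of scope,
// standing in for a guard that shuts down the tracing sink.
struct ShutdownGuard
{
    explicit ShutdownGuard(int& shutdownCount) : _shutdownCount(shutdownCount) {}
    ~ShutdownGuard() { ++_shutdownCount; }
    ShutdownGuard(const ShutdownGuard&) = delete;
    ShutdownGuard& operator=(const ShutdownGuard&) = delete;
    int& _shutdownCount;
};

// Returns how many times shutdown ran: 1 with tracing enabled, else 0.
int runTests(bool enableTracing)
{
    int shutdownCount = 0;
    {
        std::optional<ShutdownGuard> guard;
        if (enableTracing)
            guard.emplace(shutdownCount); // constructed in place, no copy needed
        // ... the test body would run here ...
        assert(shutdownCount == 0); // the guard has not fired yet
    }
    return shutdownCount;
}
```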

- Replace namespace Category with constexpr strings -> enum class Category
- Template-based Scope<Category> avoids storing category on stack
- PerfettoSink uses switch statement (compiler optimizes to jump table)
- Removed categoryMatches() string comparison overhead
- Type-safe: can only pass valid Category enum values

The enum-based design is simpler, more efficient, and equally compatible with a future USD TraceCollector sink (which uses integer category IDs, not strings).
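
A minimal sketch of switch-based category dispatch on an enum class, as described above. The four enumerators come from the PR description; the categoryName function and the returned strings are hypothetical, and the real PerfettoSink presumably dispatches to Perfetto trace macros rather than returning names.

```cpp
#include <cassert>
#include <string_view>

enum class Category { Render, ShaderGen, Optimize, Material };

// Maps each category to a label with a switch instead of string comparisons.
// With no default label, compilers can warn when an enumerator is unhandled.
std::string_view categoryName(Category category)
{
    switch (category)
    {
        case Category::Render:    return "Render";
        case Category::ShaderGen: return "ShaderGen";
        case Category::Optimize:  return "Optimize";
        case Category::Material:  return "Material";
    }
    assert(false && "unhandled Category value");
    return "Unknown";
}
```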

jstone-lucasfilm · 2026-01-13T21:44:16Z

@ppenenko I have a few additional questions, as I'd like to better understand the advantages of the approach that you're proposing.

Historically, we've used sampling profilers for performance analysis of the MaterialX project, and here's an example profile of the MaterialX Viewer using the built-in sampling profiler in Visual Studio:

Can you give a sense of the advantages provided by the instrumented profiling in your PR, in order to better understand the benefits that might be associated with this additional infrastructure in MaterialX?

And when profiling the results of shader generation in hardware shading languages (e.g. GLSL, MSL, Slang), I would think of GPU profilers such as RenderDoc as the natural approach for recording and comparing performance data.

Is your proposed approach in this PR intended as a replacement for this form of traditional GPU profiling, or is it purely focused on CPU profiling? If it supports GPU profiling, what are its benefits over RenderDoc?

ashwinbhat · 2026-01-14T18:00:11Z

The current Catch2-based benchmark test in MaterialX is very simple, and a more robust tracing system that integrates well with the libraries and applications that use MaterialX would be valuable. For example, when MaterialX is integrated into USD, we can enable both USD Trace and MaterialX tracing to get captures with a more detailed view.

@jstone-lucasfilm would this PR be more acceptable if we created a new MaterialXTrace module that brings in Perfetto? This would be similar to USD tracing, but simpler. Here is some additional info about USD Trace: https://github.com/PixarAnimationStudios/OpenUSD/tree/dev/pxr/base/trace

jstone-lucasfilm · 2026-01-14T18:22:00Z

@ashwinbhat For adding an instrumented trace concept to MaterialX, I agree a MaterialXTrace module would be better than adding new source code to MaterialXCore, and I'm open to this approach.

I'd still like to learn more, though, about the advantages offered by instrumented profilers such as Perfetto over the existing sampled profilers we've used previously in MaterialX performance evaluation, which don't require additional code to be inserted into applications at all. So far, I haven't encountered cases where the precision of sampled profilers was insufficient to fully understand the impact of a proposed optimization or codebase change, and I'd love to learn more about the cases that other teams have run into where sampling profilers were not up to the task.

Also, I'd like to better understand the relationship of Perfetto to GPU profiling, which is just as important (or perhaps more important) in evaluating the impact of a ShaderGen change on the real-time rendering performance of MaterialX content.

ppenenko · 2026-01-14T18:50:40Z

@jstone-lucasfilm I agree that sampling profilers are of course useful tools in their own right, but instrumentation adds a different useful perspective. Some advantages:

Most importantly, we can instrument with arbitrary strings that matter to us, e.g. material and shader names, not just C++ symbols.
Cross-platform, while sampling profilers are typically not. Same trace format recorded on any platform.
Sampling profilers can miss, under-represent or over-represent events due to temporal aliasing.

ppenenko · 2026-01-14T18:53:06Z

Some benefits of Perfetto specifically:

Captures can be analyzed in a web browser on any platform, without the need to install any specialized tools. Thus, they are easy to share with anyone.
Rich analysis functionality with SQL and a Python API.
Potential for IPC backends.
Overall, feature-rich and battle-tested by Chrome.

ppenenko · 2026-01-14T19:09:47Z

Good point about CPU vs. GPU traces @jstone-lucasfilm . AFAIK Perfetto doesn't offer facilities for GPU profiling out of the box, but the application itself can perform GPU perf queries and report the results via Perfetto. In fact, this is what I'm working on in another branch. At the same time, CPU and GPU events can be analyzed in the same UI in Perfetto, which is convenient. Besides, MaterialX has no CPU-side tracing system anyway, so integrating Perfetto will help many more CPU performance experiments in the future.

Of course, Renderdoc would offer a much more detailed picture of a GPU frame, but it's designed for profiling at a lower level. Its captures would contain GPU resources, which would have a much higher overhead (capture file size) than the per-material measurements I'm adding with Perfetto. So, I would suggest identifying the materials that are slow to render or benefit from a certain optimization by analyzing the existing render test across 180+ materials with Perfetto (I can already confirm that that is working well) and then, if necessary, profiling specific materials in Renderdoc, Nsight etc.

By the way, if we wanted to see material and shader names in Renderdoc captures, we'd have to instrument our code either way. So the instrumentation macros I'm proposing could have a Renderdoc implementation in the future.

ppenenko · 2026-01-14T21:02:17Z

Great idea about MaterialXTrace @ashwinbhat - it does feel better encapsulated. It would be a very small library though. Anyway, I'll refactor this aspect.

jstone-lucasfilm · 2026-01-14T21:41:44Z

@ppenenko @ashwinbhat Pivoting for a moment, let's focus on the "steelman" arguments for why a MaterialXTrace module providing instrumented build support might be a really good idea.

Would MaterialXTrace allow us to integrate an instrumented run of MaterialXTest into the GitHub CI for MaterialX, with the instrumented build support providing consistent profile results from run to run, independent of the actual time each run of MaterialXTest took on the GitHub CI virtual machine?

That would be a really compelling argument, if true, and it's something that a sampling profiler couldn't hope to provide, as the distribution of time samples is profoundly dependent on the performance of the virtual machines on which our GitHub CI workflows are run.

ppenenko · 2026-01-14T22:03:17Z

@jstone-lucasfilm Perfetto tracing is completely orthogonal to how a VM's configuration and load affect timings - it just records wall-clock times, so unfortunately it wouldn't improve on that. When testing on the same machine and avoiding heavy parallel loads (e.g. running Maya in parallel), the timings should be reliable. E.g. I can measure the expected speedups of @ld-kerley 's #2499 reliably.

But of course, we could record traces in CI/CD. At the very least, these would give us a kind of rich log, with meaningful events stacked and grouped by thread. We can be sure, for example, that this or that shader graph optimization pass has been applied to the given material. And optimization passes such as #2499 should show similar relative improvements across different configurations.

A related use case would be if a user experiences performance issues with MaterialX - they can easily record a capture and share it with the maintainers. And, as I mentioned, it could one day be a USD trace including MaterialX events.

jstone-lucasfilm · 2026-01-14T23:19:59Z

It does sound interesting to integrate MaterialXTrace into our GitHub CI in the future, even if it doesn't produce perfectly consistent results between runs.

It seems slightly surprising that you would want to use Perfetto to measure GPU timing improvements from shader generation optimizations, e.g. the hardware shader optimizations found in #2499.

Am I wrong to think that this category of optimization would normally be evaluated with either RenderDoc or a real-time, smoothed frame timer like the one we use in MaterialXView?

ppenenko · 2026-01-15T15:32:42Z

@jstone-lucasfilm could we please separate the concerns in this code review a bit?

First of all, can we all agree that a tracing mechanism would be an essential addition to MaterialX? I stated the motivations above, and the most immediate and straightforward one is to measure how long a particular operation took for a particular MaterialX entity - material, node, shader - which is not identified uniquely by a C++ symbol, and therefore can't be captured by a sampling profiler. Let's set GPU timings aside for a moment, because there are a few types of CPU events that we care about in the context of shader graph optimizations: shader codegen timings and shader compilation timings. So even if we disagree on GPU profiling, I would argue that such a mechanism is a must-have. Chrome and USD corroborate this argument.

If we do agree on the above, let's discuss if Perfetto is the right implementation choice. I would argue that it is, due to its exceptional maturity and widespread adoption.

As for GPU instrumentation - my main argument is that if we have a tracing mechanism anyway, for all the other reasons, recording GPU timings there is simple and convenient. Taking the frame duration with OpenGL queries is about a dozen lines of C++, and then the result can be recorded via Perfetto.

As for Renderdoc - it's not well-suited for recording the durations of hundreds of frames because:

Its number one use case is debugging and profiling a single frame.
It records complete graphics API (e.g. OpenGL) call traces and GPU resources, for every frame - something we don't need for our kind of test and something that would amount to huge amounts of data across hundreds of frames.
Its main use case is frame debugging, not frame profiling. For profiling, GPU hardware-specific tools (Nsight etc.) are more useful but they're even less cross-platform.
GPU frame debuggers/profilers in general are known for compatibility issues across platforms, graphics APIs and application usage patterns.
For measuring the duration of a frame or a draw call, Renderdoc would use the same graphics API calls that we can use - it doesn't have special hooks into the driver, unlike HW vendor tools. So our measurements can be as precise.

Now, regarding MaterialXView.

It can only measure one material at a time.
For some reason, the reported frame duration is truncated to integer milliseconds. E.g. 1.9f would be truncated to 1.
It doesn't measure GPU durations. It also doesn't seem to do glFinish or glFlush, so even CPU frame durations are questionable. The only CPU/GPU synchronization it seems to have is spamming the video driver with frames until it starts blocking.
It does do averaging, but we don't have to do it at run time. With a recorded trace, we can examine multiple events per material after the fact, either in a GUI or with Python. Perfetto UI can do all sorts of statistical analyses, and a Python script can bring the data into Pandas and Matplotlib. Having the complete data offline makes it possible to analyze distributions as well - e.g. when the first frame per material is much longer due to shader compilation.
Running an interactive app and jotting down a single millisecond value is less reproducible and shareable than running a test suite which records timings with nanosecond precision. The file can be shared with others, and the test suite can be run by others.
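
To make the offline-analysis argument concrete, here is a minimal sketch of the kind of per-material statistics that become possible once every frame duration is recorded. It is written in C++ rather than the Python workflow mentioned above, it does not read Perfetto files, and the FrameStats type, the function name and the rule of discarding exactly one warmup frame are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of the steady-state frames of one material.
struct FrameStats
{
    double meanNs;
    double medianNs;
};

// Drops the first frame (assumed to include shader compilation) and
// summarizes the remaining durations, given in nanoseconds.
FrameStats steadyStateStats(std::vector<std::int64_t> durationsNs)
{
    assert(durationsNs.size() >= 2);
    durationsNs.erase(durationsNs.begin());
    std::sort(durationsNs.begin(), durationsNs.end());

    const std::size_t n = durationsNs.size();
    const double mean =
        std::accumulate(durationsNs.begin(), durationsNs.end(), 0.0) / static_cast<double>(n);
    const double median = (n % 2 == 1)
        ? static_cast<double>(durationsNs[n / 2])
        : (static_cast<double>(durationsNs[n / 2 - 1]) + static_cast<double>(durationsNs[n / 2])) / 2.0;
    return { mean, median };
}
```

Because the raw samples are kept, other choices (discarding more warmup frames, percentiles, histograms) can be made after the fact without re-running the tests.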

Use recursive glob build/**/*.perfetto-trace since traces are written to current working directory (build/) not build/bin/

- Set up Perfetto sink in ShaderGeneratorTester::validate() when enableTracing=true
- Add MX_TRACE_SCOPE around generateCode() calls with element names
- Generates <target>_gen_trace.perfetto-trace files (e.g. genglsl_gen_trace.perfetto-trace)
- These tests run on all CI platforms, providing trace artifacts for download

ppenenko · 2026-01-16T21:48:02Z

Refactoring done - MaterialXTrace is a separate module. GLSL render tests aren't enabled in CI/CD, so I added instrumentation in codegen tests. An extended build with trace artifacts can be found at https://github.com/autodesk-forks/MaterialX/actions/runs/21080475833/job/60632726674. A Windows step is still running, taking a while to install the MDL SDK, but otherwise it looks good.

ppenenko · 2026-01-16T22:33:30Z

FYI @jstone-lucasfilm https://github.com/autodesk-forks/MaterialX/actions/runs/21080475833/ is green except for JavaScript which always fails on forks, and the traces are downloadable as artifacts.

When outputDirectory is specified and shaderInterfaces=REDUCED, the render tests were failing to create the parent material directory before attempting to create the /reduced subdirectory, causing shader dumps to fail. The fix ensures createDirectory() is called on the parent outputPath first, then on outputPath/reduced if in REDUCED mode. Both calls are now within the same ScopedTimer scope for consistent profiling.

Fixed in all four render test implementations:
- RenderGlsl.cpp
- RenderOsl.cpp
- RenderMsl.mm
- RenderSlang.cpp
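
The ordering fix above can be sketched with std::filesystem: create the parent directory first, then the reduced subdirectory only when needed. This is an illustration under assumptions, not the PR's code, which uses MaterialX's own createDirectory(); the function name and the boolean parameter are hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Creates the per-material output directory first, then the "reduced"
// subdirectory when reduced shader interfaces are in use.
// Returns the directory that shader dumps should be written to.
fs::path prepareShaderDumpDirectory(const fs::path& outputPath, bool reducedInterfaces)
{
    assert(!outputPath.empty());
    fs::create_directories(outputPath); // parent must exist before the child
    if (!reducedInterfaces)
        return outputPath;

    const fs::path reducedPath = outputPath / "reduced";
    fs::create_directory(reducedPath);
    return reducedPath;
}
```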

jstone-lucasfilm

Following up on our last MaterialX TSC meeting, it's clear that we should move forward with this new work, so I've started writing up more detailed recommendations for improving it.

Address review: remove `using Cat` alias and spell out the full mx::Tracing::Category names for clarity while the Tracing library is still new.

Address review: use a Perfetto-specific flag name to leave a clear path for future extensions to Tracy and other profiling backends.

Address review: instead of enabling tracing in all extended builds, add a matrix.extended_build_perfetto flag and set it on one non-Windows build to produce a single clear tracing artifact.

Only attempt to upload trace artifacts from the build that actually has Perfetto enabled, avoiding no-op upload steps in other builds.

jstone-lucasfilm

This looks great to me, @ppenenko, and thanks for the contribution to MaterialX! Once we start seeing the new tracing artifacts in nightly builds, we can augment this with additional improvements.

- Add GpuTimerQuery helper class and getCurrentTimeNs() for GPU timing via GL_TIME_ELAPSED queries (guarded by MATERIALX_BUILD_TRACING)
- Add multi-frame render loop using framesPerMaterial for statistical validity, emitting MX_TRACE_ASYNC events on the GPU track
- Replace stale Cat:: alias with mx::Tracing::Category:: (the alias was removed in PR AcademySoftwareFoundation#2742)

The CMake flag was renamed from MATERIALX_BUILD_TRACING to MATERIALX_BUILD_PERFETTO_TRACING in PR AcademySoftwareFoundation#2742. Update all #ifdef guards in RenderGlsl.cpp to match.

The previous commit accidentally set enableTracing to false, which would prevent CI from producing Perfetto traces. Restore to true to match the behavior established in PR AcademySoftwareFoundation#2742.

The applyOptimizationFlags lambda in getGenerationOptions() was lost when main (PR AcademySoftwareFoundation#2742) was merged into proto2 on Feb 13. This silently left optEarlyPruning stuck at false regardless of _options.mtlx. Also adds the missing optEarlyPruning XML input to _options.mtlx. Assisted-by: Claude (Anthropic) via Cursor IDE Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com>

* add additional stage to optimize combination of 3 bsdfs that have been mixed with 2 nodes. Add possible refactor of gltf_pbr that allows it to match the existing optimization rule Add checkbox to MaterialXView to control bsdf mix optimization add GenOptions option to control the optimization remove trailing underscores from node names - because these still make variables with double underscores. Fix asan crash. remove debug print remove double underscores from node names - appears to be illegal. Refactor OpenPBR nodegraph optimization in to a ShaderGenerator optimization step for ND_mix_bsdf. Replace the mix with an add and modify the upstream weight inputs accordingly to perform the same math as a mix. This allows the BSDF nodes to early out if their contribution is not necessary in the mix. * Disconnect outputs when removing a node * Revert other attempted fixes * Fix dangling node crash by iterating nodes by name * Prototype of a GraphViz writer * Fix a missing include * Hook up GraphViz writing before and after the optimization * Stub out optimization pass infrastructure * Simplify pass enablement * Specific types instead of `auto` * Add gen option `optMaxPassIterations` * Hook up the dot dumping option to the viewer GUI * Add optimizeMixMixBsdf to PremultipliedAddPass for cascaded mix nodes - Add optimizeMixMixBsdf() to handle pattern: mix(mix(A,B), C) where A, B, C are primitive BSDFs with weight inputs - Refactor shared helpers into anonymous namespace: - redirectInput(), connectNodes(), replaceMixWithAdd() - PremultNodeDefs struct and getPremultNodeDefs() - PremultipliedAddPass::run() now tries MixMix first, falls back to simple Mix - Enables optimization for standard_surface and similar hierarchical materials * Add Perfetto tracing infrastructure to MaterialXCore - Add MATERIALX_BUILD_TRACING CMake option (OFF by default) - Add abstract MxTraceBackend interface for pluggable tracing backends - Add MxTraceCollector singleton (similar to USD's 
TraceCollector) - Add MxTraceScope RAII helper and MX_TRACE_* macros - Add MxPerfettoBackend implementation using Perfetto SDK - Initialize/shutdown Perfetto in MaterialXTest Main.cpp - Add trace instrumentation to RenderGlsl test The abstract interface allows USD/Hydra to inject their own tracing backend when calling MaterialX, enabling unified trace visualization. Usage: cmake -DMATERIALX_BUILD_TRACING=ON ... # Run tests, then open materialx_test_trace.perfetto-trace at ui.perfetto.dev * Exclude MxTrace.cpp when tracing disabled (zero overhead) * Fix Perfetto SDK build on Windows - Add NOMINMAX and WIN32_LEAN_AND_MEAN to prevent min/max macro conflicts - Add /bigobj flag for perfetto.cc (too many sections for MSVC default) - Use BUILD_INTERFACE generator expression for include path - Fix MxTracePerfetto.cpp: add missing <fstream> include and simplify Perfetto API usage (use fixed category with dynamic event names) * Improve tracing API: constexpr categories, material names, timestamps - Replace macro category definitions with constexpr namespace constants (MxTraceCategory::Render, ::ShaderGen, ::Optimize, ::Material) - Keep legacy macros for backward compatibility - Fix MX_TRACE_SCOPE macro to properly expand __LINE__ via helper macros - Add MX_TRACE_CAT_MATERIAL category for material/shader identity markers - Add material name trace scope in RenderGlsl for per-material tracking - Add timestamp to trace output filename (format: YYYYMMDD_HHMMSS) - Use MX_TRACE_FUNCTION macro for automatic function name extraction * Skip Perfetto tracing during test discovery mode The Catch2 Test Adapter for Visual Studio runs MaterialXTest.exe with --list-test-names-only to discover tests. Perfetto initialization was outputting to stderr, causing the adapter to fail with exit code 89. Now we detect --list-* arguments and skip tracing initialization, allowing VS Test Explorer to properly discover and debug tests. 
* Add outputDirectory option for test artifact redirection Add optional outputDirectory setting to _options.mtlx that redirects all test artifacts (logs, shaders, images, traces) to a user-specified directory. This allows different test runs to be isolated without overwriting each other. Changes: - _options.mtlx: Add outputDirectory input (empty = default behavior) - TestSuiteOptions: Add outputDirectory member and resolveOutputPath() helpers - GenShaderUtil.cpp: Use outputDirectory for shader generation logs and dumps - RenderUtil.cpp: Use outputDirectory for render logs, images, and traces - Main.cpp: Remove tracing code (moved to RenderUtil.cpp) Tracing improvements: - Move Perfetto init/shutdown to ShaderRenderTester::validate() - Each render test now produces its own trace file (e.g., genglsl_render_trace.perfetto-trace) - Trace filenames now follow the same pattern as log files - Removed timestamps from trace filenames for consistency Usage: <input name="outputDirectory" type="string" value="C:/test_results/my_run" /> * Print output directory at end of test run for easy access When outputDirectory is set, print the path at the end of test stdout. This makes it clickable in terminals like Cursor/VS Code for quick access. 
* Improve Perfetto tracing integration and suppress SDK warnings - Suppress MSVC warnings from Perfetto SDK templates (C4127, C4146, C4369) - Replace MX_TRACE_CAT_* macros with 'namespace cat = mx::MxTraceCategory' for cleaner code and better IDE support - Update RenderGlsl.cpp to use the new category alias pattern * Rename tracing classes to follow MaterialX conventions Rename files and classes to better align with MaterialX naming conventions and improve API clarity: Files: - MxTrace.h/cpp -> Tracing.h/cpp - MxTracePerfetto.h/cpp -> PerfettoSink.h/cpp Classes (now in MaterialX::Tracing namespace): - MxTraceBackend -> Tracing::Sink - MxTraceCollector -> Tracing::Dispatcher - MxTraceScope -> Tracing::Scope - MxTraceCategory -> Tracing::Category - MxPerfettoBackend -> Tracing::PerfettoSink Rationale: - 'Sink' is common terminology in logging/tracing for data destinations - 'Dispatcher' better describes the routing behavior (vs 'Collector') - Nested Tracing:: namespace groups related types cleanly - File names match primary class names (PerfettoSink.h) - Macros keep MX_ prefix for collision safety Usage example: namespace trace = mx::Tracing; auto sink = trace::PerfettoSink::create(); trace::Dispatcher::getInstance().setSink(sink); MX_TRACE_SCOPE(trace::Category::Render, "MyEvent"); * Add clarifying comments for Perfetto SDK integration - Explain amalgamated source compilation is Google's official recommended approach per https://perfetto.dev/docs/instrumentation/tracing-sdk - Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC) * Refactor tracing API for cleaner design Dispatcher: - Takes ownership of sink via unique_ptr (setSink) - Explicit shutdownSink() destroys sink and writes output - Asserts on double-set to catch programming errors PerfettoSink: - Constructor takes output path, starts session immediately - Destructor writes trace file (true RAII) - Uses std::call_once for thread-safe global Perfetto init - Removed pimpl - simpler since only setup 
code includes this header Scope: - Removed _enabled caching - Dispatcher checks internally - One less bool on the stack per trace scope Usage: - Added TracingGuard scope guard in RenderUtil.cpp for exception safety - Added const to immutable members (_outputPath, _category) The design now follows proper RAII patterns with clear ownership. * Add Dispatcher::ShutdownGuard nested struct Move the scope guard pattern into the Dispatcher class for reusability. Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of defining their own local struct. * Add enableTracing runtime option in _options.mtlx - New 'enableTracing' boolean option (default: false) - Parsed in TestSuiteOptions::readOptions() - RenderUtil.cpp conditionally initializes PerfettoSink based on this option - Uses std::optional<ShutdownGuard> for conditional scope guard - Allows profiling to be enabled/disabled without rebuilding * Refactor tracing categories to use enum class - Replace namespace Category with constexpr strings -> enum class Category - Template-based Scope<Category> avoids storing category on stack - PerfettoSink uses switch statement (compiler optimizes to jump table) - Removed categoryMatches() string comparison overhead - Type-safe: can only pass valid Category enum values The enum-based design is simpler, more efficient, and equally compatible with a future USD TraceCollector sink (which uses integer category IDs, not strings). 
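The ownership and shutdown rules from the commits above (a dispatcher that owns a single sink, asserts on double-set, checks for a sink internally, and is shut down by a guard only when `enableTracing` is on) can be illustrated with a language-neutral sketch. It is written in Python for brevity; the real API is C++, and `ListSink`, `trace_scope` and `tracing_session` are hypothetical names invented for this example.

```python
import time
from contextlib import contextmanager
from enum import Enum


class Category(Enum):
    # Enum categories: only valid values can be passed, no string compares.
    RENDER = 0
    SHADER_GEN = 1


class ListSink:
    """Hypothetical in-memory sink; a real sink would write a trace file."""

    def __init__(self):
        self.events = []

    def record(self, category, name, duration_ns):
        self.events.append((category, name, duration_ns))


class Dispatcher:
    """Holds at most one sink and routes events to it."""

    _sink = None

    @classmethod
    def set_sink(cls, sink):
        # Setting a sink twice is treated as a programming error.
        assert cls._sink is None, "sink already set"
        cls._sink = sink

    @classmethod
    def shutdown_sink(cls):
        cls._sink = None


@contextmanager
def trace_scope(category, name):
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        # The sink check lives here, so call sites need no "enabled" flag.
        sink = Dispatcher._sink
        if sink is not None:
            sink.record(category, name, time.perf_counter_ns() - start)


@contextmanager
def tracing_session(enable_tracing, sink_factory):
    """Install a sink only when tracing is enabled; always shut it down."""
    sink = sink_factory() if enable_tracing else None
    if sink is not None:
        Dispatcher.set_sink(sink)
    try:
        yield sink
    finally:
        if sink is not None:
            Dispatcher.shutdown_sink()


with tracing_session(True, ListSink) as sink:
    with trace_scope(Category.RENDER, "MyEvent"):
        pass
    print(sink.events[0][:2])
```

The session guard plays the role the commits assign to `ShutdownGuard` held in a `std::optional`: when tracing is disabled nothing is installed, and when it is enabled the sink is released even if the traced code raises.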
* Add category dispatch to PerfettoSink::counter() * Remove unused resolveOutputPathWithSubdir method * Add GPU timing support with async trace events - Add AsyncTrack enum and asyncEvent() to Tracing::Sink interface - Implement in PerfettoSink with dedicated GPU track (stable ID 1001) - Track descriptor initialized thread-safely in std::call_once block - Add GL_TIME_ELAPSED timer query wrapper in RenderGlsl.cpp - Add glFinish() after render for accurate CPU trace scope timing - Emit GPU render duration on separate 'GPU' track via MX_TRACE_ASYNC The GPU track shows async slices with accurate GPU render duration, aligned approximately with CPU submission time. This enables visual comparison of CPU vs GPU work in Perfetto UI. * Add framesPerMaterial option for multi-frame GPU timing - Add framesPerMaterial to TestSuiteOptions (unsigned int, default 1) - Parse from _options.mtlx with bounds checking - Render loop in RenderGlsl.cpp iterates framesPerMaterial times - Each frame emits a separate GPU trace event for Python analysis - Enables warmup frame detection and statistical averaging offline * Add optReplaceBsdfMixWithLinearCombination to test suite options - Expose Lee's mix_bsdf optimization via _options.mtlx - Parse and apply to GenOptions in getGenerationOptions() - Allows benchmarking optimization impact without recompiling * Move tracing to dedicated MaterialXTrace module Addresses PR feedback to create a dedicated module for tracing rather than adding it to MaterialXCore. Changes: - Create source/MaterialXTrace with Export.h, CMakeLists.txt - Move Tracing.h/cpp and PerfettoSink.h/cpp to MaterialXTrace - Update export macros from MX_CORE_API to MX_TRACE_API - Update include paths in test code - Link MaterialXTest against MaterialXTrace The MaterialXTrace library depends on MaterialXCore and contains all tracing infrastructure (Sink interface, Dispatcher singleton, PerfettoSink). This keeps MaterialXCore minimal for users who don't need tracing. 
* Only build MaterialXTrace when tracing is enabled Skip add_subdirectory(MaterialXTrace) and test linking when MATERIALX_BUILD_TRACING is OFF. This ensures zero overhead - no extra library built or linked when tracing is disabled. * Enable tracing by default and in nightly CI - Set enableTracing default to true in _options.mtlx - Enable MATERIALX_BUILD_TRACING=ON for nightly/extended builds in GHA When MATERIALX_BUILD_TRACING is ON at build time and enableTracing is true at runtime, trace files will be generated during test runs. * Enable tracing by default and in nightly CI - Set enableTracing default to true in _options.mtlx - Enable MATERIALX_BUILD_TRACING=ON for nightly/extended builds in GHA When MATERIALX_BUILD_TRACING is ON at build time and enableTracing is true at runtime, trace files will be generated during test runs. * Add Perfetto trace artifact upload for extended builds Upload .perfetto-trace files from build/bin/ as artifacts when running extended builds (nightly/workflow_dispatch). This allows retrieving trace files for performance analysis. * Fix an ambiguous comment * Fix Linux build: link pthread for Perfetto SDK Perfetto SDK requires pthread on Linux/Unix for thread-local storage and synchronization. Add Threads::Threads dependency via CMake's FindThreads module. * Fix MaterialXTrace CMake: use MTLX_MODULES instead of LIBRARIES The mx_add_library function expects MTLX_MODULES for dependencies, not LIBRARIES. * Suppress warnings-as-errors for Perfetto SDK on Unix Perfetto SDK generates warnings that fail builds when MATERIALX_WARNINGS_AS_ERRORS is ON. Add -Wno-error for perfetto.cc on GCC/Clang. * Fix tracing build on GCC/Clang and iOS - Suppress warnings-as-errors for Perfetto SDK on Unix (GCC/Clang generate warnings that fail builds with MATERIALX_WARNINGS_AS_ERRORS) - Skip pthread linking on iOS (threads are built-in to iOS SDK) * Fix CMake: use COMPILE_FLAGS instead of COMPILE_OPTIONS for source files COMPILE_OPTIONS is a target property. 
For set_source_files_properties, use COMPILE_FLAGS instead. This fixes /bigobj on Windows and -Wno-error on Unix. * Fix Perfetto symbol visibility for shared library builds - Define PERFETTO_COMPONENT_EXPORT for shared libs (required by Perfetto) - Add -fvisibility=default for perfetto.cc on Unix (fixes GCC 10 monolithic build) * Isolate Perfetto SDK from DLL boundaries - Add createPerfettoSink() factory to Tracing.h (exported) - PerfettoSink class in PerfettoSink.h not exported (internal header) - Consumers include only Tracing.h and use factory function - Perfetto types never cross DLL boundaries - only abstract Sink does - Remove unnecessary PERFETTO_COMPONENT_EXPORT from CMakeLists.txt * Fix Perfetto SDK compile flags for all build configurations - Define MATERIALX_PERFETTO_COMPILE_DEFINITIONS and MATERIALX_PERFETTO_COMPILE_FLAGS as CACHE INTERNAL variables in root CMakeLists.txt - Apply flags in both root (for monolithic builds) and MaterialXTrace (for non-monolithic) - set_source_files_properties is directory-scoped, so must be called in both places - Fixes -Werror=shadow on GCC (kDevNull shadowing) - Fixes min/max macro conflicts on Windows (NOMINMAX) * Fix Perfetto compile flags for monolithic builds set_source_files_properties is directory-scoped, so: - Root CMakeLists.txt: defines MATERIALX_PERFETTO_* variables - source/CMakeLists.txt: applies flags for monolithic builds - MaterialXTrace/CMakeLists.txt: applies flags for non-monolithic builds * Exclude Perfetto SDK from coverage analysis perfetto.cc is ~160k lines and generates gcov output that gcovr cannot parse. Exclude it from coverage since it's third-party code anyway. 
* Exclude Perfetto SDK from coverage, static analysis, and suppress MSVC warnings - Add --exclude .*perfetto.* to gcovr (coverage) - perfetto.cc is ~160k lines and generates gcov output that gcovr cannot parse - Add --suppress=*:*perfetto* to cppcheck (static analysis) - cppcheck doesn't define OS macros needed by Perfetto SDK's platform detection - Use /W0 to disable all warnings for Perfetto SDK on MSVC (third-party code triggers C4146, C4369, C4996, C4459, C4065, C4244, C4267, C4293) * Add /WX- to disable warnings-as-errors for Perfetto SDK on MSVC /W0 alone doesn't override project-level /WX. Need /WX- explicitly to prevent STL header warnings (from tuple, etc.) being promoted to errors when compiling perfetto.cc. * Fix Perfetto trace artifact upload path Use recursive glob build/**/*.perfetto-trace since traces are written to current working directory (build/) not build/bin/ * Add tracing instrumentation to shader generation tests - Set up Perfetto sink in ShaderGeneratorTester::validate() when enableTracing=true - Add MX_TRACE_SCOPE around generateCode() calls with element names - Generates <target>_gen_trace.perfetto-trace files (e.g. 
genglsl_gen_trace.perfetto-trace) - These tests run on all CI platforms, providing trace artifacts for download * Add comparetraces.py for Perfetto trace analysis * Polish comparetraces.py: optional deps, rename --output to --chart * Fix chart layout: best improvements at top, speedup bars go right * Polish comparetraces.py chart: delta labels, threshold display - Filter by absolute delta threshold (--min-delta-ms) instead of baseline duration - Show negative deltas for improvements (-XXms = time saved) - Y-axis labels show both absolute delta and percentage - Chart title displays the min delta filter threshold - Duration labels to the right of each bar for readability * Update comparetraces.py README with new options * Add tracing for optimization passes - Emit trace events when optimization passes run (wraps pass->run()) - Pass getName() returns GenOptions field name (e.g., optReplaceBsdfMixWithLinearCombination) - Link MaterialXGenShader against MaterialXTrace when tracing enabled - Trace macros expand to nothing when MATERIALX_BUILD_TRACING is off * Emit optimization trace markers only when graph is modified - Trace event fires only when pass->run() returns true (graph changed) - Use isolated scope for near-zero duration marker event - Avoids measuring dumpDot or other unrelated code * Add --show-opt to highlight materials affected by optimization - Query grandparent slice for material name (hierarchy: opt → GenerateShader → Material) - Add baseline validation (error if baseline has optimization events) - Update README with --show-opt and --min-delta-ms options * Revert _options.mtlx to defaults after testing * Add MixBsdfPruningPass for mix_bsdf constant factor pruning - New optPruneMixBsdf option to enable mix factor pruning - When mix=0, forwards background; when mix=1, forwards foreground - Eliminates dead branches in shader graph - Renamed from LobePruningPass to match option name - Removed half-implemented BSDF weight pruning (future work) - Use lambda 
for option flag replication in test harness * Fix /reduced subdirectory creation when outputDirectory is set When outputDirectory is specified and shaderInterfaces=REDUCED, the render tests were failing to create the parent material directory before attempting to create the /reduced subdirectory, causing shader dumps to fail. The fix ensures createDirectory() is called on the parent outputPath first, then on outputPath/reduced if in REDUCED mode. Both calls are now within the same ScopedTimer scope for consistent profiling. Fixed in all four render test implementations: - RenderGlsl.cpp - RenderOsl.cpp - RenderMsl.mm - RenderSlang.cpp * Fix FILENAME inputs in SHADER_INTERFACE_REDUCED mode In REDUCED mode, unconnected FILENAME inputs on internal nodes were being inlined as literal path strings, causing GLSL compilation errors. Real-time shading languages cannot handle file paths as literals. The fix publishes FILENAME node inputs to graph input sockets even in REDUCED mode, ensuring they become uniforms with proper variable names instead of being inlined. * Additional tracing markers in shader codegen * Add trace markers for emit and graph traversal functions Added MX_TRACE markers to key codegen functions for profiling: - ShaderGraph: createConnectedNodes, addUpstreamDependencies, createNode - ShaderGenerator: getImplementation, emitFunctionDefinitions, emitFunctionCalls - CompoundNode: initialize, emitFunctionDefinition, emitFunctionCall * Add NodeGraphTopology class for permutation analysis Analyzes NodeGraphs to identify "topological" inputs that can gate branches when constant (0 or 1). Used for permutation-aware caching of CompoundNode implementations. 
Key features: - Detects mix, multiply, and PBR node inputs with uimin=0/uimax=1 - Caches analysis per NodeGraph definition (thread-safe) - Computes permutation keys from constant input values - Identifies nodes to skip for each permutation * Hook up NodeGraphTopology for permutation-aware caching - Add skip nodes methods to GenContext for early pruning - Modify getImplementation() to compute permutation keys - Use permutation-aware cache keys for NodeGraph implementations - Add skip node check in createConnectedNodes (trace only for now) The full skip logic (handling downstream connections) is TODO. For now, this traces when nodes WOULD be skipped to verify the topology analysis is working correctly. * Refactor: rename classes and add getPermutationKey() virtual method Naming improvements: - NodeGraphTopology::Analysis -> NodeGraphTopology (the analysis result) - NodeGraphTopology -> NodeGraphTopologyCache (the singleton cache) Add virtual getPermutationKey() to ShaderNodeImpl: - Returns _name by default (no permutation) - CompoundNode overrides to return topology-based permutation key - CompoundNode computes and stores _permutationKey during initialize() - Hash now includes permutation key for better cache differentiation * Virtualize permutation key computation in ShaderNodeImpl Move permutation key logic from ShaderGenerator to ShaderNodeImpl: - Add virtual computePermutationKey(element, context) method - Default impl returns empty string (no permutation) - CompoundNode overrides to use NodeGraphTopologyCache analysis - ShaderGenerator now calls impl->computePermutationKey() polymorphically This eliminates the NodeGraph-specific type check for permutation key computation in getImplementation(), making the design more extensible. The skipNodes computation still uses NodeGraphTopologyCache directly. 
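To make the permutation-key idea above concrete, here is a minimal Python sketch of how a key could be derived from topological inputs that are constant at 0 or 1. The key format, the function names and the input names in the example are assumptions for illustration; the PR does not state the actual string layout.

```python
def compute_permutation_key(topological_inputs, constant_values):
    """Build a cache-key suffix from topological inputs pinned at 0 or 1.

    topological_inputs: names of inputs that can gate a branch.
    constant_values: input name -> constant float for unconnected inputs;
                     connected (non-constant) inputs are simply absent.
    """
    parts = []
    for name in sorted(topological_inputs):  # sorted for a stable key
        value = constant_values.get(name)
        if value == 0.0:
            parts.append(f"{name}=0")
        elif value == 1.0:
            parts.append(f"{name}=1")
        # Any other value, or a connected input, cannot gate a branch,
        # so it does not contribute to the key.
    return ",".join(parts)


def cache_key(base_name, permutation_key):
    # An empty key means "no permutation": fall back to the base name.
    if not permutation_key:
        return base_name
    return f"{base_name}[{permutation_key}]"


key = compute_permutation_key({"coat", "specular"},
                              {"coat": 0.0, "specular": 0.5})
print(cache_key("NG_example", key))
```

Two call sites of the same NodeGraph share a cached implementation only when their keys match, which is why inputs that cannot prune anything are left out of the key.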
* Virtualize skipNodes computation in ShaderNodeImpl Add virtual computeSkipNodes(element, context) method: - Default impl returns empty StringSet - CompoundNode overrides to use NodeGraphTopologyCache ShaderGenerator now calls impl->computeSkipNodes() polymorphically, eliminating the last NodeGraph-specific type check for optimization logic. Remaining isA checks are only for impl creation dispatch (NodeGraph vs Implementation), which is fundamental type instantiation. * Add TODOs for instrumentation and API refactoring - TODO: Add MX_TRACE instrumentation to analyze(), computePermutationKey(), and getNodesToSkip() to measure string operation overhead - TODO: Move computePermutationKey() and getNodesToSkip() from NodeGraphTopologyCache to NodeGraphTopology class (better semantics) * Implement TODOs: instrumentation and API refactoring 1. Add MX_TRACE instrumentation to: - NodeGraphTopologyCache::analyze() - NodeGraphTopology::computePermutationKey() - NodeGraphTopology::getNodesToSkip() 2. 
Move computePermutationKey() and getNodesToSkip() from NodeGraphTopologyCache to NodeGraphTopology class: - Methods now operate on 'this' topology data - Cleaner API: topology.computePermutationKey(node) - Cache only handles caching, topology handles queries * Refactor: CompoundNode owns permutation via composition - Add NodeGraphPermutation class as lightweight intermediate object representing call-site specific configuration (key + skipNodes) - CompoundNode now owns permutation via unique_ptr passed at creation - ShaderGenerator::getImplementation() computes permutation BEFORE creating ShaderNodeImpl, avoiding wasteful allocation on cache hits - Remove virtual computePermutationKey/computeSkipNodes from ShaderNodeImpl - Remove permutationKey from GenContext (no longer needed as parameter bag) - Add permutation support to MDL compound nodes (CompoundNodeMdl, ClosureCompoundNodeMdl) via inheritance chain - Clean up API: remove parameterless create() functions, require explicit permutation parameter (may be nullptr) - Remove default parameter from virtual createShaderNodeImplForNodeGraph to avoid C++ gotcha with virtual default arguments Architecture: NodeGraphTopology (shared, cached per NG definition) ↓ NodeGraphPermutation (per call-site, computes key + skipNodes) ↓ (ownership transferred) CompoundNode (owns permutation, uses in initialize()) ↓ ShaderNodeImpl cache (keyed by baseName + permutation.key) * Refactor: Move topology analysis into NodeGraphTopology constructor Better encapsulation - NodeGraphTopology now analyzes itself in its constructor rather than being populated externally by NodeGraphTopologyCache. The cache now only manages storage/lookup, not analysis logic. 
- Add NodeGraphTopology(const NodeGraph&) constructor with analysis - Move isTopologicalInput, analyzeAffectedNodes, collectUpstreamNodes to NodeGraphTopology - Change public members to private with getters - Simplify NodeGraphTopologyCache to pure cache management - Remove unused buildReverseConnectionMap method * Add comment explaining thread-safety tradeoff in topology cache Document why we release the mutex during topology construction and why the potential race condition is safe and acceptable. * Refactor: Use const references instead of copying shared_ptr Change computePermutationKey and NodeGraphPermutation constructor to accept const Node& instead of ConstNodePtr to avoid unnecessary shared_ptr copy overhead. * Refactor: Combine permutation key and skip nodes computation Add NodeGraphTopology::createPermutation() factory method that computes key and skip nodes in a single pass, avoiding redundant string parsing. Make NodeGraphPermutation constructor private with NodeGraphTopology as friend. Returns nullptr when no optimization is possible. * Fix: Restore _options.mtlx to defaults (remove local paths) * Removed unnecessary members * Refactor: Pass permutation via function stack instead of GenContext Remove _skipNodes from GenContext and pass NodeGraphPermutation* directly through ShaderGraph::create, addUpstreamDependencies, and createConnectedNodes. This fixes potential bugs with nested CompoundNodes overwriting each other's skip nodes on the shared GenContext. * Implement early pruning: skip node creation for pruned branches When a node is in the skip set, return early from createConnectedNodes without creating the ShaderNode. The downstream input remains unconnected and uses its default value (e.g., transparent BSDF for closure types). 
* Add optEarlyPruning GenOption to control early pruning feature - Add optEarlyPruning to GenOptions (defaults to false) - Wire up option in ShaderGenerator::getImplementation() - Add test option parsing and propagation to GenOptions - Early pruning is now opt-in for A/B testing * Replace comparetraces.py with diff_test_results.py Unified script that compares baseline vs optimized MaterialX test runs: - Performance traces: loads Perfetto .perfetto-trace files, computes timing deltas, generates bar charts with matplotlib - Rendered images: pixel-wise RMSE comparison with PIL/numpy Merges trace comparison with image comparison functionality. Supports filtering by --min-delta-ms threshold and generates combined reports. Usage: python diff_test_results.py baseline/ optimized/ [--traces] [--images] * Use NVIDIA FLIP for perceptual image comparison Replace numpy RMSE with FLIP (pip install flip-evaluator), which approximates human perception of differences when flipping between images. FLIP score: 0 = identical, 1 = maximally different. - Default threshold changed from 0.001 (RMSE) to 0.05 (FLIP) - Add --ppd option for pixels-per-degree viewing distance - Update README with new installation and options * Add HTML report generation with side-by-side image comparison New --report flag generates an HTML report containing: - Summary cards (improvements/regressions/pass/fail counts) - Performance chart (auto-generated if not specified) - Side-by-side image grid: Baseline | Optimized | FLIP Heatmap - Color-coded pass/fail status for each image FLIP heatmaps are saved to a heatmaps/ subdirectory using the magma colormap to visualize perceptual differences. * Use reference counting for early pruning to handle shared dependencies The previous implementation would incorrectly prune nodes that were shared between multiple downstream consumers. 
When one consumer was pruned (e.g., specular BSDF with weight=0), all upstream dependencies were also pruned, even if they were still needed by other consumers (e.g., metal BSDF). This fix introduces reference counting: each node tracks how many downstream nodes consume its output. A node is only pruned when its reference count drops to zero. This correctly preserves shared nodes like roughness and tangent calculations that feed multiple BSDFs. * Expose envSampleCount option in MaterialXTest for IBL sampling control Add configurable environment radiance sample count to test suite options, allowing game-representative performance testing (1-16 samples) vs reference quality (1024+ samples). Lower sample counts make shader complexity a larger fraction of GPU time, enabling meaningful runtime performance comparisons. * Fix early pruning to check inherited NodeDef default values Use getActiveInput() instead of getInput() when checking NodeDef defaults for topological inputs. This properly resolves inherited input values, enabling pruning of lobes that default to 0 in parent NodeDefs. Results: 10-18% improvement in shader generation time, 15-18% reduction in generated shader code (298-342 lines per shader). Also removes unused nodeGraph parameter from analyzeAffectedNodes(). * Instrument `GlslProgram::build` * Instrument `DumpGeneratedCode` * Extract image comparison into standalone diff_images.py script Split the monolithic diff_test_results.py into focused scripts: - New diff_images.py in python/MaterialXTest/diff_test_runs/ for FLIP-based image comparison with a simple CLI - diff_test_results.py is now trace-only (removed image functions, FLIP dependency, and image-related CLI flags) * Rename and move `diff_traces.py` * Rename references in diff_traces.py after move from Scripts/ Update module docstring, logger name, HTML footer, and cross-references to reflect the new filename and location. 
* Refine diff_traces.py CLI: --slice, --gpu, and -o outputfile Replace the generic --track and --report/--chart args with two explicit comparison modes: --gpu GPU render durations per material (hardcodes GPU track) --slice NAME ... CPU child-slice durations per material (e.g. GenerateShader) Multiple --slice names produce multiple charts in one run. HTML report is always generated via -o/--outputfile (default: trace_diff.html), matching the tests_to_html.py convention. Chart PNGs are auto-derived from the report filename. No flags prints help and exits. * Simplify diff_traces.py: require pandas, remove --csv Make perfetto and pandas hard requirements instead of optional dependencies with fallback paths. This removes ~120 lines of dual-path code (list-of-dicts vs DataFrame) throughout the script. Also remove --csv export since the HTML report is always generated and covers the same data. * Show directory names and frame counts in diff_traces.py output Use leaf directory names (e.g. Prism_MtlX_env16_baseline) instead of generic "Baseline"/"Optimized" in chart legends, table headers, chart titles, and the HTML report page title. For GPU comparisons, display the number of frames averaged over in the chart title (e.g. "averaged over 10 frames"). * Auto-derive report filename from directory names in diff_traces.py Default -o output is now <baseline>_vs_<optimized>.html instead of the static trace_diff.html. Chart PNGs always include the event name suffix (e.g. _GPU.png, _GenerateShader.png). * Print absolute report path and auto-open in browser After generating the HTML report, print its absolute path prominently and open it in the default browser automatically. * Show active filter options in chart subtitles and HTML report Display min-delta-ms and show-opt as a subtitle line on charts and in the HTML report header when those filters are active. 
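The naming rules described above (report named `<baseline>_vs_<optimized>.html` from the leaf directory names, chart files carrying the event name as a suffix) can be sketched as below. The helper names are hypothetical and the second directory name in the example is made up; only the naming pattern comes from the commit messages.

```python
from pathlib import Path


def default_report_name(baseline_dir, optimized_dir):
    """Derive '<baseline>_vs_<optimized>.html' from leaf directory names."""
    baseline = Path(baseline_dir).name
    optimized = Path(optimized_dir).name
    return f"{baseline}_vs_{optimized}.html"


def chart_name(report_path, event_name):
    """Place the chart next to the report, suffixed with the event name."""
    report = Path(report_path)
    return report.with_name(f"{report.stem}_{event_name}.png")


report = default_report_name("runs/Prism_MtlX_env16_baseline/",
                             "runs/Prism_MtlX_env16_optimized")
print(report)
print(chart_name(report, "GPU"))
```

`Path(...).name` ignores a trailing slash, so the same label is produced whether or not the user types one on the command line.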
* Load trace files once and reuse TraceProcessor instances Instead of creating a new TraceProcessor for each comparison, load baseline and optimized traces once in main() and pass the instances through. Cuts trace parsing from N*2 to just 2. * Switch charts from PNG to inline SVG with searchable text Save charts as SVG and embed them directly in the HTML report so material names and values are selectable and searchable via Ctrl+F. Set svg.fonttype='none' to emit real <text> elements. * Extract shared reporting into _report.py and add diff_shaders.py Factor chart generation, table formatting, and HTML report building out of diff_traces.py into a reusable _report.py module. Add diff_shaders.py for comparing dumped shader files (LOC, SPIR-V size) between baseline and optimized test runs using the same reporting infrastructure. * Add --warmup-frames to discard initial GPU frames before averaging Allows discarding a burn-in period of N initial frames per material when comparing GPU render durations, reducing noise from driver warm-up and transient system states. Frame ordering is preserved via ORDER BY slice.ts. Warmup info appears in chart titles and the filter subtitle. * Fix SPIR-V compilation in diff_shaders.py for MaterialX GLSL Use OpenGL semantics (-G) instead of Vulkan (-V), and --auto-map-locations for automatic layout assignment, since MaterialX generates OpenGL GLSL without explicit layout qualifiers. * Add compilation time and spirv-opt time metrics to diff_shaders.py Refactor SPIR-V compilation into a single-pass cache so binaries are compiled once and reused for all downstream metrics (size, compile time, spirv-opt size, spirv-opt time, AMD RGA VGPRs). 
New metrics: - glslangValidator compile time (ms per shader) - spirv-opt optimisation time (ms per shader) - AMD RGA VGPR count (when rga is in PATH) * Remove RGA support and fix bar color for zero-delta metrics Drop AMD RGA VGPR analysis for simplicity -- spirv-opt already proves that optimised binaries are identical, so VGPRs would be too. Use green bars (not red) when delta is zero, since identical output is a positive result, not a regression. * Instrument `GlslProgram::build` * Instrument `DumpGeneratedCode` * Expose envSampleCount option in MaterialXTest for IBL sampling control Add configurable environment radiance sample count to test suite options, allowing game-representative performance testing (1-16 samples) vs reference quality (1024+ samples). Lower sample counts make shader complexity a larger fraction of GPU time, enabling meaningful runtime performance comparisons. * Add diff_test_runs package for comparing MaterialX test outputs Python package (MaterialXTest.diff_test_runs) with three comparison scripts and shared reporting utilities: diff_images.py -- perceptual image comparison via NVIDIA FLIP, with HTML side-by-side reports diff_traces.py -- Perfetto trace comparison with per-material CPU slice and GPU render time analysis, multiple --slice filters, --warmup-frames burn-in, inline SVG charts diff_shaders.py -- offline shader analysis (LOC, SPIR-V size, compile time, spirv-opt time); auto-discovers Vulkan SDK tools in PATH _report.py -- shared comparison tables, SVG chart generation, and HTML report builder * Add framesPerMaterial and envSampleCount to _options.mtlx Declare the new test suite options so the C++ test runner can read them from the options file. Also reset enableTracing default to false (opt-in for profiling runs). 
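The interaction between `framesPerMaterial` and `--warmup-frames` mentioned above amounts to: order each material's GPU frames by timestamp, drop the first N, and average the rest. A minimal sketch, assuming the frames have already been extracted from the trace as plain tuples (the tuple layout and function name are hypothetical, not the script's actual data model):

```python
from collections import defaultdict


def mean_gpu_ms(frames, warmup_frames=0):
    """Average GPU durations per material after dropping warm-up frames.

    frames: iterable of (material, timestamp_ns, duration_ns) tuples.
    Returns a dict of material -> mean duration in milliseconds.
    """
    by_material = defaultdict(list)
    for material, timestamp_ns, duration_ns in frames:
        by_material[material].append((timestamp_ns, duration_ns))

    result = {}
    for material, samples in by_material.items():
        # Sort by timestamp so "the first N frames" is well defined.
        samples.sort()
        kept = [duration for _, duration in samples[warmup_frames:]]
        if kept:  # skip materials with nothing left after the cut
            result[material] = sum(kept) / len(kept) / 1e6
    return result


frames = [
    ("marble", 0, 10_000_000),  # slow first frame (warm-up)
    ("marble", 1, 2_000_000),
    ("marble", 2, 4_000_000),
]
print(mean_gpu_ms(frames, warmup_frames=1))
```

With a single frame per material (the default `framesPerMaterial` of 1), any non-zero warm-up count would discard every sample, so the two options need to be chosen together.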
* Additional tracing markers in shader codegen * Add trace markers for emit and graph traversal functions Added MX_TRACE markers to key codegen functions for profiling: - ShaderGraph: createConnectedNodes, addUpstreamDependencies, createNode - ShaderGenerator: getImplementation, emitFunctionDefinitions, emitFunctionCalls - CompoundNode: initialize, emitFunctionDefinition, emitFunctionCall * Revert stale README for python/Scripts The old monolithic diff_test_results.py script was replaced by the diff_test_runs package under python/MaterialXTest/. Revert README.md to its upstream state. * Remove stale merge conflict marker in RenderGlsl.cpp A leftover <<<<<<< ours marker from the main merge was left behind. * Restore missing envSampleCount option in _options.mtlx The envSampleCount input was lost during the merge of main into lobe_pruning_proto2. Add it back before the optimization options. * Add GPU timing infrastructure and fix trace aliases in RenderGlsl.cpp - Add GpuTimerQuery helper class and getCurrentTimeNs() for GPU timing via GL_TIME_ELAPSED queries (guarded by MATERIALX_BUILD_TRACING) - Add multi-frame render loop using framesPerMaterial for statistical validity, emitting MX_TRACE_ASYNC events on the GPU track - Replace stale Cat:: alias with mx::Tracing::Category:: (the alias was removed in PR AcademySoftwareFoundation#2742) * Link MaterialXTrace to MaterialXGenShader when tracing is enabled Shader codegen source files (ShaderGraph.cpp, ShaderGenerator.cpp, etc.) include MaterialXTrace/Tracing.h and use MX_TRACE macros. When Perfetto tracing is enabled, MaterialXGenShader needs to link against MaterialXTrace for the trace event symbols. * Use MATERIALX_BUILD_PERFETTO_TRACING in RenderGlsl.cpp The CMake flag was renamed from MATERIALX_BUILD_TRACING to MATERIALX_BUILD_PERFETTO_TRACING in PR AcademySoftwareFoundation#2742. Update all #ifdef guards in RenderGlsl.cpp to match. 
* Fix `MATERIALX_BUILD_PERFETTO_TRACING` renaming * Fix `MATERIALX_BUILD_PERFETTO_TRACING` in an XML comment * Add async event support for GPU timing in MaterialXTrace Port AsyncTrack enum, Sink::asyncEvent(), and MX_TRACE_ASYNC macro from the pre-merge proto2 branch. This enables GPU timer query results to be emitted as events on a dedicated "GPU" track in Perfetto traces. Use std::numeric_limits<uint64_t>::max() for the GPU track ID to guarantee no collision with OS thread IDs. * Fix enableTracing default to true for CI trace generation The previous commit accidentally set enableTracing to false, which would prevent CI from producing Perfetto traces. Restore to true to match the behavior established in PR AcademySoftwareFoundation#2742. * Make matplotlib a hard requirement for diff_test_runs reports Without matplotlib, the HTML reports are generated with no charts, making them essentially empty. Fail early with a clear install message, consistent with how perfetto and pandas are handled. * Use directory names instead of Baseline/Optimized in diff_images.py Derive display labels from the actual directory names passed on the command line, matching the convention already used by diff_traces.py. Affects console output, HTML report title, and per-image labels. Assisted-by: Claude (Anthropic) via Cursor IDE Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com> * Auto-open HTML report in diff_images.py Use the shared openReport() from _report.py, matching the behavior already present in diff_traces.py and diff_shaders.py. Assisted-by: Claude (Anthropic) via Cursor IDE Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com> * Fix _report import in diff_images.py to use try/except pattern Use the same relative/fallback import pattern as diff_traces.py and diff_shaders.py so the script works both as a package and standalone. 
Assisted-by: Claude (Anthropic) via Cursor IDE Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com> * Restore optEarlyPruning propagation lost in merge d3ecbbb The applyOptimizationFlags lambda in getGenerationOptions() was lost when main (PR AcademySoftwareFoundation#2742) was merged into proto2 on Feb 13. This silently left optEarlyPruning stuck at false regardless of _options.mtlx. Also adds the missing optEarlyPruning XML input to _options.mtlx. Assisted-by: Claude (Anthropic) via Cursor IDE Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com> * Encapsulate skip-node lookup as NodeGraphPermutation::shouldSkip() Replace direct getSkipNodes().count() access with a shouldSkip() query method for better encapsulation. Remove the raw accessor. * Simplify NodeGraphTopology and rename cache accessor Throw ExceptionShaderGenError instead of silently returning when NodeDef is missing -- a half-initialized topology is worse than a clear error. Remove speculative fallback lookup. Rename NodeGraphTopologyCache::analyze() to get(). * Remove unused methods and member from NodeGraphTopology Drop getTopology(), clearCache(), hasOptimizations(), and _nodeGraphName -- none are called outside their definitions. * Clarify how the node instance is obtained in getImplementation * Hide NodeGraphTopology behind NodeGraphTopologyCache Move NodeGraphTopology class from the header into the .cpp as an implementation detail. The cache's public API is now a single createPermutation(nodeGraph, node) that combines topology lookup and permutation creation. The cache remains an explicit singleton so its lifetime can be controlled in the future. * Clean up refactoring artifacts * Minor cleanups in ShaderGenerator and ShaderGraph Merge cachedImpl/impl into a single variable in getImplementation(). Use existing newNode local instead of redundant getNode() lookup. * Simplify cache key handling in getImplementation Use a single 'name' variable instead of separate baseName/cacheKey. 
Append permutation suffix inside the NodeGraph branch. * Nest permutation key append under parent node check * More nitpicking * Move NodeGraphTopologyCache from global singleton to GenContext The topology cache now lives as a value member on GenContext, giving clients explicit control over its lifetime and invalidation. This mirrors how GenContext already manages the ShaderNodeImpl cache. - Expose NodeGraphTopology and TopologicalInput in the header so the cache can be held by value without incomplete-type issues. - Make NodeGraphTopologyCache copyable (copy = fresh empty cache) to preserve GenContext's existing deep-copy semantics. - Add NodeGraphTopologyCache::clear(), called from clearNodeImplementations() so all existing invalidation sites automatically cover the topology cache. - Mark GenContext constructor explicit. * Share immutable NodeGraphTopology objects across cache copies Use shared_ptr for cached topologies so that copying a NodeGraphTopologyCache (via GenContext copy) shares the existing analyses instead of discarding them. The topology objects are immutable once constructed, so sharing is safe. Copy operations properly lock the source mutex. * Merge PBR node sets into kWeightedPbrNodes and use unordered_set * Refactor per-node data into NodeInfo struct (AOS instead of SOA) * Fix downstreamRefCount to increment per unique upstream Previously, ref counts were incremented per-input, but death propagation decremented per-unique-upstream (via the StringSet). This mismatch could prevent pruning when a node had multiple inputs connected to the same upstream. * Clean up createPermutation: structured bindings, rename key, init-if * Minor refactoring * Extract pruneNode, removeConsumer, applyConstantInput lambdas Collapse four duplicate value==0/value==1 blocks into composing lambdas. The worklist DCE loop also reuses removeConsumer. 
* Refine createPermutation: descriptive names, flatten control flow Move applyConstantInput inside the loop to absorb value extraction and flag assignment. Invert connected-node check and use init-if for nodeDef/defaultInput lookups. * Explicit lambda captures and extract applyConstantValue * Rename skip -> prune, explicit unordered_set, formatting cleanup * Complete skip -> prune rename across pruning code * Arrays instead of pairs of members * Turn analyzeAffectedNodes into TopologicalInput constructor * Rename nodesToPrune/maybeDead to prunableAtValue/potentiallyPrunableAtValue * Revert gltf_pbr.mtlx to upstream main This file was modified by Lee Kerley's SG rewriting optimization to match a new BSDF mix pattern rule. That work is out of scope for the early pruning PR. * Revert main.yml to target branch Restores the adsk_contrib/dev version of the CI workflow. The diff was a main vs adsk_contrib/dev conflict unrelated to pruning. * Remove diff_test_runs Python scripts These comparison utilities (diff_images, diff_traces, diff_shaders) are covered by a separate infrastructure PR (AcademySoftwareFoundation#2822). * Remove GPU timing, trace markers, and test options from pruning PR These changes belong to the separate infrastructure PRs: - GPU async timing (AsyncTrack, GpuTimerQuery, MX_TRACE_ASYNC) -> AcademySoftwareFoundation#2824 - Trace markers in GlslShaderGenerator -> AcademySoftwareFoundation#2820 - envSampleCount / framesPerMaterial options -> AcademySoftwareFoundation#2821 Also restores the MATERIALX_CONTRIB option from adsk_contrib/dev. * Remove SG rewriting pass infrastructure Remove the ShaderGraph optimization pass manager, debug dumping, and late-stage rewriting passes (PremultipliedAdd, MixBsdfPruning, ConstantFolding pass, etc.). These are parked pending test coverage and are out of scope for early pruning. Restores ShaderGraph::optimize() to upstream main's implementation. 
* Remove SG rewriting GenOptions and UI controls Remove optReplaceBsdfMixWithLinearCombination, optPruneMixBsdf, optDumpShaderGraphDot, and optMaxPassIterations from GenOptions. Also remove framesPerMaterial and envSampleCount from TestSuiteOptions (covered by PR AcademySoftwareFoundation#2821). Revert Viewer.cpp and RenderView.cpp UI additions for these options. Keeps only optEarlyPruning, which is the core of this PR. * Restore MATERIALX_CONTRIB option to upstream The removal was unrelated to early pruning. * Remove tracing and envSampleCount from pruning PR Remove trace markers from HwShaderGenerator and GlslProgram (PR AcademySoftwareFoundation#2820). Revert RenderMsl.mm envSampleCount to upstream pattern (PR AcademySoftwareFoundation#2821). * Remove MaterialXTrace dependency from pruning PR Remove all MX_TRACE macros, #include <MaterialXTrace/Tracing.h>, and the conditional CMake link from MaterialXGenShader. These trace markers belong to the instrumentation PR (AcademySoftwareFoundation#2820), not early pruning. * Revert SG rewriting artifacts from ShaderGraph Remove changes from Lee's ShaderGraph rewriting work that leaked into the pruning branch: removeNode(), getDocument(), public bypass/ createNode, finalize() FILENAME/REDUCED refactoring, bypass bounds- check removal, and #include <cstdlib>. * Revert enableTracing changes in _options.mtlx Keep only the optEarlyPruning addition; the tracing comment and default value changes belong to the infrastructure PRs. * Rename optEarlyPruning to enableLobePruning Align with MaterialX naming conventions (enable* prefix) and the feature's final name: lobe pruning. * Inline applyOptimizationFlags lambda in RenderUtil Only one optimization flag (enableLobePruning) exists, so the lambda indirection is unnecessary. * Use std::make_shared in CompoundNode factory methods Replace shared_ptr(new T(...)) with make_shared<T>(...). 
The constructors were already public (default) before our changes, so keep them public with the new permutation parameter. * Move NodeGraphTopology.h include from CompoundNode header to source Forward-declare NodeGraphPermutation in the header and add an explicit destructor so unique_ptr can be destroyed where the full type is visible. * Remove unused MIX_BSDF classification bit Leftover from the SG rewriting pass infrastructure; never read. * Fix shadowed structured binding warning in NodeGraphTopology Pass topoInput as a lambda parameter (renamed to topo) instead of capturing it, because capturing structured bindings requires C++20 and MaterialX targets C++17. * Fix testUniqueNames to use getName() for ShaderGraph node lookup ShaderGraph::getNode() now takes a simple name, not a full path. Update the test to use node1->getName() instead of getNamePath(). * Backward-compatible ShaderGraph::getNode() for name and namePath Internal ShaderGraph methods now look up _nodeMap directly by simple name. The public getNode() API also accepts a legacy full namePath (e.g., "outerGraph/innerGraph/nodeName"): it validates the prefix against the graph's own cached _namePath and strips it before lookup, so existing callers are not silently broken. * Revert unnecessary name-vs-namePath indexing change in ShaderGraph The _nodeMap keying by name instead of uniqueId/namePath was not required by the early pruning optimization. Revert to the original namePath-based indexing to minimize the PR diff. --------- Signed-off-by: Pavlo Penenko <pavlo.penenko@autodesk.com> Co-authored-by: Lee Kerley <ld_kerley@icloud.com> Co-authored-by: Lee Kerley <154285602+ld-kerley@users.noreply.github.com>
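
One fix in the commit message above passes a structured binding into a lambda as a parameter instead of capturing it, because capturing structured bindings requires C++20. A minimal sketch of that pattern follows. The function `countAboveThreshold` and its data are hypothetical and are not taken from NodeGraphTopology.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: counts named entries whose value exceeds a threshold.
int countAboveThreshold(const std::map<std::string, int>& inputs, int threshold)
{
    int total = 0;
    for (const auto& [name, topoInput] : inputs)
    {
        // In C++17, capturing `name` or `topoInput` in the lambda (for
        // example with [&topoInput] or [&]) is ill-formed, so the values
        // are passed as ordinary parameters instead.
        auto exceeds = [threshold](const std::string& n, int topo)
        {
            return !n.empty() && topo > threshold;
        };
        if (exceeds(name, topoInput))
        {
            ++total;
        }
    }
    return total;
}
```

Calling `countAboveThreshold({{"a", 1}, {"b", 5}}, 2)` counts only the entry `"b"`. Some compilers accept the capture form as an extension in C++17 mode, so the parameter form is the portable choice.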

…ademySoftwareFoundation#2820) Building on the Perfetto tracing from AcademySoftwareFoundation#2742, this PR adds CPU trace markers to additional codegen and rendering hot paths. This also required conditionally linking `MaterialXTrace` to `MaterialXGenShader`, which was not needed by AcademySoftwareFoundation#2742 (tracing was only in test code) but is now required for the new codegen instrumentation.

ppenenko added 8 commits January 8, 2026 16:27

Exclude MxTrace.cpp when tracing disabled (zero overhead)

c147aaf
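
For context on the "zero overhead" wording: excluding the implementation file only helps if call sites also compile to nothing when tracing is off. A common way to get that is a macro that expands to an empty expression in non-tracing builds. The sketch below assumes this approach. `SKETCH_ENABLE_TRACING`, `SKETCH_TRACE_SCOPE`, and `SketchScope` are hypothetical names, not the identifiers used in MaterialXTrace.

```cpp
#include <cassert>

#ifdef SKETCH_ENABLE_TRACING
#include <cstdio>
// Tracing build: an RAII object marks the begin and end of a scope.
struct SketchScope
{
    explicit SketchScope(const char* n) : name(n) { std::printf("begin %s\n", name); }
    ~SketchScope() { std::printf("end %s\n", name); }
    const char* name;
};
#define SKETCH_TRACE_SCOPE(name) SketchScope sketchTraceScope(name)
#else
// Non-tracing build: the macro expands to a no-op expression, so the
// instrumented function carries no tracing code at all.
#define SKETCH_TRACE_SCOPE(name) ((void) 0)
#endif

// Hypothetical instrumented function.
int generateShader(int nodeCount)
{
    SKETCH_TRACE_SCOPE("generateShader");
    return nodeCount * 2;
}
```

The function returns the same value in both configurations. Only the presence of the scope object differs.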

Print output directory at end of test run for easy access

940cafb

When outputDirectory is set, print the path at the end of test stdout. This makes it clickable in terminals like Cursor/VS Code for quick access.

ppenenko force-pushed the ppenenko/perfetto_and_output_dir branch from 36ae774 to fb056e3 Compare January 8, 2026 21:52

ppenenko added 8 commits January 9, 2026 17:18

Add clarifying comments for Perfetto SDK integration

1cb1b4e

- Explain amalgamated source compilation is Google's official recommended approach per https://perfetto.dev/docs/instrumentation/tracing-sdk - Document ws2_32 dependency (Windows Sockets 2 for Perfetto IPC)

Add Dispatcher::ShutdownGuard nested struct

6e1c923

Move the scope guard pattern into the Dispatcher class for reusability. Callers can now use mx::Tracing::Dispatcher::ShutdownGuard instead of defining their own local struct.
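
A minimal sketch of the scope guard pattern described in this commit, assuming a singleton dispatcher with a `shutdown()` method. The `Dispatcher` class below is a hypothetical stand-in, and the real `mx::Tracing::Dispatcher` interface may differ.

```cpp
#include <cassert>

// Hypothetical stand-in for a tracing dispatcher (not the MaterialX class).
class Dispatcher
{
  public:
    static Dispatcher& getInstance()
    {
        static Dispatcher instance;
        return instance;
    }

    void start() { _active = true; }
    void shutdown() { _active = false; ++_shutdownCount; }
    bool isActive() const { return _active; }
    int shutdownCount() const { return _shutdownCount; }

    // Scope guard: calls shutdown() when it goes out of scope, including
    // on early return or when an exception unwinds the stack.
    struct ShutdownGuard
    {
        ShutdownGuard() = default;
        ~ShutdownGuard() { Dispatcher::getInstance().shutdown(); }
        ShutdownGuard(const ShutdownGuard&) = delete;
        ShutdownGuard& operator=(const ShutdownGuard&) = delete;
    };

  private:
    Dispatcher() = default;
    bool _active = false;
    int _shutdownCount = 0;
};

// Example caller: tracing is shut down however the function exits.
bool runTracedWork(bool failEarly)
{
    Dispatcher::getInstance().start();
    Dispatcher::ShutdownGuard guard;
    if (failEarly)
    {
        return false; // the guard still triggers shutdown()
    }
    return Dispatcher::getInstance().isActive();
}
```

Nesting the guard inside the class keeps the cleanup logic next to the API it protects, so each caller needs only a single declaration.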

Add category dispatch to PerfettoSink::counter()

dc10eb6

Remove unused resolveOutputPathWithSubdir method

4633341

ppenenko marked this pull request as ready for review January 12, 2026 20:44

jstone-lucasfilm reviewed Jan 12, 2026

Comment thread source/MaterialXCore/PerfettoSink.cpp Outdated

ppenenko added 2 commits January 16, 2026 15:20

Fix Perfetto trace artifact upload path

7367222

Use recursive glob build/**/*.perfetto-trace since traces are written to current working directory (build/) not build/bin/

ppenenko commented Jan 16, 2026

Comment thread .github/workflows/main.yml

ppenenko commented Jan 16, 2026

View reviewed changes

Comment thread source/MaterialXTrace/Tracing.h

ppenenko and others added 3 commits January 22, 2026 16:14

Merge branch 'main' into ppenenko/perfetto_and_output_dir

6a46204

Merge branch 'main' into ppenenko/perfetto_and_output_dir

31a6921

jstone-lucasfilm reviewed Feb 12, 2026

Comment thread source/MaterialXTest/MaterialXGenShader/GenShaderUtil.h

Comment thread source/MaterialXTest/MaterialXRenderGlsl/RenderGlsl.cpp Outdated

Comment thread CMakeLists.txt Outdated

Comment thread .github/workflows/main.yml Outdated

ppenenko added 4 commits February 12, 2026 16:10

Use fully qualified mx::Tracing::Category in RenderGlsl.cpp

8af946e

Address review: remove `using Cat` alias and spell out the full mx::Tracing::Category names for clarity while the Tracing library is still new.

Rename MATERIALX_BUILD_TRACING to MATERIALX_BUILD_PERFETTO_TRACING

742a011

Address review: use a Perfetto-specific flag name to leave a clear path for future extensions to Tracy and other profiling backends.

Enable Perfetto tracing in a single CI build (Linux GCC 14)

e2ad867

Address review: instead of enabling tracing in all extended builds, add a matrix.extended_build_perfetto flag and set it on one non-Windows build to produce a single clear tracing artifact.

Gate Perfetto trace upload on matrix.extended_build_perfetto

e098349

Only attempt to upload trace artifacts from the build that actually has Perfetto enabled, avoiding no-op upload steps in other builds.

ppenenko commented Feb 12, 2026

Comment thread .github/workflows/main.yml

Merge branch 'main' into ppenenko/perfetto_and_output_dir

2876b06

jstone-lucasfilm approved these changes Feb 12, 2026

jstone-lucasfilm merged commit 0a1b924 into AcademySoftwareFoundation:main Feb 12, 2026
33 checks passed

ppenenko deleted the ppenenko/perfetto_and_output_dir branch February 12, 2026 22:16

ppenenko mentioned this pull request Feb 13, 2026

Test comparison tooling and extended tracing instrumentation #2774

Closed

6 tasks

Conversation

ppenenko commented Jan 8, 2026

Summary

Background and motivation

Features

1. Perfetto Tracing (MATERIALX_BUILD_TRACING)

2. Test Output Directory (outputDirectory in _options.mtlx)

3. Runtime Tracing Toggle (enableTracing in _options.mtlx)

Usage

Enable Tracing (Build Time)

Enable Tracing (Runtime)

Configure Output Directory

Analyze Traces

Related

jstone-lucasfilm commented Jan 13, 2026

ashwinbhat commented Jan 14, 2026

jstone-lucasfilm commented Jan 14, 2026

ppenenko commented Jan 14, 2026

ppenenko commented Jan 14, 2026

ppenenko commented Jan 14, 2026

ppenenko commented Jan 14, 2026

jstone-lucasfilm commented Jan 14, 2026

ppenenko commented Jan 14, 2026

jstone-lucasfilm commented Jan 14, 2026

ppenenko commented Jan 15, 2026

ppenenko commented Jan 16, 2026

ppenenko commented Jan 16, 2026

jstone-lucasfilm left a comment

jstone-lucasfilm left a comment

3 participants
