feature: Add validation layer Timing Checker for host-side API timing #481
Open
MichalMrozek wants to merge 1 commit into
Add a new validation-layer checker (ZEL_ENABLE_TIMING_CHECKER) that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics (call count, total, min, max, average in nanoseconds, and percentage share of total host time).

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (QueryPerformanceCounter on Windows, CLOCK_MONOTONIC_RAW elsewhere).

All APIs are covered through per-API override headers that are generated at build time from scripts/templates/validation/timing.h.mako (wired into scripts/generate_code.py). These generated headers are not checked in.

Output is written directly to stderr and is independent of the loader logging system, so no ZEL_*_LOGGING variables are required.

Three independently controlled modes:
- summary table printed at teardown, sorted by percentage share (default)
- ZEL_TIMING_CHECKER_CSV: export per-API stats to a PID-suffixed CSV
- ZEL_TIMING_CHECKER_LIVE: print each call's duration as it happens

Adds README documentation, a dedicated usage guide (checkers/timing/timing_checker.md), and a transparency unit test asserting the checker does not alter API results.

Signed-off-by: Michal Mrozek <michal.mrozek@intel.com>
Pull request overview
Adds a new validation-layer “Timing Checker” that measures host-side (CPU) duration of Level Zero API calls and reports per-API aggregated statistics, with optional live stderr logging and CSV export. This fits into the validation layer’s existing checker framework (global checker registration + generated per-API overrides).
Changes:
- Introduces a new `timing` checker (engine + registration) that records Prologue/Epilogue timestamps and aggregates per-function timing stats, printing a teardown summary and optionally exporting CSV.
- Wires build-time generation of per-API timing override headers via a new Mako template and updates the code generator to ensure output directories exist.
- Adds documentation and loader/validation-layer unit tests to validate transparency and live per-call timing output.
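The Prologue/Epilogue timing and per-function aggregation described above could be sketched roughly as follows. This is an illustration only, not the PR's implementation: `ApiStats`, `SummaryRow`, `TimingAggregator`, `ScopedApiTimer`, and `nowNs` are hypothetical names, and `std::chrono::steady_clock` stands in for the platform clocks the PR names (`QueryPerformanceCounter`, `CLOCK_MONOTONIC_RAW`).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; they are not the PR's GlobalTimingState API.
struct ApiStats {
    uint64_t calls = 0;
    uint64_t totalNs = 0;
    uint64_t minNs = std::numeric_limits<uint64_t>::max();
    uint64_t maxNs = 0;
    double averageNs() const {
        return calls ? static_cast<double>(totalNs) / static_cast<double>(calls) : 0.0;
    }
};

struct SummaryRow {
    std::string api;
    ApiStats stats;
    double percent; // share of total host time across all recorded APIs
};

class TimingAggregator {
  public:
    // Fold one measured call duration into the statistics for `api`.
    void record(const std::string &api, uint64_t durationNs) {
        std::lock_guard<std::mutex> lock(mutex_);
        ApiStats &s = stats_[api];
        s.calls += 1;
        s.totalNs += durationNs;
        s.minNs = std::min(s.minNs, durationNs);
        s.maxNs = std::max(s.maxNs, durationNs);
    }

    // Build summary rows sorted by percentage share, highest first.
    std::vector<SummaryRow> summary() const {
        std::lock_guard<std::mutex> lock(mutex_);
        uint64_t grandTotal = 0;
        for (const auto &entry : stats_) {
            grandTotal += entry.second.totalNs;
        }
        std::vector<SummaryRow> rows;
        for (const auto &entry : stats_) {
            const double pct = grandTotal
                ? 100.0 * static_cast<double>(entry.second.totalNs) / static_cast<double>(grandTotal)
                : 0.0;
            rows.push_back({entry.first, entry.second, pct});
        }
        std::sort(rows.begin(), rows.end(),
                  [](const SummaryRow &a, const SummaryRow &b) { return a.percent > b.percent; });
        return rows;
    }

  private:
    mutable std::mutex mutex_;
    std::map<std::string, ApiStats> stats_;
};

// Monotonic timestamp in nanoseconds (assumption: steady_clock as a portable stand-in).
inline uint64_t nowNs() {
    return static_cast<uint64_t>(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count());
}

// RAII stand-in for a Prologue/Epilogue pair: stamps on entry, records on exit.
class ScopedApiTimer {
  public:
    ScopedApiTimer(TimingAggregator &agg, std::string api)
        : agg_(agg), api_(std::move(api)), startNs_(nowNs()) {}
    ~ScopedApiTimer() { agg_.record(api_, nowNs() - startNs_); }

  private:
    TimingAggregator &agg_;
    std::string api_;
    uint64_t startNs_;
};
```

In this sketch a wrapper would construct a `ScopedApiTimer` around the forwarded call and a teardown hook would print `summary()`. The real checker uses separate Prologue and Epilogue callbacks, so it has to carry the start timestamp between them by some other means.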
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/loader_validation_layer.cpp | Adds gtest coverage for timing checker transparency and live per-call output parsing |
| test/CMakeLists.txt | Registers new timing-checker-focused CTest entries with required env vars |
| source/layers/validation/README.md | Documents enabling and usage of the new timing checker mode |
| source/layers/validation/checkers/timing/zel_timing_checker.h | Declares timing checker and generated entrypoint wrappers |
| source/layers/validation/checkers/timing/zel_timing_checker.cpp | Implements timestamping, aggregation, stderr output, CSV writing, and handler registration |
| source/layers/validation/checkers/timing/zel_global_timing_state.h | Defines shared aggregation state and APIs for timing collection/output |
| source/layers/validation/checkers/timing/timing_checker.md | Adds detailed user documentation and examples for summary/CSV/live modes |
| source/layers/validation/checkers/timing/CMakeLists.txt | Adds timing checker sources to the validation layer build |
| source/layers/validation/checkers/CMakeLists.txt | Hooks the new timing checker subdirectory into the checker build |
| scripts/templates/validation/timing.h.mako | Adds generated Prologue/Epilogue overrides that call into GlobalTimingState |
| scripts/generate_code.py | Registers timing template output and ensures generated output subdirs are created |
Comment on lines +35 to +45

    static LARGE_INTEGER frequency = {};
    if (frequency.QuadPart == 0) {
        QueryPerformanceFrequency(&frequency);
    }
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);
    if (frequency.QuadPart == 0) {
        return 0;
    }
    return static_cast<uint64_t>(ticks.QuadPart) * (NSEC_IN_SEC / static_cast<uint64_t>(frequency.QuadPart));
    #else
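The quoted return statement divides `NSEC_IN_SEC` by the frequency before multiplying. Because that is an integer division, the per-tick factor is truncated whenever the frequency does not divide 10^9 evenly; for a hypothetical 3,579,545 Hz counter the factor becomes 279 instead of roughly 279.37. One common way to avoid this, shown here only as a sketch, is to convert whole seconds and the remaining ticks separately. The helper name `ticksToNanoseconds` and the constant `kNsecPerSec` are illustrative and not part of the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not taken from the PR: converts a raw tick count to
// nanoseconds for a counter that runs at `frequency` ticks per second.
constexpr uint64_t kNsecPerSec = 1000000000ULL;

inline uint64_t ticksToNanoseconds(uint64_t ticks, uint64_t frequency) {
    if (frequency == 0) {
        return 0; // same fallback as the quoted snippet when the frequency is unknown
    }
    // The remainder is always smaller than `frequency`, so the product below
    // cannot overflow as long as the frequency stays under about 18.4 GHz.
    assert(frequency <= std::numeric_limits<uint64_t>::max() / kNsecPerSec);
    const uint64_t wholeSeconds = ticks / frequency;
    const uint64_t remainderTicks = ticks % frequency;
    return wholeSeconds * kNsecPerSec + (remainderTicks * kNsecPerSec) / frequency;
}
```

With a frequency that divides 10^9 evenly, such as 10 MHz, both forms give the same result; they diverge only for other frequencies.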
Comment on lines +1101 to +1104

    const auto matchesEnd = std::sregex_iterator();
    const long reports = std::distance(
        std::sregex_iterator(captured.begin(), captured.end(), driverGetLine), matchesEnd);

    timing_checker.zetValidation = zetChecker;
    timing_checker.zesValidation = zesChecker;
    timing_checker.zerValidation = zerChecker;
    validation_layer::context.getInstance().validationHandlers.push_back(&timing_checker);
Summary
Adds a new validation-layer checker, enabled with `ZEL_ENABLE_TIMING_CHECKER=1`, that measures the host-side (CPU) duration of every Level Zero API call and aggregates per-API statistics: call count, total, min, max, average, and percentage share of total host time.

For each API the checker stamps a high-resolution monotonic timestamp in the Prologue and reads it again in the Epilogue (`QueryPerformanceCounter` on Windows, `clock_gettime(CLOCK_MONOTONIC_RAW)` elsewhere). The measured span is dominated by the underlying driver call and is consistent across calls, making it suitable for relative host-cost analysis.

Output

Output is written directly to `stderr` and is independent of the loader logging system, so the `ZEL_*_LOGGING` variables are not required. Summary rows are sorted by percentage share of total host time, highest first.

| Variable | Default |
|---|---|
| `ZEL_ENABLE_TIMING_CHECKER` | `0` |
| `ZEL_TIMING_CHECKER_CSV=<path>` | |
| `ZEL_TIMING_CHECKER_LIVE` | `0` |

Example summary:
Changes
- New checker under `source/layers/validation/checkers/timing/` (engine, registration, CMake) and a usage guide `timing_checker.md`, referenced from the validation-layer README.
- Per-API override headers are generated at build time from `scripts/templates/validation/timing.h.mako` (wired into `scripts/generate_code.py`) and are not checked in.
- Tests in `test/loader_validation_layer.cpp` covering result transparency and per-API timing data.