test(profiling): add profiles dictionary benchmarks #2088
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 427e0ea0f9
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2088 +/- ##
==========================================
- Coverage 73.54% 73.53% -0.02%
==========================================
Files 475 475
Lines 79007 79007
==========================================
- Hits 58109 58095 -14
- Misses 20898 20912 +14
🎉 All green! 🧪 All tests passed. 🔗 Commit SHA: 5c54a65
Force-pushed from 2e23e15 to 477834f
morrisonlevi left a comment
How stable is the benchmark in your experience so far? It seems like it would be pretty variable, but at least you have a startup barrier in there.
What does this PR do?
Adds a focused Criterion benchmark for `ProfilesDictionary` unique string insertion.

The benchmark uses `ProfilesDictionary` and covers 1, 2, 4, and 16 producer threads, so follow-up arena sizing/growth changes have a baseline in the GitLab benchmark job.

Why these thread counts?
This benchmark is focused on dictionary string interning. It is not a full end-to-end profiler benchmark.
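
For readers who want the shape of a multi-producer unique-insertion measurement without reading the benchmark source, here is a minimal std-only sketch. It is not the code in this PR: `ToyDictionary` and `run_unique_insert` are hypothetical names, the dictionary is a plain `Mutex<HashSet<String>>` stand-in rather than `ProfilesDictionary`, and timing uses `Instant` instead of Criterion. It only illustrates the startup barrier and the "every string is unique" workload.

```rust
use std::collections::HashSet;
use std::sync::{Barrier, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a concurrent string dictionary.
/// This is NOT the `ProfilesDictionary` API; it only deduplicates strings.
struct ToyDictionary {
    strings: Mutex<HashSet<String>>,
}

impl ToyDictionary {
    fn new() -> Self {
        ToyDictionary { strings: Mutex::new(HashSet::new()) }
    }

    /// Returns true if the string was not already present.
    fn insert(&self, s: String) -> bool {
        self.strings.lock().unwrap().insert(s)
    }

    fn len(&self) -> usize {
        self.strings.lock().unwrap().len()
    }
}

/// Runs `threads` producers that each insert `per_thread` unique strings.
/// Returns the approximate wall time from barrier release to the last
/// producer finishing, plus the final dictionary size.
fn run_unique_insert(threads: usize, per_thread: usize) -> (Duration, usize) {
    let dict = ToyDictionary::new();
    // One extra participant so the timing thread is released with the producers.
    let barrier = Barrier::new(threads + 1);

    // `thread::scope` joins every spawned thread before returning `start`.
    let start = thread::scope(|scope| {
        for t in 0..threads {
            let dict = &dict;
            let barrier = &barrier;
            scope.spawn(move || {
                // Build inputs before the barrier so formatting is not timed.
                let inputs: Vec<String> =
                    (0..per_thread).map(|i| format!("fn_{t}_{i}")).collect();
                barrier.wait();
                for s in inputs {
                    dict.insert(s);
                }
            });
        }
        barrier.wait();
        Instant::now()
    });

    (start.elapsed(), dict.len())
}
```

Calling `run_unique_insert(n, 10_000)` for `n` in 1, 2, 4, and 16 gives one rough timing per thread count. A single run like this is noisy, which is one reason to use a harness such as Criterion that repeats the measurement.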
In dd-trace-py, profile mutation and serialization are guarded by `profile_mtx`, but dictionary interning can happen before a sample is added to the profile. This means dictionary insertion can still be concurrent even when profile writes are serialized.

What this does not cover
This benchmark does not model every profiler behavior. In particular, it does not cover:
Why only `ProfilesDictionary`?

I originally tried an additional synthetic benchmark comparing 4 vs 16 shards. Local exploratory results suggested 16 shards helps once there is concurrent insertion:

However, that synthetic comparison also changed total initial hash-table capacity because the capacity was applied per shard. Since the current follow-up keeps the production shard count at 16, this PR stays minimal and only adds the `ProfilesDictionary` benchmark.

If we revisit shard count later, we should add a dedicated shard-count benchmark that holds total starting capacity constant across shard counts.
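
As a rough illustration of the confound: if a per-shard capacity of 1024 (a made-up number) is applied to both configurations, 4 shards start with 4096 total slots while 16 shards start with 16384, so the comparison varies two things at once. One possible way to hold the total constant is to divide a fixed budget by the shard count. The sketch below assumes a hypothetical `ShardedSet` type and `per_shard_capacity` helper; neither exists in libdatadog, and the real shard layout may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Splits a fixed total capacity budget across shards, rounding up.
/// Hypothetical helper for illustration only.
fn per_shard_capacity(total_capacity: usize, shard_count: usize) -> usize {
    assert!(shard_count > 0, "need at least one shard");
    (total_capacity + shard_count - 1) / shard_count
}

/// Hypothetical sharded string set; not the `ProfilesDictionary` layout.
struct ShardedSet {
    shards: Vec<Mutex<HashSet<String>>>,
}

impl ShardedSet {
    /// Requests the same total starting capacity regardless of shard count.
    /// `HashSet::with_capacity` may reserve more than requested.
    fn with_total_capacity(shard_count: usize, total_capacity: usize) -> Self {
        let per_shard = per_shard_capacity(total_capacity, shard_count);
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashSet::with_capacity(per_shard)))
            .collect();
        ShardedSet { shards }
    }

    /// Picks a shard from the key's hash.
    fn shard_index(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    /// Returns true if the key was not already present.
    fn insert(&self, key: &str) -> bool {
        let idx = self.shard_index(key);
        self.shards[idx].lock().unwrap().insert(key.to_string())
    }

    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```

With this shape, `ShardedSet::with_total_capacity(4, 4096)` and `ShardedSet::with_total_capacity(16, 4096)` request 1024 and 256 slots per shard respectively, so any measured difference is more plausibly due to shard count than to starting capacity.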
How to test the change?
PROF-14423