-
Notifications
You must be signed in to change notification settings - Fork 3k
[perf] feat: add optional stage selection in discrete mode for NPUProfiler #4490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable performance feature by allowing selective profiling of different stages ('roles') in the NPU profiler's discrete mode. This will help reduce data volume and parsing time. The implementation is mostly well-done, with corresponding updates to documentation and worker annotations. However, there is a critical inconsistency in the configuration naming (stages vs. roles) that will prevent the feature from working. I've also included a suggestion to improve the maintainability of the configuration validation logic. Addressing these points will make this a solid contribution.
6cfc85b to
748b7e7
Compare
0aff344 to
ecaf261
Compare
| npu: | ||
| discrete: False | ||
| # rollout & ref follow actor settings | ||
| # rollout & ref follow actor settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove unnecessary diff
| discrete: True | ||
| # rollout & ref follow actor settings | ||
| roles: ["all"] | ||
| # rollout & ref follow actor settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
tardis-key
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- rename 'role'
- explain the necessity of stage filtering on the premise that role filtering is already available
- gpu and npu test
b043570 to
fd9064c
Compare
fd9064c to
c3948f7
Compare
tardis-key
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- rename 'role'
- explain the necessity of stage filtering on the premise that role filtering is already available
- gpu and npu test
What does this PR do?
Currently, whether in end-to-end mode or discrete mode, all roles are fully collected. As the sequence length continues to increase, the volume of collected data becomes large, leading to slow parsing. Therefore, we introduce a new feature in the NPU Profiler that allows optional role selection in discrete mode, enabling quick collection of specific roles.
We have added a new roles parameter in npu_profile.yaml to specify the roles to be collected. The currently supported options are: all, rollout_generate, actor_compute_log_prob, actor_update and ref_compute_log_prob. Setting roles to ["all"] means all roles will be collected. Other options can be freely combined, for example: ["actor_update", "ref_compute_log_prob"]
to close issue #4400
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)