perf: disable find_unused_parameters for faster DDP training #68
Problem: find_unused_parameters=True causes DDP to traverse the autograd graph after every forward pass to detect unused parameters. This adds CPU overhead and can reduce comm/compute overlap.

Analysis: Code review confirms this training loop does NOT have unused parameters:
- All backbone outputs flow to the loss via pfc() for all data strategies (residual, frame sampling, collage)
- Branching in the training loop is about input shape/indexing, not skipping parameterized modules
- The only 'optional' module is the pooling head (gated by use_head), but the training code always uses pooler_output/head_output

static_graph=True is already set, which:
- Caches the used/unused parameter set after warmup
- Mitigates the overhead if find_unused_parameters were needed
- But the cleanest performance path is find_unused_parameters=False

Testing: If this causes 'Expected to have finished reduction' errors:
1. Re-enable find_unused_parameters=True
2. Debug with TORCH_DISTRIBUTED_DEBUG=INFO to identify unused params
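For concreteness, a minimal sketch of the wrapper configuration this change aims at, assuming a torchrun launch with NCCL; the stand-in `nn.Linear` model, `local_rank` handling, and backend are illustrative rather than copied from `training/train.py`:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal sketch under torchrun; the real wrapper lives in training/train.py.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in for the real backbone

ddp_model = DDP(
    model,
    device_ids=[local_rank],
    find_unused_parameters=False,  # this PR: skip the per-iteration graph traversal
    static_graph=True,             # already set; caches parameter usage after warmup
)
```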
Summary
Disable `find_unused_parameters` in the DDP wrapper for a performance improvement.

Problem
`find_unused_parameters=True` adds overhead:
- DDP traverses the autograd graph after every forward pass to detect unused parameters
- The traversal adds CPU overhead and can reduce communication/compute overlap

Analysis
Code review confirms this training loop does NOT have unused parameters:
- All backbone outputs flow to the loss via `pfc()` for all data strategies (residual, frame sampling, collage)
- Branching in the training loop is about input shape/indexing, not skipping parameterized modules
- The only "optional" module is the pooling head (gated by `use_head`), but the training code always uses `pooler_output`/`head_output`

What find_unused_parameters does
With `find_unused_parameters=True`, DDP traverses the autograd graph after every forward pass to detect parameters that will not receive gradients. If no params are unused, this traversal is pure overhead.
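For contrast, a toy illustration (hypothetical, not from this repository) of the situation the flag exists for: a conditionally skipped module whose parameters receive no gradient on some iterations.

```python
import torch.nn as nn

class ToyModel(nn.Module):
    """Toy example only: `aux` is skipped when use_aux is False, so its
    parameters get no gradient. DDP then needs find_unused_parameters=True,
    otherwise it raises the "Expected to have finished reduction" error."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.aux = nn.Linear(8, 8)  # conditionally unused module

    def forward(self, x, use_aux: bool = False):
        h = self.backbone(x)
        return self.aux(h) if use_aux else h

# In this PR's training loop there is no such module: the pooling head is
# always exercised, so the traversal buys nothing.
```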
What static_graph=True does
Caches the parameter usage pattern after warmup, mitigating some overhead.
But if no params are unused, `find_unused_parameters=False` is still cleaner.

Testing Instructions
If this change causes "Expected to have finished reduction" errors:
1. Re-enable `find_unused_parameters=True`
2. Debug with `TORCH_DISTRIBUTED_DEBUG=INFO` to identify the unused parameters
The debug output will show which parameters didn't receive gradients.
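A sketch of the debug setup; the `torchrun` command in the comment (script path, process count) is illustrative:

```python
import os

# Equivalent to exporting it in the launch environment, e.g.
#   TORCH_DISTRIBUTED_DEBUG=INFO torchrun --nproc_per_node=8 training/train.py
# Set before the process group / DDP wrapper is created; use "DETAIL" for
# per-parameter logging if INFO is not enough.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "INFO"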
Rollback
If issues occur, simply change the line back to:
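Presumably this is the flag inside the DDP wrapper in `training/train.py`:

```python
find_unused_parameters=True,  # rollback: re-enable unused-parameter detection
```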
Files Changed
- `training/train.py`: DDP configuration with documentation