-
Notifications
You must be signed in to change notification settings - Fork 751
AI Generated DPL PR v1 #8873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
AI Generated DPL PR v1 #8873
Conversation
|
DCO needs fixing. It would be helpful to rebase to head of master as this a month old. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| for (GridX x = cell_grid.xlo; x < cell_grid.xhi; x++) { | ||
| Pixel* pixel = gridPixel(x, y); | ||
| if (pixel && pixel->is_valid) { | ||
| const int pixel_idx = y.v * row_site_count_.v + x.v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| const int pixel_idx = y.v * row_site_count_.v + x.v; | |
| const int pixel_idx = (y.v * row_site_count_.v) + x.v; |
|
|
||
| // We iterate manually to find max to avoid multiple passes or copies | ||
| for(float v : total_area_) { | ||
| if(v > max_area) max_area = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: use std::max instead of > [readability-use-std-min-max]
| if(v > max_area) max_area = v; | |
| max_area = std::max(v, max_area); |
| if(v > max_area) max_area = v; | ||
| } | ||
| for(float v : total_pins_) { | ||
| if(v > max_pins) max_pins = v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: use std::max instead of > [readability-use-std-min-max]
| if(v > max_pins) max_pins = v; | |
| max_pins = std::max(v, max_pins); |
|
|
||
| // Avoid division by zero | ||
| if (max_area == 0.0f) | ||
| max_area = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [google-readability-braces-around-statements]
| max_area = 1.0f; | |
| if (max_area == 0.0f) { | |
| max_area = 1.0f; | |
| } |
| if (max_area == 0.0f) | ||
| max_area = 1.0f; | ||
| if (max_pins == 0.0f) | ||
| max_pins = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: statement should be inside braces [google-readability-braces-around-statements]
| max_pins = 1.0f; | |
| if (max_pins == 0.0f) { | |
| max_pins = 1.0f; | |
| } |
|
|
||
| // Calculate pixel indices (row-major order) | ||
| const int row_site_count = grid->getRowSiteCount().v; | ||
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; | |
| const int orig_pixel_idx = (orig_grid_y.v * row_site_count) + orig_grid_x.v; |
| // Calculate pixel indices (row-major order) | ||
| const int row_site_count = grid->getRowSiteCount().v; | ||
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; | ||
| const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v; | |
| const int new_pixel_idx = (new_grid_y.v * row_site_count) + new_grid_x.v; |
| } | ||
|
|
||
| // Within budget: evaluate combined profit | ||
| double combined_profit = hpwl_delta + congestion_weight_ * congestion_improvement; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| double combined_profit = hpwl_delta + congestion_weight_ * congestion_improvement; | |
| double combined_profit = hpwl_delta + (congestion_weight_ * congestion_improvement); |
|
|
||
| // Calculate pixel indices | ||
| const int row_site_count = grid->getRowSiteCount().v; | ||
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; | |
| const int orig_pixel_idx = (orig_grid_y.v * row_site_count) + orig_grid_x.v; |
| // Calculate pixel indices | ||
| const int row_site_count = grid->getRowSiteCount().v; | ||
| const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v; | ||
| const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]
| const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v; | |
| const int new_pixel_idx = (new_grid_y.v * row_site_count) + new_grid_x.v; |
|
From reading just the description:
|
|
clang-format and clang-tidy would also need addressing. |
| // Distribute cell's contribution across its pixels | ||
| const float area_per_pixel | ||
| = cell_area / static_cast<float>(cell_pixel_count); | ||
| const float pins_per_pixel = static_cast<float>(num_pins) | ||
| / static_cast<float>(cell_pixel_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat inaccurate as the cell may fully cover some pixels and partially cover others. Probably not a huge effect.
| const float pins_per_pixel = static_cast<float>(num_pins) | ||
| / static_cast<float>(cell_pixel_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it suffices to cast just one of these.
| // Avoid division by zero | ||
| if (max_area == 0.0f) | ||
| max_area = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for safety but it is an impossible condition and should be an error instead. Likewise max_pins == 0.
| utilization_dirty_ = false; | ||
| } | ||
|
|
||
| void Grid::updateUtilizationMap(Node* node, DbuX x, DbuY y, bool add) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This largely duplicates logic from computeUtilizationMap. Common code should be refactored out.
| // To make this work better without re-normalizing, we could return | ||
| // the raw value scaled by the *old* max, but that's complex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that complex?
| int orig_disp_x, orig_disp_y; | ||
| mgr_->getMaxDisplacement(orig_disp_x, orig_disp_y); | ||
|
|
||
| // Get chip dimensions for unleashing the optimizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird wording - getting the dimensions is pretty prosaic for "unleashing"
| budget_hpwl_ = optimal_hpwl * 1.10; | ||
|
|
||
| mgr_->getLogger()->info(DPL, 908, | ||
| "Profiling complete. Optimal HPWL={:.2f}, Budget HPWL={:.2f} (+10%)", | ||
| optimal_hpwl, budget_hpwl_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from hard-coding 1.10 I wouldn't duplicate that choice in the message when you can easily compute the (+10%) from budget_hpwl_ & optimal_hpwl (eg resilient to change to 1.20).
| // Set dynamic displacement limits based on iteration stage | ||
| if (iteration == 0) { | ||
| // Iteration 1: Unleash the optimizer completely (chip-wide moves allowed) | ||
| mgr_->setMaxDisplacement(chip_width, chip_height); | ||
| mgr_->getLogger()->info(DPL, 921, "Unleashing optimizer: max displacement set to chip dimensions ({}, {})", | ||
| chip_width, chip_height); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bad idea. You have no timing awareness and can make destructive choices (somewhat limited by the HPWL bound). dpo should be viewed a local optimization not a global one.
I don't see any mention of changing max displacement in the PR description. That is a substantial omission.
|
In your testing did you measure changes in wns/tns? |
This PR is one version of a DPL improver which only kicks in meaningfully under high core utilization values.
For validation, it is recommended to tune the hyperparameters (detailed here) on any circuit of choice that struggles with high core utilization at or near the highest core utilization value it completes the flow at.
The expected outcome will be a small but meaningful DRWL reduction of 2-3% in the above scenario. Outliers of up to 8% were observed.
Note : For the purpose of CI, the default PR leaves the alternate flow as "on". It should be merged as an alternate setting which is default "off". This is done for testing CI-level regressions.
We are also aware that the hyperparameters should all, in general, solely reside in TCL and not be hardcoded. Please also use a reasonable variation (2-4 choices each) of hyperparameters.
See also : #8874
--AI Generated content follows--
Feature: Two-Pass Congestion-Aware Global Swap (DPL)
Motivation
In high-density designs (utilization > 85%), standard detailed placement optimizations often degrade local routability while pursuing global wirelength reductions. The legacy
GlobalSwapalgorithm optimized strictly for HPWL (Half-Perimeter Wirelength), frequently moving cells into already congested regions to minimize net length. This "clumping" behavior creates local density hotspots that standard routers cannot resolve, leading to DRC violations and timing degradation due to detour routing.This PR introduces a Guarded Two-Pass Global Swap strategy. Instead of blind HPWL optimization, it employs a "Profile-Then-Optimize" approach: it first identifies the theoretical minimum wirelength (Pass 1), then re-optimizes the design (Pass 2) using a congestion-aware cost function constrained by a wirelength budget derived from the first pass. This ensures that congestion relief does not come at the cost of uncontrolled wirelength regression.
Technical Implementation
1. Utilization Map Infrastructure
A new
UtilizationMapmechanism has been added to theGridclass to quantify local congestion costs.Where
2. Two-Pass Optimization Algorithm
The
DetailedGlobalSwapsolver has been refactored into a state-preserving two-pass workflow:Pass 1: HPWL Profiling
Journalto atomically undo all moves, restoring the exact initial placement state while retaining thePass 2: Guarded Congestion Optimization
Usage & Configuration
This feature is enabled by default within the
improve_placementTCL command.Enabling / Disabling
There is currently no explicit disable flag exposed to TCL, as the two-pass logic is now the standard behavior for
GlobalSwap. To revert to legacy behavior, one would need to modify the C++ source to bypass the profiling pass, though this is not recommended for high-density designs.Hyperparameter Configuration
User-Configurable (TCL)
These flags control the core search heuristics of the swap engine:
-x <float>(Tradeoff): Controls the ratio of "Random Exploration" vs. "Smart" (Wirelength-Optimal) moves.0.2(20% random / 80% smart).0.4or0.5for difficult designs to escape local minima.-t <float>(Tolerance): Convergence threshold for the HPWL optimization loop.0.01(1%).-p <int>(Passes): Number of inner-loop optimization passes per stage.Example:
# High-effort mode for difficult designs improve_placement -x 0.4 -t 0.005Internal Compilation Parameters (
detailed_global.cxx)Advanced tuning requires modifying hardcoded constants:
budget_multipliers:{1.50, 1.25, 1.10, 1.04}. Controls the annealing-like schedule of the wirelength budget.area_weight/pin_weight:0.4/0.6. Weights for the utilization map cost function.user_knob:35.0. Scaling factor that determines how much wirelength we are willing to trade for a unit of congestion relief.Expected Impact
This optimization is specifically targeted at high-density / high-utilization designs (e.g., >85% placement density).