AI Generated DPL PR v1 #8873

AmurG · 2025-11-20T08:42:27Z

This PR is one version of a DPL improver that only has a meaningful effect at high core utilization.

For validation, we recommend tuning the hyperparameters (detailed here) on a circuit of your choice that struggles at high core utilization, at or near the highest utilization at which it still completes the flow.

The expected outcome will be a small but meaningful DRWL reduction of 2-3% in the above scenario. Outliers of up to 8% were observed.

Note: For the purpose of CI, this PR leaves the alternate flow "on" by default so that CI-level regressions are exercised. It should be merged as an alternate setting that defaults to "off".

We are also aware that the hyperparameters should, in general, all reside in TCL rather than being hardcoded. When testing, please try a reasonable variation (2-4 choices each) of the hyperparameters.

Feature: Two-Pass Congestion-Aware Global Swap (DPL)

Motivation

In high-density designs (utilization > 85%), standard detailed placement optimizations often degrade local routability while pursuing global wirelength reductions. The legacy GlobalSwap algorithm optimized strictly for HPWL (Half-Perimeter Wirelength), frequently moving cells into already congested regions to minimize net length. This "clumping" behavior creates local density hotspots that standard routers cannot resolve, leading to DRC violations and timing degradation due to detour routing.

This PR introduces a Guarded Two-Pass Global Swap strategy. Instead of blind HPWL optimization, it employs a "Profile-Then-Optimize" approach: it first identifies the theoretical minimum wirelength (Pass 1), then re-optimizes the design (Pass 2) using a congestion-aware cost function constrained by a wirelength budget derived from the first pass. This ensures that congestion relief does not come at the cost of uncontrolled wirelength regression.

Technical Implementation

1. Utilization Map Infrastructure

A new UtilizationMap mechanism has been added to the Grid class to quantify local congestion costs.

Binning: Reuses the existing detailed placement pixel grid.
Cost Metric: For every pixel $i$, the utilization cost $C_i$ is calculated as a weighted sum of area density and pin density:
$$C_i = W_{area} \cdot \frac{Area_i}{MaxArea} + W_{pin} \cdot \frac{Pins_i}{MaxPins}$$
Where $Area_i$ and $Pins_i$ are the accumulated cell area and pin counts overlapping pixel $i$.
Normalization: The final map is normalized to $[0, 1]$ to provide a consistent cost basis for the optimizer.
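
A minimal sketch of the cost metric above, assuming flat per-pixel vectors of accumulated area and pin counts. The function name `computeUtilizationCost` and the container layout are illustrative and not taken from the PR; the default weights are the 0.4 / 0.6 values listed under the internal parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's code): returns one cost in [0, 1] per pixel.
std::vector<float> computeUtilizationCost(const std::vector<float>& area,
                                          const std::vector<float>& pins,
                                          float w_area = 0.4f,
                                          float w_pin = 0.6f)
{
  assert(area.size() == pins.size());
  if (area.empty()) {
    return {};
  }

  float max_area = *std::max_element(area.begin(), area.end());
  float max_pins = *std::max_element(pins.begin(), pins.end());
  // Guard the divisions below when the map is empty of cells or pins.
  if (max_area <= 0.0f) {
    max_area = 1.0f;
  }
  if (max_pins <= 0.0f) {
    max_pins = 1.0f;
  }

  std::vector<float> cost(area.size());
  float max_cost = 0.0f;
  for (std::size_t i = 0; i < area.size(); ++i) {
    // C_i = W_area * Area_i / MaxArea + W_pin * Pins_i / MaxPins
    cost[i] = (w_area * (area[i] / max_area)) + (w_pin * (pins[i] / max_pins));
    max_cost = std::max(max_cost, cost[i]);
  }

  // Final normalization so the largest cost is exactly 1.
  if (max_cost > 0.0f) {
    for (float& c : cost) {
      c /= max_cost;
    }
  }
  return cost;
}
```

With areas {2, 4} and pin counts {1, 1}, the first pixel gets roughly 0.8 and the second exactly 1.0 after normalization.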

2. Two-Pass Optimization Algorithm

The DetailedGlobalSwap solver has been refactored into a state-preserving two-pass workflow:

Pass 1: HPWL Profiling

Executes a standard global swap with infinite budget.
Purpose: To find the Target HPWL ($L_{opt}$), representing the best possible wirelength achievable without congestion constraints.
Rollback: Upon completion, the system utilizes the Journal to atomically undo all moves, restoring the exact initial placement state while retaining the $L_{opt}$ metric.
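
The rollback step can be illustrated with a toy journal. This is only a sketch of the idea (record every move, then undo in reverse order); the `Journal`, `MoveRecord`, and `Pos` types below are stand-ins invented for the example and do not reflect the actual DPL Journal interface, which is not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pos
{
  int x;
  int y;
};

// Hypothetical journal entry: which cell moved and where it was before.
struct MoveRecord
{
  std::size_t cell;
  Pos before;
};

class Journal
{
 public:
  // Apply a move and remember the previous position.
  void move(std::vector<Pos>& placement, std::size_t cell, Pos to)
  {
    assert(cell < placement.size());
    records_.push_back({cell, placement[cell]});
    placement[cell] = to;
  }

  // Undo every recorded move, newest first, restoring the initial placement.
  void undoAll(std::vector<Pos>& placement)
  {
    for (auto it = records_.rbegin(); it != records_.rend(); ++it) {
      placement[it->cell] = it->before;
    }
    records_.clear();
  }

  std::size_t size() const { return records_.size(); }

 private:
  std::vector<MoveRecord> records_;
};
```

Undoing in reverse order matters when the same cell is moved more than once: only the oldest record holds its true initial position.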

Pass 2: Guarded Congestion Optimization

Re-runs the optimization from the initial state.
Budget Constraint: Defines a maximum allowable wirelength $L_{budget} = L_{opt} \times M_{stage}$, where $M_{stage}$ is a tightening multiplier schedule (e.g., 1.50 $\to$ 1.04).
Acceptance Criteria: A move is accepted if and only if:
1. The resulting HPWL $\le L_{budget}$ (Hard Constraint).
2. The combined profit $P = \Delta HPWL + W_{cong} \cdot \Delta Congestion > 0$ (Soft Objective).
Dynamics: This allows the solver to accept moves that degrade HPWL (negative $\Delta HPWL$) if they provide significant relief to local congestion ($\Delta Congestion$), provided the global wirelength stays within the calculated budget.
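
The acceptance rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `MoveEval` and `acceptMove` are hypothetical names, and the sign convention follows the text (a positive `hpwl_delta` means wirelength was reduced, a positive `congestion_delta` means congestion was relieved). How $\Delta Congestion$ is computed is not defined in this description, so it is taken as an input.

```cpp
#include <cassert>

// Hypothetical evaluation of one candidate move.
struct MoveEval
{
  double hpwl_after;        // total HPWL if the move is applied
  double hpwl_delta;        // > 0 means the move shortens wirelength
  double congestion_delta;  // > 0 means the move relieves congestion
};

bool acceptMove(const MoveEval& m, double budget_hpwl, double congestion_weight)
{
  assert(budget_hpwl > 0.0);
  // Hard constraint: never exceed the wirelength budget.
  if (m.hpwl_after > budget_hpwl) {
    return false;
  }
  // Soft objective: P = dHPWL + W_cong * dCongestion must be positive.
  const double profit = m.hpwl_delta + (congestion_weight * m.congestion_delta);
  return profit > 0.0;
}
```

For example, with a budget of 1100 and an illustrative weight of 35, a move that lengthens wirelength by 20 but relieves one unit of congestion is accepted (profit 15), while the same move is rejected if it pushes total HPWL to 1150.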

Usage & Configuration

This feature is enabled by default within the improve_placement TCL command.

Enabling / Disabling

There is currently no explicit disable flag exposed to TCL, as the two-pass logic is now the standard behavior for GlobalSwap. To revert to legacy behavior, one would need to modify the C++ source to bypass the profiling pass, though this is not recommended for high-density designs.

Hyperparameter Configuration

User-Configurable (TCL)

These flags control the core search heuristics of the swap engine:

-x <float> (Tradeoff): Controls the ratio of "Random Exploration" vs. "Smart" (Wirelength-Optimal) moves.
- Default: 0.2 (20% random / 80% smart).
- Tuning: Increase to 0.4 or 0.5 for difficult designs to escape local minima.
-t <float> (Tolerance): Convergence threshold for the HPWL optimization loop.
- Default: 0.01 (1%).
-p <int> (Passes): Number of inner-loop optimization passes per stage.

Example:

# High-effort mode for difficult designs
improve_placement -x 0.4 -t 0.005

Internal Compilation Parameters (`detailed_global.cxx`)

Advanced tuning requires modifying hardcoded constants:

budget_multipliers: {1.50, 1.25, 1.10, 1.04}. Controls the annealing-like schedule of the wirelength budget.
area_weight / pin_weight: 0.4 / 0.6. Weights for the utilization map cost function.
user_knob: 35.0. Scaling factor that determines how much wirelength we are willing to trade for a unit of congestion relief.
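
To make the schedule concrete, here is a small sketch that expands budget_multipliers into per-stage budgets using $L_{budget} = L_{opt} \times M_{stage}$. The function name `budgetSchedule` is hypothetical; only the multiplier values come from the list above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: one wirelength budget per optimization stage.
std::vector<double> budgetSchedule(
    double optimal_hpwl,
    const std::vector<double>& multipliers = {1.50, 1.25, 1.10, 1.04})
{
  assert(optimal_hpwl >= 0.0);
  std::vector<double> budgets;
  budgets.reserve(multipliers.size());
  for (double m : multipliers) {
    budgets.push_back(optimal_hpwl * m);
  }
  return budgets;
}
```

For a profiled $L_{opt}$ of 1000, the stages allow total HPWL of about 1500, 1250, 1100, and 1040 in turn, so early stages leave more room for congestion-driven moves than late ones.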

Expected Impact

This optimization is specifically targeted at high-density / high-utilization designs (e.g., >85% placement density).

Better: Local routability and effective timing in congested regions.
Trade-off: Slight increase in global wirelength compared to a pure HPWL-driven approach, but with significantly improved manufacturability and reduced routing detours.

maliberty · 2025-11-24T16:28:54Z

DCO needs fixing. It would be helpful to rebase to head of master as this is a month old.

github-actions

clang-tidy made some suggestions

github-actions · 2025-11-24T16:35:20Z

src/dpl/src/infrastructure/Grid.cpp

+      for (GridX x = cell_grid.xlo; x < cell_grid.xhi; x++) {
+        Pixel* pixel = gridPixel(x, y);
+        if (pixel && pixel->is_valid) {
+          const int pixel_idx = y.v * row_site_count_.v + x.v;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

const int pixel_idx = y.v * row_site_count_.v + x.v;

const int pixel_idx = (y.v * row_site_count_.v) + x.v;

github-actions · 2025-11-24T16:35:20Z

src/dpl/src/infrastructure/Grid.cpp

+
+  // We iterate manually to find max to avoid multiple passes or copies
+  for(float v : total_area_) {
+      if(v > max_area) max_area = v;


warning: use std::max instead of > [readability-use-std-min-max]

Suggested change

if(v > max_area) max_area = v;

max_area = std::max(v, max_area);

github-actions · 2025-11-24T16:35:21Z

src/dpl/src/infrastructure/Grid.cpp

+      if(v > max_area) max_area = v;
+  }
+  for(float v : total_pins_) {
+      if(v > max_pins) max_pins = v;


warning: use std::max instead of > [readability-use-std-min-max]

Suggested change

if(v > max_pins) max_pins = v;

max_pins = std::max(v, max_pins);

github-actions · 2025-11-24T16:35:21Z

src/dpl/src/infrastructure/Grid.cpp

+
+  // Avoid division by zero
+  if (max_area == 0.0f)
+    max_area = 1.0f;


warning: statement should be inside braces [google-readability-braces-around-statements]

Suggested change

max_area = 1.0f;

if (max_area == 0.0f) {
  max_area = 1.0f;
}

github-actions · 2025-11-24T16:35:21Z

src/dpl/src/infrastructure/Grid.cpp

+  if (max_area == 0.0f)
+    max_area = 1.0f;
+  if (max_pins == 0.0f)
+    max_pins = 1.0f;


warning: statement should be inside braces [google-readability-braces-around-statements]

Suggested change

max_pins = 1.0f;

if (max_pins == 0.0f) {
  max_pins = 1.0f;
}

github-actions · 2025-11-24T16:35:23Z

src/dpl/src/optimization/detailed_global.cxx

+
+          // Calculate pixel indices (row-major order)
+          const int row_site_count = grid->getRowSiteCount().v;
+          const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;

const int orig_pixel_idx = (orig_grid_y.v * row_site_count) + orig_grid_x.v;

github-actions · 2025-11-24T16:35:23Z

src/dpl/src/optimization/detailed_global.cxx

+          // Calculate pixel indices (row-major order)
+          const int row_site_count = grid->getRowSiteCount().v;
+          const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;
+          const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v;

const int new_pixel_idx = (new_grid_y.v * row_site_count) + new_grid_x.v;

github-actions · 2025-11-24T16:35:23Z

src/dpl/src/optimization/detailed_global.cxx

+    }
+
+    // Within budget: evaluate combined profit
+    double combined_profit = hpwl_delta + congestion_weight_ * congestion_improvement;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

double combined_profit = hpwl_delta + congestion_weight_ * congestion_improvement;

double combined_profit = hpwl_delta + (congestion_weight_ * congestion_improvement);

github-actions · 2025-11-24T16:35:23Z

src/dpl/src/optimization/detailed_global.cxx

+
+        // Calculate pixel indices
+        const int row_site_count = grid->getRowSiteCount().v;
+        const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;

const int orig_pixel_idx = (orig_grid_y.v * row_site_count) + orig_grid_x.v;

github-actions · 2025-11-24T16:35:23Z

src/dpl/src/optimization/detailed_global.cxx

+        // Calculate pixel indices
+        const int row_site_count = grid->getRowSiteCount().v;
+        const int orig_pixel_idx = orig_grid_y.v * row_site_count + orig_grid_x.v;
+        const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v;


warning: '*' has higher precedence than '+'; add parentheses to explicitly specify the order of operations [readability-math-missing-parentheses]

Suggested change

const int new_pixel_idx = new_grid_y.v * row_site_count + new_grid_x.v;

const int new_pixel_idx = (new_grid_y.v * row_site_count) + new_grid_x.v;

maliberty · 2025-11-24T16:39:41Z

From reading just the description:

- The cost metric in UtilizationMap is not described as being updated during optimization. That gives a very static view of congestion that will become increasingly inaccurate as cells move. The computation is also per pixel, which will produce a very uneven map.
- Δ Congestion is not defined. I can guess, but it would be good to define it.
- HPWL-degrading moves are allowed, but timing criticality is not considered. This may degrade critical paths when a non-critical instance could have been moved instead.

maliberty · 2025-11-24T16:40:00Z

clang-format and clang-tidy would also need addressing.

maliberty · 2025-11-24T16:42:17Z

src/dpl/src/infrastructure/Grid.cpp

+    // Distribute cell's contribution across its pixels
+    const float area_per_pixel
+        = cell_area / static_cast<float>(cell_pixel_count);
+    const float pins_per_pixel = static_cast<float>(num_pins)
+                                 / static_cast<float>(cell_pixel_count);


Somewhat inaccurate as the cell may fully cover some pixels and partially cover others. Probably not a huge effect.
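
One way to address this, sketched under assumptions: weight each pixel by its actual overlap with the cell rather than splitting the area evenly. The `Rect` type, the `addCellArea` name, and the use of plain doubles instead of the DPL grid types are all illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect
{
  double xlo, ylo, xhi, yhi;
};

// Adds the cell's area to each pixel in proportion to the true overlap.
// `area` is a row-major grid of cols x rows pixels, each pixel_w x pixel_h.
void addCellArea(std::vector<double>& area,
                 int cols,
                 int rows,
                 double pixel_w,
                 double pixel_h,
                 const Rect& cell)
{
  assert(cols > 0 && rows > 0 && pixel_w > 0.0 && pixel_h > 0.0);
  assert(area.size() == static_cast<std::size_t>(cols) * rows);

  // Clamp the range of touched pixels to the grid.
  const int x0 = std::max(0, static_cast<int>(cell.xlo / pixel_w));
  const int y0 = std::max(0, static_cast<int>(cell.ylo / pixel_h));
  const int x1 = std::min(cols - 1, static_cast<int>(cell.xhi / pixel_w));
  const int y1 = std::min(rows - 1, static_cast<int>(cell.yhi / pixel_h));

  for (int y = y0; y <= y1; ++y) {
    for (int x = x0; x <= x1; ++x) {
      const double px_lo = x * pixel_w;
      const double py_lo = y * pixel_h;
      const double ox
          = std::min(cell.xhi, px_lo + pixel_w) - std::max(cell.xlo, px_lo);
      const double oy
          = std::min(cell.yhi, py_lo + pixel_h) - std::max(cell.ylo, py_lo);
      if (ox > 0.0 && oy > 0.0) {
        area[(static_cast<std::size_t>(y) * cols) + x] += ox * oy;
      }
    }
  }
}
```

For a cell spanning x in [5, 20] over two 10x10 pixels, this assigns 50 to the first pixel and 100 to the second, where an even split would assign 75 to each.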

maliberty · 2025-11-24T16:45:13Z

src/dpl/src/infrastructure/Grid.cpp

+    const float pins_per_pixel = static_cast<float>(num_pins)
+                                 / static_cast<float>(cell_pixel_count);


it suffices to cast just one of these.

maliberty · 2025-11-24T16:47:01Z

src/dpl/src/infrastructure/Grid.cpp

+  // Avoid division by zero
+  if (max_area == 0.0f)
+    max_area = 1.0f;


Ok for safety but it is an impossible condition and should be an error instead. Likewise max_pins == 0.

maliberty · 2025-11-24T16:48:43Z

src/dpl/src/infrastructure/Grid.cpp

+  utilization_dirty_ = false;
+}
+
+void Grid::updateUtilizationMap(Node* node, DbuX x, DbuY y, bool add)


This largely duplicates logic from computeUtilizationMap. Common code should be refactored out.
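
A sketch of what that refactoring could look like, assuming both paths reduce to "add or subtract one cell's contribution". The class and member names here (`UtilizationSketch`, `CellFootprint`, `applyCell`) are hypothetical and simplified to a 1-D run of pixels; they are not the Grid API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified footprint: a contiguous run of pixels.
struct CellFootprint
{
  std::size_t first_pixel;
  std::size_t pixel_count;
  float area;
  int pins;
};

class UtilizationSketch
{
 public:
  explicit UtilizationSketch(std::size_t n) : area_(n, 0.0f), pins_(n, 0.0f) {}

  // Full rebuild: reset, then add every cell through the shared helper.
  void compute(const std::vector<CellFootprint>& cells)
  {
    area_.assign(area_.size(), 0.0f);
    pins_.assign(pins_.size(), 0.0f);
    for (const CellFootprint& c : cells) {
      applyCell(c, 1.0f);
    }
  }

  // Incremental update: same helper, with the sign selecting add or remove.
  void update(const CellFootprint& cell, bool add)
  {
    applyCell(cell, add ? 1.0f : -1.0f);
  }

  float area(std::size_t i) const { return area_[i]; }
  float pins(std::size_t i) const { return pins_[i]; }

 private:
  void applyCell(const CellFootprint& c, float sign)
  {
    assert(c.pixel_count > 0);
    assert(c.first_pixel + c.pixel_count <= area_.size());
    const float n = static_cast<float>(c.pixel_count);
    for (std::size_t i = c.first_pixel; i < c.first_pixel + c.pixel_count; ++i) {
      area_[i] += sign * c.area / n;
      pins_[i] += sign * static_cast<float>(c.pins) / n;
    }
  }

  std::vector<float> area_;
  std::vector<float> pins_;
};
```

Keeping the per-cell arithmetic in one place means the full and incremental paths cannot drift apart.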

maliberty · 2025-11-24T16:51:00Z

src/dpl/src/infrastructure/Grid.cpp

+  // To make this work better without re-normalizing, we could return
+  // the raw value scaled by the *old* max, but that's complex.


Is it that complex?

maliberty · 2025-11-24T16:53:05Z

src/dpl/src/optimization/detailed_global.cxx

+  int orig_disp_x, orig_disp_y;
+  mgr_->getMaxDisplacement(orig_disp_x, orig_disp_y);
+
+  // Get chip dimensions for unleashing the optimizer


Weird wording - getting the dimensions is pretty prosaic for "unleashing"

maliberty · 2025-11-24T16:56:34Z

src/dpl/src/optimization/detailed_global.cxx

+  budget_hpwl_ = optimal_hpwl * 1.10;
+
+  mgr_->getLogger()->info(DPL, 908, 
+                         "Profiling complete. Optimal HPWL={:.2f}, Budget HPWL={:.2f} (+10%)", 
+                         optimal_hpwl, budget_hpwl_);


Aside from hard-coding 1.10, I wouldn't duplicate that choice in the message when you can easily compute the (+10%) from budget_hpwl_ and optimal_hpwl (e.g., so the message stays correct if the multiplier changes to 1.20).
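
A minimal sketch of deriving the percentage from the two values. It uses `std::snprintf` only to stay self-contained; the real code would keep using the logger's format string, and the function name `formatBudgetMessage` is made up for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: the percentage is computed, not repeated as a literal.
std::string formatBudgetMessage(double optimal_hpwl, double budget_hpwl)
{
  assert(optimal_hpwl > 0.0);
  const double pct = ((budget_hpwl / optimal_hpwl) - 1.0) * 100.0;
  char buf[128];
  std::snprintf(buf,
                sizeof(buf),
                "Profiling complete. Optimal HPWL=%.2f, Budget HPWL=%.2f (%+.0f%%)",
                optimal_hpwl,
                budget_hpwl,
                pct);
  return buf;
}
```

Changing the multiplier from 1.10 to 1.20 then changes the reported figure from (+10%) to (+20%) without touching the message.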

maliberty · 2025-11-24T17:01:55Z

src/dpl/src/optimization/detailed_global.cxx

+    // Set dynamic displacement limits based on iteration stage
+    if (iteration == 0) {
+      // Iteration 1: Unleash the optimizer completely (chip-wide moves allowed)
+      mgr_->setMaxDisplacement(chip_width, chip_height);
+      mgr_->getLogger()->info(DPL, 921, "Unleashing optimizer: max displacement set to chip dimensions ({}, {})", 
+                             chip_width, chip_height);


This is a bad idea. You have no timing awareness and can make destructive choices (somewhat limited by the HPWL bound). dpo should be viewed as a local optimization, not a global one.

I don't see any mention of changing max displacement in the PR description. That is a substantial omission.

maliberty · 2025-11-24T17:07:19Z

In your testing did you measure changes in wns/tns?

Modified DPL v1

421062c

AmurG mentioned this pull request Nov 20, 2025

AI Generated DPL PR v2 #8874

Open

github-actions bot reviewed Nov 24, 2025

maliberty requested changes Nov 24, 2025
