Contest hosted at the FPL'26 conference.
One major way to improve your agent is to create new backend optimizations, or "recipes." We hope you focus most of your development effort in this area ahead of time, so that at runtime the LLM only has to choose from the available set of optimizations rather than invent them on the fly (which would be expensive in both time and tokens). Using AI coding agents to help build these recipes is fully allowed and encouraged.
To help you get started, here is a simple pattern to follow when tackling design optimization.
Physical optimizations can follow this simple four-step cycle:
| Step | Goal | Typical Tools |
|---|---|---|
| 1. Identify | Pick a specific physical optimization target | Domain knowledge, literature review |
| 2. Analyze | Build a tool to find where this optimization applies | Vivado timing reports, RapidWright analysis scripts |
| 3. Optimize | Implement the transformation | RapidWright APIs, Vivado ECO commands |
| 4. Measure | Quantify the impact on Fmax | Vivado report_timing_summary |
The cycle is meant to be iterated: measure results, refine the analysis heuristics, tune the optimization parameters, and repeat. Different benchmarks will exhibit different optimization opportunities, and these are layered like an onion: once you solve the outer ones, different opportunities appear underneath.
After place-and-route, some cells on critical paths may be placed far from their ideal location. (The first critical paths in a design do not normally suffer from this; however, once substantial progress has been made on them, the paths that become critical next may.) This forces the router to take long detours, consuming timing margin. If we can detect these cases and surgically move cells closer to the centroid of their connections, the router can find shorter paths and timing improves.
Here is our heuristic: A critical path is sub-optimal due to placement when a net segment's routed path length (through actual routing PIPs) is significantly longer than the Manhattan distance between its source and sink tiles. We call this ratio the detour ratio:
detour_ratio = routed_path_length / manhattan_distance
A detour ratio of 1.0 means the route is perfectly direct. Ratios above ~2.0 suggest the cell may benefit from re-placement. The cell should only be moved if the surrounding path segments have adequate slack to absorb any perturbation.
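As a minimal sketch of the heuristic above, the ratio and the threshold check can be written as two small functions. The function names are hypothetical, and returning 1.0 when the source and sink share a tile is an assumption made here to avoid dividing by zero; the contest implementation may handle that case differently.

```python
def detour_ratio(routed_path_length: int, manhattan_distance: int) -> float:
    """Ratio of routed length to Manhattan tile distance (1.0 = direct)."""
    if manhattan_distance == 0:
        # Assumption: source and sink share a tile, so treat the route as direct.
        return 1.0
    return routed_path_length / manhattan_distance


def is_replacement_candidate(ratio: float, threshold: float = 2.0) -> bool:
    """Flag a cell whose worst detour ratio exceeds the threshold."""
    return ratio > threshold


# A route covering 30 tiles between pins that are 10 tiles apart.
ratio = detour_ratio(30, 10)
print(ratio, is_replacement_candidate(ratio))  # prints: 3.0 True
```

The slack condition is deliberately left out of this sketch: whether a flagged cell is safe to move still depends on the slack of the surrounding path segments.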
We implement this as an MCP tool (analyze_net_detour) in RapidWrightMCP/rapidwright_tools.py so the agent can call it directly.
The core idea is to walk backward through a net's PIPs from a sink pin to the source, summing tile-to-tile Manhattan distances to get the routed path length. The detour ratio is then:
detour_ratio = routed_path_length / manhattan_distance(source_tile, sink_tile)
In pseudo-code:
compute_routed_path_length(net, sink_pin):
    build node_map from net's PIPs (end_node → start_node)
    walk backward from sink_pin's node to source_pin's node
    sum tile Manhattan distances at each hop
    return total length

detour_ratio(net, sink_pin):
    manhattan = source_tile.getManhattanDistance(sink_tile)
    return compute_routed_path_length(net, sink_pin) / manhattan
See _compute_routed_path_length() and _detour_ratio() in rapidwright_tools.py for the full implementation.
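The backward walk can be illustrated without RapidWright by modelling nodes as strings and tiles as (column, row) pairs. This is only a sketch of the idea under those assumptions: `node_map`, `node_tile`, and the toy route are stand-ins invented for illustration, not the data structures used in rapidwright_tools.py.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]  # (column, row) of a tile


def manhattan(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def routed_path_length(node_map: Dict[str, str],
                       node_tile: Dict[str, Tile],
                       source_node: str,
                       sink_node: str) -> int:
    """Walk from the sink back to the source, summing per-hop tile distances.

    node_map maps each PIP's end node to its start node.
    """
    length = 0
    current = sink_node
    visited = set()
    while current != source_node:
        # Stop on a loop or a dangling node instead of walking forever.
        if current in visited or current not in node_map:
            raise ValueError(f"no routed path from {source_node} to {sink_node}")
        visited.add(current)
        previous = node_map[current]
        length += manhattan(node_tile[current], node_tile[previous])
        current = previous
    return length


# Toy route: leaves (0, 0), climbs to (0, 4), crosses to (2, 4),
# then drops back down to the sink at (2, 0).
node_tile = {"src": (0, 0), "n1": (0, 4), "n2": (2, 4), "sink": (2, 0)}
node_map = {"n1": "src", "n2": "n1", "sink": "n2"}

length = routed_path_length(node_map, node_tile, "src", "sink")
direct = manhattan(node_tile["src"], node_tile["sink"])
print(length, direct, length / direct)  # prints: 10 2 5.0
```

In the toy route the net travels 10 tiles to cover a direct distance of 2, so its detour ratio of 5.0 would be well above the ~2.0 threshold.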
The analyze_net_detour tool performs a cell-centric analysis. For each interior cell on the critical path, it examines both the incoming net (feeding the cell) and the outgoing net (driven by it) to compute the worst-case detour ratio across the source pin and all sink pins. A high detour ratio suggests that the cell is poorly placed.
The input is a pin-path list as produced by extract_critical_path_pins:
["src_ff/Q", "lut1/I2", "lut1/O", "lut2/I0", "lut2/O", "dst_ff/D"]
Consecutive pins from the same cell (e.g. lut1/I2, lut1/O) identify the cell's data path:
analyze_net_detour(pin_paths, detour_threshold):
    for each pin_path:
        resolve each pin name → EDIFHierPortInst
        group consecutive pins belonging to the same cell → (in_pin, out_pin)
        for each pin in (in_pin, out_pin):
            get the physical net and SitePinInst
            if output pin → check detour to all sink pins (max)
            if input pin → check detour to this pin
        record max detour ratio
    return cells where max_detour_ratio > threshold, sorted descending
For output pins, we iterate over all sink pins because the SitePinInst is the net's source — checking it against itself would yield zero distance. Taking the max across sinks catches cells whose outgoing net has a long detour.
See analyze_net_detour() in rapidwright_tools.py for the full implementation.
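The grouping step can be sketched in plain Python. This assumes the cell name is everything before the last `/` in a pin name, and the helper names `cell_of` and `group_pins_by_cell` are hypothetical; the real tool resolves pins through the netlist rather than by string splitting.

```python
from itertools import groupby
from typing import List, Tuple


def cell_of(pin_name: str) -> str:
    """Return the cell part of a hierarchical pin name such as 'a/lut1/I2'."""
    return pin_name.rsplit("/", 1)[0]


def group_pins_by_cell(pin_path: List[str]) -> List[Tuple[str, str, str]]:
    """Return (cell, in_pin, out_pin) for each interior cell on the path.

    Only runs of two or more consecutive pins on the same cell are kept,
    so the launching and capturing flip-flops (one pin each) are skipped.
    """
    groups = []
    for cell, pins in groupby(pin_path, key=cell_of):
        pins = list(pins)
        if len(pins) >= 2:
            groups.append((cell, pins[0], pins[-1]))
    return groups


path = ["src_ff/Q", "lut1/I2", "lut1/O", "lut2/I0", "lut2/O", "dst_ff/D"]
for cell, in_pin, out_pin in group_pins_by_cell(path):
    print(cell, in_pin, out_pin)
# prints:
# lut1 lut1/I2 lut1/O
# lut2 lut2/I0 lut2/O
```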
The analysis requires data from both tools:
- extract_critical_path_pins: extracts ordered pin paths from the timing report
- report_timing_summary: provides baseline WNS/TNS and per-path slack
- analyze_net_detour: compares Manhattan distance to routed path length

The optimize_cell_placement MCP tool moves candidate cells to the centroid of their connections:
optimize_cell_placement(cell_names):
    for each cell:
        1. Collect tile locations of all connected pins
        2. Compute centroid (ideal placement) using ECOPlacementHelper
        3. Unplace cell and unroute its nets
        4. Spiral outward from centroid to find an empty compatible site
        5. Place cell at new site and re-route intra-site wiring
The key steps in more detail:
- ECOPlacementHelper.getCentroidOfPoints() computes the arithmetic mean of all connected pin tile coordinates and snaps to the nearest SLICE site.
- ECOPlacementHelper.spiralOutFrom() iterates neighboring sites outward until an empty compatible site is found.
- DesignTools.fullyUnplaceCell() cleanly removes the cell's physical placement; design.placeCell() and siteInst.routeSite() establish the new placement with correct intra-site routing.
- net.unroute() clears PIPs so Vivado's route_design will incrementally re-route only the affected nets.

See optimize_cell_placement() and _get_cell_physical_nets() in rapidwright_tools.py for the full implementation.
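To make the centroid-then-spiral search concrete, here is a simplified sketch on a plain integer grid. It does not call RapidWright: `centroid`, `spiral_out_from`, and `find_free_site` are hypothetical stand-ins for the ECOPlacementHelper methods, the ring-by-ring visiting order is an assumption, and site compatibility is reduced to a caller-supplied `is_free` predicate.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Point = Tuple[int, int]


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of the points, rounded to a grid location."""
    n = len(points)
    return (round(sum(x for x, _ in points) / n),
            round(sum(y for _, y in points) / n))


def spiral_out_from(center: Point, max_radius: int) -> Iterator[Point]:
    """Yield the center, then square rings of increasing radius around it."""
    cx, cy = center
    yield center
    for r in range(1, max_radius + 1):
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                # Keep only locations on the ring's border.
                if max(abs(dx), abs(dy)) == r:
                    yield (cx + dx, cy + dy)


def find_free_site(points: List[Point],
                   is_free: Callable[[Point], bool],
                   max_radius: int = 10) -> Optional[Point]:
    """Return the first free location found spiralling out from the centroid."""
    for site in spiral_out_from(centroid(points), max_radius):
        if is_free(site):
            return site
    return None


# Three connected pins; their centroid (2, 2) is already occupied.
pins = [(0, 0), (4, 0), (2, 6)]
occupied = {(2, 2)}
print(centroid(pins), find_free_site(pins, lambda s: s not in occupied))
# prints: (2, 2) (1, 1)
```

Bounding the search with `max_radius` is a design choice in this sketch: a cell that can only be placed far from its centroid is unlikely to benefit from the move.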
After re-placing cells in RapidWright, write the modified checkpoint and use Vivado to re-route and re-time:
rapidwright: write_checkpoint("optimized.dcp")
vivado: open_checkpoint("optimized.dcp")
vivado: route_design ← re-routes only the unrouted nets
vivado: report_timing_summary ← compare new WNS to baseline
Because optimize_cell_placement calls net.unroute() on every net connected to the moved cells, Vivado's route_design will incrementally re-route only those nets—the rest of the design stays intact.
If the optimization yields no benefit, the team can either abandon that path of exploration or explore it further to see whether compounding multiple optimizations yields a larger benefit. This decision is entirely up to the team. Another idea is to batch optimizations in RapidWright (applying several at once) and then push the result back to Vivado, which amortizes the round-trip cost of re-loading the DCP in Vivado.
There are many possible variations of this optimization. For example, instead of re-placing the cell, you could replicate it at the new location. This could be less invasive and would not disturb existing routes that already meet timing.
Here is the complete sequence of MCP tool calls an agent would make:
┌─── Baseline ──────────────────────────────────────────────────────────┐
│ 1. vivado: open_checkpoint(input.dcp) │
│ 2. vivado: report_timing_summary → baseline WNS │
│ 3. vivado: extract_critical_path_pins(num_paths=10) │
└───────────────────────────────────────────────────────────────────────┘
┌─── Analyze ───────────────────────────────────────────────┐
│ 4. rapidwright: read_checkpoint(input.dcp) │
│ 5. rapidwright: analyze_net_detour(critical_paths) │
│ → returns ranked candidates with detour ratios │
└───────────────────────────────────────────────────────────┘
┌─── Optimize ──────────────────────────────────────────────┐
│ 6. rapidwright: optimize_cell_placement(candidate_cells) │
│ 7. rapidwright: write_checkpoint(optimized.dcp) │
└───────────────────────────────────────────────────────────┘
┌─── Measure ───────────────────────────────────────────────┐
│ 8. vivado: open_checkpoint(optimized.dcp) │
│ 9. vivado: route_design │
│ 10. vivado: report_timing_summary → new WNS │
│ Compare new WNS to baseline — did Fmax improve? │
└───────────────────────────────────────────────────────────┘
The benchmark vexriscv_re-place_2025.1.dcp has a critical path with a deliberately misplaced LUT2 that the recipe can fix. Here is a complete Python script that runs all four steps. It assumes the DCP is in the fpl26_contest_benchmarks/ directory:
#!/usr/bin/env python3
"""End-to-end cell re-placement optimization example."""
import sys, json, os, re
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "VivadoMCP"))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "RapidWrightMCP"))
import vivado_mcp_server as vivado
import rapidwright_tools as rw
DCP = "fpl26_contest_benchmarks/vexriscv_re-place_2025.1.dcp"
OPT_DCP = "vexriscv_optimized.dcp"
CONTEST_CLOCK = "clk_fpl26contest"
def get_fmax():
    """Return (wns_ns, period_ns, fmax_mhz) for the contest clock."""
    # Slack of the single worst path in the contest clock group.
    result = vivado.run_tcl_command(
        f"set p [get_timing_paths -max_paths 1 -group {CONTEST_CLOCK}]; "
        "if {[llength $p] > 0} {get_property SLACK $p} else {puts 0.0}",
        timeout=60,
    )
    m = re.search(r"[-]?\d+\.\d+", result)
    wns = float(m.group()) if m else None
    # Period of the contest clock, in ns.
    result = vivado.run_tcl_command(
        f"get_property PERIOD [get_clocks {CONTEST_CLOCK}]", timeout=60
    )
    m = re.search(r"\d+\.\d+", result)
    period = float(m.group()) if m else None
    if wns is not None and period is not None:
        # Achievable period is (period - WNS); 1000 / ns gives MHz.
        fmax = 1000.0 / (period - wns)
    else:
        fmax = None
    return wns, period, fmax
# ── Step 1: Baseline ────────────────────────────────────────────────
print("=" * 60)
print("Step 1 Vivado baseline")
print("=" * 60)
vivado.start_vivado()
vivado.run_tcl_command(f"open_checkpoint {{{DCP}}}", timeout=300)
baseline_wns, clk_period, baseline_fmax = get_fmax()
print(f" Clock period: {clk_period} ns")
print(f" Baseline WNS: {baseline_wns} ns")
print(f" Baseline Fmax: {baseline_fmax:.2f} MHz")
pins_json = vivado.extract_critical_path_pins(num_paths=10)
critical_paths = json.loads(pins_json)
print(f" Extracted {len(critical_paths)} critical path pin lists")
vivado.cleanup_vivado()
# ── Step 2: Analyze ─────────────────────────────────────────────────
print("\n" + "=" * 60)
print("Step 2 RapidWright analysis")
print("=" * 60)
rw.initialize_rapidwright()
rw.read_checkpoint(DCP)
analysis = rw.analyze_net_detour(
    critical_paths_data=critical_paths, detour_threshold=2.0
)
candidates = analysis.get("candidates", [])
print(f" Cells analyzed: {analysis['cells_analyzed']}")
print(f" Candidates (detour > 2.0): {len(candidates)}")
for c in candidates[:5]:
    print(f" {str(c['cell']):55s} ratio={c['max_detour_ratio']}")
# Filter to unique cells on the worst paths (paths 1 and 2)
worst_path_cells = list(set(
    str(c["cell"]) for c in candidates if c["path"] <= 2
))
print(f"\n Targeting {len(worst_path_cells)} cells on paths 1-2:")
for name in worst_path_cells:
    print(f" {name}")
# ── Step 3: Optimize ────────────────────────────────────────────────
print("\n" + "=" * 60)
print("Step 3 RapidWright optimization")
print("=" * 60)
opt_result = rw.optimize_cell_placement(cell_names=worst_path_cells)
for r in opt_result.get("results", []):
    print(f" {r['cell']}: {r['status']}: {r['message']}")
rw.write_checkpoint(OPT_DCP)
print(f" Wrote {OPT_DCP}")
# ── Step 4: Measure ─────────────────────────────────────────────────
print("\n" + "=" * 60)
print("Step 4 Vivado verification")
print("=" * 60)
vivado.start_vivado()
vivado.run_tcl_command(f"open_checkpoint {{{OPT_DCP}}}", timeout=300)
vivado.run_tcl_command("route_design", timeout=600)
route_status = vivado.run_tcl_command(
    "report_route_status -return_string", timeout=60
)
# Parse the routing error count; -1 means the line was not found.
errors = re.search(
    r"# of nets with routing errors.*?:\s+(\d+)", route_status
)
error_count = int(errors.group(1)) if errors else -1
new_wns, _, new_fmax = get_fmax()
print(f" Routing errors: {error_count}")
print(f" Baseline WNS: {baseline_wns} ns → Fmax {baseline_fmax:.2f} MHz")
print(f" Optimized WNS: {new_wns} ns → Fmax {new_fmax:.2f} MHz")
if new_fmax is not None and baseline_fmax is not None:
    delta = new_fmax - baseline_fmax
    print(f" Fmax improvement: {delta:+.2f} MHz")
vivado.cleanup_vivado()
On the vexriscv_re-place_2025.1.dcp benchmark this moves the misplaced LUT2 from SLICE_X115Y2 to SLICE_X111Y17 and improves Fmax from 310 MHz to 350 MHz — a gain of +40 MHz.
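The script derives Fmax from the clock period and WNS as 1000 / (period - WNS), with both values in nanoseconds. The standalone helper below isolates that arithmetic; the name `fmax_mhz` and the sample values are illustrative assumptions, not measurements from the benchmark.

```python
def fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """Fmax in MHz from the clock period and worst negative slack (both in ns)."""
    achievable_period = period_ns - wns_ns
    if achievable_period <= 0:
        raise ValueError("period - WNS must be positive")
    return 1000.0 / achievable_period


# Illustrative values only: a 4.0 ns clock that misses timing by 1.0 ns
# can run at 1000 / 5.0 MHz; with zero slack it runs at 1000 / 4.0 MHz.
print(fmax_mhz(4.0, -1.0))  # prints: 200.0
print(fmax_mhz(4.0, 0.0))   # prints: 250.0
```

Because WNS is subtracted, a negative slack lengthens the achievable period and lowers Fmax, while positive slack raises it above the constrained clock frequency.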
The cell re-placement example above is just one of many possible optimization recipes. We encourage contestants to identify new opportunities and build their own. Here are some ideas to get started:
optimize_lut_input_cone MCP tool.
For each idea, follow the same four-step recipe:
report_timing_summary
Remember the guidelines for choosing between RapidWright and Vivado for different tasks. In general: use RapidWright for fast analysis, placement modifications, and netlist ECOs; use Vivado for routing (route_design) and authoritative timing (report_timing_summary).