Agentic FPGA Backend Optimization Competition

Sponsored by; contest hosted at the FPL'26 conference.

Hosted on GitHub Pages using the Dinky theme

Scoring Criteria

Teams will be scored as follows:

  1. After running each team's submission on each benchmark in the suite of hidden evaluation benchmarks, compute a score according to:
    • Benchmark Score = α - (0.1 * α) * β - (0.1 * α) * γ; where
      • α = ΔFmax Improvement (in MHz)
      • β = OpenRouter Cost (in $USD)
      • γ = (1/3600) * Wall Clock Runtime (in seconds)
    • Higher scores are better; if the calculation above yields a result less than zero, the score will be zero.
    • If a result fails to validate (the design is not fully placed and routed without errors or simulation uncovers divergent logical behavior), the score for that benchmark will be zero.
  2. For each benchmark, rank all teams based on their score
    • If multiple teams achieve an identical score (e.g. zero), assign the same rank to each
  3. For each team, compute the arithmetic mean of its per-benchmark rankings
    • A lower mean ranking is better
  4. Sort teams by mean ranking and assign prizes in ascending order
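The ranking steps above can be sketched as follows. This is one plausible interpretation (function names are illustrative, and dense ranking is assumed for ties, since the rules only say tied teams share a rank):

```python
from statistics import mean

def rank_teams(scores):
    # Dense ranking: identical scores share a rank; highest score gets rank 1.
    ordered = sorted(set(scores.values()), reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(ordered)}
    return {team: rank_of[s] for team, s in scores.items()}

def mean_rankings(per_benchmark):
    # per_benchmark: list of {team: score} dicts, one per benchmark
    ranks = {}
    for bench in per_benchmark:
        for team, rank in rank_teams(bench).items():
            ranks.setdefault(team, []).append(rank)
    # Lower mean ranking is better; sort ascending for prize order
    return sorted((mean(r), team) for team, r in ranks.items())
```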

ℹ️ NOTE
The contest organizers reserve the right to disqualify poorly performing submissions.

For example, if a contestant submission improves a particular design by 50.0 MHz, uses $0.25 USD in OpenRouter tokens, and runs in 1200 seconds, the total score would be:

α = 50.0, β = 0.25, γ = 1200/3600

Benchmark Score = 50.0 - (0.1 * 50.0) * 0.25 - (0.1 * 50.0) * (1200/3600) ≈ 47.083
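The arithmetic above can be captured in a small helper (the function name is illustrative, not part of the contest harness):

```python
def benchmark_score(fmax_gain_mhz, cost_usd, runtime_s):
    # Score = α - (0.1·α)·β - (0.1·α)·γ, clamped at zero.
    # γ converts wall clock runtime from seconds to hours.
    alpha = fmax_gain_mhz
    beta = cost_usd
    gamma = runtime_s / 3600.0
    score = alpha - 0.1 * alpha * beta - 0.1 * alpha * gamma
    return max(score, 0.0)
```

With the example inputs (50.0 MHz, $0.25, 1200 s) this returns roughly 47.083; scores that would be negative are clamped to zero per the rules.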

Over time, we plan to release a number of additional public benchmarks on which all competing submissions will be evaluated. Contestants will also be evaluated on a set of hidden benchmarks which will not be made public until after the contest has concluded.

Runtime Limit

Since testing and validation will occur on AWS instances, runtime is limited to 1 hour of wall clock time per benchmark in the contest runtime environment. After the hour expires, the last solution design generated by the team's submission will be scored and validated. Teams should therefore write the best solution they have found so far to the output DCP file location as they go.

OpenRouter Cost Limit

For each team's submission, a new API key, provisioned with at least $1.00 USD per benchmark, will be allocated for the entire evaluation. Teams cannot provide their own API keys to spend beyond this limit.
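A submission cannot exceed the key's limit, but it can avoid exhausting its budget mid-benchmark. A minimal client-side spend tracker (entirely hypothetical; the class name and safety margin are assumptions, not part of the contest harness or the OpenRouter API) might look like:

```python
class BudgetGuard:
    """Tracks cumulative LLM spend and refuses calls that would
    push total spend past the per-benchmark budget."""

    def __init__(self, budget_usd=1.00, margin_usd=0.05):
        # Keep a small margin as headroom for cost-estimate error.
        self.budget = budget_usd - margin_usd
        self.spent = 0.0

    def can_afford(self, est_cost_usd):
        return self.spent + est_cost_usd <= self.budget

    def record(self, actual_cost_usd):
        self.spent += actual_cost_usd
```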