Score vs bytes
Track B runs only. Green points are timeline-backed promotions. Red points are explicit rejections. Blue points are measured but neither promoted nor explicitly rejected in the current summary logic.
Lower is better. This plot is generated directly from scorer-backed artifacts in the repository.
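The point coloring can be sketched as a small lookup over timeline events. This is a minimal illustration that assumes a hypothetical `TimelineEvent` record with `run_id` and `kind` fields. The repository's actual summary logic and schema are not shown here. Letting a promotion take precedence over a rejection for the same run is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical timeline record; not the repository's actual schema."""
    run_id: str
    kind: str  # "promotion" or "rejection" in this sketch


def point_color(run_id: str, events: list[TimelineEvent]) -> str:
    """Map a measured run to the plot color described in the legend."""
    kinds = {event.kind for event in events if event.run_id == run_id}
    if "promotion" in kinds:
        return "green"  # timeline-backed promotion (assumed to win over a rejection)
    if "rejection" in kinds:
        return "red"  # explicit rejection
    return "blue"  # measured, but neither promoted nor explicitly rejected


# Toy event list using run IDs from this page; "robust_current-example-run" is hypothetical.
events = [
    TimelineEvent("robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05", "promotion"),
    TimelineEvent("robust_current-cand-432x324-crf23-g48-b3-r4-cpu-2026-04-05", "rejection"),
]
print(point_color("robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05", events))  # green
print(point_color("robust_current-example-run", events))  # blue
```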
Promotion ladder
- 3.62 — 512x384, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
- 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
- 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 48, bframes 4, ref 4
- 3.54 — 448x336, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 3.33 — 432x324, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 3.25 — 424x318, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 2.20 — 524x394, svtav1-p0, crf 33, keyint 180, film-grain 22, lanczos/bicubic
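One way to turn a ladder entry into an encode command is sketched below. It only builds ffmpeg argument lists and does not run anything. The reading of "lanczos/bicubic" as "lanczos downscale before encoding, bicubic upscale after decoding" is an assumption, and so are the helper names `x265_encode_args` and `svtav1_encode_args` and the input/output file names. The repository's actual pipeline may set other options (encoder preset for x265, pixel format, the unsharp step mentioned in the timeline) that are omitted here.

```python
from __future__ import annotations


def x265_encode_args(src: str, dst: str, width: int, height: int,
                     crf: int, keyint: int, bframes: int, ref: int,
                     down_filter: str = "lanczos") -> list[str]:
    """Build an ffmpeg/libx265 command for one x265 ladder entry (hypothetical helper)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}:flags={down_filter}",  # assumed downscale step
        "-c:v", "libx265", "-crf", str(crf),
        "-x265-params", f"keyint={keyint}:bframes={bframes}:ref={ref}",
        dst,
    ]


def svtav1_encode_args(src: str, dst: str, width: int, height: int,
                       crf: int, keyint: int, film_grain: int, preset: int = 0,
                       down_filter: str = "lanczos") -> list[str]:
    """Build an ffmpeg/libsvtav1 command for the AV1 ladder entry (hypothetical helper)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}:flags={down_filter}",  # assumed downscale step
        "-c:v", "libsvtav1", "-preset", str(preset), "-crf", str(crf),
        "-g", str(keyint),  # keyframe interval
        "-svtav1-params", f"film-grain={film_grain}",  # SVT-AV1 film-grain synthesis level
        dst,
    ]


# The 3.25 x265 entry and the 2.20 AV1 entry; "input.mkv" / "out.mkv" are placeholders.
x265_cmd = x265_encode_args("input.mkv", "out.mkv", 424, 318, crf=23, keyint=48, bframes=4, ref=4)
av1_cmd = svtav1_encode_args("input.mkv", "out.mkv", 524, 394, crf=33, keyint=180, film_grain=22)
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it, provided ffmpeg is built with libx265 or libsvtav1. The matching upscale on the inflate side would use `scale=W:H:flags=bicubic` (or `flags=lanczos` for the lanczos/lanczos rows) with the original resolution, which this section does not state.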
Useful negative results
- 97.45 — robust_current-av1-524x394-cpu-2026-04-05
- 5.73 — robust_current-roi-two-pass-cpu-2026-04-04
- 4.47 — robust_current-dynamic-main-roi-cpu-2026-04-05
- 3.44 — robust_current-464x348-cpu-2026-04-04
- 3.44 — robust_current-cand-416x312-crf23-g48-b4-r4-cpu-2026-04-05
- 3.43 — robust_current-cand-432x324-crf23-g48-b3-r4-cpu-2026-04-05
- 3.38 — robust_current-cand-432x324-crf23-g64-b4-r4-cpu-2026-04-05
- 3.38 — robust_current-cand-426x320-crf23-g48-b4-r4-cpu-2026-04-06
What to open first
If you only have a minute, open the judges' one-pager first. If you need to validate a claim, jump directly to the evidence index or promotion accounting.
Current default reading order:
- Headline metrics
- Score vs bytes
- Promotion ladder
- Useful negative results
- Promotion accounting
Promotion accounting
Rule-faithful values below are local estimates from scorer distortions plus honest bytes. They are not official published scores.
| Run | Scale | Filters | Current workflow score | Current bytes | Rule-faithful score (estimate) | Rule-faithful bytes |
|---|---|---|---|---|---|---|
| robust_current-medium23-cpu-2026-04-03 | 512x384 | lanczos/bicubic | 3.62 | 2,819,374 | 3.618 | 2,822,418 |
| robust_current-448x336-medium23-cpu-2026-04-03 | 448x336 | lanczos/bicubic | 3.56 | 1,978,141 | 3.563 | 1,981,185 |
| robust_current-keyint48-cpu-2026-04-04 | 448x336 | lanczos/bicubic | 3.56 | 1,901,606 | 3.562 | 1,904,650 |
| robust_current-lanczos-lanczos-cpu-2026-04-04 | 448x336 | lanczos/lanczos | 3.54 | 1,901,606 | 3.546 | 1,904,650 |
| robust_current-432x324-cpu-2026-04-04 | 432x324 | lanczos/lanczos | 3.33 | 1,781,129 | 3.330 | 1,787,266 |
| robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | 424x318 | lanczos/lanczos | 3.25 | 1,669,984 | 3.275 | 1,704,163 |
| robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | 524x394 | lanczos/bicubic | 2.20 | 920,457 | 2.228 | 956,424 |
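The accounting table can be checked with a few lines of arithmetic. The sketch below copies the table values into a hypothetical `Row` record type. It computes the byte overhead of each rule-faithful measurement and whether a score column decreases strictly down the ladder. It does not explain where the extra bytes come from.

```python
from __future__ import annotations

from typing import NamedTuple


class Row(NamedTuple):
    """One row of the promotion accounting table (hypothetical record type)."""
    run: str
    current_score: float
    current_bytes: int
    rf_score: float  # rule-faithful score estimate
    rf_bytes: int


ROWS = [
    Row("robust_current-medium23-cpu-2026-04-03", 3.62, 2_819_374, 3.618, 2_822_418),
    Row("robust_current-448x336-medium23-cpu-2026-04-03", 3.56, 1_978_141, 3.563, 1_981_185),
    Row("robust_current-keyint48-cpu-2026-04-04", 3.56, 1_901_606, 3.562, 1_904_650),
    Row("robust_current-lanczos-lanczos-cpu-2026-04-04", 3.54, 1_901_606, 3.546, 1_904_650),
    Row("robust_current-432x324-cpu-2026-04-04", 3.33, 1_781_129, 3.330, 1_787_266),
    Row("robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05", 3.25, 1_669_984, 3.275, 1_704_163),
    Row("robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05", 2.20, 920_457, 2.228, 956_424),
]


def byte_overhead(row: Row) -> int:
    """Extra bytes counted by the rule-faithful measurement versus the current workflow."""
    return row.rf_bytes - row.current_bytes


def strictly_decreasing(scores: list[float]) -> bool:
    """True if every score is lower than the one before it (lower is better)."""
    return all(later < earlier for earlier, later in zip(scores, scores[1:]))


overheads = [byte_overhead(r) for r in ROWS]
# [3044, 3044, 3044, 3044, 6137, 34179, 35967]
rf_ok = strictly_decreasing([r.rf_score for r in ROWS])  # True
current_ok = strictly_decreasing([r.current_score for r in ROWS])  # False: 3.56 appears twice
```

With these values the overhead is a constant 3,044 bytes for the first four rows and grows to 6,137, 34,179, and 35,967 bytes for the last three. The two-decimal workflow scores tie at 3.56 for the keyint48 step, while the three-decimal estimates (3.563 to 3.562) still decrease.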
Working theses
- Strong standard-codec AV1 was already viable here; the real blocker was a rawvideo pixel-format bug in the inflator, which emitted yuv444p bytes where rgb24 was expected.
- BAT00 became useful as a research-only ranking lane, not as a score authority.
- ROI-style multi-stream ideas were informative negative results but too expensive in the tested forms.
- The x265 ladder still matters as evidence: it established a 3.25 local floor before the repaired AV1 jump to 2.20.
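Why the rawvideo bug could slip past a simple size check: at 8 bits per sample, yuv444p (three full-resolution planes) and rgb24 (packed R, G, B per pixel) both use 3 bytes per pixel, so a 524x394 frame is 619,368 bytes either way. The sketch below, using hypothetical helper names, shows how the same buffer yields different pixels depending on which layout the reader assumes. It illustrates the failure mode only and is not the inflator's code. A real yuv444p-to-rgb24 fix would also need a color-space conversion, not just a repacking.

```python
from __future__ import annotations


def frame_size(width: int, height: int, pix_fmt: str) -> int:
    """Bytes per 8-bit frame; both formats carry three samples per pixel."""
    bytes_per_pixel = {"rgb24": 3, "yuv444p": 3}
    return width * height * bytes_per_pixel[pix_fmt]


def pixels_from_yuv444p(buf: bytes, width: int, height: int) -> list[tuple[int, int, int]]:
    """Read a planar yuv444p frame: all Y samples, then all U, then all V."""
    n = width * height
    y, u, v = buf[:n], buf[n:2 * n], buf[2 * n:3 * n]
    return list(zip(y, u, v))


def pixels_if_read_as_rgb24(buf: bytes, width: int, height: int) -> list[tuple[int, int, int]]:
    """Misread the same buffer as packed rgb24: consecutive byte triples per pixel."""
    n = width * height
    return [tuple(buf[3 * i:3 * i + 3]) for i in range(n)]


# Identical frame sizes, so a byte-count check cannot tell the formats apart.
size_rgb = frame_size(524, 394, "rgb24")  # 619368
size_yuv = frame_size(524, 394, "yuv444p")  # 619368

buf = bytes([10, 20, 30, 40, 50, 60])  # 2x1 frame: Y=(10, 20), U=(30, 40), V=(50, 60)
yuv_pixels = pixels_from_yuv444p(buf, 2, 1)  # [(10, 30, 50), (20, 40, 60)] as (Y, U, V)
rgb_misread = pixels_if_read_as_rgb24(buf, 2, 1)  # [(10, 20, 30), (40, 50, 60)], channels scrambled
```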
Key turning points
- First honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
- First big win — robust_current-medium23-cpu-2026-04-03 (3.62)
- ROI failure — robust_current-dynamic-main-roi-cpu-2026-04-05 (4.47)
- Current best floor — robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 (2.20)
Timeline
| UTC | Type | Summary |
|---|---|---|
| 2026-04-05T06:35:49.299885+00:00 | verification | A fresh local CPU regression run confirmed that the restored promoted floor still scores 3.33. |
| 2026-04-05T06:35:49.299885+00:00 | research | BAT00 surrogate v2 remained noisy on the full mixed set, but the codec-only subset improved enough to use as a research-only ranking aid. |
| 2026-04-05T07:42:28.847777+00:00 | research | The BAT00 codec-only surrogate ranked a small codec shortlist; its top two candidates were then tested sequentially on local CPU. |
| 2026-04-05T07:42:28.847777+00:00 | decision | Local CPU test of the surrogate-ranked #1 candidate (432x324 / crf23 / g48 / b3 / r4) scored 3.43 and was rejected. |
| 2026-04-05T07:42:28.847777+00:00 | decision | Local CPU test of the surrogate-ranked #2 candidate (432x324 / crf23 / g64 / b4 / r4) scored 3.38 and was rejected. |
| 2026-04-05T08:13:15.198319+00:00 | promotion | Local CPU test of the surrogate-ranked #3 candidate (424x318 / crf23 / g48 / b4 / r4) scored 3.25 and became the new promoted floor. |
| 2026-04-05T08:48:02.864256+00:00 | decision | Nearby follow-up on the new 424x318 floor (bframes3) scored 3.27 and was rejected. |
| 2026-04-05T09:47:41.998084+00:00 | decision | Nearby follow-up on the 424x318 floor (keyint64) scored 3.26 and was rejected. |
| 2026-04-05T14:29:14.555005+00:00 | decision | Lower-resolution BAT00-ranked follow-up at 416x312 scored 3.44 and was rejected. |
| 2026-04-05T15:10:07.425100+00:00 | decision | BAT00-ranked nearby-scale follow-up at 428x320 scored 3.32 and was rejected. |
| 2026-04-05T16:07:20.968299+00:00 | decision | BAT00-ranked nearby-scale follow-up at 426x320 scored 3.38 and was rejected. |
| 2026-04-05T17:12:42.981688+00:00 | research | Public-style AV1/SVT-AV1 reproduction attempt at 524x394 regressed catastrophically to 97.45 and was rejected. |
| 2026-04-05T18:20:08.762518+00:00 | analysis | The AV1 failure's root cause was a rawvideo pixel-format mismatch: inflate emitted yuv444p bytes instead of rgb24. Forcing rgb24 output recovered the same 524x394 SVT-AV1 recipe from 97.45 to 2.20. |
| 2026-04-05T18:20:08.762518+00:00 | promotion | Canonical robust_current AV1 floor at 524x394 / svtav1-p0 / crf33 / film-grain22 / bicubic / unsharp scored 2.20 and became the new promoted floor. |
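The decisions in the timeline are consistent with a simple rule: promote a candidate only if its local score is strictly lower than the current floor. The replay below assumes that rule, with no tolerance margin, starting from the 3.33 floor confirmed at 06:35. `replay_promotions` is a hypothetical helper. The real promotion process may involve checks not captured here, and the final AV1 jump from 97.45 to 2.20 came from a bug fix rather than a new recipe.

```python
from __future__ import annotations


def replay_promotions(start_floor: float, scores: list[float]) -> tuple[float, list[str]]:
    """Replay candidates in order against a running floor (assumed strict lower-is-better rule)."""
    floor = start_floor
    decisions = []
    for score in scores:
        if score < floor:
            floor = score  # candidate becomes the new promoted floor
            decisions.append("promoted")
        else:
            decisions.append("rejected")  # ties and regressions are rejected
    return floor, decisions


# Candidate scores in timeline order after the 3.33 floor was re-confirmed on 2026-04-05.
TIMELINE_SCORES = [3.43, 3.38, 3.25, 3.27, 3.26, 3.44, 3.32, 3.38, 97.45, 2.20]

final_floor, decisions = replay_promotions(3.33, TIMELINE_SCORES)
# decisions: rejected x2, promoted (3.25), rejected x6, promoted (2.20); final_floor == 2.20
```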