Score vs bytes
Track B runs only. Green points are timeline-backed promotions. Red points are explicit rejections. Blue points are measured but neither promoted nor explicitly rejected in the current summary logic.
Lower is better. This plot is generated directly from scorer-backed artifacts in the repository.
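The color rule above can be sketched as a small classifier. This is a minimal illustration, not the repository's summary logic: the `Run` record and its `promoted` / `rejected` flags are hypothetical names, and the third run is a made-up placeholder. Only the first two run names and scores come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    score: float
    promoted: bool = False  # backed by a timeline promotion entry
    rejected: bool = False  # explicitly rejected in the timeline


def point_color(run: Run) -> str:
    """Map a run to its plot color using the rule stated above."""
    if run.promoted:
        return "green"
    if run.rejected:
        return "red"
    # Measured, but neither promoted nor explicitly rejected.
    return "blue"


runs = [
    Run("robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06", 2.19, promoted=True),
    Run("robust_current-av1-524x394-cpu-2026-04-05", 97.45, rejected=True),
    Run("placeholder-measured-only-run", 3.50),  # hypothetical, not a real run
]
colors = [point_color(r) for r in runs]
```

Checking `promoted` before `rejected` is a design choice of this sketch; the page does not say how a run carrying both flags would be drawn.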
Experimental lineage
A branch view of the lab’s real search path: honest x265 improvements, ROI dead ends, the AV1 byte-layout bug, and the repaired AV1 branch that became the new frontier.
This graph is intentionally selective. It highlights the decisions that changed beliefs, not every single run in the ledger.
Why these results move
Teacher-facing pipeline
The evaluator does not “watch” the clip like a person. It resizes the frames and judges task behavior, which makes bitrate placement more important than visual polish.
PoseNet sees both frames; SegNet sees only the last frame in each pair. That asymmetry is why tiny encoding and pipeline bugs can move the score more than they would in a human-facing benchmark.
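The asymmetry can be illustrated with a toy routing function. It assumes frames are grouped into non-overlapping consecutive pairs, which this page does not state. `route_pairs` and the dictionary keys are hypothetical names, not the evaluator's API.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def route_pairs(frames: Sequence[T]) -> list[dict]:
    """Group frames into pairs and show what each network would receive.

    Assumption: pairs are non-overlapping (0,1), (2,3), ...; a trailing
    unpaired frame is ignored in this sketch.
    """
    routed = []
    for i in range(0, len(frames) - 1, 2):
        first, last = frames[i], frames[i + 1]
        routed.append({
            "posenet": (first, last),  # PoseNet sees both frames
            "segnet": last,            # SegNet sees only the last frame
        })
    return routed


routed = route_pairs(["f0", "f1", "f2", "f3"])
segnet_inputs = [r["segnet"] for r in routed]
```

Under that pairing assumption, damage confined to the first frame of a pair would reach only PoseNet, while damage in the last frame would reach both networks.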
AV1 mechanism
The current honest floor is 2.19 at 864,455 bytes.
Against the first honest baseline (4.06 at 3,735,828 bytes), the lab kept reducing bytes while preserving enough task signal to keep the score moving down.
The bug that changed everything
The codec recipe was not the real blocker. The raw byte layout was.
The failed path emitted rawvideo as yuv444p bytes. The repaired path forces rgb24 bytes, which is what the evaluator expects when it reads the raw frame buffer.
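A small sketch of why this kind of mismatch can pass silently: 8-bit yuv444p and rgb24 both occupy 3 bytes per pixel, so a reader expecting rgb24 accepts a yuv444p buffer of the right size and simply misinterprets it. The helper names are hypothetical, and `rawvideo_rgb24_args` is only one plausible way to pin the format with the ffmpeg CLI, not necessarily how the lab's inflate step is written.

```python
def frame_size_bytes(width: int, height: int) -> int:
    """8-bit yuv444p and rgb24 both use 3 bytes per pixel."""
    return width * height * 3


def pack_yuv444p(y: list[int], u: list[int], v: list[int]) -> bytes:
    """Planar layout: the full Y plane, then the U plane, then the V plane."""
    return bytes(y) + bytes(u) + bytes(v)


def read_as_rgb24(buf: bytes, width: int, height: int) -> list[tuple[int, ...]]:
    """Interpret a buffer as packed rgb24: R, G, B interleaved per pixel."""
    if len(buf) != frame_size_bytes(width, height):
        raise ValueError("buffer size does not match frame dimensions")
    return [tuple(buf[i:i + 3]) for i in range(0, len(buf), 3)]


def rawvideo_rgb24_args(src: str, dst: str) -> list[str]:
    """Hypothetical ffmpeg argv that pins the raw output to rgb24."""
    return ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", "rgb24", dst]


# A 2x1 frame written as yuv444p, then misread as rgb24.
buf = pack_yuv444p(y=[10, 20], u=[128, 128], v=[128, 128])
pixels = read_as_rgb24(buf, width=2, height=1)
# The size check passes, but each "pixel" mixes samples from different planes.
```

Because the byte counts agree, nothing fails at read time; the error only shows up downstream as a badly wrong score.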
Promotion ladder
- 3.62 — 512x384, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
- 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
- 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 48, bframes 4, ref 4
- 3.54 — 448x336, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 3.33 — 432x324, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 3.25 — 424x318, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
- 2.20 — 524x394, svtav1-p0, crf 33, keyint 180, film-grain 22, lanczos/bicubic
- 2.19 — 524x394, svtav1-p0, crf 34, keyint 180, film-grain 22, lanczos/bicubic
Useful negative results
- 97.45 — robust_current-av1-524x394-cpu-2026-04-05
- 5.73 — robust_current-roi-two-pass-cpu-2026-04-04
- 4.47 — robust_current-dynamic-main-roi-cpu-2026-04-05
- 3.44 — robust_current-464x348-cpu-2026-04-04
- 3.44 — robust_current-cand-416x312-crf23-g48-b4-r4-cpu-2026-04-05
- 3.43 — robust_current-cand-432x324-crf23-g48-b3-r4-cpu-2026-04-05
- 3.38 — robust_current-cand-432x324-crf23-g64-b4-r4-cpu-2026-04-05
- 3.38 — robust_current-cand-426x320-crf23-g48-b4-r4-cpu-2026-04-06
What to open first
If you only have a minute, open the judges' one-pager first. If you need to validate a claim, jump directly to the evidence index or promotion accounting.
Current default reading order:
- Headline metrics
- Score vs bytes
- Experimental lineage
- Why these results move
- Promotion ladder
- Promotion review
- Experiment journal
Promotion accounting
Rule-faithful values below are local estimates from scorer distortions plus honest bytes. They are not official published scores.
| Run | Scale | Filters | Current workflow score | Current bytes | Rule-faithful estimate | Rule-faithful bytes |
|---|---|---|---|---|---|---|
| robust_current-medium23-cpu-2026-04-03 | 512x384 | lanczos/bicubic | 3.62 | 2,819,374 | 3.618 | 2,822,418 |
| robust_current-448x336-medium23-cpu-2026-04-03 | 448x336 | lanczos/bicubic | 3.56 | 1,978,141 | 3.563 | 1,981,185 |
| robust_current-keyint48-cpu-2026-04-04 | 448x336 | lanczos/bicubic | 3.56 | 1,901,606 | 3.562 | 1,904,650 |
| robust_current-lanczos-lanczos-cpu-2026-04-04 | 448x336 | lanczos/lanczos | 3.54 | 1,901,606 | 3.546 | 1,904,650 |
| robust_current-432x324-cpu-2026-04-04 | 432x324 | lanczos/lanczos | 3.33 | 1,781,129 | 3.330 | 1,787,266 |
| robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | 424x318 | lanczos/lanczos | 3.25 | 1,669,984 | 3.275 | 1,704,163 |
| robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | 524x394 | lanczos/bicubic | 2.20 | 920,457 | 2.228 | 956,424 |
| robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | 524x394 | lanczos/bicubic | 2.19 | 864,455 | 2.215 | 900,954 |
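The two byte columns differ by a run-dependent amount. The sketch below only subtracts the table's own numbers to make that gap visible. It does not reproduce how the rule-faithful values are computed, and the short row labels are shorthand chosen here.

```python
# (label, current bytes, rule-faithful bytes), copied from the table above.
ROWS = [
    ("512x384 lanczos/bicubic", 2_819_374, 2_822_418),
    ("448x336 lanczos/bicubic", 1_978_141, 1_981_185),
    ("448x336 keyint48", 1_901_606, 1_904_650),
    ("448x336 lanczos/lanczos", 1_901_606, 1_904_650),
    ("432x324", 1_781_129, 1_787_266),
    ("424x318", 1_669_984, 1_704_163),
    ("AV1 crf33", 920_457, 956_424),
    ("AV1 crf34", 864_455, 900_954),
]


def byte_gap(current: int, rule_faithful: int) -> int:
    """Extra bytes counted under the rule-faithful accounting."""
    return rule_faithful - current


gaps = {label: byte_gap(cur, rf) for label, cur, rf in ROWS}
for label, gap in gaps.items():
    print(f"{label:26s} +{gap:,} bytes")
```

The gap is 3,044 bytes for the first four rows and grows to 36,499 bytes for the CRF34 AV1 row, which is consistent with the rule-faithful estimates sitting slightly above the current-workflow scores for the later promotions.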
Working theses
- Strong standard-codec AV1 was already viable here; the real blocker was a rawvideo byte-format bug in the inflator.
- BAT00 became useful as a research-only ranking lane, not as a score authority.
- ROI-style multi-stream ideas were informative negative results but too expensive in the tested forms.
- The x265 ladder still matters as evidence: it established a 3.25 local floor before the repaired AV1 branch advanced through 2.20 to 2.19.
Key turning points
- First honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
- First big win — robust_current-medium23-cpu-2026-04-03 (3.62)
- ROI failure — robust_current-dynamic-main-roi-cpu-2026-04-05 (4.47)
- Current best floor — robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 (2.19)
Timeline
| UTC | Type | Summary |
|---|---|---|
| 2026-04-05T08:48:02.864256+00:00 | decision | Nearby follow-up on the new 424x318 floor (bframes3) scored 3.27 and was rejected. |
| 2026-04-05T09:47:41.998084+00:00 | decision | Nearby follow-up on the 424x318 floor (keyint64) scored 3.26 and was rejected. |
| 2026-04-05T14:29:14.555005+00:00 | decision | Lower-resolution BAT00-ranked follow-up at 416x312 scored 3.44 and was rejected. |
| 2026-04-05T15:10:07.425100+00:00 | decision | BAT00-ranked nearby-scale follow-up at 428x320 scored 3.32 and was rejected. |
| 2026-04-05T16:07:20.968299+00:00 | decision | BAT00-ranked nearby-scale follow-up at 426x320 scored 3.38 and was rejected. |
| 2026-04-05T17:12:42.981688+00:00 | research | Public-style AV1/SVT-AV1 reproduction attempt at 524x394 regressed catastrophically to 97.45 and was rejected. |
| 2026-04-05T18:20:08.762518+00:00 | analysis | AV1 failure root cause was a rawvideo pixel-format mismatch: inflate emitted yuv444p bytes instead of rgb24. Forcing rgb24 recovered the same 524x394 SVT-AV1 recipe from 97.45 to 2.20. |
| 2026-04-05T18:20:08.762518+00:00 | promotion | Canonical robust_current AV1 floor at 524x394 / svtav1-p0 / crf33 / film-grain22 / bicubic / unsharp scored 2.20 and became the new promoted floor. |
| 2026-04-06T04:41:42.160831+00:00 | analysis | CRF34 AV1 neighborhood probe cut bytes by 56,002 (6.08%) versus the 2.20 CRF33 AV1 floor, while pose and seg distortion rose slightly; net score improved by 0.01 to 2.19. |
| 2026-04-06T04:41:42.160831+00:00 | promotion | Canonical robust_current AV1 floor at 524x394 / svtav1-p0 / crf34 / film-grain22 / bicubic / unsharp scored 2.19 and replaced the prior 2.20 CRF33 floor. |
| 2026-04-06T05:26:53.551355+00:00 | analysis | CRF35 was projected to land around 2.17-2.20 if byte savings still dominated, but it regressed to 2.21. Bytes fell by 56,419 (6.53%), yet pose and seg distortion rose too far. |
| 2026-04-06T05:26:53.551355+00:00 | decision | One-axis AV1 probe at crf35 scored 2.21 and was rejected; the 2.19 crf34 floor remains canonical. |
| 2026-04-06T06:13:03.889813+00:00 | analysis | Unsharp 0.30 was projected to land around 2.18-2.20 by reducing postfilter aggression, but it stayed at 864,455 bytes and regressed to 2.20. Pose and seg distortion both worsened slightly. |
| 2026-04-06T06:13:03.889813+00:00 | decision | One-axis AV1 probe with lighter unsharp (0.30) scored 2.20 and was rejected; the 2.19 unsharp 0.35 floor remains canonical. |
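The byte-savings figures quoted in the timeline can be rechecked with a few lines of arithmetic. The function names are illustrative. For the CRF35 entry, this sketch assumes the percentage is taken relative to the 864,455-byte CRF34 floor, since the timeline does not name the base explicitly.

```python
def byte_reduction(before: int, after: int) -> tuple[int, float]:
    """Return (bytes saved, percent saved rounded to two decimals)."""
    saved = before - after
    return saved, round(100 * saved / before, 2)


def percent_of(saved: int, base: int) -> float:
    """Express a byte saving as a percentage of a base size."""
    return round(100 * saved / base, 2)


CRF33_BYTES = 920_457  # 2.20 floor
CRF34_BYTES = 864_455  # 2.19 floor

crf34_cut = byte_reduction(CRF33_BYTES, CRF34_BYTES)
# Assumption: the CRF35 saving of 56,419 bytes is measured against CRF34.
crf35_pct = percent_of(56_419, CRF34_BYTES)

print(crf34_cut, crf35_pct)
```

Both results match the timeline (56,002 bytes and 6.08% for CRF34, 6.53% for CRF35), which supports reading the CRF35 figure as relative to the CRF34 floor.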