comma-lab

Compression frontier.
Evidence intact.

A scorer-backed compression lab for the comma video challenge. The site tracks the current honest floor, preserves informative failures, and packages the strongest writeup assets without blurring authority boundaries.

Best Track B current_workflow: 2.19
Best bytes: 864,455
Promotions: 8
Rejections: 14
Best Track B current_workflow score: 2.19. Authoritative local CPU scorer-backed floor.
Best Track B current_workflow bytes: 864,455. Published-path byte burden for the promoted floor.
Latest measured Track B score: 3.33. Latest run can differ from the promoted floor.
Track A current_workflow: 0.00. Exploit lane, kept separate from honest promotion logic.

Score vs bytes

Track B runs only. Green points are timeline-backed promotions. Red points are explicit rejections. Blue points are measured but neither promoted nor explicitly rejected in the current summary logic.

robust_current-baseline-cpu-2026-04-03 | score=4.06 | bytes=3735828
robust_current-medium21-cpu-2026-04-03 | score=4.74 | bytes=5005390
robust_current-medium23-cpu-2026-04-03 | score=3.62 | bytes=2819374
robust_current-slow22-cpu-2026-04-03 | score=4.13 | bytes=3812776
robust_current-448x336-medium23-cpu-2026-04-03 | score=3.56 | bytes=1978141
robust_current-576x432-medium23-cpu-2026-04-03 | score=4.26 | bytes=3868552
robust_current-keyint24-cpu-2026-04-04 | score=3.64 | bytes=2018006
robust_current-keyint48-cpu-2026-04-04 | score=3.56 | bytes=1901606
robust_current-keyint64-cpu-2026-04-04 | score=3.61 | bytes=1862992
robust_current-bicubic-bicubic-cpu-2026-04-04 | score=3.67 | bytes=1829217
robust_current-lanczos-lanczos-cpu-2026-04-04 | score=3.54 | bytes=1901606
robust_current-bframes3-ref4-cpu-2026-04-04 | score=3.57 | bytes=2021782
robust_current-bframes5-ref4-cpu-2026-04-04 | score=3.71 | bytes=1897819
robust_current-bframes4-ref5-cpu-2026-04-04 | score=3.55 | bytes=1894366
robust_current-roi-two-pass-cpu-2026-04-04 | score=5.73 | bytes=1472589
robust_current-432x324-cpu-2026-04-04 | score=3.33 | bytes=1781129
robust_current-464x348-cpu-2026-04-04 | score=3.44 | bytes=2139211
robust_current-dynamic-main-roi-cpu-2026-04-05 | score=4.47 | bytes=2660388
robust_current-cand-432x324-crf23-g48-b3-r4-cpu-2026-04-05 | score=3.43 | bytes=1898751
robust_current-cand-432x324-crf23-g64-b4-r4-cpu-2026-04-05 | score=3.38 | bytes=1753611
robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | score=3.25 | bytes=1669984
robust_current-cand-424x318-crf23-g48-b3-r4-cpu-2026-04-05 | score=3.27 | bytes=1776026
robust_current-cand-424x318-crf23-g64-b4-r4-cpu-2026-04-06 | score=3.26 | bytes=1646886
robust_current-cand-416x312-crf23-g48-b4-r4-cpu-2026-04-05 | score=3.44 | bytes=1573803
robust_current-cand-428x320-crf23-g48-b4-r4-cpu-2026-04-05 | score=3.32 | bytes=1741924
robust_current-cand-426x320-crf23-g48-b4-r4-cpu-2026-04-06 | score=3.38 | bytes=1766456
robust_current-av1-524x394-cpu-2026-04-05 | score=97.45 | bytes=920457
robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | score=2.20 | bytes=920457
robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | score=2.19 | bytes=864455
robust_current-av1-524x394-crf35-rejected-cpu-2026-04-06 | score=2.21 | bytes=808036
robust_current-av1-524x394-unsharp030-rejected-cpu-2026-04-06 | score=2.20 | bytes=864455
robust_current-av1-524x394-filmgrain0-rejected-cpu-2026-04-06 | score=3.33 | bytes=719096

Lower is better. This plot is generated directly from scorer-backed artifacts in the repository.
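The plot points are published as records of the form `name | score=… | bytes=…`. As a minimal sketch (the `Run` type and `parse_run` helper are hypothetical names, not part of the lab's tooling), such records could be parsed into structured rows and the lowest-scoring run picked out like this:

```python
import re
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    score: float
    size_bytes: int


# One record per line: "<run name> | score=<float> | bytes=<int, commas optional>"
RECORD = re.compile(
    r"^(?P<name>\S+) \| score=(?P<score>[0-9.]+) \| bytes=(?P<bytes>[0-9,]+)$"
)


def parse_run(line: str) -> Run:
    """Parse one tooltip-style record into a Run; raise on malformed input."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised record: {line!r}")
    size = int(match["bytes"].replace(",", ""))
    return Run(match["name"], float(match["score"]), size)


# Three records copied from the list above.
sample = [
    "robust_current-baseline-cpu-2026-04-03 | score=4.06 | bytes=3735828",
    "robust_current-av1-524x394-cpu-2026-04-05 | score=97.45 | bytes=920457",
    "robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | score=2.19 | bytes=864455",
]
runs = [parse_run(line) for line in sample]
best = min(runs, key=lambda run: run.score)  # lower is better
print(best.name, best.score, best.size_bytes)
```

Accepting optional thousands separators lets the same parser read the comma-formatted byte counts used elsewhere on the page.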

Experimental lineage

A branch view of the lab’s real search path: honest x265 improvements, ROI dead ends, the AV1 byte-layout bug, and the repaired AV1 branch that became the new frontier.

Legend: x265 / earlier promotions · current best floor · diagnostic bug node · rejection / failed branch

  • baseline: robust_current-baseline-cpu-2026-04-03 | score=4.06 | bytes=3,735,828 | 512x384, lanczos/bicubic, crf 22, keyint 32, bframes 4, ref 4
  • first big win: robust_current-medium23-cpu-2026-04-03 | score=3.62 | bytes=2,819,374 | 512x384, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
  • 448x336: robust_current-448x336-medium23-cpu-2026-04-03 | score=3.56 | bytes=1,978,141 | 448x336, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
  • lanczos/lanczos: robust_current-lanczos-lanczos-cpu-2026-04-04 | score=3.54 | bytes=1,901,606 | 448x336, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • tiny-resolution win: robust_current-432x324-cpu-2026-04-04 | score=3.33 | bytes=1,781,129 | 432x324, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • x265 3.25 floor: robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | score=3.25 | bytes=1,669,984 | 424x318, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • dynamic ROI failed: robust_current-dynamic-main-roi-cpu-2026-04-05 | score=4.47 | bytes=2,660,388 | 432x324, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • nearby scale reject: robust_current-cand-428x320-crf23-g48-b4-r4-cpu-2026-04-05 | score=3.32 | bytes=1,741,924 | 428x320, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • AV1 bug: robust_current-av1-524x394-cpu-2026-04-05 | score=97.45 | bytes=920,457 | 524x394, svtav1-p0, crf 33, keyint 180, film-grain 22, lanczos/bicubic
  • current best: robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | score=2.19 | bytes=864,455 | 524x394, svtav1-p0, crf 34, keyint 180, film-grain 22, lanczos/bicubic

This graph is intentionally selective. It highlights the decisions that changed beliefs, not every single run in the ledger.

Why these results move

Teacher-facing pipeline

The evaluator does not “watch” the clip like a person. It resizes each sampled frame pair to 512×384 and scores how the task networks (PoseNet and SegNet) behave on it. That makes bitrate placement more important than visual polish.

PoseNet sees both frames; SegNet only sees the last frame in each pair. That asymmetry is why tiny encoding and pipeline bugs can move the score more than they would in a human-facing benchmark.

2-frame sample → resize 512×384 → PoseNet: both frames; SegNet: last frame
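The asymmetry can be made concrete with a small routing sketch. It models only which frames each network receives, as described above; `route_pair` and `affected_networks` are hypothetical helper names, and no real model is loaded or run.

```python
from typing import Dict, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def route_pair(pair: Tuple[Frame, Frame]) -> Dict[str, Tuple[Frame, ...]]:
    """Return the frames each task network consumes from one 2-frame sample."""
    first, last = pair
    return {
        "posenet": (first, last),  # PoseNet sees both frames
        "segnet": (last,),         # SegNet sees only the last frame
    }


def affected_networks(damaged_index: int) -> List[str]:
    """List the networks whose input includes the damaged frame (0 or 1)."""
    routed = route_pair((0, 1))  # use frame indices as stand-in frames
    return sorted(name for name, frames in routed.items() if damaged_index in frames)


print(affected_networks(0))  # damage confined to the first frame of a pair
print(affected_networks(1))  # damage in the last frame of a pair
```

Under this model, an artifact in the first frame of a pair reaches only PoseNet, while an artifact in the last frame reaches both networks.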

AV1 mechanism

The current honest floor is 2.19 at 864,455 bytes.

Against the first honest baseline (4.06 at 3,735,828 bytes), the lab kept reducing bytes while preserving enough task signal to keep score moving down.

source HEVC → 45% lanczos → SVT-AV1 (film-grain 22) → bicubic + unsharp → rgb24 rawvideo
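The lab's actual command lines are not shown on this page. As a hedged sketch, the stages above could map onto two ffmpeg invocations built as argument lists. The file paths, the function names, the output resolution, and the reading of "svtav1-p0" as SVT-AV1 preset 0 are all assumptions for illustration.

```python
from typing import List


def encode_args(src: str, dst: str, width: int = 524, height: int = 394,
                crf: int = 34, keyint: int = 180, film_grain: int = 22,
                preset: int = 0) -> List[str]:
    """Build a downscale + SVT-AV1 encode command (assumed mapping of the recipe)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale={width}:{height}:flags=lanczos",
        "-c:v", "libsvtav1", "-preset", str(preset), "-crf", str(crf),
        "-g", str(keyint),
        "-svtav1-params", f"film-grain={film_grain}",
        "-an", dst,
    ]


def inflate_args(src: str, dst: str, out_width: int, out_height: int,
                 unsharp: str = "9:9:0.35:9:9:0.0") -> List[str]:
    """Build a decode + bicubic upscale + unsharp command that writes rgb24 rawvideo.

    out_width/out_height are caller-supplied: the page does not state the
    resolution the evaluator expects.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale={out_width}:{out_height}:flags=bicubic,unsharp={unsharp}",
        "-pix_fmt", "rgb24",  # force packed RGB bytes rather than the decoder's native format
        "-f", "rawvideo", dst,
    ]
```

Building the commands as lists keeps every knob from the recipe (scale, CRF, keyint, film grain, unsharp strength) as a named parameter, which suits one-axis probes.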

The bug that changed everything

The codec recipe was not the real blocker. The raw byte layout was.

The failed path emitted rawvideo as yuv444p bytes. The repaired path forces rgb24 bytes, which is what the evaluator expects when it reads the raw frame buffer.

Broken path: rawvideo bytes looked like yuv444p
same codec recipe, measured 97.45
Fixed path: rawvideo bytes forced to rgb24
same codec recipe, measured 2.20
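A mismatch like this is easy to miss because an 8-bit yuv444p frame and an rgb24 frame are both exactly width × height × 3 bytes. Only the ordering differs: three full planes versus interleaved triplets. The sketch below illustrates the layout half of the problem on a toy 4-pixel frame. The helper names are hypothetical, and the sketch does not model the additional error from YUV values being read as RGB values.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def pack_interleaved(pixels: List[Pixel]) -> bytes:
    """rgb24-style layout: the three components of pixel 0, then pixel 1, ..."""
    return bytes(component for pixel in pixels for component in pixel)


def pack_planar(pixels: List[Pixel]) -> bytes:
    """yuv444p-style layout: a full plane of component 0, then 1, then 2."""
    return bytes(pixel[channel] for channel in range(3) for pixel in pixels)


def read_interleaved(buf: bytes) -> List[Pixel]:
    """Interpret a buffer as packed 3-byte pixels, as an rgb24 reader would."""
    return [(buf[i], buf[i + 1], buf[i + 2]) for i in range(0, len(buf), 3)]


# Toy frame: a luma ramp with neutral chroma, written as (Y, U, V) triples.
pixels = [(10, 128, 128), (20, 128, 128), (30, 128, 128), (40, 128, 128)]

planar = pack_planar(pixels)
misread = read_interleaved(planar)

print(len(planar) == len(pack_interleaved(pixels)))  # same size, so no length error
print(misread)  # pixel contents are scrambled across planes
```

Because the buffer sizes match, a reader expecting rgb24 raises no error. It simply sees scrambled pixels, which is consistent with a score collapse rather than a crash.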

Local frontier microscope

This is the part of the search space that matters most right now. The AV1 floor improved once at CRF34, then regressed both when compression was pushed harder (CRF35) and when the reconstruction sharpening was softened (unsharp 0.30). That shape is what a lab wants to see: not just a best point, but a believable knee.

Run | Changed axis | Score | Bytes | Δ score (vs previous row) | Δ bytes (vs previous row) | Verdict
robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | postfilter unsharp=9:9:0.35:9:9:0.0 | 2.20 | 920,457 | n/a | n/a | promoted
robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | postfilter unsharp=9:9:0.35:9:9:0.0 | 2.19 | 864,455 | -0.01 | -56,002 | promoted
robust_current-av1-524x394-crf35-rejected-cpu-2026-04-06 | postfilter unsharp=9:9:0.35:9:9:0.0 | 2.21 | 808,036 | +0.02 | -56,419 | rejected
robust_current-av1-524x394-unsharp030-rejected-cpu-2026-04-06 | postfilter unsharp=9:9:0.30:9:9:0.0 | 2.20 | 864,455 | -0.01 | +56,419 | rejected
robust_current-av1-524x394-filmgrain0-rejected-cpu-2026-04-06 | film-grain 0 | 3.33 | 719,096 | +1.13 | -145,359 | rejected
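The Δ columns in the table above match differences against the preceding row, not against the promoted 2.19 floor. A quick check, using the table's own scores and byte counts with shortened run labels (the helper names are hypothetical), reproduces them and also computes the deltas against the CRF34 floor:

```python
from typing import List, Tuple

Row = Tuple[str, float, int]  # (short label, score, bytes)

rows: List[Row] = [
    ("rgb24-promoted", 2.20, 920_457),
    ("crf34-promoted", 2.19, 864_455),
    ("crf35-rejected", 2.21, 808_036),
    ("unsharp030-rejected", 2.20, 864_455),
    ("filmgrain0-rejected", 3.33, 719_096),
]


def deltas_vs_previous(table: List[Row]) -> List[Tuple[str, float, int]]:
    """Score and byte differences between each row and the row before it."""
    return [
        (cur[0], round(cur[1] - prev[1], 2), cur[2] - prev[2])
        for prev, cur in zip(table, table[1:])
    ]


def deltas_vs(reference: Row, table: List[Row]) -> List[Tuple[str, float, int]]:
    """Score and byte differences between each row and a fixed reference row."""
    return [
        (name, round(score - reference[1], 2), size - reference[2])
        for name, score, size in table
    ]


floor = rows[1]                       # the promoted CRF34 floor
vs_prev = deltas_vs_previous(rows)    # reproduces the table's delta columns
vs_floor = deltas_vs(floor, rows[2:]) # the three rejected probes against the floor

for entry in vs_prev:
    print("vs previous:", entry)
for entry in vs_floor:
    print("vs floor:   ", entry)
```

Measured against the floor, the unsharp 0.30 probe is +0.01 in score at identical bytes, which is the comparison the rejection decision rests on.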

Promotion ladder

  • 3.62 — 512x384, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
  • 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 32, bframes 4, ref 4
  • 3.56 — 448x336, lanczos/bicubic, crf 23, keyint 48, bframes 4, ref 4
  • 3.54 — 448x336, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • 3.33 — 432x324, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • 3.25 — 424x318, lanczos/lanczos, crf 23, keyint 48, bframes 4, ref 4
  • 2.20 — 524x394, svtav1-p0, crf 33, keyint 180, film-grain 22, lanczos/bicubic
  • 2.19 — 524x394, svtav1-p0, crf 34, keyint 180, film-grain 22, lanczos/bicubic

Useful negative results

  • 97.45 — robust_current-av1-524x394-cpu-2026-04-05
  • 5.73 — robust_current-roi-two-pass-cpu-2026-04-04
  • 4.47 — robust_current-dynamic-main-roi-cpu-2026-04-05
  • 3.44 — robust_current-464x348-cpu-2026-04-04
  • 3.44 — robust_current-cand-416x312-crf23-g48-b4-r4-cpu-2026-04-05
  • 3.43 — robust_current-cand-432x324-crf23-g48-b3-r4-cpu-2026-04-05
  • 3.38 — robust_current-cand-432x324-crf23-g64-b4-r4-cpu-2026-04-05
  • 3.38 — robust_current-cand-426x320-crf23-g48-b4-r4-cpu-2026-04-06

What to open first

If you only have a minute, open the judges one-pager first. If you need to validate a claim, jump directly to the evidence index or promotion accounting.

Current default reading order:

  1. Headline metrics
  2. Score vs bytes
  3. Experimental lineage
  4. Why these results move
  5. Promotion ladder
  6. Promotion review
  7. Experiment journal

Promotion accounting

Rule-faithful values below are local estimates from scorer distortions plus honest bytes. They are not official published scores.

Run | Scale | Filters | Current workflow | Current bytes | Rule-faithful estimate | Rule-faithful bytes
robust_current-medium23-cpu-2026-04-03 | 512x384 | lanczos/bicubic | 3.62 | 2,819,374 | 3.618 | 2,822,418
robust_current-448x336-medium23-cpu-2026-04-03 | 448x336 | lanczos/bicubic | 3.56 | 1,978,141 | 3.563 | 1,981,185
robust_current-keyint48-cpu-2026-04-04 | 448x336 | lanczos/bicubic | 3.56 | 1,901,606 | 3.562 | 1,904,650
robust_current-lanczos-lanczos-cpu-2026-04-04 | 448x336 | lanczos/lanczos | 3.54 | 1,901,606 | 3.546 | 1,904,650
robust_current-432x324-cpu-2026-04-04 | 432x324 | lanczos/lanczos | 3.33 | 1,781,129 | 3.330 | 1,787,266
robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | 424x318 | lanczos/lanczos | 3.25 | 1,669,984 | 3.275 | 1,704,163
robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | 524x394 | lanczos/bicubic | 2.20 | 920,457 | 2.228 | 956,424
robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 | 524x394 | lanczos/bicubic | 2.19 | 864,455 | 2.215 | 900,954
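The page does not say what the extra rule-faithful bytes cover, but the size of the gap can be read straight off the table. The sketch below subtracts the two byte columns for each promoted run. The values are copied from the table, and `byte_overhead` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# (run, current-workflow bytes, rule-faithful bytes), copied from the table above
accounting: List[Tuple[str, int, int]] = [
    ("robust_current-medium23-cpu-2026-04-03", 2_819_374, 2_822_418),
    ("robust_current-448x336-medium23-cpu-2026-04-03", 1_978_141, 1_981_185),
    ("robust_current-keyint48-cpu-2026-04-04", 1_901_606, 1_904_650),
    ("robust_current-lanczos-lanczos-cpu-2026-04-04", 1_901_606, 1_904_650),
    ("robust_current-432x324-cpu-2026-04-04", 1_781_129, 1_787_266),
    ("robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05", 1_669_984, 1_704_163),
    ("robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05", 920_457, 956_424),
    ("robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06", 864_455, 900_954),
]


def byte_overhead(rows: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Extra bytes the rule-faithful accounting adds on top of the current workflow."""
    return {run: rule_faithful - current for run, current, rule_faithful in rows}


overhead = byte_overhead(accounting)
for run, extra in overhead.items():
    print(f"{run}: +{extra:,} bytes")
```

The gap is 3,044 bytes for the first four promotions, then grows to roughly 34,000 to 36,500 bytes from the 424x318 x265 floor onward. That growth is why the rule-faithful estimates for the later rows sit further above the current-workflow scores.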

Working theses

  • Strong standard-codec AV1 was already viable here; the real blocker was a rawvideo byte-format bug in the inflator.
  • BAT00 became useful as a research-only ranking lane, not as a score authority.
  • ROI-style multi-stream ideas were informative negative results but too expensive in the tested forms.
  • The x265 ladder still matters as evidence: it established a 3.25 local floor before the repaired AV1 branch advanced through 2.20 to 2.19.

Key turning points

  • First honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
  • First big win — robust_current-medium23-cpu-2026-04-03 (3.62)
  • ROI failure — robust_current-dynamic-main-roi-cpu-2026-04-05 (4.47)
  • Current best floor — robust_current-av1-524x394-crf34-promoted-cpu-2026-04-06 (2.19)

Timeline

UTC | Type | Summary
2026-04-05T14:29:14.555005+00:00 | decision | Lower-resolution BAT00-ranked follow-up at 416x312 scored 3.44 and was rejected.
2026-04-05T15:10:07.425100+00:00 | decision | BAT00-ranked nearby-scale follow-up at 428x320 scored 3.32 and was rejected.
2026-04-05T16:07:20.968299+00:00 | decision | BAT00-ranked nearby-scale follow-up at 426x320 scored 3.38 and was rejected.
2026-04-05T17:12:42.981688+00:00 | research | Public-style AV1/SVT-AV1 reproduction attempt at 524x394 regressed catastrophically to 97.45 and was rejected.
2026-04-05T18:20:08.762518+00:00 | analysis | AV1 failure root cause was a rawvideo pixel-format mismatch: inflate emitted yuv444p bytes instead of rgb24. Forcing rgb24 recovered the same 524x394 SVT-AV1 recipe from 97.45 to 2.20.
2026-04-05T18:20:08.762518+00:00 | promotion | Canonical robust_current AV1 floor at 524x394 / svtav1-p0 / crf33 / film-grain22 / bicubic / unsharp scored 2.20 and became the new promoted floor.
2026-04-06T04:41:42.160831+00:00 | analysis | CRF34 AV1 neighborhood probe cut bytes by 56,002 (6.08%) versus the 2.20 CRF33 AV1 floor, while pose and seg distortion rose slightly; net score improved by 0.01 to 2.19.
2026-04-06T04:41:42.160831+00:00 | promotion | Canonical robust_current AV1 floor at 524x394 / svtav1-p0 / crf34 / film-grain22 / bicubic / unsharp scored 2.19 and replaced the prior 2.20 CRF33 floor.
2026-04-06T05:26:53.551355+00:00 | analysis | CRF35 was estimated to maybe land around 2.17-2.20 if byte savings still dominated, but it regressed to 2.21. Bytes fell by 56,419 (6.53%), yet pose and seg rose too far.
2026-04-06T05:26:53.551355+00:00 | decision | One-axis AV1 probe at crf35 scored 2.21 and was rejected; the 2.19 crf34 floor remains canonical.
2026-04-06T06:13:03.889813+00:00 | analysis | Unsharp 0.30 was estimated to maybe land around 2.18-2.20 by reducing postfilter aggression, but it stayed at 864,455 bytes and regressed to 2.20. Pose and seg both worsened slightly.
2026-04-06T06:13:03.889813+00:00 | decision | One-axis AV1 probe with lighter unsharp (0.30) scored 2.20 and was rejected; the 2.19 unsharp 0.35 floor remains canonical.
2026-04-06T07:12:44.279648+00:00 | analysis | Film-grain 0 was estimated to maybe land around 2.16-2.24 if synthetic grain was hurting task metrics, but it regressed catastrophically to 3.33. Bytes fell by 145,359 (16.82%), yet PoseNet distortion exploded.
2026-04-06T07:12:44.279648+00:00 | decision | One-axis AV1 probe with film-grain disabled scored 3.33 and was rejected; the 2.19 film-grain22 floor remains canonical.