Start here
Read this page first if you want the outcome, the score, and the plain-English explanation of what is being optimized.
This site tracks scorer-backed experiments for the comma.ai video compression challenge. It is designed to work at two levels: a plain-English executive layer for judges and newcomers, and a deeper evidence trail for researchers who want the exact runs, artifacts, and failure cases.
Jump to the compact packet and the promotion review if you want the measured result, the evidence root, and the short argument for why it matters.
Use the notebook, manifests, and raw evidence if you want the experimental path, the exact artifacts, and the rejected branches too.
comma.ai’s public challenge asks entrants to ship an archive.zip that inflates to video. The published score combines archive bytes, SegNet distortion, and PoseNet distortion on the public test clip.
comma-lab is a public experiment log and submission repo maintained by Alejandro Pena. It publishes measured runs, rejected branches, and the current promoted operating point in one place.
Private-facing surfaces stay in the repo: the report history, and `comma-lab sched status` for scheduler state.
Apr 9, 2026, 4:58 PM CDT
Track A remains separate: `current_workflow` 0.00 at 167 bytes. The robust run ledger currently contains 46 measured runs, 18 promotions, and 18 explicit rejections.
robust_current · libsvtav1 · 524x394 · film-grain=22 · lanczos/long1000-qat-ema-postfilter-h64 · long1000 QAT+EMA learned int8 post-filter h64
The challenge score mixes filesize and task distortion. Lower is better, but the score is not a generic visual metric. It rewards preserving the signals that the scorer models actually use.
A smaller archive is better, but only if the reconstructed video still preserves the task signal that the scorer measures.
The PoseNet term measures how much the compressed video changes driving-related pose outputs. Lower is better.
The SegNet term measures disagreement on segmentation labels. At the current operating point, tiny SegNet regressions matter a lot.
Track B runs only. Better runs move toward the lower-left. The x-axis uses log scaling. The y-axis is linear. The severe AV1 bug run at 97.45 is omitted here so the operating range stays legible; it remains documented in the search-path section.
Lower is better. The plot is generated directly from scorer-backed artifacts in the repository.
The x-axis is turning-point order. The y-axis is actual current_workflow score. Lower is better. The AV1 failure is called out as an off-scale diagnostic spike rather than squeezed into the honest operating range.
This chart is selective by design. It shows the milestones that changed the operating point or changed the lab’s understanding of the evaluator.
Compare the strongest scorer-backed internal runs against the source clip. Drag to move the zoom window or change its size; every pane stays frame-synced.
Selected run: 1.73 current floor · score 1.73 · long1000 h64.
The scorer resizes frames and measures task distortion. It is sensitive to pipeline details, not just visual appearance.
PoseNet sees both frames. SegNet sees only the last frame in each pair. That asymmetry is why small encoding or decode-path changes can move the score.
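The asymmetry can be sketched in a few lines. This is an illustration only, not the scorer's code: the consecutive, non-overlapping pairing and the helper name `scorer_inputs` are assumptions, and the frames are stand-in strings.

```python
def scorer_inputs(frames):
    """Split a frame sequence into pairs and route them to each model.

    ASSUMPTION: pairs are consecutive and non-overlapping. The real scorer
    may group frames differently.
    """
    pairs = [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, 2)]
    posenet_inputs = pairs                       # PoseNet sees both frames
    segnet_inputs = [last for _, last in pairs]  # SegNet sees only the last frame
    return posenet_inputs, segnet_inputs


posenet_in, segnet_in = scorer_inputs(["f0", "f1", "f2", "f3"])
print(posenet_in)  # [('f0', 'f1'), ('f2', 'f3')]
print(segnet_in)   # ['f1', 'f3']
```

Under this assumed pairing, an error confined to `f0` or `f2` can move only the PoseNet term, while an error in `f1` or `f3` can move both terms.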
The main AV1 failure was a byte-layout error, not a codec-limit problem.
The failed path emitted rawvideo as yuv444p bytes. The corrected path forces rgb24, which matches the evaluator’s raw-frame expectation.
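A toy example shows why this kind of error is easy to miss. In the standard FFmpeg pixel formats, `yuv444p` is planar (three full-resolution planes stored one after another) and `rgb24` is packed (three bytes per pixel, interleaved), so both use the same number of bytes per frame. The sketch below covers only the byte ordering and ignores the YUV-to-RGB color conversion. The helper names are hypothetical and not taken from the lab's pipeline.

```python
def packed_to_planar(packed: bytes) -> bytes:
    """Reorder interleaved 3-channel bytes (c0 c1 c2 c0 c1 c2 ...) into three planes."""
    return packed[0::3] + packed[1::3] + packed[2::3]


def read_as_packed(buf: bytes):
    """Interpret a buffer as packed 3-byte pixels, the way an rgb24 reader would."""
    return [tuple(buf[i:i + 3]) for i in range(0, len(buf), 3)]


width, height = 2, 2
pixels = [(10, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)]

packed = bytes(v for px in pixels for v in px)  # packed layout, like rgb24
planar = packed_to_planar(packed)               # planar layout, like yuv444p

# Same frame size in bytes, so a size check alone cannot catch the mismatch.
assert len(packed) == len(planar) == width * height * 3

misread = read_as_packed(planar)
print(misread[0])  # (10, 40, 70): one channel from three different pixels
```

The reader raises no error and produces a frame of the right shape; only the pixel values are wrong.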
The current honest floor is 1.73 at 864,167 bytes.
Against the first honest baseline (4.06 at 3,735,828 bytes), the lab cut the archive by about 77% and the score by about 57%, preserving enough task signal to keep the score moving down as bytes fell.
| Metric | Prior | Current | Delta |
|---|---|---|---|
| current_workflow score | 1.84 | 1.73 | -0.11 |
| archive bytes | 864,168 | 864,167 | -1 |
| PoseNet distortion | 0.04678315 | 0.03317023 | -0.01361292 |
| SegNet distortion | 0.00581610 | 0.00575544 | -0.00006066 |
The new floor kept bytes in the same regime while lowering the distortion that mattered most in this comparison.
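The table's score change can be split by term in a rough sketch. This page does not state the scoring formula, so the weights below (100 × SegNet distortion and the square root of 10 × PoseNet distortion) are an assumption about the challenge's published weighting and should be checked against the official scorer. The archive changed by one byte, so the size term is treated as unchanged.

```python
import math

# ASSUMPTION: these term weights are not given on this page. Verify them
# against the official challenge scorer before relying on the numbers.
def segnet_term(distortion: float) -> float:
    return 100.0 * distortion


def posenet_term(distortion: float) -> float:
    return math.sqrt(10.0 * distortion)


# Distortion values copied from the Prior and Current columns of the table.
prior = {"posenet": 0.04678315, "segnet": 0.00581610}
current = {"posenet": 0.03317023, "segnet": 0.00575544}

pose_delta = posenet_term(current["posenet"]) - posenet_term(prior["posenet"])
seg_delta = segnet_term(current["segnet"]) - segnet_term(prior["segnet"])
distortion_delta = pose_delta + seg_delta

print(f"PoseNet term change: {pose_delta:+.3f}")        # -0.108
print(f"SegNet term change:  {seg_delta:+.3f}")         # -0.006
print(f"Combined change:     {distortion_delta:+.2f}")  # -0.11
```

Under these assumed weights the combined change matches the table's -0.11 score delta, and nearly all of it comes from the PoseNet term.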
This table isolates the local AV1 neighborhood around the promoted floor. It shows which nearby changes improved the result and which did not.
| Variant | Changed axis | Score | Bytes | Δ score | Δ bytes | Verdict |
|---|---|---|---|---|---|---|
| crf 33 | crf 33 | 2.20 | 920,457 | | | promoted |
| crf 34 | crf 34 | 2.19 | 864,455 | -0.01 | -56,002 | promoted |
| crf 35 | crf 35 | 2.21 | 808,036 | +0.02 | -56,419 | rejected |
| unsharp 0.30 | postfilter unsharp=9:9:0.30:9:9:0.0 | 2.20 | 864,455 | -0.01 | +56,419 | rejected |
| film-grain 0 | film-grain 0 | 3.33 | 719,096 | +1.13 | -145,359 | rejected |
| 522x392 | geometry 522x392 | 2.23 | 862,238 | +0.04 | -2,217 | rejected |
| lanczos upscale | upscale lanczos | 2.18 | 864,455 | -0.01 | +0 | promoted |
| crf 34 | crf 34 | 2.08 | 864,168 | -0.10 | -287 | promoted |
| consensus stack | crf33 + scd0 + hqdn3d | 2.13 | 909,307 | +0.05 | +45,139 | rejected |
| ROI preprocess | corridor preprocessing | 2.52 | 785,302 | +0.39 | -124,005 | rejected |
| grain mask | decode saliency-masked grain synthesis | 2.30 | 716,797 | -0.22 | -68,505 | rejected |
| 522x392 | geometry 522x392 | 2.05 | 861,986 | -0.14 | -2,469 | promoted |
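One way to read the table in "lower-left is better" terms is to list the rows that no other row beats on both score and bytes. The sketch below is a reading aid, not the lab's promotion logic. The function name `pareto_front` and the row-number suffixes on repeated labels are added here for illustration.

```python
# (label, score, bytes) copied from the table above. Labels that appear twice
# are disambiguated by their row position in the table.
rows = [
    ("crf 33", 2.20, 920_457),
    ("crf 34 (row 2)", 2.19, 864_455),
    ("crf 35", 2.21, 808_036),
    ("unsharp 0.30", 2.20, 864_455),
    ("film-grain 0", 3.33, 719_096),
    ("522x392 (row 6)", 2.23, 862_238),
    ("lanczos upscale", 2.18, 864_455),
    ("crf 34 (row 8)", 2.08, 864_168),
    ("consensus stack", 2.13, 909_307),
    ("ROI preprocess", 2.52, 785_302),
    ("grain mask", 2.30, 716_797),
    ("522x392 (row 12)", 2.05, 861_986),
]


def pareto_front(rows):
    """Return labels of rows not dominated on (score, bytes); lower is better."""
    front = []
    for label, score, size in rows:
        dominated = any(
            s <= score and b <= size and (s < score or b < size)
            for _, s, b in rows
        )
        if not dominated:
            front.append(label)
    return front


print(pareto_front(rows))
# ['crf 35', 'grain mask', '522x392 (row 12)']
```

Archive bytes already feed into the score, so this frontier does not rank the variants, and the Verdict column remains the lab's own call.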
Primary artifacts
Turning points
The landing page is the guided entry point. Full depth lives in the linked notebook, packet, and raw evidence artifacts.