comma-lab

A lab notebook for compression that machines can trust and people can follow.

This site tracks scorer-backed experiments for the comma.ai video compression challenge. It is designed to work at two levels: a plain-English executive layer for judges and newcomers, and a deeper evidence trail for researchers who want the exact runs, artifacts, and failure cases.

Current floor: 1.73 · Rule-faithful: 1.795 · 46 measured runs

Start here

Read this page first if you want the outcome, the score, and the plain-English explanation of what is being optimized.

For judges

Jump to the compact packet and the promotion review if you want the measured result, the evidence root, and the short argument for why it matters.

For researchers

Use the notebook, manifests, and raw evidence if you want the experimental path, the exact artifacts, and the rejected branches too.

Contest

comma.ai’s public challenge asks entrants to ship an archive.zip that inflates to video. The published score combines archive bytes, SegNet distortion, and PoseNet distortion on the public test clip.

Who we are

comma-lab is a public experiment log and submission repo maintained by Alejandro Pena. It publishes measured runs, rejected branches, and the current promoted operating point in one place.

Private ops

Private-facing surfaces stay in-repo: report history, plus `comma-lab sched status` for scheduler state.

Scheduler details remain read-only and are not exposed as a public launch surface.

Last updated

Apr 9, 2026, 4:58 PM CDT

Generated from repository state plus scorer-backed artifacts stored in this repo.

Track B current_workflow: 1.73 (864,167 bytes)
Track B rule_faithful: 1.795 (966,071 bytes)
Delta vs published baseline: -2.66 (4.39 → 1.73)
Delta vs prior floor: -0.11 (-1 byte)

Track A remains separate: `current_workflow` 0.00 at 167 bytes. The robust run ledger currently contains 46 measured runs, 18 promotions, and 18 explicit rejections.

robust_current · libsvtav1 · 524x394 · film-grain=22 · lanczos/long1000-qat-ema-postfilter-h64 · long1000 QAT+EMA learned int8 post-filter h64

What the score means

The challenge score mixes filesize and task distortion. Lower is better, but the score is not a generic visual metric. It rewards preserving the signals that the scorer models actually use.

Archive bytes

Smaller is better, but only if the reconstructed video still preserves the task signal that the scorer measures.

PoseNet distortion

This term measures how much the compressed video changes driving-related pose outputs. Lower is better.

SegNet distortion

This term measures disagreement on segmentation labels. At the current operating point, tiny SegNet regressions matter a lot.
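
As a rough illustration of how the three terms could combine, the sketch below assumes the weighting 100 × SegNet + sqrt(10 × PoseNet) + 25 × rate, where rate is archive bytes divided by the uncompressed clip size. That weighting is an assumption made for this example and is not stated on this page; the official challenge repo is the authority. The function names and the `ORIGINAL_BYTES` value are hypothetical placeholders. The distortion figures are the prior-floor and current-floor values reported in the comparison table on this page.

```python
import math

def score_terms(archive_bytes, segnet_dist, posenet_dist, original_bytes):
    """Return the three score contributions under the ASSUMED weighting.

    Assumption (not stated on this page):
    score = 100 * segnet + sqrt(10 * posenet) + 25 * archive_bytes / original_bytes
    """
    return {
        "segnet": 100.0 * segnet_dist,
        "posenet": math.sqrt(10.0 * posenet_dist),
        "rate": 25.0 * archive_bytes / original_bytes,
    }

def total_score(terms):
    """Sum the individual contributions into one score (lower is better)."""
    return sum(terms.values())

# Hypothetical uncompressed clip size, used only so the rate term is defined.
ORIGINAL_BYTES = 37_500_000

prior = score_terms(864_168, 0.00581610, 0.04678315, ORIGINAL_BYTES)
current = score_terms(864_167, 0.00575544, 0.03317023, ORIGINAL_BYTES)

delta = total_score(current) - total_score(prior)
print(round(delta, 2))  # -0.11 under the assumed weighting
```

Because the two archives differ by a single byte, the rate term nearly cancels in the difference, so the result barely depends on the placeholder clip size.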

Score vs bytes

Track B runs only. Better runs move toward the lower-left. The x-axis uses log scaling. The y-axis is linear. The severe AV1 bug run at 97.45 is omitted here so the operating range stays legible; it remains documented in the search-path section.

[Figure: Track B score vs. archive size. x-axis: archive size (bytes, log scale); y-axis: current_workflow score (linear, lower is better). Markers: promotion, explicit rejection, measured run.]

Lower is better. The plot is generated directly from scorer-backed artifacts in the repository.
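
One way to make "toward the lower-left" precise is a Pareto front over (bytes, score): a run stays on the front if no other run is at least as good on both axes and strictly better on one. The sketch below is an illustrative reading, not the lab's plotting code. The function name is hypothetical, and the data points are the milestone runs listed in the trajectory chart on this page, with the off-scale AV1 failure left out.

```python
def pareto_front(runs):
    """Return the runs not dominated on (bytes, score); lower is better on both."""
    front = []
    for name, size, score in runs:
        dominated = any(
            other_size <= size
            and other_score <= score
            and (other_size < size or other_score < score)
            for _, other_size, other_score in runs
        )
        if not dominated:
            front.append((name, size, score))
    # Sort by archive size so the front reads left to right on the plot.
    return sorted(front, key=lambda run: run[1])

# (label, archive bytes, current_workflow score)
runs = [
    ("baseline", 3_735_828, 4.06),
    ("nearby scale", 1_741_924, 3.32),
    ("x265 floor", 1_669_984, 3.25),
    ("AV1 repair", 920_457, 2.20),
    ("sharpness=1", 864_168, 2.08),
    ("first post-filter", 861_986, 2.05),
    ("long500 h16", 864_167, 1.99),
    ("current floor", 864_167, 1.73),
]

front = pareto_front(runs)
for name, size, score in front:
    print(f"{name}: {size:,} bytes, score {score:.2f}")
```

On these eight points only two runs survive: the smallest archive (first post-filter) and the lowest score (current floor).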

Trajectory and branch points

The x-axis is turning-point order. The y-axis is actual current_workflow score. Lower is better. The AV1 failure is called out as an off-scale diagnostic spike rather than squeezed into the honest operating range.

[Figure: trajectory chart. x-axis: measured turning-point order; y-axis: actual current_workflow score. Legend: mainline improvements, rejected side branches, diagnostic off-scale event.]

| Label | Run | Score | Bytes | Note |
| --- | --- | --- | --- | --- |
| nearby scale | robust_current-cand-428x320-crf23-g48-b4-r4-cpu-2026-04-05 | 3.32 | 1,741,924 | x265 |
| AV1 failure | robust_current-av1-524x394-cpu-2026-04-05 | 97.45 | 920,457 | off-scale diagnostic spike |
| baseline | robust_current-baseline-cpu-2026-04-03 | 4.06 | 3,735,828 | x265 |
| x265 floor | robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 | 3.25 | 1,669,984 | x265 |
| AV1 repair | robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 | 2.20 | 920,457 | 524x394 |
| sharpness=1 | robust_current-sharpness1-promoted-cpu-2026-04-06 | 2.08 | 864,168 | 524x394 |
| first post-filter | robust_current-av1-522x392-postfilter-promoted-cpu-2026-04-07 | 2.05 | 861,986 | 522x392 |
| long500 h16 | robust_current-long500-h16-promoted-cpu-2026-04-08 | 1.99 | 864,167 | 524x394 |
| current floor | robust_current-long1000-h64-promoted-cpu-2026-04-09 | 1.73 | 864,167 | 524x394 |

This chart is selective by design. It shows the milestones that changed the operating point or changed the lab’s understanding of the evaluator.

Top runs explorer

Compare the strongest scorer-backed internal runs against the source clip. Drag to move the zoom window, change its size, and keep every pane frame-synced.

Internal media explorer last refreshed Apr 9, 2026, 10:52 AM CDT. Public entries below are reference metadata only because we do not have their artifacts or synced media. Official reference: comma.ai leaderboard and the public PR stream in the official challenge repo.

Unofficial community leaderboard snapshot
  • Public #1: 1.89 (neural_inflate)
  • Public #2: 1.94 (roi_v2)
  • Public #3: 1.95 (av1_roi_lanczos_unsharp)

Selected run: 1.73 current floor · score 1.73 · long1000 h64.

How to read this lab

  1. Evaluator path

    The scorer resizes frames and measures task distortion. It is sensitive to pipeline details, not just visual appearance.

    PoseNet sees both frames. SegNet sees only the last frame in each pair. That asymmetry is why small encoding or decode-path changes can move the score.

  2. Critical bug

    The main AV1 failure was a byte-layout error, not a codec-limit problem.

    The failed path emitted rawvideo as yuv444p bytes. The corrected path forces rgb24, which matches the evaluator’s raw-frame expectation.

  3. Current operating point

    The current honest floor is 1.73 at 864,167 bytes.

    Against the first honest baseline (4.06 at 3,735,828 bytes), the lab cut the archive by roughly 77% while preserving enough task signal to bring the score down to 1.73.
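
The byte-layout failure described in item 2 can be illustrated with a toy example. The sketch assumes 8-bit samples, where a planar 4:4:4 frame and a packed rgb24 frame occupy the same number of bytes, so a reader expecting rgb24 raises no size error and silently reads the wrong pixels. The function names are hypothetical and this is not the evaluator's reader. The triples stand in for per-pixel channel values only; real yuv444p also stores Y, U, V rather than R, G, B, which this sketch ignores.

```python
def pack_rgb24(pixels):
    """Packed layout: channels interleaved per pixel (c0 c1 c2, c0 c1 c2, ...)."""
    return bytes(channel for pixel in pixels for channel in pixel)

def pack_planar444(pixels):
    """Planar layout (yuv444p-style): all of channel 0, then channel 1, then channel 2."""
    return bytes(pixel[ch] for ch in range(3) for pixel in pixels)

def read_as_rgb24(buf):
    """Interpret a raw buffer as packed 3-byte pixels, as an rgb24 reader would."""
    return [tuple(buf[i:i + 3]) for i in range(0, len(buf), 3)]

# Two toy pixels with three 8-bit channel values each.
pixels = [(10, 20, 30), (40, 50, 60)]

packed = pack_rgb24(pixels)
planar = pack_planar444(pixels)

print(len(packed) == len(planar))  # same byte count, so no size mismatch
print(read_as_rgb24(packed))       # round-trips to the original pixels
print(read_as_rgb24(planar))       # channel values land in the wrong pixels
```

Because the byte counts match, nothing in the decode path needs to fail for the frames to be wrong, which fits the description of a layout error rather than a codec limit.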

Why 1.73 beat 1.84

Prior floor: 1.84 · 864,168 bytes · weighted ensemble h32 + MC 75/25
Current floor: 1.73 · 864,167 bytes · long1000 QAT+EMA learned int8 post-filter h64

| Metric | Prior | Current | Delta |
| --- | --- | --- | --- |
| current_workflow score | 1.84 | 1.73 | -0.11 |
| archive bytes | 864,168 | 864,167 | -1 |
| PoseNet distortion | 0.04678315 | 0.03317023 | -0.01361292 |
| SegNet distortion | 0.00581610 | 0.00575544 | -0.00006066 |

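The new floor left archive size essentially unchanged (one byte smaller) while cutting PoseNet distortion by about 29%. SegNet distortion improved by only about 1%, so the PoseNet term drove the gain in this comparison.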
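
To put the comparison on a common footing, the relative change of each term can be computed directly from the table values. This is plain arithmetic on the reported numbers; the helper name is hypothetical.

```python
def relative_change(prior, current):
    """Signed fractional change from prior to current (negative means lower)."""
    return (current - prior) / prior

# (prior, current) pairs taken from the comparison table.
metrics = {
    "archive bytes": (864_168, 864_167),
    "PoseNet distortion": (0.04678315, 0.03317023),
    "SegNet distortion": (0.00581610, 0.00575544),
}

changes = {name: relative_change(p, c) for name, (p, c) in metrics.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2%}")
```

The PoseNet change is roughly -29% and the SegNet change roughly -1%, while the byte count is effectively flat.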

Local neighborhood

This table isolates the local AV1 neighborhood around the promoted floor. It shows which nearby changes improved the result and which did not.

| Variant | Changed axis | Score | Bytes | Δ score | Δ bytes | Verdict |
| --- | --- | --- | --- | --- | --- | --- |
| crf 33 | crf 33 | 2.20 | 920,457 | | | promoted |
| crf 34 | crf 34 | 2.19 | 864,455 | -0.01 | -56,002 | promoted |
| crf 35 | crf 35 | 2.21 | 808,036 | +0.02 | -56,419 | rejected |
| unsharp 0.30 | postfilter unsharp=9:9:0.30:9:9:0.0 | 2.20 | 864,455 | -0.01 | +56,419 | rejected |
| film-grain 0 | film-grain 0 | 3.33 | 719,096 | +1.13 | -145,359 | rejected |
| 522x392 | geometry 522x392 | 2.23 | 862,238 | +0.04 | -2,217 | rejected |
| lanczos upscale | upscale lanczos | 2.18 | 864,455 | -0.01 | +0 | promoted |
| crf 34 | crf 34 | 2.08 | 864,168 | -0.10 | -287 | promoted |
| consensus stack | crf33 + scd0 + hqdn3d | 2.13 | 909,307 | +0.05 | +45,139 | rejected |
| ROI preprocess | corridor preprocessing | 2.52 | 785,302 | +0.39 | -124,005 | rejected |
| grain mask | decode saliency-masked grain synthesis | 2.30 | 716,797 | -0.22 | -68,505 | rejected |
| 522x392 | geometry 522x392 | 2.05 | 861,986 | -0.14 | -2,469 | promoted |
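
One reading of the verdict column is that a variant is promoted only when its score beats the best score seen so far in the table. That rule is inferred from the rows here and is an assumption, not a documented promotion policy. The sketch below replays it over the variants in table order; the function name is hypothetical.

```python
# (variant, score, verdict) in the order the table lists them.
rows = [
    ("crf 33", 2.20, "promoted"),
    ("crf 34", 2.19, "promoted"),
    ("crf 35", 2.21, "rejected"),
    ("unsharp 0.30", 2.20, "rejected"),
    ("film-grain 0", 3.33, "rejected"),
    ("522x392", 2.23, "rejected"),
    ("lanczos upscale", 2.18, "promoted"),
    ("crf 34", 2.08, "promoted"),
    ("consensus stack", 2.13, "rejected"),
    ("ROI preprocess", 2.52, "rejected"),
    ("grain mask", 2.30, "rejected"),
    ("522x392", 2.05, "promoted"),
]

def replay_verdicts(rows):
    """Promote a row only if its score is strictly below the running floor (assumed rule)."""
    floor = float("inf")
    verdicts = []
    for _variant, score, _verdict in rows:
        if score < floor:
            floor = score
            verdicts.append("promoted")
        else:
            verdicts.append("rejected")
    return verdicts, floor

replayed, final_floor = replay_verdicts(rows)
print(replayed == [verdict for _, _, verdict in rows])
print(final_floor)
```

Under this assumed rule the replay agrees with every verdict in the table. A rejection such as grain mask still lowers the score relative to the row above it, but it does not beat the running floor.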


References

Turning points

  • Initial honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
  • 512x384 / crf23 — robust_current-medium23-cpu-2026-04-03 (3.62)
  • ROI branch rejection — robust_current-dynamic-main-roi-cpu-2026-04-05 (4.47)
  • Current floor — robust_current-long1000-h64-promoted-cpu-2026-04-09 (1.73)

The landing page is the guided entry point. Full depth lives in the linked notebook, packet, and raw evidence artifacts.