comma-lab

A lab notebook for compression that machines can trust and people can follow.

This site tracks the Apogee public supplement for the comma.ai video compression challenge. The current score-bearing frontier is the PR100 HNeRV-LC-v2 adapter replay: exact Tesla T4 A++ score 0.22826947142244708 on the exact archive bytes and runtime tree.

Current exact frontier: 0.22826947142244708 at 178,981 bytes (A++ exact, T4 CUDA)

Start here

Read this page first if you want the outcome, the score, and the plain-English explanation of what is being optimized.

For judges

Jump to the compact packet and manifest if you want the measured result, the evidence root, and the short argument for why it matters.

For researchers

Use the notebook, manifests, and evidence index if you want the experimental path, the exact artifacts, and the rejected branches too.

Contest

comma.ai’s public challenge asks entrants to ship an archive.zip that inflates to video. The published score combines archive bytes, SegNet distortion, and PoseNet distortion on the public test clip.

Who we are

comma-lab is a public experiment log and submission repo maintained by Alejandro Pena. It publishes measured runs, rejected branches, and the current exact operating point in one place.

Public bundle

The deployable Cloudflare bundle is generated from sanitized public files. Raw provider state, secrets, local paths, and job transcripts are not part of this surface.

Placeholders stay placeholders until a sanitized release manifest intentionally publishes URLs.

Last updated

May 4, 2026

Aligned to the PR100 exact Tesla T4 A++ frontier and the sanitized release manifest.

Current exact frontier - PR100 HNeRV-LC-v2 adapter replay:

Exact T4 A++ score: 0.228269 (600 samples)
Archive bytes: 178,981 (SHA afd53348...eb80641)
SegNet / PoseNet component distances: 6.76e-4 / 1.72e-4
Runtime custody: ef632353 (runtime tree SHA prefix)

Score authority is the adjudicated exact auth-eval JSON for the archive/runtime pair. Public PR body scores, comments, static anatomy, and roadmap probes are attribution or research context until exact CUDA replay validates the final bytes.

archive.zip -> inflate.sh -> upstream/evaluate.py · score_json=experiments/results/lightning_batch/exact_eval_public_pr100_hnerv_lc_v2_adapter_t4_20260504T1213Z/contest_auth_eval.adjudicated.json · manifest=./apogee_release_manifest.json
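The custody rule can be made mechanical: before trusting a score, confirm that the bytes being scored are the bytes named in the manifest. A minimal sketch, assuming the archive SHA quoted on this page is a SHA-256 over the raw archive.zip bytes (its 64 hex digits fit that, but the page does not say so). The function names are hypothetical, not part of the repository, and the runtime tree hash is not sketched because the page does not say how it is computed.

```python
import hashlib
from pathlib import Path

# Values quoted on this page for the PR100 archive (SHA-256 is an assumption).
EXPECTED_SHA256 = "afd53348f50303bf0ec6a7ffecc1ac037df2f1c70745244b9c45c72e8eb80641"
EXPECTED_BYTES = 178_981


def archive_matches(data: bytes, expected_sha256: str, expected_bytes: int) -> bool:
    """Hypothetical custody check: byte count first, then the content hash."""
    if len(data) != expected_bytes:
        return False
    return hashlib.sha256(data).hexdigest() == expected_sha256


def check_archive_file(path, expected_sha256=EXPECTED_SHA256, expected_bytes=EXPECTED_BYTES):
    """Read the archive from disk and compare it with the manifest values."""
    return archive_matches(Path(path).read_bytes(), expected_sha256, expected_bytes)
```

Usage would be `check_archive_file("archive.zip")`; a `False` result means any score attached to that file is not the exact frontier claim, whatever the JSON says.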

Era 2 - neural renderer controls (historical, contest-CUDA verified):

Lane G v3 contest-CUDA: 1.05 (694,074 bytes)
Modal T4 reproduction: 1.04 (drift 0.01, within noise)
Fallback floor (Lane A): 1.15 (694,074 bytes)
Era 2 baseline (CUDA-true): 0.90 (293 KB)

Recipe: dilated-h64 neural renderer + KL distill weight=0.002 + pose TTO retry on Lane A anchor. PoseNet 0.0034 / SegNet 0.0040 / Rate 0.0185. This lane is now a historical control: PR100 supersedes it as the exact public-supplement frontier.

dilated-h64 renderer · CRF=50 mask · KL distill T=2.0 weight=0.002 · pose TTO from baseline poses · contest-CUDA inflate.sh → upstream/evaluate.py
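The recipe lists a KL distillation term at temperature T=2.0 with weight 0.002. A minimal sketch of a temperature-scaled KL term, assuming the common knowledge-distillation formulation: soften teacher and student logits by T, take KL(teacher || student), then multiply by T² and the weight. The T² factor, and which model's outputs act as teacher and student in this lane, are assumptions; the page only gives T and the weight.

```python
import math


def softmax(logits, temperature):
    """Temperature-softened softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distill_term(teacher_logits, student_logits, temperature=2.0, weight=0.002):
    """Weighted KL(teacher || student) on softened distributions.

    The T**2 rescaling is the usual distillation convention (assumed here);
    it keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return weight * temperature ** 2 * kl


# Hypothetical 3-class logits for a single sample.
print(kl_distill_term([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # 0.0: identical outputs
print(kl_distill_term([2.0, 0.5, -1.0], [0.0, 1.5, 0.0]))   # small positive penalty
```

With weight 0.002 the term acts as a light regularizer next to the main reconstruction objective rather than the dominant loss.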

Era 1 - codec + tiny CNN post-filter (historical Track B arc):

Track B current_workflow: 1.73 (864,167 bytes)
Track B rule_faithful: 1.795 (966,071 bytes)
Delta vs published baseline: -2.66 (4.39 → 1.73)
Delta vs prior floor: -0.11 (-1 byte)

Track A remains separate: `current_workflow` 0.00 at 167 bytes. The robust run ledger currently contains 46 measured runs, 18 promotions, and 18 explicit rejections. The Era 1 floor (1.73) was eclipsed by Era 2 once the lab abandoned the codec entirely; the AV1+post-filter mathematical investigation (Jacobian rank-1, CNN residual mid-frequency analysis) is preserved below as historical narrative.

robust_current · libsvtav1 · 524x394 · film-grain=22 · lanczos/long1000-qat-ema-postfilter-h64 · long1000 QAT+EMA learned int8 post-filter h64
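The post-filter label names two training-time techniques: quantization-aware training (QAT) toward int8 weights, and an exponential moving average (EMA). A minimal pure-Python sketch of both, assuming symmetric per-tensor int8 fake quantization and an EMA over weights. The page does not give the lab's quantization granularity, rounding mode, or decay, and "EMA" could equally mean moving-average range tracking for the quantizer; treat these as illustrative choices.

```python
def fake_quant_int8(weights):
    """Symmetric per-tensor int8 quantize-dequantize (illustrative QAT forward pass).

    Training sees the dequantized values, so the loss includes the int8 rounding
    error; a deployed filter would store the integers q and the single scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Python's round() sends exact halves to the nearest even integer.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return [qi * scale for qi in q], q, scale


def ema_update(ema_weights, weights, decay=0.999):
    """Exponential moving average of weights; the decay value is an assumption."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


weights = [0.5, -1.0, 0.25, 0.01]
dequantized, q, scale = fake_quant_int8(weights)
print(q)  # integers in [-127, 127]; the largest magnitude maps to -127 here

ema = list(weights)                            # EMA initialized from the weights
ema = ema_update(ema, dequantized, decay=0.9)  # one illustrative update step
```

Each dequantized weight lands within half a quantization step of the float weight, which is the error budget QAT teaches the filter to tolerate.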

What the score means

The challenge score mixes filesize and task distortion. Lower is better, but the score is not a generic visual metric. It rewards preserving the signals that the scorer models actually use.

Archive bytes

Smaller is better, but only if the reconstructed video still preserves the task signal that the scorer measures.

PoseNet distortion

This term measures how much the compressed video changes driving-related pose outputs. Lower is better.

SegNet distortion

This term measures disagreement on segmentation labels. At the current operating point, tiny SegNet regressions matter a lot.
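The page does not restate the formula, so the sketch below assumes the weighting used by the public challenge evaluator as we understand it: score = 100 × SegNet + sqrt(10 × PoseNet) + 25 × rate, where rate is archive bytes divided by the original video size in bytes. As a consistency check, the Era 2 components quoted on this page reproduce the reported Lane G v3 score. The function names are illustrative.

```python
import math


def contest_score(segnet, posenet, rate):
    """Assumed weighting: 100 * SegNet + sqrt(10 * PoseNet) + 25 * rate.

    rate = archive bytes / original video bytes (assumed definition).
    """
    return 100.0 * segnet + math.sqrt(10.0 * posenet) + 25.0 * rate


def marginal_costs(posenet):
    """Score change per unit increase of each term, under the same assumption."""
    d_segnet = 100.0                             # linear: the same at any operating point
    d_posenet = 5.0 / math.sqrt(10.0 * posenet)  # derivative of sqrt(10 p) is 5 / sqrt(10 p)
    d_rate = 25.0                                # per unit of rate, not per byte
    return d_segnet, d_posenet, d_rate


# Era 2 components quoted on this page: SegNet 0.0040, PoseNet 0.0034, rate 0.0185.
print(round(contest_score(0.0040, 0.0034, 0.0185), 2))  # 1.05, matching Lane G v3

# Marginal costs at the PR100 PoseNet distance quoted on this page.
print(marginal_costs(1.72e-4))
```

Under this weighting the SegNet term at the PR100 point is 100 × 6.76e-4 ≈ 0.068, roughly 30% of the 0.228 total, so a SegNet regression of only 1e-4 would add 0.01 to the score. The square-root PoseNet term behaves differently: its marginal cost grows as PoseNet distortion shrinks.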

Score vs bytes

Track B runs only. Better runs move toward the lower-left. The x-axis uses log scaling. The y-axis is linear. The severe AV1 bug run at 97.45 is omitted here so the operating range stays legible; it remains documented in the search-path section.

[Figure: Score vs bytes for Track B runs. x-axis: archive size (bytes, log scale); y-axis: current_workflow score (linear, lower is better). Legend: promotion, explicit rejection, measured run.]

Lower is better. The plot is generated directly from scorer-backed artifacts in the repository.
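"Lower-left" here means Pareto-better: fewer bytes and a lower score at once. A minimal sketch of extracting the Pareto frontier from a list of runs, using the labeled Track B turning points from this page's trajectory chart as sample data. This is an illustrative computation, not the code that generates the plot.

```python
def pareto_frontier(runs):
    """Keep runs that no other run beats on both bytes and score (lower is better)."""
    frontier = []
    best_score = float("inf")
    # Sweep from the smallest archive upward (ties broken by score); a run
    # survives only if it beats every smaller-or-equal archive on score.
    for name, size, score in sorted(runs, key=lambda r: (r[1], r[2])):
        if score < best_score:
            frontier.append((name, size, score))
            best_score = score
    return frontier


runs = [
    ("baseline", 3_735_828, 4.06),
    ("nearby scale", 1_741_924, 3.32),
    ("x265 floor", 1_669_984, 3.25),
    ("AV1 repair", 920_457, 2.20),
    ("sharpness=1", 864_168, 2.08),
    ("first post-filter", 861_986, 2.05),
    ("long500 h16", 864_167, 1.99),
    ("long1000 h64", 864_167, 1.73),
]
for run in pareto_frontier(runs):
    print(run)
# ('first post-filter', 861986, 2.05)
# ('long1000 h64', 864167, 1.73)
```

Only two of these milestones are non-dominated: every other run is beaten on both axes by a later, smaller archive.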

Trajectory and branch points

The x-axis is turning-point order. The y-axis is actual current_workflow score. Lower is better. The AV1 failure is called out as an off-scale diagnostic spike rather than squeezed into the honest operating range.

[Figure: Trajectory and branch points. x-axis: measured turning-point order; y-axis: actual current_workflow score (about 2.0 to 5.0, lower is better). Legend: mainline improvements, rejected side branches, diagnostic off-scale event.]

Labeled points, by run date:
  • baseline 4.06 (x265) · robust_current-baseline-cpu-2026-04-03 · 3,735,828 bytes
  • nearby scale 3.32 (x265) · robust_current-cand-428x320-crf23-g48-b4-r4-cpu-2026-04-05 · 1,741,924 bytes
  • AV1 failure 97.45 (off-scale diagnostic spike) · robust_current-av1-524x394-cpu-2026-04-05 · 920,457 bytes
  • x265 floor 3.25 · robust_current-cand-424x318-crf23-g48-b4-r4-cpu-2026-04-05 · 1,669,984 bytes
  • AV1 repair 2.20 (524x394) · robust_current-av1-524x394-rgb24-promoted-cpu-2026-04-05 · 920,457 bytes
  • sharpness=1 2.08 (524x394) · robust_current-sharpness1-promoted-cpu-2026-04-06 · 864,168 bytes
  • first post-filter 2.05 (522x392) · robust_current-av1-522x392-postfilter-promoted-cpu-2026-04-07 · 861,986 bytes
  • long500 h16 1.99 (524x394) · robust_current-long500-h16-promoted-cpu-2026-04-08 · 864,167 bytes
  • current floor 1.73 (524x394) · robust_current-long1000-h64-promoted-cpu-2026-04-09 · 864,167 bytes

This chart is selective by design. It shows the milestones that changed the operating point or changed the lab’s understanding of the evaluator.

Top runs explorer

Compare historical internal visual clips against the source clip, with a frame-synced zoom pane for each.

These GIF/MP4 assets are judge-facing illustrations, not score authority; the PR100 score claim rests on the exact auth-eval JSON and the release manifest. For official standings, see the comma.ai leaderboard and the public PR stream in the official challenge repo.

Exact local frontier
PR100 adapter: 0.228269 (A++ T4) · Bytes: 178,981 (archive closed) · Boundary: exact JSON only, no body-score ranking

[Viewer: the source clip (Original) beside the 1.73 current floor run (long1000 h64), each with a frame-synced zoom pane.]

How to read this lab

  1. Evaluator path

    The scorer resizes frames and measures task distortion. It is sensitive to pipeline details, not just visual appearance.

    PoseNet sees both frames. SegNet sees only the last frame in each pair. That asymmetry is why small encoding or decode-path changes can move the score.

  2. Critical bug

    The main AV1 failure was a byte-layout error, not a codec-limit problem.

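    The failed path emitted rawvideo as yuv444p bytes. The corrected path forces rgb24, which matches the evaluator’s raw-frame expectation. Both formats carry 3 bytes per pixel at 8-bit depth, so the frame size is identical and a size check alone cannot catch the mismatch; the planar YUV bytes are simply misread as packed RGB.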

  3. Current operating point

    The current exact public-supplement frontier is 0.22826947142244708 at 178,981 bytes.

    The score claim is tied to archive SHA afd53348f50303bf0ec6a7ffecc1ac037df2f1c70745244b9c45c72e8eb80641 and runtime tree ef6323533666c9cac1c204a9d3f7054157d44a185b16fc859fb3f0438ccd1832.
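The PoseNet/SegNet asymmetry in step 1 can be made concrete with a frame-routing sketch. It assumes the clip is scored as non-overlapping consecutive pairs, which the page does not state, and the helper name is hypothetical. Under that assumption, a change confined to the first frame of a pair can move PoseNet but is invisible to SegNet.

```python
def route_frames(frames):
    """Split frames into non-overlapping pairs (assumed) and route them per model."""
    if len(frames) % 2 != 0:
        raise ValueError("expected an even number of frames")
    pairs = [(frames[i], frames[i + 1]) for i in range(0, len(frames), 2)]
    posenet_inputs = pairs                           # PoseNet sees both frames
    segnet_inputs = [second for _, second in pairs]  # SegNet sees only the last frame
    return posenet_inputs, segnet_inputs


frames = ["f0", "f1", "f2", "f3"]
edited = ["f0*", "f1", "f2", "f3"]  # perturb only the first frame of pair 0

pose_a, seg_a = route_frames(frames)
pose_b, seg_b = route_frames(edited)
print(pose_a != pose_b)  # True: PoseNet input changed
print(seg_a == seg_b)    # True: SegNet input unchanged
```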
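The byte-layout bug in step 2 can be reproduced on a toy frame. A minimal sketch with hypothetical helper names: it shows only the layout difference (interleaved vs. planar channels); real yuv444p output also stores Y, U, V values rather than R, G, B, which scrambles colors further.

```python
def pack_rgb24(pixels):
    """Packed layout (rgb24): R, G, B of pixel 0, then pixel 1, and so on."""
    return bytes(c for px in pixels for c in px)


def pack_planar(pixels):
    """Planar layout (as in yuv444p): all of channel 0, then channel 1, then channel 2."""
    return bytes(px[ch] for ch in range(3) for px in pixels)


def read_as_rgb24(buf, width, height):
    """A reader that expects rgb24: fixed size, 3 interleaved bytes per pixel."""
    expected = width * height * 3
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    return [tuple(buf[i:i + 3]) for i in range(0, expected, 3)]


# A 2x2 frame; the channel values stand in for any 3-channel 8-bit data.
width, height = 2, 2
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]

packed = pack_rgb24(pixels)
planar = pack_planar(pixels)
assert len(packed) == len(planar)  # same byte count, so no size error is raised

print(read_as_rgb24(packed, width, height))  # [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]
print(read_as_rgb24(planar, width, height))  # [(255, 0, 0), (10, 0, 255), (0, 20, 0), (0, 255, 30)]
```

The reader accepts both buffers without complaint; only the pixel values reveal the problem, which is why the failure surfaced as a score spike rather than a crash.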

Why 1.73 beat 1.84

Prior floor: 1.84 at 864,168 bytes (weighted ensemble h32 + MC 75/25)
Current floor: 1.73 at 864,167 bytes (long1000 QAT+EMA learned int8 post-filter h64)
Metric | Prior | Current | Delta
current_workflow score | 1.84 | 1.73 | -0.11
archive bytes | 864,168 | 864,167 | -1
PoseNet distortion | 0.04678315 | 0.03317023 | -0.01361292
SegNet distortion | 0.00581610 | 0.00575544 | -0.00006066

The new floor kept bytes essentially unchanged (1 byte smaller) while cutting PoseNet distortion by about 29% (0.0468 to 0.0332); SegNet distortion barely moved (about 1% lower).
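The -0.11 can be split into per-term contributions. A minimal sketch, assuming the weighting 100 × SegNet + sqrt(10 × PoseNet) + 25 × rate used by the public challenge evaluator as we understand it (the page does not restate it). The 1-byte rate change is omitted: it moves the score by 25 divided by the original video size in bytes, which is negligible here.

```python
import math

prior = {"segnet": 0.00581610, "posenet": 0.04678315}
current = {"segnet": 0.00575544, "posenet": 0.03317023}

# Assumed weighting: the SegNet term is linear, the PoseNet term is a square root.
seg_delta = 100.0 * (current["segnet"] - prior["segnet"])
pose_delta = math.sqrt(10.0 * current["posenet"]) - math.sqrt(10.0 * prior["posenet"])

print(f"SegNet term:  {seg_delta:+.4f}")               # -0.0061
print(f"PoseNet term: {pose_delta:+.4f}")              # -0.1080
print(f"Total:        {seg_delta + pose_delta:+.4f}")  # -0.1141, reported as -0.11
```

Under this weighting roughly 95% of the gain comes from the PoseNet term.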

Local neighborhood

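This table isolates the local AV1 neighborhood around the promoted AV1 floor and shows which nearby changes improved the result and which did not. The Δ columns compare each variant with the run it was measured against, so the reference row is not the same for every entry.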

Variant | Changed axis | Score | Bytes | Δ score | Δ bytes | Verdict
crf 33 | crf 33 | 2.20 | 920,457 | n/a | n/a | promoted
crf 34 | crf 34 | 2.19 | 864,455 | -0.01 | -56,002 | promoted
crf 35 | crf 35 | 2.21 | 808,036 | +0.02 | -56,419 | rejected
unsharp 0.30 | postfilter unsharp=9:9:0.30:9:9:0.0 | 2.20 | 864,455 | -0.01 | +56,419 | rejected
film-grain 0 | film-grain 0 | 3.33 | 719,096 | +1.13 | -145,359 | rejected
522x392 | geometry 522x392 | 2.23 | 862,238 | +0.04 | -2,217 | rejected
lanczos upscale | upscale lanczos | 2.18 | 864,455 | -0.01 | +0 | promoted
crf 34 | crf 34 | 2.08 | 864,168 | -0.10 | -287 | promoted
consensus stack | crf33 + scd0 + hqdn3d | 2.13 | 909,307 | +0.05 | +45,139 | rejected
ROI preprocess | corridor preprocessing | 2.52 | 785,302 | +0.39 | -124,005 | rejected
grain mask | decode saliency-masked grain synthesis | 2.30 | 716,797 | -0.22 | -68,505 | rejected
522x392 | geometry 522x392 | 2.05 | 861,986 | -0.14 | -2,469 | promoted
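The verdict column is consistent with a simple score-only rule applied in table order: promote a variant when its score is strictly below the current floor, otherwise reject it. A minimal sketch that replays the table under that rule. The lab's actual promotion criteria may weigh more than score (the page does not spell them out), so treat this as a consistency check rather than the lab's procedure.

```python
rows = [
    ("crf 33", 2.20, "promoted"),
    ("crf 34", 2.19, "promoted"),
    ("crf 35", 2.21, "rejected"),
    ("unsharp 0.30", 2.20, "rejected"),
    ("film-grain 0", 3.33, "rejected"),
    ("522x392", 2.23, "rejected"),
    ("lanczos upscale", 2.18, "promoted"),
    ("crf 34", 2.08, "promoted"),
    ("consensus stack", 2.13, "rejected"),
    ("ROI preprocess", 2.52, "rejected"),
    ("grain mask", 2.30, "rejected"),
    ("522x392", 2.05, "promoted"),
]


def replay(rows):
    """Apply the score-only rule in order and return verdicts plus the final floor."""
    floor = float("inf")  # no floor before the first measured variant
    verdicts = []
    for _name, score, _recorded in rows:
        if score < floor:
            floor = score
            verdicts.append("promoted")
        else:
            verdicts.append("rejected")
    return verdicts, floor


verdicts, final_floor = replay(rows)
print(all(v == r[2] for v, r in zip(verdicts, rows)))  # True
print(final_floor)  # 2.05
```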


References

Turning points

  • Initial honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
  • Era 1 postfilter floor — robust_current-long1000-h64-promoted-cpu-2026-04-09 (1.73)
  • Era 2 neural renderer controls — 0.90 CUDA baseline and Lane G v3 at 1.05
  • C067 public-floor reproduction — exact T4 0.31561703078448233
  • Current exact frontier — PR100 HNeRV-LC-v2 adapter replay (0.22826947142244708)

The landing page is the guided entry point. Full depth lives in the linked notebook, packet, release manifest, and evidence index.