comma-lab

Contest

comma.ai’s public challenge asks entrants to ship an archive.zip that inflates to video. The published score combines archive bytes, SegNet distortion, and PoseNet distortion on the public test clip.

Official challenge repo

Who we are

comma-lab is a public experiment log and submission repo maintained by Alejandro Pena. It publishes measured runs, rejected branches, and the current exact operating point in one place.

adpena/comma-lab · Experiment manifest

Public bundle

The deployable Cloudflare bundle is generated from sanitized public files. Raw provider state, secrets, local paths, and job transcripts are not part of this surface.

Placeholders stay placeholders until a sanitized release manifest intentionally publishes URLs.

Last updated

May 4, 2026

Aligned to the PR100 exact Tesla T4 A++ frontier and sanitized release manifest.

Current exact frontier - PR100 HNeRV-LC-v2 adapter replay:

Metric	Value	Detail
Exact T4 A++ score	0.228269	600 samples
Archive bytes	178,981	SHA afd53348...eb80641
SegNet / PoseNet	6.76e-4 / 1.72e-4	component distances
Runtime custody	ef632353	runtime tree SHA prefix

Score authority is the adjudicated exact auth-eval JSON for the archive/runtime pair. Public PR body scores, comments, static anatomy, and roadmap probes are attribution or research context until exact CUDA replay validates the final bytes.

archive.zip -> inflate.sh -> upstream/evaluate.py · score_json=experiments/results/lightning_batch/exact_eval_public_pr100_hnerv_lc_v2_adapter_t4_20260504T1213Z/contest_auth_eval.adjudicated.json · manifest=./apogee_release_manifest.json

Era 2 - neural renderer controls (historical, contest-CUDA verified):

Metric	Value	Detail
Lane G v3 contest-CUDA	1.05	694,074 bytes
Modal T4 reproduction	1.04	drift 0.01 within noise
Fallback floor (Lane A)	1.15	694,074 bytes
Era 2 baseline (CUDA-true)	0.90	293 KB

Recipe: dilated-h64 neural renderer + KL distill weight=0.002 + pose TTO retry on Lane A anchor. PoseNet 0.0034 / SegNet 0.0040 / Rate 0.0185. This lane is now a historical control: PR100 supersedes it as the exact public-supplement frontier.

dilated-h64 renderer · CRF=50 mask · KL distill T=2.0 weight=0.002 · pose TTO from baseline poses · contest-CUDA inflate.sh → upstream/evaluate.py

Era 1 - codec + tiny CNN post-filter (historical Track B arc):

Metric	Value	Detail
Track B current_workflow	1.73	864,167 bytes
Track B rule_faithful	1.795	966,071 bytes
Delta vs published baseline	-2.66	4.39 → 1.73
Delta vs prior floor	-0.11	-1 byte

Track A remains separate: `current_workflow` 0.00 at 167 bytes. The robust run ledger currently contains 46 measured runs, 18 promotions, and 18 explicit rejections. The Era 1 floor (1.73) was eclipsed by Era 2 once the lab abandoned the codec entirely; the AV1+post-filter mathematical investigation (Jacobian rank-1, CNN residual mid-frequency analysis) is preserved below as historical narrative.

robust_current · libsvtav1 · 524x394 · film-grain=22 · lanczos/long1000-qat-ema-postfilter-h64 · long1000 QAT+EMA learned int8 post-filter h64

What the score means

The challenge score combines archive size with task distortion. Lower is better, but the score is not a generic visual-quality metric: it rewards preserving the signals that the scorer models actually use.

Archive bytes

Smaller is better, but only if the reconstructed video still preserves the task signal that the scorer measures.

PoseNet distortion

This term measures how much the compressed video changes driving-related pose outputs. Lower is better.

SegNet distortion

This term measures disagreement on segmentation labels. At the current operating point, tiny SegNet regressions matter a lot.
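
This page does not spell out how the three terms combine. The sketch below assumes the weighting recalled from the upstream challenge README: 100 × SegNet distortion, plus the square root of 10 × PoseNet distortion, plus 25 × rate, where rate is archive bytes divided by the original clip size. Treat the weights and the function name `contest_score` as assumptions and check them against upstream/evaluate.py. The inputs are the Era 2 component values quoted on this page.

```python
import math


def contest_score(seg_dist: float, pose_dist: float, rate: float) -> float:
    """Combine the three terms into one score (lower is better).

    ASSUMPTION: the weights 100, 10 and 25 are recalled from the upstream
    challenge README; they are not stated on this page.
    """
    seg_term = 100.0 * seg_dist             # SegNet label disagreement
    pose_term = math.sqrt(10.0 * pose_dist)  # PoseNet output distortion
    rate_term = 25.0 * rate                  # archive bytes / original bytes
    return seg_term + pose_term + rate_term


# Era 2 components as listed on this page: PoseNet 0.0034, SegNet 0.0040, Rate 0.0185.
era2 = contest_score(seg_dist=0.0040, pose_dist=0.0034, rate=0.0185)
print(round(era2, 2))  # 1.05
```

Under these assumed weights the Era 2 components land on the Lane G v3 figure of 1.05 reported above, which is some evidence that the weighting is the right one, though not a substitute for the exact auth-eval JSON.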

Score vs bytes

Track B runs only. Better runs move toward the lower-left. The x-axis uses log scaling. The y-axis is linear. The severe AV1 bug run at 97.45 is omitted here so the operating range stays legible; it remains documented in the search-path section.

Legend: promotion, explicit rejection, measured run.

Lower is better. The plot is generated directly from scorer-backed artifacts in the repository.

Trajectory and branch points

The x-axis is turning-point order. The y-axis is actual current_workflow score. Lower is better. The AV1 failure is called out as an off-scale diagnostic spike rather than squeezed into the honest operating range.

Legend: mainline improvements, rejected side branches, diagnostic off-scale event.

This chart is selective by design. It shows the milestones that changed the operating point or changed the lab’s understanding of the evaluator.

Top runs explorer

Compare historical internal visual clips against the source clip. Drag to move the zoom window, change its size, and keep every pane frame-synced.

These GIF/MP4 assets are judge-facing illustrations, not score authority. The PR100 score claim remains the exact auth-eval JSON and release manifest. Official reference: comma.ai leaderboard and the public PR stream in the official challenge repo.

Exact local frontier

PR100 adapter: 0.228269 (A++ T4) · Bytes: 178,981 (archive closed) · Boundary: exact JSON only (no body-score ranking)

Panes: Original and 1.73 current floor (long1000 h64), each with a zoom view.

Selected run: 1.73 current floor · score 1.73 · long1000 h64.

How to read this lab

Evaluator path

The scorer resizes frames and measures task distortion. It is sensitive to pipeline details, not just visual appearance.

PoseNet sees both frames. SegNet sees only the last frame in each pair. That asymmetry is why small encoding or decode-path changes can move the score.
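
The asymmetry can be made concrete with a small sketch. It assumes frames are grouped into non-overlapping consecutive pairs; the page does not state the pairing rule, and `evaluator_inputs` is a hypothetical helper, not a function from the upstream evaluator.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def evaluator_inputs(
    frames: Sequence[Frame],
) -> Iterator[Tuple[Tuple[Frame, Frame], Frame]]:
    """Yield (posenet_input, segnet_input) for each frame pair.

    ASSUMPTION: pairs are non-overlapping and consecutive (0-1, 2-3, ...).
    A trailing unpaired frame, if any, is ignored.
    """
    for i in range(0, len(frames) - 1, 2):
        first, last = frames[i], frames[i + 1]
        # PoseNet receives both frames; SegNet receives only the last one.
        yield (first, last), last


# Toy usage: string labels stand in for decoded image frames.
samples = list(evaluator_inputs(["f0", "f1", "f2", "f3"]))
for pose_in, seg_in in samples:
    print("PoseNet:", pose_in, "| SegNet:", seg_in)
```

Under this assumed pairing, an error confined to the first frame of a pair can move only the PoseNet term, while an error in the last frame can move both terms.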

Critical bug

The main AV1 failure was a byte-layout error, not a codec-limit problem.

The failed path emitted rawvideo as yuv444p bytes. The corrected path forces rgb24, which matches the evaluator’s raw-frame expectation.
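
A toy illustration of why this class of bug is easy to miss: a planar 4:4:4 buffer and a packed 24-bit buffer have exactly the same byte count, so a size check passes while the pixel contents are scrambled. The sketch isolates the layout difference only. It ignores the YUV/RGB color conversion that a real yuv444p stream also involves, and the helper names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def pack_interleaved(img: Image) -> bytes:
    """Packed layout (as in rgb24): three channel bytes per pixel, row-major."""
    return bytes(v for row in img for px in row for v in px)


def pack_planar(img: Image) -> bytes:
    """Planar layout (as in yuv444p): all of channel 0, then 1, then 2."""
    return bytes(px[c] for c in range(3) for row in img for px in row)


def read_interleaved(buf: bytes, width: int, height: int) -> Image:
    """Decode a buffer the way a reader expecting packed 24-bit pixels would."""
    it = iter(buf)
    return [[(next(it), next(it), next(it)) for _ in range(width)]
            for _ in range(height)]


# 2x2 toy frame with distinct values so any misplacement is visible.
frame: Image = [[(10, 20, 30), (11, 21, 31)],
                [(12, 22, 32), (13, 23, 33)]]

good = read_interleaved(pack_interleaved(frame), 2, 2)
bad = read_interleaved(pack_planar(frame), 2, 2)

print(len(pack_interleaved(frame)) == len(pack_planar(frame)))  # True
print(good == frame)  # True
print(bad == frame)   # False
```

Because the two buffers are the same length, the mismatch surfaces only as task distortion downstream, not as a decode error.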

Current operating point

The current exact public-supplement frontier is 0.22826947142244708 at 178,981 bytes.

The score claim is tied to archive SHA afd53348f50303bf0ec6a7ffecc1ac037df2f1c70745244b9c45c72e8eb80641 and runtime tree ef6323533666c9cac1c204a9d3f7054157d44a185b16fc859fb3f0438ccd1832.
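
A minimal sketch of checking a local archive against this claim. It assumes the 64-character archive SHA is a SHA-256 digest of the raw archive.zip bytes, which the page does not state explicitly; `archive_fingerprint` and `matches_claim` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Tuple

# Values quoted on this page.
EXPECTED_SHA256 = "afd53348f50303bf0ec6a7ffecc1ac037df2f1c70745244b9c45c72e8eb80641"
EXPECTED_BYTES = 178_981


def archive_fingerprint(path: str) -> Tuple[str, int]:
    """Return (hex digest, byte count) for the file at `path`.

    ASSUMPTION: the published archive SHA is SHA-256 over the raw file bytes.
    """
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest(), len(data)


def matches_claim(path: str,
                  sha256: str = EXPECTED_SHA256,
                  size: int = EXPECTED_BYTES) -> bool:
    """True only if both the digest and the byte count match the claim."""
    digest, n_bytes = archive_fingerprint(path)
    return digest == sha256 and n_bytes == size
```

A call such as `matches_claim("archive.zip")` would confirm the archive half of the archive/runtime pair. The runtime tree SHA needs its own check, which this sketch does not attempt.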

Why 1.73 beat 1.84

Floor	Score	Bytes	Recipe
prior floor	1.84	864,168	weighted ensemble h32 + MC 75/25
current floor	1.73	864,167	long1000 QAT+EMA learned int8 post-filter h64

Metric	Prior	Current	Delta
current_workflow score	1.84	1.73	-0.11
archive bytes	864,168	864,167	-1
PoseNet distortion	0.04678315	0.03317023	-0.01361292
SegNet distortion	0.00581610	0.00575544	-0.00006066

The new floor left archive size essentially unchanged (one byte smaller) and cut PoseNet distortion by about 29% (0.04678315 to 0.03317023), while SegNet distortion moved only in the fifth decimal place.
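
To see which term accounts for the 0.11 drop, the sketch below converts the table's distortion values into score contributions. It assumes the weighting recalled from the upstream challenge README (100 × SegNet, square root of 10 × PoseNet); those weights are not stated on this page, and the one-byte rate change is ignored as negligible.

```python
import math
from typing import Tuple


def distortion_terms(seg_dist: float, pose_dist: float) -> Tuple[float, float]:
    """Return (SegNet term, PoseNet term) of the score.

    ASSUMPTION: weights recalled from the upstream README, not from this page.
    """
    return 100.0 * seg_dist, math.sqrt(10.0 * pose_dist)


# Distortion values from the comparison table above.
prior_seg, prior_pose = distortion_terms(0.00581610, 0.04678315)
curr_seg, curr_pose = distortion_terms(0.00575544, 0.03317023)

seg_delta = curr_seg - prior_seg
pose_delta = curr_pose - prior_pose

print(f"SegNet term delta:  {seg_delta:+.4f}")
print(f"PoseNet term delta: {pose_delta:+.4f}")
print(f"Combined delta:     {seg_delta + pose_delta:+.4f}")
```

Under these assumed weights the PoseNet term contributes roughly -0.108 and the SegNet term roughly -0.006, which together sit close to the -0.11 reported in the table.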

Local neighborhood

This table isolates the local AV1 neighborhood around the promoted floor. It shows which nearby changes improved the result and which did not.

Variant	Changed axis	Score	Bytes	Δ score	Δ bytes	Verdict
crf 33	crf 33	2.20	920,457			promoted
crf 34	crf 34	2.19	864,455	-0.01	-56,002	promoted
crf 35	crf 35	2.21	808,036	+0.02	-56,419	rejected
unsharp 0.30	postfilter unsharp=9:9:0.30:9:9:0.0	2.20	864,455	-0.01	+56,419	rejected
film-grain 0	film-grain 0	3.33	719,096	+1.13	-145,359	rejected
522x392	geometry 522x392	2.23	862,238	+0.04	-2,217	rejected
lanczos upscale	upscale lanczos	2.18	864,455	-0.01	+0	promoted
crf 34	crf 34	2.08	864,168	-0.10	-287	promoted
consensus stack	crf33 + scd0 + hqdn3d	2.13	909,307	+0.05	+45,139	rejected
ROI preprocess	corridor preprocessing	2.52	785,302	+0.39	-124,005	rejected
grain mask	decode saliency-masked grain synthesis	2.30	716,797	-0.22	-68,505	rejected
522x392	geometry 522x392	2.05	861,986	-0.14	-2,469	promoted

References

Primary artifacts

Turning points

Initial honest floor — robust_current-baseline-cpu-2026-04-03 (4.06)
Era 1 postfilter floor — robust_current-long1000-h64-promoted-cpu-2026-04-09 (1.73)
Era 2 neural renderer controls — 0.90 CUDA baseline and Lane G v3 at 1.05
C067 public-floor reproduction — exact T4 0.31561703078448233
Current exact frontier — PR100 HNeRV-LC-v2 adapter replay (0.22826947142244708)

The landing page is the guided entry point. Full depth lives in the linked notebook, packet, release manifest, and evidence index.

A lab notebook for compression that machines can trust and people can follow.
