# experiment journal

## AV1 repair: byte-layout bug to working frontier

### prior baseline

- honest floor before AV1 repair: `3.25` (x265 at `424x318`)
- broken AV1 reproduction: `97.45`

### hypothesis

The public-style AV1 recipe was probably sound; the `97.45` score more likely came from an implementation error at the byte-layout boundary.

### estimated difference

If the evaluator was reading frames in the wrong raw byte layout, fixing the layout alone could recover the score into the low-2.x range without changing the high-level codec recipe.

### measured result

- broken AV1 path: `97.45`
- repaired AV1 path with explicit `rgb24`: `2.20`

### reflection

The hypothesis held strongly: making `rgb24` explicit took the score from `97.45` to `2.20`. The lesson was not “AV1 is magic” but that evaluator-facing byte correctness can completely dominate the outcome.
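To make the failure mode concrete, here is a minimal sketch of how a layout mismatch alone can scramble a frame. The journal does not record which wrong layout the broken path used, so the planar ordering below is purely a hypothetical stand-in, and the helper names (`packed_to_planar`, `read_as_rgb24`, `mismatched_pixels`) are illustrative rather than taken from the actual pipeline.

```python
def packed_to_planar(rgb24: bytes) -> bytes:
    """Reorder packed RGBRGB... bytes into planar RRR...GGG...BBB... (hypothetical wrong layout)."""
    return rgb24[0::3] + rgb24[1::3] + rgb24[2::3]


def read_as_rgb24(buf: bytes, width: int, height: int):
    """Interpret buf as packed rgb24 and return rows of (r, g, b) tuples."""
    expected = width * height * 3
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    pixels = [tuple(buf[i:i + 3]) for i in range(0, len(buf), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def mismatched_pixels(a, b) -> int:
    """Count pixel positions where two decoded frames disagree."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


# A 2x2 test frame: red, green, blue, white (packed rgb24).
frame = bytes([255, 0, 0,   0, 255, 0,   0, 0, 255,   255, 255, 255])

correct = read_as_rgb24(frame, 2, 2)
# Same pixel data and same byte count, but written in a different layout
# and then read as if it were packed rgb24.
scrambled = read_as_rgb24(packed_to_planar(frame), 2, 2)

print(mismatched_pixels(correct, scrambled), "of 4 pixels differ")
```

The byte count is identical in both cases, so a length check cannot catch this kind of error; only an explicit, agreed pixel format on both sides of the boundary does.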

## AV1 neighborhood probe: `crf33 -> crf34`

### prior baseline

- live AV1 floor: `2.20` at `920,457` bytes

### hypothesis

A one-step CRF increase might reduce bytes enough to improve total score, provided pose and seg distortions rose only slightly.

### estimated difference

- expected score improvement: about `0.01`–`0.03`
- expected reason: rate term would fall more than the distortion terms rose

### measured result

- candidate: `2.19` at `864,456` bytes
- canonical regression: `2.19` at `864,455` bytes
- byte delta: `-56,002` (`-6.084%`), canonical regression vs. prior floor
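The byte-delta figures in this journal can be reproduced with a small helper. This is a sketch; the function name `byte_delta` is hypothetical, and the percentage is assumed to be taken relative to the prior baseline, which matches the recorded numbers.

```python
def byte_delta(baseline_bytes: int, candidate_bytes: int):
    """Return (absolute delta in bytes, percent delta relative to the baseline)."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    delta = candidate_bytes - baseline_bytes
    percent = round(100.0 * delta / baseline_bytes, 3)
    return delta, percent


# crf33 floor (920,457 bytes) vs. the canonical crf34 regression (864,455 bytes)
delta, percent = byte_delta(920_457, 864_455)
print(f"byte delta: {delta:,} ({percent:.3f}%)")
```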

### reflection

The hypothesis held. Distortions rose slightly, but the byte drop was still worth it. This is a clean example of why small, single-axis experiments are valuable: with only one knob changed, the improvement can be attributed directly to the rate-versus-distortion trade.


## AV1 neighborhood probe: `crf34 -> crf35`

### prior baseline

- live AV1 floor: `2.19` at `864,455` bytes

### hypothesis

Another one-step CRF increase might still improve total score if byte savings continued to dominate.

### estimated difference

- expected score: about `2.17`–`2.20`
- expected reason: further rate reduction might still beat a moderate distortion increase

### measured result

- candidate: `2.21` at `808,036` bytes
- byte delta: `-56,419` (`-6.527%`)
- pose delta: `+0.00789357`
- seg delta: `+0.00024012`

### reflection

The hypothesis failed. Bytes fell by a similar amount again, but this time the distortion increase outweighed the savings and the score rose from `2.19` to `2.21`. That makes the local frontier shape much clearer: `crf34` appears to be near the useful knee, while `crf35` is past it.
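The knee can be read straight off the three measured CRF points. The sketch below uses only scores and byte counts recorded in this journal and assumes that a lower score is better, as the floor updates above imply; `find_knee` and `step_report` are hypothetical helper names.

```python
# (crf, score, bytes) as measured in the entries above
sweep = [
    (33, 2.20, 920_457),
    (34, 2.19, 864_455),
    (35, 2.21, 808_036),
]


def find_knee(points):
    """Return the CRF with the lowest score; ties go to the smaller file."""
    return min(points, key=lambda p: (p[1], p[2]))[0]


def step_report(points):
    """For each adjacent CRF step, return (from, to, score change, byte change)."""
    rows = []
    for (crf_a, score_a, bytes_a), (crf_b, score_b, bytes_b) in zip(points, points[1:]):
        rows.append((crf_a, crf_b, round(score_b - score_a, 2), bytes_b - bytes_a))
    return rows


for crf_a, crf_b, d_score, d_bytes in step_report(sweep):
    print(f"crf{crf_a} -> crf{crf_b}: score {d_score:+.2f}, bytes {d_bytes:+,}")
print("knee at crf", find_knee(sweep))
```

Both steps save a similar number of bytes, but only the first one lowers the score, which is what places the knee at `crf34`.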


## AV1 neighborhood probe: `unsharp 0.35 -> 0.30`

### prior baseline

- live AV1 floor: `2.19` at `864,455` bytes

### hypothesis

Slightly less aggressive decoder-side sharpening might reduce evaluator-facing artifacts enough to improve total score.

### estimated difference

- expected score: about `2.18`–`2.20`
- expected reason: slightly cleaner reconstructed structure with similar bytes

### measured result

- candidate: `2.20` at `864,455` bytes
- byte delta: `0` (`0.000%`)
- pose delta: `+0.00088620`
- seg delta: `+0.00000462`

### reflection

The hypothesis failed. Weakening the postfilter left the byte count unchanged, as expected for a decoder-side filter, and slightly worsened both task distortions. This helps show that the current `0.35` setting is not arbitrary: the nearest softer setting already scores worse.


## AV1 neighborhood probe: `film-grain 22 -> 0`

### prior baseline

- live AV1 floor: `2.19` at `864,455` bytes

### hypothesis

Disabling film-grain synthesis might reduce evaluator-facing synthetic-noise side effects enough to improve score.

### estimated difference

- expected score: about `2.16`–`2.24`
- expected reason: cleaner decoded frames, but with real risk because the strong public recipes all used film-grain synthesis

### measured result

- candidate: `3.33` at `719,096` bytes
- byte delta: `-145,359` (`-16.815%`)
- pose delta: `+0.41476892`
- seg delta: `-0.00010482`

### reflection

The hypothesis failed badly. Disabling film-grain saved about 17% of the bytes, but PoseNet distortion exploded (`+0.41`, versus roughly `+0.008` for the `crf35` step), while seg distortion barely moved. That strongly suggests film-grain synthesis is helping preserve task-relevant structure in this evaluator regime rather than merely adding cosmetic detail.
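The single-axis probes above follow a simple accept-or-reject rule against the live floor. The sketch below replays them with the recorded scores; it assumes a lower score is better and that a candidate is adopted only when it strictly beats the floor. The `replay` helper is hypothetical and is not part of the actual tooling.

```python
# (probe name, candidate score) in the order the probes were run
probes = [
    ("crf33 -> crf34", 2.19),
    ("crf34 -> crf35", 2.21),
    ("unsharp 0.35 -> 0.30", 2.20),
    ("film-grain 22 -> 0", 3.33),
]


def replay(probes, start_floor):
    """Apply each probe in order; adopt it only if it strictly beats the floor."""
    floor = start_floor
    decisions = []
    for name, candidate in probes:
        accepted = candidate < floor
        if accepted:
            floor = candidate
        decisions.append((name, accepted))
    return floor, decisions


final_floor, decisions = replay(probes, start_floor=2.20)
for name, accepted in decisions:
    print(f"{name}: {'accepted' if accepted else 'rejected'}")
print("live floor:", final_floor)
```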