valo.email · comma video compression challenge · writeup may 04, 2026 · valtteri valo

a thirty-day compression log

kitchen
sink.

final submitted PR #105 scored 0.197974 on the CPU runner. the long route there went through exact masks, tiny control labels, test-time Adam, one DALI scare, and finally a small HNeRV latent sidecar.

a single SegNet mask from the encoded stream: the actual compressed object. mask 300 / 600 · 5 classes · 384 × 512

0.197974 · kitchen_sink PR #105 · final submitted CPU route

00 · last-hour update

the ending moved twice.

by may 2, the hand-built semantic codec was ahead of every legit-looking public PR. then HNeRV crushed the rate term. the final sprint became a different problem: tune that representation against the exact runner, squeeze the latent table, and ship before the board moved again.

submitted cpu: PR #105 · 0.197974 confirmed
dali fallback: 0.209457
late public bar: PR #102 · 0.19499

hand-built phase · 0.257328

kitchen_sink: exact masks, pose deltas, FP4 weights, selector tails, and test-time Adam. this was the best score before the HNeRV wave.

submitted · PR #105 · 0.197974

HNeRV schema v2 plus a one-code latent sidecar. full raw output confirmed on the CPU runner. this became kitchen_sink.

fallback · PR #106 · 0.209457

DALI-target fine-tune with sidecar. slower to win, but useful after the PyAV versus DALI target mismatch burned a day.

late public bar · 0.19499

PR #102 landed after #100 by changing only decode constants. #103 claimed 0.19487 with a repack. the final board was moving until the buzzer.

01 · the score function

don't compress
the pixels.

score is bytes plus two model-disagreement terms. SegNet only looks at frame 1 and cares about pixel labels. PoseNet looks at both frames and cares about motion estimates. the useful object kept changing: first masks, then optimizer controls, then HNeRV latents.

baseline svt-av1: 4.39
semantic stack: 0.2573
kitchen_sink PR #105: 0.1980
late public bar: 0.19499

score = 100 · seg + 25 · rate + √(10 · pose)
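the formula fits in a few lines of python. this is a minimal sketch: the function and argument names are made up, how each term is measured and normalized is not spelled out here, and the inputs are invented numbers, not results.

```python
import math

def score(seg: float, rate: float, pose: float) -> float:
    """score = 100*seg + 25*rate + sqrt(10*pose), term by term as in the formula above."""
    return 100.0 * seg + 25.0 * rate + math.sqrt(10.0 * pose)

# invented inputs, only to show how the three terms trade off
base = score(seg=0.001, rate=0.004, pose=0.0001)

# halving pose shrinks the pose term by a factor of 1/sqrt(2), about 29%,
# while halving seg or rate would shrink their terms by 50%
half_pose = score(seg=0.001, rate=0.004, pose=0.00005)
```

the square root is why pose error gets cheaper to leave on the table as it shrinks, while every byte and every mislabeled pixel keeps costing the same.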

02 · the race

my best vs the public board.

log scale. rust line is my local best. cream line is the public board. the chart used to end with a clean 0.257 lead. the final day made that look ancient.

submitted low: 0.197974
late public bar: 0.19499
state: submitted

legend: my best (local) · leaderboard best · field ahead · i'm ahead

03 · april 18

PR #55.

i had been making small wins all week. then Quantizr dropped 0.33 in one PR and made me shit my pants a bit.

before · my best 0.356 · public best 0.60

after · public 0.33 · ruh roh i'm losing for the first time

04 · the trick worth keeping

SegNet only reads f1.
PoseNet reads both.

the gap was worth about 0.06 of score. f1 had to look right because segmentation was scored there. f0 didn't have to look like anything specific, as long as the (f0, f1) pair told PoseNet the right motion. so f1 spent its capacity on segmentation, and f0 ended up as a pose sponge that absorbed leftover error once f1 was already locked in.

pose stages: 2 (prepass + repair)
seg stages: 1 (f1 refine)
asymmetry size: ~0.06 score

f1 serves segmentation, f0 serves pose. inflate runs a short pose-only prepass on f0, then 75 to 86 Adam steps on f1 against the exact mask, then 8 more pose-only steps on f0 to clean up whatever pose error is left.

two-stage f0, one-stage f1. the post-f1 f0 repair was the late mechanism that paid. by the time it ran, f1 was already decided, so f0 could chase pose error without dragging segmentation backwards.
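the stage order can be sketched on a toy problem. this is not the inflate code: f0 and f1 are single floats instead of frames, the two losses are quadratic stand-ins for PoseNet and SegNet, and the `Adam` class, `inflate_pair`, the learning rate, and the prepass length of 8 are all made up for illustration. only the stage order, an f1 step count inside 75 to 86, and the 8 repair steps come from the text.

```python
import math

class Adam:
    """minimal scalar Adam, just enough for a toy schedule."""
    def __init__(self, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = 0.0
        self.v = 0.0
        self.t = 0

    def step(self, x, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def pose_loss(f0, f1, motion):
    # stand-in for PoseNet: the pair should encode `motion` as f1 - f0
    return (f1 - f0 - motion) ** 2

def pose_only(f0, f1, motion, steps):
    """pose-only Adam steps on f0; f1 is read but never written."""
    opt = Adam()
    for _ in range(steps):
        grad_f0 = -2.0 * (f1 - f0 - motion)
        f0 = opt.step(f0, grad_f0)
    return f0

def inflate_pair(seg_target, motion, prepass_steps=8, f1_steps=80, repair_steps=8):
    f0, f1 = 0.0, 0.0
    # stage 1: short pose-only prepass on f0
    f0 = pose_only(f0, f1, motion, prepass_steps)
    # stage 2: f1 chases the segmentation target (stand-in for the exact mask)
    opt = Adam()
    for _ in range(f1_steps):
        f1 = opt.step(f1, 2.0 * (f1 - seg_target))
    f1_locked = f1
    pose_before = pose_loss(f0, f1, motion)
    # stage 3: pose-only repair on f0, with f1 already decided
    f0 = pose_only(f0, f1, motion, repair_steps)
    pose_after = pose_loss(f0, f1, motion)
    return {
        "f0": f0,
        "f1": f1,
        "f1_before_repair": f1_locked,
        "pose_before_repair": pose_before,
        "pose_after_repair": pose_after,
    }

result = inflate_pair(seg_target=1.0, motion=0.3)
```

in the toy, stage 2 moves f1 and wrecks whatever pose agreement the prepass bought. the repair stage wins some of it back without touching f1, which is the point of running it last.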

05 · kitchen_sink archive

231,891 bytes.
most of it masks.

this is the hand-built semantic codec, not the final HNeRV branch. a 12-byte header points to four streams.

zip overhead: 100 B
header: 12 B (3 × uint32)
raw output: 3.66 GB
compression ratio: 15,793×
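one way a 12-byte header of three uint32 values can address four streams is to store three lengths and let the fourth stream take the remainder. the sketch below assumes exactly that, plus little-endian byte order. the real field meaning, order, and endianness are not stated here, and `split_streams` and `join_streams` are hypothetical names.

```python
import struct

# assumption: three little-endian uint32 lengths, 12 bytes total
HEADER = struct.Struct("<3I")

def join_streams(streams):
    """write the three leading lengths, then the four streams back to back."""
    s0, s1, s2, s3 = streams
    return HEADER.pack(len(s0), len(s1), len(s2)) + s0 + s1 + s2 + s3

def split_streams(payload: bytes):
    """read three lengths from the header; the fourth stream is whatever is left."""
    n0, n1, n2 = HEADER.unpack_from(payload, 0)
    body = memoryview(payload)[HEADER.size:]
    if n0 + n1 + n2 > len(body):
        raise ValueError("header lengths exceed payload size")
    a = n0
    b = a + n1
    c = b + n2
    return [bytes(body[:a]), bytes(body[a:b]), bytes(body[b:c]), bytes(body[c:])]

# dummy streams, not real data
demo = [b"mask-bytes", b"pose", b"sel", b"weights"]
packed = join_streams(demo)
unpacked = split_streams(packed)
```

storing lengths instead of offsets keeps the header at 12 bytes for four streams, since the last length is implied by the payload size.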

06 · the m stream · 169,686 bytes · 73.2%

the compressed object
is the mask cache.

600 frames at 384 × 512, 5 classes, encoded losslessly with a sparse-CTW Markov model and brotli. the optimizer always knew exactly what f1 was supposed to be: argmax-equal to this. below, the actual decoded masks evenly spaced through the corpus.

bytes per frame: 282.8 (mean)
codec: sparse-CTW M11 + brotli
encoder: seg_sparse_m11_codec.py

48 mask thumbnails, evenly spaced through 600 frames
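for scale, here is the arithmetic behind 282.8 bytes per frame, compared with the naive cost of storing every label at a fixed log2(5) bits. the constants come from the numbers above; the variable names are made up, and the naive baseline is only a reference point, not a codec anyone used.

```python
import math

FRAMES = 600
HEIGHT, WIDTH = 384, 512
CLASSES = 5
M_STREAM_BYTES = 169_686

pixels = HEIGHT * WIDTH  # labels per mask

# fixed-rate cost if every label were coded independently at log2(5) bits
naive_bytes_per_frame = pixels * math.log2(CLASSES) / 8

actual_bytes_per_frame = M_STREAM_BYTES / FRAMES
bits_per_label = actual_bytes_per_frame * 8 / pixels
gain_over_naive = naive_bytes_per_frame / actual_bytes_per_frame

print(f"{actual_bytes_per_frame:.1f} B/frame, "
      f"{bits_per_label:.4f} bits/label, "
      f"{gain_over_naive:.0f}x under fixed-rate labels")
```

that works out to roughly 57 KB per mask at fixed rate against 282.8 B actual, about 0.0115 bits per label, or a bit over 200× from the context model plus brotli.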

07 · the p stream · 3,133 bytes · 1.4%

3,133 bytes for the motion.

a 6-DOF pose vector per frame, encoded as deltas with a per-dimension Laplace coder behind a range coder. 5.22 bytes per frame, lossless. forward speed is the dominant dimension by a lot. chart below is dim 0, vehicle speed in m/s, across 600 frames.

bytes per frame: 5.22
codec: delta + Laplace + range coder
lossless: yes
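a sketch of the delta plus Laplace idea, under assumptions: pose values are treated as already quantized to integers, the Laplace model is discretized as a two-sided geometric distribution, and instead of a full range coder the code reports the ideal code length a range coder would approach. all function names are hypothetical and the sample rows are invented.

```python
import math

def delta_encode(rows):
    """rows: per-frame integer vectors. the first row is coded as a delta from zero."""
    prev = [0] * len(rows[0])
    out = []
    for row in rows:
        out.append([x - p for x, p in zip(row, prev)])
        prev = row
    return out

def delta_decode(deltas):
    """inverse of delta_encode: running sum per dimension."""
    prev = [0] * len(deltas[0])
    out = []
    for d in deltas:
        prev = [p + x for p, x in zip(prev, d)]
        out.append(prev)
    return out

def fit_theta(values):
    """maximum-likelihood theta for P(k) = (1-theta)/(1+theta) * theta**abs(k)."""
    a = sum(abs(k) for k in values) / len(values)  # mean absolute delta
    if a == 0:
        return 1e-6  # all zeros: keep theta strictly inside (0, 1)
    return (math.sqrt(1.0 + a * a) - 1.0) / a

def laplace_bits(values, theta):
    """ideal code length in bits for `values` under the two-sided geometric model."""
    log_norm = math.log2((1.0 - theta) / (1.0 + theta))
    log_theta = math.log2(theta)
    return -sum(log_norm + abs(k) * log_theta for k in values)

# invented 2-dim example: a slowly rising "speed" and a small jittery second dim
rows = [[120, 0], [121, 1], [123, 0], [124, -1], [126, 0]]
deltas = delta_encode(rows)

# one model per dimension; skip the first row, which carries the absolute start value
columns = list(zip(*deltas[1:]))
total_bits = sum(laplace_bits(col, fit_theta(col)) for col in columns)
```

fitting one theta per dimension matters when one dimension dominates, as forward speed does here: a shared parameter would overspend on the quiet dimensions.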

08 · the r stream · 691 bytes · 0.3%

691 bytes, 600 decisions,
one optimizer to steer.

selector tails are a tiny side-channel that nudges the optimizer pair by pair. less than one byte per pair. they steer which basin the search lands in. three of them stack here.

tails: 4 packed selectors
magic: PSEL4
density: 0.92 B per decision

F1S4B · f1 seed selector · 4 bits / pair · 600 pairs · 307 B · simulated no-rate Δ −0.01080

PSEL3 · pose target factor · 3 bits / pair · 600 pairs · 232 B · simulated net Δ −0.00693

LCT3B · loss control selector · 3 bits / batch · 100 batches · 45 B · 8 actions · re-weights pose vs seg per batch
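packing selectors at a few bits each is plain bit-packing. a minimal sketch, assuming MSB-first order and zero padding to a whole byte; the actual bit order and container layout are not given here, and both function names are hypothetical.

```python
def pack_selectors(values, bits):
    """pack unsigned ints, `bits` bits each, MSB-first, zero-padded to a whole byte."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"selector {v} does not fit in {bits} bits")
        acc = (acc << bits) | v
    total_bits = bits * len(values)
    pad = (-total_bits) % 8
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")

def unpack_selectors(data, bits, count):
    """inverse of pack_selectors; `count` is needed because padding hides the length."""
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << bits) - 1
    return [(acc >> (total - bits * (i + 1))) & mask for i in range(count)]

# invented selector values with the shapes listed above
seed_sel = [i % 16 for i in range(600)]   # 4 bits / pair
pose_sel = [i % 8 for i in range(600)]    # 3 bits / pair
loss_sel = [i % 8 for i in range(100)]    # 3 bits / batch

sizes = [len(pack_selectors(seed_sel, 4)),
         len(pack_selectors(pose_sel, 3)),
         len(pack_selectors(loss_sel, 3))]
```

the raw packed payloads come to 300, 225, and 38 bytes. the listed sizes of 307, 232, and 45 bytes are each 7 bytes larger, which would be consistent with a small fixed header per tail, though the layout isn't specified here.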

09 · inflate

the decompressor was a search.

30 minutes on a T4, 600 raw pairs to write. after decoding the four streams, almost all of the wall-clock time goes into Adam: f0 prepass, f1 refinement against the exact mask, f0 repair. eight cells, left to right. this was the main trick of the semantic-codec phase. HNeRV later won rate in a cleaner way, but this is where the big private lead came from.

public-shaped wall: 1709.9 s
budget: 1800 s on T4
headroom: +90.1 s

10 · the dead lanes

most of what got tried didn't work.

i ran a ton of experiments and most of them didn't work. each one taught me something concrete though. the mechanism column is the part worth saving from each. some of the same ideas kept showing up wearing a different hat for days at a time.

experiment dirs: 216
submission attempts: 525
shown below: 30

11 · notes for next time

use the real scorer.

five things that would survive another competition. the late PyAV versus DALI mess is the boring one and also the most expensive one.

12 · where the name came from

everything but
the kitchen sink.

at peak grind, my submission directory names looked like this. each underscore-separated token is a different rate-lowering or scoring trick stacked on top of all the others. some made it into the final candidate. plenty didn't. by the end of week two it really was every trick i could think of, except the literal kitchen sink. so i shipped that too and called it kitchen_sink.

longest dir name: 139 chars
final dir name: 30 chars
final ship name: kitchen_sink

13 · kitchen_sink manifest

what survived that branch.

no source payload in the archive. one zip member. 600 raw pairs from cold under the public-shaped 30-minute clock. score is the mean of two clean local rows. sha-256 below.

exact masks. pose deltas. packed weights. tiny selector tails. a search that ran after unpacking. some discipline about full rows and public-shaped timing. and a graveyard underneath.

kitchensink.