kitchen_sink: exact masks, pose deltas, FP4 weights, selector tails, and test-time Adam. this was the best score before the HNeRV wave.
a thirty-day compression log
kitchen sink.
final submitted PR #105 scored 0.197974 on the CPU runner. the long route there went through exact masks, tiny control labels, test-time Adam, one DALI scare, and finally a small HNeRV latent sidecar.
00 · last-hour update
the ending moved twice.
by may 2, the hand-built semantic codec was ahead of every legit-looking public PR. then HNeRV crushed the rate term. the final sprint became a different problem: tune that representation against the exact runner, squeeze the latent table, and ship before the board moved again.
HNeRV schema v2 plus a one-code latent sidecar. full raw output confirmed on the CPU runner. this became kitchen_sink.
DALI-target fine-tune with sidecar. slower to win, but useful after the PyAV versus DALI target mismatch burned a day.
PR #102 landed after #100 by changing only decode constants. #103 claimed 0.19487 with a repack. the final board was moving until the buzzer.
01 · the score function
don't compress the pixels.
score is bytes plus two model-disagreement terms. SegNet only looks at frame 1 and cares about pixel labels. PoseNet looks at both frames and cares about motion estimates. the useful object kept changing: first masks, then optimizer controls, then HNeRV latents.
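a minimal sketch of that shape, assuming the three terms simply add. the weights, the normalisation by original size, and every name and number here are hypothetical; the real scorer defines its own.

```python
def score(archive_bytes, original_bytes, seg_disagreement, pose_disagreement,
          w_rate=1.0, w_seg=1.0, w_pose=1.0):
    """Toy score, lower is better. All weights are hypothetical placeholders."""
    rate = archive_bytes / original_bytes  # the "bytes" term
    return w_rate * rate + w_seg * seg_disagreement + w_pose * pose_disagreement


# two made-up candidates over the same made-up 1,000,000-byte original.
# one keeps the pixels faithful, the other only keeps the two models happy.
pixel_faithful = score(400_000, 1_000_000, seg_disagreement=0.01, pose_disagreement=0.01)
model_faithful = score(100_000, 1_000_000, seg_disagreement=0.02, pose_disagreement=0.02)

print(pixel_faithful, model_faithful)
```

in this toy, doubling both disagreement terms is still a win if it buys a 4x smaller archive. that is the whole argument for compressing what the models read instead of what the camera saw.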
02 · the race
my best vs the public board.
log scale. rust line is my local best. cream line is the public board. the chart used to end with a clean 0.257 lead. the final day made that look ancient.
03 · april 18
PR #55.
i had been making small wins all week. then Quantizr dropped 0.33 in one PR and made me shit my pants a bit.
04 · the trick worth keeping
SegNet only reads f1.
PoseNet reads both.
the gap was worth about 0.06 of score. f1 had to look right because segmentation was scored there. f0 didn't have to look like anything specific, as long as the (f0, f1) pair told PoseNet the right motion. so f1 spent its capacity on segmentation, and f0 ended up as a pose sponge that absorbed leftover error once f1 was already locked in.
f1 serves segmentation, f0 serves pose. inflate runs a short pose-only prepass on f0, then 75 to 86 Adam steps on f1 against the exact mask, then 8 more pose-only steps on f0 to clean up whatever pose error is left.
two-stage f0, one-stage f1. the post-f1 f0 repair was the late mechanism that paid. by the time it ran, f1 was already decided, so f0 could chase pose error without dragging segmentation backwards.
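a toy sketch of that three-stage schedule, not the real inflate. f0 and f1 are single floats standing in for frames, the two loss functions are hypothetical stand-ins for PoseNet and SegNet, and the 8-step prepass length and the 0.05 learning rate are assumptions (the source only says the prepass is short). Adam is written out by hand so the block has no dependencies.

```python
import math


class ScalarAdam:
    """Textbook Adam for one float."""

    def __init__(self, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m, self.v, self.t = 0.0, 0.0, 0

    def step(self, x, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def seg_loss(f1, mask_target):
    """Stand-in for SegNet disagreement: depends on f1 only."""
    return (f1 - mask_target) ** 2


def pose_loss(f0, f1, motion):
    """Stand-in for PoseNet disagreement: depends on the (f0, f1) pair."""
    return ((f1 - f0) - motion) ** 2


def pose_grad_f0(f0, f1, motion):
    """d(pose_loss)/d(f0)."""
    return -2.0 * ((f1 - f0) - motion)


def inflate_pair(mask_target, motion, prepass_steps=8, f1_steps=80, repair_steps=8):
    f0, f1 = 0.0, 0.0
    log = {}

    # stage 1: short pose-only prepass on f0, f1 untouched
    opt = ScalarAdam()
    for _ in range(prepass_steps):
        f0 = opt.step(f0, pose_grad_f0(f0, f1, motion))

    # stage 2: f1 chases the exact mask (80 sits inside the 75 to 86 range)
    opt = ScalarAdam()
    for _ in range(f1_steps):
        f1 = opt.step(f1, 2.0 * (f1 - mask_target))
    log["seg_before_repair"] = seg_loss(f1, mask_target)
    log["pose_before_repair"] = pose_loss(f0, f1, motion)

    # stage 3: pose-only repair on f0 with f1 frozen
    opt = ScalarAdam()
    for _ in range(repair_steps):
        f0 = opt.step(f0, pose_grad_f0(f0, f1, motion))
    log["seg_after_repair"] = seg_loss(f1, mask_target)
    log["pose_after_repair"] = pose_loss(f0, f1, motion)
    return f0, f1, log


f0, f1, log = inflate_pair(mask_target=1.0, motion=0.5)
print(log)
```

the point of the toy: stage 2 moves f1, which breaks whatever pose fit the prepass found. stage 3 can only touch f0, so the segmentation loss is identical before and after the repair while the pose loss drops.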
05 · kitchen_sink archive
231,891 bytes.
most of it masks.
this is the hand-built semantic codec, not the final HNeRV branch. a 12-byte header points to four streams.
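one way a 12-byte header can point at four streams, as a sketch. the layout is an assumption, not the real format: three little-endian uint32 end offsets, since four back-to-back streams only need three internal boundaries. the function names and the stream order are hypothetical too.

```python
import struct

# hypothetical layout: three uint32 boundaries, 3 * 4 = 12 bytes
HEADER = struct.Struct("<3I")


def pack_archive(streams):
    """Concatenate exactly four byte streams behind a 12-byte offset header."""
    if len(streams) != 4:
        raise ValueError("expected exactly four streams")
    ends, pos = [], 0
    for stream in streams[:3]:
        pos += len(stream)
        ends.append(pos)  # end offset of this stream within the body
    return HEADER.pack(*ends) + b"".join(streams)


def unpack_archive(blob):
    """Split a packed archive back into its four streams."""
    e0, e1, e2 = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:]
    return [body[:e0], body[e0:e1], body[e1:e2], body[e2:]]


# made-up payloads, just to show the round trip
streams = [b"masks" * 10, b"pose", b"sel", b"weights!"]
blob = pack_archive(streams)
print(HEADER.size, unpack_archive(blob) == streams)
```

if the 231,891 bytes are exactly the header plus the four streams, the stream not itemised in the sections that follow comes to 231,891 − 12 − 169,686 − 3,133 − 691 = 58,369 bytes, about 25.2% of the archive.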
06 · the m stream · 169,686 bytes · 73.2%
the compressed object is the mask cache.
600 frames at 384 × 512, 5 classes, encoded losslessly with a sparse-CTW Markov model and brotli. the optimizer always knew exactly what f1 was supposed to be: argmax-equal to this. below are actual decoded masks, evenly spaced through the corpus.
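some arithmetic on those numbers, to show how hard the mask stream is being squeezed. everything is derived from the figures quoted in this section; the only assumption is the baseline, which prices every pixel at a flat log2(5) bits with no modelling at all.

```python
import math

FRAMES, HEIGHT, WIDTH, CLASSES = 600, 384, 512, 5
M_STREAM_BYTES = 169_686

pixels = FRAMES * HEIGHT * WIDTH                    # 117,964,800 labels
naive_bytes = pixels * math.log2(CLASSES) / 8       # flat code, roughly 34.2 MB
bits_per_pixel = M_STREAM_BYTES * 8 / pixels        # roughly 0.0115 bits
bytes_per_frame = M_STREAM_BYTES / FRAMES           # roughly 283 bytes
ratio = naive_bytes / M_STREAM_BYTES                # roughly 200x

print(f"{bits_per_pixel:.4f} bits/pixel, {bytes_per_frame:.1f} bytes/frame, {ratio:.0f}x")
```

a full-resolution 5-class mask in under 300 bytes only works because consecutive masks are extremely predictable, which is what the context model is there to exploit.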
07 · the p stream · 3,133 bytes · 1.4%
3,133 bytes for the motion.
a 6-DOF pose vector per frame, encoded as deltas with a per-dimension Laplace coder behind a range coder. 5.22 bytes per frame, lossless. forward speed is the dominant dimension by a lot. chart below is dim 0, vehicle speed in m/s, across 600 frames.
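a sketch of why deltas plus a Laplace model are cheap, for one pose dimension. it assumes the values are already integers (some fixed quantisation step, not stated in the source) and uses a two-sided geometric distribution as the discrete Laplace. it only estimates the ideal code length; the actual range coder is not reproduced, and all names and sample values are hypothetical.

```python
import math


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Exact inverse of delta_encode: a running sum."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def laplace_bits(symbols):
    """Ideal code length in bits under P(k) = (1 - t) / (1 + t) * t ** abs(k),
    with t fitted by maximum likelihood from the mean absolute value."""
    mean_abs = sum(abs(s) for s in symbols) / len(symbols)
    if mean_abs == 0:
        return 0.0  # all zeros: the fitted model is certain
    t = (math.sqrt(1 + mean_abs * mean_abs) - 1) / mean_abs
    log_norm = math.log2((1 - t) / (1 + t))
    return -sum(log_norm + abs(s) * math.log2(t) for s in symbols)


# made-up quantised forward speeds for one dimension
speed_q = [1000, 1002, 1003, 1003, 1005, 1004, 1006]
deltas = delta_encode(speed_q)

raw_bits = laplace_bits(speed_q[1:])
delta_bits = laplace_bits(deltas[1:])   # skip the first delta, it carries the start value
print(delta_decode(deltas) == speed_q, round(raw_bits, 1), round(delta_bits, 1))
```

fitting one scale per dimension matters because the dimensions behave so differently: a dimension that barely moves gets a tight distribution and costs almost nothing, and the budget goes to the one that actually varies.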
08 · the r stream · 691 bytes · 0.3%
691 bytes, 600 decisions, one optimizer to steer.
selector tails are a tiny side-channel that nudges the optimizer pair by pair. the whole stream is 691 bytes over 600 pairs, about 1.15 bytes per pair, shared across the three tails that stack here. they steer which basin the search lands in.
09 · inflate
the decompressor was a search.
30 minutes on a T4, 600 raw pairs to write, so an average of 3 seconds per pair. after decoding the four streams, almost all of the wall-clock time goes into Adam: f0 prepass, f1 refinement against the exact mask, f0 repair. eight cells, left to right. this was the main trick of the semantic-codec phase. HNeRV later won rate in a cleaner way, but this is where the big private lead came from.
10 · the dead lanes
most of what got tried didn't work.
i ran a ton of experiments, and each dead one still taught me something concrete. the mechanism column is the part worth saving from each. some of the same ideas kept showing up for days at a time wearing a different hat.
11 · notes for next time
use the real scorer.
five things that would survive another competition. the late PyAV versus DALI mess is the boring one and also the most expensive one.
12 · where the name came from
everything but the kitchen sink.
at peak grind, my submission directory names looked like this. each underscore-separated token is a different rate-lowering or scoring trick stacked on top of all the others. some made it into the final candidate. plenty didn't. by the end of week two it really was every trick i could think of, except the literal kitchen sink. so i shipped that too and called it kitchen_sink.
13 · kitchen_sink manifest
what survived that branch.
no source payload in the archive. one zip member. 600 raw pairs from cold under the public-shaped 30-minute clock. score is the mean of two clean local rows. sha-256 below.
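a minimal sketch of how a manifest row like that can be produced: hash the archive in chunks and average the clean rows. the function names, the dict keys, and the demo bytes and scores are all hypothetical; the real digest and scores are not reproduced here.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=1 << 20):
    """Hex SHA-256 of a binary stream, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def manifest_row(archive_stream, row_scores):
    """Pair the archive digest with the mean of the full scoring rows."""
    return {
        "sha256": sha256_of(archive_stream),
        "score": sum(row_scores) / len(row_scores),
    }


# placeholder bytes and placeholder scores, not the real archive or results
demo = manifest_row(io.BytesIO(b"not the real archive"), [0.25, 0.75])
print(demo)
```

for a file on disk, pass `open(path, "rb")` instead of the `BytesIO`. pinning the digest next to the score is what makes a row checkable later: the number only means something if it is tied to the exact bytes that produced it.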
exact masks. pose deltas. packed weights. tiny selector tails. a search that ran after unpacking. some discipline about full rows and public-shaped timing. and a graveyard underneath.