Integer tokenization is now much less insane

Just over a year ago, I wrote about how integer tokenization using the GPT2 and GPT3 tokenizer was insane. This was because it failed to create a coherent number representation in token space: a large number of integers were each assigned a single unique token, and even multi-token integers were... [Read More]
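As a quick illustration of the kind of behavior that post describes, here is a minimal sketch using the open-source tiktoken library (which ships the GPT-2/GPT-3 BPE vocabulary); the specific integers chosen below are arbitrary examples, and the exact splits depend on the vocabulary:

```python
# Minimal sketch: inspect how the GPT-2 BPE (also used by GPT-3) splits integers.
# Assumes the `tiktoken` library is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("gpt2")

for n in [1, 42, 420, 2000, 2001, 12345, 999999]:
    token_ids = enc.encode(str(n))
    pieces = [enc.decode([t]) for t in token_ids]
    # Some integers map to a single unique token, while others are split
    # into arbitrary-looking chunks rather than consistent digit groups.
    print(f"{n:>8} -> {len(token_ids)} token(s): {pieces}")
```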

Alignment In The Age Of Synthetic Data

Synthetic data is a new frontier in AI training. Phi3, Llama3, and other recent models have demonstrated that large amounts of well-tailored synthetic data can significantly improve the performance of small models, bringing them closer to the frontier by cheaply and implicitly distilling from larger, more powerful models... [Read More]

Addendum to the Surprising Parameter Efficiency of Vision Models

In a post from last year, On the Surprising Parameter Efficiency of Vision Models, I discussed a question that had been puzzling me at the time: that image models appear to reach or exceed human parity with significantly fewer parameters than the brain seemingly uses. This... [Read More]