Fingerprinting LLMs with their unconditioned distribution

When playing around with the OpenAI playground models, I noticed something very interesting occurs if we study the unconditioned distribution of the models. LLMs are generative models that try to learn the full joint distribution of tokens across text data on their internet and are trained with an autoregressive objective... [Read More]

The solution to alignment is many not one

The goal of this post is to argue against a common implicit assumption I see people making – that there is, and must be one single solution to alignment such that when we have this solution alignment is 100% solved, and while we don’t have such a solution, we are... [Read More]

Intellectual progress in 2022

2022 has been an interesting year. Perhaps the biggest change is that I left academia and started getting serious about AI safety. I am now head of research at Conjecture, a London-based startup with the mission of solving alignment. We are serious about this and we are giving it our... [Read More]