Orthogonality is expensive

A common assumption about AGI is the orthogonality thesis, which argues that goals/utility functions and the core intelligence of an AGI system are orthogonal or can be cleanly factored apart. More concretely, this perfect factoring occurs in model-based planning algorithms where it is assumed that we have a world model,... [Read More]

Against ubiquitous alignment taxes

It is often argued that any alignment technique that works primarily by constraining the capabilities of an AI system to be within some bounds cannot work because it imposes too high an ‘alignment tax’ on the ML system. The argument is that people will either refuse to apply any method... [Read More]

Fingerprinting LLMs with their unconditioned distribution

When playing around with the OpenAI playground models, I noticed that something very interesting occurs if we study the unconditioned distribution of the models. LLMs are generative models that try to learn the full joint distribution of tokens across text data on the internet and are trained with an autoregressive objective... [Read More]