CEM as inference

Author’s note: I originally wrote this draft in mid-2020 as the maths for a paper I never got around to writing. I think it may be somewhat valuable, but I am primarily archiving it for historical reasons.

The Paperclip King, by GPT4

I got access to the GPT4 API yesterday and was playing around. GPT4 managed to zero-shot this entire greentext with nothing other than the prompt: “Please write a 4chan greentext about a self replicating probe that converts the universe into paperclips”.

Orthogonality is expensive

A common assumption about AGI is the orthogonality thesis, which argues that goals/utility functions and the core intelligence of an AGI system are orthogonal or can be cleanly factored apart. More concretely, this perfect factoring occurs in model-based planning algorithms where it is assumed that we have a world model,...

Against ubiquitous alignment taxes

It is often argued that any alignment technique that works primarily by constraining the capabilities of an AI system to be within some bounds cannot work because it imposes too high an ‘alignment tax’ on the ML system. The argument is that people will either refuse to apply any method...