Don't argmax; Distribution match

I mentioned this briefly in a previous post, but thought I should expand on it a little. Basically, using argmax objectives, as in AIXI or many RL systems, is intrinsically exceptionally bad from an alignment perspective due to the standard and well-known issues of Goodharting, ignoring uncertainty, etc. There have... [Read More]
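
For a rough sense of the contrast the title points at, here is a minimal sketch on a toy discrete-action setting: an argmax objective always exploits the single highest-scoring action under a proxy reward, while distribution matching samples actions in proportion to exp(reward / T). The proxy scores, action set, and temperature here are illustrative assumptions, not details from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a learned reward model scores a discrete set of actions.
# The scores are only a noisy proxy for what we actually care about.
actions = np.arange(10)
proxy_reward = rng.normal(size=10)   # illustrative proxy scores

# Argmax objective: always pick the single highest-scoring action.
# Any error in the proxy at its maximum gets exploited every time (Goodharting).
argmax_action = actions[np.argmax(proxy_reward)]

# Distribution matching: sample actions in proportion to exp(reward / T),
# i.e. match a target Boltzmann distribution instead of maximizing.
T = 1.0                              # temperature (illustrative)
p = np.exp(proxy_reward / T)
p /= p.sum()
sampled_action = rng.choice(actions, p=p)

print(argmax_action, sampled_action)
```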

AGI will have learnt reward models.

There has been a lot of debate and discussion recently in the AI safety community about whether AGI will likely optimize for fixed goals or be a wrapper mind. The term wrapper mind is largely a restatement of the old idea of a utility maximizer, with AIXI as a canonical... [Read More]

Why not just stop FOOM?

AI alignment given FOOM seems exceptionally challenging in general. This is fundamentally because we have no reasonable bounds on the optimization power a post-FOOM agent can apply. Hence, for all we know, such an agent could go arbitrarily far off-distribution, which would destroy our proxies, and could also defeat any method... [Read More]

Deconfusing direct vs amortized optimization

Here, I want to present a new frame on different types of optimization, with the goal of helping deconfuse some of the discussions in AI safety around questions like whether RL agents directly optimize for reward, and whether generative models (i.e. simulators) are likely to develop agency. The key distinction... [Read More]
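
As a rough illustration of the distinction the title gestures at, here is a minimal sketch on a toy quadratic objective: direct optimization runs an optimizer from scratch on each new problem instance at inference time, while amortized optimization spends compute up front learning a mapping from problem to solution and then answers new instances with a single cheap pass. The helper names, and the least-squares fit standing in for a trained network, are illustrative assumptions rather than details from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: for a given context c, find x minimizing (x - c)^2.
def solve_direct(c, steps=100, lr=0.1):
    """Direct optimization: run gradient descent per instance."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2 * (x - c)   # gradient of (x - c)^2
    return x

# Amortized optimization: learn a mapping context -> solution from many
# solved instances, then reuse it instead of re-optimizing each time.
contexts = rng.normal(size=1000)
solutions = np.array([solve_direct(c) for c in contexts])
# Fit x ≈ w * c + b by least squares (a stand-in for training a network).
A = np.stack([contexts, np.ones_like(contexts)], axis=1)
w, b = np.linalg.lstsq(A, solutions, rcond=None)[0]

def solve_amortized(c):
    return w * c + b

c_new = 3.7
print(solve_direct(c_new), solve_amortized(c_new))
```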

Intellectual progress in 2021

Overall, 2021 was a much less productive year of growth than 2020. The main reason for this is that, in retrospect, 2020 was exceptional: for the first time, in Chris Buckley's lab in Sussex, I had proper research mentorship and direction, and I also lucked... [Read More]