Deconfusing direct vs amortized optimization

Here, I want to present a new frame on different types of optimization, with the goal of helping deconfuse some of the discussions in AI safety around questions like whether RL agents directly optimize for reward, and whether generative models (i.e. simulators) are likely to develop agency. The key distinction... [Read More]
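To make the distinction concrete, here is a minimal sketch (not taken from the post; the toy objective, optimizer, and linear model are all illustrative stand-ins): a direct optimizer spends compute at inference time searching for a good solution to each new problem instance, while an amortized optimizer pays that cost during training, fitting a function that maps problem instances straight to solutions in a single forward pass.

```python
import numpy as np

# Toy problem family: given a target t, find x minimizing f(x) = (x - t)^2.
# All names and numbers here are illustrative, not from the post.

def direct_optimize(t, steps=100, lr=0.1):
    """Direct optimization: run an optimizer at inference time for each new instance."""
    x = 0.0
    for _ in range(steps):
        grad = 2 * (x - t)   # d/dx (x - t)^2
        x -= lr * grad
    return x

def fit_amortized(ts, xs):
    """Amortized optimization: fit a parametric map from problems to solutions offline.
    A least-squares linear model stands in for a trained policy network."""
    A = np.stack([ts, np.ones_like(ts)], axis=1)
    w, *_ = np.linalg.lstsq(A, xs, rcond=None)
    return lambda t: w[0] * t + w[1]   # one cheap forward pass per new problem

# Amortize: train on solutions produced by the direct optimizer.
train_ts = np.linspace(-5.0, 5.0, 50)
train_xs = np.array([direct_optimize(t) for t in train_ts])
amortized_solve = fit_amortized(train_ts, train_xs)

print(direct_optimize(3.0))   # ~3.0, found by runtime search
print(amortized_solve(3.0))   # ~3.0, produced by the learned mapping
```

The safety-relevant reading, on this framing, is that an amortized policy executes a learned mapping rather than running a search over reward at runtime, which is one way to cash out the question of whether RL agents 'directly optimize for reward'.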

Intellectual progress in 2021

Overall, 2021 was a much less productive and growth-filled year than 2020. The main reason for this is that, in retrospect, 2020 was exceptional: for the first time, at Sussex in Chris Buckley’s lab, I had proper research mentorship and direction, and I also lucked... [Read More]

Towards concrete threat models for AGI

There are many facets to the alignment problem, but one is to view it as a computer security problem. We want to design a secure system in which to test our AGIs to ensure they are aligned, one which they cannot ‘break out of’. Having such a secure AGI box is necessary to have any... [Read More]

Probabilities multiply in our favour for AGI containment

This is a short post for a short point. One thing I just realized, which should have been obvious, is that for prosaic AGI containment mechanisms like various boxing variants, simulation, airgapping, adding regularizers like low impact, automatic interpretability checking for safe vs unsafe thoughts, constraining the training data, automatic booby-traps... [Read More]
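The arithmetic behind the title, as a quick sketch (the numbers are made up, and it leans on the strong assumption that the mechanisms fail independently): if containment mechanism $i$ is circumvented with probability $p_i$, then stacking $n$ of them means an escape requires every one of them to fail at once.

```latex
% Illustrative only: assumes independent failures; correlated failure modes would weaken this.
P(\text{escape}) \;=\; \prod_{i=1}^{n} p_i,
\qquad \text{e.g. } p_i = 0.5 \text{ for } n = 5
\;\Rightarrow\; P(\text{escape}) = 0.5^{5} \approx 0.03.
```

Even individually weak safeguards compound quickly under this assumption, which is the sense in which the probabilities multiply in our favour.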