There has recently been a lot of discussion on LessWrong about whether alignment is a uniquely hard problem because of an intrinsic lack of empirical evidence. Once we have an AGI, it seems unlikely that we could safely experiment on it for the long time (potentially decades) it might take to crack alignment....
[Read More]
Empathy as a natural consequence of learnt reward models
Empathy, the ability to feel another’s pain or to ‘put yourself in their shoes’, is often considered to be a fundamental human cognitive ability, and one that undergirds our social abilities and moral intuitions. As so much of humanity’s success at becoming dominant as a species comes down to our...
[Read More]
The ultimate limits to alignment determine the shape of the long-term future
The alignment problem is not new. We have been grappling with the fundamental core of alignment – making an agent optimize for the beliefs and values of another – for the entirety of human history. Any time anybody tries to get multiple people to work together in a coherent way...
[Read More]
How to evolve a brain
Epistemic status: This is mostly pure speculation, although grounded in many years of studying neuroscience and AI. Almost certainly, much of this picture will be wrong in the details, although hopefully roughly correct ‘in spirit’.
[Read More]
The Scale of the Brain vs Machine Learning
Epistemic status: pretty uncertain. There is a lot of fairly unreliable data in the literature and I make some pretty crude assumptions. Nevertheless, I would be surprised if my conclusions are more than 1-2 OOMs off.
[Read More]