Beren's Blog

My rough categorization of AI risk types

Posted on May 30, 2023

This is just a quick post to share a rough scheme I have for assessing the potential AI risk caused by specific models. It probably needs a lot more fleshing out and discussion to be fully general but I think it is a useful intuition pump. [Read More]

The case for removing alignment and ML research from the training data

Posted on May 28, 2023

Many possible sources of danger from AI stem from the AI knowing a lot about itself, us, and the potential auditing and alignment methods we might try to use to either align the AI or to detect misaligned models. Suppose we have an AGI that contains a misaligned mesaoptimizer playing... [Read More]

BCIs and the ecosystem of modular minds

Posted on April 23, 2023

Epistemic status: Much more speculative than previous posts but points towards an aspect of the future that is becoming clearer which I think is underappreciated at present. If you are interested in any of these thoughts please reach out. [Read More]

Eigenvector successor representations

Posted on April 21, 2023

This technical note was originally written by me in early 2021 on ways to generalize the successor matrix and enable flexible generalization of reward functions across changing environments. It probably does not make much sense unless you are familiar with the successor representation in RL. I originally planned to explore... [Read More]

Hedonic loops and taming RL

Posted on April 19, 2023

Everybody knows about the hedonic treadmill. Your hedonic state adjusts to your circumstances over time and quickly reverts to a mostly stable baseline. This is true of basic physiological needs – you feel hungry; you seek out food; you eat; you feel sated, and you no longer seek food. This... [Read More]