Many possible sources of danger from AI stem from the AI knowing a lot about itself, about us, and about the auditing and alignment methods we might use either to align the AI or to detect misaligned models. Suppose we have an AGI that contains a misaligned mesaoptimizer playing...
BCIs and the ecosystem of modular minds
Epistemic status: Much more speculative than previous posts, but it points towards an aspect of the future that is becoming clearer and that I think is underappreciated at present. If you are interested in any of these thoughts, please reach out.
Eigenvector successor representations
I originally wrote this technical note in early 2021, on ways to generalize the successor matrix and enable flexible generalization of reward functions across changing environments. It probably does not make much sense unless you are familiar with the successor representation in RL. I originally planned to explore...
Hedonic loops and taming RL
Everybody knows about the hedonic treadmill. Your hedonic state adjusts to your circumstances over time and quickly reverts to a mostly stable baseline. This is true of basic physiological needs – you feel hungry; you seek out food; you eat; you feel sated, and you no longer seek food. This...
Predictive coding networks can perform causal and counterfactual inference
This note was written around Christmastime 2021 (hence the Christmas theme) as my initial thoughts after figuring out that predictive coding networks could be straightforwardly adapted to perform causal inference. I intended to write this up into a proper paper and do more experiments verifying it works at scale as...