One question I have occasionally pondered is: assuming we actually succeed at some kind of robust alignment of AGI, what alignment target should we focus on? In general, answers to this question fall into two basic camps. The first is obedience and corrigibility: the AI system should execute...
[Read More]
Should we be behaviourist about an AI's values?
Epistemic note: A very short point, and I’m pretty uncertain about this myself. Trying to work out the arguments in blog format.
[Read More]
Preliminary Thoughts on Reward Hacking
Epistemic status: Early thoughts. Some ideas, but no empirical testing or validation as yet.
[Read More]
Why Not Sparse Hierarchical Graph Learning
Noumenal Labs recently announced themselves, and I read their white paper. Although light on specifics, it seems clear that their main issue with LLMs, and with NNs more generally, is that their structure does not properly reflect the true underlying generative process of reality; effectively, that they do...
[Read More]
The Scaling Laws Are In Our Stars, Not Ourselves
Epistemic status: Pretty uncertain. This is a model I have been using to think about neural networks for a while; it has some support but is not completely rigorous.
[Read More]