One question I have occasionally pondered is: assuming we actually succeed at some kind of robust alignment of AGI, what alignment target should we focus on? In general, answers to this question fall into two basic camps. The first is obedience and corrigibility: the AI system should execute...
[Read More]
Should we be behaviourist about an AI's values?
Epistemic note: A very short point, and I’m pretty uncertain about this myself. Trying to work out the arguments in blog format.
[Read More]
Preliminary Thoughts on Reward Hacking
Epistemic status: Early thoughts. Some ideas, but no empirical testing or validation as yet.
[Read More]
Why Not Sparse Hierarchical Graph Learning
Noumenal Labs recently announced themselves, and I read their white paper. Although light on specifics, it seems clear that their main issue with LLMs, and with NNs more generally, is that their structure does not properly reflect the true underlying generative process of reality; effectively, that they do...
[Read More]
The Scaling Laws Are In Our Stars, Not Ourselves
Epistemic status: Pretty uncertain. This is a model I have been using to think about neural networks for a while; it has some support but is not completely rigorous.
[Read More]