Overall, 2021 was a much less productive year of growth than 2020. The main reason for this is that, in retrospect, 2020 was exceptional: for the first time, in Chris Buckley’s lab at Sussex, I had proper research mentorship and direction, and I also lucked...
[Read More]
Scaling laws vs individual differences
This is a quick post on something I have been confused about for a while. If an answer to this is known, please reach out and let me know!
[Read More]
Towards concrete threat models for AGI
There are many facets to the alignment problem, but one is a computer security problem. We want to design a secure system in which to test our AGIs to ensure they are aligned, one which they cannot ‘break out of’. Having such a secure AGI box is necessary to have any...
[Read More]
Probabilities multiply in our favour for AGI containment
This is a short post for a short point. One thing I just realized, which should have been obvious, is that for prosaic AGI containment mechanisms like various boxing variants, simulation, airgapping, adding regularizers like low impact, automatic interpretability checking for safe vs unsafe thoughts, constraining the training data, automatic booby-traps...
[Read More]
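As a rough illustration of the multiplication point in this post: if several containment layers each have some chance of failing, and those failures are independent, then an escape requires every layer to fail at once, so the per-layer failure probabilities multiply. The sketch below is mine, not the post's; the specific numbers and the independence assumption are purely illustrative.

```python
# Minimal sketch: under an independence assumption, the probability that an
# AGI slips past every containment layer is the product of the per-layer
# failure probabilities -- the probabilities multiply in our favour.
import math

# Hypothetical per-layer escape probabilities (illustrative only).
layer_failure_probs = {
    "boxing": 0.2,
    "simulation": 0.3,
    "airgapping": 0.1,
    "interpretability checks": 0.5,
}

# Escape requires every layer to fail simultaneously.
p_escape = math.prod(layer_failure_probs.values())
print(f"P(escape all layers) = {p_escape:.4f}")  # 0.2 * 0.3 * 0.1 * 0.5 = 0.0030
```

Even with individually weak layers (10-50% failure each in this made-up example), stacking four of them drives the combined escape probability below 1%, which is the sense in which the probabilities multiply in our favour.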
Alignment needs empirical evidence
There has recently been a lot of discussion on LessWrong about whether alignment is a uniquely hard problem because of the intrinsic lack of empirical evidence. Once we have an AGI, it seems unlikely we could safely experiment on it for a long time (potentially decades) until we crack alignment....
[Read More]