Intellectual Progress in 2023

2023 has also been an interesting year. I spent the first half of the year at Conjecture, followed by a brief stint cofounding Apollo, and then cofounded a soon-to-be-revealed (with any luck) startup about which I shall have to remain fairly quiet for now. There has been lots of change and personal and professional growth, including moving from Oxford to London and then from the UK to the Bay Area (at long last; it was inevitable). I feel I have gained a very significant amount of tacit knowledge about how startups work (fundraising, management, hiring, leadership, strategy, etc.), areas in which I completely lacked practical experience before this year, and certainly before I left academia. These are all important skills to hone, although they are much less legible than the standard academic skills reflected in degrees and papers. I have also gained a fair bit more practical experience in ML and in training ML systems, which will hopefully prove valuable in the future. Indeed, how exactly large-scale ML systems are trained has moved from theoretical to practical knowledge for me, and hopefully will become more so over time.

This has also been a year of absolutely stunning AI progress (as I suspect many years to come will also be), which has been incredibly hard to keep up with in all its specifics, although I don’t think much has actually changed in the fundamental worldviews. Perhaps the biggest change is the democratisation and open-sourcing of powerful LLMs and diffusion models (crazy to think that Stable Diffusion was only released in late 2022 and Llama only in February 2023). This has led to a massive surge of work on and extensions to these models, and I feel it has given us many more data points about what AI looks like in progress, the beginnings of a slow takeoff, and how to control and actually utilise these models in the economy. Moreover, I feel that this year has given us a huge number of new data points on AI alignment, now that techniques like RLHF/RLAIF have been democratised and made widely available for the open-source community to tinker with. Indeed, it seems that, at least for LLMs and diffusion models, a large part of ‘alignment’, i.e. steerability and controllability, has essentially been solved, and this fact is slowly being understood and admitted by the world. Of course, unsupervised generative models were never the core challenge of alignment to begin with, but this is nevertheless an extremely welcome update.

Probably the biggest intellectual change has been the rapid evolution of my views on AI safety since I started at Conjecture. Before joining, I had largely not thought about these issues deeply and had pretty closely imbibed the standard Yudkowskian/MIRI worldview on many things, as a kind of default view for somebody who read LessWrong recreationally. Being at Conjecture forced me to dive seriously into these issues, as well as to grapple with rapid ML progress, which has led me to refine my views and update fairly strongly away from the MIRI consensus (although some of this may just be contrarianism from being surrounded by similar views all day). I am now much more optimistic about alignment than many, and I think that alignment in the current paradigm (unsupervised generative models of sensory stimuli) is basically solved, or shortly will be, and was never even a problem to begin with. LLMs and the like, as base models, pose no inherent danger and can be fairly trivially aligned. Moreover, the open-source community has made great strides in experimenting with alignment and turning it into a practical engineering question instead of a pseudomystical LessWrong discussion. This is how all alignment should and must go: alignment will be solved when it becomes a practical engineering problem that PhD students can tinker with. The main alignment risks are twofold: 1.) alignment of general-purpose agents with intrinsic drives, and 2.) automation of the economy slowly rendering (at least baseline) humans entirely obsolete.

For 1.), it is clear that while existing models can be fairly straightforwardly aligned, they are also not AGI in the full sense. They are tools: the sensory cortices of a future AGI. However, agency is coming, and coming soon, now that we have the base latent spaces to build upon, and aligning these general autonomous agents, perhaps equipped with intrinsic curiosity or, worse, Omohundro drives, will be much more challenging and is still fairly open. I suspect that, once again, we can solve this as an engineering problem by gaining a careful practical understanding of what specific drives do to RL agents and of how to build robust homeostatic motivational systems similar to the one instantiated in humans (but much more precise and careful). This will, however, require open agent substrates for people to iteratively tinker with and improve, ideally before a massive lab scales something up and ships it with minimal testing¹.

The big issue then becomes misuse, which I do not consider likely to cause existential risk (at least with current base models). With strongly agentic systems, misuse does have serious X-risk potential which must be mitigated. Realistically, though, I think the only way to do this is simply to have a large population of aligned agents, such that any attempt to build a misaligned agent and have it gain power is thwarted by its facing an extremely large number of aligned adversaries with the majority of the world’s resources behind them. Of course, this relies on the offense-defense balance holding and on a slow takeoff scenario in which there are no ridiculous disparities in power or intelligence between agents. Given the irreversible proliferation of open-source AI in the last year, I think the key issue will realistically be building, or evolving towards, some kind of solution that is robust to misuse, because misuse will happen. Even if we can perfectly align agentic AI to any goal, people will nevertheless build power-maximising selfish AGIs, or even ones that desire the destruction of humanity for fun or out of scientific curiosity, and we must become robust to such agents existing in small numbers. However, AI proliferation also significantly speeds up alignment, since there are more AIs for people to tinker with and improve, and alignment is very strongly encouraged by market dynamics, since an unaligned AI is pretty useless for the things people want. There is thus a tradeoff, and it currently seems weighted towards openness given that the risk of misuse with current AI systems is so low, although this will likely change in the future.

For 2.), even if we pass this filter, the second issue is essentially the slow automation of the economy and the consequent obsolescence of humans entirely. If left unchecked, there is little reason why this could not lead to human extinction through what is essentially evolutionary competition with populations of ‘wild’ AI agents. To solve this, we need to return to the standard transhumanist playbook and work towards human transcendence, i.e. augmentation of natural humans via BCIs, uploading, genetic engineering, and other technologies that allow us to enhance our own capabilities and eventually merge with AI systems while preserving our values and consciousness². Such augmentation will allow upgraded humans to realistically compete and cooperate with our own AI systems (which will ideally be mostly aligned). The second condition required is the maintenance of sufficient slack, both to enable humans to participate in the ecosystem against potentially significantly advantaged pure AI systems, and to enable large populations of baseline humans, who either don’t want to be uploaded or augmented or cannot afford it for whatever reason, to have good and fulfilling lives. This is fundamentally a question of technical alignment and the specifics of future technologies, but also (and ultimately) a social and political question, since the key issue is how to distribute the massive surplus from rapid technological advances: do we spend it on raising the Malthusian limit for AI agents, or on generating slack and directing the surplus to existing biological humans? The future is not yet written, so we must all do what we can here.

Most of the existing analysis of AI systems assumes that we will be re-entering a fundamentally Malthusian era, with AGIs rapidly reproducing until they take all available energy and compute, which then leads to humanity being outcompeted. As far as I can tell, this view originated with Robin Hanson. There are certain compelling reasons for it on priors, namely the triviality of copying (i.e. ‘reproducing’) an AGI compared to a human. However, I feel we lack a good enough understanding both of how the economics of AGI will play out and of why humans are not currently at the Malthusian limit (i.e. whence the demographic transition) to place strong confidence in such a future in the short or medium term.

The post-academia period (from when I left to join Conjecture to the present) has very much solidified my views on AI, AGI, and the singularity. It has confirmed that we are extremely likely in a slow takeoff world, potentially lasting decades until we reach a singularity, albeit with very short timelines for ‘AGI’, which due to compute limitations will not be immediately transformative (arguably we may already have reached AGI or proto-AGI with GPT-4). The broad contours of the future leading up to the singularity are, I think, fairly clear now, although much remains open and uncertain. Moreover, the ‘blueprint’ for building AGI is also extremely clear now, and while a few technical problems of significant difficulty remain, I see no reason why they cannot be addressed within the next few years. I think it is clear that we will have AGI (although not necessarily superintelligence) by 2030, and likely significantly sooner. However, it is becoming clear that the invention of ‘AGI’ won’t be a single point at the singularity but a point significantly before it, perhaps similar to the event horizon. Once ‘AGI’ is invented (maybe?), our fate is inevitable, but we will not yet have hit the singularity, and will not for some time. In fact, nothing will appear to change in the short term. However, our fates will be largely sealed by the confluence of technological and economic forces that will come to bear, and we will not be able to return to what we had; to what we have now.

On a personal note, the past year (really counting from September 2022, when I left academia) was significantly transformative for me: leaving academia, joining Conjecture and getting deeply enmeshed for the first time in the AI safety / LessWrong / EA communities, figuring out my views on AI safety and AI in general, experiencing startup life for the first time, and then leaving and going on to new ventures. All of it has led to major change and significant personal growth, although this was much less visible externally than my previous periods of rapid growth in academia, where output in terms of papers is much easier to measure. In fact, my external output has been much diminished this year, although I have continued to coast along on citations from previous papers and a few works still in the pipeline from my time in academia. Ultimately, though, I need to step the pace back up on this, or else my academic life will continue to decline into irrelevance. I still have many ideas and projects; I just need to get back into the paper-publishing mindset and things will come back quickly. This year I also crossed the 1,000-citation mark, which places me in a not-embarrassing but not-great position in the academic rankings. After five years in academia (roughly equivalent to a US PhD), I am probably in the top 10% of PhD students by impact but not the top 1%, which is unfortunate, could be much improved, and hopefully will be in the future. I definitely reached a good position within my super-niche subfield of active inference and predictive coding (PC); now I just need to translate that to the much broader field of ML. ML, especially pretraining, is entering a weird phase where many things are extremely impactful and widely used but not cited that much, primarily because the labs that can do large-scale pretraining no longer publish much.
In the long run, we should expect this to dramatically slow down ML progress through a lack of openness and information sharing, leading to the cartelization of information in the heads of the employees of the big labs, but whether this effect will kick in before AGI is unclear.

I also gained a very close understanding of, and immersion in, the effective altruism and AI safety communities while at Conjecture, which was sociologically fascinating as well as very inspiring. Overall, the people in these communities are extremely smart, dedicated, and resourceful; the recruiting funnel into EA is insane, as is the talent density it attracts. EA is essentially the modern-day Fabianism, for good or for ill (undoubtedly a mix of both), and in the long run EA ideals and organizations will inevitably have a massive impact due to cohort effects. A significant fraction of the young high-IQ elites in the West are at least exposed to EA ideas or affiliated in some way, and will thus age into positions of power over the next 20 years. EA orgs are also well funded and well organised, with extremely dedicated and idealistic members. EA is the only real competitor and counter-religion to wokeism in the minds of today’s college demographic, and is overall a much healthier and less pathological belief system. While I have never been completely on board with EA from a philosophical standpoint, I respect and admire the movement as a whole, and I think that overall it is highly effective and remarkably free from entropy at present (although sadly this cannot and will not last). Undoubtedly EA groups will continue gaining influence over AI developments and AI safety, both in governments and through entryism into the big AI labs. We must hope that they use their power wisely, with a conservative but not insane focus on the tradeoffs of AI systems and an understanding of where the danger truly comes from. My hope is to continue putting down my thoughts here to have at least some small impact on the way things go in this debate.

This year I have also made big progress with my blog. I have started posting to LessWrong and also semi-reliably posting to my personal blog, especially while I was still at Conjecture. Not only that, but I have also been getting readers (!) and occasionally people discussing things on Twitter, LW, or Hacker News. This has been very gratifying and made me realise that I need to dedicate time to writing up the many, many thoughts I have which I think are valuable, especially on topics other than AI alignment. I will try to do more of this next year, but I can make no promises. Overall, though, I think I have managed to write enough posts to put together a reasonably complete view on AI alignment, distinct from the mainstream and up to date with my views until about the summer of this year, so that is good. In the longer term, I hope to broaden the scope of topics here away from just AI alignment towards more general questions about the future and understanding the present. I also need to get back into posting on Twitter, since I spend too much time on it anyway and should at least have some output there to market my writings here. The trouble with Twitter and the like is that it is a highly addictive ideological vortex which you need time away from in order to develop truly differentiated thinking, but at the same time you cannot develop your thoughts in isolation without running into irrelevance. As always, a balance is needed.

Finally, a big downside to this greater professional experience has been a complete failure to do any meaningful extracurricular study. My lecture-video habit has completely collapsed now that I have so much more work to do, including helping to run a startup, and I feel that while this is valuable in the short term, it has hidden long-term costs, as my understanding of the world stagnates instead of growing and there are so many important intellectual frontiers I have not even got close to. I have never really got my math skills up to a high level, and I strongly wish to improve my mathematical maturity, yet I feel unlikely ever to find the time for the serious study this deserves. I also have many random projects and thoughts I would like to devote time to but realistically cannot, and am unlikely ever to find the time for given my current trajectory. Opportunity costs are biting me very strongly at present. Similarly, my book-reading habits have completely collapsed and I read fewer than five books last year, which is extremely bad. I need to get these habits started again, as I am just accumulating so many books sitting unread on my shelf, and there is something beautiful about returning to past eras as an antidote to the chaos of today.

One big benefit of academia (as opposed to industry / startups) that I have noticed is that deep contemplation is very easy there and extremely difficult here. I used to be able to spend entire days working through the math of some particular paper or thinking deeply about some question. Now there is no time for anything like that, and even when I do have time there are constant disruptions and fires to put out. I now understand on a visceral level why it is so hard to maintain a good level of intellectual productivity and original intellectual / academic thinking in an industry and business environment. This is presumably also why the public output of most startup and business leaders is so anodyne and unoriginal: it is simply extremely hard to get the time required for deep thought, and the skills needed for success there are not those of the intellectual. Ultimately, I have to try to thread this needle, and I hope in the longer term to find a way around this and dedicate enough time to intellectual development and exploration, including writing and deep focus on learning new topics, since this is fundamentally necessary for long-term intellectual growth. Sadly, if things go well for me at the startup, things will become much more hectic and my time even more precious and constrained, so we shall see how this goes in practice. I already find myself wanting a sabbatical of a month or so simply to wind down, think, and write up my increasing backlog of thoughts. Unfortunately, the opportunity costs are too great. You live and die by opportunity costs.

The world yet moves apace. Hopefully over the next few years, I can graduate from being a spectator in it to a player. I have taken some small steps towards that goal over the last year, but nevertheless everything is extremely precarious. I am very interested to see what 2024 has in store. No matter what you say about this timeline, it certainly is interesting.

¹ On the plus side, from an intellectual perspective this is fascinating. Slowly, layer by layer, we are peeling back the mysteries of intelligence and revealing its core aspects. Undoubtedly, this will be one of the greatest intellectual breakthroughs in humanity’s history and give us much greater insight into who we are, what we are, and how we came to be. What the theory of evolution did for biology will be done for neuroscience and cognitive science. Ultimately, this is a stage we have to pass through as a species and as individual beings in this cosmos. We should be slow and cautious if we can, but through we must go.
² As an aside, it is always super weird to me how much the philosophy of transhumanism seems to have been entirely forgotten and to have disappeared as a movement at exactly the point when we are entering the slow takeoff to the singularity, whereas it was huge a decade and a half ago (when I started interacting with it) and the transhumanist sphere formed the initial cultural milieu of LessWrong and adjacent cultural hubs. This is also perhaps a symptom of the complete death of science fiction in the last decade or so, which I also don’t understand at all. It is always weird how the discourse moves on and on, and the great debates and questions of one period are never even broached in the next, even if nothing has appreciably changed ‘on the ground’. Ultimately, they are just intellectual fashions and change without trace like fashion, and yet they are irresistible to nerds like me, and they seem so real.