The alignment problem is not new. We have been grappling with the fundamental core of alignment – making an agent follow the beliefs and values of another – for the entirety of human history. Any time anyone tries to get multiple people to work together coherently to accomplish some specific goal, there is an alignment problem. The ‘human alignment problem’ is fundamental. People have slightly differing values, goals, thoughts, information, and circumstances, and this heterogeneity prevents full cooperation and inevitably leads to the panoply of failure modes studied in economics, such as principal-agent problems, ‘markets for lemons’, internal coalition politics and the ‘organizational imperative’, and so on. While the human alignment problem is very difficult and far from being solved, humanity has developed a suite of ‘social technologies’ which can increase effective alignment between individuals and reduce these coordination costs. Innovations such as states and governments, hierarchical organizations, property rights, standing armies, democracy, constitutions and voting schemes, laws and courts, money and contracts, capitalism and the joint stock corporation, organized religion, ideology, propaganda, etc. can all be thought of as fundamentally attacking the human alignment problem. All of these innovations enable the behaviour and values of a large number of people to be synchronized and pointed at some specific target. A vast amount of human progress and power comes from these social ‘alignment’ technologies, which also provide the resources and slack for technological progress to take place.

In AI alignment, we face a very similar but fundamentally easier problem. This is because while with human alignment we have to build structures that can handle existing human minds solely through controlling (some of) their inputs, in AI alignment we have direct control over the construction of the mind itself, including its architecture, the entirety of its inputs, its training process and, with perfect interpretability tools, the ability to monitor exactly what it is learning and how it is learning, and to directly edit and control its internal thoughts and knowledge. None of these abilities are (currently) possible with human minds. In theory, this should mean that we can align AI intelligences much better than we can align humans, and that our abilities should scale much farther than current social technology. Indeed, since we expect AIs to reach extremely high levels of intelligence and capability compared to humans, if we are stuck with current human alignment methods such as markets, governments, etc., then we are likely doomed.

Importantly, at the macro-scale, the coordination costs caused by alignment failures determine the shape of human civilization and history. Improvements in alignment technology, such as the first creation of governments and laws, written language and then the printing press enabling mass dissemination of ideology, rapid communication such as telegraphy and radio, and centralized bureaucratic governments with large standing armies, have been accompanied by upheavals, with the better alignment technology spreading out and ‘conquering’ regions without it. Moreover, our level of alignment ability determines the maximum size of the large-scale entities that can be supported 1. We see this all the time historically. With better alignment technology, larger and more powerful states and organizations can exist stably. Moreover, large and powerful empires almost always collapse due to internal politics (i.e. coordination costs) and not (directly) due to an external threat.

An identical story plays out in economics: we almost always see that the death or obsolescence of existing large firms is caused by internal decay, leading to slow stagnation and an eventual slide into irrelevancy, rather than a direct death due either to competitors or to an abrupt technological change. This is also analyzed explicitly in organizational economics and was first introduced by Coase in the ‘theory of the firm’. He asked: given that decentralized market prices can coordinate economic activity, why do we organize it into firms, which are coherent economic entities with internals that are not determined by market mechanisms, at all? Why is the economy not organized with everyone as an independent economic agent contracting out their services on the market? The answer he famously proposed is ‘transaction costs’. That is, there are fixed costs to contracting all work out to individual contractors, such as the administrative overhead of managing this, search costs, legal costs, and fundamental issues with information asymmetry. These costs mean it is more efficient to vertically integrate many functions together into a combined economic unit which functions internally as a command economy.

But the converse question could instead be posed, and is in fact more fundamental: why do we have a decentralized market with many firms at all? Why is the whole economy not just organized into one gigantic ‘firm’? The fundamental reason, again, is coordination costs. In economics, the reason firms don’t grow to simply swallow the entire economy is usually framed in terms of credit assignment costs. Theoretically, decentralized market prices provide useful credit assignment signals to firms as to whether what they are doing is positive sum (making profits) or negative sum (making losses). Without such corrective signals, when prices are controlled by external forces such as the government, or are simply nonexistent as inside large firms, huge economic misallocations of resources can appear and continue indefinitely until the surrounding structure collapses. But what are these credit assignment costs comprised of in reality? Again we find typical coordination costs such as imperfect information and principal-agent problems. Even an ideal government, staffed with incorruptible and well-meaning bureaucrats, would find it incredibly difficult to figure out the exact consumer demand for millions of different goods, let alone to regularly invent new ones as entrepreneurs in a market economy do. Not enough information can reach and be parsed by a centralized economic planner compared to the vast decentralized information processing performed by market prices and a large number of entrepreneurs trying novel ideas. An even larger problem is the principal-agent problem, where the agents making up the larger ‘entity’, such as an organization or government, act in their own selfish interest as opposed to the global interest of the higher-level entity. This is essentially a kind of organizational cancer. Without any corrective price signals, there is nothing stopping internal individuals or coalitions from siphoning resources away from productive uses and towards enriching themselves or building up their own internal power base. Since they siphon resources to themselves, they will typically out-resource and hence outcompete internal competitors who are aligned and hence spend large amounts of their own resources on the actual organizational mission. In a competitive equilibrium, the growth of these internal cancers is constrained by the need to stay competitive with external adversaries, and will eventually lead to the regular stagnation and death of organizations, to be replaced with more internally aligned competitors which eventually succumb to the same malady. However, without any competitive pressure at all, these cancers can grow and grow until almost all resources are consumed by them and hardly anything is spent on the putative organizational mission.

The fact that similar phenomena and effects show up time and time again in all kinds of different circumstances points to coordination costs being fundamental. Moreover, these costs appear to be the main limiting constraint on the scale of a single ‘entity’ such as a firm, an organization, or an empire. The purely ‘physical’ returns to scale are pretty much always positive. These give rise to positive feedback loops like more territory -> more resources -> larger armies -> can conquer more territory, which could, if unhindered by internal coordination costs, continue indefinitely until the entire universe is conquered by a single unified entity. This is the archetypal paperclipper 2. The negative returns to scale that prevent this from happening come in the form of coordination costs. However, alignment technology is fundamentally about reducing these coordination costs (to 0 in the case of perfect alignment). If we assume that there is some unknown but fixed alignment ‘tech tree’ and that at equilibrium a superintelligence or universe of superintelligences will have maxed out this tech tree, then we can quite straightforwardly see that the maximum level of alignment, or the minimum coordination costs possible relative to scale, will determine the maximum size of ‘entities’ in the future and hence the long-term equilibrium distribution of agents in the universe.
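
To make this scaling argument concrete, here is a deliberately toy model (the functional forms are illustrative assumptions, not anything derived from first principles): suppose the ‘physical’ returns to an entity of size $N$ grow linearly as $aN$, while coordination costs grow superlinearly as $kN^{\beta}$ with $\beta > 1$, where $k$ reflects how good the available alignment technology is. The entity’s net output is then

$$ V(N) = aN - kN^{\beta}, $$

which is maximized at a finite size $N^{*} = \left(\frac{a}{k\beta}\right)^{\frac{1}{\beta - 1}}$. Better alignment technology lowers $k$ and pushes $N^{*}$ up; in the limit of perfect alignment, $k \to 0$ and $N^{*}$ diverges, so nothing internal prevents a single entity from swallowing everything.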

The fundamental constraint in the universe is distance, or communication time. Assuming no FTL technology is possible, colonizing and exploiting the full volume of our lightcone will require keeping ‘copies’ of our civilization aligned despite communication lags of billions of years. This essentially requires perfect ‘zero-shot’ alignment, meaning that we can create agents which exactly and perfectly do whatever we want despite being completely independent superintelligences for an arbitrarily long amount of time, even as they accumulate vast amounts of resources, for instance entire galaxies’ worth of matter and energy if we send out von Neumann probes to colonize distant galaxies. Any entity, whether a paperclipper or some ensemble version of our civilization, faces this exact and fundamental issue. Hence, whether the universe is ultimately controlled by one unified entity, or by an incredible number of entities with diverse values (even if originating from a single entity before alignment breakdown), depends centrally on how successful alignment can ultimately be.

If complete zero-shot alignment is possible, then, given the positive physical returns to scale and the lack of any countervailing coordination costs, the universe will eventually come to be dominated by a single unified entity. Whether this entity pursues some arbitrary goal or is just a pure power-seeker depends on how competitive the equilibrium it arose from was. On the other hand, if zero-shot alignment is not possible, and divergence of values and goals is inevitable without correction, which cannot take place across astronomical distances, then we inevitably end up with a diverse universe of misaligned agents, each pursuing separate goals in their own separated regions of the universe. The ‘size’ of the region controlled by each entity will then depend essentially on how rapidly divergence and misalignment set in among copies of the original agent as it expands.
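
As a rough illustration of that last point (a back-of-the-envelope bound, not a careful derivation): if value drift becomes unacceptable after roughly $T_{\text{drift}}$ of uncorrected independent operation, and corrective signals travel at most at light speed $c$, then copies can only be kept coherent within a radius of roughly

$$ R \lesssim \frac{c \, T_{\text{drift}}}{2}, $$

since beyond that the round-trip time for a correction exceeds the drift timescale. Entities with poor zero-shot alignment (small $T_{\text{drift}}$) fragment into many small regions, while perfect zero-shot alignment corresponds to $T_{\text{drift}} \to \infty$ and a single unbounded region.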

A natural followup thought is the longtermist question of what kind of future is preferable in terms of human values. If alignment is fully solvable, so that coordination costs go to 0, then it seems almost inevitable that the lightcone will be dominated by a single entity: if maintaining alignment and projecting power over astronomical distances is ‘easy’, then in an equilibrium of many initial agents there will likely be very little slack, and instrumental convergence will drive everyone towards pure power-seeking agents. Whichever of these agents ultimately ‘wins’ and conquers the lightcone, it seems very unlikely that it will create much of value according to our current values. On the other hand, if humanity is alone among the stars, and we manage to create an aligned superintelligence which gives us a decisive initial strategic advantage, then we can parlay that into total control over the lightcone, with the concomitant vast potential utility that entails. This future is thus very high variance.

On the other hand, if alignment is fundamentally difficult, then human values are never going to be expressed in a large proportion of the lightcone before they mutate away into something inhuman and alien. However, since power projection is hard, the values expressed in these alien regions are likely to be the product of plenty of slack, and not pure power-seeking. The real question, then, is how valuable (to us) are the kinds of values that emerge from divergence from our original values?

  1. Funnily enough, you can also see this phenomenon occur all the time in strategy games like Civilization or Paradox games. Almost always in such games ‘playing wide’, i.e. conquering/settling large territories, is by far and away the dominant strategy, and the game designers try to counteract this by adding direct maluses for growth, such as technology, stability, or something else becoming more expensive or worse as you expand. These maluses are essentially hacky ways to simulate the increasing coordination and alignment costs that must occur in larger organizations, without being able to simulate the fine-grained details of e.g. the principal-agent problems that actually cause these costs.

  2. Interestingly, the scenario of a paperclipper dominating the lightcone requires both that a.) humanity cannot solve alignment, hence the creation of the paperclipper, but b.) that the paperclipper can solve alignment, such that it can keep all of its copies, including those billions of lightyears away, perfectly aligned to the core paperclipping mission.