Priors on the 10x Programmer

So, as regularly happens, Hacker News has tied itself up in knots about the supposed 10x programmer. The usual conversation has ensued. Some people point to John Carmack, Linus Torvalds, or whoever the programming hero of the day is. Surely, they exclaim, these people must be 10x. Others have anecdotes about some programmer at work who seems much more productive than everyone else, along with a hero story about them wading into some crazy codebase and finding the deadly bug with hours to go before the deadline.

Then the other side dives in. Of course 10x programmers don’t exist; it’s all about the work environment. Some programmers get awesome greenfield projects that go on to become famous, making them look like 10x programmers. Others get assigned a legacy COBOL codebase from 1984, strewn with Lovecraftian horrors, to which contributing even a single line is a soul-wracking process. The first type of programmer enjoys fame and plenty of HN admirers who think of them as 10x, while the second is doomed to invisible stagnation.

Or there aren’t any 10x programmers because it’s all about writing clean code, and the supposed 10xers just write super-complex, unreadable hacks that everyone else worships for their complexity, but which are unmaintainable and generally disastrous. The “10x”er then leaves their complex mess behind for someone else (usually the commenter) to clean up.

Or there may be 10x programmers, but they are also all toxic jerks (just look at Linus), and getting along with people is much more important than code anyway.

This same conversation, or some variant of it, has repeated itself every few weeks for the entire time I’ve been reading HN. For some reason I still read it every time, and every time I come away frustrated and uninspired. So I finally decided to put in my two cents on the subject.

It always surprises me how, for such a technical community, HN discussions of the 10x programmer are so completely devoid of statistical thinking and so ignorant of the basic facts of human variation. My aim here is not so much to comment directly on this “debate” as to set out what we know about variation and skill in other fields, which should then form the prior assumptions of the debate.

Prior 1: Programmer aptitude and skill are likely both normally distributed.

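This is because nearly every metric of intrinsic human variation is approximately normally distributed, which is what the central limit theorem predicts for traits that arise from the sum of many small, independent influences. If either programming aptitude or skill were not normally distributed, this would be a huge and important finding.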
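As a quick illustration of the central limit theorem argument, here is a minimal Python sketch. The model is purely hypothetical: each simulated "trait" is the sum of 40 independent uniform contributions (the names `simulated_trait` and `fraction_within` are made up for this example), which is not a claim about how aptitude actually arises, only a demonstration that sums of many small influences end up looking normal.

```python
import math
import random

def simulated_trait(n_factors: int, rng: random.Random) -> float:
    """One person's trait as the sum of many small, independent contributions.

    Each contribution is uniform on [0, 1): a hypothetical stand-in for the
    many genetic and environmental influences on a trait.
    """
    return sum(rng.random() for _ in range(n_factors))

def fraction_within(k: float, n_people: int = 20_000, n_factors: int = 40, seed: int = 0) -> float:
    """Fraction of simulated people whose trait lies within k sd of the mean."""
    rng = random.Random(seed)
    mean = n_factors / 2            # mean of a sum of n uniform(0, 1) variables
    sd = math.sqrt(n_factors / 12)  # standard deviation of that sum
    inside = 0
    for _ in range(n_people):
        z = (simulated_trait(n_factors, rng) - mean) / sd
        if abs(z) <= k:
            inside += 1
    return inside / n_people

# A true normal distribution puts about 68.3% within 1 sd and 95.4% within 2 sd.
print(f"within 1 sd: {fraction_within(1):.3f}")
print(f"within 2 sd: {fraction_within(2):.3f}")
```

Even though no single uniform contribution looks remotely bell-shaped, the sums land very close to the normal 68/95 proportions.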

Corollary: normality means that +2sd, +3sd, +4sd and even +5sd programmers definitely exist. This is as unavoidable in programming as it is in any other field. +2sd and +3sd programmers are common enough that everyone will likely encounter a number of them over a career. +2sd programmers aren’t even that remarkable: they are roughly the top 2.3% and will be found throughout the industry. Elite companies with high selection bars (assuming they can test for programming aptitude with high specificity) will have substantial concentrations of these programmers, where they will not seem particularly remarkable. A good chunk of the HN userbase is likely +2sd. +3sd programmers are roughly the top 0.1%. That is rare, but still common enough for the average HN commenter to meet many over the course of a career, and these programmers are likely the fodder for most of the HN anecdotes. Given that the tech industry is at least moderately meritocratic, you would expect them to be concentrated in elite specialist roles or senior positions: your senior engineers at Google, your embedded or kernel developers, your game or graphics engine developers, your senior machine learning engineers. Many successful technical cofounders lie here.

There are necessarily also +4sd and perhaps even +5sd programmers out there. These are much rarer, but thanks to the ubiquitous connectivity of the internet they are vastly more accessible than ever before. They have likely contributed to many of the large open-source projects you use, and of course you can read their blog posts or hear hero stories about them, even if you are unlikely to meet one outside an elite company. These programmers are likely heavily overrepresented in elite roles at top companies doing novel and exciting work. Of course, not all of them end up there; many, through bad luck or other personality traits, will never reach these positions and will work elsewhere. The programmers HN treats as heroes most likely belong to this +4sd to +5sd group.
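To put rough numbers on how rare each cutoff is, here is a minimal sketch that computes one-sided normal tail probabilities with the standard library's `math.erfc`. The worldwide programmer population of 20 million is a hypothetical round figure chosen only for illustration, not a measured statistic.

```python
import math

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Hypothetical size of the worldwide programmer population, for illustration only.
POPULATION = 20_000_000

for k in (2, 3, 4, 5):
    p = upper_tail(k)
    print(f"+{k}sd: top {100 * p:.3g}%, about 1 in {1 / p:,.0f}, "
          f"roughly {POPULATION * p:,.0f} people out of {POPULATION:,}")
```

The cutoffs work out to about 1 in 44 (+2sd, roughly the top 2.3%; the top 5% would start at about +1.64sd), 1 in 741 (+3sd), 1 in 31,600 (+4sd) and 1 in 3.5 million (+5sd). Under the assumed population that is hundreds of +4sd programmers but only a handful at +5sd.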

Prior 2: Project success mostly follows a power law distribution.

Most projects fail or achieve only minor success. A vanishingly small fraction obtain huge success, while most eke out a living of minimal but vaguely sufficient success. The contributions of the top projects utterly dwarf those of huge numbers of other projects combined. How much good has Linux done for the world? Probably at least half as much as all other open-source OS projects in history put together. This is a power-law distribution at its finest: most things don’t matter, but a few events matter an absolutely huge amount. Essentially all of the distribution’s statistics are determined by incredibly rare chance events occurring deep in the tails. Since project success follows a power law while individual skill is normally distributed, directly equating programmer skill with project success is misleading. Linux may outweigh a large proportion of all other open-source projects combined, but that does not make Linus a better programmer than all of their authors combined. Note also that (most) software success is arguably a one-sided power law, with a long tail of positive outcomes but a pretty sharp cutoff at zero success. (Arguably this depends on your definition of success. For society, project “success” can have a hugely negative long tail too, e.g. a genius programmer discovers a devastating exploit in a large amount of key infrastructure and uses it. In this model, however, we treat such a case as a long-tail positive event, where “positive” means “impact”, whether good or bad.)
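A minimal simulation sketch of the contrast between power-law outcomes and normally distributed ones. It assumes, purely for illustration, that project impact follows a Pareto distribution with tail index 1.16 (the classic "80/20" exponent) drawn with `random.paretovariate`, and compares it with a normal quantity on an IQ-like scale (mean 100, sd 15). Both parameter choices are hypothetical.

```python
import random

def top_share(values: list[float], top_fraction: float) -> float:
    """Share of the total contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)
N = 100_000

# Hypothetical project impacts: Pareto with tail index 1.16. Values start at 1,
# giving a hard floor near "zero success" and a long positive tail.
pareto_impacts = [rng.paretovariate(1.16) for _ in range(N)]

# For comparison, a normally distributed quantity (mean 100, sd 15),
# clipped at zero so that shares of the total are meaningful.
normal_values = [max(0.0, rng.gauss(100, 15)) for _ in range(N)]

print(f"top 1% share, power law: {top_share(pareto_impacts, 0.01):.2f}")
print(f"top 1% share, normal:    {top_share(normal_values, 0.01):.3f}")
```

In the Pareto case the top 1% of projects typically account for around half of all impact, whereas in the normal case the top 1% contribute only slightly more than 1% of the total. That gap is why project success and individual skill cannot be read off one another.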

Prior 3: Project success is moderately correlated with programmer talent.

Linus is definitely an extremely skilled programmer with an extremely high natural aptitude for programming; he is likely +4 or +5sd. But at this level, the success of Linux is not particularly well correlated with his own skill as a programmer. There are probably many programmers with higher natural aptitude than Linus who have had vastly less project success. Before it got started, the prior probability of Linus turning Linux into a runaway success was very low. A lot of it is skill, but a lot more of it is luck, in the sense of being in the right place at the right time with the right idea (though having the right idea, and arguably being in the right place, are also correlated with skill). Nevertheless, the probability of an average programmer making a success of Linux from scratch is effectively zero. There are strict talent floors, and reasonable correlations with programmer aptitude persist even high up in the Gaussian tails, but the correlation is far from perfect. Conversely, there are likely several OS-development hopefuls just as talented as Linus whose basement-crafted kernels never saw the light of day through pure bad (or just plain mediocre) luck.
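One way to see "moderately correlated" in action is a toy cohort simulation. This is a sketch under an entirely hypothetical model: log(success) = aptitude + 2 × luck, with aptitude and luck independent standard normals, which gives a correlation of about 0.45 between aptitude and log success. The function name `simulate_cohort` and all parameters are made up for illustration.

```python
import random
import statistics

def simulate_cohort(n_people: int, luck_weight: float, rng: random.Random) -> tuple[bool, float]:
    """Simulate one cohort of would-be kernel authors under a hypothetical model.

    log(success) = aptitude + luck_weight * luck, where aptitude and luck are
    independent standard normals. Returns whether the most successful person
    is also the most talented one, and the aptitude of the most successful.
    """
    people = []
    for _ in range(n_people):
        aptitude = rng.gauss(0, 1)
        log_success = aptitude + luck_weight * rng.gauss(0, 1)
        people.append((log_success, aptitude))
    # Tuples compare on log_success first, so this picks the most successful person.
    _, winner_aptitude = max(people)
    top_aptitude = max(aptitude for _, aptitude in people)
    return winner_aptitude == top_aptitude, winner_aptitude

rng = random.Random(7)
# luck_weight = 2 gives corr(aptitude, log success) = 1 / sqrt(5), about 0.45.
results = [simulate_cohort(1_000, luck_weight=2.0, rng=rng) for _ in range(500)]
p_same = sum(same for same, _ in results) / len(results)
mean_winner_aptitude = statistics.fmean(apt for _, apt in results)

print(f"most successful is also most talented in {p_same:.1%} of cohorts")
print(f"average aptitude of the most successful: {mean_winner_aptitude:+.2f} sd")
```

Under these assumptions the single most successful person in a cohort of 1,000 is almost never its most talented member, yet is typically well above average in aptitude (around +1.4sd). Luck decides who wins; aptitude still shapes who is in contention.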

Prior 4: Skill is a function of aptitude, environment, and experience.

While we might expect programming aptitude and programming skill each to be roughly normally distributed (Prior 1), they need not be perfectly correlated with each other. Consider a simple model in which each programmer is born with some innate aptitude, normally distributed, but in which skill also depends significantly on environment. A programmer’s ultimate skill is certainly correlated with their aptitude, but the experience and environment in which they learn matter a huge amount too. A programmer surrounded by ambitious peers and competitors, given (or driven into) a hard deadline on a project with huge future potential, is likely to perform significantly better, and actually become significantly more skilled, than one assigned to a soul-rotting COBOL codebase in a glacial industry with lazy colleagues who haven’t bothered to learn anything new since 1999. There are also feedback effects. A very high-aptitude programmer may be more likely to end up in the first situation than the second, but there are also large numbers of high-aptitude programmers for whom this does not happen, and lower-aptitude programmers who are pushed into competitive situations that force them to develop much higher levels of skill than expected. I personally suspect the noise here is quite large, reducing the correlation between innate aptitude and acquired skill to a more moderate level than one might expect.

Prior 5: Success will appear increasingly luck based as selection floors increase.

This is a somewhat subtle point, but I think it applies. Suppose we have some test that selects people increasingly deep into the natural aptitude tail, say at +4sd: those whom the test measures above +4sd achieve ‘success’, and those below ‘fail’. If the test has any noise at all (a pretty safe assumption), we would expect some people to be misclassified. Some will have less actual skill than the test measures (they got lucky), while others get unlucky and fail even though their actual skill is above the threshold. As we move further into the tails, we would expect who passes to depend more and more on noise, since samples that high up in the distribution are thin. Moreover, we would expect the true average skill of those who pass to be lower than their measured scores suggest: because of the rapid fall-off of the Gaussian tail, there are far more people just below the threshold than above it, so many more cross it by luck than by actual skill above the threshold. A further loss of statistical reliability comes from the curse of small numbers, a necessary corollary of looking far into the tails. This means that inferring extremely high levels of skill in the deep tails of a distribution is a difficult and noisy task. Many of the people who seem to have succeeded will actually just have been lucky, and are therefore less skilled than assumed, so the filtered group as a whole will generally be worse than expected. Noise tends to dominate outcomes in this regime, and this is compounded by small samples. Drawing firm conclusions from tail events is tricky.
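A minimal sketch of this selection effect, under assumptions that are mine rather than anything measured: true skill is standard normal, and the test adds independent normal error with sd 0.5. To keep the simulation small it uses thresholds of +1, +2 and +3sd rather than +4sd; the fixed seed means the same simulated population is tested at every threshold.

```python
import random
import statistics

def selection_stats(threshold: float, n: int = 200_000, noise_sd: float = 0.5, seed: int = 3):
    """Select people whose *measured* skill exceeds a threshold.

    Hypothetical model: true skill ~ N(0, 1); the test adds independent
    N(0, noise_sd) error. Returns (number selected, fraction of the selected
    whose true skill is actually below the threshold, mean measured score of
    the selected, mean true skill of the selected).
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n):
        true_skill = rng.gauss(0, 1)
        measured = true_skill + rng.gauss(0, noise_sd)
        if measured > threshold:
            selected.append((true_skill, measured))
    false_pos = sum(1 for t, _ in selected if t < threshold) / len(selected)
    mean_measured = statistics.fmean(m for _, m in selected)
    mean_true = statistics.fmean(t for t, _ in selected)
    return len(selected), false_pos, mean_measured, mean_true

for threshold in (1.0, 2.0, 3.0):
    count, fp, m_meas, m_true = selection_stats(threshold)
    print(f"threshold +{threshold:.0f}sd: {count:6d} selected, "
          f"{fp:.0%} actually below threshold, "
          f"mean measured {m_meas:.2f} vs mean true {m_true:.2f}")
```

As the threshold rises, the number selected collapses, the share of selected people whose true skill sits below the bar grows, and the group's true average stays consistently below its measured average. That is the pattern the argument predicts.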

Prior 6: The mapping from programmer skill to programmer EV will be highly nonlinear, likely power-law

This follows directly from the power-law distribution of project success. If project success depends even slightly on programmer skill, then increasing skill makes heavy-tailed positive events more likely. These events dominate the ultimate expected value (EV) of a programmer, so EV becomes a highly nonlinear, likely power-law, function of programmer skill. This is true in many fields: the EV created by those of the highest skill utterly dwarfs the contributions of many others with only slightly lower skill.
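A toy calculation showing how a small, smooth dependence on skill can produce a steeply convex EV. It rests on a hypothetical threshold model of my own choosing: each programmer's career "draw" is skill + luck (luck standard normal), crossing a high bar at +5 yields a runaway success worth 10,000 units, and anything else is worth 1. All three numbers are illustrative assumptions, and the resulting curve is convex rather than a strict power law.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def expected_value(skill: float, hit_threshold: float = 5.0, hit_payoff: float = 10_000.0) -> float:
    """Hypothetical EV model for one programmer.

    The career 'draw' is skill + luck, with luck ~ N(0, 1). Crossing
    `hit_threshold` produces a runaway success worth `hit_payoff`;
    anything else is worth 1 (an ordinary, modestly successful career).
    """
    p_hit = upper_tail(hit_threshold - skill)
    return (1 - p_hit) * 1.0 + p_hit * hit_payoff

for skill in (0, 1, 2, 3):
    ev = expected_value(skill)
    print(f"skill +{skill}sd: EV = {ev:8.2f}  ({ev / expected_value(0):.0f}x the average programmer)")
```

With these made-up parameters a +1sd programmer is only about 1.3 times as valuable as an average one, but a +3sd programmer comes out at roughly 15 times a +2sd programmer and over 200 times the average. Skill itself remains normally distributed throughout; the nonlinearity comes entirely from the rare, high-payoff tail events.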

What does this ultimately mean for the 10x programmer? It means that 1.) we should expect 10x programmers to exist; 2.) 10x programmers are both born and made: crucially, they must have both the innate aptitude and an environment that drives them to an uncommon level of skill; 3.) if we measure skill by project success, the expected value of a programmer will be a nonlinear function of skill, heavily influenced by chance and by events in the power-law tails; and 4.) by looking at small-N successes in the tails, we will draw a very noisy and incomplete picture of everything.