Epistemic status: Just something I was musing over. I don’t have any answers to this.
Something that is very obvious about humans is that they have strengths and weaknesses. Our intelligence is ‘spiky’. Some things we are good at; some things we are not. We often have an intrinsic talent for some kind of task or field and, simultaneously, deep and incredible weaknesses in others. Moreover, these differences vary widely between individuals. It is not just that all humans are bad at X. Rather, some humans, of seemingly roughly equal general intelligence, are good at math, others at experimental sciences, others at music or writing or drawing, etc.[1]
This is such a fundamental fact of the human condition that it barely seems worth discussing, but it is in fact deeply mysterious (at least to me). Recent advances in AI also throw this into much sharper relief because today’s AIs don’t seem to have this property. They are universally plastic. If you point a transformer model at some data distribution it will learn it. If you point it at a very broad distribution it will learn all the modes of that distribution based on their frequency and intrinsic entropy. You don’t have a model that is ‘good at math’ vs one that is ‘good at art’ except in ways that are straightforwardly related to the data distribution.[2] Certainly models ‘have talents’ in some vague sense – e.g. see the recent notion of ‘claudiness’ – but these seem to be emerging now as pretraining scaling is fading and model performance is increasingly dominated by specialized data and environments curated by humans.[3]
When you think about it, the universal plasticity of AI models is natural while the spikiness of humans seems profoundly mysterious. Ultimately there has to be some neurobiological explanation for it, but it is weird a priori. The cortex is mostly uniform and seems to operate a general learning algorithm. There should be no particular reason why any person’s learning algorithm should be very well suited for some specific tasks and very poor at others. There are obviously inductive biases etc. built into the brain, as there are built into neural network architectures, but these biases are likely very general, such as encoding translation or other geometric invariances of the world, or an intrinsic desire to explore evolutionarily relevant objects such as faces, rather than specifying particular talents and weaknesses, especially for skills which never existed in the evolutionary environment. Similarly, one would expect that the inductive biases of the cortex would mostly be common across all humans rather than have extreme individual differences. Likewise, the neurobiological correlates of generalist measures such as IQ seem to be, as expected, very low-level but fundamental things such as brain volume, neuron count, degree of myelination etc. However, again, these kinds of individual differences would be expected to make a person globally smarter or dumber in a universal way that would apply across all cognitive tasks.[4] There is no real reason these would be super specific to a particular skill or area.
To make this a bit more concrete, I, personally, am extremely bad at drawing. I have never been able to draw anything with enough fidelity to get it looking even vaguely right. Whenever I try to draw anything it always comes out incredibly distorted and weird looking. Similarly, my handwriting is extremely messy and bad (much to the chagrin of all my teachers in school). This seems to be a very specific deficit I have. I do not think I am generally bad at visual understanding or thinking. I can picture very detailed things in my mind’s eye and can vividly imagine scenes etc. with rich visual detail. It is simply that I am unable to translate the picture in my mind onto the page.[5] Similarly, I am not bad at fine motor skills. I very rapidly pick up other skills which require fine motor control, such as playing instruments, touch-typing, or video games, without issue. I am also generally good at skill acquisition in most domains I put my hand to, as expected by the general IQ findings. At this specific skill or set of capabilities, though, I am very bad. It is some combination of spatial reasoning and manual dexterity of some very specific type that I am extremely poor at. Not only am I bad at it, but I find it incredibly difficult to learn and improve at it whenever I do try to learn it. The deficit in learning velocity is, in my opinion, the fundamental one and the actual realized low skill level is a result of that.
This also seems to be a human universal. Although almost everybody’s life will testify to this, I especially think of Scott Alexander’s essay ‘The Parable of the Talents’, where he explores being ‘bad at math’ despite being brilliant in many other fields.
I don’t have any good hypothesis for what is going on and am just noticing that I am very confused. Here are some possible hypotheses which either don’t seem to fit or don’t seem to fully explain what is going on:
1.) General IQ-style global impairments to e.g. connectivity, neuron efficiency, myelination etc. These obviously don’t seem to match the data since they predict pseudo-universal impairments. This fits the general construct of IQ as being a measure of general cognitive abilities, which are in fact heavily correlated, but the tails come apart quite dramatically when thinking about specific individual skills and talents.
2.) Some kind of specific impairment to a particular region, e.g. to the visual cortex, motor cortex, or the connectivity between regions. I still think it is hard to make this fit the specificity of skills that people can have a talent for. For instance, taking my drawing example, I am highly sceptical that ‘drawing ability’ can be localized to a precise brain region, especially one that is somehow cytoarchitecturally unique enough to be impaired by e.g. specific mutations or a specific developmental stage. More generally, this hypothesis fails to explain how people have specific talents (vs impairments) which seem more fine-grained than full brain regions. For example, there are plenty of people who are only good at programming but are bad at related fields like math, physics and other sciences, or writing. It seems hard to explain this as ‘the computer programming region has somehow avoided all mutations while the entire rest of the brain is impaired’. Even if we were happy with broader categories, like ‘this person is good at spatial reasoning’ or something, it seems much more unlikely that there would be a mutation that somehow massively boosts a region compared to everything else than one that hurts it. This hypothesis would seem to imply that most people are fairly universal in their plasticity and uniform in talents and then a few mutants are freakishly good at something, but this does not seem to be the case; instead almost everybody has a somewhat lopsided skew of talents that they are good or bad at.
3.) It is possible to try to turn this into some kind of central limit theorem style argument – i.e. there are so many mutations, and their effects are so close to linear, that you shouldn’t expect a discrete categorization of mutants vs non-mutants to hold. Rather, you should expect a linear additive effects model to predominate. This is usually the case in psychometrics and is mostly what GWASs find for many traits. However, I feel that this linear additive model does not explain the ‘spiky’ variance that we see. Instead, the linear additive model would again support a general IQ-style continuous degree of general competence. In the continuum limit of infinite mutations, we would expect every ‘region’ to have an identical mean number of mutations and hence total uniformity of skill determined entirely by mutation rate.
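To make the additive argument above a little more concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything measured: each hypothetical person gets a random mutation `load`, each of `n_regions` made-up ‘regions’ has `sites_per_region` sites that are hit independently with that probability, and ‘skill’ in a region is simply the fraction of sites that were not hit. Under these assumptions, the within-person spread across regions should shrink as the number of sites grows, while the between-person spread in overall level should remain.

```python
import random
import statistics


def simulate(sites_per_region, n_people=200, n_regions=6, seed=0):
    """Return (mean within-person spread, between-person spread) of skill.

    Toy model (all assumptions): skill in a region is the fraction of
    sites NOT hit by a mutation; every site is hit independently with the
    person's own mutation load.
    """
    rng = random.Random(seed)
    within_spreads = []
    person_means = []
    for _ in range(n_people):
        load = rng.uniform(0.1, 0.3)  # hypothetical per-person mutation rate
        skills = []
        for _ in range(n_regions):
            hits = sum(rng.random() < load for _ in range(sites_per_region))
            skills.append(1.0 - hits / sites_per_region)
        within_spreads.append(statistics.pstdev(skills))  # the 'spikiness'
        person_means.append(statistics.fmean(skills))     # the 'general factor'
    return statistics.fmean(within_spreads), statistics.pstdev(person_means)


if __name__ == "__main__":
    for sites in (10, 100, 1000):
        within, between = simulate(sites)
        print(f"sites={sites:5d}  within-person={within:.3f}  between-person={between:.3f}")
```

In this toy model the spikiness is pure sampling noise, which is exactly why it cannot account for stable, heritable talents once the number of contributing sites is large.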
4.) Not necessarily impairments to a full region but some kind of genetically encoded ‘inductive bias’ instantiated in the neurobiology which results in being able to quickly learn certain topics vs others. This inductive bias would then lie on top of the general IQ-style ‘global’ functioning coefficient. I think something like this has to be the case but I am really struggling to think of how this can be implemented using known neurobiological variation. ‘Talents’ seem to lie in a mysterious region where they are more fine-grained than the usual brain region/Brodmann area analysis but at the same time clearly lie above the level of cortical columns and laminar organization. At the higher level, it is certainly easy to think about mutations impairing the effectiveness of e.g. some broad region of cortex, or at the low level some specific cell type. Such large-scale mutations to a broad region of cortex would seem to impair capabilities at an even broader level than human talents operate at. Individual mutations to cell types, which e.g. reduce the effectiveness of some particular type of glial cell, also seem like they would have idiosyncratic and wide-ranging effects across the entire brain.
It is hard to see how mutations at such a level can hit with the fine-grained precision needed to create definable ‘talents’ in terms of human-understandable fields. E.g. it is hard to see how there could be a mutation that specifically targets e.g. coding ability, since we have only had perhaps two generations since computers have been invented. Such a mutation could theoretically create differences at more abstract levels such as degree of systematising thinking, the kind of verbal/visual ability that can be used to understand sequential algorithms etc, however these seem broader than the specificity of human talents. E.g. although there are definitely correlations, there are also people who are good at coding but bad at e.g. math and vice versa, despite these seemingly requiring many of the same ‘underlying competencies’. One possibility is that talents are in fact broad but other ‘personality’ factors as well as e.g. feedback loops in interests and resulting data quantity and quality accentuate what are originally very minor differences into major ones.
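The feedback-loop possibility at the end of the paragraph above can be sketched with a toy model. All of it is assumed for illustration: three hypothetical skills whose learning rates differ by at most 10%, one unit of practice time per step, and a made-up `feedback` exponent controlling how strongly practice time is drawn towards whatever the person is currently best at (`feedback=0` gives an even split).

```python
def final_skills(rates, steps=300, feedback=2.0):
    """Toy practice loop (an assumption, not a fitted model).

    Each step, one unit of practice time is split across skills in
    proportion to current_skill ** feedback, and each skill grows by
    learning_rate * time_spent.
    """
    skills = [1.0] * len(rates)
    for _ in range(steps):
        weights = [s ** feedback for s in skills]
        total = sum(weights)
        skills = [s + r * w / total for s, r, w in zip(skills, rates, weights)]
    return skills


def lopsidedness(skills):
    """Ratio of the best skill to the worst skill."""
    return max(skills) / min(skills)


if __name__ == "__main__":
    rates = [1.00, 1.05, 1.10]  # hypothetical, nearly equal learning rates
    print("even practice:    ", round(lopsidedness(final_skills(rates, feedback=0.0)), 2))
    print("feedback practice:", round(lopsidedness(final_skills(rates, feedback=2.0)), 2))
```

With an even split the final profile is only as lopsided as the learning rates themselves; with the feedback term switched on, the same small differences compound into a much more skewed profile. This only shows that amplification is possible in principle, not that it is what happens in people.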
5.) There is also the statistical ‘tails come apart’ argument. The idea here is that if there is a correlation between X and Y, then as we select increasingly strongly on them we will observe a negative correlation between X and Y within the selected group. This is just Berkson’s paradox and it is super easy to understand what is happening and why as a basic case of statistical selection theory. However, framing it in this way creates an additional set of questions. The original tails come apart idea relies on a notion of a selection floor which is determined by some function of the traits X and Y plus noise. Individuals that are above the floor have different statistical properties than those of the general source distribution from which they were selected. However, in the general case of human talent non-uniformity, there is no such obvious floor or selection process occurring. The only kind of implicit floor is some kind of social filtering that would impact my anecdotal experience. While this is theoretically possible, and most people (including very likely myself) underrate the power of social filtering, naively interpreted, this selection pattern would imply a negative correlation between all unique skills, which is not observed. Moreover, ‘spikiness’ of talents seems to be a broad human universal observed in enough disparate circumstances that I am sceptical of pure selection explanations.
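The selection-floor mechanism described above is easy to check numerically. The following is a minimal sketch under assumed parameters (standard normal traits, a hypothetical `floor` of 1.5 and selection noise with standard deviation 0.5); none of these numbers come from real data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def selection_demo(n=20000, rho=0.0, floor=1.5, noise_sd=0.5, seed=0):
    """Return (correlation in everyone, correlation among the selected).

    Individuals are 'selected' when x + y + noise exceeds the floor.
    rho is the assumed population correlation between the two traits.
    """
    rng = random.Random(seed)
    xs, ys, sel_x, sel_y = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        if x + y + rng.gauss(0, noise_sd) > floor:
            sel_x.append(x)
            sel_y.append(y)
    return pearson(xs, ys), pearson(sel_x, sel_y)


if __name__ == "__main__":
    full, selected = selection_demo()
    print(f"whole population: {full:+.2f}   above the floor: {selected:+.2f}")
```

Even with uncorrelated traits, the group above the floor shows a clearly negative correlation. This is the pattern a pure selection account would predict between any pair of skills feeding into the floor.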
6.) Perhaps this whole idea of talents is fake anyway and it is just a question of habits and data input? I.e., some people are good at the things they are good at because they practiced them more and were more naturally interested in and drawn towards them, and bad at other things because they have not practiced them and were naturally not interested in them. This is similar to how we shouldn’t expect a machine learning model to be good at a task where it has seen hardly any data, and if there are failings in the model outputs some of our first responses are to try to ‘fix the data’ by adding more examples to correct the model’s behaviours on particular tasks. One obvious truth that covers part of the causality for this is that if you practice a lot and spend a lot of time working on particular skills you tend to improve, all things being equal. However, broadly, I think that for human talents this is mostly false and the causality is actually reversed: people spend a lot of time practicing something they are talented at and little time practicing things they are untalented at. I think this is a mostly rational decision spurred on by observing relative rates of skill acquisition between you and your peers. This then shapes the ‘training dataset’ in a feedback loop which further accentuates differences in people’s final capabilities; however, I think that the initial learning speed is actually the prime determinant here.[6]
Coming back to my own experiences, during childhood and adolescence I certainly spent a lot more time pursuing my ‘talents’ (mostly video games, honestly) than drawing, especially compared to those who were most skilled at it. However, whenever I was placed in a situation where I and other peers were starting at roughly the same place, I was noticeably worse and learned dramatically slower than my peers. This contrasts with most other academic situations where the roles were reversed and I was the one learning much faster. For my part (although you don’t have to believe me and there can be subtle effects I’m not aware of), in cases where I am fairly sure that the concepts being introduced, e.g. in mathematics, were ones I had not been exposed to before, I was almost always able to grasp them near instantly while many/most of the rest of the class struggled. At this time I did not spend any time studying outside of class so it is not like I had more data exposure.[7]
Similarly, in ‘The Parable of the Talents’, it is not like Scott Alexander somehow just never came across math before vs people who have been doing math for a long time. Almost every child in school is exposed to a similar (and large) amount of math. Instead, school provides extremely accurate feedback about your relative rate of learning vis-à-vis the other children in the class, and this initial information on learning rate eventually influences decisions on which areas to study and be more interested in, and hence begins a feedback loop that leads to large differences in ‘data input’ and specialization down the road. Again, this is all a mostly rational process. Assuming you have a set of different talents with different relative ranks vs other people in different subjects, you should naturally specialize in the places where your talents are greatest since you can achieve the highest relative rank by doing so, and in many/most fields, relative rank is the primary thing that matters. Obviously, most schoolchildren are not thinking this through in this coldly rational way; instead there is a simple RL process where e.g. ‘being good’ at something feels nice and gets you external rewards and validation from teachers, peers, etc., while ‘being bad’ at something leads to feelings of internal frustration and futility, as well as negative feedback (especially from peers). Obviously, any RL process will steer more towards the ‘being good’ things than the ‘being bad’ things.
There is a slight subtlety here around what the optimal thing to do is given your underlying objective. This is because most skill acquisition has diminishing returns. It is very easy (relatively speaking) to pick up the basics of a skill, but climbing the percentiles requires ever greater investment and initial talent. If you want to reach the highest relative rank in something then the optimal thing to do is focus almost exclusively on the region where you have the highest relative talent – i.e. greatest improvement rate. If you care the most about maximizing the integral over all your skills – i.e. the absolute amount of ‘skill points’ you can accumulate – you should at any given time focus on whichever area currently has the highest improvement rate, moving on from existing skills once you have improved them enough that their improvement rate diminishes, and switching to fresh new skills with a whole lot of low-hanging fruit left to learn. In ‘real life’ most outcomes depend on some mixture of the two. Certain vocations – e.g. competitive sports – care almost solely about relative rank in one particular niche and specialized skill. Most ‘real world’ jobs care about having both strong specializations and a broad basin of generally applicable skills, while a number of other paths require a huge range of adequate but not exceptional skills.
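The two objectives above can be compared in a small sketch. The diminishing-returns curve (`talent * sqrt(hours)`), the three talent values and the 100-hour budget are all assumptions chosen for illustration, and the level of the best skill is used as a crude stand-in for relative rank.

```python
import math


def level(talent, hours):
    # Assumed diminishing-returns curve: level grows like sqrt(hours).
    return talent * math.sqrt(hours)


def specialize(talents, budget):
    """Put every hour into the single highest-talent skill."""
    best = max(range(len(talents)), key=lambda i: talents[i])
    hours = [0] * len(talents)
    hours[best] = budget
    return hours


def follow_improvement_rate(talents, budget):
    """Spend each hour on whichever skill currently improves the most."""
    hours = [0] * len(talents)
    for _ in range(budget):
        gains = [level(t, h + 1) - level(t, h) for t, h in zip(talents, hours)]
        hours[gains.index(max(gains))] += 1
    return hours


def summarize(talents, hours):
    """Return (level of best skill, total 'skill points')."""
    levels = [level(t, h) for t, h in zip(talents, hours)]
    return max(levels), sum(levels)


if __name__ == "__main__":
    talents = [1.0, 0.8, 0.6]  # hypothetical talent profile
    for name, plan in (("specialize", specialize), ("follow rate", follow_improvement_rate)):
        hours = plan(talents, 100)
        peak, total = summarize(talents, hours)
        print(f"{name:12s} hours={hours}  peak={peak:.2f}  total={total:.2f}")
```

Under these assumptions, specializing gives the highest peak while following the current improvement rate gives the highest total, which matches the trade-off described above. A different returns curve would change the numbers but probably not the direction of the trade-off.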
7.) Personality traits seem to have big impacts here but are not fully deterministic and clearly don’t completely control outcomes. There also seems to be a potential split between ‘interest’ and ‘talent’ which cannot be completely conflated. Clearly being talented at something is more likely to produce interest than being untalented, but it is possible that you have talents you are relatively uninterested in or vice versa. This just recurses the question though because how personality works at a neurobiological level is also something I am deeply confused about. This is additionally mysterious because a lot of personality seems to be determined by subcortical structures and broadly/undifferentiatedly acting neurotransmitters like dopamine, serotonin etc while ‘talent’ i.e. ability to quickly learn and master some specific area seems to be primarily a cortical phenomenon. If this is the case, how does ‘personality’ influence and put constraints on the general cortical plasticity and learning algorithms? My most likely opinion here is that it mostly doesn’t. Instead ‘talents’ and ‘personality’ are determined somewhat separately initially and then in some cases the relative strengths of personality and talent build on each other, while in other cases they destructively interfere, which gives people perhaps an even more lopsided skill profile than they would otherwise have.
8.) Maybe this is just a feature of any randomly initialized inscrutable learning algorithm and is due to random fluctuations during initialization or training? In this case we should expect individual neural networks to exhibit ‘talents’ where they are exceptionally competent at some aspect of the task relative to its frequency in the data distribution while being much weaker at others compared to e.g. similar neural networks trained with different seeds etc. From my anecdotal experience, neural networks don’t seem to have this (much), at least not as strongly as humans, but I have not done a systematic study nor am I aware of any.
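I have not run the systematic study mentioned above, but one way it could be set up is sketched here. Given a table of per-subtask scores for several models trained with different seeds, remove each model’s overall level and each subtask’s overall difficulty and look at what is left. The score table below is synthetic and entirely made up (`make_scores`, `talent_sd`, `noise_sd` and the `spikiness` measure are hypothetical names and choices for illustration, not an established metric).

```python
import random
import statistics


def make_scores(n_models, n_tasks, talent_sd, noise_sd=0.01, seed=0):
    """Synthetic (made-up) score table: rows are seeds, columns are subtasks."""
    rng = random.Random(seed)
    task_base = [rng.uniform(0.5, 0.9) for _ in range(n_tasks)]
    scores = []
    for _ in range(n_models):
        ability = rng.gauss(0, 0.02)  # a seed that is slightly better at everything
        scores.append([
            base + ability + rng.gauss(0, talent_sd) + rng.gauss(0, noise_sd)
            for base in task_base
        ])
    return scores


def spikiness(scores):
    """Std of what remains after removing per-seed and per-subtask means."""
    n_models, n_tasks = len(scores), len(scores[0])
    grand = statistics.fmean(v for row in scores for v in row)
    model_mean = [statistics.fmean(row) for row in scores]
    task_mean = [statistics.fmean(row[t] for row in scores) for t in range(n_tasks)]
    residuals = [
        scores[m][t] - model_mean[m] - task_mean[t] + grand
        for m in range(n_models)
        for t in range(n_tasks)
    ]
    return statistics.pstdev(residuals)


if __name__ == "__main__":
    flat = spikiness(make_scores(20, 10, talent_sd=0.0))
    spiky = spikiness(make_scores(20, 10, talent_sd=0.05))
    print(f"no seed-specific talents: {flat:.3f}   with talents: {spiky:.3f}")
```

On real models the residual mixes genuine seed-specific ‘talent’ with evaluation noise, so the noise level would have to be estimated separately (for example by resampling the evaluation set) before reading anything into the number.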
One other interesting question here is the extent to which there are ‘positive’ and ‘negative’ talents vs just a high baseline and degrees of degradation from the baseline. In some sense, this is philosophical since both give roughly the same observations. However, from my personal anecdotal experience I do feel there are differences. Some people are generally good at many things but may have a few glaring and surprising weaknesses – hence ‘negative talents’. Others are generally mediocre across a lot of things but then have huge spikes upwards at the few things they are extremely good at – ‘positive talents’. To some extent (probably a large extent?) talents are genetically mediated. It seems much more likely that you will have talents similar to your parents’ than would be expected by chance, so some kind of mutations must be involved somehow (and what else could it be?) – but at what level do they operate?
This has been somewhat rambly, but to sum up the fundamental point: it is undeniable that people have talents or impairments in incredibly specific skills and categories of skills that don’t necessarily seem to be easily mapped to global deficits in things like ‘reasoning’ or ‘visual understanding’ or something at the level of whole brain regions. However, it is very hard to think of potential neurobiological mechanisms that would cause such a fine-grained and lopsided skill profile for people. The broad underlying algorithms of the brain should point towards universal plasticity of the kind that DL models seem to exhibit. Inductive biases seem to be too global and uniform to really be able to explain this. In some sense, mutations causing some kind of ‘inductive bias’ that operates at just the right level to produce the observed degree of surprising ‘spikiness’ in human capabilities, vs the uniform plasticity of neural networks, seem to be the most likely explanation, even though I am deeply confused by how this would be implemented at a neurobiological level. Hopefully once (if?) we understand more deeply how cognition and skills are represented and learned in the brain, this problem will seem obvious and be dissolved. Until then, I notice I am confused.
[1] In psychology this is called the study of individual differences. If you actually look at the literature, however, it is mostly focused on the opposite – trying to theorize general constructs which can parsimoniously explain the factors of variation between people, such as IQ and Big-5 personality traits. This is all well and good for explaining things at the coarse level, but it leaves much variance unexplained and that variance is interesting (to me). Where does it come from? What is the underlying psychological or neuroscientific basis for this variance? Is it just noise, and if so, what is causing these distinct ‘noise patterns’ in ‘personality/aptitude’ space?
[2] Note I am saying this but I am not actually so sure. I have never seen any specific evidence for models having ‘talents’ in the same way that humans do due to some quirk of initialization, but then again I am not sure that if this did exist anybody would have spotted it, since the effects could be subtle.
[3] Generally I think that attempting to apply factor analysis methods, like we do in individual differences, to different AI systems might be interesting. Certainly there are readily apparent differences in ‘personality’ between e.g. Claude, ChatGPT, and Gemini. My strong hunch is that these are heavily related to differences in post-training datasets and system prompt rather than e.g. random seed or something fundamental in the different model architectures, but I don’t know for sure.
[4] And this certainly occurs with IQ. It is called the ‘positive manifold’.
[5] And this has caused me a lot of frustration at various times throughout my life because a lot of my thinking is visual but it’s almost impossible for me to communicate it because of this.
[6] Scott also strongly argues for this causality in his ‘Parable of the Talents’ article, which I generally strongly agree with.
[7] In fact most likely I had less data on these topics than most of my classmates, since many of them did explicitly study outside of class and also did homework while I mostly did not, preferring instead to spend time playing video games. Under the data-centric view I should have been exceptionally bad at school rather than very good at it, since I spent almost no time paying attention or deliberately studying, essentially until I hit my PhD, at which point I was mostly following intellectual interests directly.