Understanding Overparametrized Generalization

This is a successor post to my previous post on Grokking Grokking. Here we present a heuristic argument as to why overparametrized neural networks appear to generalize in practice, and why this requires a substantial amount of overparametrization – i.e. the ability to easily memorize (sometimes called interpolate) the training... [Read More]

Grokking 'grokking'

Epistemic Status: This is not my speciality within ML and I present mostly speculative intuitions rather than experimentally verified facts and mathematically valid conjectures. Nevertheless, it captures my current thinking and intuitions about the phenomenon of ‘grokking’ in neural networks and of generalization in overparametrized networks more generally. [Read More]

Self-Hosting beren.io

This website (now www.beren.io) is now self hosted! No longer will it exist solely at Github/Microsoft’s discretion. It feels good. [Read More]

Selection and the Just World Fallacy

One thing that recently struck me is that people (including me) often have a strong intuitive sense of the just world fallacy when it comes to personal traits. People assume that if somebody has some great strength they must also have a great weakness. That if they are really... [Read More]