Thoughts on the Falsifiability of the Free Energy Principle

Since the publication of the defining Particular Physics monograph, a question which has been repeatedly raised is whether the Free Energy Principle (FEP) is falsifiable. In the literature, it is argued that the FEP is merely a principle and thus supposedly immune from falsification. This point is often made with an analogy to the Least Action principle from classical physics. The Least Action principle states that all motion can be thought of as minimizing (strictly, making stationary) a mathematical quantity, the Action \(\mathcal{A}\), which is the integral along some path of a special function called the Lagrangian \(\mathcal{L}\). Given a Lagrangian, you can then apply the Euler-Lagrange equations to derive the laws of motion expected for that Lagrangian. Similarly, you can also go ‘backwards’ and, by observing the motion of the object, deduce a Lagrangian that describes its behaviour.
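
For concreteness, here is the standard one-dimensional form of these objects. The symbols \(q\) (position), \(t_0, t_1\) (endpoints of the path) and \(m\) (mass) are not used elsewhere in this post and are introduced only for illustration.

\[
\mathcal{A}[q] = \int_{t_0}^{t_1} \mathcal{L}(q, \dot{q}, t)\, dt,
\qquad
\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}} - \frac{\partial \mathcal{L}}{\partial q} = 0 .
\]

For a free particle, \(\mathcal{L} = \tfrac{1}{2} m \dot{q}^2\), so the Euler-Lagrange equation reduces to \(m \ddot{q} = 0\): motion in a straight line at constant velocity.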

This analogy with the undoubtedly successful Least Action principle is revealing, since it shifts the focus onto precisely the differences between a principle like the Least Action principle and the FEP, and shows how falsifiability may not be the best way to understand the value and scientific utility of a principle.

Firstly, the Least Action Principle cannot be completely falsified since, given any arbitrary law of motion, I can presumably come up with some Lagrangian whose action it minimizes. However, if we instead use a more nuanced paradigm of Bayesian model comparison under some universal prior, then we can judge whether a principle provides a parsimonious and in some sense elegant description of reality or not. For instance, I can also postulate the Principle of Prime Action, which states that the laws of motion of any particle are such as to make the Action a prime number. Now this “principle” is unfalsifiable since, for any empirically observed trajectory, I can come up with some Lagrangian such that the action is prime. Nevertheless, there is a very real sense in which the Principle of Least Action is better than my Principle of Prime Action. The Lagrangians generated by the Least Action principle are generally very simple and “nice”, especially for simple systems like a free particle in a vacuum. However (although I haven’t actually done the maths on this), I suspect the “Lagrangians” generated by the Principle of Prime Action are nowhere near this “nice”. One way we could formalize and quantify this property of “niceness” is with the Solomonoff prior, which would weight the two principles according to the length of the program needed to specify the laws of the universe under each principle, with shorter programs receiving exponentially more weight. The Solomonoff prior effectively functions as a way to formalize and quantify the intuitive concept of Occam’s Razor, which is that we should prefer simpler hypotheses to complex ones, all other things being equal.
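
As a rough sketch of what this weighting looks like, write \(\ell(P)\) for the length in bits of the shortest program that reproduces the observed laws of motion under a principle \(P\). The symbols \(\ell\), \(w\), \(P_{\text{least}}\) and \(P_{\text{prime}}\) are my own shorthand, and keeping only the shortest program is a simplification of the full Solomonoff prior, which sums over all programs consistent with the observations.

\[
w(P) \propto 2^{-\ell(P)},
\qquad
\frac{w(P_{\text{least}})}{w(P_{\text{prime}})} = 2^{\,\ell(P_{\text{prime}}) - \ell(P_{\text{least}})} .
\]

Under this simplification, every extra bit needed to state the laws under one principle halves its prior weight relative to the other, so if the Prime Action “Lagrangians” needed even 100 more bits, that principle would be disfavoured by a factor of \(2^{100}\) before any data were considered.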

We could then consider a principle falsified if it receives an extremely low weight under the Solomonoff prior. This would mean that the principle is a priori extremely unlikely, and if there is any other principle that can explain the data, then it should be preferred as a scientific explanation. Importantly, this weighting can, in principle, be estimated for something like the FEP (the Solomonoff prior itself is uncomputable, but description lengths can be bounded from above), and so this provides a way to get close to falsifying the FEP, or at least to saying that, under a Bayesian model comparison, the theory performs very poorly.

Another way we can judge the value of a principle, even without strictly falsifying it, is to look at the intellectual advances it produces. The Principle of Least Action has been incredibly intellectually fruitful over several hundred years of physics. Not only does it unify essentially all of classical mechanics, it has been extended to quantum mechanics and quantum field theory, and generally permeates all of physics. Moreover, it has helped physicists derive new truths which would be difficult to conceive of without the principle. Some examples of this are the beautiful Noether’s theorem, as well as the class of gauge theories. I am not a physicist, so undoubtedly there are many more examples. The FEP, even as a principle, can also be judged against this utilitarian criterion. It is not clear what previously extant fields of work can be unified or better understood with the FEP than without, a stark contrast to the principle of least action, where the field of classical mechanics was well established before the principle came on the scene. In terms of intellectual fruitfulness, the FEP can lay claim to several process theories such as Active Inference, which can, in theory, be empirically validated, and if it were true that the brain or general cognising systems were performing something like active inference then this would be a significant win for the FEP. In general though, it is probably too early to tell whether the FEP has yielded substantial theoretical advances outside of the FEP itself.

If we move from judging the utility of the FEP intrinsically as a principle to judging its utility for helping us actually understand the world, we can obtain another criterion for falsifiability: applicability.

In the FEP literature, it is often claimed implicitly or explicitly that the FEP is designed to handle and explain “sentient systems” in some deep sense, and especially systems such as the brain. For much of the literature, it is not enough that the FEP apply to some mathematical system. However, the FEP also places strong mathematical conditions on the kinds of systems it can model. Specifically, it requires a.) that the system can be modelled as an ergodic Langevin equation with white, zero-mean, Gaussian noise; b.) that the dynamics possess a non-equilibrium steady state (NESS) density; and c.) that this density satisfies a series of conditional independence conditions called the Markov Blanket (or Friston Blanket) conditions, which enable the states of the system to be partitioned into “internal”, “external”, and “blanket” states, and which require the NESS density to enforce a conditional independence between internal and external states given the blanket states.
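
In symbols, the three conditions can be sketched as follows. The notation is my own shorthand rather than that of any particular paper: \(x\) is the full state, \(f\) the deterministic flow, \(\omega\) the noise with amplitude \(\Gamma\), \(p^*\) the steady-state density, and \(\eta\), \(b\), \(\mu\) the external, blanket and internal states.

\[
\dot{x} = f(x) + \omega,
\qquad
\langle \omega(t) \rangle = 0,
\qquad
\langle \omega(t)\, \omega(t')^{\top} \rangle = 2\Gamma\, \delta(t - t'),
\]
\[
x = (\eta, b, \mu),
\qquad
p^*(\eta, \mu \mid b) = p^*(\eta \mid b)\; p^*(\mu \mid b).
\]

The first line is condition (a). Condition (b) is the existence of a density \(p^*(x)\) that does not change in time under these dynamics, and the second line is the conditional independence in condition (c).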

Whether a specific dynamical system, for instance “the brain”, actually fulfills these conditions is an empirical question, and thus the claim that it does can be shown to be false. If, for instance, it could be shown that the brain a.) cannot be modelled as a Langevin dynamical system ¹, b.) does not possess a NESS, or c.) cannot be modelled as possessing a Markov blanket, then the FEP would not apply to such a system. Although not directly attacking the FEP itself, this would, I believe, suffice to effectively falsify much of the literature which implicitly assumes that the FEP can be usefully applied to analyze and describe systems like the brain.

The utility of the FEP as a principle thus faces a deep question about the breadth of its scope. If the conditions required by the FEP were so strong that it could not apply to any interesting system in the real world, then this would be a real blow to the theory even if it is not strictly a falsification. Implicit in the colloquial use of “principle” is that the principle applies to a wide range of systems of interest: the principle of least action applies to essentially all physical laws at the most basic level. If the FEP were true, but only about some contrived class of Ornstein-Uhlenbeck processes with no reasonable physical analogue, then this would be a severe deflation for the theory, even if not technically a falsification.

The conditions imposed by the FEP, when taken literally, are extremely strong, to the extent that they likely cannot be directly applied to anything. For instance, it is clear that no physical system is literally at NESS, since all biological systems age and die, and all physical systems must eventually succumb to the heat-death of the universe. The real question, then, is the extent to which the FEP applies to systems even when the required conditions hold only approximately. Importantly, this is also an empirical question which is open to falsification. If the FEP is completely fragile to the slightest violation of its conditions, then, since its literal conditions are so strong as to rule out effectively all systems, this is tantamount to the FEP itself being falsified under the applicability criterion. To me, looking forward, this is one of the biggest theoretical challenges that needs to be addressed. Currently, the fragility of the FEP to slight relaxations in its assumptions is not known at all, and knowing it is crucial for being able to assess the real-world utility of the FEP. At best, the FEP could function in the same way that ‘point mass’ assumptions work in physics. Obviously a planet is not literally a point mass, but modelling the real mass distribution makes very little difference to the overall result. On the other hand, it is possible that the slightest infraction of the FEP conditions suffices to destroy the main results.
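
To make the fragility question slightly more concrete, here is a minimal numerical sketch on a toy linear (Ornstein-Uhlenbeck) system. Everything in it is my own assumption for illustration: the three-variable state ordered as (external, blanket, internal), the matrices `Pi`, `Gamma` and `Q`, and the helper names `stationary_covariance` and `blanket_leak`. The drift is built as `(Gamma + Q) @ Pi`, so that the Gaussian steady state has precision matrix `Pi`, and the antisymmetric `Q` makes that steady state a non-equilibrium one. For a Gaussian, internal and external states are conditionally independent given the blanket exactly when the corresponding precision entry is zero, so the sketch measures how far that entry moves from zero when the drift is slightly perturbed. It only probes the density-level independence condition in this one toy model, not the downstream results of the FEP.

```python
import numpy as np


def stationary_covariance(A, Gamma):
    """Steady-state covariance S of dx = -A x dt + sqrt(2 Gamma) dW.

    S solves the Lyapunov equation A S + S A^T = 2 Gamma. Assumes every
    eigenvalue of A has positive real part, so that a steady state exists.
    """
    n = A.shape[0]
    eye = np.eye(n)
    # With row-major flattening, vec(A S) = kron(A, I) vec(S) and
    # vec(S A^T) = kron(I, A) vec(S), so the equation is one linear solve.
    lhs = np.kron(A, eye) + np.kron(eye, A)
    rhs = (2.0 * Gamma).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)


def blanket_leak(A, Gamma):
    """Steady-state precision entry coupling external (0) and internal (2)."""
    precision = np.linalg.inv(stationary_covariance(A, Gamma))
    return precision[0, 2]


# Hypothetical toy system; state order: 0 = external, 1 = blanket, 2 = internal.
Pi = np.array([[2.0, 1.0, 0.0],
               [1.0, 2.0, 1.0],
               [0.0, 1.0, 2.0]])    # target precision; Pi[0, 2] = 0 is the blanket
Gamma = np.eye(3)                   # noise amplitude (assumed isotropic)
Q = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])    # antisymmetric part: non-equilibrium flow
A = (Gamma + Q) @ Pi                # drift matrix, dynamics are dx = -A x dt + noise

baseline = blanket_leak(A, Gamma)

# Perturb the drift: add an extra direct dependence of the internal flow
# on the external state, with (assumed) strength 0.1.
E = np.zeros((3, 3))
E[2, 0] = 1.0
perturbed = blanket_leak(A + 0.1 * E, Gamma)

print(f"precision[external, internal] at baseline:     {baseline:.2e}")
print(f"precision[external, internal] after perturbing: {perturbed:.2e}")
```

Sweeping the perturbation strength and plotting the resulting entry would give a crude robustness curve for this one toy model. Whether the FEP's actual results degrade as gracefully is exactly the open question.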

Nevertheless, it is clear that while the FEP cannot technically be falsified, the intrinsic parsimony and elegance of the principle can, at least approximately, be quantified, and the utility and scientific value of the theory can and should be extensively tested. In the best case, the FEP would be a parsimonious and elegant principle, which applies to a wide range of systems and can derive fruitful scientific insights about them which are impossible to derive any other way. In the worst case, the FEP could apply only to a sterile and arcane subset of mathematical systems with no relevance to the real world, and produce not a single meaningful scientific insight or result concerning the nature of sentient systems. Which case the FEP falls into is a question which can be settled through mathematical research and empirical investigation and, in my opinion, the answer to this question is far more useful for assessing its scientific value than the technical question of strict falsifiability.

(1) Although this seems like quite a loose condition, it does nevertheless impose some constraints. By modelling the system as a Langevin equation we effectively require that the rate of change of the system depends only on the instantaneous value of its variables; the past cannot affect the dynamics except through the present state. While this is probably true for the laws of physics, if we consider more abstract high-level systems, this condition could be violated. Additionally, the form of the Langevin equation requires additive, white, Gaussian noise, which is a strong assumption, since in complex systems the noise could well be coloured or non-Gaussian in practice.