The use of susceptibilities in neural network interpretability, as pioneered by Baker et al. (2025) and further developed in a series of recent papers (Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026), including under the name of Bayesian influence functions (Kreer et al. 2025; Adam et al. 2025; Lee et al. 2025), has its roots in the theory of statistical mechanics. In this post, we will describe susceptibilities in the original setting of statistical physics, which illuminates and motivates their use as an interpretability tool.
The Boltzmann Distribution
Let’s begin with some definitions: A system consists of a space of configurations and a function called the energy or Hamiltonian. The configurations might be continuous (the positions and momenta of particles in a gas, or the weights of a neural network) or discrete (the orientations of spins on a lattice, as in a model of a magnet). The Hamiltonian assigns to each configuration a real number, which in physics is the energy and in machine learning might be the loss.
At thermal equilibrium, the probability that the system is in configuration $x$ is proportional to $e^{-\beta H(x)}$, where $\beta$ is the inverse temperature. This is the Boltzmann distribution:

$$p_\beta(x) = \frac{e^{-\beta H(x)}}{Z_\beta}, \qquad Z_\beta = \sum_x e^{-\beta H(x)},$$

where the normalizing constant $Z_\beta$ is called the partition function (for continuous configuration spaces the sum becomes an integral).

At high temperature ($\beta \to 0$), all configurations are equally likely. At low temperature ($\beta \to \infty$), the distribution concentrates on the energy minima.
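To make this concrete, here is a minimal sketch of our own (not the post's simulation code) of the Boltzmann distribution on a toy system with three configurations, showing both temperature limits:

```python
import math

def boltzmann(energies, beta):
    """Boltzmann probabilities for a list of configuration energies."""
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)                   # the partition function
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0]             # a toy three-configuration system

hot = boltzmann(energies, beta=0.01)   # high temperature: nearly uniform
cold = boltzmann(energies, beta=10.0)  # low temperature: mass on the minimum
print(hot, cold)
```

At `beta=0.01` the three probabilities are all close to 1/3; at `beta=10` nearly all the probability sits on the energy minimum.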
The Ising Model
A simple but nontrivial example of a system and energy function is the Ising model, which describes a lattice of interacting spins. Place a spin variable $\sigma_i \in \{+1, -1\}$ at each site $i$ of an $N \times N$ grid. A configuration $\sigma$ is an assignment of a spin to every site. The Hamiltonian is

$$H(\sigma) = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j,$$

where the sum runs over all pairs $\langle i, j \rangle$ of nearest neighbors and $J > 0$ is the coupling constant. Energy is minimized when all spins are aligned, giving two ground states: all $+1$ or all $-1$.
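The Hamiltonian above is easy to compute directly. As an illustration (our own sketch, with hypothetical function names), on a periodic grid each nearest-neighbor pair can be counted exactly once by summing every site's right and down bonds:

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """H = -J * sum over nearest-neighbour pairs, with periodic
    (wrapping) boundaries. Each pair is counted once by summing
    every site's right and down bond."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -J * float(np.sum(spins * right + spins * down))

N = 4
aligned = np.ones((N, N), dtype=int)  # a ground state: all +1
flipped = aligned.copy()
flipped[0, 0] = -1                    # one dissenting spin breaks 4 bonds

print(ising_energy(aligned))          # -2 * J * N^2 = -32.0
print(ising_energy(flipped))          # -32.0 + 2 * J * 4 = -24.0
```

The ground-state energy is $-2JN^2$ (two bonds per site), and flipping a single spin costs $2J$ per broken bond.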
Figure 1. The 2D Ising model on a 100×100 lattice with wrapping boundary conditions. Each pixel shows a single spin: blue (+1) or vermilion (−1). Configurations are sampled from the Boltzmann distribution via Metropolis–Hastings. A useful experiment: press Hot start to initialize all spins at random, then drag $\beta$ up past $\beta_c$ and watch domains grow and coarsen. Or press Cold start (all spins aligned) at low temperature (high $\beta$), then drag $\beta$ back down to watch the ordered state dissolve into fluctuations as you cross the critical point $\beta_c$.
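The Metropolis–Hastings sampling mentioned in the caption can be sketched in a few lines. This is a generic single-spin-flip implementation of ours, not the post's actual simulation code:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(spins, beta, J=1.0):
    """One sweep: N*N single-spin-flip proposals. Flipping spin s
    changes the energy by dE = 2*J*s*(sum of its four neighbours);
    accept the flip with probability min(1, exp(-beta*dE))."""
    N = spins.shape[0]
    for _ in range(N * N):
        i, j = rng.integers(0, N, size=2)
        nbrs = (spins[(i + 1) % N, j] + spins[(i - 1) % N, j]
                + spins[i, (j + 1) % N] + spins[i, (j - 1) % N])
        dE = 2.0 * J * spins[i, j] * nbrs
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

spins = np.ones((16, 16), dtype=int)   # cold start: all spins aligned
metropolis_sweep(spins, beta=1.0)      # well above beta_c: order persists
```

Run at $\beta$ well above the critical value, a cold start stays almost fully magnetized; run at small $\beta$, it dissolves into noise within a few sweeps.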
This type of probabilistic system is difficult to directly comprehend. Its state—the exact spin at every lattice site—fluctuates rapidly. What we actually measure (in a laboratory or elsewhere) are quantities that are stable: averages over spatial regions and time intervals that are large compared to the microscopic scales but small compared to the macroscopic scales of interest. Temperature, pressure, magnetization: all of these are averages. The subject of thermodynamics begins with the recognition that these averaged quantities obey their own laws, independent of the microscopic details that have been aggregated away.
One such quantity for the Ising system is the magnetization $M(\sigma) = \sum_i \sigma_i$, which measures the total alignment of the spins. At high temperature ($\beta \approx 0$), the Boltzmann distribution is nearly uniform over configurations, the spins point in random directions, and the expected value of $M$ is approximately $0$. At low temperature (large $\beta$), the distribution concentrates on the two ground states and the expected value of $|M|$ is approximately $N^2$.
The remarkable fact is that the transition between these regimes is sharp: there is a critical inverse temperature

$$\beta_c = \frac{\ln(1 + \sqrt{2})}{2J} \approx \frac{0.4407}{J}$$

at which the system undergoes a phase transition. For $\beta < \beta_c$, the system is disordered (i.e. the magnetization per spin $\langle |M| \rangle / N^2$ vanishes as $N \to \infty$). For $\beta > \beta_c$, long-range order emerges and $\langle |M| \rangle / N^2$ approaches a nonzero value.
This example illustrates the general pattern: by tracking the expectation value of a single observable (the magnetization) as a function of a parameter ($\beta$), we detect a qualitative change in the internal organization of the system—from disorder to order—without ever inspecting individual spin configurations. The phase transition is a property of the Boltzmann distribution, not of any single configuration.
Observables and Expectation Values
Generalizing from the above, an observable is a function $O$ on the configuration space. Its expectation value under the Boltzmann distribution is

$$\langle O \rangle = \frac{1}{Z_\beta} \sum_x O(x)\, e^{-\beta H(x)}.$$

The expectation value is the average of $O$ over all configurations, weighted by their Boltzmann probability. Since the underlying system can change quickly and chaotically, the value taken by an observable may be similarly ephemeral, but its expectation will be much better behaved. Examples of observables in a magnetic system include: the magnetization $M$, the energy $H$ itself, and correlations $\sigma_i \sigma_j$ between pairs of spins.
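On a small enough lattice the expectation value can be computed exactly by brute-force enumeration, which makes the definition tangible. A sketch of ours (note that on a 2×2 periodic lattice each bond is counted twice, which is harmless for illustration):

```python
import itertools, math

def expectation(observable, beta, J=1.0, N=2):
    """Exact Boltzmann expectation on an N x N periodic Ising lattice,
    enumerating all 2^(N*N) configurations."""
    Z, total = 0.0, 0.0
    for bits in itertools.product([-1, 1], repeat=N * N):
        s = [list(bits[r * N:(r + 1) * N]) for r in range(N)]
        H = -J * sum(s[i][j] * (s[(i + 1) % N][j] + s[i][(j + 1) % N])
                     for i in range(N) for j in range(N))
        w = math.exp(-beta * H)
        Z += w
        total += observable(s) * w
    return total / Z

magnetization = lambda s: sum(sum(row) for row in s)

# By +1/-1 symmetry <M> is exactly zero at every temperature, while
# <|M|> approaches N^2 = 4 as beta grows:
print(expectation(magnetization, beta=0.5))
print(expectation(lambda s: abs(magnetization(s)), beta=2.0))
```

The first expectation is zero at any $\beta$ (every configuration cancels against its global flip), which is why $\langle |M| \rangle$ rather than $\langle M \rangle$ is the useful order parameter on a finite lattice.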
Different observables reveal different aspects of the system. The magnetization tells you about the alignment of spins; the correlation function tells you about the spatial extent of order; the energy tells you about the typical configuration. By choosing observables judiciously, we can efficiently probe the internal structure of the system.
Perturbing the Hamiltonian
We don’t need to limit ourselves to studying a single system: we can use this perspective to study how the expectation values of observables change when the Hamiltonian is perturbed. For example, how does the system’s average energy respond when we introduce heat by exposing a small part of it to a heat source? Or how does the magnetization change when we hold up a magnet to it?
Consider a one-parameter family of Hamiltonians

$$H_\varepsilon(x) = H(x) - \varepsilon V(x),$$

where $V$ is some observable and $\varepsilon$ controls the perturbation strength. So for positive values of $\varepsilon$ the new system has low energy when $H$ is low and $V$ is large. The perturbed Boltzmann distribution is $p_\varepsilon(x) \propto e^{-\beta H_\varepsilon(x)}$.
Consider the example of introducing a magnetic field. Let $R$ be some rectangular subregion of the lattice, and let

$$M_R(\sigma) = \sum_{i \in R} \sigma_i$$

be its magnetization. The perturbed Hamiltonian is $H_h(\sigma) = H(\sigma) - h\, M_R(\sigma)$: the field adds an energy term of $-h$ times the overall magnetization in $R$. The Boltzmann distribution shifts: for $h > 0$, configurations where $R$ has positive magnetization are favored. But because each spin in $R$ is coupled to its neighbours, biasing $R$ also biases the neighbours: they lower their interaction energy by also pointing $+1$. This bias propagates outward from $R$, attenuated at each step by a factor depending on $\beta$.
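In code, the perturbation is a one-line modification of the energy. This is our own sketch (with a hypothetical `perturbed_energy` helper), reusing the periodic-boundary Ising energy:

```python
import numpy as np

def perturbed_energy(spins, h, region_mask, J=1.0):
    """H_h = H - h * M_R: the Ising energy plus a field term coupling
    only to the spins inside the probe region R (a boolean mask)."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    H = -J * float(np.sum(spins * right + spins * down))
    M_R = float(np.sum(spins[region_mask]))
    return H - h * M_R

N = 8
spins = np.ones((N, N), dtype=int)
mask = np.zeros((N, N), dtype=bool)
mask[2:5, 2:5] = True                 # a 3x3 probe region R

# With h > 0, aligning R with +1 lowers the energy by h * |R|:
print(perturbed_energy(spins, h=0.5, region_mask=mask))   # -128.0 - 4.5
```

A sampler run against `perturbed_energy` instead of the unperturbed energy draws from the shifted Boltzmann distribution described above.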
The simulation below lets you explore this directly. Select a region to designate it as the probe region (highlighted in yellow), then drag to apply the local field.
Figure 2. Click and drag on the lattice to draw a rectangular probe region (yellow border). Drag in the lower slider to apply the field $h$ to every spin in the selection; positive $h$ biases the region toward $+1$, negative toward $-1$. To see an example of propagation, click “Cold Start” to initialize all the spins at $-1$, and then drag the $h$ slider to the left at different temperatures.
Once we’ve chosen a parametrized family of perturbations $H_\varepsilon = H - \varepsilon V$ like the one above, the expectation value of any observable $O$ becomes a function of $\varepsilon$:

$$\langle O \rangle_\varepsilon = \frac{1}{Z_\varepsilon} \sum_x O(x)\, e^{-\beta H_\varepsilon(x)}, \qquad Z_\varepsilon = \sum_x e^{-\beta H_\varepsilon(x)}.$$

The susceptibility $\chi$ is the first-order response to the perturbation:

$$\chi = \left.\frac{\partial \langle O \rangle_\varepsilon}{\partial \varepsilon}\right|_{\varepsilon = 0}.$$
The susceptibility is the basic tool by which condensed matter physicists interrogate the internal structure of complex systems. We can see this in the above display. At high temperature, perturbing the Hamiltonian only affects spins in the selected rectangle, but as temperature decreases towards the critical threshold, the region affected grows. If the perturbed Hamiltonian shifts the lattice points in the rectangle to be blue, those points influence the rectangle’s neighbors, which influence their neighbors, and so on. At the critical point, the influence spreads through the entire grid.
Stated more formally: the magnetic susceptibility, viewed as a function of the inverse temperature $\beta$, diverges as $\beta$ approaches the critical value $\beta_c$. This divergence signals the onset of long-range order.
More generally, susceptibilities for a range of observables encode complementary information about the system’s organisation. See Yeomans (1992) for more information. The key insight is that by studying how the system responds to perturbations, we learn about its internal organisation without needing direct access to the microscopic state.
Remarkably, a key result—the fluctuation–dissipation theorem—states that this derivative equals a covariance computed in the unperturbed ensemble:

$$\chi = \beta \left( \langle O V \rangle - \langle O \rangle \langle V \rangle \right) = \beta\, \mathrm{Cov}(O, V).$$

This means we can learn how the system would respond to a perturbation by studying how it fluctuates in the absence of that perturbation. See Appendix A of Baker et al. (2025) for a derivation.
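The theorem is easy to check numerically. In this sketch of ours, a toy system with hand-picked energies and observables shows that the finite-difference derivative of $\langle O \rangle_\varepsilon$ matches $\beta\,\mathrm{Cov}(O, V)$ computed at $\varepsilon = 0$:

```python
import math

configs = [0, 1, 2, 3]
H = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5}   # unperturbed energies
V = {0: 1.0, 1: -1.0, 2: 2.0, 3: 0.0}  # perturbing observable
O = {0: 0.5, 1: 1.5, 2: -0.5, 3: 1.0}  # measured observable
beta = 0.7

def expect(f, eps):
    """<f> under the perturbed distribution for H_eps = H - eps * V."""
    w = {x: math.exp(-beta * (H[x] - eps * V[x])) for x in configs}
    Z = sum(w.values())
    return sum(f[x] * w[x] for x in configs) / Z

# Susceptibility by central finite difference ...
eps = 1e-5
chi_fd = (expect(O, eps) - expect(O, -eps)) / (2 * eps)

# ... and by fluctuation-dissipation: chi = beta * (<OV> - <O><V>)
OV = {x: O[x] * V[x] for x in configs}
chi_fdt = beta * (expect(OV, 0.0) - expect(O, 0.0) * expect(V, 0.0))

print(chi_fd, chi_fdt)   # the two agree
```

The covariance side needs only samples from the *unperturbed* distribution, which is exactly what makes the theorem so useful in practice.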
Correlations Reveal Structure
Can we detect where a given probe site is in a lattice just by measuring covariances?
Consider a version of the Ising model where the boundary conditions are hard walls rather than periodic identifications, and a single internal partition runs from the top of the lattice to roughly two-thirds of the way down the centre. The result is a lattice of three rooms: a left chamber and a right chamber, separated by the partition, both opening into a common lower region.
Figure 3. An 80×80 Ising lattice with hard outer walls (black border) and an internal partition running from the top to two-thirds of the way down the centre, creating three rooms. Click and drag to draw a rectangular probe region (yellow border). The plot shows the Pearson correlation (a scaled covariance) between the probe magnetization and each room’s total magnetization, computed as a cumulative average over sweeps since the probe was placed; the curves converge as the number of sweeps grows. Draw a new region to reset.
In the above demo, you can easily recover which “room” a probe region is in, and even where it lies within that room, by measuring the covariance between the probe region’s magnetization and the three rooms’ magnetizations: exactly the data of a susceptibility where the observable is a room’s magnetization.
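The same inference can be run exactly on a miniature version. In this sketch of ours, two four-spin “rooms” with no bonds between them are statistically independent, so a probe spin’s covariance with its own room is large while its covariance with the other room vanishes:

```python
import itertools, math

# Two rooms of four spins each; bonds only within a room, so the rooms
# are independent, like chambers separated by a hard wall.
beta, J = 0.6, 1.0
roomA, roomB = [0, 1, 2, 3], [4, 5, 6, 7]
bonds = [(0, 1), (1, 3), (3, 2), (2, 0),   # 2x2 cycle in room A
         (4, 5), (5, 7), (7, 6), (6, 4)]   # 2x2 cycle in room B

states, weights = [], []
for s in itertools.product([-1, 1], repeat=8):
    H = -J * sum(s[i] * s[j] for i, j in bonds)
    states.append(s)
    weights.append(math.exp(-beta * H))

Z = sum(weights)
def expect(f):
    return sum(f(s) * w for s, w in zip(states, weights)) / Z

probe = lambda s: s[0]                     # a single probe spin in room A
M_A = lambda s: sum(s[i] for i in roomA)
M_B = lambda s: sum(s[i] for i in roomB)

cov_A = expect(lambda s: s[0] * M_A(s)) - expect(probe) * expect(M_A)
cov_B = expect(lambda s: s[0] * M_B(s)) - expect(probe) * expect(M_B)
print(cov_A, cov_B)   # cov_A is large, cov_B is ~0: covariances locate the probe
```

Measuring only these covariances, with no access to individual configurations, tells you which room the probe sits in.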
Analogy to Neural Networks
The structural inference programme (Baker et al. 2025; Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026) applies this framework to neural networks.
| Physics | Neural networks |
|---|---|
| Configuration space | Parameter space |
| Hamiltonian | (Population) loss |
| Boltzmann distribution | Tempered posterior |
| External field perturbation | Data distribution perturbation |
| Observable (e.g. magnetization) | Component observable |
| Susceptibility | Per-token susceptibility |
The analogy between configuration space and parameter space is straightforward. But we extend it further: the random movement around configuration space, under pressure to minimize energy, is the analogue of the movement around parameter space to minimize loss, i.e. learning.
The neural network “Hamiltonian”, i.e. the loss function, is a sum of losses for individual tasks. It is natural to perturb it by changing the distribution of these tasks: this can be a change in the training corpus, adding or removing texts, or a synthetic adjustment, increasing or decreasing the importance of predicting a specific token.
We primarily use the change in loss from a given local minimum as an observable. This is often localised with a Dirac delta function so as to record the change in loss only in a specific component of the model; see here for more details. This is analogous to the magnetization of a particular room in the example above.
Conclusion
This is the first post in a series of three. In the second, we will present more detailed definitions of susceptibilities in the context of neural networks, and describe how we compute them efficiently at scale and work with the resulting data.
In the third post, we showcase how this works in practice. Just as susceptibilities can reveal where a rectangular probe region sits in a three-room grid, they provide insight into how a large language model understands text data.
Work with us
We're hiring Research Scientists, Engineers & more to join the team full-time.
Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.
@article{gordon2026interpreting,
title={Interpreting the Ising Model},
author={Andrew Gordon and Rohan Hitchcock and Daniel Murfet},
year={2026}
}

References

1. Garrett Baker, George Wang, Jesse Hoogland, Daniel Murfet (2025). Structural Inference: Interpreting Small Language Models with Susceptibilities. arXiv:2504.18274.
2. George Wang, Garrett Baker, Andrew Gordon, Daniel Murfet (2025). Embryology of a Language Model. arXiv:2508.00331.
3. Andrew Gordon, Garrett Baker, George Wang, William Snell, Stan van Wingerden, Daniel Murfet (2026). Towards Spectroscopy: Susceptibility Clusters in Language Models. arXiv:2601.12703.
4. George Wang, Daniel Murfet (2026). Patterning: The Dual of Interpretability. arXiv:2601.13548.
5. Philipp Alexander Kreer, Wilson Wu, Maxwell Adam, Zach Furman, Jesse Hoogland (2025). Bayesian Influence Functions for Hessian-Free Data Attribution.
6. Maxwell Adam, Zach Furman, Jesse Hoogland (2025). The Loss Kernel: A Geometric Probe for Deep Learning Interpretability.
7. Jin Hwa Lee, Matthew Smith, Maxwell Adam, Jesse Hoogland (2025). Influence Dynamics and Stagewise Data Attribution.
8. J. M. Yeomans (1992). Statistical Mechanics of Phase Transitions. Clarendon Press.