Physics gives us interpretability for matter. Artificial neural networks are digital rather than physical, but the problem of interpretability for these systems has a conceptual and technical overlap with physics: we seek to understand internal structures (e.g. regular arrangements of atoms in a material, or features and circuits in a neural network) that form in complex systems in response to interaction with the environment (e.g. external fields for matter, training data for networks). Some of the same mathematics can be applied to “interpretability” in both cases; this is a special case of the old and fecund relations between statistical physics and statistical learning (Hopfield 1982; Seung et al. 1992).
For example, susceptibilities are a standard tool for the physicist trying to understand a material. Their use for neural network interpretability was pioneered by Baker et al. (2025) and has been developed further, both empirically and theoretically, in a series of recent papers (Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026), including under the name of Bayesian influence functions (Kreer et al. 2025; Adam et al. 2025; Lee et al. 2025). In this post, we will describe susceptibilities in the original setting of statistical physics, which illuminates and motivates their use as an interpretability tool.
The Boltzmann Distribution
Let’s begin with some definitions: A system consists of a space of configurations $\mathcal{X}$ and a function $H : \mathcal{X} \to \mathbb{R}$ called the energy or Hamiltonian. The configurations might be continuous (the positions and momenta of particles in a gas, or the weights of a neural network) or discrete (the orientations of spins on a lattice, as in a model of a magnet). The Hamiltonian assigns to each configuration a real number, which in physics is the energy and in machine learning might be the loss.
At thermal equilibrium, the probability that the system is in configuration $x$ is proportional to $e^{-\beta H(x)}$, where $\beta = 1/T$ is the inverse temperature. This is the Boltzmann distribution:

$$p_\beta(x) = \frac{e^{-\beta H(x)}}{Z(\beta)}, \qquad Z(\beta) = \sum_x e^{-\beta H(x)},$$

where the normalizing constant $Z(\beta)$, the partition function, is a sum for discrete configurations and an integral for continuous ones.
At high temperature ($\beta \to 0$), all configurations are equally likely. At low temperature ($\beta \to \infty$) the distribution concentrates on the energy minima.
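To make the two limits concrete, here is a minimal sketch (ours, not from the post) of the Boltzmann distribution over a toy discrete configuration space with three states:

```python
import numpy as np

def boltzmann(energies, beta):
    """Boltzmann probabilities p(x) = exp(-beta * H(x)) / Z(beta).
    Subtracting the minimum energy first avoids overflow at large beta."""
    E = np.asarray(energies, dtype=float)
    weights = np.exp(-beta * (E - E.min()))
    return weights / weights.sum()

energies = [0.0, 1.0, 2.0]             # H(x) for three configurations
print(boltzmann(energies, beta=0.01))  # high temperature: nearly uniform
print(boltzmann(energies, beta=10.0))  # low temperature: concentrated on the minimum
```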
The Ising Model
A simple but nontrivial example of a system and energy function is the Ising model, which describes a lattice of interacting spins. Place a spin variable $\sigma_i \in \{+1, -1\}$ at each site $i$ of an $N \times N$ grid. A configuration $\sigma$ is an assignment of a spin to every site. The Hamiltonian is

$$H(\sigma) = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j,$$
where the sum runs over all pairs $\langle i, j \rangle$ of nearest neighbors and $J > 0$ is the coupling constant. Energy is minimized when all spins are aligned, giving two ground states: all $+1$ or all $-1$.
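As a concrete reference point (a sketch, not the simulation's source code), the Ising energy with periodic boundaries can be computed by rolling the lattice so that each nearest-neighbour bond is counted exactly once:

```python
import numpy as np

def ising_energy(sigma, J=1.0):
    """H(sigma) = -J * sum over nearest-neighbour pairs, periodic boundaries.
    Rolling right and down counts each horizontal and vertical bond once."""
    right = np.roll(sigma, -1, axis=1)
    down = np.roll(sigma, -1, axis=0)
    return -J * np.sum(sigma * right + sigma * down)

rng = np.random.default_rng(0)
sigma = rng.choice([-1, 1], size=(100, 100))   # a random ("hot") configuration
print(ising_energy(sigma))
print(ising_energy(np.ones((100, 100))))       # ground state: energy -2 * J * N^2
```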
Figure 1. The 2D Ising model on a 100×100 lattice with wrapping boundary conditions. Each pixel shows a single spin: blue ($+1$) or vermilion ($-1$). Configurations are sampled from the Boltzmann distribution via Metropolis–Hastings. A useful experiment: press Hot start to initialize all spins at random, then drag $\beta$ up past $\beta_c$ and watch domains grow and coarsen. Or press Cold start (all spins aligned) at low temperature, then drag the temperature upward to watch the ordered state dissolve into fluctuations as you cross the critical point $T_c$.
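Figure 1 samples configurations with Metropolis–Hastings. A minimal single-spin-flip sweep, under the same periodic boundary conditions, might look like the following (a sketch under our own conventions, not the simulation's actual code):

```python
import numpy as np

def metropolis_sweep(sigma, beta, J=1.0, rng=None):
    """One Metropolis-Hastings sweep: N^2 single-spin-flip proposals.
    Flipping spin (i, j) changes the energy by
    dE = 2 * J * sigma[i, j] * (sum of its 4 neighbours)."""
    if rng is None:
        rng = np.random.default_rng()
    n = sigma.shape[0]
    for _ in range(n * n):
        i, j = rng.integers(n, size=2)
        nbrs = (sigma[(i + 1) % n, j] + sigma[(i - 1) % n, j]
                + sigma[i, (j + 1) % n] + sigma[i, (j - 1) % n])
        dE = 2.0 * J * sigma[i, j] * nbrs
        # accept with probability min(1, exp(-beta * dE))
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            sigma[i, j] = -sigma[i, j]
    return sigma
```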
The configuration of such a system (the spin at every lattice site) fluctuates rapidly, and no individual configuration is itself meaningful. What carries physical content are ensemble averages: expectations of observables under the Boltzmann distribution. These are what laboratory measurements return, because real instruments integrate over many microscopic fluctuations.
One such quantity for the Ising system is the magnetization $M(\sigma) = \sum_i \sigma_i$. At high temperature ($\beta \to 0$), the Boltzmann distribution is nearly uniform over configurations, the spins point in random directions, and the expected value of $M$ is approximately $0$. At low temperature ($\beta \to \infty$), the distribution concentrates on the two ground states and the expected value of $|M|$ is approximately the number of lattice sites $N^2$.
The remarkable fact is that the transition between these regimes is sharp: there is a critical inverse temperature

$$\beta_c = \frac{\ln(1 + \sqrt{2})}{2J} \approx \frac{0.4407}{J}$$
at which the system undergoes a phase transition. For $\beta < \beta_c$, the system is disordered (i.e. $\langle M \rangle = 0$). For $\beta > \beta_c$, long-range order emerges and $\langle |M| \rangle > 0$.
This example illustrates the general pattern: by tracking the expectation value of a single observable (the magnetization) as a function of a parameter ($\beta$), we detect a qualitative change in the internal organization of the system without ever inspecting individual spin configurations. The phase transition is a property of the Boltzmann distribution, not of any single configuration.
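One can reproduce this picture numerically by sweeping $\beta$ and tracking the average of $|M|$. A sketch reusing `metropolis_sweep` from above (the lattice size, sweep counts, and hot-start initialization are our own choices; note that mixing slows down near $\beta_c$, so small lattices and short runs give only a rough curve):

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(0.1, 0.8, 15)
abs_mags = []
for beta in betas:
    sigma = rng.choice([-1, 1], size=(32, 32))   # hot start
    for _ in range(200):                          # burn-in sweeps
        metropolis_sweep(sigma, beta, rng=rng)
    samples = []
    for _ in range(200):                          # measurement sweeps
        metropolis_sweep(sigma, beta, rng=rng)
        samples.append(abs(sigma.sum()))
    abs_mags.append(np.mean(samples) / sigma.size)
# <|M|> / N^2 rises from ~0 to ~1 around beta_c = ln(1 + sqrt(2)) / 2 ~= 0.4407 (J = 1)
```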
Observables and Expectation Values
Generalizing from the above, an observable is a function $\mathcal{O}$ on the configuration space. Its expectation value under the Boltzmann distribution is

$$\langle \mathcal{O} \rangle_\beta = \frac{1}{Z(\beta)} \sum_x \mathcal{O}(x)\, e^{-\beta H(x)}.$$
The expectation value is the average of $\mathcal{O}$ over all configurations, weighted by their Boltzmann probability. Examples of observables in a magnetic system include: the magnetization $M$, the energy $H$ itself, and the product $\sigma_i \sigma_j$ for a pair of spins $i, j$. Different observables reveal different aspects of the system. The idea is that by choosing observables judiciously, we can probe the internal structure of the system.
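Numerically, expectation values are estimated by averaging an observable over samples drawn from the Boltzmann distribution (e.g. by the Metropolis–Hastings sweeps above). A minimal sketch:

```python
import numpy as np

def expectation(observable, samples):
    """Monte Carlo estimate of <O>: the average of O over Boltzmann samples."""
    return np.mean([observable(s) for s in samples])

# Example observables for the Ising system:
magnetization = lambda sigma: sigma.sum()         # M
pair = lambda sigma: sigma[0, 0] * sigma[0, 1]    # sigma_i * sigma_j for neighbouring sites
```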
Perturbing the Hamiltonian
So far the system, as defined by the space of configurations and the Hamiltonian $H$, was fixed. However, the “real” Hamiltonian for a physical system contains terms that relate to interactions of the system with outside degrees of freedom and thus is, in some sense, never exactly fixed. It is natural to ask how fluctuations in these outside degrees of freedom affect the physics of our system.
If we are approaching this physics via expectation values for various observables $\mathcal{O}$, then the natural approach to studying these fluctuations is to study derivatives of these quantities with respect to the relevant fluctuations. In the case of the Ising model, this takes the following concrete form: how does the magnetization change when we couple the spin at a lattice site to an external magnetic field, and allow that field to vary?
To make this precise, consider a one-parameter family of Hamiltonians

$$H_h = H - h\,\Phi,$$

where $\Phi$ is some observable and $h$ controls the perturbation strength. The perturbed Boltzmann distribution is $p_h(x) \propto e^{-\beta H_h(x)}$. To “introduce an external magnetic field” to the Ising model, let $R$ be some rectangular subregion of the lattice, and let $\Phi = M_R$ be its magnetization

$$M_R(\sigma) = \sum_{i \in R} \sigma_i.$$
The field couples to the magnetization of $R$ with strength $h$: for $h > 0$, configurations with positive magnetization in $R$ have lower energy and are favoured by the Boltzmann distribution. But because each spin in $R$ is coupled to its neighbours, biasing $R$ also biases the neighbours: they lower their interaction energy by also pointing $+1$. This bias propagates outward from $R$, attenuated at each step by a factor depending on $\beta$.
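In code, the perturbed energy just adds the field term to the Ising energy from before. A sketch (reusing `ising_energy` from above; representing the probe region $R$ by an indicator array `mask` is our own convention):

```python
import numpy as np

def perturbed_energy(sigma, h, mask, J=1.0):
    """H_h(sigma) = H(sigma) - h * M_R(sigma), where mask is 1 on the
    probe region R and 0 elsewhere, so M_R = sum of the spins inside R."""
    return ising_energy(sigma, J) - h * np.sum(sigma * mask)

# In a Metropolis sweep targeting H_h, the energy change for flipping spin (i, j)
# gains an extra field term: dE += 2.0 * h * sigma[i, j] * mask[i, j].
```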
The simulation below lets you explore this directly. Select a region to designate it as the probe region (highlighted in yellow), then drag to apply the local field.
Figure 2. Click and drag on the lattice to draw a rectangular probe region (yellow border). Drag the lower slider to apply the field $h$ to every spin in the selection; positive $h$ biases the region toward $+1$, negative toward $-1$. To see an example of propagation, click “Cold Start” to initialize all the nodes at $-1$, and then drag the $h$ slider to the left at different temperatures.
Once we’ve chosen a parametrized family of perturbations like $H_h = H - h M_R$ above, the expectation value of any observable $\mathcal{O}$ becomes a function of $h$:

$$\langle \mathcal{O} \rangle_h = \frac{1}{Z(h)} \sum_x \mathcal{O}(x)\, e^{-\beta H_h(x)}.$$
The susceptibility is the first-order response to the perturbation:

$$\chi = \frac{\partial \langle \mathcal{O} \rangle_h}{\partial h}\bigg|_{h=0}.$$
At first glance, computing $\chi$ looks expensive: we would have to simulate the system at many values of $h$ to estimate the derivative numerically. However, this is unnecessary. The fluctuation–dissipation theorem asserts that

$$\frac{\partial \langle \mathcal{O} \rangle_h}{\partial h}\bigg|_{h=0} = \beta\,\mathrm{Cov}(\mathcal{O}, \Phi) = \beta\left( \langle \mathcal{O}\,\Phi \rangle - \langle \mathcal{O} \rangle \langle \Phi \rangle \right),$$
where the covariance is computed in the unperturbed ensemble at $h = 0$. The derivative of an expectation equals a covariance. The derivation is short. With $H_h = H - h\,\Phi$, differentiate

$$\langle \mathcal{O} \rangle_h = \frac{1}{Z(h)} \sum_x \mathcal{O}(x)\, e^{-\beta H_h(x)}$$
in $h$ using the product rule. The exponent brings down a factor of $\beta\,\Phi(x)$, and

$$\frac{\partial Z(h)}{\partial h} = \beta \sum_x \Phi(x)\, e^{-\beta H_h(x)} = \beta\, Z(h)\, \langle \Phi \rangle_h.$$
The two contributions combine to give

$$\frac{\partial \langle \mathcal{O} \rangle_h}{\partial h} = \beta\left( \langle \mathcal{O}\,\Phi \rangle_h - \langle \mathcal{O} \rangle_h \langle \Phi \rangle_h \right),$$
and setting $h = 0$ recovers the identity. The mechanism is general: whenever the Hamiltonian depends linearly on $h$ via $H_h = H - h\,\Phi$, the derivative of any expectation at $h = 0$ is a covariance with the generator of the perturbation (here $\Phi$). Note that the susceptibility, as a covariance, is itself the expectation value of some observable. Sometimes we denote the susceptibility $\chi_{\mathcal{O}, \Phi}$ to record the observable $\mathcal{O}$ and external field $\Phi$.
The intuition is that the unperturbed system already fluctuates around equilibrium, and those spontaneous fluctuations sample all possible responses. When $\mathcal{O}$ and $\Phi$ tend to fluctuate together at $h = 0$, turning on a field that favours large $\Phi$ pushes $\langle \mathcal{O} \rangle$ upward; the correlation in the unperturbed ensemble predicts the response. See Yeomans (1992) for more information.
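The practical upshot is that $\chi$ can be estimated from a single unperturbed simulation. A sketch reusing `metropolis_sweep` from above (the function name, lattice size, and sweep counts are our own choices); one can check it against a finite-difference estimate from two perturbed runs at $h = \pm\delta$:

```python
import numpy as np

def susceptibility(observable, mask, beta, n_sweeps=2000, burn=500, size=32):
    """Estimate chi = beta * Cov(O, Phi) from unperturbed samples at h = 0,
    where Phi = M_R is the magnetization of the probe region given by mask."""
    rng = np.random.default_rng(1)
    sigma = rng.choice([-1, 1], size=(size, size))
    O_vals, Phi_vals = [], []
    for t in range(n_sweeps):
        metropolis_sweep(sigma, beta, rng=rng)
        if t >= burn:
            O_vals.append(observable(sigma))
            Phi_vals.append(np.sum(sigma * mask))   # Phi = M_R
    return beta * np.cov(O_vals, Phi_vals)[0, 1]    # beta * Cov(O, Phi)
```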
Susceptibilities Reveal Structure
The main intuition we wish to communicate with this post is that expectation values in general, and susceptibilities in particular, probe the internal structure of a system. In this case by “system” we mean not only the configuration space and Hamiltonian $H$, but also some coupling of these ingredients to external fields (i.e. the function $\Phi$ and control parameter $h$). Strictly speaking, our notion of “internal structure” is jointly a property of the system and this coupling; that is, the structure we see may depend on how we look.
To make this point concretely we have to choose a system with structure. Consider a version of the Ising model where the boundary conditions are hard walls rather than periodic identifications, and a single internal wall runs from the top of the lattice to roughly two-thirds of the way down the centre. The result is a lattice of three “chambers”:
- A left chamber: the rectangle bounded on the left by the left wall of the box, on the right by the central wall, above by the top of the box, and below by the imaginary horizontal line running across the box at the height of the wall’s lower end;
- A right chamber similarly defined;
- And a lower chamber which is disjoint from both, and runs across the width of the box.
Figure 3. An 80×80 Ising lattice with hard outer walls (black border) and an internal wall running from the top to two-thirds of the way down the centre, creating three chambers. Click and drag to draw a rectangular probe region (yellow border). The plot shows the Pearson correlation (a scaled covariance) between the probe magnetization and each chamber’s total magnetization, computed as a cumulative average over the $n$ sweeps since the probe was placed; the curves converge as $n \to \infty$. Draw a new region to reset.
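The wall enters the system only through which bonds appear in the Hamiltonian. A sketch of the walled energy (hard outer walls, so no periodic wrap; the names `wall_col` and `wall_depth` and their values are illustrative choices matching the description above, not taken from the simulation's source):

```python
import numpy as np

def walled_energy(sigma, J=1.0, wall_col=40, wall_depth=53):
    """Ising energy with hard outer walls and an internal wall that severs
    the horizontal bonds between columns wall_col-1 and wall_col in the
    top wall_depth rows (about two-thirds of an 80-row lattice)."""
    horiz = (sigma[:, :-1] * sigma[:, 1:]).astype(float)  # bonds between columns j, j+1
    vert = sigma[:-1, :] * sigma[1:, :]                   # bonds between rows i, i+1
    horiz[:wall_depth, wall_col - 1] = 0.0                # remove bonds crossing the wall
    return -J * (horiz.sum() + vert.sum())
```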
We are going to probe the system by varying the external magnetic field in some region $R$, as in Figure 2 above, with a strength controlled by the parameter $h$. The response of the system is measured by seeing how the expectation value $\langle A \rangle$ responds to this probe, for some observable $A$. In fact we will measure the response as a vector quantity, by choosing three observables; they are the magnetizations $A_i = M_{C_i}$ of the chambers $C_i$, where $i \in \{1, 2, 3\}$ indexes the left, right and lower chambers.
From a physical perspective it makes sense that if we couple the spins in a probe region to the external magnetic field, which we then turn on, this perturbation will affect nearby spins more strongly than distant spins (depending on the value of $\beta$). The influence will not spread through walls, since spins adjacent to but on opposite sides of a wall are not coupled in the Hamiltonian. Thus by varying the probe region we should be able to “see” the walls and chambers that are implicit in the structure of the Hamiltonian. That is, we should be able to see the structure of the system.
To be more precise, for any pair $(A_i, R)$ we consider the response, or susceptibility

$$\chi_i(R) = \frac{\partial \langle A_i \rangle_h}{\partial h}\bigg|_{h=0} = \beta\,\mathrm{Cov}(A_i, M_R).$$
Any given observable only sees “part of the picture” and so we combine them to measure a vector response to the probe $R$:

$$\vec{\chi}(R) = \left( \chi_1(R),\, \chi_2(R),\, \chi_3(R) \right).$$
This vector is estimated by the plot in Figure 3 for any drawn region $R$ as the number of sweeps becomes large. The reader can verify for themselves that $\vec{\chi}(R)$ takes on characteristic values for regions in the left, right and bottom chambers; moreover, it is sensitive even to the position within each chamber. It would be an interesting exercise to invert these measurements to infer the precise position of the wall.
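Given Boltzmann samples of the walled system, the susceptibility vector is just three covariances against the probe magnetization. A sketch (the chamber and probe masks are indicator arrays, as in the earlier sketches; Figure 3 plots the Pearson correlation, which is this covariance rescaled by the standard deviations):

```python
import numpy as np

def chi_vector(samples, probe_mask, chamber_masks, beta):
    """Estimate chi(R) = (chi_1, chi_2, chi_3) from unperturbed samples:
    beta * Cov(M_C, M_R) for each chamber mask C."""
    M_R = np.array([np.sum(s * probe_mask) for s in samples])
    return np.array([
        beta * np.cov(np.array([np.sum(s * C) for s in samples]), M_R)[0, 1]
        for C in chamber_masks
    ])
```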
This completes our simple demonstration of susceptibilities as a tool for interpreting a physical system: in this case, the Ising model with a particular design of the couplings between spins that represents a “wall”. The steps, in abstract terms:
- Structure in a system is implicitly encoded in its Hamiltonian. Making this implicit structure explicit in a useful way is nontrivial, and indeed this is the problem of interpretability (i.e. interpretation is computation).
- To access this information we couple the system to external degrees of freedom, each of which is “controlled” by some field strength $h$.
- For each external field $\Phi$ and observable $A_i$ there is an associated susceptibility $\chi_i$. Stacking these across observables gives a vector quantity $\vec{\chi}$.
- Interactions between the constituents of the system will propagate fluctuations in the external field in characteristic ways, which depend on the nature of its internal structure. Some of the information on these propagations, and thus the internal structure, is recorded in $\vec{\chi}$.
- By studying the information in $\vec{\chi}$ (perhaps for multiple fields $\Phi$) we therefore “see” some aspect of the internal structure of the system.
And in concrete terms, in the setting of the Ising model with an internal wall:
- The wall is encoded in the Hamiltonian (via some spin-spin interactions being omitted). We imagine that either we don’t have direct access to the Hamiltonian, or that it is large and complicated and so the nature of this structure is not explicitly accessible.
- To access this information we couple the system to an external magnetic field in a region $R$, controlled by a field strength $h$.
- The associated susceptibility vector is $\vec{\chi}(R) = (\chi_1(R), \chi_2(R), \chi_3(R))$.
- By studying $\vec{\chi}(R)$ as $R$ varies we learn how to “see” the wall hidden in the Hamiltonian.
Analogy to Neural Networks
The spectroscopy programme (Baker et al. 2025; Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026) applies this framework to neural networks.
| Physics | Neural networks |
|---|---|
| Configuration space | Parameter space |
| Hamiltonian | (Population) loss |
| Boltzmann distribution | Tempered posterior |
| External field perturbation | Data distribution perturbation |
| Observable (e.g. magnetization) | Function on parameter space |
| Susceptibility | Susceptibility |
In this case the system is a neural network, with configuration space given by the space $W$ of possible weight vectors $w$ for the neural network and the role of the Hamiltonian being played by the population loss $L(w)$.
The structure we are interested in belongs to the Hamiltonian near the set of ground states, that is, low-loss parameters, and is implicit in the population loss in the same way that the “wall” was implicit in the interactions of the spins in our Ising model Hamiltonian. In the setting of neural networks, this structure is derived in part from the architecture of the network and in part from the structure of the data distribution.
To study this structure via susceptibilities we have to do two things: choose observables and choose “external fields”. Our observables are some functions $A_i$ on parameter space. Our external fields are, in an abstract sense, ways to vary the population loss. Since the population loss is defined as the pairing of a loss density with the data distribution, one natural source of external fields is variations in the data distribution itself (which, for example, up- or down-weight some particular data point).
To our set of observables $\{A_i\}$ and chosen variation/field $\Phi$ we associate a vector

$$\vec{\chi}(\Phi) = \left( \chi_1, \ldots, \chi_k \right), \qquad \chi_i = \beta\,\mathrm{Cov}(A_i, \Phi),$$
exactly as above. The intuition is that different variations in the data distribution will have different effects on the population loss (just as different probe regions had different effects on the Ising model Hamiltonian) and these will be reflected in differences in the vectors $\vec{\chi}(\Phi)$. The fundamental hypothesis of our approach to interpretability is that internal structure in the neural network (like walls in the Ising model) will leave its traces in these susceptibility vectors. Thus, by studying these vectors across a collection of variations $\Phi$, we can infer that internal structure.
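As a schematic illustration only (this is not the devinterp API; the sampler, observables, and normalization conventions here are all assumptions of the sketch), the estimator has the same shape as in the Ising case, with posterior weight samples playing the role of spin configurations:

```python
import numpy as np

def nn_susceptibilities(weight_samples, observables, per_sample_loss, x_k, beta):
    """Estimate chi_i ~ beta * Cov(A_i(w), loss(w, x_k)) over samples of the
    tempered posterior (e.g. from an SGMCMC chain). Up-weighting the data
    point x_k perturbs the population loss, so the per-sample loss at x_k
    plays the role of the generator Phi (signs and overall normalization
    conventions elided in this sketch)."""
    phi = np.array([per_sample_loss(w, x_k) for w in weight_samples])
    return np.array([
        beta * np.cov(np.array([A(w) for w in weight_samples]), phi)[0, 1]
        for A in observables
    ])
```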
Build on our work
Our tools for susceptibilities, local learning coefficients, and SGMCMC sampling are open source in the devinterp library.
Work with us
We're hiring Research Scientists, Engineers & more to join the team full-time.
Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.