Interpreting the Ising Model

A primer on the statistical physics behind susceptibilities for interpretability.

Authors
Andrew Gordon, Rohan Hitchcock, Daniel Murfet
Affiliations
Timaeus
Published
April 21, 2026
† Post Writing Contributor; correspondence to andrew@timaeus.co

Physics gives us interpretability for matter. Artificial neural networks are digital rather than physical, but the problem of interpretability for these systems has a conceptual and technical overlap with physics: we seek to understand internal structures (e.g. regular arrangements of atoms in a material, or features and circuits in a neural network) that form in complex systems in response to interaction with the environment (e.g. external fields for matter, training data for networks). Some of the same mathematics can be applied to “interpretability” in both cases; this is a special case of the old and fecund relations between statistical physics and statistical learning (Hopfield 1982; Seung et al. 1992).

For example, susceptibilities are a standard tool for the physicist trying to understand a material. Their use for neural network interpretability was pioneered by Baker et al. (2025) and developed, both empirically and theoretically, in a series of recent papers (Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026), including under the name of Bayesian influence functions (Kreer et al. 2025; Adam et al. 2025; Lee et al. 2025). In this post, we describe susceptibilities in their original setting of statistical physics, which illuminates and motivates their use as an interpretability tool.

The Boltzmann Distribution

Let’s begin with some definitions: A system consists of a space $W$ of configurations and a function $H : W \to \mathbb{R}$ called the energy or Hamiltonian. The configurations might be continuous (the positions and momenta of particles in a gas, or the weights of a neural network) or discrete (the orientations of spins on a lattice, as in a model of a magnet). The Hamiltonian assigns to each configuration a real number, which in physics is the energy and in machine learning might be the loss.

At thermal equilibrium, the probability that the system is in configuration $w$ is proportional to $e^{-\beta H(w)}$, where $\beta > 0$ is the inverse temperature. This is the Boltzmann distribution:

$$ p(w) = \frac{1}{Z}e^{-\beta H(w)}, \qquad Z = \int_W e^{-\beta H(w)}\,dw. $$

At high temperature ($\beta \to 0$), all configurations are equally likely. At low temperature ($\beta \to \infty$), the distribution concentrates on the energy minima.
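To make this concrete, here is a minimal sketch (our own illustration, with a hypothetical three-configuration system) of computing Boltzmann probabilities when $W$ is finite:

```python
import numpy as np

def boltzmann(energies, beta):
    """Boltzmann probabilities p(w) = exp(-beta*H(w)) / Z for a finite
    configuration space, with the partition function Z computed by
    direct summation."""
    weights = np.exp(-beta * np.asarray(energies))
    return weights / weights.sum()

# Hypothetical system with three configurations of energies 0, 1, 2.
energies = [0.0, 1.0, 2.0]
print(boltzmann(energies, beta=0.01))  # high temperature: nearly uniform
print(boltzmann(energies, beta=10.0))  # low temperature: mass on the minimum
```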

The Ising Model

A simple but nontrivial example of a system and energy function is the Ising model, which describes a lattice of interacting spins. Place a spin variable $s_i = \pm 1$ at each site $i$ of an $L \times L$ grid. A configuration is an assignment of a spin to every site. The Hamiltonian is

$$ H(w) = -J \sum_{\langle i,j \rangle} s_i s_j $$

where the sum runs over all pairs of nearest neighbors and $J > 0$ is the coupling constant. Energy is minimized when all spins are aligned, giving two ground states: all $+1$ or all $-1$.
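As a sketch (our own, not the post's simulation code), the Hamiltonian with wrapping boundary conditions can be computed by summing each spin's interaction with its right and down neighbors, so that every nearest-neighbor pair is counted exactly once:

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """H(w) = -J * sum over nearest-neighbor pairs s_i s_j, with periodic
    (wrapping) boundary conditions. Summing only the right and down
    neighbors counts each pair exactly once."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -J * np.sum(spins * (right + down))

L = 100
rng = np.random.default_rng(0)
hot = rng.choice([-1, 1], size=(L, L))   # "hot start": random spins
cold = np.ones((L, L))                   # ground state: all spins +1
print(ising_energy(hot), ising_energy(cold))  # cold gives -2*J*L**2
```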

[Interactive simulation: a β slider from high temperature (β = 0.10) to low temperature (β = 1.00) with βc ≈ 0.44 marked, a disordered/ordered phase label, and a live magnetization readout ranging over ±100².]

Figure 1. The 2D Ising model on a 100×100 lattice with wrapping boundary conditions. Each pixel shows a single spin: blue ($+1$) or vermilion ($-1$). Configurations are sampled from the Boltzmann distribution via Metropolis–Hastings. A useful experiment: press Hot start to initialize all spins at random, then drag $\beta$ up past $\beta_c$ and watch domains grow and coarsen. Or press Cold start (all spins aligned) at low temperature (high $\beta$), then drag $\beta$ downward to watch the ordered state dissolve into fluctuations as you cross the critical point $\beta_c \approx 0.44$.
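The sampler behind Figure 1 is standard single-spin-flip Metropolis–Hastings; a minimal sketch (ours, assuming the periodic lattice above):

```python
import numpy as np

def metropolis_sweep(spins, beta, J=1.0, rng=None):
    """One Metropolis-Hastings sweep: L*L attempted single-spin flips.
    Flipping s_i changes the energy by dE = 2*J*s_i*(sum of its neighbors),
    and the flip is accepted with probability min(1, exp(-beta*dE))."""
    if rng is None:
        rng = np.random.default_rng()
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        neighbors = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                     + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * neighbors
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1  # accept the proposed flip
    return spins
```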

The configuration of such a system (the spin at every lattice site) fluctuates rapidly, and no individual configuration is itself meaningful. What carries physical content are ensemble averages: expectations of observables under the Boltzmann distribution. These are what laboratory measurements return, because real instruments integrate over many microscopic fluctuations.

One such quantity for the Ising system is the magnetization $M(w) = \sum_i s_i$. At high temperature ($\beta \to 0$), the Boltzmann distribution is nearly uniform over configurations, the spins point in random directions, and the expected value of $M$ is approximately $0$. At low temperature ($\beta \to \infty$), the distribution concentrates on the two ground states and the expected value of $|M|$ is approximately the number of lattice sites $N = L^2$.

The remarkable fact is that the transition between these regimes is sharp: there is a critical inverse temperature

$$ \beta_c = \tfrac{1}{2}\ln\left(1 + \sqrt{2}\right) \approx 0.4407 $$

at which the system undergoes a phase transition. For $\beta < \beta_c$, the system is disordered (i.e. $\langle |M| \rangle / N \approx 0$). For $\beta > \beta_c$, long-range order emerges and $\langle |M| \rangle / N > 0$.

This example illustrates the general pattern: by tracking the expectation value of a single observable (the magnetization) as a function of a parameter ($\beta$), we detect a qualitative change in the internal organization of the system without ever inspecting individual spin configurations. The phase transition is a property of the Boltzmann distribution, not of any single configuration.
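Combining the sketches above, one can trace $\langle |M| \rangle / N$ across a range of $\beta$ and watch it rise near $\beta_c \approx 0.44$ (a rough experiment: on small lattices with short runs the transition is blurred by finite-size effects):

```python
import numpy as np

rng = np.random.default_rng(0)
L, n_burn, n_samples = 32, 200, 200
for beta in [0.2, 0.3, 0.4, 0.44, 0.5, 0.7]:
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_burn):                  # equilibrate before measuring
        metropolis_sweep(spins, beta, rng=rng)
    m = np.mean([abs(metropolis_sweep(spins, beta, rng=rng).sum()) / L**2
                 for _ in range(n_samples)])
    print(f"beta = {beta:.2f}: <|M|>/N ~ {m:.2f}")
```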

Observables and Expectation Values

Generalizing from the above, an observable is a function $\phi : W \to \mathbb{R}$ on the configuration space. Its expectation value under the Boltzmann distribution is

$$ \langle \phi \rangle = \int_W \phi(w)\, p(w)\, dw = \frac{1}{Z} \int_W \phi(w)\, e^{-\beta H(w)}\, dw $$

The expectation value is the average of $\phi$ over all configurations, weighted by their Boltzmann probability. Examples of observables in a magnetic system include the magnetization $M = \sum_i s_i$, the energy $H$ itself, and the product $s_i s_j$ for a pair of spins. Different observables reveal different aspects of the system. The idea is that by choosing observables judiciously, we can probe the internal structure of the system.
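These expectation values are exactly what Monte Carlo estimates: average the observable over configurations visited by the sampler. A sketch (ours) reusing the sweep above:

```python
import numpy as np

def estimate_expectation(phi, spins, beta, n_sweeps=500, rng=None):
    """Monte Carlo estimate of <phi>: average phi over the configurations
    visited by the Metropolis sampler at inverse temperature beta."""
    rng = np.random.default_rng() if rng is None else rng
    values = []
    for _ in range(n_sweeps):
        metropolis_sweep(spins, beta, rng=rng)
        values.append(phi(spins))
    return np.mean(values)

# Example observables: the total magnetization and the energy itself.
# estimate_expectation(lambda s: s.sum(), spins, beta=0.5)
# estimate_expectation(ising_energy, spins, beta=0.5)
```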

Perturbing the Hamiltonian

So far the system, as defined by the space $W$ of configurations and the Hamiltonian $H$, has been fixed. However, the “real” Hamiltonian for a physical system contains terms that relate to interactions of the system with outside degrees of freedom, and thus is, in some sense, never exactly fixed. It is natural to ask how fluctuations in these outside degrees of freedom affect the physics of our system.

If we are approaching this physics via expectation values $\langle \phi \rangle$ for various observables $\phi$, then the natural approach to studying these fluctuations is to study derivatives of these quantities with respect to the relevant fluctuations. In the case of the Ising model, this takes the following concrete form: how does the magnetization change when we couple the spin at a lattice site to an external magnetic field, and allow that field to vary?

To make this precise, consider a one-parameter family of Hamiltonians

$$ H_h(w) = H(w) - h \cdot F(w) $$

where $F$ is some observable and $h$ controls the perturbation strength. The perturbed Boltzmann distribution is $p_h(w) \propto e^{-\beta H_h(w)} = e^{-\beta H(w) + \beta h F(w)}$. To “introduce an external magnetic field” to the Ising model, let $K$ be some rectangular subregion of the lattice, and let $F$ be its magnetization

$$ F(w) = \sum_{i \in K} s_i. $$

The field couples to the magnetization of $K$ with strength $h$: for $h > 0$, configurations with positive magnetization in $K$ have lower energy and are favored by the Boltzmann distribution. But because $K$ is coupled to its neighbors, biasing $K$ also biases the neighbors: they lower their interaction energy by also pointing $+1$. This bias propagates outward from $K$, attenuated at each step by a factor depending on $\beta$.
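In a Metropolis sampler, the perturbation $-h\sum_{i \in K} s_i$ only changes the energy difference for flips inside $K$; a sketch (ours, extending the sweep above with a boolean mask for the probe region):

```python
import numpy as np

def metropolis_sweep_field(spins, beta, mask, h, J=1.0, rng=None):
    """Metropolis sweep for the perturbed Hamiltonian
    H_h(w) = H(w) - h * sum_{i in K} s_i, where `mask` is a boolean
    array marking the probe region K. Flipping a spin inside K adds
    2*h*s_i to the energy change, since the field term flips sign."""
    if rng is None:
        rng = np.random.default_rng()
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        neighbors = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                     + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * neighbors
        if mask[i, j]:
            dE += 2.0 * h * spins[i, j]
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins
```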

The simulation below lets you explore this directly. Select a region to designate it as the probe region $K$ (highlighted in yellow), then drag $h$ to apply the local field.

[Interactive simulation: a β slider (0.10 to 1.00, βc ≈ 0.44 marked) and a local-field slider for h ranging from −10 to +10.]

Figure 2. Click and drag on the lattice to draw a rectangular probe region (yellow border). Drag $h$ in the lower slider to apply the field $-h\sum_{p\in\text{rect}} s_p$ to every spin in the selection; positive $h$ biases the region toward $+1$, negative toward $-1$. To see an example of propagation, click “Cold Start” to initialize all the spins at $-1$, and then drag the $h$ slider to the left at different temperatures.

Once we’ve chosen a parametrized family of perturbations like the one above, the expectation value of any observable $\phi$ becomes a function of $h$:

$$ \langle \phi \rangle_h = \frac{\int \phi(w)\,e^{-\beta(H(w) - h F(w))}\,dw}{\int e^{-\beta(H(w) - h F(w))}\,dw} $$

The susceptibility is the first-order response to the perturbation:

$$ \chi = \frac{1}{\beta}\frac{\partial}{\partial h}\langle \phi \rangle_h \bigg|_{h=0} $$

At first glance, computing $\chi$ looks expensive: we would have to simulate the system at many values of $h$ to estimate the derivative numerically. However, this is unnecessary. The fluctuation–dissipation theorem asserts that

$$ \chi = \mathrm{Cov}[\phi, F] = \langle \phi F \rangle - \langle \phi \rangle \langle F \rangle $$

where the covariance is computed in the unperturbed ensemble at $h = 0$. The derivative of an expectation equals a covariance. The derivation is short. With $Z(h) = \int e^{-\beta(H - hF)}\,dw$, differentiate

$$ \langle \phi \rangle_h = \int \phi(w) \, p_h(w)\, dw = \frac{1}{Z(h)} \int \phi(w)\, e^{-\beta(H - hF)(w)}\, dw $$

in $h$ using the product rule. The exponent brings down a factor of $\beta F$, and

$$ \partial_h Z / Z = \beta\langle F \rangle_h\,. $$

The two contributions combine to give

$$ \frac{\partial}{\partial h}\langle \phi \rangle_h = \beta\big(\langle \phi F\rangle_h - \langle \phi\rangle_h \langle F\rangle_h\big) = \beta\, \mathrm{Cov}_h[\phi, F] $$

and setting $h = 0$ recovers the identity. The mechanism is general: whenever the Hamiltonian depends linearly on $h$ via $-hF$, the derivative of any expectation at $h=0$ is a covariance with the generator of the perturbation (here $F$). Note that the susceptibility, as a covariance, is itself the expectation value of an observable. Sometimes we denote the susceptibility $\chi^\phi_F$ to record the observable $\phi$ and external field $F$.
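The identity is easy to verify numerically on a system small enough to enumerate: compare a finite-difference estimate of $\frac{1}{\beta}\partial_h \langle \phi \rangle_h$ against the covariance at $h = 0$. A sketch with a hypothetical three-configuration system:

```python
import numpy as np

def expectation(obs, H, beta, h=0.0, F=None):
    """<obs>_h under p_h(w) proportional to exp(-beta*(H(w) - h*F(w))),
    computed by exact enumeration over a finite configuration space."""
    F = np.zeros_like(H) if F is None else F
    weights = np.exp(-beta * (H - h * F))
    return np.sum(obs * weights) / weights.sum()

# Hypothetical system: three configurations with energies H, an
# observable phi, and a perturbation generator F.
H = np.array([0.0, 1.0, 2.0])
phi = np.array([1.0, -1.0, 0.5])
F = np.array([0.0, 1.0, 1.0])
beta, eps = 1.0, 1e-5

# chi via finite differences: (1/beta) * d<phi>_h/dh at h = 0.
chi_fd = (expectation(phi, H, beta, +eps, F)
          - expectation(phi, H, beta, -eps, F)) / (2 * eps * beta)

# chi via the fluctuation-dissipation theorem: Cov[phi, F] at h = 0.
chi_cov = (expectation(phi * F, H, beta)
           - expectation(phi, H, beta) * expectation(F, H, beta))
print(chi_fd, chi_cov)  # agree up to O(eps^2) discretization error
```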

The intuition is that the unperturbed system already fluctuates around equilibrium, and those spontaneous fluctuations sample all possible responses. When $\phi$ and $F$ tend to fluctuate together at $h=0$, turning on a field that favors large $F$ pushes $\langle \phi \rangle$ upward; the correlation in the unperturbed ensemble predicts the response. See Yeomans (1992) for more information.

Susceptibilities Reveal Structure

The main intuition we wish to communicate with this post is that expectation values in general, and susceptibilities in particular, probe the internal structure of a system. In this case by “system” we mean not only the configuration space $W$ and Hamiltonian $H$, but some coupling of these ingredients to external fields (i.e. the function $F$ and control parameter $h$). Strictly speaking, our notion of “internal structure” is jointly a property of the system and this coupling; that is, the structure we see may depend on how we look.

To make this point concrete we have to choose a system with structure. Consider a version of the Ising model where the boundary conditions are hard walls rather than periodic identifications, and a single internal wall runs from the top of the lattice to roughly two-thirds of the way down the centre. The result is a lattice of three “chambers”:

  • A left chamber, consisting of the rectangle bounded on the left by the leftmost wall of the box, on the right by the central wall, above by the top of the box, and below by the imaginary horizontal line running across the box at the height of the lower extremity of the wall;
  • A right chamber similarly defined;
  • And a lower chamber which is disjoint from both, and runs across the width of the box.
[Interactive simulation: a β slider (0.10 to 1.00) initialized near βc ≈ 0.44, with correlation traces for the left, right, and bottom chambers.]

Figure 3. An 80×80 Ising lattice with hard outer walls (black border) and an internal wall running from the top to two-thirds of the way down the centre, creating three chambers. Click and drag to draw a rectangular probe region (yellow border). The plot shows the Pearson correlation (a scaled covariance) between the probe magnetization and each chamber’s total magnetization, computed as a cumulative average over $t$ sweeps since the probe was placed; the curves converge as $1/\sqrt{t}$. Draw a new region to reset.

We are going to probe the system by varying the external magnetic field in some region $p$, as in Figure 2 above, with a strength controlled by the parameter $h$. The response of the system is measured by seeing how the expectation value $\langle \phi \rangle$ responds to this probe, for some $\phi$. In fact we will measure the response as a vector quantity, by choosing three observables: the magnetizations of the chambers $\phi = M_C = \sum_{i \in C} s_i$ where $C \in \{\text{left}, \text{right}, \text{lower}\}$.

From a physical perspective it makes sense that if we couple the spins in a probe region $p$ to the external magnetic field, which we then turn on, this perturbation will affect nearby spins more strongly than distant spins (depending on the value of $\beta$). The influence will not spread through walls, since spins adjacent to but on opposite sides of a wall are not coupled in the Hamiltonian. Thus by varying the probe region $p$ we should be able to “see” the walls and chambers that are implicit in the structure of the Hamiltonian. That is, we should be able to see the structure of the system.
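To show how walls enter the Hamiltonian, here is a sketch (our own illustration with hypothetical wall coordinates, not the Figure 3 code) that keeps boolean arrays recording which nearest-neighbor bonds exist, and simply omits the bonds that cross a wall:

```python
import numpy as np

def walled_ising_energy(spins, bond_right, bond_down, J=1.0):
    """H(w) = -J * sum over *coupled* nearest-neighbor pairs. The boolean
    arrays bond_right/bond_down mark which bonds to the right/down
    neighbor exist; bonds crossing a wall are simply omitted."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -J * (np.sum(spins * right * bond_right)
                 + np.sum(spins * down * bond_down))

# Hard outer walls plus an internal wall (hypothetical coordinates).
L = 80
bond_right = np.ones((L, L), dtype=bool)
bond_down = np.ones((L, L), dtype=bool)
bond_right[:, -1] = False            # hard walls: no wrap-around bonds
bond_down[-1, :] = False
mid, depth = L // 2, (2 * L) // 3
bond_right[:depth, mid - 1] = False  # internal wall: decouple columns
                                     # mid-1 and mid down to row `depth`
```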

To be more precise, for any pair $p, C$ we consider the response, or susceptibility,

$$ \chi^C_p := \frac{1}{\beta} \frac{\partial}{\partial h} \langle M_C \rangle_h \bigg|_{h=0} = \mathrm{Cov}[M_C, M_p]\,. $$

Any given observable $\phi$ only sees “part of the picture,” and so we combine them to measure a vector response to the probe $p$:

$$ \chi_p := \big( \chi^{\text{left}}_p,\ \chi^{\text{right}}_p,\ \chi^{\text{lower}}_p \big)\,. $$

This vector is estimated by the plot in Figure 3 for any drawn $p$ as $t$ becomes large. The reader can verify for themselves that $\chi_p$ takes on characteristic values for regions $p$ in the left, right and bottom chambers; moreover, it is sensitive even to the position within each chamber. It would be an interesting exercise to invert these measurements to infer the precise position of the wall.
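By the fluctuation–dissipation theorem, estimating $\chi_p$ never requires turning the field on: it is a covariance in the unperturbed ensemble. A sketch (ours, assuming equilibrium samples from a sampler like the one above, with hypothetical boolean masks for the chambers):

```python
import numpy as np

def susceptibility_vector(samples, probe_mask, chamber_masks):
    """Estimate chi_p = (Cov[M_C, M_p] for each chamber C) from an
    iterable of sampled spin configurations at h = 0, where the masks
    are boolean arrays selecting the probe region and the chambers."""
    M_p = np.array([s[probe_mask].sum() for s in samples])
    chi = []
    for mask in chamber_masks:
        M_C = np.array([s[mask].sum() for s in samples])
        chi.append(np.cov(M_C, M_p)[0, 1])
    return np.array(chi)
```

Figure 3 reports the Pearson correlation rather than the raw covariance, which just rescales each component of $\chi_p$.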

This completes our simple demonstration of susceptibilities as a tool for interpreting a physical system: in this case, the Ising model with a particular design of the couplings between spins that represents a “wall”. The steps, in abstract terms:

  • Structure in a system is implicitly encoded in its Hamiltonian. Making this implicit structure explicit in a useful way is nontrivial, and indeed this is the problem of interpretability (i.e. interpretation is computation).
  • To access this information we couple the system to external degrees of freedom, each of which is “controlled” by some field strength $h$.
  • For each external field $F$ and observable $\phi$ there is an associated susceptibility $\chi^\phi_F$. Stacking these across observables gives a vector quantity $\chi_F$.
  • Interactions between the constituents of the system will propagate fluctuations in the external field $F$ in characteristic ways, which depend on the nature of its internal structure. Some of the information on these propagations, and thus the internal structure, is recorded in $\chi_F$.
  • By studying the information in $\chi_F$ (perhaps for multiple fields $F$) we therefore “see” some aspect of the internal structure of the system.

And in concrete terms, in the setting of the Ising model with an internal wall:

  • The wall is encoded in the Hamiltonian (via some spin-spin interactions being omitted). We imagine that either we don’t have direct access to the Hamiltonian, or that it is large and complicated and so the nature of this structure is not explicitly accessible.
  • To access this information we couple the system to an external magnetic field in a region $p$, controlled by a field strength $h$.
  • The associated susceptibility vector is $\chi_p$.
  • By studying $\chi_p$ as $p$ varies we learn how to “see” the wall hidden in the Hamiltonian.

Analogy to Neural Networks

The spectroscopy programme (Baker et al. 2025; Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026) applies this framework to neural networks.

| Physics | Neural networks |
| --- | --- |
| Configuration space $W$ | Parameter space $W$ |
| Hamiltonian $H(w)$ | (Population) loss $L(w)$ |
| Boltzmann distribution $e^{-\beta H}$ | Tempered posterior $e^{-n\beta L}$ |
| External field perturbation | Data distribution perturbation |
| Observable $\phi$ (e.g. magnetization) | Function on parameter space $\phi$ |
| Susceptibility $\frac{1}{\beta}\,\partial_h \langle \phi \rangle$ | Susceptibility $\frac{1}{n\beta}\,\partial_h \langle \phi \rangle$ |

In this case the system is a neural network, with configuration space given by the space of possible weight vectors for the network, and the role of the Hamiltonian played by the population loss $L$.

The structure we are interested in belongs to the Hamiltonian near the set of ground states, that is, low-loss parameters, and is implicit in the population loss in the same way that the “wall” was implicit in the interactions of the spins in our Ising model Hamiltonian. In the setting of neural networks, this structure is derived in part from the architecture of the network and in part from the structure of the data distribution.

To study this structure via susceptibilities we have to do two things: choose observables and choose “external fields”. Our observables are some functions $\phi_1, \ldots, \phi_H$ on parameter space. Our external fields are, in an abstract sense, ways to vary the population loss. Since the population loss is defined as the pairing of a loss density with the data distribution, one natural source of external fields is variations $F$ in the data distribution itself (which, for example, up- or down-weight some particular data point).

To our set of observables and chosen variation/field $F$ we associate a vector

$$ \chi_F := \big( \chi^{\phi_1}_F, \ldots, \chi^{\phi_H}_F \big) $$

exactly as above. The intuition is that different variations in the data distribution will have different effects on the population loss (just as different probe regions $p$ had different effects on the Ising model Hamiltonian), and these will be reflected in differences in the vectors $\chi_F$. The fundamental hypothesis of our approach to interpretability is that internal structure in the neural network (like walls in the Ising model) will leave its traces in these susceptibility vectors. Thus, by studying these vectors across a collection of variations $F$, we can infer that internal structure.
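Schematically, the estimator is identical to the Ising case, with equilibrium lattice samples replaced by parameter draws from the tempered posterior (e.g. via the SGMCMC samplers in the devinterp library). What follows is a sketch with hypothetical interfaces, not the devinterp API:

```python
import numpy as np

def susceptibility_vector_nn(posterior_samples, observables, F):
    """Estimate chi_F = (Cov[phi_1, F], ..., Cov[phi_H, F]) over parameter
    samples w drawn from the tempered posterior. `observables` is a list
    of functions phi_k(w); `F` is the generator of the perturbation, e.g.
    the loss on an up-weighted data point (hypothetical interface)."""
    f_vals = np.array([F(w) for w in posterior_samples])
    chi = []
    for phi in observables:
        phi_vals = np.array([phi(w) for w in posterior_samples])
        chi.append(np.cov(phi_vals, f_vals)[0, 1])
    return np.array(chi)
```

The devinterp library implements estimators along these lines; the sketch above is only meant to show the shape of the computation.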

Build on our work

Our tools for susceptibilities, local learning coefficients, and SGMCMC sampling are open source in the devinterp library.

Work with us

We're hiring Research Scientists, Engineers & more to join the team full-time.

Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.