The use of susceptibilities in neural network interpretability, as pioneered by Baker et al. (2025) and further developed in a series of recent papers (Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026), including under the name of Bayesian influence functions (Kreer et al. 2025; Adam et al. 2025; Lee et al. 2025), has its roots in the theory of statistical mechanics. In this post, we will describe susceptibilities in the original setting of statistical physics, which illuminates and motivates their use as an interpretability tool.
The Boltzmann Distribution
Let’s begin with some definitions: A system consists of a space of configurations and a function called the energy or Hamiltonian. The configurations might be continuous (the positions and momenta of particles in a gas, or the weights of a neural network) or discrete (the orientations of spins on a lattice, as in a model of a magnet). The Hamiltonian assigns to each configuration a real number, which in physics is the energy and in machine learning might be the loss.
At thermal equilibrium, the probability that the system is in configuration $x$ is proportional to $e^{-\beta H(x)}$, where $\beta$ is the inverse temperature. This is the Boltzmann distribution:

$$p_\beta(x) = \frac{e^{-\beta H(x)}}{Z_\beta}, \qquad Z_\beta = \sum_x e^{-\beta H(x)},$$

where the normalizing constant $Z_\beta$ is called the partition function (for continuous configuration spaces the sum becomes an integral).

At high temperature ($\beta \to 0$), all configurations are equally likely. At low temperature ($\beta \to \infty$), the distribution concentrates on the energy minima.
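To make this concrete, here is a minimal sketch of our own (not the post's simulation code) of the Boltzmann distribution on a toy system with three configurations, showing both temperature limits:

```python
import math

def boltzmann(energies, beta):
    """Boltzmann probabilities for a list of configuration energies."""
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)                   # the partition function
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0]             # a toy three-configuration system

hot = boltzmann(energies, beta=0.01)   # high temperature: nearly uniform
cold = boltzmann(energies, beta=10.0)  # low temperature: mass on the minimum
print(hot, cold)
```

At `beta=0.01` the three probabilities are all close to 1/3; at `beta=10` nearly all the probability sits on the energy minimum.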
The Ising Model
A simple but nontrivial example of a system and energy function is the Ising model, which describes a lattice of interacting spins. Place a spin variable $\sigma_i \in \{+1, -1\}$ at each site $i$ of an $N \times N$ grid. A configuration $\sigma$ is an assignment of a spin to every site. The Hamiltonian is

$$H(\sigma) = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j,$$

where the sum runs over all pairs $\langle i, j \rangle$ of nearest neighbors and $J > 0$ is the coupling constant. Energy is minimized when all spins are aligned, giving two ground states: all $+1$ or all $-1$.
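The Hamiltonian above is easy to compute directly. As an illustration (our own sketch, with hypothetical function names), on a periodic grid each nearest-neighbor pair can be counted exactly once by summing every site's right and down bonds:

```python
import numpy as np

def ising_energy(spins, J=1.0):
    """H = -J * sum over nearest-neighbour pairs, with periodic
    (wrapping) boundaries. Each pair is counted once by summing
    every site's right and down bond."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -J * float(np.sum(spins * right + spins * down))

N = 4
aligned = np.ones((N, N), dtype=int)  # a ground state: all +1
flipped = aligned.copy()
flipped[0, 0] = -1                    # one dissenting spin breaks 4 bonds

print(ising_energy(aligned))          # -2 * J * N^2 = -32.0
print(ising_energy(flipped))          # -32.0 + 2 * J * 4 = -24.0
```

The ground-state energy is $-2JN^2$ (two bonds per site), and flipping a single spin costs $2J$ per broken bond.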
Figure 1. The 2D Ising model on a 100×100 lattice with wrapping boundary conditions. Each pixel shows a single spin: blue (+1) or vermilion (−1). Configurations are sampled from the Boltzmann distribution via Metropolis–Hastings. A useful experiment: press Hot start to initialize all spins at random, then drag $\beta$ up past $\beta_c$ and watch domains grow and coarsen. Or press Cold start (all spins aligned) at low temperature (high $\beta$), then drag $\beta$ back down to watch the ordered state dissolve into fluctuations as you cross the critical point $\beta_c$.
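The Metropolis–Hastings sampling mentioned in the caption can be sketched in a few lines. This is a generic single-spin-flip implementation of ours, not the post's actual simulation code:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(spins, beta, J=1.0):
    """One sweep: N*N single-spin-flip proposals. Flipping spin s
    changes the energy by dE = 2*J*s*(sum of its four neighbours);
    accept the flip with probability min(1, exp(-beta*dE))."""
    N = spins.shape[0]
    for _ in range(N * N):
        i, j = rng.integers(0, N, size=2)
        nbrs = (spins[(i + 1) % N, j] + spins[(i - 1) % N, j]
                + spins[i, (j + 1) % N] + spins[i, (j - 1) % N])
        dE = 2.0 * J * spins[i, j] * nbrs
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

spins = np.ones((16, 16), dtype=int)   # cold start: all spins aligned
metropolis_sweep(spins, beta=1.0)      # well above beta_c: order persists
```

Run at $\beta$ well above the critical value, a cold start stays almost fully magnetized; run at small $\beta$, it dissolves into noise within a few sweeps.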
This type of probabilistic system is difficult to directly comprehend. Its state—the exact spin at every lattice site—fluctuates rapidly. What we actually measure (in a laboratory or elsewhere) are quantities that are stable: averages over spatial regions and time intervals that are large compared to the microscopic scales but small compared to the macroscopic scales of interest. Temperature, pressure, magnetization: all of these are averages. The subject of thermodynamics begins with the recognition that these averaged quantities obey their own laws, independent of the microscopic details that have been aggregated away.
One such quantity for the Ising system is the magnetization $M(\sigma) = \sum_i \sigma_i$, which measures the total alignment of the spins. At high temperature ($\beta \approx 0$), the Boltzmann distribution is nearly uniform over configurations, the spins point in random directions, and the expected value of $M$ is approximately $0$. At low temperature (large $\beta$), the distribution concentrates on the two ground states and the expected value of $|M|$ is approximately $N^2$.
The remarkable fact is that the transition between these regimes is sharp: there is a critical inverse temperature

$$\beta_c = \frac{\ln(1 + \sqrt{2})}{2J} \approx \frac{0.4407}{J}$$

at which the system undergoes a phase transition. For $\beta < \beta_c$, the system is disordered (i.e. the magnetization per spin $\langle |M| \rangle / N^2$ vanishes as $N \to \infty$). For $\beta > \beta_c$, long-range order emerges and $\langle |M| \rangle / N^2$ approaches a nonzero value.
This example illustrates the general pattern: by tracking the expectation value of a single observable (the magnetization) as a function of a parameter ($\beta$), we detect a qualitative change in the internal organization of the system—from disorder to order—without ever inspecting individual spin configurations. The phase transition is a property of the Boltzmann distribution, not of any single configuration.
Observables and Expectation Values
Generalizing from the above, an observable is a function $O$ on the configuration space. Its expectation value under the Boltzmann distribution is

$$\langle O \rangle = \frac{1}{Z_\beta} \sum_x O(x)\, e^{-\beta H(x)}.$$

The expectation value is the average of $O$ over all configurations, weighted by their Boltzmann probability. Since the underlying system can change quickly and chaotically, the value taken by an observable may be similarly ephemeral, but its expectation will be much better behaved. Examples of observables in a magnetic system include: the magnetization $M$, the energy $H$ itself, and correlations $\sigma_i \sigma_j$ between pairs of spins.
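On a small enough lattice the expectation value can be computed exactly by brute-force enumeration, which makes the definition tangible. A sketch of ours (note that on a 2×2 periodic lattice each bond is counted twice, which is harmless for illustration):

```python
import itertools, math

def expectation(observable, beta, J=1.0, N=2):
    """Exact Boltzmann expectation on an N x N periodic Ising lattice,
    enumerating all 2^(N*N) configurations."""
    Z, total = 0.0, 0.0
    for bits in itertools.product([-1, 1], repeat=N * N):
        s = [list(bits[r * N:(r + 1) * N]) for r in range(N)]
        H = -J * sum(s[i][j] * (s[(i + 1) % N][j] + s[i][(j + 1) % N])
                     for i in range(N) for j in range(N))
        w = math.exp(-beta * H)
        Z += w
        total += observable(s) * w
    return total / Z

magnetization = lambda s: sum(sum(row) for row in s)

# By +1/-1 symmetry <M> is exactly zero at every temperature, while
# <|M|> approaches N^2 = 4 as beta grows:
print(expectation(magnetization, beta=0.5))
print(expectation(lambda s: abs(magnetization(s)), beta=2.0))
```

The first expectation is zero at any $\beta$ (every configuration cancels against its global flip), which is why $\langle |M| \rangle$ rather than $\langle M \rangle$ is the useful order parameter on a finite lattice.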
Different observables reveal different aspects of the system. The magnetization tells you about the alignment of spins; the correlation function tells you about the spatial extent of order; the energy tells you about the typical configuration. By choosing observables judiciously, we can efficiently probe the internal structure of the system.
Perturbing the Hamiltonian
We don’t need to limit ourselves to studying a single system: we can use this perspective to study how the expectation values of observables change when the Hamiltonian is perturbed. For example, how does the system’s average energy respond when we introduce heat by exposing a small part of it to a heat source? Or how does the magnetization change when we hold up a magnet to it?
Consider a one-parameter family of Hamiltonians

$$H_\varepsilon(x) = H(x) - \varepsilon V(x),$$

where $V$ is some observable and $\varepsilon$ controls the perturbation strength. So for positive values of $\varepsilon$ the new system has low energy when $H$ is low and $V$ is large. The perturbed Boltzmann distribution is $p_\varepsilon(x) \propto e^{-\beta H_\varepsilon(x)}$.
Consider the example of introducing a magnetic field. Let $R$ be some rectangular subregion of the lattice, and let

$$M_R(\sigma) = \sum_{i \in R} \sigma_i$$

be its magnetization. The perturbed Hamiltonian is $H_h(\sigma) = H(\sigma) - h\, M_R(\sigma)$: the field adds an energy term of $-h$ times the overall magnetization in $R$. The Boltzmann distribution shifts: for $h > 0$, configurations where $R$ has positive magnetization are favored. But because each spin in $R$ is coupled to its neighbours, biasing $R$ also biases the neighbours: they lower their interaction energy by also pointing $+1$. This bias propagates outward from $R$, attenuated at each step by a factor depending on $\beta$.
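In code, the perturbation is a one-line modification of the energy. This is our own sketch (with a hypothetical `perturbed_energy` helper), reusing the periodic-boundary Ising energy:

```python
import numpy as np

def perturbed_energy(spins, h, region_mask, J=1.0):
    """H_h = H - h * M_R: the Ising energy plus a field term coupling
    only to the spins inside the probe region R (a boolean mask)."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    H = -J * float(np.sum(spins * right + spins * down))
    M_R = float(np.sum(spins[region_mask]))
    return H - h * M_R

N = 8
spins = np.ones((N, N), dtype=int)
mask = np.zeros((N, N), dtype=bool)
mask[2:5, 2:5] = True                 # a 3x3 probe region R

# With h > 0, aligning R with +1 lowers the energy by h * |R|:
print(perturbed_energy(spins, h=0.5, region_mask=mask))   # -128.0 - 4.5
```

A sampler run against `perturbed_energy` instead of the unperturbed energy draws from the shifted Boltzmann distribution described above.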
The simulation below lets you explore this directly. Select a region to designate it as the probe region (highlighted in yellow), then drag to apply the local field.
Figure 2. Click and drag on the lattice to draw a rectangular probe region (yellow border). Drag in the lower slider to apply the field $h$ to every spin in the selection; positive $h$ biases the region toward $+1$, negative toward $-1$. To see an example of propagation, click “Cold Start” to initialize all the spins at $-1$, and then drag the $h$ slider to the left at different temperatures.
Once we’ve chosen a parametrized family of perturbations $H_\varepsilon = H - \varepsilon V$ like the one above, the expectation value of any observable $O$ becomes a function of $\varepsilon$:

$$\langle O \rangle_\varepsilon = \frac{1}{Z_\varepsilon} \sum_x O(x)\, e^{-\beta H_\varepsilon(x)}, \qquad Z_\varepsilon = \sum_x e^{-\beta H_\varepsilon(x)}.$$

The susceptibility $\chi$ is the first-order response to the perturbation:

$$\chi = \left.\frac{\partial \langle O \rangle_\varepsilon}{\partial \varepsilon}\right|_{\varepsilon = 0}.$$
The susceptibility is the basic tool by which condensed matter physicists interrogate the internal structure of complex systems. We can see this in the above display. At high temperature, perturbing the Hamiltonian only affects spins in the selected rectangle, but as temperature decreases towards the critical threshold, the region affected grows. If the perturbed Hamiltonian shifts the lattice points in the rectangle to be blue, those points influence the rectangle’s neighbors, which influence their neighbors, and so on. At the critical point, the influence spreads through the entire grid.
Stated more formally: the magnetic susceptibility, viewed as a function of the inverse temperature $\beta$, diverges as $\beta$ approaches the critical value $\beta_c$. This divergence signals the onset of long-range order.
More generally, susceptibilities for a range of observables encode complementary information about the system’s organisation. See Yeomans (1992) for more information. The key insight is that by studying how the system responds to perturbations, we learn about its internal organisation without needing direct access to the microscopic state.
Remarkably, a key result—the fluctuation–dissipation theorem—states that this derivative equals a covariance computed in the unperturbed ensemble:

$$\chi = \beta \left( \langle O V \rangle - \langle O \rangle \langle V \rangle \right) = \beta\, \mathrm{Cov}(O, V).$$

This means we can learn how the system would respond to a perturbation by studying how it fluctuates in the absence of that perturbation. See Appendix A of Baker et al. (2025) for a derivation.
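The theorem is easy to check numerically. In this sketch of ours, a toy system with hand-picked energies and observables shows that the finite-difference derivative of $\langle O \rangle_\varepsilon$ matches $\beta\,\mathrm{Cov}(O, V)$ computed at $\varepsilon = 0$:

```python
import math

configs = [0, 1, 2, 3]
H = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5}   # unperturbed energies
V = {0: 1.0, 1: -1.0, 2: 2.0, 3: 0.0}  # perturbing observable
O = {0: 0.5, 1: 1.5, 2: -0.5, 3: 1.0}  # measured observable
beta = 0.7

def expect(f, eps):
    """<f> under the perturbed distribution for H_eps = H - eps * V."""
    w = {x: math.exp(-beta * (H[x] - eps * V[x])) for x in configs}
    Z = sum(w.values())
    return sum(f[x] * w[x] for x in configs) / Z

# Susceptibility by central finite difference ...
eps = 1e-5
chi_fd = (expect(O, eps) - expect(O, -eps)) / (2 * eps)

# ... and by fluctuation-dissipation: chi = beta * (<OV> - <O><V>)
OV = {x: O[x] * V[x] for x in configs}
chi_fdt = beta * (expect(OV, 0.0) - expect(O, 0.0) * expect(V, 0.0))

print(chi_fd, chi_fdt)   # the two agree
```

The covariance side needs only samples from the *unperturbed* distribution, which is exactly what makes the theorem so useful in practice.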
Correlations Reveal Structure
Can we detect where a given probe site is in a lattice just by measuring covariances?
Consider a version of the Ising model where the boundary conditions are hard walls rather than periodic identifications, and a single internal partition runs from the top of the lattice to roughly two-thirds of the way down the centre. The result is a lattice of three rooms: a left chamber and a right chamber, separated by the partition, both opening into a common lower region.
Figure 3. An 80×80 Ising lattice with hard outer walls (black border) and an internal partition running from the top to two-thirds of the way down the centre, creating three rooms. Click and drag to draw a rectangular probe region (yellow border). The plot shows the Pearson correlation (a scaled covariance) between the probe magnetization and each room’s total magnetization, computed as a cumulative average over sweeps since the probe was placed; the curves converge as the number of sweeps grows. Draw a new region to reset.
In the above demo, you can easily recover which “room” a probe region is in, and even where it lies within that room, by measuring the covariance between the probe region’s magnetization and the three rooms’ magnetizations: exactly the data of a susceptibility where the observable is a room’s magnetization.
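The same inference can be run exactly on a miniature version. In this sketch of ours, two four-spin “rooms” with no bonds between them are statistically independent, so a probe spin’s covariance with its own room is large while its covariance with the other room vanishes:

```python
import itertools, math

# Two rooms of four spins each; bonds only within a room, so the rooms
# are independent, like chambers separated by a hard wall.
beta, J = 0.6, 1.0
roomA, roomB = [0, 1, 2, 3], [4, 5, 6, 7]
bonds = [(0, 1), (1, 3), (3, 2), (2, 0),   # 2x2 cycle in room A
         (4, 5), (5, 7), (7, 6), (6, 4)]   # 2x2 cycle in room B

states, weights = [], []
for s in itertools.product([-1, 1], repeat=8):
    H = -J * sum(s[i] * s[j] for i, j in bonds)
    states.append(s)
    weights.append(math.exp(-beta * H))

Z = sum(weights)
def expect(f):
    return sum(f(s) * w for s, w in zip(states, weights)) / Z

probe = lambda s: s[0]                     # a single probe spin in room A
M_A = lambda s: sum(s[i] for i in roomA)
M_B = lambda s: sum(s[i] for i in roomB)

cov_A = expect(lambda s: s[0] * M_A(s)) - expect(probe) * expect(M_A)
cov_B = expect(lambda s: s[0] * M_B(s)) - expect(probe) * expect(M_B)
print(cov_A, cov_B)   # cov_A is large, cov_B is ~0: covariances locate the probe
```

Measuring only these covariances, with no access to individual configurations, tells you which room the probe sits in.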
Analogy to Neural Networks
The structural inference programme (Baker et al. 2025; Wang et al. 2025; Gordon et al. 2026; Wang & Murfet 2026) applies this framework to neural networks.
| Physics | Neural networks |
|---|---|
| Configuration space | Parameter space |
| Hamiltonian | (Population) loss |
| Boltzmann distribution | Tempered posterior |
| External field perturbation | Data distribution perturbation |
| Observable (e.g. magnetization) | Component observable |
| Susceptibility | Per-token susceptibility |
The analogy between configuration space and parameter space is straightforward. But we extend it further: the random movement around configuration space, under pressure to minimize energy, is the analogue of the movement around parameter space to minimize loss, i.e. learning.
The neural network “Hamiltonian”, i.e. the loss function, is a sum of losses for individual tasks. It is natural to perturb it by changing the distribution of these tasks: this can be a change in the training corpus, adding or removing texts, or a synthetic adjustment, increasing or decreasing the importance of predicting a specific token.
We primarily use the change in loss from a given local minimum as an observable. This is often localised with a Dirac delta function so as to record the change in loss only in a specific component of the model; see here for more details. This is analogous to the magnetization of a particular room in the example above.
Conclusion
This is the first post in a series of three. In the second, we will present more detailed definitions of susceptibilities in the context of neural networks, and describe how we compute them efficiently at scale and work with the resulting data.
In the third post, we showcase how this works in practice. Just as susceptibilities can reveal where a rectangular probe region sits in a three-room grid, they provide insight into how a large language model understands text data.
Work with us
We're hiring Research Scientists, Engineers & more to join the team full-time.
Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.
@article{gordon2026interpreting,
title={Interpreting the Ising Model},
author={Andrew Gordon and Rohan Hitchcock and Daniel Murfet},
year={2026}
}

References

1. Garrett Baker, George Wang, Jesse Hoogland, Daniel Murfet (2025). Structural Inference: Interpreting Small Language Models with Susceptibilities. arXiv:2504.18274.
2. George Wang, Garrett Baker, Andrew Gordon, Daniel Murfet (2025). Embryology of a Language Model. arXiv:2508.00331.
3. Andrew Gordon, Garrett Baker, George Wang, William Snell, Stan van Wingerden, Daniel Murfet (2026). Towards Spectroscopy: Susceptibility Clusters in Language Models. arXiv:2601.12703.
4. George Wang, Daniel Murfet (2026). Patterning: The Dual of Interpretability. arXiv:2601.13548.
5. Philipp Alexander Kreer, Wilson Wu, Maxwell Adam, Zach Furman, Jesse Hoogland (2025). Bayesian Influence Functions for Hessian-Free Data Attribution.
6. Maxwell Adam, Zach Furman, Jesse Hoogland (2025). The Loss Kernel: A Geometric Probe for Deep Learning Interpretability.
7. Jin Hwa Lee, Matthew Smith, Maxwell Adam, Jesse Hoogland (2025). Influence Dynamics and Stagewise Data Attribution.
8. J. M. Yeomans (1992). Statistical Mechanics of Phase Transitions. Clarendon Press.