Towards Spectroscopy: Susceptibility Clusters in Language Models

Spectroscopy infers the internal structure of physical systems by measuring their response to perturbations. We apply this principle to neural networks: perturbing the data distribution by upweighting a token $y$ in context $x$, we measure the model's response via susceptibilities $\chi_{xy}$, covariances, computed over a localized Gibbs posterior sampled with stochastic gradient Langevin dynamics (SGLD), between component-level observables and the perturbation. Theoretically, we show that susceptibilities decompose as a sum over modes of the data distribution, which explains why tokens that follow their contexts "for similar reasons" cluster together in susceptibility space. Empirically, we apply this methodology to Pythia-14M, developing a conductance-based clustering algorithm that identifies 510 interpretable clusters ranging from grammatical patterns to code structure to mathematical notation. Comparing with sparse autoencoders (SAEs), we find that 50% of our clusters match SAE features, validating that both methods recover similar structure.
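To make the estimator concrete, here is a minimal sketch of how a susceptibility of this kind can be computed: draw weight samples from the localized Gibbs posterior with SGLD, then take the (negative) sample covariance between a per-component observable and the perturbation's loss term. Everything here is illustrative, not the paper's exact implementation or the devinterp API: the function names, the choice of observable, and the temperature and localization constants are assumptions, and the sign and scaling conventions should be taken from the paper.

```python
import torch
import torch.nn.functional as F

def estimate_susceptibilities(model, loader, observable_fn, perturb_loss_fn,
                              n_steps=2000, burn_in=500, lr=1e-5,
                              beta=1.0, gamma=100.0):
    """Sketch of chi_{xy} estimation (names and constants are assumptions).

    observable_fn(model)   -> tensor of shape (n_components,), e.g. a
                              per-component loss evaluated at the sample.
    perturb_loss_fn(model) -> scalar loss of token y in context x, the
                              direction in which the data distribution
                              is upweighted.
    """
    w0 = [p.detach().clone() for p in model.parameters()]  # localization center
    obs, pert = [], []
    it = iter(loader)
    for step in range(n_steps):
        try:
            inputs, targets = next(it)
        except StopIteration:
            it = iter(loader)
            inputs, targets = next(it)
        loss = F.cross_entropy(model(inputs), targets)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            # One SGLD step on the localized, tempered posterior:
            # dw = -(lr/2) * (beta * grad_L + gamma * (w - w0)) + N(0, lr)
            for p, p0 in zip(model.parameters(), w0):
                drift = -0.5 * lr * (beta * p.grad + gamma * (p - p0))
                p.add_(drift + lr ** 0.5 * torch.randn_like(p))
        if step >= burn_in:
            with torch.no_grad():
                obs.append(observable_fn(model))
                pert.append(perturb_loss_fn(model))
    O = torch.stack(obs)   # (T, n_components)
    L = torch.stack(pert)  # (T,)
    O_c, L_c = O - O.mean(0), L - L.mean()
    # Susceptibility per component as a (negative) posterior covariance.
    return -(O_c * L_c[:, None]).mean(0)
```

Given susceptibility vectors for many (context, token) pairs, tokens that respond to the same components for the same reasons end up close together, which is what the conductance-based clustering in the paper exploits.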

Authors
Andrew Gordon=, Garrett Baker=, George Wang, William Snell, Stan van Wingerden, Daniel Murfet
Timaeus · = Equal contribution
Published
January 19, 2026

Build on our work

Our tools for susceptibilities, local learning coefficients, and SGMCMC sampling are open-sourced in the devinterp library.
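As a starting point, the snippet below shows roughly how devinterp's SGLD sampler and local learning coefficient (LLC) estimator have been invoked in the library's documented examples. The exact signatures and keyword names have changed across versions, so treat them as assumptions and consult the current devinterp docs; the toy model and data are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from devinterp.optim import SGLD
from devinterp.slt import estimate_learning_coeff_with_summary

# Toy setup (placeholder): a small regression model and synthetic data.
model = torch.nn.Linear(10, 1)
data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
train_loader = DataLoader(data, batch_size=64, shuffle=True)

# Estimate the LLC at the model's current parameters by sampling a
# localized posterior with SGLD. Keyword names follow the devinterp
# README at the time of writing and may differ in your version.
stats = estimate_learning_coeff_with_summary(
    model,
    train_loader,
    criterion=F.mse_loss,
    sampling_method=SGLD,
    optimizer_kwargs=dict(lr=1e-4, localization=100.0),
    num_chains=4,
    num_draws=200,
)
```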

Work with us

We're hiring Research Scientists, Engineers & more to join the team full-time.

Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.