Towards Spectroscopy: Susceptibility Clusters in Language Models

Spectroscopy infers the internal structure of physical systems by measuring their response to perturbations. We apply this principle to neural networks: perturbing the data distribution by upweighting a token $y$ in context $x$, we measure the model's response via susceptibilities $\chi_{xy}$, covariances, computed over a localized Gibbs posterior sampled with stochastic gradient Langevin dynamics (SGLD), between component-level observables and the perturbation. Theoretically, we show that susceptibilities decompose as a sum over modes of the data distribution, which explains why tokens that follow their contexts "for similar reasons" cluster together in susceptibility space. Empirically, we apply this methodology to Pythia-14M, developing a conductance-based clustering algorithm that identifies 510 interpretable clusters ranging from grammatical patterns to code structure to mathematical notation. Comparing with sparse autoencoders (SAEs), we find that 50% of our clusters match SAE features, validating that both methods recover similar structure.
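To make the estimator concrete, here is a minimal sketch of how a susceptibility of this kind can be computed: draw weight samples from the localized Gibbs posterior with SGLD, then take the (negative) sample covariance between a per-component observable and the perturbation's loss term. Everything here is illustrative, not the paper's exact implementation or the devinterp API: the function names, the choice of observable, and the temperature and localization constants are assumptions, and the sign and scaling conventions should be taken from the paper.

```python
import torch
import torch.nn.functional as F

def estimate_susceptibilities(model, loader, observable_fn, perturb_loss_fn,
                              n_steps=2000, burn_in=500, lr=1e-5,
                              beta=1.0, gamma=100.0):
    """Sketch of chi_{xy} estimation (names and constants are assumptions).

    observable_fn(model)   -> tensor of shape (n_components,), e.g. a
                              per-component loss evaluated at the sample.
    perturb_loss_fn(model) -> scalar loss of token y in context x, the
                              direction in which the data distribution
                              is upweighted.
    """
    w0 = [p.detach().clone() for p in model.parameters()]  # localization center
    obs, pert = [], []
    it = iter(loader)
    for step in range(n_steps):
        try:
            inputs, targets = next(it)
        except StopIteration:
            it = iter(loader)
            inputs, targets = next(it)
        loss = F.cross_entropy(model(inputs), targets)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            # One SGLD step on the localized, tempered posterior:
            # dw = -(lr/2) * (beta * grad_L + gamma * (w - w0)) + N(0, lr)
            for p, p0 in zip(model.parameters(), w0):
                drift = -0.5 * lr * (beta * p.grad + gamma * (p - p0))
                p.add_(drift + lr ** 0.5 * torch.randn_like(p))
        if step >= burn_in:
            with torch.no_grad():
                obs.append(observable_fn(model))
                pert.append(perturb_loss_fn(model))
    O = torch.stack(obs)   # (T, n_components)
    L = torch.stack(pert)  # (T,)
    O_c, L_c = O - O.mean(0), L - L.mean()
    # Susceptibility per component as a (negative) posterior covariance.
    return -(O_c * L_c[:, None]).mean(0)
```

Given susceptibility vectors for many (context, token) pairs, tokens that respond to the same components for the same reasons end up close together, which is what the conductance-based clustering in the paper exploits.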

Authors
Andrew Gordon=, Garrett Baker=, George Wang, William Snell, Stan van Wingerden, Daniel Murfet
Timaeus · = Equal contribution
Published
January 19, 2026

Build on our work

Our tools for susceptibilities, local learning coefficients, and SGMCMC sampling are open-sourced in the devinterp library.
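As a starting point, the snippet below shows roughly how devinterp's SGLD sampler and local learning coefficient (LLC) estimator have been invoked in the library's documented examples. The exact signatures and keyword names have changed across versions, so treat them as assumptions and consult the current devinterp docs; the toy model and data are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from devinterp.optim import SGLD
from devinterp.slt import estimate_learning_coeff_with_summary

# Toy setup (placeholder): a small regression model and synthetic data.
model = torch.nn.Linear(10, 1)
data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
train_loader = DataLoader(data, batch_size=64, shuffle=True)

# Estimate the LLC at the model's current parameters by sampling a
# localized posterior with SGLD. Keyword names follow the devinterp
# README at the time of writing and may differ in your version.
stats = estimate_learning_coeff_with_summary(
    model,
    train_loader,
    criterion=F.mse_loss,
    sampling_method=SGLD,
    optimizer_kwargs=dict(lr=1e-4, localization=100.0),
    num_chains=4,
    num_draws=200,
)
```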

Work with us

We're hiring Research Scientists, Engineers & more to join the team full-time.

Senior researchers can also express interest in a part-time affiliation through our new Research Fellows Program.