Advancing AI Safety

Our mission is to empower humanity by making breakthrough scientific progress on alignment.

Research Community Building

We're hiring for a Director of Operations!

Research

Modes of Sequence Models and Learning Coefficients

Chen and Murfet

Studying Small Language Models with Susceptibilities

Programs as Singularities

Murfet and Troiani

You Are What You Eat – AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

Lehalleur et al.

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

Community

The Australian AI Safety Forum 2024

Partners: Digital Sciences Initiative, Gradient Institute, Sydney Knowledge Hub, Sydney Mathematical Research Institute

The Australian AI Safety Forum is a two-day interdisciplinary event scheduled for November 2024. This forum, the first of its kind in Australia, aims to discuss perspectives on technical AI safety and governance, and explore Australia’s unique role in the global AI safety landscape. The event will be anchored around the International Scientific Report on the Safety of Advanced AI, highlighting its key content and examining its implications for Australia.

Sydney, Australia

Nov 07, 2024 - Nov 08, 2024

ILIAD 2024

Partners: PIBBSS, Simplex

A five-day, multi-track conference bringing together roughly 100 researchers in theoretical AI alignment.

Berkeley, California

Aug 28, 2024 - Sep 03, 2024

The 2023 Oxford Conference

A conference on developmental interpretability and singular learning theory.

Oxford, United Kingdom

Nov 05, 2023 - Nov 12, 2023

Join Our Community

Connect with fellow AI safety researchers and enthusiasts. Join the conversation or reach out to discuss collaboration opportunities.

Join our Discord | Contact Us