devinterp Project

RL of Board Game Agents

Train bots to play board games such as tic-tac-toe or Othello, and track their learning coefficients over the course of training.

Project Details

Status: Unstarted
Difficulty: Hard
Type: Applied

Team & Contact

Tags

devinterp

If you are a masochist, then deep RL might be the thing for you!

There’s evidence of something like “phase transitions” in models such as AlphaZero: the model learns human-interpretable concepts such as piece value and mobility within a narrow window around 32k training steps. Is this a proper phase transition? Can we see it in the learning coefficient?
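To make "seeing it in the learning coefficient" concrete, here is a minimal sketch of one way a local learning coefficient might be estimated around a checkpoint w_star. It assumes the commonly used estimator lambda_hat = n * beta * (E[L(w)] - L(w_star)) with beta = 1 / log(n), where the expectation is taken over samples drawn by Langevin dynamics with a localizing term of strength gamma. The function name `estimate_llc`, the toy one-parameter quadratic loss, and every hyperparameter value are illustrative assumptions, not something this listing prescribes. A real run would use minibatch gradients (SGLD) on the trained network rather than the exact gradient used here.

```python
import math
import random


def estimate_llc(loss, grad, w_star, n, steps=50_000, burn_in=5_000,
                 eps=1e-3, gamma=1.0, seed=0):
    """Estimate a local learning coefficient for a one-parameter model.

    loss, grad: callables giving the average loss L(w) and its derivative.
    w_star:     the parameter value (checkpoint) to localize around.
    n:          the (assumed) number of training samples.
    """
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)  # assumed inverse temperature
    w = w_star
    total, kept = 0.0, 0
    for step in range(steps):
        # Gradient of the localized potential n*beta*L(w) + (gamma/2)*(w - w_star)^2.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        # One Langevin step: half-step along the negative gradient plus Gaussian noise.
        w = w - 0.5 * eps * drift + math.sqrt(eps) * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            total += loss(w)
            kept += 1
    # Rescaled gap between the sampled average loss and the loss at w_star.
    return n * beta * (total / kept - loss(w_star))


def quadratic(w):
    """Toy regular loss with its minimum at w = 0."""
    return 0.5 * w * w


def quadratic_grad(w):
    return w


llc = estimate_llc(quadratic, quadratic_grad, w_star=0.0, n=1000)
print(f"estimated learning coefficient: {llc:.3f}")
```

For this toy regular model with a single parameter the estimate should land in the vicinity of 0.5, which gives a quick sanity check before pointing the same procedure at game-playing checkpoints saved during training.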

Choose a board game and train a bot to play it. Track the learning coefficient over the course of training. How does it behave? As a useful starting simplification, take a cue from OthelloGPT and consider dropping the self-play component, training models only to predict valid moves. Does the “emergent world model” show up as a phase transition?
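The "predict valid moves" simplification can be sketched with tic-tac-toe, the smallest game mentioned above. The snippet below plays random legal games and turns each one into (move prefix, legal-move mask) pairs, which is the kind of supervised data a sequence model could be trained on without any self-play. The helper names (`random_game`, `make_examples`) and the board encoding are hypothetical choices made for illustration only.

```python
import random

# Index triples that form a winning line on a 3x3 board stored as a flat list.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def legal_moves(board):
    """Empty squares, or no moves at all once somebody has won."""
    if winner(board) != 0:
        return []
    return [i for i, v in enumerate(board) if v == 0]


def random_game(rng):
    """Play uniformly random legal moves until the game ends."""
    board, moves, player = [0] * 9, [], 1
    while True:
        options = legal_moves(board)
        if not options:
            return moves
        move = rng.choice(options)
        board[move] = player
        moves.append(move)
        player = -player


def make_examples(moves):
    """Turn one game into (prefix, mask) pairs; mask[i] == 1 iff square i is legal next."""
    board, player, examples = [0] * 9, 1, []
    for t, move in enumerate(moves):
        legal = set(legal_moves(board))
        mask = [1 if i in legal else 0 for i in range(9)]
        examples.append((moves[:t], mask))
        board[move] = player
        player = -player
    return examples


rng = random.Random(0)
moves = random_game(rng)
examples = make_examples(moves)
print(f"game of {len(moves)} moves gives {len(examples)} training examples")
```

Saving model checkpoints at regular intervals while training on data like this gives a sequence of parameter values at which the learning coefficient can then be estimated.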

Where to Begin

Before starting this project, we recommend familiarizing yourself with these resources:

Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.