RL of Board Game Agents

Train bots to play board games such as tic-tac-toe and Othello, and track their learning coefficients during training.

Type: Applied

Difficulty: Hard

Status: Unstarted

If you are a masochist, then deep RL might be the thing for you!

There’s evidence of something like “phase transitions” in models like AlphaZero: the model appears to learn human-interpretable concepts such as piece value and mobility within a narrow window at around 32k training steps. Is this a proper phase transition? Does it show up in the learning coefficient?

Pick a board game and train a bot to play it. Track the learning coefficient over the course of training. How does it behave? As a useful starting simplification, take a pointer from OthelloGPT and consider dropping the self-play component, training models just to predict valid moves instead. Does the “emergent world model” show up in a phase transition?
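To make the "just predict valid moves" simplification concrete, here is a minimal sketch of a data generator for it. It uses tic-tac-toe only because the rules fit in a few lines; the names `random_game`, `winner`, and `WIN_LINES` are made up for this illustration, and the uniform-random move policy is an assumption rather than anything prescribed by the project. Each example pairs the move sequence so far with a 0/1 mask over the nine squares marking which moves are legal next.

```python
import random

# All eight winning lines on a 3x3 board, squares indexed 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def random_game(rng):
    """Play one game with uniformly random legal moves (no self-play, no learning).

    Returns a list of (moves_so_far, legal_mask) pairs, one per position
    in which a move is still to be played.
    """
    board = [0] * 9
    moves, examples = [], []
    player = 1
    while winner(board) == 0:
        legal = [i for i in range(9) if board[i] == 0]
        if not legal:  # board full: draw
            break
        mask = [1 if board[i] == 0 else 0 for i in range(9)]
        examples.append((tuple(moves), mask))
        move = rng.choice(legal)
        board[move] = player
        moves.append(move)
        player = -player
    return examples


rng = random.Random(0)
dataset = [ex for _ in range(1000) for ex in random_game(rng)]
print(len(dataset), "training examples")
```

A sequence model trained on `dataset` would take the move tuple as input and the mask as a multi-label target. For tic-tac-toe the implied "world model" is trivial (a square is legal if it is empty), so this is mainly a pipeline check before moving to Othello, where legality depends on which discs get flipped.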

Where to begin:

Quantifying degeneracy (Lau et al. 2023)
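As a rough orientation for what "tracking the learning coefficient" involves, here is a minimal sketch in the spirit of the estimator in Lau et al. 2023: sample around a checkpoint w* from a localized, tempered posterior and return nβ(E[L(w)] − L(w*)) with β = 1/log n. This sketch uses full-batch Langevin updates on a toy loss with NumPy, whereas the paper uses minibatch SGLD on a real network; the function name `estimate_llc`, its arguments, and the default hyperparameters are assumptions for illustration, not a reference implementation.

```python
import numpy as np


def estimate_llc(loss_fn, grad_fn, w_star, n, num_steps=20000, burn_in=2000,
                 step_size=1e-3, gamma=1.0, seed=0):
    """Sketch of a local learning coefficient estimate at w_star.

    Runs Langevin dynamics targeting
        p(w) ~ exp(-n * beta * L(w) - gamma / 2 * ||w - w_star||^2),
    with beta = 1 / log(n), and returns n * beta * (mean L(w) - L(w_star)).
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    w = np.array(w_star, dtype=float)
    sampled_losses = []
    for t in range(num_steps):
        # Gradient of the log-density: tempered loss term plus localization term.
        drift = -n * beta * grad_fn(w) - gamma * (w - w_star)
        w = w + 0.5 * step_size * drift + np.sqrt(step_size) * rng.normal(size=w.shape)
        if t >= burn_in:
            sampled_losses.append(loss_fn(w))
    return n * beta * (np.mean(sampled_losses) - loss_fn(w_star))


# Toy check on a regular (non-degenerate) quadratic loss in d dimensions,
# where the learning coefficient is known to be d / 2.
def quad_loss(w):
    return 0.5 * float(w @ w)


def quad_grad(w):
    return w


d = 4
llc = estimate_llc(quad_loss, quad_grad, np.zeros(d), n=1000)
print(f"estimated LLC: {llc:.2f} (theory for this toy loss: {d / 2})")
```

For a board game model, `loss_fn` and `grad_fn` would be the training loss and its gradient at a saved checkpoint, and the estimate would be repeated across checkpoints to get a curve over training. The step size and localization strength `gamma` usually need tuning per model, so treat the defaults here as placeholders.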

If you have decided to start working on this, please let us know in the Discord. We'll update this listing so that other people who are interested in this project can find you.