RL of Board Game Agents
Train bots to play board games like tic-tac-toe and Othello, and track learning coefficients over training.
If you are a masochist, then deep RL might be the thing for you!
There’s evidence of something like “phase transitions” in models such as AlphaZero, where the model learns human-interpretable concepts (piece value, mobility, etc.) within a narrow window around 32k training steps. Is this a genuine phase transition? Can we see it in terms of the learning coefficient?
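To make “the learning coefficient” concrete: a common approach is to estimate the local learning coefficient (LLC) around a trained checkpoint by sampling nearby weights with SGLD and comparing their average loss to the loss at the checkpoint, i.e. lambda_hat = n * beta * (mean sampled loss - loss at w*). Below is a minimal, self-contained PyTorch sketch of that estimator; every hyperparameter (step size, localization strength, chain length) is an illustrative assumption rather than a tuned value, and in practice you may prefer an off-the-shelf implementation such as the devinterp package.

```python
import copy
import torch

def estimate_llc(model, loader, loss_fn, n_samples, epsilon=1e-4, beta=None,
                 gamma=100.0, num_steps=300, num_burnin=100, device="cpu"):
    """Rough SGLD-based estimate of the local learning coefficient (LLC)
    around the model's current parameters w*, using
    lambda_hat = n * beta * (mean sampled loss - loss at w*).
    All defaults here are illustrative, not tuned."""
    model = copy.deepcopy(model).to(device)
    w_star = [p.detach().clone() for p in model.parameters()]
    if beta is None:
        beta = 1.0 / torch.log(torch.tensor(float(n_samples)))  # common default choice

    def dataset_loss():
        # Average loss over the whole loader (a proxy for L_n).
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(model(x), y).item() * len(x)
                count += len(x)
        return total / count

    loss_star = dataset_loss()  # loss at w*, before sampling moves the weights
    draws = []
    data_iter = iter(loader)
    for step in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w_star):
                # SGLD step: gradient drift + localization toward w* + Gaussian noise.
                drift = 0.5 * epsilon * (n_samples * beta * p.grad + gamma * (p - p0))
                p.add_(-drift + torch.randn_like(p) * epsilon ** 0.5)
        if step >= num_burnin:
            draws.append(loss.item())  # minibatch loss as a cheap proxy for L_n(w_t)
    mean_draw = sum(draws) / len(draws)
    return float(n_samples * beta * (mean_draw - loss_star))
```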
Pick a board game and train a bot to play it. Track the learning coefficient over training. How does it behave? As a useful starting simplification, take a cue from OthelloGPT and consider dropping the self-play component altogether, training the model to just predict valid moves. Does the “emergent world model” show up as a phase transition?
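Here is one way a first experiment might look, as a hedged sketch rather than a prescription: generate random legal tic-tac-toe games, train a small network to predict the legal-move mask from the board position (a simplified stand-in for OthelloGPT’s next-move prediction on Othello transcripts), and estimate the learning coefficient at every checkpoint using the estimate_llc helper sketched above. The game, model size, and training schedule are all assumptions chosen to keep the example small.

```python
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def wins(board, player):
    return any(board[i] == board[j] == board[k] == player for i, j, k in LINES)

def random_games(n_games=2000, seed=0):
    """Play random legal tic-tac-toe games and record (board, legal-move mask) pairs.
    Board cells: 0 = empty, 1 = X, -1 = O (X moves first)."""
    rng = random.Random(seed)
    boards, masks = [], []
    for _ in range(n_games):
        board, player = [0] * 9, 1
        while not wins(board, 1) and not wins(board, -1) and 0 in board:
            legal = [i for i in range(9) if board[i] == 0]
            boards.append(list(board))
            masks.append([1.0 if i in legal else 0.0 for i in range(9)])
            board[rng.choice(legal)] = player
            player = -player
    return torch.tensor(boards, dtype=torch.float32), torch.tensor(masks)

X, Y = random_games()
loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)

# Small MLP mapping the board to per-cell "this move is legal" logits.
model = nn.Sequential(nn.Linear(9, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 9))
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

llc_curve = []
for epoch in range(25):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # At each checkpoint, estimate the LLC (e.g. with the estimate_llc sketch above,
    # or an off-the-shelf estimator) and store it to plot against training time.
    llc_curve.append(estimate_llc(model, loader, loss_fn, n_samples=len(X)))
    print(f"epoch {epoch}: LLC ≈ {llc_curve[-1]:.2f}")
```

Plotting llc_curve against training time is then the basic experiment: look for sharp changes in the LLC and check whether they line up with jumps in legal-move accuracy or with the emergence of board-state structure in probes.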
Where to Begin
Before starting this project, we recommend familiarizing yourself with these resources:
Ready to contribute? Let us know in our Discord community. We'll update this listing so that other people interested in this project can find you.