March 2026
Multi-Turn RL for Code Debugging
Training a 7B model with GRPO to debug code in a custom DSL environment, and comparing prompting, supervised fine-tuning, and reinforcement learning across three categories of bugs.
November 2025
Visualizing Adam's Adaptive Learning Rates
An interactive visualization showing how Adam adapts its step sizes in response to different gradient patterns.