March 2026

Multi-Turn RL for Code Debugging

Training a 7B model to debug code in a custom DSL environment using GRPO (Group Relative Policy Optimization), and comparing prompting, supervised fine-tuning, and reinforcement learning across three categories of bugs.

November 2025

Visualizing Adam's adaptive learning rates

An interactive visualization showing how Adam adapts its step sizes in response to different gradient patterns.