zafir stojanovski
- about
- ml engineer @ loka
- independent research @ open thought
- publications
- reasoning gym: environments for rl with verifiable rewards (neurips 2025, spotlight)
- momentum-based weight interpolation for continual learning (interpolate @ neurips 2022, best paper award)
- open-source
- open-thought/reasoning-gym (implemented rl envs: codeio, matrix manipulation, shortest path)
- eleutherai/lm-evaluation-harness (added eval datasets: lambada translations, paloma, legal bench)
- natolambert/rlhf-book (derived policy gradients, bradley-terry loss, ppo gradient dynamics)
- posts
- 2025-08-15 policy gradients
- 2024-07-12 catastrophic forgetting