Zafir Stojanovski

ML @ Loka | Research @ Open-Thought | Graduate @ University of Tübingen

Publications

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski*, Oliver Stanley*, Joe Sharratt*, Richard Jones*, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf

NeurIPS 2025 Spotlight

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Zafir Stojanovski*, Karsten Roth*, Zeynep Akata

Interpolate @ NeurIPS 2022 Best Paper Award

Selected Work

Core contributor to Reasoning Gym – a library of procedural data generators for training reasoning models with RL, created by Andreas Köpf (co-author of PyTorch, OpenAssistant). I built dozens of RL environments and ran the zero-shot, external benchmark, and curriculum learning experiments for our NeurIPS publication.

Reinforcement Learning

Worked with Karsten Roth (now Research Scientist at DeepMind) on mitigating catastrophic forgetting in foundation models. Using momentum-based weight interpolation, we demonstrated performance close to the upper bound of jointly training on all data in our NeurIPS workshop publication.

Continual Learning

Wrote several sections of the RLHF Book by Nathan Lambert (Research Scientist at Ai2), including subsections on persona vectors, the assistant axis, and persona subnetworks, as well as a discussion of implicit regularization as a mechanism for mitigating catastrophic forgetting. I also derived the policy gradient and Bradley-Terry objectives, explained the PPO gradient dynamics, and contributed the foundations of the code library, including rejection sampling.

Reinforcement Learning

Led a team to automate glomerular sclerosis classification from gigapixel kidney biopsies, deployed in a system serving over half of the Organ Procurement Organizations in the US.

Healthcare Life Sciences

Part of a team developing models to predict protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, resulting in numerous experimentally confirmed binders!

Healthcare Life Sciences

Contributed several datasets to EleutherAI's Evaluation Harness (such as Lambada Translations, Paloma, LegalBench) and implemented metric indicators and tests for output table consistency.

Model Evaluation

Co-founded uxo.ai in 2023 to develop agents capable of understanding and navigating the web. The goal was to build universal web scrapers that can extract structured content at scale.

Startups