Zafir Stojanovski

ML @ Loka | Research @ Open-Thought | Graduate @ University of Tübingen

Publications

My work is used by AI labs such as DeepMind [1, 2, 3, 4], Meta [5, 6, 7], NVIDIA [8, 9], and Mila [10, 11, 12]:

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski*, Oliver Stanley*, Joe Sharratt*, Richard Jones*, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf

NeurIPS 2025 Spotlight

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Zafir Stojanovski*, Karsten Roth*, Zeynep Akata

Interpolate @ NeurIPS 2022 Best Paper Award

Selected Work

Reasoning Gym

Core contributor to Reasoning Gym, a library of procedural data generators for training reasoning models with RL, created by Andreas Köpf (co-author of PyTorch, OpenAssistant). I built dozens of RL environments and ran the zero-shot, external benchmark, and curriculum learning experiments for our NeurIPS publication.

Reinforcement Learning

Continual Learning

Worked with Karsten Roth (now Research Scientist at DeepMind) on mitigating catastrophic forgetting in foundation models. Using momentum-based weight interpolation, we demonstrated performance close to the upper bound set by joint training on all data in our NeurIPS workshop publication.

Continual Learning

RLHF Book

Wrote several sections of the RLHF Book by Nathan Lambert (Research Scientist at Ai2), where I derived the policy gradient objective and Bradley-Terry loss, provided intuitions for the PPO gradient dynamics, and built the foundations of the code library.

Reinforcement Learning

Kidney Biopsy Classification

Led a team to automate glomerular sclerosis classification from gigapixel kidney biopsies, deployed in a system serving over half of the Organ Procurement Organizations in the US.

Healthcare Life Sciences

Drug Discovery

Part of a team developing models to predict protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, resulting in numerous binders experimentally confirmed in the lab!

Healthcare Life Sciences

EleutherAI Evaluation Harness

Contributed several datasets to EleutherAI's Evaluation Harness (such as Lambada Translations, Paloma, LegalBench), and implemented higher-is-better indicators and tests for output table consistency.

Model Evaluation

uxo.ai

Co-founded uxo.ai in 2023 to develop agents capable of understanding and navigating the web. The goal was to build universal web scrapers that can extract structured content at scale.

Startups