Zaid Khan Profile

Zaid Khan (@codezakh)
Followers: 427 · Following: 812 · Statuses: 319

@uncnlp with @mohitban47 working on grounded reasoning + multimodal agents // currently @allen_ai formerly @neclabsamerica // bs+ms CompE @northeastern

Boston, USA · Joined June 2023
Zaid Khan (@codezakh) · 22 hours
DataEnvGym will be a ⭐️ Spotlight ⭐️ at #ICLR2025 (Top 5%)! DataEnvGym is a testbed for RL-style data-generation agents + teaching environments to automate post-training: the process of improving a model on diverse, open-ended tasks based on automatically discovered model skills/weaknesses. We will keep expanding DataEnvGym with more tasks and agents/policies; it now covers 4 domains (multimodal reasoning, math reasoning, coding, and tool use) plus a leaderboard.
Zaid Khan (@codezakh) · 4 months
Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically discovered model weaknesses? Introducing DataEnvGym, a testbed for data-generation agents + teaching environments.
The loop: environment trains/evaluates the student model ➡️ environment discovers skills/errors and gives feedback to the agent ➡️ agent generates updated training data to address weaknesses ➡️ iterate.
Key idea: frame data generation + model improvement as an RL-style sequential decision-making task, where states encode student errors, the policy decides which data to generate, and the reward is the performance of the student model.
We provide several modular environments + teaching agents that can improve models on VQA/math/programming, and a leaderboard benchmarking these agents. We welcome more entries to our leaderboard! Thread 🧵👇 (1/9)
[image]
5 replies · 26 retweets · 84 likes
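The RL-style framing in the tweet above (state = discovered student weaknesses, action = which data to generate, reward = student performance) can be sketched as a toy loop. Everything below is illustrative: the class names, the skill-based toy student, and the numbers are my assumptions, not DataEnvGym's actual API.

```python
# Toy sketch of the RL-style data-generation loop described above.
# state = discovered student weaknesses, action = data to generate,
# reward = student performance. Names are illustrative, not DataEnvGym's API.

SKILLS = ["algebra", "geometry", "counting"]

class StudentModel:
    """Toy 'student' whose per-skill accuracy improves with targeted data."""
    def __init__(self):
        self.accuracy = {s: 0.2 for s in SKILLS}

    def train(self, batch):
        for skill in batch:
            self.accuracy[skill] = min(1.0, self.accuracy[skill] + 0.1)

    def evaluate(self):
        return sum(self.accuracy.values()) / len(self.accuracy)

class TeachingEnvironment:
    """Trains/evaluates the student and reports its weakest skills (the state)."""
    def __init__(self, student):
        self.student = student

    def state(self):
        # "Discovered weaknesses": skills sorted by ascending accuracy.
        return sorted(SKILLS, key=lambda s: self.student.accuracy[s])

    def step(self, batch):
        self.student.train(batch)
        return self.state(), self.student.evaluate()  # next state, reward

class DataGenerationAgent:
    """Policy: generate data targeting the weakest reported skill."""
    def act(self, state, batch_size=4):
        return [state[0]] * batch_size

env = TeachingEnvironment(StudentModel())
agent = DataGenerationAgent()
state = env.state()
for _ in range(5):
    batch = agent.act(state)
    state, reward = env.step(batch)
print(f"final avg accuracy: {reward:.2f}")
```

Each iteration the agent targets the weakest skill the environment reports; a real policy would emit actual training examples rather than skill labels, and the reward would come from held-out evaluation.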
Zaid Khan (@codezakh) · 17 hours
@ryanmart3n Thanks Ryan! ❤️
0 replies · 0 retweets · 0 likes
Zaid Khan (@codezakh) · 18 hours
RT @EliasEskin: 🎉 Excited that DataEnvGym, our work on adaptive data generation agents, has been selected as a #ICLR2025 @iclr_conf Spotli…
0 replies · 4 retweets · 0 likes
Zaid Khan (@codezakh) · 19 hours
@NeginRaoof_ Thanks Negin! ❤️
0 replies · 0 retweets · 1 like
Zaid Khan (@codezakh) · 19 hours
@trungthvu Thanks Trung! ❤️
0 replies · 0 retweets · 0 likes
Zaid Khan (@codezakh) · 19 hours
RT @trungthvu: Awesome work by our OpenThinker team to create the best open-data 32B reasoning model! Our model closely matches or beats…
0 replies · 2 retweets · 0 likes
Zaid Khan (@codezakh) · 19 hours
RT @ryanmart3n: 🧠 OpenThinker-32B ⭐ Beating DeepSeek-R1-Distill-Qwen-32B, a closed data model, on MATH500 and GPQA-Diamond ⭐ Best performa…
0 replies · 4 retweets · 0 likes
Zaid Khan (@codezakh) · 19 hours
RT @etash_guha: 📈We have closed the gap from DeepSeek-R1-32B with OpenThinker-32B, our new Open-Data Reasoning Model! Not only does this mo…
0 replies · 6 retweets · 0 likes
Zaid Khan (@codezakh) · 19 hours
RT @NeginRaoof_: Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1. Our results show that large, ca…
0 replies · 112 retweets · 0 likes
Zaid Khan (@codezakh) · 22 hours
🙏 Big thanks to my coauthors @EliasEskin @jmin__cho @mohitban47 @uncnlp @unccs
More details: Paper: Project Page: Code: HuggingFace:
0 replies · 3 retweets · 8 likes
Zaid Khan (@codezakh) · 2 days
RT @hanlin_hl: Happy to share that “Ctrl-Adapter” is selected for ✨ Oral ✨ presentation (top 1.8%) at #ICLR2025! 🌟 Whenever new stronger d…
0 replies · 26 retweets · 0 likes
Zaid Khan (@codezakh) · 2 days
RT @rohanpaul_ai: The challenge lies in effectively debugging faulty code produced by LLMs due to the scarcity of unit tests that can pinpo…
0 replies · 8 retweets · 0 likes
Zaid Khan (@codezakh) · 5 days
RT @HuaxiuYaoML: 🚀 We introduce MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based…
0 replies · 30 retweets · 0 likes
Zaid Khan (@codezakh) · 8 days
RT @mohitban47: 🚨 Check out "UTGen & UTDebug" for learning to automatically generate unit tests (i.e., discovering inputs which break your…
0 replies · 18 retweets · 0 likes
Zaid Khan (@codezakh) · 9 days
RT @ArchikiPrasad: FYI: The debugging benchmarks we developed in our work to evaluate unit test generators & debuggers are now available on…
0 replies · 11 retweets · 0 likes
Zaid Khan (@codezakh) · 9 days
Testing is a critical part of software engineering: what if we could automatically discover inputs that break your code? This is a hard task even for frontier models (GPT-4o, DeepSeekV3). We show how to train SLMs (Qwen2.5-7B + Llama3.1-8B) to generate unit tests that break code and are useful for debugging! Lots of interesting follow-ups are possible here; check it out! 🧵👇
Archiki Prasad (@ArchikiPrasad) · 9 days
🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨, which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and to debug code from generated tests. UTGen + UTDebug improve LLM-based code debugging by addressing 3 key questions:
1⃣ What are desirable properties of unit test generators? (A: high output accuracy and a high rate of uncovering errors)
2⃣ How good are models at 0-shot unit test generation? (A: they are not great) ... so how do we improve LLMs' UT generation abilities? (A: by bootstrapping from code-generation data via UTGen)
3⃣ How can we use potentially noisy feedback from generated tests for debugging? (A: via test-time scaling and validation + backtracking in UTDebug)
🧵👇
[image]
1 reply · 6 retweets · 16 likes
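The third point above (using noisy generated-test feedback via validation + backtracking) can be illustrated with a toy sketch: a candidate fix is accepted only if it raises the pass rate on the generated tests, and otherwise we backtrack to the previous best code. The function names and lambda "patches" below are hypothetical, not the actual UTGen/UTDebug implementation.

```python
# Toy sketch of the validate-and-backtrack idea: generated unit tests may be
# noisy, so a proposed fix is accepted only if it passes more generated tests
# than the current code; otherwise we backtrack. Names are illustrative.

def pass_rate(code_fn, tests):
    """Fraction of (input, expected) unit tests that code_fn satisfies."""
    passed = 0
    for inp, expected in tests:
        try:
            if code_fn(inp) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that test
    return passed / len(tests)

def debug_with_backtracking(code_fn, candidate_fixes, tests):
    """Greedily accept candidate fixes that improve the generated-test pass
    rate; revert (backtrack) when a candidate does not help."""
    best, best_rate = code_fn, pass_rate(code_fn, tests)
    for fix in candidate_fixes:       # "test-time scaling": many sampled fixes
        rate = pass_rate(fix, tests)
        if rate > best_rate:          # validation step
            best, best_rate = fix, rate
        # else: backtrack, i.e. keep the previous best code
    return best, best_rate

# Buggy code: should square its input but doubles it instead.
buggy = lambda x: x * 2
fixes = [lambda x: x + 2, lambda x: x ** 2]  # sampled candidate patches
tests = [(2, 4), (3, 9), (4, 16)]            # generated unit tests (noiseless here)

fixed, rate = debug_with_backtracking(buggy, fixes, tests)
print(rate)  # pass rate of the accepted fix
```

With noisy tests (some wrong expected outputs), the same accept-only-on-improvement rule limits how far a bad test can drag the repair, which is the intuition behind the validation + backtracking answer above.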
Zaid Khan (@codezakh) · 9 days
RT @cyjustinchen: Introducing ✨UTGen & UTDebug✨ for improving code debugging tasks w/ strong pass@1 gains. Unit tests help both humans and…
0 replies · 8 retweets · 0 likes
Zaid Khan (@codezakh) · 9 days
RT @EliasEskin: 🚨 Excited to announce UTGen and UTDebug, where we first learn to generate unit tests and then apply them to debugging gener…
0 replies · 10 retweets · 0 likes
Zaid Khan (@codezakh) · 9 days
RT @ArchikiPrasad: 🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨 which introduces ✨UTGen and UTDebug✨ for…
0 replies · 58 retweets · 0 likes
Zaid Khan (@codezakh) · 9 days
RT @EliasEskin: 🎉 Pleased that several projects from my postdoc @uncnlp @unccs that I'm excited about have been accepted to #ICLR2025 and…
0 replies · 21 retweets · 0 likes