Varun Gangal (@VarunGangal)

Followers: 1K · Following: 7K · Statuses: 1K

AI Researcher @amazon AGI; @asapp (22-24); PhD CMU LTI (2017-22); IIT-M CSE (2011-16). RT / bookmark ≠ endorsement. Views personal, not of employers.

New York City
Joined January 2012
Varun Gangal (@VarunGangal) · 22 days
Excited to see the Humanity's Last Exam dataset, paper & repo release! Was fun crafting some hard problems for this in collab w/ @stevenyfeng, @boson2photon & others [names in image] at the end of '24, 3 of which got into the final benchmark. Thanks @DanHendrycks and others at @scale_AI for the effort & for creating the aegis and chance to contribute!
[images attached]
Quoting Dan Hendrycks (@DanHendrycks) · 22 days
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get <10% accuracy and are highly overconfident. @ai_risk @scaleai
[images attached]
1 reply · 2 retweets · 15 likes
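
One concrete reading of "highly overconfident" is that the models' mean stated confidence far exceeds their accuracy; a toy illustration with made-up numbers (not actual HLE results):

```python
# Overconfidence gap = mean stated confidence - accuracy.
# All numbers below are invented for illustration, not HLE data.
preds_correct = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # 10% accuracy
stated_conf = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.8, 0.75, 0.9, 0.85]

accuracy = sum(preds_correct) / len(preds_correct)
mean_conf = sum(stated_conf) / len(stated_conf)
print(f"accuracy={accuracy:.0%}, mean confidence={mean_conf:.0%}, "
      f"overconfidence gap={mean_conf - accuracy:+.0%}")
```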
Varun Gangal (@VarunGangal) · 1 day
[images attached]
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 2 days
@sedrickkeh2 Congrats, this is insightful!! Thanks for releasing the pre-judge/code-exec-filtering 173K unverified traces too...
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 7 days
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 7 days
@himanshustwts I wish there were a way to do COCONUT (<litethinking>?) for when you need some reasonable amount of thinking enabled but don't need to read the traces explicitly and don't want the added latency either.
0 replies · 0 retweets · 1 like
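
For context, COCONUT (Chain of Continuous Thought) replaces emitted trace tokens with hidden states fed back as input embeddings, so the "thinking" is silent and adds no tokens to read. A minimal sketch of that loop, assuming a small HF causal LM; the model choice and step budget are illustrative assumptions, and without COCONUT-style training the latent steps won't actually improve answers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumption: any small causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Q: What is 17 * 24? Think silently, then answer.\nA:"
input_ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)

with torch.no_grad():
    # "Silent" latent steps: append the last hidden state as the next input
    # embedding instead of sampling (and later reading) a trace token.
    for _ in range(4):  # hypothetical <litethinking> budget
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_h = out.hidden_states[-1][:, -1:, :]  # final layer, last position
        embeds = torch.cat([embeds, last_h.to(embeds.dtype)], dim=1)
    # Decode one visible token greedily from the latent-augmented context
    # (a single step just to keep the sketch short).
    next_id = model(inputs_embeds=embeds).logits[:, -1, :].argmax(dim=-1)

print(tok.decode(next_id.tolist()))
```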
Varun Gangal (@VarunGangal) · 8 days
@avataram Kay Kay Menon from Black Friday deserved a place in the options...
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 8 days
@abacaj The "bitter batter better" lesson?
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 8 days
RT @theandrewsiah: playing with @sksq96 @VarunGangal thanks to @willccbb @abacaj for their gists and help in setting up dm/reply if you w…
0 replies · 1 retweet · 0 likes
Varun Gangal (@VarunGangal) · 8 days
@Dorialexander @willccbb @theandrewsiah Yeah, not that doing it off-policy would make it invalid either; in fact it would be great to see whether this variation also works, and how much and how differently [if there is a distinction at all, of which I'm not sure rn..]
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 8 days
Wondering (will try out) if it would have learnt [by virtue of poetry RL training] to do figurative language edits, e.g. personification too (something @sedrickkeh2, @stevenyfeng & I had made a mini-corpus for & explored generating a long while ago w/ BART etc. at COLING'22)
1 reply · 0 retweets · 4 likes
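
For reference, that COLING'22 line of work framed figurative edits (e.g. personification) as seq2seq generation. A hedged sketch of that framing with BART, where the checkpoint is a placeholder: base BART will mostly paraphrase or copy, and a model fine-tuned on literal-to-figurative pairs is what would actually be needed:

```python
# Hypothetical sketch: literal -> figurative editing as seq2seq with BART.
# "facebook/bart-large" is a placeholder; swap in a checkpoint fine-tuned
# on a literal/figurative pair corpus for real personification edits.
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "facebook/bart-large"
tok = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

literal = "The wind blew through the empty streets."
batch = tok(literal, return_tensors="pt")
ids = model.generate(**batch, num_beams=4, max_new_tokens=40)
print(tok.batch_decode(ids, skip_special_tokens=True)[0])
# With a suitable fine-tune, the hoped-for output is an edit like:
# "The wind wandered through the empty streets."
```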
Varun Gangal (@VarunGangal) · 8 days
But weren't there two of them? [AFAIK the old one and the new one were both nice, though ofc the new one is better.] (Though I guess even with that it's possible to upscale [MoE-ization etc.] or bootstrap a good model out of another good model in other ways, so it doesn't make the one-lucky-run hypothesis unreasonable.)
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 9 days
@abacaj Thanks a lot :)
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 9 days
W.r.t. budgeting mem use to avoid OOMs, I found @Dorialexander's Colab version [based on the same original gist by @willccbb that @abacaj is using a variant of] very handy: it typically dodges any OOMs [without PEFT] on an A100 in Colab with Qwen 0.5B Instruct [of course, if you move up to higher param counts or increase max_tokens from 200 you may hit them]
1 reply · 0 retweets · 4 likes
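
A minimal sketch of the memory budget being described, assuming the ingredients named in the tweet: a 0.5B instruct model in bf16 with completions capped at 200 tokens. The model, prompt, and dtype choice here are illustrative assumptions (device_map="auto" also requires the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves weight memory vs fp32
    device_map="auto",           # assumption: accelerate is installed
)

prompt = "Explain GRPO in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)  # the 200-token cap above
print(tok.decode(out[0], skip_special_tokens=True))
# On a CUDA device, report the peak allocation actually observed:
print(f"peak GPU mem: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```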
Varun Gangal (@VarunGangal) · 9 days
RT @simonw: o3-mini is really good at writing internal documentation - feed it a codebase, get back a detailed explanation of how specific…
0 replies · 91 retweets · 0 likes
Varun Gangal (@VarunGangal) · 9 days
@abacaj *6 something with untrained Instruct
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 9 days
I think the answer also depends on [amongst other considerations] whether the reward function has any shared parameters / arch components with the policy. If it doesn't, the reward function itself is a form of supervision. If it does, e.g. like in the self-rewarding language models setup of Yuan et al. (where a shared LLM underlies both the reward function and the policy), the reward model is not a form of supervision [or at least is less so..]
0 replies · 0 retweets · 0 likes
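
A toy sketch of the distinction being drawn, with all names hypothetical and stand-in scoring rules in place of real models: a separately parameterized reward model injects information the policy lacks (supervision-like), whereas a self-rewarding setup scores completions with the same model it trains:

```python
# All names are hypothetical; trivial rules stand in for trained models.

def external_reward(completion: str) -> float:
    """Separately parameterized reward model: its score carries information
    the policy does not already encode, i.e. a form of supervision."""
    return float(len(completion.split()) <= 20)  # stand-in for a trained RM

def self_reward(completion: str, policy_score) -> float:
    """Self-rewarding (Yuan et al.-style): the same LLM that generates also
    judges, so the signal is bounded by what the policy already knows."""
    return policy_score(completion)  # LLM-as-a-judge with shared weights

policy_score = lambda c: float("because" in c)  # stand-in for prompting the policy
c = "The answer is 42 because 6 * 7 = 42."
print(external_reward(c))        # supervision-like external signal
print(self_reward(c, policy_score))  # shared-parameter self-judgment
```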