Varun Gangal (@VarunGangal)

Followers: 1K · Following: 7K · Statuses: 1K

AI Researcher @amazon AGI; @asapp (22-24); PhD CMU LTI (2017-22); IIT-M CSE (2011-16). RT / bookmark ≠ endorsement. Views personal, not of employers.

New York City
Joined January 2012
Varun Gangal (@VarunGangal) · 22 days
Excited to see the Humanity's Last Exam dataset, paper & repo release! Was fun crafting some hard problems for this in collab w/ @stevenyfeng, @boson2photon & others [names in image] at the end of '24, 3 of which got into the final benchmark. Thanks @DanHendrycks and others at @scale_AI for the effort & for creating the aegis and chance to contribute!
[images attached]
Quoting Dan Hendrycks (@DanHendrycks) · 22 days
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get <10% accuracy and are highly overconfident. @ai_risk @scaleai
[images attached]
1 reply · 2 retweets · 15 likes
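
One concrete reading of "highly overconfident" is that the models' mean stated confidence far exceeds their accuracy; a toy illustration with made-up numbers (not actual HLE results):

```python
# Overconfidence gap = mean stated confidence - accuracy.
# All numbers below are invented for illustration, not HLE data.
preds_correct = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # 10% accuracy
stated_conf = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.8, 0.75, 0.9, 0.85]

accuracy = sum(preds_correct) / len(preds_correct)
mean_conf = sum(stated_conf) / len(stated_conf)
print(f"accuracy={accuracy:.0%}, mean confidence={mean_conf:.0%}, "
      f"overconfidence gap={mean_conf - accuracy:+.0%}")
```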
Varun Gangal (@VarunGangal) · 1 day
[images attached]
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 2 days
@sedrickkeh2 Congrats, this is insightful!! Thanks for releasing the pre-judge/code-exec-filtering 173K unverified traces too...
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 7 days
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 7 days
@himanshustwts I wish there were a way to do COCONUT (<litethinking>?) for when you need some reasonable amount of thinking enabled but don't need to read the traces explicitly and don't want the added latency either.
0 replies · 0 retweets · 1 like
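
For context, COCONUT (Chain of Continuous Thought) replaces emitted trace tokens with hidden states fed back as input embeddings, so the "thinking" is silent and adds no tokens to read. A minimal sketch of that loop, assuming a small HF causal LM; the model choice and step budget are illustrative assumptions, and without COCONUT-style training the latent steps won't actually improve answers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumption: any small causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Q: What is 17 * 24? Think silently, then answer.\nA:"
input_ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)

with torch.no_grad():
    # "Silent" latent steps: append the last hidden state as the next input
    # embedding instead of sampling (and later reading) a trace token.
    for _ in range(4):  # hypothetical <litethinking> budget
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_h = out.hidden_states[-1][:, -1:, :]  # final layer, last position
        embeds = torch.cat([embeds, last_h.to(embeds.dtype)], dim=1)
    # Decode one visible token greedily from the latent-augmented context
    # (a single step just to keep the sketch short).
    next_id = model(inputs_embeds=embeds).logits[:, -1, :].argmax(dim=-1)

print(tok.decode(next_id.tolist()))
```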
Varun Gangal (@VarunGangal) · 8 days
@avataram Kay Kay Menon from Black Friday deserved a place in the options...
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 8 days
@abacaj The "bitter batter better" lesson?
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 8 days
RT @theandrewsiah: playing with @sksq96 @VarunGangal thanks to @willccbb @abacaj for their gists and help in setting up dm/reply if you w…
0 replies · 1 retweet · 0 likes
Varun Gangal (@VarunGangal) · 8 days
@Dorialexander @willccbb @theandrewsiah Yeah, not that doing it off-policy would make it invalid either; in fact it would be great to see whether this variation also works, and how much and how differently [if there is a distinction at all, of which I'm not sure rn..]
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 8 days
Wondering (will try out) if it would have learnt [by virtue of poetry RL training] to do figurative language edits, e.g. personification too (something @sedrickkeh2, @stevenyfeng & I had made a mini-corpus for & explored generating a long while ago w/ BART etc. at COLING'22)
1 reply · 0 retweets · 4 likes
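
For reference, that COLING'22 line of work framed figurative edits (e.g. personification) as seq2seq generation. A hedged sketch of that framing with BART, where the checkpoint is a placeholder: base BART will mostly paraphrase or copy, and a model fine-tuned on literal-to-figurative pairs is what would actually be needed:

```python
# Hypothetical sketch: literal -> figurative editing as seq2seq with BART.
# "facebook/bart-large" is a placeholder; swap in a checkpoint fine-tuned
# on a literal/figurative pair corpus for real personification edits.
from transformers import BartForConditionalGeneration, BartTokenizer

ckpt = "facebook/bart-large"
tok = BartTokenizer.from_pretrained(ckpt)
model = BartForConditionalGeneration.from_pretrained(ckpt)

literal = "The wind blew through the empty streets."
batch = tok(literal, return_tensors="pt")
ids = model.generate(**batch, num_beams=4, max_new_tokens=40)
print(tok.batch_decode(ids, skip_special_tokens=True)[0])
# With a suitable fine-tune, the hoped-for output is an edit like:
# "The wind wandered through the empty streets."
```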
Varun Gangal (@VarunGangal) · 8 days
But weren't there two of them? [AFAIK the old one and the new one were both nice, though ofc the new one is better.] (Though I guess even with that it's possible to upscale [MoE-ization etc.] or bootstrap a good model out of another good model in other ways, so it doesn't make the one-lucky-run hypothesis unreasonable.)
0 replies · 0 retweets · 2 likes
Varun Gangal (@VarunGangal) · 9 days
@abacaj Thanks a lot :)
0 replies · 0 retweets · 1 like
Varun Gangal (@VarunGangal) · 9 days
W.r.t. budgeting mem use to avoid OOMs, I found @Dorialexander's Colab version [based on the same original gist by @willccbb that @abacaj is using a variant of] very handy: it typically dodges any OOMs [without PEFT] on an A100 in Colab with Qwen 0.5B Instruct [of course, if you move up to higher param counts or increase max_tokens from 200 you may hit them]
1 reply · 0 retweets · 4 likes
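
A minimal sketch of the memory budget being described, assuming the ingredients named in the tweet: a 0.5B instruct model in bf16 with completions capped at 200 tokens. The model, prompt, and dtype choice here are illustrative assumptions (device_map="auto" also requires the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves weight memory vs fp32
    device_map="auto",           # assumption: accelerate is installed
)

prompt = "Explain GRPO in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)  # the 200-token cap above
print(tok.decode(out[0], skip_special_tokens=True))
# On a CUDA device, report the peak allocation actually observed:
print(f"peak GPU mem: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```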
Varun Gangal (@VarunGangal) · 9 days
RT @simonw: o3-mini is really good at writing internal documentation - feed it a codebase, get back a detailed explanation of how specific…
0 replies · 91 retweets · 0 likes
Varun Gangal (@VarunGangal) · 9 days
@abacaj *6 something with untrained Instruct
0 replies · 0 retweets · 0 likes
Varun Gangal (@VarunGangal) · 9 days
I think the answer also depends on [amongst other considerations] whether the reward function has any shared parameters / arch components with the policy. If it doesn't, the reward function itself is a form of supervision. If it does, e.g. like in the self-rewarding language models setup of Yuan et al. (where a shared LLM underlies both the reward function and the policy), the reward model is not a form of supervision [or at least is less so..]
0 replies · 0 retweets · 0 likes
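
A toy sketch of the distinction being drawn, with all names hypothetical and stand-in scoring rules in place of real models: a separately parameterized reward model injects information the policy lacks (supervision-like), whereas a self-rewarding setup scores completions with the same model it trains:

```python
# All names are hypothetical; trivial rules stand in for trained models.

def external_reward(completion: str) -> float:
    """Separately parameterized reward model: its score carries information
    the policy does not already encode, i.e. a form of supervision."""
    return float(len(completion.split()) <= 20)  # stand-in for a trained RM

def self_reward(completion: str, policy_score) -> float:
    """Self-rewarding (Yuan et al.-style): the same LLM that generates also
    judges, so the signal is bounded by what the policy already knows."""
    return policy_score(completion)  # LLM-as-a-judge with shared weights

policy_score = lambda c: float("because" in c)  # stand-in for prompting the policy
c = "The answer is 42 because 6 * 7 = 42."
print(external_reward(c))        # supervision-like external signal
print(self_reward(c, policy_score))  # shared-parameter self-judgment
```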