![Hao Peng Profile](https://pbs.twimg.com/profile_images/1588403315569590272/_Ap6lka7_x96.jpg)
Hao Peng
@haopeng_nlp
558 Followers · 34 Following · 35 Statuses
RT @zhaofeng_wu: We find that models “think” in English (or in general, their dominant language) when processing distinct non-English or…
RT @AkariAsai: I'm on the job market this year! I'm completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations…
RT @OfirPress: I'm on the academic job market! I develop autonomous systems for: programming, research-level question answering, finding s…
RT @lifan__yuan: Wanna train PRMs but process labels, annotated manually or automatically, sound too expensive to you? Introduce Implicit…
RT @bingyikang: Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this…
RT @MKhalifaaaa: What if LLMs can cite the pre-training source(s) supporting their parametric knowledge? Won't this dramatically improve ve…
RT @YangyiChen6666: Introducing SOLO, a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image…
Language models excel at undergraduate exams, but how do they fare in research? SciCode challenges models with real research coding problems. Even the best models solve less than 5%. Very proud of @MinyangTian1 and @luyu_gao for leading the charge!
SciCode is our new benchmark that challenges LMs to code solutions for scientific problems from advanced papers. The challenges were crafted by PhDs; ~10% of our benchmark is based on Nobel-winning research. GPT-4 and Sonnet 3.5 get <5% accuracy. 1/6
RT @YueGuo10: I'm joining UIUC @UofIllinois this fall as an Assistant Professor in the iSchool, with an affiliation in Computer Science…
RT @Francis_YAO_: From Claude100K to Gemini10M, we are in the era of long context language models. Why and how a language model can utilize…
RT @zhaofeng_wu: Want to train an aligned LM in a new language but don't have preference data for training the reward model (RM)? Just…
RT @jyangballin: SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-benc…
Very proud of Eurus. A huge shoutout to @lifan__yuan and @charlesfornlp for leading this!
Introducing Eurus, a suite of state-of-the-art LLM reasoning generalists powered by a new member of the Ultra-Series, UltraInteract! Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning across a comprehensive benchmark of 12 tests (mostly OOD) covering five tasks!
Very proud of Eurus. A huge shoutout to @lifan__yuan and @charlesfornlp for leading this!
This is joint work with @charlesfornlp, @wanghanbin95, @stingning, @xingyaow_, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, and advisors Bowen Zhou, @haopeng_nlp, @zibuyu9, and Maosong Sun. cc @TsinghuaNLP @uiuc_nlp
RT @Francis_YAO_: Frontier models all have at least 100K context length; Gemini 1.5 even has 1M context. What about research and open sourc…
RT @xingyaow_: Large Language Model (LLM) agents promise to free us from mundane tasks, but how should they best interact with our world? I…
RT @xingyaow_: This is joint work with @YangyiChen6666, @lifan__yuan, @YizheZhangNLP, @YunzhuLiYZ, @haopeng_nlp, and @elgreco_winter…
RT @MKhalifaaaa: Can we boost chain-of-thought reasoning by guiding decoding toward correct solutions? Excited to…
RT @srush_nlp: Introducing COLM (the Conference on Language Modeling), a new research venue dedicated to the theory…