Yifei Li Profile
Yifei Li

@YifeiLiPKU

Followers
559
Following
271
Statuses
127

Ph.D. student @osunlp | Prev MSc @PKU1898 | BEng @NEUChina | Prev Intern @MSFTResearch (MSRA) | LLM & NLPer

Columbus, OH
Joined November 2021
@YifeiLiPKU
Yifei Li
13 days
RT @RonZiruChen: 🚀Our ScienceAgentBench is covered by @Nature News! With the help of @ShijieChen98 and @YifeiLiPKU, we sampled 20 tasks fro…
0
17
0
@YifeiLiPKU
Yifei Li
13 days
RT @hhsun1: Our ScienceAgentBench in @Nature news! DeepSeek R1 @DeepSeekR1 vs. @OpenAI o1 on data-driven scientific coding tasks: We sampl…
0
32
0
@YifeiLiPKU
Yifei Li
18 days
RT @RonZiruChen: 🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven…
0
14
0
@YifeiLiPKU
Yifei Li
19 days
RT @LiaoZeyi: Our paper, EIA, has been accepted by #ICLR2025! 🎉 Since the day we submitted EIA, the field of web agents has evolved signif…
0
10
0
@YifeiLiPKU
Yifei Li
19 days
RT @BoyuGouNLP: 🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉 We’re also thrilled to share some exciting updates: ✨ UGround is SOTA…
0
25
0
@YifeiLiPKU
Yifei Li
30 days
🚀🚀 ScienceAgentBench now supports containerized evaluation --- faster (90min ➡️ <30min) and setup-free! See below for more details ⬇️⬇️
@hhsun1
Huan Sun (OSU)
1 month
Evaluating on our ScienceAgentBench (Coding tasks in Bioinformatics/Chemistry/Geo info science/Cognitive science) just got much easier and faster! Check out our update on containerized evaluation: (1) Task environments are set up in independent docker containers, which eliminates potential package conflicts among different tasks and allows us to remove pip-tools, a major factor of slow evaluation previously. (2) Users can now evaluate their agents using a single bash command and no longer need to set up their own conda environments. (3) With multi-threading, programs for each task can be configured and executed in parallel, reducing the evaluation time to only 20-30 minutes for all 102 tasks. Great efforts led by awesome @YifeiLiPKU and @BotaoYu24 @osunlp!
0
1
11
@YifeiLiPKU
Yifei Li
1 month
RT @BoyuGouNLP: Amazing results! Thanks for testing! The carefully designed synthetic data in UGround-V1 is indeed surprisingly effective 😮…
0
2
0
@YifeiLiPKU
Yifei Li
2 months
RT @LingboMo: 🚀 Excited to announce the release of our Agent Safety Resources Repository! 📚🔍 This GitHub repo curates existing papers, ben…
0
16
0
@YifeiLiPKU
Yifei Li
2 months
RT @xiangyue96: ✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇.…
0
59
0
@YifeiLiPKU
Yifei Li
3 months
RT @_TobiasLee: 📢 Introducing VL-RewardBench - A new benchmark for vision-language generative reward models (VL-GenRMs)! 📊Even SOTA models…
0
26
0
@YifeiLiPKU
Yifei Li
3 months
RT @hhsun1: As people are talking about inference scaling, I hope to re-introduce key findings from our earlier work (long paper #ACL24 @ac
0
22
0
@YifeiLiPKU
Yifei Li
3 months
Excited to see so many advances and to be participating in the exploration of agents for science!! Check out our latest work here ⬇️⬇️⬇️
@hhsun1
Huan Sun (OSU)
3 months
Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction" (led by @ziqiChen123 and @ningx005), has been selected for Nature's special collection, "Nobel Prize in Physics 2024." This collection highlights high-impact research, reviews, and opinion articles selected from all of Nature's participating journals, celebrating “the direct contributions by the [Nobel Prize] awardees and the advances they have inspired.” Building on the momentum, earlier this year, we released SMolInstruct (a large-scale, high-quality instruction tuning dataset for small molecules) and LlaSMol (state-of-the-art instruction following LLMs for a variety of chemistry tasks), led by awesome @BotaoYu24 @osunlp. Recently, we released ChemAgent (also led by @BotaoYu24), a state-of-the-art chemistry agent equipped with 29 tools for both specialized tasks and general questions in chemistry, built on top of the pioneering work, ChemCrow. Around the same time, we released ScienceAgentBench (led by our great @RonZiruChen), a new coding benchmark for scientific tasks, to rigorously evaluate agents in assisting scientists with programming for data-driven scientific discovery. ScienceAgentBench features 102 *real-world* tasks extracted from 44 peer-reviewed publications across 4 scientific disciplines (Bioinformatics, Computational Chemistry, Geographical Information Science, Psychology & Cognitive Neuroscience). Our latest result shows #o1-preview with self-debug can achieve 42.2% task success rate (10% more than other agents) while costing $0.6-0.7 per task (>10 times more than others). See details about each project in this thread:
0
0
3
@YifeiLiPKU
Yifei Li
3 months
RT @ysu_nlp: Personally I think planning is the biggest bottleneck for language agents. So I'm super excited to introduce model-based plann…
0
43
0
@YifeiLiPKU
Yifei Li
3 months
RT @BotaoYu24: 🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhance…
0
23
0
@YifeiLiPKU
Yifei Li
3 months
RT @LiaoZeyi: Check out the technique report for AmpleGCG-Plus, which builds on the strengths of AmpleGCG by achieving higher ASR with fewe…
0
4
0
@YifeiLiPKU
Yifei Li
4 months
@_TobiasLee 😄😄
0
0
0
@YifeiLiPKU
Yifei Li
4 months
RT @ysu_nlp: 📢 Data release of ScienceAgentBench and new o1 results 🌟o1 is 2X of Claude 3.5 Sonnet with direct prompting, and 10% better w…
0
9
0
@YifeiLiPKU
Yifei Li
4 months
RT @hhsun1: Our ScienceAgentBench is now available! We also included @OpenAI #o1’s performance in our updated draft: o1 nearly doubled the…
0
17
0
@YifeiLiPKU
Yifei Li
4 months
RT @hhsun1: @AnthropicAI's release of a computer use model is both exciting and worrisome to me! Agent capability and safety should go hand…
0
15
0