![Yifei Li Profile](https://pbs.twimg.com/profile_images/1821000422137450496/BvwnWZYa_x96.jpg)
Yifei Li
@YifeiLiPKU
Followers: 559 · Following: 271 · Statuses: 127
Ph.D. student @osunlp | Prev MSc @PKU1898 | BEng @NEUChina | Prev Intern @MSFTResearch (MSRA) | LLM & NLPer
Columbus, OH
Joined November 2021
RT @RonZiruChen: 🚀Our ScienceAgentBench is covered by @Nature News! With the help of @ShijieChen98 and @YifeiLiPKU, we sampled 20 tasks fro…
RT @hhsun1: Our ScienceAgentBench in @Nature news! DeepSeek R1 @DeepSeekR1 vs. @OpenAI o1 on data-driven scientific coding tasks: We sampl…
RT @RonZiruChen: 🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven…
RT @BoyuGouNLP: 🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉 We’re also thrilled to share some exciting updates: ✨ UGround is SOTA…
🚀🚀 ScienceAgentBench now supports containerized evaluation --- faster (90min ➡️ <30min) and setup-free! Check out for more details ⬇️⬇️
Evaluating on our ScienceAgentBench (coding tasks in Bioinformatics / Chemistry / Geographical Information Science / Cognitive Science) just got much easier and faster! Check out our update on containerized evaluation:

(1) Task environments are set up in independent Docker containers, which eliminates potential package conflicts among different tasks and allows us to remove pip-tools, previously a major factor in slow evaluation.

(2) Users can now evaluate their agents with a single bash command and no longer need to set up their own conda environments.

(3) With multi-threading, the program for each task can be configured and executed in parallel, reducing evaluation time to only 20-30 minutes for all 102 tasks.

Great efforts led by awesome @YifeiLiPKU and @BotaoYu24 @osunlp!
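The containerized evaluation described above (one isolated Docker container per task, launched in parallel from a thread pool) can be sketched roughly as follows. This is a minimal illustration, not the actual ScienceAgentBench harness: the task IDs, image names, and the `echo` stand-in for the `docker` CLI are all placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder task IDs; the real benchmark has 102 tasks.
TASKS = ["task_001", "task_002", "task_003"]

def run_task(task_id: str) -> int:
    # Each task gets its own isolated container, so per-task package
    # requirements cannot conflict with one another.
    cmd = [
        "echo",  # stand-in for "docker" so the sketch runs anywhere
        "run", "--rm", f"benchmark/{task_id}",
    ]
    return subprocess.run(cmd, capture_output=True).returncode

# Threads suffice here because the work is I/O-bound: each worker
# mostly waits on its container process, mirroring the multi-threaded
# evaluation described in the tweet.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_task, TASKS))

print(all(code == 0 for code in results))
```

Because container startup and execution dominate the wall-clock time, running tasks concurrently like this is what shrinks a serial 90-minute evaluation toward the 20-30 minute range quoted above.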
RT @BoyuGouNLP: Amazing results! Thanks for testing! The carefully designed synthetic data in UGround-V1 is indeed surprisingly effective 😮…
RT @LingboMo: 🚀 Excited to announce the release of our Agent Safety Resources Repository! 📚🔍 This GitHub repo curates existing papers, ben…
RT @xiangyue96: ✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇.…
RT @_TobiasLee: 📢 Introducing VL-RewardBench - A new benchmark for vision-language generative reward models (VL-GenRMs)! 📊Even SOTA models…
Excited to see so many advances and to be participating in the exploration of agents for science!! Check out our latest works here ⬇️⬇️⬇️
Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction" () (led by @ziqiChen123 and @ningx005), has been selected for Nature's special collection, "Nobel Prize in Physics 2024." This collection highlights high-impact research, reviews, and opinion articles selected from all of Nature's participating journals, celebrating "the direct contributions by the [Nobel Prize] awardees and the advances they have inspired."

Building on this momentum, earlier this year we released SMolInstruct (a large-scale, high-quality instruction-tuning dataset for small molecules) and LlaSMol (state-of-the-art instruction-following LLMs for a variety of chemistry tasks), led by awesome @BotaoYu24 @osunlp.

Recently, we released ChemAgent (also led by @BotaoYu24), a state-of-the-art chemistry agent equipped with 29 tools for both specialized tasks and general questions in chemistry, built on top of the pioneering work ChemCrow.

Around the same time, we released ScienceAgentBench (led by our great @RonZiruChen), a new coding benchmark for scientific tasks, to rigorously evaluate agents in assisting scientists with programming for data-driven scientific discovery. ScienceAgentBench features 102 *real-world* tasks extracted from 44 peer-reviewed publications across 4 scientific disciplines (Bioinformatics, Computational Chemistry, Geographical Information Science, Psychology & Cognitive Neuroscience). Our latest result shows #o1-preview with self-debug can achieve a 42.2% task success rate (10% more than other agents) while costing $0.6-0.7 per task (>10 times more than others).

See details about each project in this thread:
RT @ysu_nlp: Personally I think planning is the biggest bottleneck for language agents. So I'm super excited to introduce model-based plann…
RT @BotaoYu24: 🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhance…
RT @LiaoZeyi: Check out the technique report for AmpleGCG-Plus, which builds on the strengths of AmpleGCG by achieving higher ASR with fewe…
RT @ysu_nlp: 📢 Data release of ScienceAgentBench and new o1 results 🌟o1 is 2X of Claude 3.5 Sonnet with direct prompting, and 10% better w…
RT @hhsun1: @AnthropicAI's release of a computer use model is both exciting and worrisome to me! Agent capability and safety should go hand…