Yifei Li @YifeiLiPKU profile

Yifei Li

@YifeiLiPKU

Followers

559

Following

271

Statuses

127

Ph.D. student @osunlp | Prev MSc @PKU1898 | BEng @NEUChina | Prev Intern @MSFTResearch (MSRA) | LLM & NLPer

Columbus, OH

Joined November 2021

Yifei Li

@YifeiLiPKU

13 days

RT @RonZiruChen: 🚀Our ScienceAgentBench is covered by @Nature News! With the help of @ShijieChen98 and @YifeiLiPKU, we sampled 20 tasks fro…

0

17

0

Yifei Li

@YifeiLiPKU

13 days

RT @hhsun1: Our ScienceAgentBench in @Nature news! DeepSeek R1 @DeepSeekR1 vs. @OpenAI o1 on data-driven scientific coding tasks: We sampl…

0

32

0

Yifei Li

@YifeiLiPKU

18 days

RT @RonZiruChen: 🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven…

0

14

0

Yifei Li

@YifeiLiPKU

19 days

RT @LiaoZeyi: Our paper, EIA, has been accepted by #ICLR2025! 🎉 Since the day we submitted EIA, the field of web agents has evolved signif…

0

10

0

Yifei Li

@YifeiLiPKU

19 days

RT @BoyuGouNLP: 🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉 We’re also thrilled to share some exciting updates: ✨ UGround is SOTA…

0

25

0

Yifei Li

@YifeiLiPKU

30 days

🚀🚀 ScienceAgentBench now supports containerized evaluation --- faster (90min ➡️ <30min) and setup-free! Check out the details ⬇️⬇️

Huan Sun (OSU)

@hhsun1

1 month

Evaluating on our ScienceAgentBench (Coding tasks in Bioinformatics/Chemistry/Geo info science/Cognitive science) just got much easier and faster! Check out our update on containerized evaluation: (1) Task environments are set up in independent docker containers, which eliminates potential package conflicts among different tasks and allows us to remove pip-tools, a major factor of slow evaluation previously. (2) Users can now evaluate their agents using a single bash command and no longer need to set up their own conda environments. (3) With multi-threading, programs for each task can be configured and executed in parallel, reducing the evaluation time to only 20-30 minutes for all 102 tasks. Great efforts led by awesome @YifeiLiPKU and @BotaoYu24 @osunlp!

0

1

11

Yifei Li

@YifeiLiPKU

1 month

RT @BoyuGouNLP: Amazing results! Thanks for testing! The carefully designed synthetic data in UGround-V1 is indeed surprisingly effective 😮…

0

2

0

Yifei Li

@YifeiLiPKU

2 months

RT @LingboMo: 🚀 Excited to announce the release of our Agent Safety Resources Repository! 📚🔍 This GitHub repo curates existing papers, ben…

0

16

0

Yifei Li

@YifeiLiPKU

2 months

RT @xiangyue96: ✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇.…

0

59

0

Yifei Li

@YifeiLiPKU

3 months

RT @_TobiasLee: 📢 Introducing VL-RewardBench - A new benchmark for vision-language generative reward models (VL-GenRMs)! 📊Even SOTA models…

0

26

0

Yifei Li

@YifeiLiPKU

3 months

RT @hhsun1: As people are talking about inference scaling, I hope to re-introduce key findings from our earlier work (long paper #ACL24 @ac…

0

22

0

Yifei Li

@YifeiLiPKU

3 months

Excited to see so many advances and to be participating in the exploration of agents for science!! Check out our latest work here ⬇️⬇️⬇️

Huan Sun (OSU)

@hhsun1

3 months

Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction" (led by @ziqiChen123 and @ningx005), has been selected into Nature's special collection, "Nobel Prize in Physics 2024." This collection highlights high-impact research, reviews, and opinion articles selected from all of Nature's participating journals, celebrating “the direct contributions by the [Nobel Prize] awardees and the advances they have inspired.” Building on the momentum, earlier this year, we released SMolInstruct (a large-scale, high-quality instruction tuning dataset for small molecules) and LlaSMol (state-of-the-art instruction following LLMs for a variety of chemistry tasks), led by awesome @BotaoYu24 @osunlp. Recently, we released ChemAgent (also led by @BotaoYu24), a state-of-the-art chemistry agent equipped with 29 tools for both specialized tasks and general questions in chemistry, built on top of the pioneering work, ChemCrow. Around the same time, we released ScienceAgentBench (led by our great @RonZiruChen), a new coding benchmark for scientific tasks, to rigorously evaluate agents in assisting scientists with programming for data-driven scientific discovery. ScienceAgentBench features 102 *real-world* tasks extracted from 44 peer-reviewed publications across 4 scientific disciplines (Bioinformatics, Computational Chemistry, Geographical Information Science, Psychology & Cognitive Neuroscience). Our latest result shows #o1-preview with self-debug can achieve 42.2% task success rate (10% more than other agents) while costing $0.6-0.7 per task (>10 times more than others). See details about each project in this thread:

0

3

Yifei Li

@YifeiLiPKU

3 months

RT @ysu_nlp: Personally I think planning is the biggest bottleneck for language agents. So I'm super excited to introduce model-based plann…

0

43

0

Yifei Li

@YifeiLiPKU

3 months

RT @BotaoYu24: 🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhance…

0

23

0

Yifei Li

@YifeiLiPKU

3 months

RT @LiaoZeyi: Check out the technique report for AmpleGCG-Plus, which builds on the strengths of AmpleGCG by achieving higher ASR with fewe…

0

4

0

Yifei Li

@YifeiLiPKU

4 months

@_TobiasLee 😄😄

0

Yifei Li

@YifeiLiPKU

4 months

RT @ysu_nlp: 📢 Data release of ScienceAgentBench and new o1 results 🌟o1 is 2X of Claude 3.5 Sonnet with direct prompting, and 10% better w…

0

9

0

Yifei Li

@YifeiLiPKU

4 months

RT @hhsun1: Our ScienceAgentBench is now available! We also included @OpenAI #o1’s performance in our updated draft: o1 nearly doubled the…

0

17

0

Yifei Li

@YifeiLiPKU

4 months

RT @hhsun1: @AnthropicAI's release of a computer use model is both exciting and worrisome to me! Agent capability and safety should go hand…

0

15

0