Ziru Chen

@RonZiruChen

Followers: 492 · Following: 401 · Statuses: 115

Ron | チン シジョ. Ph.D. student @osunlp. Researching #NLProc & #ConvAI. “Cogito, ergo sum.”

Joined January 2017
@RonZiruChen
Ziru Chen
4 months
🚀 Can language agents automate data-driven scientific discovery? Not yet. But we're making strides. Introducing **ScienceAgentBench**: a new benchmark to rigorously evaluate language agents on 102 tasks from 44 peer-reviewed publications across 4 scientific disciplines. (1/10)
@RonZiruChen
Ziru Chen
13 days
@Nature @ShijieChen98 @YifeiLiPKU More details about ScienceAgentBench:
@RonZiruChen
Ziru Chen
18 days
🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven R&D tasks across diverse scientific disciplines. 🔬 👇 Resources and previous posts below:
@RonZiruChen
Ziru Chen
18 days
Containerized evaluation update:
@RonZiruChen
Ziru Chen
30 days
🚀ScienceAgentBench evaluation is now containerized! Inspired by SWE-Bench, we leverage Docker for task isolation, enabling multi-threaded execution and slashing evaluation time to under 30 minutes. Plus, evaluate your agents with just one bash command! Great work done by @YifeiLiPKU and @BotaoYu24 @osunlp!
@RonZiruChen
Ziru Chen
30 days
Please check out the following documentation for more details:
@RonZiruChen
Ziru Chen
3 months
RT @hhsun1: Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction (https:…
@RonZiruChen
Ziru Chen
3 months
RT @hhsun1: As people are talking about inference scaling, I hope to re-introduce key findings from our earlier work (long paper #ACL24 @ac
@RonZiruChen
Ziru Chen
3 months
Hi there, nice work on systematically comparing AI agents' R&D capabilities with expert human performance! I can see that, in practice, agents can collaborate with human experts by quickly drafting a reasonably good program, which the experts can then improve further. This is closely related to our recent work, ScienceAgentBench: with 102 real-world R&D tasks for data-driven scientific discovery, we show the potential of agents to boost scientific productivity by generating program drafts within 10 minutes, whereas it can take human experts hours of effort to write the same programs. It would be great if you could discuss our benchmark in your paper as related work on "LLMs for science". Thanks!
@RonZiruChen
Ziru Chen
3 months
RT @yugu_nlp: ❓Wondering how to scale inference-time compute with advanced planning for language agents? 🙋‍♂️Short answer: Using your LLM…
@RonZiruChen
Ziru Chen
3 months
RT @DavidJAlba94: ScienceAgentBench ⚗️ by @RonZiruChen et al. A benchmark for evaluating language agents for data-driven scientific discov…
@RonZiruChen
Ziru Chen
3 months
RT @BotaoYu24: 🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhance…
@RonZiruChen
Ziru Chen
4 months
@mbodhisattwa @ysu_nlp Hi @mbodhisattwa, thanks again for your explanation! We've updated our manuscript accordingly:
@RonZiruChen
Ziru Chen
4 months
🔍 Through a case study, we hypothesize that the low-level plans generated for each task during OpenAI o1’s reasoning process are essential for writing correct programs, and we identify three strategies OpenAI o1 might have been trained to use:
(1) Reiterate the task goal and requirements from the input prompt
(2) Sketch an implementation plan in a chain-of-thought style
(3) Refine the plan in place with adjustments or improvements
This suggests that inference-time compute is not just about scaling the number of tokens generated; it’s also important that LLMs are trained to use the right strategy! 📄Check out our updated preprint for more details: (3/3)
@RonZiruChen
Ziru Chen
4 months
RT @gneubig: When people ask me "why build agents for software development", my standard response is "if we can agents can write software w…
@RonZiruChen
Ziru Chen
4 months
RT @ysu_nlp: People into agents, let me pitch something to you: 🌟 An agent that works across every platform (web, desktop & mobile) 🌟 Visu…