![Ziru Chen Profile](https://pbs.twimg.com/profile_images/1403984201704116226/bZbcmq4G_x96.jpg)
Ziru Chen
@RonZiruChen
Followers: 492 · Following: 401 · Statuses: 115
Ron | チン シジョ. Ph.D. student @osunlp. Researching #NLProc & #ConvAI. “Cogito, ergo sum.”
Joined January 2017
🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven R&D tasks across diverse scientific disciplines. 🔬 👇 Resources and previous posts below:
Containerized evaluation update:
🚀ScienceAgentBench evaluation is now containerized! Inspired by SWE-Bench, we leverage Docker for task isolation, enabling multi-threaded execution and slashing evaluation time to under 30 minutes. Plus, evaluate your agents with just one bash command! Great work done by @YifeiLiPKU and @BotaoYu24 @osunlp!
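The pattern described above (isolating each task's evaluation in its own container and running many tasks in parallel) can be sketched roughly as follows. This is an illustrative sketch only, not the benchmark's actual interface: the function names, task IDs, and the stand-in runner are hypothetical, and a real runner would shell out to `docker run` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def evaluate_tasks(task_ids: List[str],
                   run_one: Callable[[str], bool],
                   max_workers: int = 8) -> Dict[str, bool]:
    """Evaluate each task in parallel, one isolated run per task.

    `run_one` is expected to launch an isolated container for a single
    task (e.g. via subprocess.run(["docker", "run", "--rm", image, tid]))
    and return whether that task's generated program passed evaluation.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with task_ids.
        results = pool.map(run_one, task_ids)
    return dict(zip(task_ids, results))

# Hypothetical stand-in runner used here so the sketch is self-contained;
# it simply pretends that one task fails.
def fake_runner(task_id: str) -> bool:
    return task_id != "task_03"

scores = evaluate_tasks(["task_01", "task_02", "task_03"], fake_runner)
```

Because each task runs in its own container, a crashing or misbehaving program cannot corrupt other tasks' environments, which is what makes this kind of multi-threaded fan-out safe.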
RT @hhsun1: Very excited to learn that our 2023 paper, "G2Retro as a two-step graph generative models for retrosynthesis prediction (https:…
Hi there, nice work on systematically comparing AI agents' R&D capabilities with expert human performance! In practice, agents can collaborate with human experts by quickly drafting a reasonably good program, which the experts can then further improve. This is closely related to our recent work, ScienceAgentBench: with 102 real-world R&D tasks for data-driven scientific discovery, we show the potential of agents to boost scientific productivity by generating program drafts within 10 minutes, while it can take human experts hours of effort to write the same programs. It would be great if you could discuss our benchmark in your paper as related work on "LLMs for science". Thanks!
RT @yugu_nlp: ❓Wondering how to scale inference-time compute with advanced planning for language agents? 🙋‍♂️Short answer: Using your LLM…
RT @DavidJAlba94: ScienceAgentBench ⚗️ by @RonZiruChen et al. A benchmark for evaluating language agents for data-driven scientific discov…
RT @BotaoYu24: 🤔 Can LLMs with tools always outperform those without? Perhaps not... 🚀 In our new work, we introduce ChemAgent, an enhance…
@mbodhisattwa @ysu_nlp Hi @mbodhisattwa, thanks again for your explanation! We've updated our manuscript accordingly:
🔍 Through a case study, we hypothesize that the low-level plans generated for each task during OpenAI o1's reasoning process are essential for writing correct programs, and we identify three strategies OpenAI o1 might have been trained to use:
(1) Reiterate the task goal and requirements from the input prompt
(2) Sketch an implementation plan in a chain-of-thought style
(3) Refine the plan in-place with adjustments or improvements
This suggests that inference-time compute is not just about scaling the number of tokens generated; it's also important that LLMs are trained to use the right strategy! 📄Check out our updated preprint for more details: (3/3)
RT @gneubig: When people ask me "why build agents for software development", my standard response is "if we can agents can write software w…
RT @ysu_nlp: People into agents, let me pitch something to you: 🌟 An agent that works across every platform (web, desktop & mobile) 🌟 Visu…