![Weihao Zeng Profile](https://pbs.twimg.com/profile_images/1847565244010536960/yr0FdCnu_x96.jpg)
Weihao Zeng
@AndrewZeng17
Followers: 412 · Following: 1K · Statuses: 341
LLM Researcher | Incoming PhD @hkust @hkustNLP | Ex-intern @MSFTResearch @Meituan | Research on LLMs Reasoning
Hong Kong
Joined April 2021
@sybilhyz @Grad62304977 Very impressive! Is the step here referring to gradient step or rollout step?
RT @sivil_taram: 🚀 After 5 days of DeepSeek-R1, we’ve replicated its pure reinforcement learning magic on math reasoning — no reward models…
RT @junxian_he: We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly…
RT @rohanpaul_ai: B-STAR introduces dynamic balancing of exploration and exploitation during LLM self-improvement training, preventing perf…
@xpasky Thank you very much for your interpretation. We firmly believe that exploration and exploitation are key to achieving scalable RL, and we are researching more elegant methods to advance this!
🚀 Excited to share our latest research: B-STAR! 💡 Tackling the stagnation in self-improvement, we present a framework that dynamically balances exploration & exploitation, unlocking new potential in complex reasoning tasks.
RT @AndrewZeng17: 🚀 Excited to share our latest research: B-STAR! 💡 Tackling the stagnation in self-improvement, we present a framework th…
RT @WeiLiu99: 🔔🎄Christmas Gift for Multimodal Reasoning: Introducing M-STaR 🎁 (1/6) How can we dive deeper to help Large Multimodal Models…
RT @gm8xx8: B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
RT @xcjthu1: 1/4 🚀 Densing Law of LLMs 🚀 OpenAI's Scaling Law showed how model capabilities scale with size. But what about the trend towa…
RT @lilianweng: 🦃 At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Rew…
@SNAT02792153 Great job! Wondering if you have tried MIND to pretrain a larger model? Since Llama3-70B-Instruct, which is very powerful, is used to generate the conversations, it might act as a form of distillation. Have you considered using a less powerful model to generate the conversations instead?