Weihao Zeng

@AndrewZeng17

Followers
412
Following
1K
Statuses
341

LLM Researcher | Incoming PhD @hkust @hkustNLP | Ex-intern @MSFTResearch @Meituan | Research on LLM Reasoning

Hong Kong
Joined April 2021
@AndrewZeng17
Weihao Zeng
1 month
🚀 Excited to share our latest research: B-STAR! 💡 Tackling the stagnation in self-improvement, we present a framework that dynamically balances exploration & exploitation, unlocking new potential in complex reasoning tasks. 📖 Paper: A 🧵:
4
25
138
@AndrewZeng17
Weihao Zeng
16 days
@sybilhyz @Grad62304977 Very impressive! Does the step here refer to a gradient step or a rollout step?
0
0
0
@AndrewZeng17
Weihao Zeng
19 days
RT @sivil_taram: 🚀 After 5 days of DeepSeek-R1, we’ve replicated its pure reinforcement learning magic on math reasoning — no reward models…
0
150
0
@AndrewZeng17
Weihao Zeng
19 days
RT @junxian_he: We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly…
0
667
0
@AndrewZeng17
Weihao Zeng
1 month
RT @rohanpaul_ai: B-STAR introduces dynamic balancing of exploration and exploitation during LLM self-improvement training, preventing perf…
0
6
0
@AndrewZeng17
Weihao Zeng
1 month
@xpasky Thank you very much for your interpretation. We firmly believe that exploration and exploitation are key to helping us achieve scalable RL, and we are researching more elegant methods to advance this!
0
0
3
@AndrewZeng17
Weihao Zeng
1 month
🚀 Excited to share our latest research: B-STAR! 💡 Tackling the stagnation in self-improvement, we present a framework that dynamically balances exploration & exploitation, unlocking new potential in complex reasoning tasks.
@gm8xx8
𝚐𝔪𝟾𝚡𝚡𝟾
2 months
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
0
0
5
@AndrewZeng17
Weihao Zeng
1 month
RT @AndrewZeng17: 🚀 Excited to share our latest research: B-STAR! 💡 Tackling the stagnation in self-improvement, we present a framework th…
0
25
0
@AndrewZeng17
Weihao Zeng
1 month
This work was a collaborative effort with the incredible @yuzhenh17 @junxian_he Paper: Code:
0
1
6
@AndrewZeng17
Weihao Zeng
2 months
RT @WeiLiu99: 🔔🎄Christmas Gift for Multimodal Reasoning: Introducing M-STaR 🎁 (1/6) How can we dive deeper to help Large Multimodal Models…
0
36
0
@AndrewZeng17
Weihao Zeng
2 months
RT @gm8xx8: B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
0
24
0
@AndrewZeng17
Weihao Zeng
2 months
RT @xcjthu1: 1/4 🚀 Densing Law of LLMs 🚀 OpenAI's Scaling Law showed how model capabilities scale with size. But what about the trend towa…
0
42
0
@AndrewZeng17
Weihao Zeng
2 months
RT @lilianweng: 🦃 At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Rew…
0
225
0
@AndrewZeng17
Weihao Zeng
4 months
@SNAT02792153 Great job! I wonder if you have tried using MIND to pretrain a larger model? Since you use Llama3-70B-Instruct, which is very powerful, to generate the conversations, it might act as a form of distillation. Or have you considered using a less powerful model to generate the conversations?
1
0
7
@AndrewZeng17
Weihao Zeng
4 months
🙌
@yqsong
Yangqiu Song
4 months
Follow us @hkustNLP 😁
0
0
0