Qian Liu @sivil_taram profile

Qian Liu

@sivil_taram

Followers

4K

Following

3K

Statuses

1K

Researcher @ TikTok 🇸🇬 📄 Sailor / StarCoder / OpenCoder 💼 Past: Research Scientist @SeaAIL; PhD @MSFTResearch 🧠 Contribution: @XlangNLP @BigCodeProject

Joined November 2021

Qian Liu

@sivil_taram

15 days

🚀 After 5 days of DeepSeek-R1, we've replicated its pure reinforcement learning magic on math reasoning (no reward models, no supervised fine-tuning, from a base model), and the results are mind-blowing: 🧠 A 7B model + 8K MATH examples for verification + Reinforcement Learning = "aha moment" 🌟 Long Chain-of-Thought and Self Reflection emerge naturally 🔥 Record Math Performance: ✅ 33.3% on AIME ✅ 62.5% on AMC ✅ 77.2% on MATH 📈 Outperforms Qwen2.5-Math-7B-Instruct and matches strong methods like PRIME and rStar-MATH, despite using >50x less data and just the simple PPO algorithm! 🔬 Check out more details at Junxian's post!

Junxian He

@junxian_he

15 days

We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model with only 8K examples, and the results are surprisingly strong. 🚀 Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification. The resulting model achieves (pass@1) 33.3% on AIME, 62.5% on AMC, and 77.2% on MATH, outperforming Qwen2.5-Math-7B-Instruct and being comparable to PRIME and rStar-MATH, which use >50x more data and more complicated components. 🚀 Increased CoT length and self-reflection emerge. We share the details and our findings in the blog: Training code and implementation details here:

13

150

814
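
The "no reward model, just MATH examples for verification" setup in the posts above can be pictured as a rule-based reward: the policy's final answer is checked against the known reference answer. The sketch below is only an illustration of that idea, not the authors' code. The `\boxed{...}` answer format, the 0/1 reward values, the whitespace-only normalization, and the function names `extract_boxed` and `rule_based_reward` are all assumptions.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    # Hypothetical helper: return the content of the last \boxed{...},
    # tracking brace depth so nested LaTeX such as \frac{1}{2} survives.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def rule_based_reward(completion: str, reference: str) -> float:
    # Assumed scheme: 1.0 for an exact match after removing spaces, else 0.0.
    # No learned reward model is involved; the reference answer is the verifier.
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0

    def normalize(s: str) -> str:
        return s.replace(" ", "")

    return 1.0 if normalize(answer) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(rule_based_reward("Wait, let me re-check. The answer is \\boxed{42}.", "42"))
```

A scalar of this kind would be handed to the RL algorithm (PPO in the post) as the reward for each sampled solution. A real verifier would likely need stronger answer normalization than stripping spaces.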

Qian Liu

@sivil_taram

2 days

RT @terryyuezhuo: @huybery Thanks @huybery! For anyone interested in this and wanting to follow up, please join our discord! We'll release…

0

2

0

Qian Liu

@sivil_taram

2 days

RT @_akhaliq: swe arena is a hidden gem for vibe coding arena

0

10

0

Qian Liu

@sivil_taram

3 days

RT @TrlWorkshop: We're excited to share that the 4th Table Representation Learning (TRL) workshop will be held at ACL 2025 in Vienna 🇪🇺. W…

0

9

0

Qian Liu

@sivil_taram

3 days

RT @LoubnaBenAllal1: The wait is over: our SmolLM2 paper is out—a detailed guide for building SOTA small LMs. While most LM papers skim ove…

0

99

0

Qian Liu

@sivil_taram

4 days

RT @junxian_he: Nice to see more controlled experiments are performed to reveal the science behind long CoT models. We are also organizing…

0

3

0

Qian Liu

@sivil_taram

4 days

Fantastic insights and takeaways! The discussions following the release of R1 have been incredibly enriching. 🔥The true power of open science.

Xiang Yue

@xiangyue96

4 days

Demystifying Long CoT Reasoning in LLMs Reasoning models like R1 / O1 / O3 have gained massive attention, but their training dynamics remain a mystery. We're taking a first deep dive into understanding long CoT reasoning in LLMs! 11 Major Takeaways👇(long threads)

0

3

18

Qian Liu

@sivil_taram

5 days

RT @samira_abnar: 🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, par…

0

53

0

Qian Liu

@sivil_taram

5 days

RT @GeZhang86038849: Around one week remaining for submission! Share your insights about foundation models with us in SCI-FM @ ICLR 2025!

0

2

0

Qian Liu

@sivil_taram

6 days

@NiJinjie Yes that's a good extension

0

1

Qian Liu

@sivil_taram

6 days

For the vocabulary scaling paper (assuming your "tokenizer scaling law" refers to this, haha), we do not consider the effect of input embedding parameters on the FLOPs, since the input embedding is just a look-up. The optimal vocabulary size is bounded by the FLOPs budget because of the language model head FLOPs. I think this paper mainly cares about the input embedding size.

1

0

1
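
The FLOPs argument in the reply above can be made concrete with a rough estimate. The sketch below assumes the common approximation of about 6 FLOPs per parameter per training token, counts the language model head as a `vocab_size x d_model` matrix multiply, and counts nothing for the input embedding because it is a look-up. The function name `training_flops` and the example sizes are hypothetical and not taken from the paper.

```python
def training_flops(non_vocab_params: int, vocab_size: int, d_model: int, tokens: int) -> int:
    # Assumed approximation: ~6 FLOPs per parameter per token (forward + backward).
    # The LM head is a real matmul over vocab_size * d_model weights, so it counts.
    # The input embedding is a table look-up and is left out of the FLOPs.
    head_params = vocab_size * d_model
    return 6 * (non_vocab_params + head_params) * tokens


if __name__ == "__main__":
    # Hypothetical model: 1B non-vocabulary parameters, width 2048, 10B tokens.
    for vocab_size in (32_000, 128_000, 256_000):
        flops = training_flops(1_000_000_000, vocab_size, 2048, 10_000_000_000)
        print(vocab_size, f"{flops:.3e}")
```

Under these assumptions the cost grows linearly with vocabulary size through the head term alone, which is why a fixed FLOPs budget caps how large the vocabulary can usefully be.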

Qian Liu

@sivil_taram

6 days

@Muennighoff Congrats Niklas! The LIMA moment for reasoning models!

1

0

3

Qian Liu

@sivil_taram

6 days

RT @BigComProject: Introducing 🏟️SWE Arena: An Open Evaluation Platform for Vibe Coding Unlike the current frontend-dev applications like…

0

28

0

Qian Liu

@sivil_taram

8 days

RT @tuzhaopeng: Are o1-like LLMs thinking deeply enough? Introducing a comprehensive study on the prevalent issue of underthinking in o1-l…

0

103

0

Qian Liu

@sivil_taram

9 days

RT @TianyuPang1: 🤔Can 𝐂𝐡𝐚𝐭𝐛𝐨𝐭 𝐀𝐫𝐞𝐧𝐚 be manipulated? 🚀Our paper shows that You Can Improve Model Rankings on Chatbot Arena by Vote Rigging!…

0

20

0

Qian Liu

@sivil_taram

11 days

YES

Ansong Ni

@AnsongNi

11 days

Disagree. @Guodaya is about the best PhD researcher there is.

0

5

Qian Liu

@sivil_taram

12 days

@AlbalakAlon @sybilhyz @Grad62304977 Fair point, and thanks for the suggestion on the search platform - very useful!

0

2

Qian Liu

@sivil_taram

13 days

@JustinLin610 @yihengxu_ Yes they have some super power

0

2

Qian Liu

@sivil_taram

13 days

RT @chrmanning: Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialnes…

0

348

0

Qian Liu

@sivil_taram

13 days

RT @abc43992899: 1/n: 🚀 Announcing YuE (乐) – the most powerful open-source full-song music generation model! 🎵 Tackle the lyrics-to-song ta…

0

157

0