Qian Liu

@sivil_taram

Followers: 4K · Following: 3K · Statuses: 1K

Researcher @ TikTok 🇸🇬 📄 Sailor / StarCoder / OpenCoder 💼 Past: Research Scientist @SeaAIL; PhD @MSFTResearch 🧠 Contribution: @XlangNLP @BigCodeProject

Joined November 2021
@sivil_taram
Qian Liu
15 days
🚀 After 5 days of DeepSeek-R1, we’ve replicated its pure reinforcement learning magic on math reasoning — no reward models, no supervised fine-tuning, starting from a base model — and the results are mind-blowing:
🧠 A 7B model + 8K MATH examples for verification + reinforcement learning = "aha moment"
🌟 Long chain-of-thought and self-reflection emerge naturally
🔥 Record math performance: ✅ 33.3% on AIME ✅ 62.5% on AMC ✅ 77.2% on MATH
📈 Outperforms Qwen2.5-Math-7B-Instruct and matches strong methods like PRIME and rStar-MATH, despite using >50x less data and just the simple PPO algorithm!
🔬 Check out more details in Junxian's post!
@junxian_he
Junxian He
15 days
We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model with only 8K examples, and the results are surprisingly strong. 🚀
Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification. The resulting model achieves (pass@1) 33.3% on AIME, 62.5% on AMC, and 77.2% on MATH, outperforming Qwen2.5-Math-7B-Instruct and comparable to PRIME and rStar-MATH, which use >50x more data and more complicated components. 🚀
Increased CoT length and self-reflection emerge.
We share the details and our findings in the blog: Training code and implementation details here:
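For readers wondering what "8K MATH examples for verification" means operationally, here is a minimal sketch of the kind of rule-based outcome reward this setup describes: no learned reward model, just matching the final answer against the reference. The function names, regex, and answer format are illustrative assumptions, not the released implementation.

```python
import re

def extract_final_answer(completion):
    """Pull the last \\boxed{...} answer from a model completion (assumed format)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verification_reward(completion, gold_answer):
    """Binary outcome reward for RL: 1.0 if the final answer matches the
    reference exactly, else 0.0. Real pipelines usually normalize both
    sides (whitespace, fraction forms) before comparing."""
    predicted = extract_final_answer(completion)
    return 1.0 if predicted is not None and predicted == gold_answer.strip() else 0.0

# Example: a completion ending in the right boxed answer earns reward 1.0.
print(verification_reward("... so the total is \\boxed{7}.", "7"))  # 1.0
```

PPO then optimizes the policy against this sparse verification signal alone, which is what "no reward models, no supervised fine-tuning" amounts to.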
@sivil_taram
Qian Liu
2 days
RT @terryyuezhuo: @huybery Thanks @huybery! For anyone interested in this and wanting to follow up, please join our discord! We'll release…
@sivil_taram
Qian Liu
2 days
RT @_akhaliq: SWE Arena is a hidden gem of a vibe coding arena
@sivil_taram
Qian Liu
3 days
RT @TrlWorkshop: We're excited to share that the 4th Table Representation Learning (TRL) workshop will be held at ACL 2025 in Vienna 🇪🇺. W…
@sivil_taram
Qian Liu
3 days
RT @LoubnaBenAllal1: The wait is over: our SmolLM2 paper is out—a detailed guide for building SOTA small LMs. While most LM papers skim ove…
@sivil_taram
Qian Liu
4 days
RT @junxian_he: Nice to see more controlled experiments are performed to reveal the science behind long CoT models. We are also organizing…
@sivil_taram
Qian Liu
4 days
Fantastic insights and takeaways! The discussions following the release of R1 have been incredibly enriching. 🔥 The true power of open science.
@xiangyue96
Xiang Yue
4 days
Demystifying Long CoT Reasoning in LLMs
Reasoning models like R1 / O1 / O3 have gained massive attention, but their training dynamics remain a mystery. We're taking a first deep dive into understanding long CoT reasoning in LLMs! 11 major takeaways 👇 (long thread)
@sivil_taram
Qian Liu
5 days
RT @samira_abnar: 🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, par…
@sivil_taram
Qian Liu
5 days
RT @GeZhang86038849: Around one week remaining for submission! Share your insights about foundation models with us in SCI-FM @ ICLR 2025!
@sivil_taram
Qian Liu
6 days
@NiJinjie Yes, that's a good extension.
@sivil_taram
Qian Liu
6 days
For the vocabulary scaling paper (assuming your "tokenizer scaling law" refers to this, haha), we do not count the input embedding parameters toward the FLOPs, since the embedding is just a look-up. The optimal vocabulary size is bounded by the FLOPs budget because of the language-model head FLOPs. I think this paper mainly cares about the input embedding size.
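A quick back-of-envelope sketch of the asymmetry described above (my illustration with assumed dimensions, not numbers from either paper): the input embedding holds vocab_size × d_model parameters but costs essentially zero FLOPs per token, while the language-model head pays a full matmul over the vocabulary for every token, which is why the FLOPs budget bounds the optimal vocabulary size.

```python
def head_flops_per_token(d_model, vocab_size):
    """The unembedding (LM head) is a d_model x vocab_size matmul per token:
    roughly 2 * d_model * vocab_size multiply-accumulate FLOPs."""
    return 2 * d_model * vocab_size

def embedding_flops_per_token():
    """The input embedding is a table look-up: ~0 FLOPs per token,
    even though the table itself holds vocab_size * d_model parameters."""
    return 0

# Assumed example dimensions (hypothetical, for illustration only).
d_model, vocab_size = 4096, 128_000
print(head_flops_per_token(d_model, vocab_size))  # 1_048_576_000 (~1e9 FLOPs/token)
print(embedding_flops_per_token())                # 0
```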
@sivil_taram
Qian Liu
6 days
@Muennighoff Congrats Niklas! The LIMA moment for reasoning models!
@sivil_taram
Qian Liu
6 days
RT @BigComProject: Introducing 🏟️SWE Arena: An Open Evaluation Platform for Vibe Coding Unlike the current frontend-dev applications like…
@sivil_taram
Qian Liu
8 days
RT @tuzhaopeng: Are o1-like LLMs thinking deeply enough? Introducing a comprehensive study on the prevalent issue of underthinking in o1-l…
@sivil_taram
Qian Liu
9 days
RT @TianyuPang1: 🤔Can 𝐂𝐡𝐚𝐭𝐛𝐨𝐭 𝐀𝐫𝐞𝐧𝐚 be manipulated? 🚀Our paper shows that You Can Improve Model Rankings on Chatbot Arena by Vote Rigging!…
@sivil_taram
Qian Liu
11 days
YES
@AnsongNi
Ansong Ni
11 days
Disagree. @Guodaya is about the best PhD researcher there is.
@sivil_taram
Qian Liu
12 days
@AlbalakAlon @sybilhyz @Grad62304977 Fair point, and thanks for the suggestion on the search platform - very useful!
@sivil_taram
Qian Liu
13 days
@JustinLin610 @yihengxu_ Yes, they have some superpowers.
@sivil_taram
Qian Liu
13 days
RT @chrmanning: Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialnes…
@sivil_taram
Qian Liu
13 days
RT @abc43992899: 1/n: 🚀 Announcing YuE (乐) – the most powerful open-source full-song music generation model! 🎵 Tackle the lyrics-to-song ta…