Shichao Song
@Ki_Seki_here
Followers
91
Following
619
Statuses
231
Focused on LLMs | CS PhD Student @ RUC | Research Intern @ IAAR | Volunteer @ AI TIME
Beijing, China
Joined August 2021
Why is inference-time scaling, as @OpenAI o1 demonstrates, so crucial? LLMs learn world knowledge during training, but naive prompting reduces them to high-level QA databases that lose consistency with that learned knowledge. We need models to incorporate Self-Feedback, like o1! Let's dive in! 1/11
2
17
66
It's crazy, everyone, and it's even during the Spring Festival.
The burst of DeepSeek V3 has drawn the whole AI community's attention to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive performance against top-tier models and outperforms DeepSeek V3 on benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond.
📖 Blog:
💬 Qwen Chat: (choose Qwen2.5-Max as the model)
⚙️ API: (check the code snippet in the blog)
💻 HF Demo:
Going forward, we will not only continue scaling pretraining but also invest in scaling RL. We hope Qwen will be able to explore the unknown in the near future! 🔥
💗 Thank you for your support over the past year. See you next year!
0
0
0
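The Qwen2.5-Max announcement above points to an API code snippet in its blog post. As a rough, unofficial sketch of what calling the model through an OpenAI-compatible endpoint might look like, the base URL, API key name, and model identifier below are assumptions and should be replaced with the values given in the official blog.

# Minimal sketch, not the official snippet from the Qwen blog.
# Assumed: an OpenAI-compatible endpoint and a DashScope-style API key.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # assumption: key issued by the provider named in the blog
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed identifier for Qwen2.5-Max
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Briefly introduce Qwen2.5-Max."},
    ],
)
print(response.choices[0].message.content)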
RT @RongshengWang: Recommended Interesting Work: "HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs" Paper link:
0
1
0
RT @AdinaYakup: YuLan-Mini 💐 a 2.4B model delivering good performance with just 1.08T tokens by @RenminUniv And they shared all the traini…
0
36
0
Thank you to all my colleagues. @aakas888 @Hanyu_Wang419 @zhgyqc_duguce @fan2goa1 @RucDany @immazzystar 🤗
0
0
1
RT @gm8xx8: YuLan-Mini: A 2.42B-parameter model that punches above its weight. > Data Pipelines: Combines data cleaning and scheduling for…
0
16
0
RT @shelwin_: 🚀I'm releasing Jules, a proof of concept, open-source AI LaTeX Editor. Jules comes with cursor-like ⌘ K for AI Edits and La…
0
25
0