linghui Profile
linghui

@linghui35877581

Followers
4
Following
250
Statuses
199

Ph.D.

Joined April 2022
@linghui35877581
linghui
3 days
RT @QGallouedec: Train an agent with GRPO? Yes, it works! I've made a small demo example if you're interested!
0
46
0
@linghui35877581
linghui
6 days
RT @zzlccc: 🚨There May Not be Aha Moment in R1-Zero-like Training: A common belief about the recent R1-Zero-like t…
0
68
0
@linghui35877581
linghui
7 days
RT @priontific: MLX-GRPO trainer prototype by @ActuallyIsaak is functional! 🥳 I've got to do some deeper investigating here, but... initia…
0
7
0
@linghui35877581
linghui
7 days
RT @hahahahohohe: After whole day effort succeeded at running GRPO training of qwen 0.5B on a free colab T4 gpu, but it's suuuper slooow lol
0
1
0
@linghui35877581
linghui
21 days
@hamishivi Nice points! Which small set in the replication part do you mean?😂
0
0
0
@linghui35877581
linghui
22 days
RT @KevinQHLin: Exciting progress in AI community over the recent weeks! LLMs -- DeepSeek-R1 (Reasoning), MiniMax-01 (Long context) LMMs -…
0
4
0
@linghui35877581
linghui
22 days
RT @xiangyue96: We are also exploring something similar. But one different thing we observed from our experiments was that the length of th…
0
17
0
@linghui35877581
linghui
23 days
RT @denny_zhou: any benchmark—including ARC-AGI—can be rapidly solved, as long as the task provides a clear evaluation metric that can be u…
0
74
0
@linghui35877581
linghui
23 days
@TsingYoga Awesome! Will give it a try!
0
0
0
@linghui35877581
linghui
24 days
RT @natolambert: For those trying to understand DeepSeek's Group Relative Policy Optimization (GRPO): GRPO is just PPO without a value funct…
0
128
0
@linghui35877581
linghui
29 days
RT @knmnyn: 📣Hey #LLM processing folks (#NLProc , @iclr_conf folks), please RT and spread the word! Reasoning and planning are🔥topics in…
0
9
0
@linghui35877581
linghui
1 month
RT @WenhuChen: I spent the weekend reading some recent great math+reasoning papers: 1. AceMath ( 2. rStar-Math (htt…
0
163
0
@linghui35877581
linghui
1 month
RT @denny_zhou: Identifying what to scale is crucial for building LLMs. What’s your favorite inference-time scaling? 1. Scaling self-consi…
0
32
0
@linghui35877581
linghui
1 month
RT @xpasky: Quick primer for non-wizards about the post-MCTS LLM reasoning future (I'm kinda PRIME-pilled rn): How will LLMs learn to reas…
0
47
0
@linghui35877581
linghui
1 month
RT @lifan__yuan: How to unlock advanced reasoning via scalable RL? 🚀Introducing PRIME (Process Reinforcement through Implicit Rewards) and…
0
176
0
@linghui35877581
linghui
1 month
RT @denny_zhou: After "the bitter lesson", people love talking about that search and learning are the two methods that can scale arbitraril…
0
15
0
@linghui35877581
linghui
1 month
RT @ysu_nlp: My 2024 predictions (agents, multimodality, synthetic data, sparsity) all came true, but those were generic and somewhat obvio…
0
49
0
@linghui35877581
linghui
1 month
RT @Francis_YAO_: My 2024 product of the year: Gemini Deep Research. I understand many of you may not be aware of it, but it is definitely…
0
61
0
@linghui35877581
linghui
2 months
RT @ZheqingZhu: Introducing Pokee AI's E-commerce Agent - Now Beta on Shopify! One-click install AI search, chatbot and recommendation to y…
0
1
0