M Saiful Bari (MARUF)

@sbmaruf

Followers
638
Following
2K
Statuses
3K

MIT TR35. @NTU, Singapore, Intern'20,21,22 (@awscloud), T0, BLOOMZ, UXLA, xCodeEval, I train LLMs at SDAIA! - Scaling Maximalist, Core maintainer of ALLaM

Singapore
Joined June 2010
@sbmaruf
M Saiful Bari (MARUF)
1 month
Excited to share that I’ve been recognized as an "Innovator Under 35" by MIT Technology Review for the MENA Region! After earning my Ph.D. from @NTUsg, Singapore, I joined the "National Center for Artificial Intelligence (NCAI), SDAIA" (@SDAIA_SA) to work on ALLaM, the Arabic Large Language Model, a nationwide initiative aimed at developing a sovereign LLM. We were among the first few organizations to successfully scale both pretraining and continuous pretraining.

A massive shoutout to my incredible manager @areebsa and mentor @haidarkk1 for their unwavering support (and for tolerating my endless YOLOs and arguments!). Thanks to @mtaasnim, @ajabal4 and @y_alnumay, who still put up with me. The saddest part of this year was when @haidarkk1 left NCAI; I felt like a part of me just vanished, and it was very difficult initially. I don't have anyone to have pointless arguments with. :(

I genuinely believe NCAI, SDAIA is on the path to achieving Artificial Superintelligence (ASI), depending on a few small but critical factors. The gap with the frontier labs might be just 12–18 months.

Finally, truly grateful to our former Chief Scientist @ehsan_hoque for his inspiration and constant support, and to my academic parent @JotyShafiq for just being there for me all the time. "Ganbare kuruko, akiramenaide" (roughly, "Hang in there, don't give up").
@TechReviewAR
MIT Technology Review (Arabic)
1 month
#Saiful_Bari from #Bangladesh is the winner of the global #InnovatorsUnder35 award in its seventh edition for 2024, for his innovation "ALLaM (the Arabic Large Language Model), an advanced AI model that preserves the linguistic and cultural nuances of the Arabic language, enabling comprehensive applications in education, healthcare, and government services." @sbmaruf
11
2
33
@sbmaruf
M Saiful Bari (MARUF)
11 hours
@TheXeophon @lvwerra Haters will say the PDF costs 6M 😏
0
0
1
@sbmaruf
M Saiful Bari (MARUF)
12 hours
RT @_arohan_: This was a cool Shampoo variant submission in the competition that Sai built off his CASPR work. Interestingly it beats Shamp…
0
6
0
@sbmaruf
M Saiful Bari (MARUF)
13 hours
@haidarkk1 just started. in ~30 minutes.
1
0
1
@sbmaruf
M Saiful Bari (MARUF)
14 hours
RT @soldni: This is a very sensible playbook for OpenAI, idk why everyone acts surprised. If OpenAI wants to be a consumer-first platform…
0
2
0
@sbmaruf
M Saiful Bari (MARUF)
15 hours
@iScienceLuvr Why are you bullish on Cerebras?
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
1 day
@agihippo Who is that person? I actually believe in skilled dictatorship in LLM training; it actually works.
0
0
1
@sbmaruf
M Saiful Bari (MARUF)
1 day
@agihippo It would require at least 2 years. They are not getting anywhere until Malaysia builds the 1 GW cluster.
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
2 days
Finally @sama is saying what he should be saying: "just compete with us building a better model"
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
3 days
@GeZhang86038849 Interesting paper. Here is a kinda similar work, but with a different execution.
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
4 days
🥱
@andrew_n_carr
Andrew Carr (e/🤸)
4 days
AI dot com redirects to DeepSeek 🐋
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
4 days
@cursor_ai Is there any way I can get the logs of my queries and responses in the Chat, Composer, and Bug Finder tabs?
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
4 days
@prajdabre1 After episode 5, I couldn't wait. In case you want to get the entire story, it's here.
0
0
2
@sbmaruf
M Saiful Bari (MARUF)
4 days
@natolambert Currently in OSS, what's missing is scalable implementations of these algos, especially things that run with both horizontal and vertical scaling.
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
4 days
I liked this paper a lot. Nice analysis.
@omarsar0
elvis
6 days
Really cool paper! It studies how LLMs develop extended CoT reasoning, focusing on RL and compute scaling. Key insights include:

SFT boosts performance – While not strictly necessary, SFT simplifies training and increases efficiency. Models fine-tuned with long CoT data achieve higher accuracy than those using short CoT sequences.

Reward shaping is crucial for stable RL – They find that naive RL approaches don’t always extend CoT length effectively. To address this, it introduces a cosine length-scaling reward with repetition penalties, which balances reasoning depth and prevents meaningless length increases.

Scaling verifiable reward signals – RL models trained with noisy, web-extracted “silver” supervision signals can generalize better to OOD tasks, such as STEM reasoning. Filtering such data is crucial to maintaining training stability.

Emergent reasoning abilities in base models – Skills like error correction and backtracking exist in base models but require careful RL incentives to be effectively utilized in complex tasks.

This paper provides a good roadmap for folks looking to refine CoT training strategies for LLMs, highlighting how RL and reward tuning impact reasoning depth.
0
1
3
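As a rough illustration of the "cosine length-scaling reward with repetition penalties" mentioned in the quoted tweet above, here is a minimal sketch in Python. It is not the paper's actual implementation; the function name, parameter names, and default values are all assumptions.

```python
import math

def cosine_length_scaled_reward(
    is_correct: bool,
    gen_len: int,
    max_len: int = 4096,          # assumed generation budget
    r_correct_short: float = 1.0,  # reward for a correct, short CoT
    r_correct_long: float = 0.5,   # reward for a correct, maximally long CoT
    r_wrong_short: float = -1.0,   # penalty for a wrong, short CoT
    r_wrong_long: float = -0.5,    # penalty for a wrong, maximally long CoT
    repetition_penalty: float = 0.0,  # e.g. derived from repeated n-grams
) -> float:
    """Interpolate the reward between a 'short' and a 'long' endpoint with a
    cosine schedule over generation length, then subtract a repetition penalty."""
    t = min(gen_len, max_len) / max_len              # 0.0 (short) .. 1.0 (long)
    w = 0.5 * (1.0 + math.cos(math.pi * t))          # 1.0 at t=0, 0.0 at t=1
    if is_correct:
        # Correct answers: concise chains earn the most.
        base = r_correct_long + (r_correct_short - r_correct_long) * w
    else:
        # Wrong answers: short chains are penalized hardest, so the model
        # is nudged to keep reasoning rather than answer prematurely.
        base = r_wrong_long + (r_wrong_short - r_wrong_long) * w
    return base - repetition_penalty
```

Under these assumed settings, the cosine schedule pays the most for correct answers with short chains and penalizes wrong answers less when the model has at least reasoned for longer, while the repetition penalty discourages padding the chain just to move along the length axis.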
@sbmaruf
M Saiful Bari (MARUF)
4 days
Twitter has become unbearable these days due to the bots. @elonmusk do something.
0
0
0
@sbmaruf
M Saiful Bari (MARUF)
4 days
@madiator brandon is an OG!
0
0
3
@sbmaruf
M Saiful Bari (MARUF)
6 days
RT @askalphaxiv: We used Gemini 2 Flash to build Cursor for arXiv papers Highlight any section of a paper to ask questions and “@” other p…
0
172
0
@sbmaruf
M Saiful Bari (MARUF)
6 days
RT @gneubig: LLMs are starting to have personalities. User: How are you? GPT-4o: Responds with 4 rocket emojis 🚀 Deepseek-R1: Thinks for…
0
11
0