Anton Lozhkov
@anton_lozhkov
2K Followers · 2K Following · 425 Statuses

Open-sourcing Language Models @huggingface ✨

Joined January 2015
Anton Lozhkov @anton_lozhkov · 2 months
Introducing 📐FineMath: the best open math pre-training dataset with 50B+ tokens! Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH. Here’s a breakdown 🧵
[image]
19 replies · 84 reposts · 365 likes

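For readers who want to inspect the data, a minimal sketch using the `datasets` library. The repo id `HuggingFaceTB/finemath` and the `finemath-4plus` config are assumptions inferred from the announcement, not confirmed by the thread; check the dataset card on the Hub for the exact names.

```python
# Minimal sketch: stream a few FineMath samples without downloading all 50B+ tokens.
# Repo id and config name are assumptions; see the dataset card for the real ones.
from datasets import load_dataset

ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus",
                  split="train", streaming=True)

for i, sample in enumerate(ds):
    print(sample["text"][:200])  # "text" field assumed, as is typical for pretraining sets
    if i == 2:
        break
```
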
Anton Lozhkov @anton_lozhkov · 17 hours
why would someone bookmark this 😭

Anton Lozhkov @anton_lozhkov · 20 hours
@Promptmethus Math is the gateway drug, we're moving on to harder stuff this week
7 replies · 1 repost · 23 likes

Anton Lozhkov @anton_lozhkov · 22 hours
@nooriefyi Having enough VRAM for the KV cache to generate long responses at reasonable batch sizes! @lmsysorg SGLang was essential with their MLA support
0 replies · 0 reposts · 11 likes

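To make the VRAM point concrete, a back-of-envelope sketch comparing a vanilla multi-head-attention KV cache with an MLA-style compressed cache. The model numbers below (61 layers, 128 heads of dim 128, a 512-wide compressed latent plus 64 RoPE dims) are assumptions loosely based on DeepSeek-V3/R1's public config, not figures from this thread.

```python
# Back-of-envelope KV-cache sizing. All model numbers are assumptions
# loosely based on DeepSeek-V3/R1's public config.
LAYERS = 61
HEADS = 128          # attention heads under plain MHA
HEAD_DIM = 128
KV_LORA_RANK = 512   # MLA: width of the compressed KV latent
ROPE_DIM = 64        # MLA: decoupled RoPE keys cached alongside the latent
BYTES = 2            # fp16/bf16

def mha_cache_gb(batch: int, seq_len: int) -> float:
    # Plain MHA caches full K and V per head, per layer, per token.
    return 2 * LAYERS * HEADS * HEAD_DIM * BYTES * batch * seq_len / 1e9

def mla_cache_gb(batch: int, seq_len: int) -> float:
    # MLA caches one compressed latent (+ RoPE keys) per layer, per token.
    return LAYERS * (KV_LORA_RANK + ROPE_DIM) * BYTES * batch * seq_len / 1e9

for batch in (1, 32):
    print(f"batch={batch:>2}  MHA: {mha_cache_gb(batch, 16384):7.1f} GB"
          f"  MLA: {mla_cache_gb(batch, 16384):6.1f} GB")
```

Under these assumptions, batch 32 at 16k tokens works out to roughly 2 TB of cache for full MHA versus under 40 GB for the compressed cache, which is why MLA support matters for long responses at reasonable batch sizes.
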
Anton Lozhkov @anton_lozhkov · 22 hours
Stay tuned for more Open R1 updates:
0 replies · 1 repost · 14 likes

Anton Lozhkov @anton_lozhkov · 2 days
RT @JiaLi52524397: 🚀 NuminaMath 1.5 is here! 🚀 900k+ high-quality competition math problems with CoT solutions, new problem metadata, manua…
0 replies · 68 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 2 days
RT @LoubnaBenAllal1: We just published the second OpenR1 update with OpenR1-220k-Math, our new large-scale dataset for mathematical reasoni…
0 replies · 60 reposts · 0 likes

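The two RTs above announce NuminaMath 1.5 and OpenR1-220k-Math. A minimal sketch for pulling one record from each with `datasets`; both repo ids below (`AI-MO/NuminaMath-1.5`, `open-r1/OpenR1-Math-220k`) are assumptions inferred from the tweets, so verify them on the Hub.

```python
# Minimal sketch: peek at one record from each math dataset.
# Both repo ids are assumptions inferred from the tweets; verify on the Hub.
from datasets import load_dataset

for repo in ("AI-MO/NuminaMath-1.5", "open-r1/OpenR1-Math-220k"):
    ds = load_dataset(repo, split="train", streaming=True)
    sample = next(iter(ds))
    # Print the schema instead of guessing field names.
    print(repo, "->", sorted(sample.keys()))
```
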
Anton Lozhkov @anton_lozhkov · 5 days
RT @bclavie:
[image]
0 replies · 33 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 6 days
RT @simonw: Today I found out about SmolLM2-135M-Instruct, a tiny LLM which quantizes down to just below 100MB... which means it can fit in…
0 replies · 64 reposts · 0 likes

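As a quick local sanity check of just how smol the model is, a hedged sketch with the `transformers` pipeline; the model id `HuggingFaceTB/SmolLM2-135M-Instruct` comes from the tweet, and the chat-message input format assumes a recent `transformers` release.

```python
# Sketch: one-off generation with SmolLM2-135M-Instruct via transformers.
# Assumes a recent transformers version whose text-generation pipeline
# accepts chat-style message lists.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
messages = [{"role": "user", "content": "Explain KV caching in one sentence."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```
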
Anton Lozhkov @anton_lozhkov · 7 days
RT @LoubnaBenAllal1: The wait is over: our SmolLM2 paper is out—a detailed guide for building SOTA small LMs. While most LM papers skim ove…
0 replies · 104 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 8 days
RT @kimmonismus: Within 24 hours, OpenAI's Deep Research has been replicated by an open-source version that already scores 54% on the same…
0 replies · 737 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 15 days
RT @carrigmat: Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization…
0 replies · 4K reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 15 days
RT @edwardbeeching: As part of our open reproduction of R1, we have roughly reproduced DeepSeek's MATH-500 eval numbers with Hugging Face's…
0 replies · 116 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 16 days
RT @lvwerra: We're just a few weeks away from having a fully open pipeline of R1 and everybody who can rent some GPUs can train their own v…
0 replies · 20 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 19 days
[image]
0 replies · 56 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 19 days
RT @QGallouedec: Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open tr…
0 replies · 435 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 21 days
RT @andimarafioti: Smol but mighty: • 256M delivers 80% of the performance of our 2.2B model. • 500M hits 90%. Both beat our SOTA 80B model…
0 replies · 8 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 21 days
RT @andimarafioti: Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run…
0 replies · 122 reposts · 0 likes

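A hedged sketch of running the 256M SmolVLM on one image. The repo id `HuggingFaceTB/SmolVLM-256M-Instruct` and the Vision2Seq chat-template flow are assumptions modeled on similar Hugging Face VLMs; check the model card for the exact usage.

```python
# Sketch: caption a single image with SmolVLM-256M. Repo id and API flow
# are assumptions modeled on similar HF vision-language models.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("example.jpg")  # any local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
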
Anton Lozhkov @anton_lozhkov · 23 days
RT @HKydlicek: 🚀 We've boosted MATH benchmark scores for popular models by 65% —no training or model changes needed! The secret? Math-Ve…
0 replies · 62 reposts · 0 likes

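The truncated RT presumably refers to Math-Verify, a rule-based answer checker that scores a model's final answer by symbolic equivalence rather than string match. A minimal sketch, assuming the `math-verify` package exposes `parse` and `verify` as in its README:

```python
# Minimal sketch of equivalence-based answer checking with Math-Verify.
# Assumes `pip install math-verify` and the parse/verify API from its README.
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")
prediction = parse("0.5")
print(verify(gold, prediction))  # True: the expressions are symbolically equal
```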