Anton Lozhkov
@anton_lozhkov
2K Followers · 2K Following · 425 Statuses

Open-sourcing Language Models @huggingface ✨

Joined January 2015
Anton Lozhkov @anton_lozhkov · 2 months
Introducing 📐FineMath: the best open math pre-training dataset with 50B+ tokens! Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH. Here’s a breakdown 🧵
[image]
19 replies · 84 reposts · 365 likes

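For readers who want to inspect the data, a minimal sketch using the `datasets` library. The repo id `HuggingFaceTB/finemath` and the `finemath-4plus` config are assumptions inferred from the announcement, not confirmed by the thread; check the dataset card on the Hub for the exact names.

```python
# Minimal sketch: stream a few FineMath samples without downloading all 50B+ tokens.
# Repo id and config name are assumptions; see the dataset card for the real ones.
from datasets import load_dataset

ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus",
                  split="train", streaming=True)

for i, sample in enumerate(ds):
    print(sample["text"][:200])  # "text" field assumed, as is typical for pretraining sets
    if i == 2:
        break
```
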
Anton Lozhkov @anton_lozhkov · 17 hours
why would someone bookmark this 😭

Anton Lozhkov @anton_lozhkov · 20 hours
@Promptmethus Math is the gateway drug, we're moving on to harder stuff this week
7 replies · 1 repost · 23 likes

Anton Lozhkov @anton_lozhkov · 22 hours
@nooriefyi Having enough VRAM for the KV cache to generate long responses at reasonable batch sizes! @lmsysorg SGLang was essential with their MLA support
0 replies · 0 reposts · 11 likes

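To make the VRAM point concrete, a back-of-envelope sketch comparing a vanilla multi-head-attention KV cache with an MLA-style compressed cache. The model numbers below (61 layers, 128 heads of dim 128, a 512-wide compressed latent plus 64 RoPE dims) are assumptions loosely based on DeepSeek-V3/R1's public config, not figures from this thread.

```python
# Back-of-envelope KV-cache sizing. All model numbers are assumptions
# loosely based on DeepSeek-V3/R1's public config.
LAYERS = 61
HEADS = 128          # attention heads under plain MHA
HEAD_DIM = 128
KV_LORA_RANK = 512   # MLA: width of the compressed KV latent
ROPE_DIM = 64        # MLA: decoupled RoPE keys cached alongside the latent
BYTES = 2            # fp16/bf16

def mha_cache_gb(batch: int, seq_len: int) -> float:
    # Plain MHA caches full K and V per head, per layer, per token.
    return 2 * LAYERS * HEADS * HEAD_DIM * BYTES * batch * seq_len / 1e9

def mla_cache_gb(batch: int, seq_len: int) -> float:
    # MLA caches one compressed latent (+ RoPE keys) per layer, per token.
    return LAYERS * (KV_LORA_RANK + ROPE_DIM) * BYTES * batch * seq_len / 1e9

for batch in (1, 32):
    print(f"batch={batch:>2}  MHA: {mha_cache_gb(batch, 16384):7.1f} GB"
          f"  MLA: {mla_cache_gb(batch, 16384):6.1f} GB")
```

Under these assumptions, batch 32 at 16k tokens works out to roughly 2 TB of cache for full MHA versus under 40 GB for the compressed cache, which is why MLA support matters for long responses at reasonable batch sizes.
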
Anton Lozhkov @anton_lozhkov · 22 hours
Stay tuned for more Open R1 updates:
0 replies · 1 repost · 14 likes

Anton Lozhkov @anton_lozhkov · 2 days
RT @JiaLi52524397: 🚀 NuminaMath 1.5 is here! 🚀 900k+ high-quality competition math problems with CoT solutions, new problem metadata, manua…
0 replies · 68 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 2 days
RT @LoubnaBenAllal1: We just published the second OpenR1 update with OpenR1-220k-Math, our new large-scale dataset for mathematical reasoni…
0 replies · 60 reposts · 0 likes

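The two RTs above announce NuminaMath 1.5 and OpenR1-220k-Math. A minimal sketch for pulling one record from each with `datasets`; both repo ids below (`AI-MO/NuminaMath-1.5`, `open-r1/OpenR1-Math-220k`) are assumptions inferred from the tweets, so verify them on the Hub.

```python
# Minimal sketch: peek at one record from each math dataset.
# Both repo ids are assumptions inferred from the tweets; verify on the Hub.
from datasets import load_dataset

for repo in ("AI-MO/NuminaMath-1.5", "open-r1/OpenR1-Math-220k"):
    ds = load_dataset(repo, split="train", streaming=True)
    sample = next(iter(ds))
    # Print the schema instead of guessing field names.
    print(repo, "->", sorted(sample.keys()))
```
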
Anton Lozhkov @anton_lozhkov · 5 days
RT @bclavie:
[image]
0 replies · 33 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 6 days
RT @simonw: Today I found out about SmolLM2-135M-Instruct, a tiny LLM which quantizes down to just below 100MB... which means it can fit in…
0 replies · 64 reposts · 0 likes

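As a quick local sanity check of just how smol the model is, a hedged sketch with the `transformers` pipeline; the model id `HuggingFaceTB/SmolLM2-135M-Instruct` comes from the tweet, and the chat-message input format assumes a recent `transformers` release.

```python
# Sketch: one-off generation with SmolLM2-135M-Instruct via transformers.
# Assumes a recent transformers version whose text-generation pipeline
# accepts chat-style message lists.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
messages = [{"role": "user", "content": "Explain KV caching in one sentence."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```
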
Anton Lozhkov @anton_lozhkov · 7 days
RT @LoubnaBenAllal1: The wait is over: our SmolLM2 paper is out—a detailed guide for building SOTA small LMs. While most LM papers skim ove…
0 replies · 104 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 8 days
RT @kimmonismus: Within 24 hours, OpenAI's Deep Research has been replicated by an open-source version that already scores 54% on the same…
0 replies · 737 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 15 days
RT @carrigmat: Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization…
0 replies · 4K reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 15 days
RT @edwardbeeching: As part of our open reproduction of R1, we have roughly reproduced DeepSeek's MATH-500 eval numbers with Hugging Face's…
0 replies · 116 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 16 days
RT @lvwerra: We're just a few weeks away from having a fully open pipeline of R1 and everybody who can rent some GPUs can train their own v…
0 replies · 20 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 19 days
[image]
0 replies · 56 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 19 days
RT @QGallouedec: Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open tr…
0 replies · 435 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 21 days
RT @andimarafioti: Smol but mighty: • 256M delivers 80% of the performance of our 2.2B model. • 500M hits 90%. Both beat our SOTA 80B model…
0 replies · 8 reposts · 0 likes

Anton Lozhkov @anton_lozhkov · 21 days
RT @andimarafioti: Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run…
0 replies · 122 reposts · 0 likes

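A hedged sketch of running the 256M SmolVLM on one image. The repo id `HuggingFaceTB/SmolVLM-256M-Instruct` and the Vision2Seq chat-template flow are assumptions modeled on similar Hugging Face VLMs; check the model card for the exact usage.

```python
# Sketch: caption a single image with SmolVLM-256M. Repo id and API flow
# are assumptions modeled on similar HF vision-language models.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("example.jpg")  # any local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
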
Anton Lozhkov @anton_lozhkov · 23 days
RT @HKydlicek: 🚀 We've boosted MATH benchmark scores for popular models by 65% —no training or model changes needed! The secret? Math-Ve…
0 replies · 62 reposts · 0 likes

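The truncated RT presumably refers to Math-Verify, a rule-based answer checker that scores a model's final answer by symbolic equivalence rather than string match. A minimal sketch, assuming the `math-verify` package exposes `parse` and `verify` as in its README:

```python
# Minimal sketch of equivalence-based answer checking with Math-Verify.
# Assumes `pip install math-verify` and the parse/verify API from its README.
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")
prediction = parse("0.5")
print(verify(gold, prediction))  # True: the expressions are symbolically equal
```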