samsja

@samsja19

Followers: 3K · Following: 6K · Statuses: 2K

training LLM across the globe at @PrimeIntellect

sf
Joined March 2020
@samsja19
samsja
2 months
Intellect 1 is out. It's a 10B model trained across 3 continents using 100+ H100s, with 30 individual compute contributors. The evals are good (for 1T tokens), and the model is live. I can't stress enough how important this release is for open-source AI. Decentralized training is the only path toward sovereign open-source foundation models. This release proves that it's not just a fairy tale - it's working, and it's just the beginning.
@PrimeIntellect
Prime Intellect
2 months
Releasing INTELLECT-1: We’re open-sourcing the first decentralized trained 10B model: - INTELLECT-1 base model & intermediate checkpoints - Pre-training dataset - Post-trained instruct models by @arcee_ai - PRIME training framework - Technical paper with all details
9
54
356
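For context on what "decentralized training" means mechanically here, the sketch below shows the general low-communication pattern behind DiLoCo-style runs: each worker takes many local optimizer steps on its own data, and the workers only communicate once per outer step by averaging their pseudo-gradients. This is a single-process PyTorch illustration of the idea, not the actual PRIME framework code; the worker count, optimizers, and toy model are assumptions.

```python
# Minimal single-process sketch of DiLoCo-style low-communication training:
# many local steps per worker, one averaging step of pseudo-gradients per
# outer step. Illustrative only; not the PRIME framework implementation.
import copy
import torch

def outer_step(global_model, worker_batches, inner_steps=100, outer_lr=0.7):
    deltas = []
    for batches in worker_batches:                    # one entry per simulated worker
        local = copy.deepcopy(global_model)
        opt = torch.optim.AdamW(local.parameters(), lr=1e-4)
        for x, y in batches[:inner_steps]:            # many local steps, no communication
            opt.zero_grad()
            torch.nn.functional.mse_loss(local(x), y).backward()
            opt.step()
        # pseudo-gradient = how far this worker moved away from the shared weights
        deltas.append([(g - l).detach() for g, l in
                       zip(global_model.parameters(), local.parameters())])
    with torch.no_grad():                             # communicate once per outer step
        for i, p in enumerate(global_model.parameters()):
            p -= outer_lr * torch.stack([d[i] for d in deltas]).mean(0)

model = torch.nn.Linear(8, 1)
data = [[(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(100)] for _ in range(4)]
outer_step(model, data)
```

Because the workers only exchange weight deltas every few hundred steps instead of gradients every step, the bandwidth requirements drop enough to run across data centers on different continents.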
@samsja19
samsja
2 hours
RT @johannes_hage: great work! very interesting approach to initially limit the context length so the model learns to utilize it more effe…
0
1
0
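The retweet above is truncated, but the idea it points at is a context-length curriculum: start pre-training with a short maximum sequence length and widen it later in training. A rough sketch of what such a schedule could look like follows; the step thresholds, lengths, and helper names are made up for illustration, not taken from the cited work.

```python
# Rough sketch of a context-length curriculum: the allowed max sequence length
# grows with the training step. Thresholds and names are hypothetical.
SCHEDULE = [(0, 2048), (50_000, 8192), (150_000, 32_768)]  # (start_step, max_len)

def max_context_at(step: int) -> int:
    """Largest sequence length allowed at this training step."""
    max_len = SCHEDULE[0][1]
    for start_step, length in SCHEDULE:
        if step >= start_step:
            max_len = length
    return max_len

def clip_batch(batch_token_ids, step):
    """Truncate every sequence in the batch to the current curriculum length."""
    limit = max_context_at(step)
    return [seq[:limit] for seq in batch_token_ids]

print(max_context_at(0), max_context_at(60_000), max_context_at(200_000))  # 2048 8192 32768
```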
@samsja19
samsja
5 hours
@zack_overflow @Aryvyo I wish this program existed when I was a kid
0
0
1
@samsja19
samsja
7 hours
@mike64_t 🤣🤣🤣
0
0
2
@samsja19
samsja
10 hours
0
0
3
@samsja19
samsja
10 hours
@taha_yssne Noicee
0
0
0
@samsja19
samsja
11 hours
epic what ppl can do with a bit of compute
@mrsiipa
maharshi
13 hours
i'm open-sourcing "smolattn", a minimal implementation of the flash attention 1 algorithm in CUDA that is almost 6 times faster than PyTorch's manual implementation (for small sequence lengths up to 1024). the entire kernel is less than 200 lines of code.
2
0
26
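To make the quoted idea concrete, here is a PyTorch (not CUDA) sketch of the FlashAttention-1 tiling trick that smolattn implements: iterate over key/value tiles with an online softmax so the full seq x seq score matrix is never materialized. The tile size and tensor shapes are illustrative and not taken from smolattn itself.

```python
# Tiled attention with an online softmax, the core idea of FlashAttention-1.
# Shapes and tile size are illustrative assumptions.
import torch

def tiled_attention(q, k, v, tile=128):
    # q, k, v: (seq, d)
    seq, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq, 1), float("-inf"))
    row_sum = torch.zeros(seq, 1)
    for start in range(0, seq, tile):
        kt = k[start:start + tile]           # (t, d) tile of keys
        vt = v[start:start + tile]           # (t, d) tile of values
        scores = (q @ kt.T) * scale          # (seq, t) partial scores
        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)   # rescale old accumulators
        p = torch.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vt
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(1024, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) / 64 ** 0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```

The real kernel fuses this loop into CUDA and keeps each tile in shared memory; the numerics of the online softmax are the same.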
@samsja19
samsja
11 hours
RT @LoubnaBenAllal1: We just published the second OpenR1 update with OpenR1-220k-Math, our new large-scale dataset for mathematical reasoni…
0
45
0
@samsja19
samsja
11 hours
@AvpElk @GaryMarcus What does "pure LLM" even mean? Again, one should not confuse the architecture (LLM) with the objective (next-token prediction, RL, ...)
1
0
1
@samsja19
samsja
11 hours
@Ar_Douillard It's a really well-written paper
0
0
1
@samsja19
samsja
11 hours
@TimDarcet @giffmana what if it was doing RL indefinitely? As in, the inference is just part of the RL process
0
0
0
@samsja19
samsja
11 hours
@jonasgeiping amazing work
0
0
6
@samsja19
samsja
21 hours
@aryaman2020 @jeremyphoward I was making the argument this approach would be the best way to scale reasoning. I have been proven wrong since tho
0
0
5
@samsja19
samsja
21 hours
@aryaman2020 @jeremyphoward Does not mean that it's the end game 😉
0
0
2
@samsja19
samsja
22 hours
@yaroslavvb @_arohan_ @KhonaMikail nice nice hehe it's coming
0
0
2
@samsja19
samsja
22 hours
@Dorialexander @dylan522p @ChrSzegedy But again, I am not picking a camp or trying to say who is right or wrong; one just needs to put the scaling part into perspective
0
0
1
@samsja19
samsja
22 hours
@taha_yssne @Dorialexander this looks sick, where can I try it?
0
0
0
@samsja19
samsja
22 hours
@Rixhbhco not yet!
0
0
1