![samsja Profile](https://pbs.twimg.com/profile_images/1513761647009177603/hyRXz37w_x96.jpg)
samsja
@samsja19
Followers
3K
Following
6K
Statuses
2K
training LLM across the globe at @PrimeIntellect
sf
Joined March 2020
Intellect 1 is out. It's a 10B model trained across 3 continents on 100+ H100s, with 30 individual compute contributors. The evals are good (for 1T tokens), and the model is live. I can't stress enough how important this release is for open-source AI. Decentralized training is the only path toward sovereign open-source foundation models. This release proves that it's not just a fairy tale - it's working, and it's just the beginning.
Releasing INTELLECT-1: We're open-sourcing the first decentrally trained 10B model:
- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details
9
54
356
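For readers wondering what "decentralized training" means mechanically here: Prime Intellect's published work (OpenDiLoCo) builds on DiLoCo-style local updates, where each site trains independently and only an averaged weight delta crosses the slow inter-continental link. Below is a minimal single-process sketch of that idea; every name (`diloco_round`, `outer_opt`, the hyperparameters) is purely illustrative and not PRIME's actual API.

```python
# Minimal sketch of DiLoCo-style local-update training, the idea behind
# PRIME / OpenDiLoCo. All names are illustrative; this is not PRIME's API.
import copy
import torch
import torch.nn.functional as F

def diloco_round(global_model, outer_opt, worker_shards, local_steps=500):
    """One outer step: every worker trains locally, then only the averaged
    weight delta (the "pseudo-gradient") crosses the slow WAN link."""
    replicas = [copy.deepcopy(global_model) for _ in worker_shards]
    for replica, shard in zip(replicas, worker_shards):
        inner_opt = torch.optim.AdamW(replica.parameters(), lr=1e-4)
        for _, (x, y) in zip(range(local_steps), shard):
            loss = F.cross_entropy(replica(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
    # Pseudo-gradient = old weights - mean of new weights; this is the only
    # tensor that gets communicated between continents.
    with torch.no_grad():
        for name, p in global_model.named_parameters():
            mean_new = torch.stack(
                [dict(r.named_parameters())[name] for r in replicas]).mean(0)
            p.grad = p.detach() - mean_new
    outer_opt.step()   # e.g. SGD(lr=0.7, momentum=0.9, nesterov=True)
    outer_opt.zero_grad()
```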
RT @johannes_hage: great work! very interesting approach to initially limit the context length so the model learns to utilize it more effectively
0
1
0
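The approach being praised here, starting training at a short context and growing it in stages, reduces to a simple length schedule. A sketch, with stage boundaries and lengths invented for illustration rather than taken from any real training recipe:

```python
# Sketch of a context-length curriculum: train on short sequences first,
# then grow the window. Stage boundaries and lengths are made up.
STAGES = [(0, 2048), (100_000, 4096), (200_000, 8192)]  # (start_step, seq_len)

def seq_len_at(step: int) -> int:
    length = STAGES[0][1]
    for start, seq_len in STAGES:
        if step >= start:
            length = seq_len
    return length

# Batches are then truncated (or packed) to the current length:
# x = tokens[:, :seq_len_at(step)]
```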
epic what ppl can do with a bit of compute
i'm open-sourcing "smolattn", a minimal implementation of the flash attention 1 algorithm in CUDA that is almost 6 times faster than PyTorch's manual implementation (for small sequence lengths up to 1024). the entire kernel is less than 200 lines of code.
2
0
26
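smolattn itself is a CUDA kernel; the sketch below is only a PyTorch restatement of the flash-attention-1 idea it implements: process K/V in tiles with a running ("online") softmax so the full N x N score matrix is never materialized. `flash_attn_tiled` and its defaults are made up for illustration.

```python
# PyTorch restatement of the flash-attention-1 idea behind smolattn:
# tile over K/V with an online softmax, so the full (N x N) score
# matrix is never materialized. Single head, no masking, for clarity.
import math
import torch

def flash_attn_tiled(q, k, v, tile=256):
    # q, k, v: (N, d)
    N, d = q.shape
    scale = 1.0 / math.sqrt(d)
    out = torch.zeros_like(q)
    row_max = torch.full((N, 1), float("-inf"))   # running max m_i
    row_sum = torch.zeros(N, 1)                   # running denominator l_i
    for j in range(0, N, tile):
        kj, vj = k[j:j + tile], v[j:j + tile]
        s = (q @ kj.T) * scale                    # (N, tile) scores
        new_max = torch.maximum(row_max, s.max(dim=-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max) # rescale old statistics
        p = torch.exp(s - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vj
        row_max = new_max
    return out / row_sum

# sanity check against the "manual" PyTorch implementation
q, k, v = (torch.randn(1024, 64) for _ in range(3))
ref = torch.softmax(q @ k.T / math.sqrt(64), dim=-1) @ v
assert torch.allclose(flash_attn_tiled(q, k, v), ref, atol=1e-4)
```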
RT @LoubnaBenAllal1: We just published the second OpenR1 update with OpenR1-220k-Math, our new large-scale dataset for mathematical reasoning
0
45
0
@AvpElk @GaryMarcus What does "pure LLM" even mean? Again, one should not confuse architecture (LLM) with objective (next-token prediction, RL, ...)
1
0
1
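The architecture-vs-objective distinction in this reply can be made concrete: the very same network can be trained with next-token cross-entropy or with an RL objective, and only the loss changes. A toy sketch using REINFORCE as a stand-in RL objective; `model` and `reward_fn` are hypothetical placeholders, not any particular system:

```python
# Toy illustration of "architecture vs objective": the same model (an LLM
# architecture) under two different training objectives. `model` returns
# logits of shape (B, T, V); `reward_fn` is a hypothetical rollout scorer.
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # Objective 1: standard next-token prediction (cross-entropy).
    logits = model(tokens[:, :-1])                  # (B, T-1, V)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def reinforce_loss(model, prompt, reward_fn, n_new=32):
    # Objective 2: RL (REINFORCE) on sampled continuations of a prompt.
    tokens, logps = prompt, []
    for _ in range(n_new):
        logits = model(tokens)[:, -1]               # (B, V) for last position
        dist = torch.distributions.Categorical(logits=logits)
        nxt = dist.sample()
        logps.append(dist.log_prob(nxt))
        tokens = torch.cat([tokens, nxt[:, None]], dim=1)
    reward = reward_fn(tokens)                      # (B,) scalar rewards
    return -(reward.detach() * torch.stack(logps, dim=1).sum(1)).mean()
```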
@TimDarcet @giffmana What if it was doing RL indefinitely? As in, inference is just part of the RL process
0
0
0
@aryaman2020 @jeremyphoward I was making the argument that this approach would be the best way to scale reasoning. I have been proven wrong since, tho
0
0
5
@Dorialexander @dylan522p @ChrSzegedy Again, I am not picking a camp or trying to say who is right or wrong, but one needs to put the scaling part into perspective
0
0
1