![Jackmin Profile](https://pbs.twimg.com/profile_images/1837142927296057346/HrJ3F5sj_x96.jpg)
Jackmin
@jackminong
Followers
710
Following
6K
Statuses
341
🇲🇾. Making GPUs go brr @PrimeIntellect 🇺🇸. Previously @JinaAI_ 🇩🇪.
San Francisco, CA
Joined October 2021
To support dynamically on- and off-boarding compute during the run, we introduced a new distributed abstraction, `ElasticDeviceMesh`, which manages the resizing of process groups without requiring a cold restart. More info on this, along with some other interesting innovations we made to pull this off, in our blog post:
Announcing INTELLECT-1: the first-ever decentralized training of a 10B model. Scaling decentralized training 10x beyond prior efforts. Anyone can join us to build open-source AGI
4
9
57
This seems like a good idea that more people should be aware of. The choice of hyperparameters for the relative heights of the correct and wrong pieces is interesting, though. Is there an accuracy difference between Reward B and C? If not, does that mean you should always pick larger correct reward > wrong?
Takeaway 4: Reward shaping can be used to stabilize and control CoT length while improving accuracy. We designed a reward function (Cosine Reward) to use CoT length as an additional input to stabilize emergent length scaling.
0
0
4
@bronzeagepapi We will be sure to inform you guys if it discovers any quantum mechanic glitches during the run 🫡
1
0
2
RT @MatternJustus: The path to our first reasoning model consists of three steps: 1. Generating cold-start reasoning data 2. SFT on cold-s…
0
11
0
@shxf0072 have there been any papers using GRPO for non-LLM tasks? why should it only be great at optimizing LLMs?
2
0
4
@teortaxesTex We need more memes! Memes allow people to say what they actually want to say without the full commitment of having said it explicitly
0
0
1
RT @srush_nlp: Got talked into giving a DeepSeek talk this afternoon Not sure I have anything new to say here! But…
0
53
0
@CamutoDante i dont quite get what u mean here. intermediate activations are deterministic given the same input sequence and model. there isnt much a malicious or a benevolent actor can do to alter them
0
0
0
RT @huseinzol05: @mesolitica released Malaysian TTS models including dataset! Special thanks to @jackminong and @PrimeIntellect for the com…
0
3
0