![Zengyi Qin Profile](https://pbs.twimg.com/profile_images/1875202248612966400/KwFopqk3_x96.png)
Zengyi Qin
@qinzytech
Followers: 3K · Following: 164 · Statuses: 103
MIT PhD @MIT | Hardcore GenAI Researcher | MyShell | Homepage: https://t.co/bwtUBzigZD
Boston, MA, USA
Joined December 2023
Training LLMs can be much cheaper than previously thought. 0.1 million USD is sufficient for training LLaMA2-level LLMs🤯 While @OpenAI and @Meta spend billions of dollars training theirs, you can train yours for far less. Introducing our open-source project JetMoE: A thread 🧵
53 · 170 · 897
@krishnakaasyap The source is Huawei employees. BTW, in terms of FLOPS they have already caught up, but their communication is still a little behind NVIDIA.
1 · 0 · 1
@srivatsamath We will release and open-source a model that significantly outperforms o1 on computer-use agent tasks, and we will release the benchmark at the same time. Stay tuned
4 · 2 · 135
@gauranshsoni Also almost 0%, because their pre-training data does not contain sufficient long-horizon, interactive computer-use decision-making data
1 · 1 · 49
RT @tom_doerr: MeloTTS: A text-to-speech library supporting English, Spanish, French, Chinese, Japanese, and Korean, with various accents an…
0 · 78 · 0
@davidbau Consider this one, which democratizes large-model training and makes it accessible to many research labs. Website: Paper:
0 · 0 · 3
Many people think @xai's 100K GPU cluster is no longer necessary given @deepseek_ai's success with only 2K GPUs. That is not true. The fact is that compute is always limited: with 100K GPUs you can run many LARGE experiments very QUICKLY, and then iterate on the model very fast.
1 · 0 · 18
@ZiqiPang Neither one. We should instead train an agentic one - it should do some bold/risky stuff that big companies like OpenAI won't release due to safety issues
3 · 0 · 15