![Wanchao Liang Profile](https://pbs.twimg.com/profile_images/1529868460972396552/Cw3nDJr6_x96.jpg)
Wanchao Liang (@wanchao_)
Followers: 147 · Following: 309 · Statuses: 28
Lifelong Hacker. PyTorch @ Meta. GitHub: @wanchaol. Opinions are my own.
Joined May 2022
@cloneofsimo @OpenAI Wow! The deep research results are actually pretty accurate, even on the examples! I'm quite shocked haha, gonna try using deep research on my work!
Replies: 0 · Retweets: 0 · Likes: 1
RT @cloneofsimo: Pytorch docs are sometimes lacking, especially new features lack of real-life code examples. You would read through implem…
Replies: 0 · Retweets: 21 · Likes: 0
@jackminong @cloneofsimo @main_horse The Google Doc was actually the original proposal/RFC, which hasn't been updated much since development; the most accurate reference is the PyTorch API docs, as deep research pointed out! But I guess I should write some detailed tutorials about it sometime 😅
Replies: 0 · Retweets: 0 · Likes: 3
RT @weifengpy: We have been working on PyTorch native float8 and FSDP2 for distributed training. Check out TorchTitan and TorchAO/float8 ht…
Replies: 0 · Retweets: 5 · Likes: 0
RT @iScienceLuvr: The @PyTorch team is developing a library for large model training called torchtitan 👀 They have scripts to train Llama-…
Replies: 0 · Retweets: 199 · Likes: 0
RT @mvpatel2000: 🚨New🌟blog✍️ on ⏩ maximizing🌙 FLOPS 🚀 Training large models requires maximizing flops/GPU, especially at scale. Excited to…
Replies: 0 · Retweets: 36 · Likes: 0
@sam_shleifer @StasBekman But if you want the `MultithreadedProcessGroup` to work with the torch launcher or in a multiprocess setting with real training, it's more involved, as you need to swap some global objects with thread-local ones; see how the `MultithreadedProcessGroup` does this.
Replies: 0 · Retweets: 0 · Likes: 0
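A generic sketch of the "swap globals for thread-locals" idea the tweet describes, in plain Python. This is illustrative only, not PyTorch's actual internals; `_state`, `set_rank`, and `get_rank` are hypothetical names:

```python
import threading

# Hypothetical illustration: each simulated rank lives on its own thread,
# so per-rank state must be thread-local rather than process-global.
_state = threading.local()

def set_rank(rank: int) -> None:
    _state.rank = rank

def get_rank() -> int:
    # A process-global would return the same value on every thread;
    # a thread-local returns whatever this thread set for itself.
    return getattr(_state, "rank", 0)

def worker(rank: int) -> None:
    set_rank(rank)
    print(f"thread {rank} sees rank {get_rank()}")

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```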
@sam_shleifer @StasBekman If you just want unit tests to run faster with some collectives inside your test case, you can either inherit your test case from `MultiThreadedTestCase` mentioned in the above link, or just use `@spawn_threads_and_init_comms(world_size=4)`, which creates a default pg.
Replies: 0 · Retweets: 0 · Likes: 2
@StasBekman @sam_shleifer A 20s NCCL pg init is still quite insane for a single-node setup, I think. @sam_shleifer do you have a repro that we can look at?
Replies: 0 · Retweets: 0 · Likes: 2
RT @marksaroufim: This is a good question, it gets to the root of the tradeoff between performance and flexibility so how do PyTorch folks…
Replies: 0 · Retweets: 83 · Likes: 0
@StasBekman @bentleynathan1 Btw I'm not the one leading the torch.compile + distributed efforts, but it's my pleasure to work with many talented people from the compiler and distributed teams to make this happen :)
Replies: 1 · Retweets: 0 · Likes: 1
@jamesr66a @StasBekman @MSFTDeepSpeed @PyTorch Yeah, it was difficult to combine FSDP and PP, as all-gathering parameters for every microbatch is expensive and could drag down perf significantly. But now there's a way to combine them, thanks to the BFS schedule paper! Something we're exploring with PiPPy.
Replies: 1 · Retweets: 0 · Likes: 3
@StasBekman @MSFTDeepSpeed “We are working on it” haha, I think we’ll have something pretty nice to share soon for sequence parallel
Replies: 1 · Retweets: 0 · Likes: 6