![Wanchao Liang Profile](https://pbs.twimg.com/profile_images/1529868460972396552/Cw3nDJr6_x96.jpg)
Wanchao Liang (@wanchao_)
Followers: 147 · Following: 309 · Statuses: 28
Lifelong Hacker. PyTorch @ Meta. GitHub: @wanchaol. Opinions are my own.
Joined May 2022
@cloneofsimo @OpenAI Wow! The deep research results are actually pretty accurate, even on the examples! I'm quite shocked haha, gonna try using deep research on my work!
Replies: 0 · Retweets: 0 · Likes: 1
RT @cloneofsimo: Pytorch docs are sometimes lacking, especially new features lack of real-life code examples. You would read through implem…
Replies: 0 · Retweets: 21 · Likes: 0
@jackminong @cloneofsimo @main_horse The Google Doc was actually the original proposal/RFC, which hasn't been updated much since development; the most accurate reference is the PyTorch API docs, as deep research pointed out! But I guess I should write some detailed tutorials about it sometime 😅
Replies: 0 · Retweets: 0 · Likes: 3
RT @weifengpy: We have been working on PyTorch native float8 and FSDP2 for distributed training. Check out TorchTitan and TorchAO/float8 ht…
Replies: 0 · Retweets: 5 · Likes: 0
RT @iScienceLuvr: The @PyTorch team is developing a library for large model training called torchtitan 👀 They have scripts to train Llama-…
Replies: 0 · Retweets: 199 · Likes: 0
RT @mvpatel2000: 🚨New🌟blog✍️ on ⏩ maximizing🌙 FLOPS 🚀 Training large models requires maximizing flops/GPU, especially at scale. Excited to…
Replies: 0 · Retweets: 36 · Likes: 0
@sam_shleifer @StasBekman But if you want the `MultithreadedProcessGroup` to work with the torch launcher or in a multiprocess setting with real training, it's more involved, as you need to swap some global objects with thread-local ones; see how the `MultithreadedProcessGroup` does this.
Replies: 0 · Retweets: 0 · Likes: 0
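A generic sketch of the "swap globals for thread-locals" idea the tweet describes, in plain Python. This is illustrative only, not PyTorch's actual internals; `_state`, `set_rank`, and `get_rank` are hypothetical names:

```python
import threading

# Hypothetical illustration: each simulated rank lives on its own thread,
# so per-rank state must be thread-local rather than process-global.
_state = threading.local()

def set_rank(rank: int) -> None:
    _state.rank = rank

def get_rank() -> int:
    # A process-global would return the same value on every thread;
    # a thread-local returns whatever this thread set for itself.
    return getattr(_state, "rank", 0)

def worker(rank: int) -> None:
    set_rank(rank)
    print(f"thread {rank} sees rank {get_rank()}")

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```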
@sam_shleifer @StasBekman If you just want unit tests to run faster with some collectives inside your test case, you can either inherit your test case from `MultiThreadedTestCase` mentioned in the above link, or just use `@spawn_threads_and_init_comms(world_size=4)`, which creates a default pg.
Replies: 0 · Retweets: 0 · Likes: 2
@StasBekman @sam_shleifer A 20s NCCL pg init is still quite insane for a single-node setup, I think. @sam_shleifer do you have a repro that we can look at?
Replies: 0 · Retweets: 0 · Likes: 2
RT @marksaroufim: This is a good question, it gets to the root of the tradeoff between performance and flexibility so how do PyTorch folks…
Replies: 0 · Retweets: 83 · Likes: 0
@StasBekman @bentleynathan1 Btw I'm not the one leading the torch.compile + distributed efforts, but it's my pleasure to work with many talented people from the compiler and distributed teams to make this happen :)
Replies: 1 · Retweets: 0 · Likes: 1
@jamesr66a @StasBekman @MSFTDeepSpeed @PyTorch Yeah, it was difficult to combine FSDP and PP, as all-gathering parameters for every microbatch is expensive and could drag down perf significantly. But now there's a way to combine them, thanks to the BFS schedule paper! Something we're exploring with PiPPy.
Replies: 1 · Retweets: 0 · Likes: 3
@StasBekman @MSFTDeepSpeed “We are working on it” haha, I think we’ll have something pretty nice to share soon for sequence parallel
Replies: 1 · Retweets: 0 · Likes: 6