Wanchao Liang Profile
Wanchao Liang

@wanchao_

Followers: 147
Following: 309
Statuses: 28

Lifelong Hacker. PyTorch @ Meta. GitHub: @wanchaol. Opinions are my own.

Joined May 2022
@wanchao_
Wanchao Liang
6 days
@cloneofsimo @OpenAI Wow! The deep research results are actually pretty accurate, even on the examples! I’m quite shocked haha, gonna try using deep research on my work!
0
0
1
@wanchao_
Wanchao Liang
6 days
RT @cloneofsimo: Pytorch docs are sometimes lacking, especially new features lack of real-life code examples. You would read through implem…
0
21
0
@wanchao_
Wanchao Liang
6 days
@jackminong @cloneofsimo @main_horse The google doc was actually the original proposal/RFC, which hasn’t been updated much since development started; the most accurate one is the PyTorch API docs, as deep research pointed out! But I guess I should write some detailed tutorials about it sometime 😅
0
0
3
@wanchao_
Wanchao Liang
16 days
@StasBekman @SnowflakeDB Congrats Stas!
1
0
1
@wanchao_
Wanchao Liang
6 months
RT @cHHillee: For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introduc…
0
268
0
@wanchao_
Wanchao Liang
6 months
RT @weifengpy: We have been working on PyTorch native float8 and FSDP2 for distributed training. Check out TorchTitan and TorchAO/float8 ht…
0
5
0
@wanchao_
Wanchao Liang
10 months
RT @iScienceLuvr: The @PyTorch team is developing a library for large model training called torchtitan 👀 They have scripts to train Llama-…
0
199
0
@wanchao_
Wanchao Liang
10 months
RT @PyTorch: Announcing the alpha release of torchtune! torchtune is a PyTorch-native library for fine-tuning LLMs. It combines hackable m…
0
298
0
@wanchao_
Wanchao Liang
10 months
@rvarm1 Congrats on the launch!
0
0
1
@wanchao_
Wanchao Liang
11 months
RT @mvpatel2000: 🚨New🌟blog✍️ on ⏩ maximizing🌙 FLOPS 🚀 Training large models requires maximizing flops/GPU, especially at scale. Excited to…
0
36
0
@wanchao_
Wanchao Liang
1 year
@sam_shleifer @StasBekman but if you want the `MultithreadProcessGroup` to work with the torch launcher or in a multiprocess setting with real training, it's more involved, as you need to swap some global objects with thread-local objects; see how the MultithreadedProcessGroup did this
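The general pattern being described, as a generic illustration only (the names below are hypothetical, not the actual PyTorch internals): replace a module-level global with `threading.local` storage so each simulated rank running on its own thread sees its own object.

```python
# Generic illustration only; _local, set_default_pg and get_default_pg are
# hypothetical names, not the real PyTorch internals.
import threading

# Before: a single module-level global shared by every thread.
#   _default_pg = None
# After: thread-local storage, so each rank-thread gets its own slot.
_local = threading.local()

def set_default_pg(pg):
    _local.default_pg = pg  # each thread writes into its own namespace

def get_default_pg():
    return getattr(_local, "default_pg", None)  # None in threads that never set it
```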
0
0
0
@wanchao_
Wanchao Liang
1 year
@sam_shleifer @StasBekman If you just want unit tests to work faster with some collectives inside your test case, you can either inherit your test case from `MultiThreadedTestCase` mentioned in the above link, or just use `@spawn_threads_and_init_comms(world_size=4)` which would create a default pg.
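A minimal sketch of both options, assuming the helpers live in `torch.testing._internal.common_distributed` (these are internal test utilities, so the exact import path and setup may vary across PyTorch versions):

```python
# Minimal sketch, assuming these internal test helpers are importable from
# torch.testing._internal.common_distributed (path/signatures may vary by version).
import torch
import torch.distributed as dist
from torch.testing._internal.common_distributed import (
    MultiThreadedTestCase,
    spawn_threads_and_init_comms,
)
from torch.testing._internal.common_utils import TestCase, run_tests


class TestWithDecorator(TestCase):
    # Option 1: the decorator spins up 4 threads and a default thread-backed
    # process group just for this test, no real NCCL/Gloo init needed.
    @spawn_threads_and_init_comms(world_size=4)
    def test_all_reduce(self):
        t = torch.ones(2) * dist.get_rank()
        dist.all_reduce(t)
        self.assertEqual(t.tolist(), [6.0, 6.0])  # 0 + 1 + 2 + 3


class TestWithBaseClass(MultiThreadedTestCase):
    # Option 2: inherit from MultiThreadedTestCase and spawn the rank threads
    # in setUp; each test method then runs once per simulated rank.
    @property
    def world_size(self):
        return 4

    def setUp(self):
        super().setUp()
        self._spawn_threads()

    def test_broadcast(self):
        t = torch.full((2,), float(dist.get_rank()))
        dist.broadcast(t, src=0)
        self.assertEqual(t.tolist(), [0.0, 0.0])  # everyone gets rank 0's tensor


if __name__ == "__main__":
    run_tests()
```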
0
0
2
@wanchao_
Wanchao Liang
1 year
@StasBekman @sam_shleifer 20s NCCL pg init is still quite insane for a single-node setup I think, @sam_shleifer do you have a repro that we can look at?
0
0
2
@wanchao_
Wanchao Liang
1 year
RT @marksaroufim: This is a good question, it gets to the root of the tradeoff between performance and flexibility so how do PyTorch folks…
0
83
0
@wanchao_
Wanchao Liang
1 year
@StasBekman @bentleynathan1 Btw I’m not the one leading the torch.compile + distributed efforts, but it’s my pleasure to work with many talented people from the compiler and distributed teams to make this happen :)
1
0
1
@wanchao_
Wanchao Liang
1 year
1
1
3
@wanchao_
Wanchao Liang
1 year
@jamesr66a @StasBekman @MSFTDeepSpeed @PyTorch Yeah it was difficult to combine FSDP and PP, as all-gathering parameters for every micro-batch is expensive and could drag down perf significantly. But now there’s a way to combine them, thanks to the BFS schedule paper! Something we’re exploring with PiPPy
1
0
3
@wanchao_
Wanchao Liang
1 year
@StasBekman @MSFTDeepSpeed “We are working on it” haha, I think we’ll have something pretty nice to share soon for sequence parallel
1
0
6