![Aman Salykov Profile](https://pbs.twimg.com/profile_images/1880958652393095168/taH2l_QF_x96.jpg)
Aman Salykov (@salykova_)
2K Followers · 358 Following · 59 Statuses
making AI inference run really fast
Vienna · Joined November 2019
@__tensorcore__ Agree with this, but heavily templated code further complicates debugging. I remember one of your devs mentioning that this is one of the main reasons why using cuda-gdb with CUTLASS is not recommended.
RT @awnihannun: Reminder, many institutions outside of the US and China are building amazing foundation models. many-polar world (aka even…
@PytorchToAtoms @giffmana @cHHillee Btw, it depends on how you perform the benchmarks: with the clock locked to a stable frequency or unlocked. In the latter case, sure, your results will be affected by the number of iterations, matrix size, etc., since the clock speed varies due to power limits.
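The clock-locking point above can be sketched. The most robust fix is pinning clocks before profiling (e.g. `nvidia-smi --lock-gpu-clocks=<min>,<max>`, reset with `--reset-gpu-clocks`); when clocks stay unlocked, a warmup phase plus a median over many iterations is a common mitigation. Below is a minimal host-side timing harness illustrating that idea — the function names are illustrative stand-ins, not a CUTLASS or CUDA API:

```python
import statistics
import time

def benchmark(fn, *args, warmup=10, iters=100):
    """Time fn with a warmup phase; report the median across iterations.

    With unlocked GPU clocks, early iterations run at boost frequency and
    later ones throttle under power limits, so the mean shifts with the
    number of iterations and the problem size. Warming up first and taking
    the median gives a steadier (though still clock-dependent) number.
    """
    for _ in range(warmup):          # let clocks/caches reach steady state
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)  # robust to boost/throttle outliers
```

Even with this harness, only locked clocks make absolute numbers comparable across runs; the median merely damps the variance.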
@PytorchToAtoms @giffmana @cHHillee Yes, but it gives you relative performance among the generated kernels. You can then pick the best CUTLASS kernel and test it however you like.
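That selection flow — rank candidate kernels under one shared benchmark, then retest the winner in isolation — can be sketched as below. `pick_best`, `kernels`, and `bench` are hypothetical stand-ins for illustration, not the CUTLASS profiler API:

```python
def pick_best(kernels, bench):
    """Rank candidate kernels by one shared benchmark.

    Absolute times may be noisy (clock variance, power limits), but the
    relative ordering under a fixed harness is usually stable enough to
    pick a winner, which can then be re-benchmarked however you like.

    kernels: dict mapping name -> zero-arg callable (stand-in for a launch)
    bench:   callable(fn) -> measured time in seconds
    """
    timed = {name: bench(fn) for name, fn in kernels.items()}
    best = min(timed, key=timed.get)  # smallest measured time wins
    return best, timed
```

The key design point is that every candidate runs under the *same* harness, so systematic timing error cancels out of the comparison.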