Sparsh Tewatia Profile
Sparsh Tewatia

@spteotia

Followers
59
Following
983
Statuses
187

Next token prediction enjoyer. Currently doing MS in AI.

United Kingdom
Joined December 2010
@spteotia
Sparsh Tewatia
14 days
@JFPuget @kaggle or TPUs
0
0
0
@spteotia
Sparsh Tewatia
21 days
@Jsevillamol Please do the full FrontierMath evaluation on DeepSeek R1 and the distilled models.
0
0
1
@spteotia
Sparsh Tewatia
22 days
@_lewtun @EpochAIResearch It could tell us how much "UNFAIR" advantage OpenAI had from access to the data, since on other benchmarks it is comparable.
0
0
3
@spteotia
Sparsh Tewatia
1 month
@casper_hansen_ It comes down to how much data / how many tokens you have for training. If 8k is enough for your use case and you have few rows (fewer than 50K), it is better to use an instruct model; if you have, say, 100K+, a base model can also be used.
0
0
0
@spteotia
Sparsh Tewatia
1 month
@ethayarajh @YouJiacheng Lol! The exact same thing came to mind. It is so easy to be fooled by just looking at the loss curves.
0
0
0
@spteotia
Sparsh Tewatia
1 month
You know the joke about Chinese AI developers: they made them a model they couldn't understand.
0
0
0
@spteotia
Sparsh Tewatia
2 months
@Locchiu Do you see L5 in our lifetime?
1
0
0
@spteotia
Sparsh Tewatia
2 months
@VictorTaelin What makes you think any of them don't have enough budget for their respective projects lol?
0
0
0
@spteotia
Sparsh Tewatia
2 months
@DaveShapi @AlwaysUhhJustin @ardabrowski Yes, you didn't say anything about consciousness, and that's the point: even if you get the nature of algorithms, math, and computation right, the problem of consciousness remains, and it would need to be tackled for self-aware AI.
0
0
1
@spteotia
Sparsh Tewatia
2 months
@ardabrowski @DaveShapi @AlwaysUhhJustin Read about the hard problem of consciousness. It is not classical computation, that's for sure; we don't know much about the brain at all.
1
0
0
@spteotia
Sparsh Tewatia
2 months
RT @AlbertQJiang: Before NeurIPS, I write down some thoughts about AI4Math and why I am doing LLMs and informal reasoning now. https://t.c…
0
20
0
@spteotia
Sparsh Tewatia
3 months
@omni_georgio A bit more prompt optimization is needed; maybe try DSPy if the dataset is small. Even local models like Qwen 14B work fine for these kinds of tasks for me, but yes, it would depend on how complex the dataset really is.
0
0
0
@spteotia
Sparsh Tewatia
3 months
@benthamite_ Very nice tasks. Added to the list; will be doing them just for fun.
1
0
1
@spteotia
Sparsh Tewatia
3 months
@neurallambda Yep, they are the best in the market right now.
0
0
1
@spteotia
Sparsh Tewatia
3 months
@tensor_kelechi @nrehiew_ You will have to use SPMD and FSDP properly, which is much easier in JAX, but performance is similar, I guess, since both just use XLA at the lower level. Remember, the speed improvement on TPUs compared to GPUs comes only from larger batch sizes due to sharding.
0
0
1
@spteotia
Sparsh Tewatia
3 months
@sun_hanchi @QuanquanGu It's not said anywhere, and I doubt it does that.
0
0
0
@spteotia
Sparsh Tewatia
3 months
@sun_hanchi @QuanquanGu How can you say that it doesn't need that plan? I don't think it has been shown anywhere. Am I missing something?
1
0
0