![Sparsh Tewatia Profile](https://abs.twimg.com/sticky/default_profile_images/default_profile_x96.png)
Sparsh Tewatia
@spteotia
Followers: 59
Following: 983
Statuses: 187
Next token prediction enjoyer. Currently doing an MS in AI.
United Kingdom
Joined December 2010
@Jsevillamol Please do the full FrontierMath evaluation on DeepSeek R1 and the distilled models.
0
0
1
@_lewtun @EpochAIResearch It could tell us how much of an "UNFAIR" advantage OpenAI had with access to the data, since on other benchmarks it is comparable.
0
0
3
@casper_hansen_ It comes down to how much data / how many tokens you have for training. If 8k is enough for your use case and you have few rows (by few I mean fewer than 50K), then it is better to use an instruct model; if you have, say, 100K+, then a base model can also be used (rough sketch of this heuristic below).
0
0
0
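A minimal sketch of that dataset-size heuristic, assuming Hugging Face `transformers`; the Qwen checkpoint names are just examples, and the 50K cut-off is the rule of thumb from the tweet, not a hard rule.

```python
# Sketch of the "instruct vs. base" heuristic (example checkpoint names).
from transformers import AutoModelForCausalLM, AutoTokenizer

def pick_checkpoint(num_rows: int) -> str:
    # Fewer than ~50K rows: start from an instruct model that already follows
    # chat formatting; with 100K+ rows a base model can also be fine-tuned.
    if num_rows < 50_000:
        return "Qwen/Qwen2.5-7B-Instruct"   # example instruct checkpoint
    return "Qwen/Qwen2.5-7B"                # matching base checkpoint

name = pick_checkpoint(num_rows=30_000)
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
```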
@ethayarajh @YouJiacheng Lol! The exact same thing came to mind; it is so easy to be fooled just by looking at the loss curves.
0
0
0
@VictorTaelin What makes you think any of them doesn't have enough budget for their respective projects lol?
0
0
0
@DaveShapi @AlwaysUhhJustin @ardabrowski Yes, you didn't say anything about consciousness; that's the point. Even if you get the nature of algorithms, math, and computation right, the problem of consciousness still remains, and it would need to be tackled for self-aware AI.
0
0
1
@ardabrowski @DaveShapi @AlwaysUhhJustin Read about the hard problem of consciousness. It is not classical computation, that's for sure; we don't know much about the brain at all.
1
0
0
RT @AlbertQJiang: Before NeurIPS, I write down some thoughts about AI4Math and why I am doing LLMs and informal reasoning now. https://t.c…
0
20
0
@omni_georgio A bit more prompt optimization is needed; maybe try DSPy if the dataset is small. Even local models like Qwen 14B work fine for these kinds of tasks for me, but yes, it would depend on how complex the dataset really is (rough sketch below).
0
0
0
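A hedged sketch of that DSPy suggestion, assuming a recent DSPy release and a local Qwen 2.5 14B served via Ollama at the default port; the signature, metric, and tiny trainset here are made up for illustration.

```python
# Sketch: prompt optimization with DSPy against a local Qwen model
# (assumes DSPy >= 2.5 and an Ollama server exposing qwen2.5:14b).
import dspy

lm = dspy.LM("ollama_chat/qwen2.5:14b", api_base="http://localhost:11434")
dspy.configure(lm=lm)

# Hypothetical task: map a question to a short answer.
program = dspy.Predict("question -> answer")

# Tiny hand-made trainset; a small dataset is exactly where this helps.
trainset = [
    dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # Toy metric: case-insensitive string match on the answer field.
    return example.answer.lower() == pred.answer.lower()

# BootstrapFewShot searches for few-shot demos that improve the metric.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
optimized = optimizer.compile(program, trainset=trainset)
print(optimized(question="3 + 3?").answer)
```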
@tensor_kelechi @nrehiew_ You will have to use SPMD / FSDP properly, which is much easier in JAX, but performance is similar I guess, since both just use XLA at the lower level. Remember, the speed improvement on TPUs compared to GPUs comes mainly from the larger batch sizes that sharding enables (sketch below).
0
0
1
@sun_hanchi @QuanquanGu How can you say that it doesn't need that plan? I don't think it has been shown anywhere; am I missing something?
1
0
0