![Sparsh Tewatia Profile](https://abs.twimg.com/sticky/default_profile_images/default_profile_x96.png)
Sparsh Tewatia
@spteotia
Followers: 59
Following: 983
Statuses: 187
Next token prediction enjoyer. Currently doing an MS in AI.
United Kingdom
Joined December 2010
@Jsevillamol Please do the full FrontierMath evaluation on DeepSeek R1 and the distilled models.
0
0
1
@_lewtun @EpochAIResearch It could tell us how much of an "UNFAIR" advantage OpenAI had with access to the data, since on other benchmarks it is comparable.
0
0
3
@casper_hansen_ It comes down to how much data / how many tokens you have for training. If 8k is enough for your use case and you have few rows (by few I mean fewer than 50K), then it is better to use an instruct model; if you have, say, 100K+, then a base model can also be used (rough sketch of this heuristic below).
0
0
0
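A minimal sketch of that dataset-size heuristic, assuming Hugging Face `transformers`; the Qwen checkpoint names are just examples, and the 50K cut-off is the rule of thumb from the tweet, not a hard rule.

```python
# Sketch of the "instruct vs. base" heuristic (example checkpoint names).
from transformers import AutoModelForCausalLM, AutoTokenizer

def pick_checkpoint(num_rows: int) -> str:
    # Fewer than ~50K rows: start from an instruct model that already follows
    # chat formatting; with 100K+ rows a base model can also be fine-tuned.
    if num_rows < 50_000:
        return "Qwen/Qwen2.5-7B-Instruct"   # example instruct checkpoint
    return "Qwen/Qwen2.5-7B"                # matching base checkpoint

name = pick_checkpoint(num_rows=30_000)
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
```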
@ethayarajh @YouJiacheng Lol! The exact same thing came to mind; it is so easy to be fooled just by looking at the loss curves.
0
0
0
@VictorTaelin What makes you think any of them doesn't have enough budget for their respective projects lol?
0
0
0
@DaveShapi @AlwaysUhhJustin @ardabrowski Yes, you didn't say anything about consciousness; that's the point. Even if you get the nature of algorithms, math, and computation right, the problem of consciousness still remains, and it would need to be tackled for self-aware AI.
0
0
1
@ardabrowski @DaveShapi @AlwaysUhhJustin Read about the hard problem of consciousness. It is not classical computation, that's for sure; we don't know much about the brain at all.
1
0
0
RT @AlbertQJiang: Before NeurIPS, I write down some thoughts about AI4Math and why I am doing LLMs and informal reasoning now. https://t.c…
0
20
0
@omni_georgio A bit more prompt optimization is needed; maybe try DSPy if the dataset is small. Even local models like Qwen 14B work fine for these kinds of tasks for me, but yes, it would depend on how complex the dataset really is (rough sketch below).
0
0
0
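A hedged sketch of that DSPy suggestion, assuming a recent DSPy release and a local Qwen 2.5 14B served via Ollama at the default port; the signature, metric, and tiny trainset here are made up for illustration.

```python
# Sketch: prompt optimization with DSPy against a local Qwen model
# (assumes DSPy >= 2.5 and an Ollama server exposing qwen2.5:14b).
import dspy

lm = dspy.LM("ollama_chat/qwen2.5:14b", api_base="http://localhost:11434")
dspy.configure(lm=lm)

# Hypothetical task: map a question to a short answer.
program = dspy.Predict("question -> answer")

# Tiny hand-made trainset; a small dataset is exactly where this helps.
trainset = [
    dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # Toy metric: case-insensitive string match on the answer field.
    return example.answer.lower() == pred.answer.lower()

# BootstrapFewShot searches for few-shot demos that improve the metric.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
optimized = optimizer.compile(program, trainset=trainset)
print(optimized(question="3 + 3?").answer)
```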
@tensor_kelechi @nrehiew_ You will have to use SPMD / FSDP properly, which is much easier in JAX, but performance is similar I guess, since both just use XLA at the lower level. Remember, the speed improvement on TPUs compared to GPUs comes mainly from the larger batch sizes that sharding enables (sketch below).
0
0
1
@sun_hanchi @QuanquanGu How can you say that it doesn't need that plan? I don't think it has been shown anywhere; am I missing something?
1
0
0