![Rajhans Samdani Profile](https://pbs.twimg.com/profile_images/864578132052803584/QGuAz_G7_x96.jpg)
Rajhans Samdani
@rajhans_samdani
Followers: 1K
Following: 3K
Statuses: 2K
Principal Eng at Snowflake. Previously: Head of ML @Neeva, Chief scientist at @askspoke, research scientist @google. IIT Bombay. UIUC.
Joined October 2011
Kinda supports the hypothesis that RL works *now* because the pretraining data in 2024 contains enough AI-generated crap with CoT.
Some comparison of the completions at the start vs the end of training. It basically unlearns all base model behavior. It also learns to "hardcode" "let's think step by step", which is pretty cool.
And here it is. Hosted entirely within Snowflake's security perimeter.
AI innovation is all about choice and experimentation. @SnowflakeDB customers can now preview DeepSeek in Cortex AI and explore its benefits, all while leveraging Snowflake’s built-in security and governance. Check it out! 🚀
🔜❄️
If you want to use DeepSeek and don't want to send data to China, here are two easy options:

1. Use Perplexity
   Just select "Reasoning with R1" in the dropdown. Honestly, this is the best version of Perplexity I've used yet.
   Cons: Queries per day are limited
2. Use Ollama to install DeepSeek locally
   a) Download and install Ollama
   b) Install DeepSeek in Terminal: `ollama run deepseek-r1:7b`
   c) Go to http://localhost:3000 and talk to the model

Check out my full DeepSeek guide for more on:
- How AI reasoning models are different
- The best way to prompt DeepSeek
- 5 use cases that AI models excel at

📌 Read now:
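(Editorial aside, not from the tweet.) Once the `ollama run` step above has pulled the model, Ollama also exposes a local HTTP API (default port 11434), so you can script against the model without any web UI. A minimal Python sketch, assuming the `deepseek-r1:7b` tag from step (b) and the `requests` package:

```python
# Minimal sketch: query a locally running Ollama instance.
# Assumes `ollama run deepseek-r1:7b` (or `ollama pull deepseek-r1:7b`) has already been done,
# so the model is available and the Ollama server is listening on its default port 11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "Explain in two sentences why reasoning models emit a chain of thought.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])  # the completion text, reasoning included
```

Prompt and completion stay on localhost, which is the whole point of the local option.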
@deliprao @cosminnegruseri Always surprised by the takes that combine “LLMs will automate SWE” and “learn CUDA”. OTOH, good luck automating a simple flag flip or adding data persistence that requires design approvals from 4 diff enterprise security teams.
@arimorcos I’m equally wary of Alex’s COI here but to be fair, it’s not clear the inputs and outputs (not the traces) were *not* created by humans. No?
@cosminnegruseri DeepSeek has been great for a while (even v2 was very good). All the baselines are higher for that matter (and easily beat my codellama finetune from then). New understanding of instruct tuning has also emerged. I wouldn't do Vicuna-style "let's throw everything at the model" now.
@yoavgo I think it’s the “chain of thought as a product” that is driving all the attention. Kinda like how ChatGPT went viral with “sing this in the style of a cowboy”-style demos.
Kinda cool when someone you follow on GitHub joins your org! Welcome Stas.
I'm excited to announce I'm joining the @SnowflakeDB AI Research team, which currently includes many of the original DeepSpeed team members (!), and will be working on the open-source ArcticTraining and DeepSpeed to add new features and improve performance and ease of use.
@DanHendrycks yes, I do think this is the simplest (may or may not be correct) explanation: OpenAI bears the cost of research and ablations in many cases.
@jeremyphoward If you’re looking for Occam’s razor, the *simplest* explanation is a 15 min phone call that leaked the main idea. Not saying that happened but the main idea is super simple, most of the cost is in ablations, and people talk.
RT @rajhans_samdani: @jeremyphoward If you’re looking for Occam’s razor, the *simplest* explanation is a 15 min phone call that leaked the…
@jeremyphoward If you’re looking for Occam’s razor, the *simplest* explanation is a 15 min phone call that leaked the main idea. Not saying that happened but the main idea is super simple, most of the cost is in ablations, and people talk.
This. You can tell the n00bs apart by their newfound excitement for DeepSeek.
Everyone's abuzz about DeepSeek. They've always been very inventive, whether DS Coder models (data recipe) or MoE models (model arch). OTOH, the $ stuff is overblown. GPT-4 = 13T tokens, 200B active params. 5x more compute. GPT-4o far less. Real cost is thousands of failed ablations.