Shashank Rajput Profile
Shashank Rajput

@shashank_r12

Followers
684
Following
566
Media
7
Statuses
169
@shashank_r12
Shashank Rajput
4 months
To all my friends: Over the past 3 months, whenever I said "Sorry, can't make it tonight, gotta work" or "Sorry, I'm busy this weekend", but then couldn't really say what exactly we were working on, THIS was the monster we were building @DbrxMosaicAI . #DBRX
@jefrankle
Jonathan Frankle
4 months
Meet DBRX, a new sota open llm from @databricks . It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
33
266
1K
13
12
168
@shashank_r12
Shashank Rajput
1 year
Thank you @DimitrisPapail for being such an amazing advisor!!! Your help and guidance have been invaluable throughout my phd! I am also extremely lucky to have had the opportunity to collaborate with some really brilliant researchers from various universities and organizations!
@DimitrisPapail
Dimitris Papailiopoulos
1 year
PapaiLLM lab represent. Congrats to @shashank_r12 for defending a 600 page tour de force of a thesis!!
1
4
59
2
3
65
@shashank_r12
Shashank Rajput
1 year
Our paper was recently featured in the @tldrnewsletter!!! Feeling lucky to have collaborated on this project with @madiator, @_nikhilmehta, @YiTayML, @vqctran, and other amazing people at Google Brain!
@madiator
Mahesh Sathiamoorthy #ICML2024
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
13
73
480
1
4
30
@shashank_r12
Shashank Rajput
2 months
Thank you @sopharicks for inviting me for the talk! It was great talking to you and the other members of BuzzRobot!
@sopharicks
Sophia
2 months
It was great to have @shashank_r12 share with the BuzzRobot community details about DBRX, the large language model created by @databricks . Shashank walked us through the architecture of the model, hyperparameter choices, the software and hardware issues the team experienced
0
2
14
2
3
24
@shashank_r12
Shashank Rajput
4 months
So true!!! 🥲
@code_star
Cody Blakeney
4 months
Me at work for the past 2 weeks
Tweet media one
2
13
381
1
0
24
@shashank_r12
Shashank Rajput
2 months
A big advantage of two column paper format is that people can comfortably read your paper on their phones. Kind of embarrassing that I only realized this today, after years of reading papers on my phone 😅
2
1
19
@shashank_r12
Shashank Rajput
8 months
@OfirPress @BlancheMinerva @xlr8harder In our (preliminary) experiments we also see ALiBi and RoPE have matching training curves (in fact ALiBi converges a bit faster initially). Performance is also similar for eval on seq lens less than the max seq len seen during training. After that, ALiBi extrapolates better.
4
1
16
@shashank_r12
Shashank Rajput
4 months
@DbrxMosaicAI Feeling ecstatic that all the hard work by the team paid off! It was the greatest experience working with all the talented and hardworking folks @DbrxMosaicAI! Looking forward to building even bigger and better LLMs!
0
0
11
@shashank_r12
Shashank Rajput
2 months
@HongyiWang10 is one of the best researchers that I've worked with. I was really lucky to have him as a senior phd student in the lab when I started my phd. He is one of the few people I know who has comprehensive expertise in both ML and Systems. Congratulations Hongyi!!!
@HongyiWang10
Hongyi Wang
2 months
[1/n] I'm thrilled to share that I will join the Rutgers CS Department @RutgersCS as a tenure-track Assistant Professor in the summer of 2025! I'm excited about and looking forward to this new chapter of my career journey!
42
8
263
1
1
9
@shashank_r12
Shashank Rajput
3 months
Even GPT2-chatbot isn't immune to the Sharma-ji-ka-beta syndrome
Tweet media one
1
0
6
@shashank_r12
Shashank Rajput
8 months
@DimitrisPapail @OfirPress @BlancheMinerva @xlr8harder Yes, but we observed that it is very sensitive to learning rates.
0
0
4
@shashank_r12
Shashank Rajput
1 year
@srchvrs @madiator @tingchenai @_nikhilmehta @YiTayML @vqctran Yes, the Amazon dataset is a particularly difficult dataset in terms of recall. Other product recommendation datasets like REES46 and YOOCHOOSE are 'easier', where a recall of ~0.5 can be reached.
0
0
5
@shashank_r12
Shashank Rajput
3 months
@Gradient_AI_ @AIatMeta @huggingface @CrusoeEnergy Wow! Amazing work! It seems that you used 2.8 billion as the RoPE theta, which is much, much bigger than any RoPE thetas seen in other models. How did you come up with that value?
1
0
5
@shashank_r12
Shashank Rajput
4 months
@mvpatel2000 @DbrxMosaicAI Says you, who doesn't really understand what a vacation is!
0
0
5
@shashank_r12
Shashank Rajput
2 years
@madiator Thank you for hosting me @madiator! It was amazing working with you and the rest of the Google Brain team!
0
0
4
@shashank_r12
Shashank Rajput
4 months
@maxisawesome538 @DbrxMosaicAI no, you the captain 😂
1
0
4
@shashank_r12
Shashank Rajput
4 years
Do single author papers qualify to have a Discussion section?
1
0
3
@shashank_r12
Shashank Rajput
2 years
@JeffDean @DimitrisPapail Thank you @JeffDean ! And thank you @GoogleAI for the opportunity! Looking forward to a productive research collaboration with Google!
0
0
3
@shashank_r12
Shashank Rajput
4 years
Can we bring the mullet back into fashion? Cutting hair on the back of the head is difficult! #Quarantine
0
0
3
@shashank_r12
Shashank Rajput
4 years
If WeChat is banned then how will Chinese people working in the US communicate with their families back home? I've been told of alternatives like QQ and Skype, but what is the guarantee that the US or even China (in retaliation) wouldn't ban these in the future!
0
1
2
@shashank_r12
Shashank Rajput
1 year
@zeroXmusashi @madiator @_nikhilmehta @YiTayML @vqctran Yes, in order to use this to build a recommender system for a dataset, you only need two things: user session data, and some semantic data about each item. For tiktok, the latter could be stuff like video title, tags, caption or creator's name.
1
0
1
@shashank_r12
Shashank Rajput
5 months
@maxisawesome538 Business plan for today:
Tweet media one
0
0
1
@shashank_r12
Shashank Rajput
3 years
@unsorsodicorda @aminkarbasi @DimitrisPapail @ten10_93 Yes, this paper by Park et al. provides results for memorization using ReLU networks, and uses a separation assumption similar to ours.
1
0
1
@shashank_r12
Shashank Rajput
2 years
@DimitrisPapail Thank you @DimitrisPapail, it has been great working with you! Looking forward to a lot more productive and fun research ahead!
0
0
1
@shashank_r12
Shashank Rajput
4 years
@moayush
Ministry of Ayush
4 years
It is clarified that the @moayush has not removed any doctor or medical officer from duty or service at any time in the recent past.
197
287
823
1
0
1
@shashank_r12
Shashank Rajput
3 years
Congratulations Dr. Hongyi Wang!!!! @HongyiWang10
@DimitrisPapail
Dimitris Papailiopoulos
3 years
My first PhD student defended today, and it filled my heart with bittersweet joy. Congratulations Dr. Hongyi Wang @HongyiWang10 ! It has been an incredible honor to serve as your advisor. I can't wait to see the great things you will accomplish.
10
1
352
1
0
1
@shashank_r12
Shashank Rajput
1 year
@KartikSreeni Thank you Kartik!
0
0
1
@shashank_r12
Shashank Rajput
1 month
@mvpatel2000 Keeps selecting the same person for random search because nobody bothered randomizing the global seed.
0
0
1
@shashank_r12
Shashank Rajput
2 months
@sopharicks I would love to! :)
0
0
1
@shashank_r12
Shashank Rajput
3 years
@unsorsodicorda @aminkarbasi @DimitrisPapail @ten10_93 Great question! We have a margin assumption on the points, and in fact, the "exponential improvement" is in the dependence on the margin. Indeed, modern DNNs can interpolate training datasets, and this was one of the motivations for our work.
1
0
1