1. I worked my ass off during my undergrad at a tier 3 college. This mostly involved memorization. A lot of time wasted.
2. I spent a year of my Masters being depressed. A lot of time wasted.
3. I spent most of my PhD being depressed and borderline suicidal. A lot of time wasted.
Still coping with how I wasted my first 2 years at college doing nothing 🥲
My prime years, going to waste
Rebounding now ofc, but it would be nice to have had that headstart 🫠
anyone relate? lol
Today I learned that the softmax tempering during training trick I discovered in 2020 was used by deepmind for their alphacode system and it boosts performance by 1% or so. My paper has exactly 1 citation and it's from deepmind. Unbelievable!
Some of you were interested in my failures. Here are the most notable ones, followed by lessons:
1. 7th grade: Failed to get the cutoff marks for a statewide scholarship exam.
2. 10th grade: Failed to get top rank in my school despite studying for 12 hours a day.
3. 12th grade:
Reminds me of how I got into IIT Bombay as an RA.
Story time:
I had an AIR of 110 in GATE and I was mostly confident that I would get into IITB via direct admission, but after round 1 the cutoff was 100 IIRC. I went into panic mode and started applying to other IITs and IISc. This is
As someone who has dabbled with this a lot, some lessons:
1. If you can fit your model and optimizer on one GPU, then use DDP. Use gradient accumulation to increase the effective batch size as needed.
2. If (1) doesn't work, then try an 8-bit optimizer via bitsandbytes. (Praise be
@Tim_Dettmers
).
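The gradient-accumulation tip above can be sketched without any framework at all. This is a toy, dependency-free illustration (a one-parameter linear model, hypothetical data): gradients are summed over `accum_steps` micro-batches and one optimizer step is taken, so the effective batch size is micro-batch size times `accum_steps` per GPU. With PyTorch you would call `loss.backward()` per micro-batch and `optimizer.step()` / `optimizer.zero_grad()` once per accumulation window.

```python
def train(xs, ys, steps=30, lr=0.1, accum_steps=4):
    # Fit y = w*x by gradient descent with gradient accumulation.
    w = 0.0
    for _ in range(steps):
        grad_sum = 0.0
        for x, y in zip(xs[:accum_steps], ys[:accum_steps]):
            # d/dw of (w*x - y)^2 for one "micro-batch" of size 1
            grad_sum += 2.0 * (w * x - y) * x
        # one optimizer step per accumulation window, on the averaged gradient
        w -= lr * grad_sum / accum_steps
    return w

xs = [1.0, 2.0, 1.5, 0.5]
ys = [3.0 * x for x in xs]  # true slope is 3
w = train(xs, ys)
```

The point of the pattern: the update uses the gradient of the full accumulated batch, so memory stays at micro-batch scale while the optimization behaves like a larger batch.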
Something I learned about FSDP today:
If you can fit your model on one GPU, or even within one node, then don't use FSDP. The communication overhead makes it slower, and your MFU will drop compared to DDP or pipeline parallelism.
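As a back-of-the-envelope version of the advice above, here is a hedged sketch of a strategy chooser. The 16 bytes/param figure is a common rule of thumb for mixed-precision Adam training (bf16 weights + bf16 grads + fp32 master copy + fp32 Adam m and v); activations are handwaved via a 0.7 headroom factor, and the GPU sizes are illustrative assumptions, not recommendations.

```python
def pick_strategy(n_params, gpu_mem_gb=80, gpus_per_node=8,
                  bytes_per_param=16, headroom=0.7):
    # rough memory needed for model + grads + optimizer states, in GB
    need_gb = n_params * bytes_per_param / 1e9
    if need_gb <= gpu_mem_gb * headroom:
        return "DDP"  # everything fits on one GPU
    if need_gb <= gpu_mem_gb * gpus_per_node * headroom:
        return "pipeline parallelism"  # fits within one node
    return "FSDP"  # must shard across nodes

pick_strategy(1e9)  # ~16 GB needed -> fits one 80 GB GPU -> "DDP"
```

The thresholds are crude, but they capture the tweet's decision rule: shard only when you actually have to.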
🔗
🚨 📢 Preprint Alert
After more than a year of hard work, we are pleased to introduce IndicTrans2, the first machine translation system supporting all 22 scheduled Indic languages.
📎:
💻:
▶️:
Thread👇[1/n]
If you're not going to listen to me then at least listen to the master. If you're a student or an academic dreaming of making LLMs for Indian languages, stop wasting your time. You're not going to make it. Instead focus on more fundamental and focused problems like:
1. How to get
If you are a student or academic researcher and want to make progress towards human-level AI:
>>>DO NOT WORK ON LLMs<<<
LLMs are an off ramp.
Thousands of engineers are working on LLMs with enormous computing resources.
The only way you could possibly contribute is by analyzing
Right now I have about 50 emails from non-CS major people wanting to pursue an internship in AI/ML claiming "robust foundation in AI/ML"! Literally everyone wants in!
Extremely excited and humbled to announce that my appointment as an Adjunct Faculty at IIT Madras has been approved!
With this, I hope to deeply involve myself with Indian academia and work with brilliant minds, starting at AI4Bharat!
New Website:
Pardon the baggy eyes. Mans was watching Kdrama till 2 am and woke up at 7 am.
I would like to share this award with my collaborators in
@ai4bharat
.
Moral of the story: Create large scale, open source datasets and models, and the Japanese government will give you an award. ;)
Similar experience.
Year 1: 1 paper
Year 2: 0 papers
Year 3, first half: 0 papers
Year 3, second half: 5 papers (3 conf+2 journals)
Don't underestimate the power of accumulation and time.
Idk who needs to hear this, but entering grad school, I had 0 papers. Awarded THE Google PhD fellowship with 0 published papers. First year as a fellow, co-authored a CHI paper. Over a year later: 4 papers accepted in one month. Things take time. Just keep trying and applying. 🫶🏽
In other news: I just got promoted for the second time in 3 months. As usual I can't take credit for this and owe it all to my generous and outstanding bosses and collaborators.
What kind of nutrition are the younger generation on? I have emails from 2nd year undergraduate kids with strong passion for ML and publications.
In my 2nd year of a tier 3 university, I was still struggling with programming. Will my kids be born preloaded with ML foundations?
Call me a sucker or an idealist but IIT Bombay's motto is just perfect: Knowledge is the supreme goal.
It sounds even better in Sanskrit: ज्ञानं परमं ध्येयम् (jnanam paramam dhyeyam).
If you send a long, long email asking for an internship, the odds are that I will ignore it. It was most likely generated or edited by ChatGPT and I ain't gonna read all that yapping.
Get to the point!
For example:
Hello! My name is Raj Dabre from St Francis Institute of
TW: Depression/Suicide
Everyone knows that people who pursue a Ph.D. often have to deal with mental health problems, often depression and sometimes suicide. Not everyone talks about this out loud, but I think it's best we do and have an open communication. At least I will, and I
Get a Masters and PhD at world class institutions. Work for the Japanese government. Publish several papers. Get appointed as an adjunct faculty at an IIT. Have the ability to be a mentor to lots of students. Still be someone's bandar (monkey).
Winning!
Ok so since it's becoming popular to shit on IITians, I'd like to share some perspectives as a person who has seen both sides of the story.
Background: Took JEE. Major fail. Did UG in a tier 3 college. Took GATE. Got into an IIT.
1. JEE prep was one of the hardest things I ever
Hard cope.
Not one "ML/AI researcher" in India is anywhere close to Andrew Ng, Ian Goodfellow, Andrej Karpathy, Ilya Sutskever, etc (most of the folks mentioned are less than 40 years of age btw)
You think the JEE grind is the only way to learn math? Absolute joke.
HuggingFace has directly or indirectly created a generation of data engineers who can build classifiers, AI engines and what not, but cannot explain the Transformer model.
Well done!
The AI rat race has become so toxic that there are labs in the US/Europe that literally run intern farms with hundreds of people. They assign 5 interns per master's or PhD student to do grunt work and then spam publications. I understand that people are desperate but it feels wrong!
Unfortunately, this is very prevalent.
I have examples of people who:
1. Faked paper acceptances by claiming that they were published in A* venues just to get into advisor positions and rack up middle author publications.
2. Worked on a project where juniors/interns did the
It's disgusting seeing a PhD make others do all the work due to his skill issues. To make it worse, this guy neither acknowledged their contributions in the writing (authorship) nor paid them. This is clearly an academic integrity violation, and he should be punished.
For people writing papers for ACL 2024 from scratch for next week's deadline, I would like to share some tips:
1. Always write the methodology section first, followed by the experiments and results sections. The introduction will only make sense when you know exactly what your results say.
5 out of 6
#ACL2024
papers accepted. 3 main and 2 findings. More details soon.
Congratulations to all my co-authors!
Thank you
@aclmeeting
for releasing decisions on time!
It's my wife's birthday and this is my present to her. I hope that I can further cement my connection to Indian academia with my appointments to IITB and IITM. :)
I would not trust half the info in this. It's clear that the writers did not do their homework, nor do they know what LLMs are.
1. IndicBERT and IndicBART are not LLMs.
2. There is no proof of 2T tokens for Indic languages.
3. 120+ languages is very unrealistic if not an outright lie.
Meta: We are committed to open-source so that everyone can use our LLMs.
Also, meta: Releases a model so large that only 1% of people or less can use it. :/
Viva La Revolucion!
FWIW, I've interviewed over 50 people and only 2 people managed to give satisfactory answers. Some people suck at basic coding but can build pipelines for ML engines.
I'll say it again: You shall not pass.
Took my first interview and copied
@prajdabre1
's question of discussing how a transformer works in depth,
not only did the candidate fail to answer the question, they couldn't even tell me what recursion is, and btw they are an AI Scientist at some startup lol.
TW: Depression/Suicide.
This is the second half of my post about my challenges and how I overcame them. Read the quoted tweet for the first half.
So there I was, a nervous, self-deprecating wreck of a kid in a premier institution in the middle of my Master’s degree in 2013. My
Translation: I have received an outstanding performance award from NICT (my employer; indirectly the Japanese government) for my work on IndicTrans2 which helped improve translation between Japanese and 22 Indian languages via improving translation between English and Indic
It's not humanly possible to co-author more than 5 to 10 papers in just half a year unless you are an end-career prof or the leader of a major organization who didn't just pop out of nowhere. If someone at my career stage and age co-authors 40 papers in 6 months, then that means
Yesterday we had our first paper reading session in the
@ai4bharat
discord covering LoRA, DoRA and ReFT (
@aryaman2020
). This is to be the first of many.
Recordings:
Link to discord:
Join us!
As part of the AI4Bharat discord weekly event, I will be doing a hands-on tutorial on BART/IndicBART pretraining and fine-tuning. If you're interested, then join us from 6 PM IST.
P(Y) is the equation for decoder-only models.
P(Y|X) is the equation for encoder-decoder models.
But what is the equation for encoder-only models, specifically BERT?
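One common answer, stated with hedging: BERT's masked-LM objective is not a proper factorization of P(Y) at all. It scores a set of masked positions given the unmasked rest, which amounts to a pseudo-likelihood:

```latex
% Decoder-only (autoregressive):
P(Y) = \prod_{t=1}^{T} p(y_t \mid y_{<t})
% Encoder-decoder:
P(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, X)
% Encoder-only (BERT): for a randomly masked set M,
\mathcal{L}_{\mathrm{MLM}} = \prod_{t \in M} p\left(y_t \mid y_{\setminus M}\right)
% -- a pseudo-likelihood over masked positions, not a factorization of P(Y)
```

This is why sampling coherent text from BERT is awkward: there is no chain-rule decomposition to decode left to right.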
Gemma model by Google has a multilingual tokenizer with decent fertility for a number of languages. Now you know why the tokenizer has 256k subwords. Make of it what you will!
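Fertility here means the average number of subword tokens emitted per word; values near 1.0 mean the vocabulary covers the language well. A toy sketch of the metric, where the two "tokenizers" are hypothetical stand-ins (not Gemma's real tokenizer):

```python
def fertility(tokenize, words):
    # average subword tokens per whitespace word;
    # ~1.0 means most words stay whole, high values mean heavy splitting
    return sum(len(tokenize(w)) for w in words) / len(words)

# Stand-in tokenizers: a small vocab falls back to characters for
# unknown words; a larger vocab keeps more words whole.
SMALL_VOCAB = {"the", "is"}
LARGE_VOCAB = SMALL_VOCAB | {"model", "multilingual"}

def small_tok(w):
    return [w] if w in SMALL_VOCAB else list(w)

def large_tok(w):
    return [w] if w in LARGE_VOCAB else list(w)

words = "the model is multilingual".split()
fertility(small_tok, words)  # 4.75: most words shatter into characters
fertility(large_tok, words)  # 1.0: every word is a single token
```

The same logic is why a 256k-subword vocabulary helps multilingual models: more languages get whole-word tokens instead of character soup.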
Last month (year?) I had the honor of having dinner with Prof Pushpak Bhattacharya, of CFILT in IIT Bombay. Prof Pushpak is one of the biggest names in NLP, and in India there is almost no one who doesn't know him. He gave me two nuggets of wisdom as follows:
(1/N)
Back in my 1st year of UG, I wrote down some nasty, nasty, nasty things on the class desk about a teacher who had mildly inconvenienced me. (I was stupid back then). I just happened to sit down on my friend's desk the day I did it. Teachers do a random year-end inspection of the
Mostly saying it for myself right now, but if it helps someone else then:
Rejection and failure is a better teacher than acceptance and victory. How you respond to it, how you adapt to it, defines your character. One door closes, another one opens.
I gave a talk in Microsoft Research India's SNLP Reading Group on "Addressing the Data and Modeling Challenges in NLG for Indian Languages"
Please feel free to go through:
I hope to make the video recording available soon!
Thanks,
@VarunGumma23
and
Advice to reviewers: If the paper has typos or some missing information, then that's supposed to be handled under the comments and suggestions section. Listing it as a weakness to dock points is absolutely asinine. If the paper is badly written, it can be counted as a weakness if
I am extremely pleased to announce that IndicTrans2 will be published in TMLR (
@TmlrOrg
). This is a tremendous achievement for my coauthors and me that took nearly 1.5 years of hard work. The camera ready version will be out soon but for now we are over the moon!
#NLProc
#ACL
Today is my last day in India for a while. 7 months ago, my bosses sent me on a business trip to India to strengthen ties between Japan (NICT) and India (IIT Bombay and Madras). The idea of living in India for 7 months after mostly being in Japan for 9 years was daunting.
1/N
If you see "GenAI leadership" in anyone's profile then it's safe to assume that they are 99% a fraud. No real leader needs to announce that they are one.
You know who is a real GenAI leader? Our lord and savior
@karpathy
. He is a teacher and a doer. The true master.
People be taking llama2, expanding vocabulary, pretraining on 2B tokens of a language and calling it a product. Bruh I have trained like 50 such models but I can tell you that outside of being useful to answer some research questions they are utter garbaggio.
Strongly recommend reading this paper to understand that there is still a lot of optimization we can do. 8-bit training will soon be integrated into YANMTT. Training billion-parameter models should be available to all.
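For intuition, here is a dependency-free sketch of the core trick behind 8-bit optimizers like bitsandbytes: store optimizer state as int8 with one absmax scale per block, dequantizing on the fly. This is only the idea under simplifying assumptions; real implementations use dynamic quantization maps and block sizes such as 2048.

```python
def quantize_block(xs):
    # absmax int8 quantization of one block of optimizer state:
    # keep one float scale per block, map values into [-127, 127]
    scale = max(abs(x) for x in xs) or 1.0
    return [round(x / scale * 127) for x in xs], scale

def dequantize_block(qs, scale):
    # recover approximate floats; max error is about scale / 254
    return [q / 127 * scale for q in qs]

state = [0.1, -0.5, 0.25, 0.0]
qs, scale = quantize_block(state)
approx = dequantize_block(qs, scale)
```

Stored this way, Adam's m and v states shrink from 4 bytes to roughly 1 byte per parameter, which is where most of the memory savings come from.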
Every PhD student has had at least one moment where they got a result that they thought was a game changer. 10 mins of euphoria later: Oh crap, I trained on the test set.
This is why we are skeptical of most things.
In my UG university, I know of profs who did some really scummy things like:
1. Stealing my thesis advisor's idea to finish her own PhD. My thesis advisor reported this to the director but the reply was: please resolve this among yourselves.
2. Copy pasting content from
The meta reviewer is clearly messing with us. This paper got a 4 4 3 for soundness as well as excitement. The suggested improvement is totally inconsequential since our methods only involve prompting and a comparison against fine-tuning makes no sense.
@emnlpmeeting
Proud to announce that 100% of my ICLR papers have been accepted. I submitted 0 and got 0. That's a win right?
Jokes apart, congratulations to the people whose papers got in. To those whose didn't: papers don't define your worth. You will make it next time.
I wanna take a gap year just to do research…
Don’t wanna study…
Imma never gonna get a PhD offer if I don’t do more research 😭😭😭
And also learn more fundamental Deep Learning
IndicTrans2 has seen 152,000 downloads in the past month alone. How is this possible? It was because
@VarunGumma23
, with the help of
@jaygala24
and
@pranjalchitale
, went through a mad grind sesh to port the models to
@huggingface
. Just wait for some newer models we are cooking.
Update: I did one-shot MT eval on the FLORES dev set.
Hindi-English: 37.2 BLEU
Gujarati-English: 18.3 BLEU
English-Hindi: 25.5 BLEU
English-Gujarati: 6.6 BLEU
This model is already decently multilingual. This easily beats LLAMA2!
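For context on what these numbers measure: BLEU is a clipped n-gram precision combined with a brevity penalty. Real evaluations should use sacrebleu, which handles tokenization and smoothing; the following is only a minimal single-reference sketch of the computation, with smoothing omitted.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    # geometric mean of clipped 1..4-gram precisions times a brevity
    # penalty; without smoothing, any zero precision collapses to 0
    match, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngram_counts(h, n), ngram_counts(r, n)
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(total) == 0 or min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)

corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])  # 100.0
```

One takeaway from the sketch: BLEU is corpus-level and precision-based, so a 6.6 for English-Gujarati and a 37.2 for Hindi-English are not directly comparable across language pairs or tokenizations.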
RIP!
Press F to pay respects!
Yayy! The biggest work on MT for Creoles has been accepted to
@naaclmeeting
! We cover 42 Creoles and our MT models deliver reasonable quality given the extremely low resource nature of most of these Creoles! Come work on Creoles!
Thanks to all my collaborators in JHU!
If you eat mangoes with a spoon or a fork then you need redemption beyond what Jesus' sacrifice enabled. Eat like a starving degenerate and your soul will be saved.
I am extremely honoured and pleased to share our survey titled:
Natural Language Processing for Dialects: A Survey.
Ours is the first of its kind, comprehensive survey of NLP for dialects.
@aadi_joshi
@diptesh
@haffari
BharatGPT, Krutrim/Ola etc have been yapping about GPT models for India for almost half a year now but we have seen only PR and no real models. How about less PR and actual tangible outputs?
In contrast, we have seen actual models put out by
@SarvamAI
and
@ai4bharat
.
🚨🚨 New preprint 🚨🚨
Presenting: An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
Paper:
Code:
@anoopk
@ratishsp
@nandi_mundra
The recordings for the last 5 paper reading sessions on the
@ai4bharat
discord are now up on YouTube:
Session
#4
on Whisper Style Training
Session
#5
on SSMs and Mamba
Session
#6
on SSMs and Mamba (Part 2)
People I know: Why don't you apply to Google, you have sufficient publications?
Me (who has been rejected 3 times so far): ummmmm
Unless you are the right person at the right time, no amount of résumé polish will be enough.
@karpathy
goes on vacation to Bhutan, comes back, writes llm.c, breaks the internet, appreciates Ashley, gets her suspended and moves on to the next big thing.
I go on vacation, come back and think: I need a vacation to recover from the vacation.
This is why he is the MASTER!
🚨🚨🚨 New paper on LMs trained with machine translated data!
We all know that billions to trillions of tokens are needed to get the best LLMs. But not all languages have such data.
Q: What to do?
A: Just create it using machine translation!
Paper:
NeurIPS having a high school student track is going to do wonders for inclusion. Imagine a utopia where only the most privileged kids get a head start in this field. Underprivileged kids should just give up and opt for janitorial roles.
(Sarcasm)
All great researchers have top-tier coding skills. But not all people with top-tier coding skills can be great researchers. Mid-tier coding skills mean you will be mid at best unless you accept a purely managerial role. Even then you will be limited by the impact you have,
Another (shamelessly self-promotional) pro: If you do a PhD at IIT Bombay in CFILT or IIT Madras in AI4Bharat, then there is a chance that we will cross paths, and I always take good care of students I work with.
This Resume has an ATS score of more than 92 🤯
This resume helped many people get interview calls from companies like Google, Microsoft, Amazon, and many more. 💼
I have personally used this single-column resume in my job hunting and got amazing results
I am sharing the
The purpose of my account went from academia, memes and light trolling to gradually including exposing scammers and grifters in academia. I only have my spooky stalking skills and a tiny amount of pettiness to blame for this. Note that stalking skills can be used for good, for
A student graduating from their degree told me (paraphrasing): Although you were not our thesis advisor on paper, you were the go-to guy for all our small and big problems, so we could freely do what we wanted knowing that there was a safety net.
He may have been your father, boy,
📽️ New 4 hour (lol) video lecture on YouTube:
"Let’s reproduce GPT-2 (124M)"
The video ended up so long because it is... comprehensive: we start with an empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize
Back in 2022, I made KreoleMorisienMT, a translation dataset and model for Mauritian Creole translation. I now present CreoleM2M, a translation system for 26 creoles. This is a tiny teaser from our much larger work on creole NLP. For now, here is a demo:
In 24 hours, some of us will be laughing, some of us will be crying, but all of us will be suffering. Hang in there. Just remember that we publish papers because we have to, and we do research because we want to. Sometimes we have to do things so that we can do what we want to!
Translation: I have received an outstanding performance award from NICT (my employer; indirectly the Japanese government) for my work on IndicTrans2 which helped improve translation between Japanese and 22 Indian languages via improving translation between English and Indic
🚀IndicLLMSuite Launch Announcement!🚀
We're thrilled to unveil IndicLLMSuite: A collection of data resources and tools for developing Indic LLMs.
📜 Paper:
🌐 Blog (the way forward):
💻 Resources:
(1/n)
This paper summarizes exactly what I've been yelling at people who use LLMs to evaluate LLMs, especially LLMs that evaluate themselves. LLMs will likely favor LLMish generations and will certainly favor themselves. What were you even thinking?
One major cultural gap between Japanese and Indians is how we state our intentions and (not) plan.
Following is how things would be a decade ago:
If I said to my Japanese friend that I'm thinking of meeting up with them, they would take it very seriously and actually mentally
Just checked out another GenAI leader's profile, with 40K+ followers on LinkedIn, saying that they are a visiting lecturer at Oxford and MIT.
Reality: a guest lecture or 2.
LMAO WHAT?
How are we letting people get away with this?
LinkedIn bio is not to be trusted at all.