Rachit Bansal (@rach_it_)

Followers: 1,657 · Following: 1,634 · Media: 18 · Statuses: 246

Incoming Ph.D. student @Harvard • Pre-doc @GoogleDeepMind • Anything 'science', ~cosmos, and Oxford commas

Cambridge, MA
Joined March 2019
Pinned Tweet
@rach_it_ · Rachit Bansal · 3 months
I am pleased to share that I'll be joining @Harvard as a PhD student this Fall. Looking forward to working with @elmelis , @wattenberg , @viegasf , et al. at SEAS! I'll be supported by a @KempnerInst fellowship, and am keen to further our understanding & usability of large ML models!
@rach_it_ · Rachit Bansal · 8 months
Extending an LLM to new knowledge sources is tedious—fine-tuning is expensive/causes forgetting, LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! 🧵(1/8)
@rach_it_ · Rachit Bansal · 2 years
You have an exciting use-case, you train a neural network, but will your model work for the many kinds of (OOD) inputs it will see? In our #NeurIPS paper, we find answers by studying the relationship between information organization & memorization! w/ @danish037 & @boknilev (1/7)
@rach_it_ · Rachit Bansal · 4 months
Looking forward to presenting this work at #ICLR2024 next week in Vienna! 🇦🇹 Please stop by our poster on the 8th (10:45am) if you are interested in efficient, modular, decentralized development of large models!
Quoting @rach_it_ · Rachit Bansal · 8 months
Extending an LLM to new knowledge sources is tedious—fine-tuning is expensive/causes forgetting, LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! 🧵(1/8)
@rach_it_ · Rachit Bansal · 3 years
This is enraging. The outrageous application fees at these schools are a serious barrier to inclusivity. It is not a joke. Here, I am listing a set of analogies depicting the magnitude of this problem (especially as an international student) 👇 (0/n)
Quoting @obladioblada987 · Pari (train fan) · 3 years
Surely Northwestern could word the last sentence better/ not say anything at all instead of saying this?
@rach_it_ · Rachit Bansal · 3 years
#NLPaperAlert : Our work "How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages" with @cdli_news was accepted at ACL SRW 2021 ( @acl_srw ). Elated. 📖 Read here: ⭐ Star here: Thread 🔽 (1/n)
@rach_it_ · Rachit Bansal · 3 years
Personal update: I would be spending the next several months at Technion, working on exciting problems with @boknilev and @technionnlp . Grateful and looking forward to being a part of this beautiful, vibrant community.
@rach_it_ · Rachit Bansal · 3 months
@Harvard @elmelis @wattenberg @viegasf @KempnerInst I am greatly indebted to an incredible set of mentors, collaborators, and idols: @partha_p_t , @jainprateek_ , @boknilev , @nsaphra , @kchonyc , @danish037 . I am grateful to my friends ( @akankshat1701 , @BadolaKartikeya , @tiwarishabh16 , @_toolazyto_ ) for all their love over the years.
@rach_it_ · Rachit Bansal · 1 year
Super excited to present this work at ICLR in Kigali w/ my super co-authors @jeevesh_juneja and @nsaphra ! (So happy that the three of us finally met in person for the first time today). 🌟 Please do stop by our poster on Wednesday, 3rd May, if you are around.
Quoting @jeevesh_juneja · Jeevesh Juneja · 1 year
We have been told that every training run goes to the same basin. ( @jefrankle , 2019) That permutations will make everything connected. ( @rahiment , 2021; @SamuelAinsworth , 2022) But is it really the case? Our work ( @iclr ) reveals, NO:
@rach_it_ · Rachit Bansal · 2 years
I had an incredible time working on this with @nsaphra . We took a deep dive into loss surface connectivity of seemingly similar models ID yet drastically different OOD, and were intrigued by how much there is to learn. Special shout-out to @JunejaJeevesh for steering it upfront.
Quoting @nsaphra · Naomi Saphra · 2 years
- Mama, how does pretraining lead to high accuracy? - Well, dear, transfer selects a good loss basin that contains all finetuning runs. - But mama—why does OOD accuracy vary so much between models? 🧵 w @JunejaJeevesh @deaddarkmatter @kchonyc @JoaoSedoc
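Both threads above revolve around loss-surface connectivity between trained models. Below is a minimal sketch of the core measurement, assuming two PyTorch models with identical architectures; the function name and the barrier definition are illustrative choices, not the papers' exact protocol.

```python
import copy
import torch

def loss_barrier(model_a, model_b, loss_fn, batch, n_points=11):
    """Loss along the linear path between two models' weights.

    A flat path suggests the two solutions are linearly mode-connected
    (same basin); a pronounced bump suggests a barrier between basins.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)  # scratch model holding interpolated weights
    inputs, targets = batch
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        # Mix every parameter and buffer with the same coefficient.
        probe.load_state_dict(
            {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}
        )
        probe.eval()
        with torch.no_grad():
            losses.append(loss_fn(probe(inputs), targets).item())
    # Barrier height: worst interpolated loss above the endpoint average.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

Evaluating this on in-distribution vs. OOD batches is one way to see the "similar ID, different OOD" effect the thread describes.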
@rach_it_ · Rachit Bansal · 8 months
We propose CALM—Composition to Augment Language Models: (i) Scales up LLMs on new tasks by *re-using* existing (L)LMs w/ very few new parameters & data, (ii) Keeps existing model weights intact, hence preserves original capabilities, (iii) Applies to diverse domains and settings.
@rach_it_ · Rachit Bansal · 8 months
Consider a toy example: You have some key-value pairs {x1: 10, x2: 7,..., xn: 2} to reason upon. You have an LLM that excels at reasoning but has no knowledge of the KV pairs. Composing a model trained on the pairs with the LLM enables reasoning over the pairs (x1+x8*xn = 38)!
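To make the toy concrete: only x1, x2, xn, and the answer 38 appear in the tweet; the value of x8 below is back-solved so the quoted expression works out. The point is that neither model alone holds both the key-value pairs and the arithmetic.

```python
# Key-value knowledge the small augmenting model is trained on. Only x1,
# x2, xn, and the answer 38 come from the tweet; x8 = 14 is back-solved
# so that the quoted expression holds.
kv = {"x1": 10, "x2": 7, "x8": 14, "xn": 2}

# The anchor LLM can do arithmetic but has never seen the pairs; the
# augmenting model knows the pairs but cannot reason. Composed, the
# system should resolve references like this:
assert kv["x1"] + kv["x8"] * kv["xn"] == 38  # x1 + x8*xn
```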
@rach_it_ · Rachit Bansal · 4 years
It's so refreshing to hear about the lives and thoughts of your role models. Love this series. Highly recommended!
Quoting @deviparikh · Devi Parikh · 4 years
Episode 5 is out! Y-Lan Boureau on Humans of AI: Stories, Not Stats. Video: Podcast: All episodes so far:
@rach_it_ · Rachit Bansal · 8 months
Coding: We compose an LM trained on the entire set of open-source GitHub code w/ an LLM where code is under-represented in its training data. We see significant gains across all tasks: Code explanation (CodeXGLUE), completion (HumanEval), and generation (MBPP). Again, unlike FT.
@rach_it_ · Rachit Bansal · 3 years
The *yearly* fee of *attending* my almost-premium high school in New Delhi was roughly equal to the amount of money it would cost me to *apply* for the 10 schools on my list. (1/n)
@rach_it_ · Rachit Bansal · 8 months
We explore CALM for real-world tasks: Multilinguality: We reuse an LM trained on a bunch of low-resource languages (LRLs) w/ an LLM that has never seen some of these LRLs. We see promising results for MT & reasoning across all L- and H-RLs (unlike FT that leads to forgetting)!
@rach_it_ · Rachit Bansal · 8 months
w/ amazing collaborators and mentors: Bidisha, @siddalmia05 , @nitish_gup , @shikharv15 , @tweet4sri , Abhishek, @jainprateek_ , @partha_p_t ! This work was done during my pre-doctoral tenure at Google Research, Bangalore. Special thanks to @ManishGuptaMG1 , @divy93t for enabling this!
@rach_it_ · Rachit Bansal · 8 months
We are especially excited about the potential of this work in enabling collaborative development of large models in a modular manner. People across the community could develop small specialized models for their languages/domains/etc., and see them work w/ capabilities of an LLM.
@rach_it_ · Rachit Bansal · 3 years
The monthly hostel cost (rent + all meals) at my undergraduate university is roughly equal to the cost of applying to 1 school. An average undergrad can be accommodated for the entire school-year if I drop my plan of applying this year. (3/n)
@rach_it_ · Rachit Bansal · 3 years
Grocery shopping at my household happens once every two weeks and easily sustains our family of 4 for that time. I can substitute the cost of applying to 1 of my schools with this grocery shopping ~5 times, i.e., 10 weeks' worth of "basic living" for a middle-class family. (2/n)
@rach_it_ · Rachit Bansal · 3 years
On a related note: does anyone know of any creative working solutions to avoid foggy spectacles when wearing a mask out in the cold? 😶‍🌫️
@rach_it_ · Rachit Bansal · 2 years
Much of this work is a result of @danish037 's infectious motivation to do better science. I owe a lot of my own growth over the past years to him. If you wish to work on exciting problems while furthering India’s research landscape, his upcoming lab at IISc is the place to be.
Quoting @danish037 · Danish Pruthi · 2 years
I am beyond thrilled to share that I'll be starting as an assistant professor at the Indian Institute of Science (IISc), Bangalore in April 2023. I couldn’t have been luckier—I'm grateful for the support of many kind mentors, peers, students, friends and family members. (1/4)
@rach_it_ · Rachit Bansal · 3 years
Anyway, I hope these schools at least have a valid reason for this absurd amount of money they charge. I hope that for each fee that I pay, there is another student who gets a waiver. For anyone who qualifies, please do apply for fee waivers:
Quoting @KaiserWhoLearns · Kaiser Sun · 3 years
Are you a student struggling to pay application fees for PhD programs? Finding it hard to locate fee-waiver information, I created a list of application fee waivers from different schools!👇
@rach_it_ · Rachit Bansal · 8 months
Rather than a shallow combination, CALM introduces a small set of cross-attention parameters over models’ layer representations. CALM finds an effective combination of models, *enabling* new tasks that neither of the models could do, while preserving their original capabilities!
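A minimal sketch of that idea, assuming PyTorch: project the augmenting model's layer representations into the anchor's width and let the anchor attend to them through the newly introduced cross-attention parameters, with both base models frozen. The class name, dimensions, and placement are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CompositionBlock(nn.Module):
    """New trainable parameters bridging two frozen models' layers."""

    def __init__(self, d_anchor, d_aug, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)  # augmenting -> anchor width
        self.attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, h_anchor, h_aug):
        kv = self.proj(h_aug)                   # (batch, seq_aug, d_anchor)
        # Anchor states query the augmenting model's representations.
        delta, _ = self.attn(h_anchor, kv, kv)
        return h_anchor + delta                 # residual update; base weights untouched

# Usage sketch: one block per chosen layer pair; only these parameters train.
block = CompositionBlock(d_anchor=4096, d_aug=1024)
h_anchor = torch.randn(2, 16, 4096)  # anchor hidden states (batch, seq, dim)
h_aug = torch.randn(2, 16, 1024)     # augmenting-model hidden states
out = block(h_anchor, h_aug)         # same shape as h_anchor
```

Because the update is residual and the base weights never change, disabling the composition recovers the anchor's original behaviour, consistent with the "preserves original capabilities" claim above.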
@rach_it_ · Rachit Bansal · 1 year
@fooobar Thank you for all your hearty love and care, Gaurav. Your fatherly support (as well as your childlike goofiness) will be sorely missed in the office. You're inspirational, and many like me are admiringly cheering you on whilst tracing your footsteps. Until our paths cross again!
@rach_it_ · Rachit Bansal · 3 years
(4/n)
Quoting @shaily99 · Shaily · 3 years
@QueerinAI The minimum living wage in India is ~5k INR (~65 USD), as per the internet. This is well over 2-3 times that. This is 2-3 times the monthly pay of a (mostly white-collar) job with a decent salary.
@rach_it_ · Rachit Bansal · 2 years
📜 Paper: 💻 Code: 🌄 Et al.: If you are attending NeurIPS, catch the work live as it is presented during Poster Session 5. Although I can't attend in person, I look forward to e-meeting many of you! (7/7)
@rach_it_ · Rachit Bansal · 3 years
A very special thanks to @orf_bnw for his meticulous suggestions, patient guidance, and unwavering support during the pre-submission mentoring phase. This wouldn't have been possible without him. Thank you @acl_srw for making it possible.
Quoting @acl_srw · ACL SRW 2024 · 3 years
A list of the 45 accepted papers is now online: We are grateful to the mentors and program committee members who have dedicated a lot of time to reviewing the students' papers – thank you!
@rach_it_ · Rachit Bansal · 4 years
Excited to announce that I will be volunteering at @icmlconf from July 12-18. Can't wait to be a part of the experience, contribute in all the ways I can and attend my first ever conference live! #icml2020 #virtual
@rach_it_ · Rachit Bansal · 2 years
Following prior art, we study two facets of memorization that impede generalization in current-day networks: that of individual training examples (example-level memorization) and that of spuriously correlated artifacts (heuristic memorization). We wish to readily identify these behaviours.
@rach_it_ · Rachit Bansal · 3 years
@jbhuang0604 This is invaluable, thank you so much! During the early stages of a project, how do you suggest zeroing in on a problem statement? I find it really hard to balance the various trade-offs like taking ownership, (while) respecting opinions, (while) keeping everyone on board.
@rach_it_ · Rachit Bansal · 2 years
This work was done during my visit to @boknilev 's lab at the Technion, and I cannot be more grateful. The 5 months I spent in Israel were among the best of my life, intellectually, socially, and beyond. If you are considering coming here, I have only positive things to say.
Quoting @rach_it_ · Rachit Bansal · 3 years
Personal update: I would be spending the next several months at Technion, working on exciting problems with @boknilev and @technionnlp . Grateful and looking forward to being a part of this beautiful, vibrant community.
@rach_it_ · Rachit Bansal · 8 months
@omarsar0 Thank you for sharing our work @omarsar0 :) A full detailed thread coming soon!
@rach_it_ · Rachit Bansal · 2 years
@savvyRL @ml_collective 🔽 How this one started :)
@rach_it_ · Rachit Bansal · 2 years
Experiments across several synthetic and natural setups show that our hypothesis indeed holds true: for cases where a network exhibits heuristic memorization, both intra- and inter-neuron diversity fall, while the reverse is observed for example-level memorization. (5/7)
@rach_it_ · Rachit Bansal · 4 years
Had such a wonderful time attending my first ever conference in the form of #acl2020nlp this week. So many wonderful papers, sessions and people. I would require much more than a Twitter thread to be specific. P.S. Is it weird that I am almost lachrymose as it comes to an end?
@rach_it_ · Rachit Bansal · 3 years
@Nitin_wysiwyg @alexisjross I believe they'll be uploaded here:
@rach_it_ · Rachit Bansal · 2 years
We quantify inter-neuron and intra-neuron diversity using mutual information and entropy, respectively, where each neuron is considered as a random variable over its activations across *in-distribution examples*. We hypothesise that these measures distinguish model behaviour.
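As a rough illustration of how such measures can be computed, assuming numpy: discretize each neuron's activations over in-distribution examples, then estimate per-neuron entropy (intra-neuron diversity) and pairwise mutual information (inter-neuron diversity). The histogram estimator and bin count are choices made for this sketch, not necessarily the paper's.

```python
import numpy as np
from itertools import combinations

def neuron_diversity(acts, n_bins=20):
    """acts: (n_examples, n_neurons) activations on in-distribution data.

    Returns (mean per-neuron entropy, mean pairwise mutual information),
    treating each neuron as a random variable over its activations.
    """
    n_examples, n_neurons = acts.shape
    # Equal-width binning per neuron (an illustrative estimator choice).
    binned = np.stack(
        [np.digitize(acts[:, j], np.histogram_bin_edges(acts[:, j], n_bins))
         for j in range(n_neurons)], axis=1)

    def entropy(x):
        counts = np.bincount(x)
        p = counts[counts > 0] / len(x)
        return float(-(p * np.log2(p)).sum())

    ents = [entropy(binned[:, j]) for j in range(n_neurons)]
    # I(X;Y) = H(X) + H(Y) - H(X,Y); encode the joint by pairing bin ids.
    mis = [entropy(binned[:, i]) + entropy(binned[:, j])
           - entropy(binned[:, i] * (binned[:, j].max() + 1) + binned[:, j])
           for i, j in combinations(range(n_neurons), 2)]
    return float(np.mean(ents)), float(np.mean(mis))
```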
@rach_it_ · Rachit Bansal · 3 years
Finally, we introduce InterpretLR, an interpretability toolkit evaluated and configured for low-resource NLP. We apply it to all of our proposed techniques for machine translation and sequence labeling to additionally compare and evaluate them w.r.t. human annotations. (4/n)
@rach_it_ · Rachit Bansal · 4 years
I have had the opportunity of having some enriching conversations with @danish037 in the recent past. I encourage fellow research aspirants and students to clear the haze through this wonderful initiative by him.
Quoting @danish037 · Danish Pruthi · 4 years
I am planning to reserve some time every week this semester to help undergraduate/masters students by answering any questions they might have around pursuing research. (1/3)
@rach_it_ · Rachit Bansal · 3 years
In this work, we gauge current-day #NLProc methods on 'extremely' low-resource languages by making the first attempt at adapting them for Sumerian cuneiform, one of the world's first written languages. Our study includes NMT, POS tagging, and NER for Sumerian. (2/n)
@rach_it_ · Rachit Bansal · 2 years
A simple visualisation of neural activation patterns for networks that generalize vs. those that memorize shows clear qualitative differences. This nudged us to look closer at whether learning behaviours are captured in activation patterns. We analyse this through neuron diversity. (3/7)
@rach_it_ · Rachit Bansal · 2 years
A natural application of our findings is the problem of model selection: given a set of models, rank them on their generalizability. We test the utility of our measures therein and see reassuring results. We further envision applications in regularization and OOD detection. (6/7)
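A sketch of that selection step, reusing the hypothetical neuron_diversity helper from the sketch further up. How the two measures map onto a ranking depends on which memorization mode is suspected (see the thread above), so this just surfaces both numbers per candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical candidates: model name -> activations on shared ID examples.
candidates = {
    "model_a": rng.normal(size=(512, 32)),
    "model_b": rng.normal(size=(512, 32)),
}

# neuron_diversity is the sketch defined after the measures tweet above.
scores = {name: neuron_diversity(acts) for name, acts in candidates.items()}
for name, (ent, mi) in sorted(scores.items()):
    print(f"{name}: intra-neuron entropy={ent:.2f}, mean pairwise MI={mi:.2f}")
```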
@rach_it_ · Rachit Bansal · 3 years
@abhesrivas One that I find very common is appending Hindi plural morphemes such as "-ऐं" or "-आं" to everyday English words to reflect their plural form: "Bottleऐं" instead of "Bottles" or "बोतलें".
@rach_it_ · Rachit Bansal · 4 years
@akanksha_atrey Having started research during the lockdown itself, I've never got to experience this environment. The work is surely great, but it indeed seems that I've missed a huge part of what the experience entails.
@rach_it_ · Rachit Bansal · 3 years
@boknilev I have found TextAttack () to be a great (and easy to use) resource for creating adversarial examples. The same authors also have some thorough work on analysis and evaluation in this domain.
@rach_it_ · Rachit Bansal · 3 years
@jbhuang0604 Thank you so much for this thread. Is this particular advice really true, though? Would my application even reach the professor I'm applying to if I do not have a good enough GPA/GRE score? ^ Also because I see current PhD students at many labs being ~4-pointer undergrads.
@rach_it_ · Rachit Bansal · 3 years
We introduce the problem of Target-side Incoherence and the severe limitations it imposes on semi-supervised and unsupervised MT of low-resource languages. Experiments with human evaluations suggest data augmentation and forward translation help cope with these constraints. (3/n)
@rach_it_ · Rachit Bansal · 8 months
@RisingSayak @siddalmia05 @nitish_gup @shikharv15 @tweet4sri @jainprateek_ @partha_p_t @sourab_m @younesbelkada Good question! We estimate this in the appendix. TL;DR: the parametric (hence, memory and latency) overhead is minimal with respect to the original LLM: ~1.5-2% of that model (what we call the 'anchor').
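For a sense of scale, a back-of-the-envelope calculation: the 1.5-2% figure comes from the reply above, while the anchor size is hypothetical.

```python
anchor_params = 8e9  # hypothetical 8B-parameter anchor LLM
low, high = 0.015 * anchor_params, 0.02 * anchor_params
print(f"~{low / 1e6:.0f}M to ~{high / 1e6:.0f}M new parameters")  # ~120M to ~160M
```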
@rach_it_ · Rachit Bansal · 3 years
@shaily99 @SemanticScholar This happened to me as well. Fortunately, their team is very responsive. I just dropped them a message (the 'Contact' option at the bottom of the page) stating the issue, and it was resolved within a day.
@rach_it_ · Rachit Bansal · 8 months
@generatorman_ai @siddalmia05 @nitish_gup @shikharv15 @tweet4sri @jainprateek_ @partha_p_t Moreover, since we evaluate on tasks that none of the individual models can do well, I am not very hopeful about simple merging methods. Still worth trying.
@rach_it_ · Rachit Bansal · 4 years
Join us and work on groundbreaking research problems. I can personally speak for the wonderful mentors and consequent research impact here. Do apply! #researchers #NLProc #internship #lcs2iiitd
Quoting @lcs2lab · LCS2 Lab · 4 years
We invite applications for a paid winter internship in the areas of #NLP , #Socialcomputing . The last date is Nov 30, 2020. Limited seats are available. Do check the criteria and apply asap at #winterinternship #iiitd @IIITDelhi @Tanmoy_Chak @shadakhtar2309
@rach_it_ · Rachit Bansal · 4 years
@srush_nlp @huggingface @Thom_Wolf This was much needed! Thank you @huggingface ! PS. I hope you keep updating it :)
@rach_it_ · Rachit Bansal · 3 years
@alexisjross This would be super useful! I would like to suggest some questions, would be wonderful if you could allow DMs :)
@rach_it_ · Rachit Bansal · 6 months
@BrihiJ @xiangrenNLP @swabhz @nlp_usc Congratulations Brihi! So exciting ✨
@rach_it_ · Rachit Bansal · 5 years
@MaxLenormand @bhutanisanyam1 That condition applies to the normal Colab we currently use as well but I'm not sure whether such things would occur more often in Pro or not.
@rach_it_ · Rachit Bansal · 2 years
@jivatneet @emrek @amt_shrma Congratulations on the well-deserved spotlight, Jivat! 🎉 Really interesting and insightful work
@rach_it_ · Rachit Bansal · 3 months
@sumanthd17 @Harvard @elmelis @wattenberg @viegasf @KempnerInst While your prediction didn’t hold true, the note was very sweet nonetheless 🙈 Thanks for your wishes!
@rach_it_ · Rachit Bansal · 3 months
@BrihiJ @Harvard @elmelis @wattenberg @viegasf @KempnerInst Thanks Brihi :) Looking forward to hosting you in Boston some time!
@rach_it_ · Rachit Bansal · 8 months
@fouriergalois @PengmingWang @siddalmia05 @nitish_gup @shikharv15 @tweet4sri @jainprateek_ @partha_p_t Thanks for pointing that out, @fouriergalois —that is correct. We have been running experiments with larger models, @PengmingWang , and have seen similar trends: large models, even when composed w/ much smaller models, improve on the target domains. Hope to add some of those soon :)
@rach_it_ · Rachit Bansal · 3 years
@manupillai308 @cdli_news @acl_srw Thank you so much, Manu 💟
@rach_it_ · Rachit Bansal · 3 months
@shaily99 @Harvard @elmelis @wattenberg @viegasf @KempnerInst Thank you so much, Shaily! Your kindness and support have been invaluable :)
@rach_it_ · Rachit Bansal · 8 months
@sarahookr We have been working on something with a very similar motivation! In our recent work (linked), we show how to adapt an existing large LM to new languages (/domains) by cheaply composing it with other specialized (L)LMs. We saw encouraging results for MT and other tasks.
Quoting @rach_it_ · Rachit Bansal · 8 months
We explore CALM for real-world tasks: Multilinguality: We reuse an LM trained on a bunch of low-resource languages (LRLs) w/ an LLM that has never seen some of these LRLs. We see promising results for MT & reasoning across all L- and H-RLs (unlike FT that leads to forgetting)!
@rach_it_ · Rachit Bansal · 8 months
@generatorman_ai @siddalmia05 @nitish_gup @shikharv15 @tweet4sri @jainprateek_ @partha_p_t Good point! We have been thinking about this as well and hope to add some comparisons soon. Given the difference in model size (b/w the models we compose), merging (afaik) is not very straightforward. There are approximation methods that we hope to try.
@rach_it_ · Rachit Bansal · 3 years
@Shreyagupta08 @eaclmeeting Thanks a lot for this wonderful summary, @Shreyagupta08 ! Very helpful for those who aren't able to attend the conference.
@rach_it_ · Rachit Bansal · 3 years
@rightaditya @danish037 @emnlpmeeting Moreover, they even attributed that 48-hour extension to other reasons, including the situation in India at the time. I believe we should keep these gestures in mind before making any accusations.
Quoting @emnlp2020 · emnlp2020 · 4 years
Many concerns have been voiced about whether the 48h extension that we and @NeurIPSConf offered due to the civil unrest in the US is another manifestation of American privilege. We would like to clarify that our decision was also based on requests from India due to the covid lockdown. /1
@rach_it_ · Rachit Bansal · 4 years
@dkaushik96 Got excited for a moment thinking that @acmi_lab is hiring interns.
@rach_it_ · Rachit Bansal · 2 years
@shaily99 No one visited their poster :( Yes, they were attending virtually
@rach_it_ · Rachit Bansal · 3 years
@WtmIndia @akankshat1701 I actually thought @WtmIndia was talking about you from that first line. Having seen your journey first-hand, I don't think I can capture what makes me proud of you in a tweet, but most prominently: I am proud that you proved yourself ("I can't do it") wrong. 🌊
@rach_it_ · Rachit Bansal · 3 years
@rightaditya @danish037 @emnlpmeeting Not that the specifics matter (it's more about the gesture, in my opinion). It was extended for 2 days *due to* the BLM movement; the prior extension was because of the worldwide COVID-19 crisis and lockdown.
Quoting @emnlp2020 · emnlp2020 · 4 years
In view of the #COVID19 situation, @EMNLP2020 will be run on-line (Punta Cana in 2021). Furthermore, we changed the dates to accommodate the authors in these stressful times: * main conference: 16-18 November (Mon-Wed) * submission: June 1st * anonymity period start: May 1st
@rach_it_ · Rachit Bansal · 3 years
@abhesrivas From my experience so far, getting in as a volunteer is not a competitive process. Last year, I (an undergrad) applied to be a volunteer at 5 different venues, and got it each time. I think they are generous that way.
@rach_it_ · Rachit Bansal · 4 years
@Shreyagupta08 Congrats Señora❣️