After 15 incredible years at
@MSFTResearch
and a short but exciting stint at MS Turing, I have decided to move on to pursue my academic aspirations.
Thrilled to announce that I will be joining
@mbzuai
next week as a professor of
#NLProc
.
Had the pleasure of listening to
@ylecun
at
@mbzuai
today. Loved it when he said, "Academia shouldn't work on incremental hacks like RAG & CoT for making
#LLMs
look smart, when they actually are not! Instead think on the next paradigm of truly intelligent machines."
W
@ericxing
My first class as a full-time academic. At
@mbzuai
with the awesome students from around the globe, on methods of
#NLG
. Besides
#LLMs
we also discussed a bit of Firth, and a pinch of Wittgenstein and Chomsky.
#NLProc
I will be looking for research assistants, interns, postgrad and PhD students, and postdocs. Follow me here and/or on LinkedIn for announcements on specific opportunities to work with us at MBZUAI.
Or catch me at
@emnlpmeeting
next week in Singapore to find out more.
It was so much fun to work on these two projects with all of you at Microsoft Turing. Thank you all, my awesome collaborators and (now ex-) colleagues
#EMNLP2023
In office again after 20 months and 20 days. The white board still has the notes of our
#ACL2020
work on State and Fate of Linguistic Diversity, and on the table are the last 3 pieces of confectionery that I bought from Hong Kong during
#EMNLP2019
- the last physical conf I attended.
Can all universities please follow the 23:59 UTC-12 (that is the end of the day, anywhere on earth) as their reco deadlines?
Just now I missed
@MilaNLProc
LOR deadlines for two excellent students because it was EoD EST (UTC-5).
Not sure whose loss it is, though.
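To make the gap concrete, here is a minimal sketch of the arithmetic behind the request, assuming fixed UTC offsets (UTC-12 for "anywhere on earth", UTC-5 for EST). The date and the helper names `aoe_deadline` and `hours_lost` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones; the names are labels only. AoE = UTC-12, EST = UTC-5.
AOE = timezone(timedelta(hours=-12), "AoE")
EST = timezone(timedelta(hours=-5), "EST")


def aoe_deadline(year, month, day):
    """Return 23:59 'anywhere on earth' on the given calendar date."""
    return datetime(year, month, day, 23, 59, tzinfo=AOE)


def hours_lost(local_zone, year, month, day):
    """Hours between 23:59 in local_zone and 23:59 AoE on the same date."""
    local_eod = datetime(year, month, day, 23, 59, tzinfo=local_zone)
    return (aoe_deadline(year, month, day) - local_eod) / timedelta(hours=1)


# Hypothetical date, used only as an example.
deadline = aoe_deadline(2021, 12, 15)
print(deadline.astimezone(EST))       # the AoE deadline read on an EST clock
print(hours_lost(EST, 2021, 12, 15))  # extra hours an AoE deadline would give
```

Under these assumptions, an end-of-day EST deadline closes several hours before the AoE one on the same calendar date, which is exactly the window a recommender in another time zone can miss.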
Four
@IndiaMSR
NLP group alumni meet over Pre-
#NAACL2022
dinner! Missed all this fun since EMNLP 2019.
@kalikabali
, Sunayana - you were missed as well.
This week in
#EMNLP2023
, we will present the following papers from Microsoft Turing (10 Dec, 0830 - 1000, Findings and Industry track poster).
I will give a keynote at
@WiNLPWorkshop
that will cover some aspects of the "Ethical Reasoning Over Moral Alignment" work.
1/3
I: Who is Neil Armstrong?
My 9yo son: I don't know.
I: Come on! I know you know this.
Son: nope...
I: N. A. is the first man to walk on...
Son: The moon!
I: See, u knew it.
Son: No! But after "first man to walk on" it has to be "the moon".
#Language
models are powerful
#NLProc
It was great hosting
@danish037
at
@mbzuai
and listening to his talk on evaluation of explanations of
#LLMs
. Apparently, current explanation generation techniques neither help humans nor models to learn or understand much from those.
On the
#EMNLP2021
deadline extension debate - what's more worrisome to me is not the apathy of the organizers, but the desperation of the community to submit papers within a deadline, when in fact, real deads are lined up! 1/
#NLProc
#AcademicTwitter
@Richa071193
I did my PhD in India, that too ~20 years ago, and it couldn't have been more "with" than that anywhere in the world. So I guess it depends on who the advisor is, and what they expect from the student as much as what the student expects from them.
Incredibly proud of you
@kalikabali
, and so glad to have had you as a colleague, a close collaborator and a great friend for the last 16 years! A huge congratulations and very well deserved.
Excited to be at
#NAACL2022
- my first in-person
#NLProc
conference since the pandemic.
If you want to connect or know more about the exciting stuff happening at
#MicrosoftTuring
, and the job opportunities, catch me at the
#Microsoft
booth.
How linguistically fair are multilingual pre-trained language models?
Well, not much if you select the one that has the highest average accuracy across all languages.
Excited to present our work at
#AAAI2021
#aaai21
#NLProc
@RealAAAI
@MSFTResearch
This is a proud grand-advisor moment!! 😎
Congratulations
@RishirajSahaRoy
, my first PhD-student (with
@gangulyniloy
) for successfully advising
@azinmatin
, his first PhD student and my first grand-PhD-student.
😃
This is a wonderful opportunity for fresh graduates who want to have a feel for research and decide whether they want to go for a PhD and/or take up research as a career.
There are several positions in
#ML
,
#NLProc
,
#AI
.
Microsoft Research India's Research Fellow Program gives students first hand experience of industrial research, and prepares students for careers in research, engineering, as well as entrepreneurship.
Application deadline: Jan 14, 2022
I will be in ACL and looking forward to meeting you all, especially prospective students and post docs. If you want to meet me please dm here or email me.
Proud moment:
#MBZUAI
has 41 papers accepted at ACL 2024! 📚🧠
Our cutting-edge
#NLP
and
#LLM
research covers:
- Diverse datasets
- Translation challenges
- Health and finance applications
- Emergent AI abilities
- Complex linguistic structures
Follow us for key insights from
Our
#EACL2023
paper: Fairness in Language Models Beyond English: Gaps and Challenges,
is now on arxiv.
w Krithika Ramesh & Sunayana Sitaram
We survey fairness in multilingual and non-English contexts, highlighting the shortcomings of current research.
My 2.5 yo blurted out "oof it's byatha (= 'pain' in Bangla)" to mean "it's hurting".
Goes to show that
#CodeMixing
is an innate ability.
We would have never used this construct, and her syntax is fairly limited to SV, SO and simple NPs.
#bilingual
#languageaquisition
4.5 yo friend of my 3 yo daughter: Uncle, why didn't you tell your password to Rahi?
Me: Umm.. well I don't want her to...
4.5 yo: But I know my father's password, which is 9***** (tells the actual password)
Me: yes, that's why I didn't!
😅
After 4 days of Shonan meeting on "human-centered machine translation", we captured the essence through a poster.
We came up with this stochastic parrot trying to balance the human aspects of machine translation made out as a pyramid of dice.
#stochasticparrot
#MT
I wonder what would be the equivalent of this for GenAI. There's so little to understand and so much to memorize. Almost no principles yet no dearth of facts.
A wonderful and unique idea by
@Indiaacm
and
@codscomad
to thank the speakers by planting a tree in their name through the
I wish all conferences did the same - that would help offset the carbon emission from the travels of the attendees to a little extent
Found this book in my Uber car today. I asked the driver if a passenger had forgotten it. "No Sir, it is my book. For you to read and enjoy your ride," came the reply.
Being a mythology fan, I was excited like a kid. But couldn't finish the story as it was a short ride.
#readandride
Today I finished my first fully online course, and also my first course where I was the only instructor - Intro to
#NLProc
at
@PlakshaUniv
. Contrary to my expectations and fears around online teaching, it turned out to be quite a rewarding experience. 1/n
My PhD supervisor was Prof Sudeshna Sarkar, my first research-cum-professional mentor during my postdoc was Ranjita Bhagwan, and my longest research collaborator and a great friend, philosopher and guide has been
@kalikabali
. So I am three times lucky in this respect :)
I will be at LREC-Coling2024 next week. Please drop by our posters; I am also happy to chat with prospective PhD students, postdocs and more.
Topics of interest - AI ethics and safety, GenAI for social science, art and music.
Will be presenting three works
-
@sleepinyourhat
There are ~6000 languages on the planet. We must qualify the question by asking "NLP for which language is solved?" For my mother tongue Bangla I don't see even a good spell-checker or POS tagger. Some fancy numbers on a few datasets in a few languages mean nothing to me.
It's a great experience to learn about
#Hausa
,
#Yoruba
and other languages of
#Nigeria
, and about the linguistic pluralism, hegemonies notwithstanding.
Kudos to
@Shmuhammadd
for driving it.
The dataset should be useful in spurring more research in these languages.
#LREC2022
We are building a large Turing India team in Bangalore that will closely collaborate with MSR India on some of the hardest problems in this space. Check out the career opportunities at Turing India!
#languagetechnology
#turingNLG
#AI
#TuringIndia
n/n
Whoa! This is a short paper we recently wrote on what we call the
#jailbreak
paradox with
@AetherSuRa
and
@somakaditya
It's a work in progress so we didn't publicize it, but glad to see that
@elder_plinius
's tweet on this work has grabbed a lot of attention.
Excited to be physically present at EMNLP to present our work on evaluating the stability of quantization and distillation for low-resource MT at WMT's Session 6: Research Papers on Practical Aspect of Machine Translation (12:00 - 12:20 Gulf Time Zone).
1/2
Excited to be part of the Deep Learning Indaba
@DeepIndaba
at Accra, Ghana.
I will be talking at a panel on Efficient multilingual NLP for African languages in the age of LLMs.
Thanks to the organizers
@MasakhaneNLP
and
@iam_OchiengM
for inviting me.
⭐️Exciting news! We're gearing up for Deep Learning Indaba 2023!
@DeepIndaba
... 🚀 Mark your calendars for our "Efficiency in Africa NLP" Workshop, scheduled on Friday, September 8, 2023, in Accra, Ghana. This workshop is an ideal fit for learners at any level.
#realizationoftheday
Working on a paper deadline is far easier from India (IST) than the USA (PST in particular).
We don't have to stay awake till 4 am trying to proofread or write in a half groggy state!
What is "normal" to ChatGPT? Well, the first 5 images clearly show that "normal" is western, financially well-to-do (middle class?), culturally homogenous identities.
No surprises here though, as that's what the data says (or does it really?)
Me: Can you draw a very normal image?
ChatGPT: Here is a very normal image depicting a tranquil suburban street scene during the daytime.
Me: Not bad, but can you go more normal than that?
(cont.)
Look what I found while sorting my old boxes!
#ACL2007
, just 16 years ago, and printed conf proceedings seem such an ancient thing now.
I've loads of old conf proceedings and journal issues. No idea what to do with them. Any creative or useful ideas?
#ACL2023
#AcademicChatter
Today, in the last class of
#NLProc
course for
@PlakshaTLP
, I talked about evaluation of NLP systems. Students had 20 min to come up with a few basic capabilities in the CheckList framework to evaluate an
#MTsystems
. It didn't take them more than 10 mins to get to this 🙄 1/2
Can LLMs soundly resolve moral dilemmas when a sufficient ethical policy is provided? And what happens when no policy is provided?
We explore this in our
#EMNLP2023
paper
Kudos to all the first authors -
@AetherSuRa
@Aditi184
@utkar
and Kumar Tanmay (all Microsoft Turing RFs)
New paper!🎉
Our work, "Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs," has been accepted to the Findings of EMNLP 2023!
@monojitchou
@AetherSuRa
@kr_tanmay147
@0203_utkarsh
Paper Link:
Yes, please stop calling
#Hindi
a low resource language. It is in class 4 according to , second to only 7 languages. Nor are
#Bengali
,
#Telugu
and
#Tamil
low resource languages. We, the speakers and researchers, should be happy about it.
#NLProc
During the pandemic many turned to us asking if we can help them build a chatbot for their language - Bengali, Manipuri, Swahili and so on. I felt incredibly sorry that we actually cannot build a chatbot for these languages quickly, despite all the hype around
#NLProc
1/2
Despite much noise and work on
#alignment
, it is amazing how brittle
#LLMs
still are. It doesn't take much effort to bring out their ugly biases if one probes beyond popular topics of gender, race and religion.
Our new study explores
#caste
biases in job interview settings.
LLMs are known to generate harmful views, but what are the various and potentially covert forms of harm and identity threats in LLM-generated conversations?
We explore this with
@hayounggjung
,
@anjali_singh35
,
@monojitchou
, and
@tanmit
. Preprint:
Thank you
@MasakhaneNLP
for the opportunity. I thoroughly enjoyed the discussion.
It is so inspiring to see so many researchers and engineers come together voluntarily for a common cause - to put African languages on the
#NLProc
map. What an amazing level of zeal and positivity!
💥🔈🌟Thank you
@monojitchou
for such a fantastic and detailed talk tonight and thank you
@MasakhaneNLP
for a full house, great questions and an inspiring session 🧡💛💚🖤 let’s keep it going!❤️🔥
#NLProc
#AfricanNLP
The GenAI revolution also makes me feel that we're moving too fast without enough thinking. These powerful models and their apps need a deeper and more nuanced understanding. There's also a huge opportunity to study cognition, culture and human behavior using GenAI and vice versa.
It's amazing how around the world there r so many ppl who still haven't figured out when to press the up/down button of an elevator. Many even prefer to press both.
Is there a name for such collective ignorance of a simple convention, that can be learnt or inferred so easily?
#Research
(as other influencers of dev) is typically controlled by
#GlobalNorth
Having my entire edu & res career in
#GlobalSouth
& focusing on probs and
#languages
of Global South, I know the amount of effort it takes to get even the problem acknowledged by the res community 1/2
We've been on a multi-year effort to take steps towards understanding how well NLP/language tech serves people on a *global* scale. Here's a first report:
We perform meta-analysis of performance across 7 tasks, and devise "global utility" metrics. 1/7
Whether we condition a prompt with culturally motivated personas or irrelevant cues, LLMs change their responses whether or not the questions are culture-dependent. Quite like the placebo effect! We need better controls for establishing model bias through prompting. 1/2
📢📢Socio-demographic Prompts are used for Cultural Alignment and studying biases. But is the “bias” elicited by these prompts systematic? Are they similar/different from when LLMs are prompted with "Your favorite programming language is C++"?🧐 The answer will surprise you
If you want to apply for the Research Fellowship program at Microsoft Turing, please use the same link and process. The eligibility criteria remain the same as well 👇
Research Fellow applications for Fall 2023 at MSR India are now open. Candidates should have completed BS/BE/BTech or MS/ME/MTech in Computer Science or related areas, graduating by summer 2023.
My 20 hr
#NLProc
course at
@PlakshaTLP
starts soon. I always wonder where on the
#DunningKruger
curve I should target to leave the students by the end of it.
Slope of enlightenment requires 1k+ hrs. But I also don't want to leave them at peak of Mt Stupid or valley of despair.
Me
@2001
—
#AI
and
#NLProc
are the most exciting, esoteric and philosophically deep subareas of CS. Let me do my PhD on AI.
Me
@2011
- AI is under spotlight now. I made a great choice 10 years ago.
Me
@2021
- everybody's working on AI. Let me find a more interesting topic. 🙄
1. Bangla
2. Nagamese Creole
3. Hindi
4. English
5. Sanskrit
6. C
7. QBASIC
8. Assembly
9. HTML
10. Java
11. VHDL & Verilog
12. C++
13. Assamese/Axomiya
14. Kannada
15. Python
I have forgotten/am forgetting most of these because of disuse. 😢
in what order did you learn your languages?
1. python
2. R
3. haskell
4. C
5. rust
6. scheme
7. {java,type}script
8. clojure
9. scala
10. java
11. idris
12. go
Our awesome
#UGRIP
interns presenting their 4 weeks of effort to make LLMs ace linguistics olympiad. Conclusion - even the best LLMs don't perform better than pre-transformer approaches. Self-critiquing doesn't help either, but human expert critique helps. Paper coming soon.
I can't thank enough my amazingly smart, warm, and kind colleagues and friends at Microsoft. Despite my strong intentions to join academia since my PhD and all through these years, it's too difficult to leave a place so intellectually stimulating and empathetic at the same time.
The Idu Mishmi people of Arunachal Pradesh consider tigers their elder brothers!
Learnt this amazing fact from our intern Pamir Gogoi who has been doing her fieldwork with the community on Idu Mishmi language technology. 1/n
#idumishmi
#arunachalpradesh
#language
#tiger
I am quite excited about the
#ACL2021
theme. Great set of resources compiled by SIGEL on the topic. Here is another resource that might help you locate where a language stands with respect to digital resources for NLP -
Interested in the
#acl2022nlp
Theme Track on “Language Diversity: from Low-Resource to Endangered Languages”? Check out this set of resources that SIGEL has collected:
#NLProc
#linguistics
What better place than academia to explore all these fascinating questions! So, I felt this was the perfect time to move to academia. And
@mbzuai
, studded with outstanding academics, provided the perfect opportunity and environment!
The first conference where I get a lunchbox specially marked for me 😍 Thank you
@acmcompass
for meticulously taking care of all the food constraints, for all the meals, even when it required preparing something special for just one individual.
#trueinclusion
Loved all the talks, their diversity and depth, interactions with the students, and meeting many old friends after a long time.
#IndoML2022
was a great experience. Kudos to the organizers and the hosts.
Here are some glimpses of the beautiful
#iitgandhinagar
campus.
#IndoML2022
has concluded. We witnessed some amazing talks by outstanding speakers from all around the world. Prof. Animesh Mukherjee concluded with a vote of thanks to all the speakers, participants, organizers, and volunteers.
#IndoML
#IITGN
#IIT
@cse_iitgn
@iitgn
@IITKgp
This is an accessible article on our work on Ethical reasoning abilities of LLMs across languages. To be presented tomorrow
@eaclmeeting
in the 10am session on multilinguality.
Work with
@Aditi184
, Kumar Tanmay and Utkarsh Agarwal.
The emergence of large-language models (
#LLMs
) brings a new perspective to ethical decision-making. Monojit Choudhury, a natural language processing professor at
#MBZUAI
, is leading a study to explore LLMs' moral reasoning capabilities. This groundbreaking research will debut at
In the past 14 yrs of my research group, I have made a few policies for myself, which may be helpful to others:
A 🧵
1) Any group member is free to criticize my input with reason. This has been one of the most liberating experiences. Importantly, it has helped me learn.
1/n
I got 6 papers to review frm
@emnlpmeeting
& loved reading 'em all! Not because all were of great quality (some were), but because they fall right into my areas of interest. w/o bidding :o
Is it only me, or
#EMNLP2021
's new review assgnmnt policy?
Reviewing can be fun too 😀
@prajdabre1
Well, when the metric (publication and citation counts) becomes the target, it ceases to be a good metric. The problem is not only with "specific" researchers but also with the system.
It's exciting to receive a book straight from the authors, doubly so when one of them is your ex-intern, and quadruply so when you are using it for teaching a course right then.
Thanks
@adyantalamadhya
@mbodhisattwa
, Anuj Gupta, Harshit Surana.
#NLProc
@PracticalNLProc
Really loved the choice of formats for
#ACL2022
invited sessions. Kudos to the program chairs Smaranda,
@preslav_nakov
and
@AlineVillav
for a refreshing, creative and thoughtful set of speakers and topics. Excited and looking forward (only virtually though :/)
#NLProc
It was fun to interact with high school students on cutting-edge
#NLProc
stuff. I was so impressed by the deep and thoughtful questions! Thank you
@dpsrkpnet
Exun Clan for the invitation.
Why is Language Understanding the Hardest Piece of the AI Puzzle?
My 5th grader son had to write an article on Japan. He took relevant wiki articles, put those through
#Quillbot
and stitched the summaries. Done in 5 min!
His mom was pretty impressed and so will his teacher be, am sure.
1/2
#AI
#nlproc
#education
This is an area I started working on after joining
#microsoftTuring
earlier this year.
Excited to talk about some of our latest research on Responsible AI at
@indoml_sym
later this week.
Thank you organizers for the opportunity.
#nlproc
#ResponsibleAI
IndoML 2022 brings a series of exciting talks on state-of-the-art ML Technologies. Dr. Monojit Choudhury
of Microsoft Turing, India will talk about T for “Terrorist”, “Tropical” or “Territorial”? Teaching Ethics to Large Language Models
#indoML
#mlsymposium
#indoML2022
This paper is an outcome of long & patient work, which may not be apparent from the content.
@saujasv
and I had started with the dream of building a machine that could win a medal at the International Linguistics Olympiad
#ioling
.
...
Can we learn explicit phonological rules that generalize from only a few examples?
This is the question we explore in our
@sigmorphon
paper (with
@aloxatel
,
@monojitchou
, and Dipti Misra Sharma) on using program synthesis for linguistic rules.
Paper:
The CoWIN eqn:
Probability of 18-45 yo getting protection from Covid through vaccination
= W*O*A*M*V
Where,
W = p(website is up)
O = p(receive OTP in time)
A = p(appointment available)
M = p(manage to get vaccinated before getting infected)
V = p(effectiveness of vaccine)
...1/2
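A minimal sketch of the product above, assuming the five factors can be multiplied as if they were independent (or each conditioned on the previous ones). The function name `protection_probability` and every number below are made up for illustration; none of them come from the tweet or from real data.

```python
from math import prod


def protection_probability(w, o, a, m, v):
    """Multiply the five CoWIN factors W, O, A, M, V into one probability."""
    factors = (w, o, a, m, v)
    if any(not 0.0 <= p <= 1.0 for p in factors):
        raise ValueError("every factor must be a probability in [0, 1]")
    return prod(factors)


# Hypothetical values only: W=0.9, O=0.8, A=0.5, M=0.9, V=0.8.
p = protection_probability(0.9, 0.8, 0.5, 0.9, 0.8)
print(f"{p:.4f}")
```

Even when each individual factor looks reasonably high, the product drops quickly, which is the point the equation is making.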
MSR India is such a wonderful place with wonderful people, that there's nothing more one could ask for as a computer scientist. It was my first job, which I joined straight after PhD; the journey since then has been spectacularly enriching and deeply satisfying.
2/n
What's hot in
#NLProc
? And how well does it serve all our languages? I will be talking about building language technology beyond English; for Indian, African and other languages that the global South speaks.
Recently our
#Georgia
tourist visas got rejected on account of insufficient funds! The only case of rejection after travelling to 30+ countries.
I had applied through
@travelwithatlys
and they say
@GeoEmbassyIndia
rejects 90% of the apps from India. Is it true?
Is it necessary to declare the
#caste
of a student in
#Karnataka
?
#DelhiPublicSchool
has asked us to declare caste for my daughter else they won't process her file.
But then a member of the community does what the system expects them to do.
When did the academic community land itself in such a sadistic yet futile race, when 90% of the papers are only reserved for adorning a fixed number of pages in a proceedings w/o any further use? 2/2
A great strategy that will work for many "Low-Resource" language work as well!
For instance, Javanese, Hausa and Bhojpuri are the 22nd, 23rd and 27th most spoken (by native speakers) languages respectively.
A great performance by Team India at
#IOL2022
! Siddhant Attavar bagged a silver medal and Shashwat Mundra received an honourable mention. Congratulations to the team!
The short stint at Microsoft Turing has been a great personal learning experience, but will also allow me to boast for the rest of my life of being part (even if small) of the GenAI revolution and the making of the new Bing.
But..
Juxtapose this with the fact that workshops are much more linguistically inclusive than ACL main conference, as we have shown in "The State and Fate of Linguistic Diversity and Inclusion in the NLP World - ACL Anthology"
#BitterTruths
of
#NLProc
I submitted zero papers to
#NeurIPS2020
of which zero got accepted and none were rejected. Else I would have either died of excitement or shame. Now I feel blessed ;)
#justsaying
as a reaction to the deluge of accept/reject/consolation/reviewer-2 tweets on NeurIPS.
If you are working on
#codemixing
, either from a
#linguistics
perspective or
#NLProc
system, you might find our
#eacl2021
demo on automatic generation of grammatically valid code mixed sentences interesting and useful. Code will be out soon.
Our paper - "GCM: A Toolkit for Generating Synthetic Code-mixed Text" has been accepted at the 16th Conference of the European Chapter of the Association for Computational Linguistics
@eaclmeeting
in the demo paper track.
A very interesting workshop on Social media, society and India by
@joyopal
and team at Univ of Michigan. You can attend online. Excited to be a part of it and look forward to the interactions. I'll be speaking on "We might praise you in English, but gaali to Hindi me hi denge😜"
Social Media & Society in India Conference next week
Open talks/workshops on political media, caste, religion, children, healthcare, misinfo, policy, entertainment.
Featuring lawyers, activists, entertainers, journalists, scholars & practitioners