Orhan Firat

@orf_bnw

Followers: 2K · Following: 3K · Media: 13 · Statuses: 299

Research Scientist at Google DeepMind

New York
Joined August 2010
@orf_bnw
Orhan Firat
2 years
🎉👏! this made me feel sentimental, I was almost gonna drop out of my PhD after the 2nd time this got rejected! I was so fortunate to have mentors like @kchonyc and Yoshua convincing me otherwise, and ofc collaborators like @caglarml and @imkelvinxu ambitiously pushing this forward 🥹.
@kchonyc
Kyunghyun Cho
2 years
well :) 5 years too late but still happy to receive the best research paper award cc @orf_bnw @caglarml @imkelvinxu
Tweet media one
3
11
179
@orf_bnw
Orhan Firat
6 years
Massively Multilingual NMT in the wild: 100+ languages, 1B+ parameters, trained using 25B+ examples. Check out our new paper for an in-depth analysis: #GoogleAI.
1
54
149
@orf_bnw
Orhan Firat
5 years
How to build 1000+ layer Transformers with 80+ billion parameters? By using GPipe 🙂. We will be presenting GPipe today @NeurIPS, East Exhibition Hall B+C, at poster #40. Paper > Poster and Slides > (1/4)
3
31
129
@orf_bnw
Orhan Firat
1 year
And in a few hours, I will be discussing Gemini's multilingual capabilities at MRL @mrl2023_emnlp #EMNLP2023. I will trace our path from M4 to PaLM, PaLM 2, and Gemini through the lens of multilinguality, and share some lessons learned and open problems. Exciting!
@mrl2024_emnlp
MRL
1 year
Are you excited like us for our workshop tomorrow? We hope you are. Check out the updated schedule on our website with location details and the full list of papers:
3
8
93
@orf_bnw
Orhan Firat
1 year
♊️ Gemini 1.0 is here 🚀 - a polymath and polyglot LLM! Proud to be part of this amazing team!
@JeffDean
Jeff Dean
1 year
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,
Tweet media one
Tweet media two
6
3
74
@orf_bnw
Orhan Firat
3 years
Thrilled to be at #ICML2022 in person! ⬇️ Some work we will be presenting around large language models: 1⃣ understanding scaling properties under different architecture biases, 2⃣ the interplay b/w data/noise/architecture, and 3⃣ efficient in-context learning w/ sparse models (GLaM-1.2T).
1
3
69
@orf_bnw
Orhan Firat
5 years
Do massively multilingual translation models (M4) generalize to cross-lingual downstream tasks? Check out Poster #218 today at #AAAI2020. Presented by @asiddhant1 with the awesome team Melvin Johnson, @naveenariva, Jason Riesa, @ankurbpn. Paper and poster 👇 1/2
Tweet media one
2
13
53
@orf_bnw
Orhan Firat
4 years
This week we will be presenting three papers at #ICLR2021, each exploring a different aspect of multi-task/multilingual models at scale: (1) modeling, (2) optimization, and (3) large-scale systems.
1
6
49
@orf_bnw
Orhan Firat
5 years
Summary of our recent work on multilingual NMT. We mainly studied scaling up the models on two axes simultaneously: the number of languages and the size of the neural networks. Several artifacts along the way:
@GoogleAI
Google AI
5 years
New research demonstrates how a model for multilingual #MachineTranslation of 100+ languages trained with a single massive #NeuralNetwork significantly improves performance on both low- and high-resource language translation. Read all about it at:
1
6
45
@orf_bnw
Orhan Firat
4 years
More on the confluence of unsupervised and multilingual MT. Great work with the awesome team: @xgarcia238, @ank_parikh, @adisid01, @Foret_p, @ThiboIbo of @GoogleResearch, #GoogleAI. (1/3)
@ank_parikh
Ankur Parikh
4 years
Check out our multilingual unsupervised translation work! Theory + SOTA results. Led by @xgarcia238 (1/4). 1. Multilingual View of Unsupervised MT - Findings of EMNLP 2020 (). 2. Multilingual Unsupervised MT for Rare Languages ()
Tweet media one
1
7
37
@orf_bnw
Orhan Firat
4 years
First step towards "bit/pixel level", end-to-end neural machine translation. Led by the awesome @elmanmansimov and Mitchell Stern @GoogleAI. Let's see where vision ends and language starts, or is there even a distinction between the two? Exciting times ahead 🙃.
@elmanmansimov
Elman Mansimov
4 years
During summer 2019, together with Mitchell, @orf_bnw, @MiaXuChen, Jakob & Puneet at Google, we worked on an ambitious way of tackling in-image translation (translate text in the image and generate the same image with translated text) using the end-to-end neural approach. [1/2].
0
3
34
@orf_bnw
Orhan Firat
5 years
More on massively multilingual NMT. This time we analyze the representational similarity across languages, how they evolve across layers, and how robust they are. Great analysis and intriguing results, thanks to the great work by @snehaark. More to come, very soon 🙂.
@snehaark
Sneha Kudugunta
5 years
New EMNLP paper “Investigating Multilingual NMT Representation at Scale” w/ @ankurbpn, @orf_bnw, @caswell_isaac, @naveenariva. We study transfer in massively multilingual NMT @GoogleAI from the perspective of representational similarity. Paper: 1/n
Tweet media one
0
9
32
@orf_bnw
Orhan Firat
4 years
Today we will be hosting a Machine Translation Birds of a Feather Meetup together with @kchonyc at #ACL2021NLP @aclmeeting. Come say hi 🙂 at the Gather Town D&I Session Room, MT Table (bottom left), 6pm ET.
0
3
29
@orf_bnw
Orhan Firat
5 years
this!
@_arohan_
rohan anil
5 years
Shampoo is out of the bottle! Preprint: "Second order optimization made practical". We train certain neural nets faster than before. How fast? It has shown up to ~40% reduction in training time for a Transformer. (@tomerikoriko)
Tweet media one
0
3
26
@orf_bnw
Orhan Firat
5 years
This time not "massively" 😅 … but adapting the Transformer for conditional computation turned out to be very effective and useful, giving us additional knobs to play with and to monitor allocation behavior at inference time.
@ankurbpn
Ankur Bapna
5 years
Is it possible to serve increasingly large models for practical applications? Please check out our latest paper on how to control the amount of computation utilized by your model at inference. Arxiv: with @naveenariva and @orf_bnw. 1/4
1
2
23
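Since the two tweets above describe an inference-time compute knob without spelling out the mechanics, here is a minimal, hypothetical sketch of the general idea only (not the linked paper's actual mechanism): a learned per-sublayer gate whose effective threshold is scaled by a serving-time budget, so a single trained model can be run at different compute levels.

import numpy as np

def gated_layer(x, sublayer, gate_logit, budget):
    """Skip the sublayer when its learned gate, scaled by an inference-time
    budget knob in [0, 1], falls below 0.5. `sublayer` maps x -> x'."""
    p_execute = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid of a trained gate logit
    if p_execute * budget < 0.5:                   # lower budget => more skipping
        return x                                   # residual pass-through, no compute
    return x + sublayer(x)                         # standard residual sublayer

# Toy usage: the same 6-layer stack served under two budgets.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
layers = [(lambda h, W=rng.normal(size=(8, 8)) * 0.1: h @ W) for _ in range(6)]
gate_logits = rng.normal(loc=0.5, scale=1.0, size=6)  # stand-ins for trained gates

for budget in (1.0, 0.5):
    h, executed = x, 0
    for layer, g in zip(layers, gate_logits):
        h_new = gated_layer(h, layer, g, budget)
        executed += int(h_new is not h)
        h = h_new
    print(f"budget={budget}: executed {executed}/6 sublayers")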
@orf_bnw
Orhan Firat
4 years
Thanks for the invitation @gneubig, it was a pleasure to attend the lecture and connect with you all 🙂.
@gneubig
Graham Neubig
4 years
We have finished uploading our 23 class videos on Multilingual NLP, including two really great guest lectures: NLP for Indigenous Languages (by Pat Littell, CNRC) and Universal NMT (by Orhan Firat, Google):
0
2
19
@orf_bnw
Orhan Firat
1 year
Attn: 2.8T tokens covering 419 languages!
@snehaark
Sneha Kudugunta
1 year
Excited to announce MADLAD-400 - a 2.8T token web-domain dataset that covers 419 languages(!). Arxiv: Github: 1/n
Tweet media one
1
1
19
@orf_bnw
Orhan Firat
5 years
Symbiosis between unsupervised and supervised multilingual MT. Join us to chat about the smoothing effect of multilingual MT, self-supervised learning to ingest monolingual data, and extending MT models to new languages! #acl2020nlp
@ankurbpn
Ankur Bapna
5 years
Please join us to chat about our #acl2020nlp work on using monolingual data to improve massively multilingual NMT, with a focus on low-resource and unsupervised languages. QA session at 18:00-19:00 GMT / 14:00-15:00 EDT today. Video:
0
8
17
@orf_bnw
Orhan Firat
1 year
This is the way.
@caglarml
Caglar Gulcehre
1 year
I am excited about our work on efficiently aligning language models with a reward using Reinforced Self-Training (ReST). The best part is that our approach is much more efficient and easier to implement than most other alignment algorithms. See below for more on ReST 🧵👇 (1/n)
Tweet media one
0
3
15
@orf_bnw
Orhan Firat
4 years
Great contribution to extremely low-resource MT! Awesome work by @deaddarkmatter et al. 👏👏👏 congratulations 🙂.
@rach_it_
Rachit Bansal
4 years
#NLPaperAlert: Our work "How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages" with @cdli_news was accepted at ACL SRW 2021 (@acl_srw). Elated. 📖 Read here: ⭐ Star here: Thread 🔽
Tweet media one
0
2
15
@orf_bnw
Orhan Firat
4 years
#NLProc if you are working on multilingual NLP models, check out our workshop proposal at *ACL.
@_dataman_
Duygu Ataman
4 years
One of this year's new workshops: Multilingual Representation Learning aims to advance generalization and low-resource NLP by bringing together efforts in understanding and interpreting multilingual models @seb_ruder @alex_conneau @orf_bnw @gozde_gul_sahin @alexandrabirch1.
0
4
13
@orf_bnw
Orhan Firat
5 years
Yuval Noah Harari: the world after coronavirus via @financialtimes.
0
1
13
@orf_bnw
Orhan Firat
1 year
@kchonyc i'll try my best :-)
0
0
13
@orf_bnw
Orhan Firat
1 year
Great read for the holidays 😍 huge congrats to @snehaark @adityakusupati @Devvrit_Khatri and the team 👏🎉.
@snehaark
Sneha Kudugunta
1 year
Late tweet, but thank you ENSLP #NeurIPS2023 for the best paper award, and @Devvrit_Khatri for the excellent presentation on behalf of the team @adityakusupati! Excited to push further on conditional computation for tiny, fast, flexible models 🚀
Tweet media one
0
0
12
@orf_bnw
Orhan Firat
6 years
There's also a need to highlight the importance of tools and frameworks that enable large-scale research, like TensorFlow Lingvo and GPipe, without which this research wouldn't have been possible.
1
1
11
@orf_bnw
Orhan Firat
3 years
Awesome work by @whybansal! Practical take: "in some cases sub-optimalities in the architectures and data quality can be compensated for by adding an extra constant factor of data."
@whybansal
Yamini Bansal
3 years
How do different interventions to the training setup (e.g. architecture, noise) impact the data scaling laws (or sample efficiency) in NMT? Most do not affect the scaling exponent! New work from my internship with @_ghorbani, @bneyshabur & @orf_bnw! 1/n
Tweet media one
0
1
10
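To make the quoted take-away concrete, here is a minimal numeric sketch, with made-up constants rather than the paper's fitted values, of why an unchanged scaling exponent amounts to a fixed multiplicative data cost.

import numpy as np

# Hedged toy, not the paper's actual fits: assume test loss follows a power law
#   L(D) = a * D**(-p) + L_inf
# and an intervention changes only the coefficient `a`, not the exponent `p`.
p, L_inf = 0.3, 1.0                  # hypothetical exponent and irreducible loss
a_good, a_bad = 50.0, 65.0           # hypothetical "better" vs "worse" setup

def loss(D, a):
    return a * D ** (-p) + L_inf

# The worse setup catches up by training on a constant factor more data,
# independent of the dataset size D.
data_factor = (a_bad / a_good) ** (1.0 / p)
for D in (1e6, 1e7, 1e8):
    assert np.isclose(loss(D * data_factor, a_bad), loss(D, a_good))
print(f"~{data_factor:.2f}x more data compensates for the worse setup at any scale")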
@orf_bnw
Orhan Firat
4 years
Take 1: Scaling the multilinguality of the model improves unsupervised MT performance. Take 2: In the multilingual setting, a lot of the drawbacks of unsupervised MT vanish (e.g. domain mismatch, lexical dissimilarity and varying quality of the data). (2/3).
1
1
10
@orf_bnw
Orhan Firat
6 years
Multilingual NMT is known to be very effective for low-resource languages, but comes with a regression on high-resource ones. We show what happens when we scale things up: large improvements on low-resource languages, with high-resource quality on par with competitive bilingual baselines.
Tweet media one
1
0
10
@orf_bnw
Orhan Firat
4 years
More to come, especially in the massively multilingual setup, very soon 🙂. (3/3)
0
0
10
@orf_bnw
Orhan Firat
6 years
@roeeaharoni @NAACLHLT @GoogleAI And @roeeaharoni with the cool poster, check out the left panel 😎🆒
Tweet media one
2
3
10
@orf_bnw
Orhan Firat
5 years
tl;dr Massively Multilingual Translation Encoders (MMTE) are obtained as a by-product of a massively multilingual MT model. MMTE excels at zero-shot transfer to low-resource languages and is competitive with alternatives like mBERT (2/2).
0
1
10
@orf_bnw
Orhan Firat
3 years
1⃣ "Examining Scaling and Transfer of Language Model Architectures for MT" .Poster: Tue 19 Jul 6:30 p.m. EDT .Spotlight: Wed 20 Jul 10:30 a.m. EDT.Paper: delivered to you by our great @BZhangGo et al.
1
1
9
@orf_bnw
Orhan Firat
6 years
The whole effort builds on top of great prior work on how to train, how to scale, how to schedule, how to generalize, and many others.
1
1
9
@orf_bnw
Orhan Firat
5 years
@surafelml @fbk_mt thanks for covering the paper and for the great thread. A bit more context: the paper was meant to be more of an "open problems" paper; we are following it with a series of papers addressing each one (summary so far is here: ). Four down, more on the way 🙂.
1
3
8
@orf_bnw
Orhan Firat
4 years
"Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation" ( by @BZhangGo @ankurbpn and @RicoSennrich.
1
1
7
@orf_bnw
Orhan Firat
3 years
2⃣ "Data Scaling Laws in NMT: The Effect of Noise and Architecture".Poster: Thu 21 Jul 6:00 pm EDT.Spotlight: Thu 21 Jul 3:35 pm EDT.Paper: by amazing @whybansal and @_ghorbani et al. (more: .
@whybansal
Yamini Bansal
3 years
Some life updates! - I've started at Google as a Research Scientist. - I now live in NYC. - I'll be at #ICML2022 from Tuesday onwards presenting this work. If you're at any of the above (or even otherwise) and would like to chat about Life, ML and Everything, feel free to DM me :)
1
1
7
@orf_bnw
Orhan Firat
1 year
Tweet media one
0
0
8
@orf_bnw
Orhan Firat
1 year
@baohao_liao Thanks for the great summary and highlights ☺️.
0
0
3
@orf_bnw
Orhan Firat
5 years
delivered by several amazing people spanning multiple teams @GoogleAI, another step towards universal translation, getting close 🙂.
0
0
5
@orf_bnw
Orhan Firat
3 years
3⃣ "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts" .Poster: Thu 21 Jul 6 pm EDT.Spotlight: Thu 21 Jul 3:30 pm EDT.Paper: by stellar Nan Du, @iamandrewdai, @bignamehyp et al. (more: .
1
1
6
@orf_bnw
Orhan Firat
4 years
"Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models" ( by @MrZiruiWang, Yulia Tsvetkov and Yuan Cao.
1
0
6
@orf_bnw
Orhan Firat
4 years
"GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding" ( by @lepikhin et al.
1
1
6
@orf_bnw
Orhan Firat
4 years
@kchonyc @LiuQunMTtoDeath @tarfandy @emjotde oh wow!!! debugging zero-shot NMT in the lobby, bar hopping at Temple Bar, the DL4MT session 2, RNNs… a lot has changed in 5 years 😶🤔😃
0
0
4
@orf_bnw
Orhan Firat
4 years
@M2lSchool @mmbronstein @gaiarubera @MarcRanzato thank you for the invitation and organizing it in the first place 🙂.
0
0
4
@orf_bnw
Orhan Firat
6 years
Indeed, studying the problem using a massive open-domain dataset has its own challenges, the most pronounced being the heavy data imbalance across language pairs. We provide practical recipes to mitigate the issue, and exciting future directions.
Tweet media one
1
0
5
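For readers unfamiliar with recipes for this kind of imbalance, one widely used option is temperature-based data sampling; the sketch below is a generic illustration with made-up numbers, not necessarily the exact recipe from the paper referenced above.

import numpy as np

# Hypothetical per-language-pair example counts (heavily imbalanced).
sizes = np.array([2e9, 5e7, 1e6, 2e4], dtype=float)

def sampling_probs(sizes, temperature):
    """Sample each pair proportionally to its data size raised to 1/T:
    T=1 keeps the raw (skewed) distribution, larger T flattens it."""
    weights = sizes ** (1.0 / temperature)
    return weights / weights.sum()

for T in (1.0, 5.0, 100.0):
    print(f"T={T:>5}: " + "  ".join(f"{p:.4f}" for p in sampling_probs(sizes, T)))
# T=1 lets the highest-resource pair dominate; very large T approaches uniform,
# which over-samples the rarest pairs; intermediate T trades the two off.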
@orf_bnw
Orhan Firat
3 years
I wouldn't miss this chance to connect with Behnam, i've been learning a lot from him!.
@bneyshabur
Behnam Neyshabur
3 years
I'm going to try having in-person @ml_collective meetings when I’m on #workation:. I'm in #istanbul for a few weeks. If you are in #istanbul and you like to chat about AI research, career choices, travel, etc., you can book an in-person meeting here:.
0
0
5
@orf_bnw
Orhan Firat
1 year
@kchonyc well aligned 😀😀😀.
0
0
4
@orf_bnw
Orhan Firat
4 years
@kchonyc Really real conclusion: don't listen to anyone 😀 (including us), do things that you feel excited and curious about.
0
0
4
@orf_bnw
Orhan Firat
3 years
@elmanmansimov You are getting old my friend 😉 and I feel you.
1
0
4
@orf_bnw
Orhan Firat
6 years
There is still a long way to go. So expect more to come very soon 🙂.
0
0
4
@orf_bnw
Orhan Firat
5 years
@yoavgo @ehudkar @Eric_Wallace_ depth seems to be crucial for (parallel) transfer (i.e. multilingual MT), check out the last section of the supplemental here:
1
0
2
@orf_bnw
Orhan Firat
2 years
The 2nd Multilingual Representation Learning workshop has started accepting submissions, check it out 👇.
@mrl2024_emnlp
MRL
2 years
MRL is expecting your submissions! If you have your ARR reviews by the end of this week, send them to our workshop and have the chance to participate in a great event with an excellent lineup of speakers and meet a large community of scientists working on multilinguality.
0
2
3
@orf_bnw
Orhan Firat
5 years
All with the great team @bignamehyp, @topocheng, @ankurbpn, @MiaXuChen, @quocleix (4/4).
0
0
3
@orf_bnw
Orhan Firat
1 year
Incredible to see cross-capability transfer at scale—the confluence of massive multilinguality, stellar reasoning, deft coding, sharp math, exceptional vision, audio, and more, all skillfully coming together and unlocking new capabilities.
0
0
3
@orf_bnw
Orhan Firat
5 years
Btw, still very much relevant to large models, tying back to “massively” 🙂 and domain/task routing.
0
0
3
@orf_bnw
Orhan Firat
5 years
This further enabled us to study the transfer capability, depth-width trade-off and trainability challenges of very deep models, on massively multilingual (massive) NMT and image classification (3/4).
1
0
3
@orf_bnw
Orhan Firat
5 years
… Very exciting direction from both the efficiency and interpretability perspectives: train a single model and control its behavior depending on the available computation budget and expected quality, or let the model decide itself.
1
1
3
@orf_bnw
Orhan Firat
5 years
@jigarkdoshi cute :-).
1
0
3
@orf_bnw
Orhan Firat
4 years
@kchonyc I blame @melvinjohnsonp 😜 very nice paper with extensive analysis btw, thanks!
1
0
3
@orf_bnw
Orhan Firat
5 years
@roeeaharoni Came back after watching this, total eye-opener.
0
0
2
@orf_bnw
Orhan Firat
3 years
And if you are interested in studying LLMs empirically or theoretically (faculty/postdoc looking for a visiting position, SWE/RS/rSWE looking for a full-time position, PhD student seeking an internship), please reach out!
0
1
3
@orf_bnw
Orhan Firat
4 years
@FrancescoVisin @M2lSchool Out where? 😃.
0
0
3
@orf_bnw
Orhan Firat
1 year
@CohereForAI @GoogleDeepMind @ahmetustun89 Thanks for having me; it was great to meet and chat with all of you 😊🙏.
0
0
2
@orf_bnw
Orhan Firat
3 years
@kchonyc i got hit by a sense of familiarity halfway into the abstract. Strange! they seem to have passed the "strawman" phase rather quickly 😜.
1
0
2
@orf_bnw
Orhan Firat
5 years
@roeeaharoni @ankurbpn it was great having you @roeeaharoni, way to go.
0
0
2
@orf_bnw
Orhan Firat
4 years
Come say hi 🙂.
0
0
2
@orf_bnw
Orhan Firat
5 years
With novel batch-splitting pipeline parallelism, GPipe achieves almost linear speedup with the number of devices and allows us to train massive neural networks to study vision and NLP problems at scale (2/4).
1
0
2
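As a back-of-the-envelope illustration of why splitting a mini-batch into micro-batches gives the near-linear speedup described above, here is a small hedged sketch of generic pipeline-parallel utilization (simplified schedule accounting, not GPipe's actual implementation).

# A mini-batch is split into M micro-batches that flow through K pipeline
# stages; stages overlap on different micro-batches instead of sitting idle.
def pipeline_steps(num_stages: int, num_microbatches: int) -> int:
    """Stage-steps of wall-clock time for one forward pass with pipelining."""
    return num_stages + num_microbatches - 1

K = 8                                    # pipeline stages (one per device)
for M in (1, 4, 32):                     # micro-batches per mini-batch
    useful_work = K * M                  # total stage-steps of actual compute
    wall_clock = pipeline_steps(K, M)    # overlapped schedule length
    utilization = useful_work / (wall_clock * K)
    print(f"M={M:>2}: device utilization ≈ {utilization:.2f}")
# M=1 is the unpipelined case (only 1/K of devices busy at a time); as M grows,
# utilization approaches 1, i.e. almost linear speedup in the number of devices.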
@orf_bnw
Orhan Firat
1 year
@zacharynado this.
0
0
2
@orf_bnw
Orhan Firat
5 years
on the trainability of very deep models, investigating the internal representations, cross-lingual downstream transfer, adapting to specific languages and domains, and the dynamics of the transfer.
1
0
2
@orf_bnw
Orhan Firat
8 years
@YadFaeq wow kudos to your "attention" how did you find the link, lol ;).
1
0
2
@orf_bnw
Orhan Firat
8 years
“You don't need a doctor, you need a time machine!” lolol😂😂.
0
0
1
@orf_bnw
Orhan Firat
3 years
@adisid01 i vote for the green one :P.
0
0
1
@orf_bnw
Orhan Firat
1 year
0
0
1
@orf_bnw
Orhan Firat
9 years
negative vibes all the way :).
0
0
1
@orf_bnw
Orhan Firat
4 years
1
0
1
@orf_bnw
Orhan Firat
5 years
@alvations @wellformedness Thanks @alvations for the mention 😊 tho not sure how innovative my definition was (emphasizing the interplay between variables in end-to-end learning). On data (X) totally agree, but formulation is challenging. Perhaps poking active learning/learning to learn in a similar vein.
0
0
1
@orf_bnw
Orhan Firat
4 years
@elmanmansimov @awscloud @AmazonScience congrats Elman, both for your dissertation and position at Amazon 🙂 we should catch up sometime.
1
0
1
@orf_bnw
Orhan Firat
9 years
who wants to have a job that's easy,
0
0
1
@orf_bnw
Orhan Firat
3 years
0
0
1
@orf_bnw
Orhan Firat
9 years
nerd chills :D.
0
0
1
@orf_bnw
Orhan Firat
6 years
@roeeaharoni @NAACLHLT @GoogleAI And with the human evalers 🙂
Tweet media one
0
0
1