Loren Lugosch

@lorenlugosch

Followers: 2,019 · Following: 1,005 · Media: 309 · Statuses: 978

Machine learning @ ; audio & language; Freigeisterei und Vielgeisterei (free-spiritedness and many-spiritedness); "at once a man of business and a man of rhyme"

His whereabouts are unknown.
Joined September 2017
@lorenlugosch
Loren Lugosch
4 years
Autoregressive models:
6
118
1K
@lorenlugosch
Loren Lugosch
4 years
A theological theory of deep learning.
Tweet media one
13
101
740
@lorenlugosch
Loren Lugosch
3 years
Writing in the Overleaf and your co-author's cursor is just sitting above yours, silently
Tweet media one
9
56
747
@lorenlugosch
Loren Lugosch
11 months
Tweet media one
30
62
744
@lorenlugosch
Loren Lugosch
3 years
We're releasing an open-source massively multilingual speech recognizer! Repo (+ colab notebook): It's a 1-billion-parameter CTC transformer. This is a very cool model, for a few reasons:
14
127
602
@lorenlugosch
Loren Lugosch
4 years
Has your neural net ever NaN’d so hard you thought about chucking your laptop in the trash and moving to British Columbia to be a tree planter instead?
20
22
391
@lorenlugosch
Loren Lugosch
4 years
Model-free RL: If I push this button, I will get a treat. Model-based RL: 𝘎𝘦𝘯𝘵𝘭𝘦𝘮𝘦𝘯, 𝘵𝘩𝘳𝘰𝘶𝘨𝘩 𝘱𝘢𝘪𝘯𝘴𝘵𝘢𝘬𝘪𝘯𝘨 𝘤𝘢𝘭𝘤𝘶𝘭𝘢𝘵𝘪𝘰𝘯𝘴, 𝘐 𝘩𝘢𝘷𝘦 𝘥𝘦𝘵𝘦𝘳𝘮𝘪𝘯𝘦𝘥 𝘵𝘩𝘢𝘵 𝘪𝘧 𝘐 𝘱𝘶𝘴𝘩 𝘵𝘩𝘪𝘴 𝘣𝘶𝘵𝘵𝘰𝘯, 𝘐 𝘸𝘪𝘭𝘭 𝘨𝘦𝘵 𝘢 𝘵𝘳𝘦𝘢𝘵.
7
36
375
@lorenlugosch
Loren Lugosch
2 years
Transformers are an unholy creation: God did not intend for us to put a softmax _inside_ a neural network.
20
14
371
@lorenlugosch
Loren Lugosch
3 years
Researchers Had To Shut Down AI After It Taught Itself 19 Languages?!* 🤔😱🤖😤 Like👍 Subscribe🔔 * = we used pseudo-labeling to train a single massively multilingual speech recognizer for all 60 languages of Common Voice. Paper: 🧵
Tweet media one
5
44
236
@lorenlugosch
Loren Lugosch
3 years
“Why do convolutions work, father?” “Well, son, translation equivariance is a sensible inductive bias for many modalities, like images and audio.” “Why do Einsum(bhnk,bhmk−>bhnm)TransposeSigmoid1x1Nets work, father?” “Enough questions for today, son.”
4
17
228
@lorenlugosch
Loren Lugosch
2 years
The first paper to propose using a neural net to predict HMM states was published in the first NeurIPS (1988):
Tweet media one
4
26
209
@lorenlugosch
Loren Lugosch
4 years
Introducing the Transducer! A sequence-to-sequence model from 2012 (!) that combines the best aspects of CTC and attention models for problems like speech recognition—long neglected, but starting to have a comeback. Blog: Code:
Tweet media one
Tweet media two
Tweet media three
2
49
202
@lorenlugosch
Loren Lugosch
3 years
New addition to my collection of Now Unfashionable Sequence Model shirts:
Tweet media one
@lorenlugosch
Loren Lugosch
6 years
The aforementioned meme shirt by @ZakDavid . It’s gotten a lot of compliments!
Tweet media one
1
2
28
7
13
183
@lorenlugosch
Loren Lugosch
4 years
I have completed the rite of passage required of all machine learning PhD students: getting rejected from NeurIPS! 🎉
1
2
173
@lorenlugosch
Loren Lugosch
5 years
Hidden Markov Models have gotten a bit less love in the age of deep learning, but they are really nifty models that can learn even from tiny datasets. I’ve written a notebook introducing HMMs and showing how to implement them in PyTorch—check it out here:
5
37
141
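For readers who haven't seen an HMM written as tensor ops, here is a minimal sketch of the forward algorithm in log space (illustrative toy code, not taken from the linked notebook):

```python
import torch

# Toy forward algorithm for a discrete-emission HMM, computed in log space.
# Names and shapes are illustrative, not from the notebook.
def hmm_log_likelihood(log_pi, log_A, log_B, observations):
    # log_pi: (S,) log initial state probabilities
    # log_A:  (S, S) log transition probs, log_A[i, j] = log p(s_t = j | s_{t-1} = i)
    # log_B:  (S, V) log emission probs over a vocabulary of size V
    # observations: (T,) LongTensor of observed symbol indices
    log_alpha = log_pi + log_B[:, observations[0]]
    for t in range(1, len(observations)):
        # logsumexp over the previous state marginalizes over all paths so far
        log_alpha = torch.logsumexp(log_alpha.unsqueeze(1) + log_A, dim=0)
        log_alpha = log_alpha + log_B[:, observations[t]]
    return torch.logsumexp(log_alpha, dim=0)  # log p(observations)

# Random parameters, just to show the call:
S, V, T = 3, 5, 10
log_pi = torch.log_softmax(torch.randn(S), dim=0)
log_A = torch.log_softmax(torch.randn(S, S), dim=1)
log_B = torch.log_softmax(torch.randn(S, V), dim=1)
obs = torch.randint(0, V, (T,))
print(hmm_log_likelihood(log_pi, log_A, log_B, obs))
```

Because the whole computation is built from differentiable tensor ops, the same log-likelihood can be maximized directly with gradient descent.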
@lorenlugosch
Loren Lugosch
3 years
Sorry, guys. Facebook is down because a neural net I trained during my internship grew too large and began to eat the other computers (this happens sometimes).
6
0
137
@lorenlugosch
Loren Lugosch
4 years
I wrote a short post about logsumexp: It's an operation you've almost certainly used, if you do machine learning—but not everyone has taken a moment to ponder it and understand it intuitively.
3
27
127
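The intuition is easiest to see with the usual max-shift trick; a quick illustrative snippet (not from the post itself):

```python
import torch

x = torch.tensor([1000.0, 1001.0, 1002.0])

naive = torch.log(torch.exp(x).sum())            # exp(1000) overflows to inf
m = x.max()
stable = m + torch.log(torch.exp(x - m).sum())   # shift by the max before exponentiating

print(naive)                   # tensor(inf)
print(stable)                  # tensor(1002.4076)
print(torch.logsumexp(x, 0))   # same value, built in
```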
@lorenlugosch
Loren Lugosch
4 years
Ahh, old NeurIPS papers. A Wild West of analog chip design, LDPC decoders, and opening sentences like this one:
Tweet media one
3
9
123
@lorenlugosch
Loren Lugosch
3 years
Tweet media one
5
4
109
@lorenlugosch
Loren Lugosch
3 years
It seems to be a good career move to be a carpenter for a few years (Harrison Ford, Geoff Hinton, Jesus)
5
6
104
@lorenlugosch
Loren Lugosch
1 year
Henceforth you must refer to me as Dr. Lugosch!
Tweet media one
Tweet media two
18
3
103
@lorenlugosch
Loren Lugosch
3 years
Might distill a transformer into an LSTM, out of spite
8
3
91
@lorenlugosch
Loren Lugosch
3 years
Got nervous during my first-ever RL interview; could not remember the word "policy", came up with "uhhh model-free.. model"
2
0
88
@lorenlugosch
Loren Lugosch
2 years
I’ve moved to Boston 🇺🇸 to work for Apple! Hope to see you around if you’re in the area.
Tweet media one
13
1
88
@lorenlugosch
Loren Lugosch
1 year
How can you train a speech recognizer using only unpaired audio and text? Here's a simple recipe: - train language model (LM) for the target language - train acoustic model (AM) for some other (!) source language - iterative pseudo-labeling using AM + LM
Tweet media one
Tweet media two
2
17
82
@lorenlugosch
Loren Lugosch
2 years
Last week I started my internship at Apple, and this week I got to visit The Apple Spaceship in Cupertino and meet two of my fabulous co-authors IRL.
Tweet media one
3
2
78
@lorenlugosch
Loren Lugosch
2 years
Has anyone tried RNN architectures with all the transformer stuff except for self-attention? (in other words, layer norm + residuals + feedforward + deep, and then just RNN instead of self-attention)
12
4
74
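For concreteness, here is a minimal PyTorch sketch of what such a block might look like (illustrative code only): a pre-norm residual block with an LSTM standing in for self-attention, stacked deep.

```python
import torch
import torch.nn as nn

class RNNBlock(nn.Module):
    """Pre-norm residual block: LSTM replaces self-attention, feedforward kept as usual."""
    def __init__(self, d_model: int, d_ff: int = 1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                 # x: (batch, time, d_model)
        h, _ = self.rnn(self.norm1(x))
        x = x + h                          # residual around the RNN "mixer"
        x = x + self.ff(self.norm2(x))     # residual around the feedforward
        return x

model = nn.Sequential(*[RNNBlock(256) for _ in range(12)])   # "and then just deep"
print(model(torch.randn(2, 100, 256)).shape)                 # torch.Size([2, 100, 256])
```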
@lorenlugosch
Loren Lugosch
4 years
Looks like I'll be interning with Ronan Collobert and Gabriel Synnaeve @syhw at FAIR this summer! I believe our... colloberation... should have a lot of... synnergy...
8
0
68
@lorenlugosch
Loren Lugosch
4 years
It is time for ᵈ𝒾𝓃𝓃𝑒𝓇 The transformer must f̵̞̘̳̈́͊̇̃e̶͙̣͌́̉̀͑͒͒̆̌̐ͅę̴͚͇̼̯̞͌͜d̷̥̈̏͊͜
Tweet media one
0
6
64
@lorenlugosch
Loren Lugosch
2 years
PyTorch implementation of M-CTC-T on @huggingface ! A huge thanks to @cwkeam and @patrickvonplaten for porting the model from Flashlight.
@lorenlugosch
Loren Lugosch
3 years
We're releasing an open-source massively multilingual speech recognizer! Repo (+ colab notebook): It's a 1-billion-parameter CTC transformer. This is a very cool model, for a few reasons:
14
127
602
1
10
56
@lorenlugosch
Loren Lugosch
3 years
A work of research is called “seminal” if it had an extraordinarily good random seed.
1
4
55
@lorenlugosch
Loren Lugosch
3 years
"Pseudo-Labeling for Massively Multilingual Speech Recognition" accepted to ICASSP 2022! See you in Singapore, assuming the Pi or Rho variant doesn't thwart my plans.
@lorenlugosch
Loren Lugosch
3 years
Researchers Had To Shut Down AI After It Taught Itself 19 Languages?!* 🤔😱🤖😤 Like👍 Subscribe🔔 * = we used pseudo-labeling to train a single massively multilingual speech recognizer for all 60 languages of Common Voice. Paper: 🧵
Tweet media one
5
44
236
1
6
58
@lorenlugosch
Loren Lugosch
1 year
Gramle! (Spectrogram Wordle)
Tweet media one
0
10
56
@lorenlugosch
Loren Lugosch
1 year
My theory is that non-doomers are common but not well-represented online because they are emotionally stable people who are not temperamentally well-suited for wading into a debate where people are shrieking about whether AI will be Racist or SkyNet.
@nasim_rahaman
Nasim Rahaman
1 year
From what I infer, most doomers fall into three categories: they are either (a) fundamentally misanthropic, (b) like to think they're "saving the world", or (c) looking for a moat. Or some combination of the above. Let's take a look at this. [1/4]
5
0
13
7
4
54
@lorenlugosch
Loren Lugosch
4 years
❌ Dammit, I forgot to buy one of the ingredients. ✅ Excellent, an opportunity to perform an ablation study on this recipe.
1
1
49
@lorenlugosch
Loren Lugosch
3 years
Authorship idea: if you helped with a paper, but maybe not quite enough to merit being a co-author, you can get a “Feat.”, like “Attention Is All You Need (Feat. Pitbull)”
2
2
50
@lorenlugosch
Loren Lugosch
3 years
A reviewer complained that my paper did not have a distinct “Related Work” heading. Remember, your paper should always have: Introduction Post-Introduction Related Work Unrelated Work Background Foreground Foreplay Wait For It Almost There Experiments Brace For Impact
3
3
47
@lorenlugosch
Loren Lugosch
3 years
New God just dropped:
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
3 years
Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-To-Sequence Learning Framework Proposes OFA, which achieves new SotA on multimodal tasks and performs on par with uni-modal models (BERT, MAE, etc.) in uni-modal tasks.
Tweet media one
3
23
127
2
2
43
@lorenlugosch
Loren Lugosch
5 years
Broke: @GaryMarcus arguing that neural networks can’t implement System 2 cognition Woke: Hubert Dreyfus arguing that expert systems can’t implement System 1 cognition
Tweet media one
0
10
39
@lorenlugosch
Loren Lugosch
4 years
Autumn in Montreal. The leaves have all fallen, and the first croissants have begun to bloom.
Tweet media one
0
0
35
@lorenlugosch
Loren Lugosch
6 years
I’m delighted to announce that this fall I’ll be starting a PhD in AI with two of my favorite professors, @bretthmeyer and @DerekRenderling , at @McGillU / @MILAMontreal ! Excited to go on a journey of learning, computers, and getting computers to do learning :)
9
2
36
@lorenlugosch
Loren Lugosch
6 years
The aforementioned meme shirt by @ZakDavid . It’s gotten a lot of compliments!
Tweet media one
1
2
28
@lorenlugosch
Loren Lugosch
3 years
Tweet media one
0
0
31
@lorenlugosch
Loren Lugosch
3 years
I wish computer vision systems did the thing where a bunch of vertices dance across the image until a mesh forms and it starts blinking and printing some shit like “𝙼𝙰𝚃𝙲𝙷 𝙲𝙾𝙽𝙵𝙸𝚁𝙼𝙴𝙳”. Instead it’s like 𝚝𝚎𝚗𝚜𝚘𝚛([-𝟶.𝟽𝟼𝟿𝟾, 𝟷.𝟹𝟹𝟾𝟹, ...
1
2
31
@lorenlugosch
Loren Lugosch
10 months
@ylecun @giffmana Sad that this gnarly dragon logo didn't survive.
Tweet media one
0
3
30
@lorenlugosch
Loren Lugosch
2 years
@giffmana 1.0 was autoregressive (in latent space) whereas 2.0 was BERTish
Tweet media one
Tweet media two
1
1
30
@lorenlugosch
Loren Lugosch
2 years
@tetraduzione Careful: this trick does not play nicely with batch norm
1
0
31
@lorenlugosch
Loren Lugosch
4 years
Wanted to understand gradient boosting for classifiers and coded it up using sklearn trees and PyTorch:
1
5
28
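The core loop is small; a minimal illustrative version for binary classification (toy code, not the linked notebook), in which each sklearn regression tree fits the negative gradient of the logistic loss:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=500, random_state=0)   # y in {0, 1}
F = np.zeros(len(y))     # current logits for every training example
lr = 0.1

for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-F))                 # sigmoid of the current logits
    residual = y - p                             # negative gradient of the logistic loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    F += lr * tree.predict(X)                    # take a small step in function space

print("training accuracy:", ((F > 0) == y).mean())
```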
@lorenlugosch
Loren Lugosch
3 years
Hindi ASR challenge: 100 hours of labeled data, 1000 hours of unlabeled data. Perfect opportunity for testing out semi-supervised learning algorithms on Not-Librispeech!
0
5
30
@lorenlugosch
Loren Lugosch
2 years
Residual architectures trained with layer drop are fun because you can compute the logits at any layer. I just tried it on M-CTC-T:
Tweet media one
Tweet media two
1
3
28
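A toy illustration of why this works (not the M-CTC-T code): because every residual block computes x + f(x), the hidden state keeps the same shape throughout, so the final classifier head can be applied after any block to read out intermediate logits.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.f = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
    def forward(self, x):
        return x + self.f(x)   # residual: output shape == input shape

d, num_classes = 64, 10
blocks = nn.ModuleList([Block(d) for _ in range(8)])
head = nn.Linear(d, num_classes)   # one shared output head

x = torch.randn(1, d)
for i, block in enumerate(blocks):
    x = block(x)
    print(f"predicted class after block {i}:", head(x).argmax(-1).item())
```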
@lorenlugosch
Loren Lugosch
2 years
@deliprao The true mother is from 2008:
Tweet media one
2
1
28
@lorenlugosch
Loren Lugosch
2 years
2023 goal: train a sentient n-gram model.
3
1
29
@lorenlugosch
Loren Lugosch
2 years
If you have a meta-learning idea, the Bengio brothers were probably already doing it in the ‘90s. If you have a meta-meta-learning idea, Schmidhuber was probably already doing it in the ‘80s.
1
2
28
@lorenlugosch
Loren Lugosch
5 years
The brains of SpeechBrain!
Tweet media one
1
1
26
@lorenlugosch
Loren Lugosch
2 years
Most research papers are very boring and badly written, but every now and then Hinton et al. will put out a banger like this:
@OriolVinyalsML
Oriol Vinyals
2 years
We should have totally incorporated the fun fact in the manuscript, to compete with this other 💎 from the distillation paper.
Tweet media one
2
3
38
1
5
24
@lorenlugosch
Loren Lugosch
4 years
This is pretty darn cool.
Tweet media one
1
3
24
@lorenlugosch
Loren Lugosch
3 years
3. No phonemes! No tokenizers! Death To Tokenizers!
1
1
25
@lorenlugosch
Loren Lugosch
4 years
Inside you there are two wolves. One wolf likes smol models with strong inductive biases that can learn from 100 training examples. The other likes 1 trillion parameter transformers that can eat the entire Internet and do on-the-fly meta-learning. Which wolf will you feed?
5
3
23
@lorenlugosch
Loren Lugosch
3 years
Wordle 203 5/6 ⬛🟨⬛⬛🟨 🟨🟨⬛⬛⬛ ⬛⬛🟩🟨🟩 🟩🟩🟩⬛🟩 🟩🟩🟩🟩🟩 I will not waste a day coding a neural net to play Wordle I will not waste a day coding a neural net to play Wordle I will not waste a day coding a neural net to play Wordle I will not waste a day coding a n-
4
0
25
@lorenlugosch
Loren Lugosch
5 years
The transformer kind of looks like a Tatooine moisture farm:
Tweet media one
Tweet media two
1
6
23
@lorenlugosch
Loren Lugosch
3 years
Out-of-domain Out-of-vocabulary Out-of-bag Out-of-memory Out-of-office Out-of-order Out-of-gas Out-of-time
8
1
21
@lorenlugosch
Loren Lugosch
1 year
You’ve heard of Pareto-Optimal. Now get ready for
Tweet media one
1
3
22
@lorenlugosch
Loren Lugosch
3 years
@ylecun @alex_conneau A neural network
Tweet media one
1
0
22
@lorenlugosch
Loren Lugosch
4 years
The GODFATHER of AI The BIG DADDY of ML The PATRIARCHY of PARAMETERS The GRANDFATHER of GRADIENTS The PHALLUS of FORECASTING---oh, too far.
@MelMitchell1
Melanie Mitchell
4 years
AI people: Should @techreview and other media stop using the term "AI Godfather", "Godfather of AI", etc. ? ?
22
13
35
3
0
18
@lorenlugosch
Loren Lugosch
2 years
Thinking about her 😤😩🥺
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
0
21
@lorenlugosch
Loren Lugosch
3 years
I went backcountry skiing, during an Extreme Cold Warning, with some friends who are a bit more daring and outdoorsy than I am. The best part of the experience was that I Did Not Die.
3
0
19
@lorenlugosch
Loren Lugosch
3 years
ʰᵉˡᵖ ᵐᵉ
Tweet media one
Tweet media two
1
1
19
@lorenlugosch
Loren Lugosch
1 year
Enjoying my new job. A shame that I’ll be out sick for the next several weeks.
Tweet media one
0
0
21
@lorenlugosch
Loren Lugosch
2 years
L-BFGS Let's Be Friends, Good Sir
2
0
21
@lorenlugosch
Loren Lugosch
3 years
Shot #1 ... already I can feel its dark power surging through me... yes... y͓͎̻͒͞e͕̜̟͆̏ͨͫs̮̯̯̰͓̊ͧ̽͛ͪ͂!̹̻̜͔̝̄͐̾̍̔͒͑͢ͅ!̱͍͉̦̺͇ͨ͝
Tweet media one
2
0
19
@lorenlugosch
Loren Lugosch
1 year
Schmidhuber is going to bring a delightful chaotic energy to this fight. Regulators will be baffled by his claim to have achieved AGI in 1991.
@SchmidhuberAI
Jürgen Schmidhuber
1 year
Silly AI regulation hype One cannot regulate AI research, just like one cannot regulate math. One can regulate applications of AI in finance, cars, healthcare. Such fields already have continually adapting regulatory frameworks in place. Don’t stifle the open-source movement!
Tweet media one
52
213
1K
0
0
19
@lorenlugosch
Loren Lugosch
1 year
I've changed my mind on AI regulation. This must be stopped:
Tweet media one
2
0
20
@lorenlugosch
Loren Lugosch
4 years
I wonder if in the future a false etymology for “Zoomer” will arise, like: “Ah yes, in 2020 the world began using Zoom because of the pandemic. Hence the new generation became known as ‘Zoomers’.”
2
1
19
@lorenlugosch
Loren Lugosch
2 years
@giffmana (Also 1.0 was a pure CNN and 2.0 used a transformer) (Also 1.0 called itself "unsupervised", but by the time 2.0 rolled around they had jumped on the bandwagon and started saying "self-supervised" :P)
1
1
20
@lorenlugosch
Loren Lugosch
3 years
In a few years I’ll make a transformer shirt, but by that time MLPs will have replaced everything.
2
0
18
@lorenlugosch
Loren Lugosch
2 years
Tweet media one
Tweet media two
0
3
18
@lorenlugosch
Loren Lugosch
4 years
*scribbling in lab notebook* ...the chicken... is necessary... for roast chicken...
1
0
17
@lorenlugosch
Loren Lugosch
3 years
1. No need to tell the model which language you're speaking! The model implicitly figures that out and transcribes in the appropriate script, etc., unlike a "multi-headed" / "multi-decoder" model.
Tweet media one
1
2
18
@lorenlugosch
Loren Lugosch
1 year
@Noahpinion He is John Kirby, the lawyer who saved Nintendo and had a big shiny pink head:
Tweet media one
0
0
19
@lorenlugosch
Loren Lugosch
3 years
The jobs of tomorrow: - Data labeler - Data cleaner - Ornamental hermit - Blood Boy - Monoclonal Antibody Boy
0
3
19
@lorenlugosch
Loren Lugosch
2 years
@irinarish @ethanCaballero Error is zero (infinite compute was used —> avg’d over infinite seeds)
1
0
19
@lorenlugosch
Loren Lugosch
2 years
Your scientists were so preoccupied with how much wood a woodchuck _could_ chuck… that they did not stop to ask how much wood a woodchuck _should_ chuck.
0
1
14
@lorenlugosch
Loren Lugosch
4 years
Me: "Let me make a retreat out of the world and into my studies; surely there I will be free of 2020 and this goddam virus" My studies:
Tweet media one
Tweet media two
1
0
17
@lorenlugosch
Loren Lugosch
3 years
This is a work in progress: we're still training some models and working on releasing the weights, and we'll update the paper afterwards. Until then, enjoy, and let me know if you have any questions.
2
2
17
@lorenlugosch
Loren Lugosch
6 years
It’s my first day at Mila. Time to find the biggest, meanest neural network in the yard and make it overfit to a single minibatch, to assert my dominance.
0
2
17
@lorenlugosch
Loren Lugosch
1 year
Would
Tweet media one
@ManifoldMarkets
Manifold
1 year
Manifold has just launched a dating app! 🤯💖 The premise is simple: OkCupid meets prediction markets! Bet on who would date who for at least 6 months. It's crowdsourced matchmaking! 100+ profiles created in just a couple days. What are you waiting for? Get in there!
Tweet media one
105
98
567
2
0
17
@lorenlugosch
Loren Lugosch
2 years
Note that my test line is so dark it even beats the baseline (SOTA)
4
0
17
@lorenlugosch
Loren Lugosch
5 years
The unsupervised revolution continues!
@mirco_ravanelli
Mirco Ravanelli
5 years
I'm happy to announce our latest work on self-supervised learning for #speech . PASE+ is based on a multi-task approach useful for #speech recognition. It will be presented at #ICASSP2020 . paper: code: @Mila #deeplearning #AI
Tweet media one
8
79
293
1
3
17
@lorenlugosch
Loren Lugosch
3 years
I'd like to work on "language --> actions" type agents (RL/IL). If anyone is looking for interns to work on that kind of stuff, let me know!
1
5
14
@lorenlugosch
Loren Lugosch
4 years
If I were a 1995 hacker, my hacker name would be MAXIMUM LIKELIHOOD, and I would train neural nets with over 100,000 (!!) weights, using data swiped from my rivals using Hacking.
1
0
13
@lorenlugosch
Loren Lugosch
4 years
@srchvrs F u l l - c h u n k m o d e
Tweet media one
1
0
14
@lorenlugosch
Loren Lugosch
3 years
Hehe
Tweet media one
2
0
15
@lorenlugosch
Loren Lugosch
4 years
Still can't believe that one of the fundamental reinforcement learning algorithms is just called "REINFORCE". It's like naming your neural net architecture "NEUR".
1
1
14
@lorenlugosch
Loren Lugosch
1 year
Wtf, now I have to change my username
@arankomatsuzaki
Aran Komatsuzaki
1 year
VERA: Vector-Based Random Matrix Adaptation Presents VeRA, which reduces the number of trainable parameters by 10x compared to LoRA, yet maintains the same performance
Tweet media one
12
97
522
1
0
15
@lorenlugosch
Loren Lugosch
3 years
We also discovered something delightful for the monolingual setting: if you train on pseudo-labels for the wrong language, it still works! This is like teaching a child to write English by reading them "The Cat in the Hat" and then making them transcribe Telemundo for 2 years.
Tweet media one
1
1
12
@lorenlugosch
Loren Lugosch
4 years
Shovel Knight bids you Happy Thanksgiving. 🇨🇦🍗
Tweet media one
0
1
15
@lorenlugosch
Loren Lugosch
1 year
Asserting my dominance using a linear y-axis:
@arankomatsuzaki
Aran Komatsuzaki
1 year
LongNet: Scaling Transformers to 1,000,000,000 Tokens Presents LONGNET, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences abs: repo:
Tweet media one
30
285
1K
0
0
14