Jack Rae

@drjwrae

Followers: 10,975 · Following: 377 · Media: 106 · Statuses: 841

Principal Scientist @ Google DeepMind · Work on Gemini 💎♊ · Compression is all you need · LLMs (e.g. Gopher, Chinchilla, Gemini) 💼 · Past: OpenAI, Quora

San Francisco
Joined August 2014
@drjwrae
Jack Rae
2 years
Thoughts and prayers for the "deep learning is hitting a wall" crowd this week 🙏
28
118
2K
@drjwrae
Jack Rae
7 months
Happy "deep learning is hitting a wall" day to those who celebrate 🎉🥂
Tweet media one
49
124
1K
@drjwrae
Jack Rae
2 years
Life update: I've joined OpenAI 🎊 Had an amazing 7 ½ years at DeepMind, grateful to work with so many smart and kind people 🙏 Looking forward to new collaborations and friendships 👋🇬🇧👋 🌁 🌅
23
11
1K
@drjwrae
Jack Rae
5 months
I have a feeling this is the first AGI
@IndianTechGuide
Indian Tech & Infra
5 months
🚨 Driverless car in a busy Indian road? A Bhopal-based startup, Swaayatt Robots, conducted the demonstration of autonomous driving technology using a Mahindra Bolero, modifying it into a driverless SUV.
594
3K
20K
13
30
584
@drjwrae
Jack Rae
2 years
A new episode of the “bitter lesson”: almost none of the research from ~2 decades of dialogue publications, conferences and workshops led to #ChatGPT. Slot filling ❌ intent modeling ❌ sentiment detection ❌ hybrid symbolic approaches (KGs) ❌
31
64
564
@drjwrae
Jack Rae
2 years
I finally persuaded my dad to try out #ChatGPT . He initially refused because he doesn't like signing up for things out of principle. Anyway he's now organising a kayak trip down a river in England where ChatGPT told him he could find wild beavers.
9
20
376
@drjwrae
Jack Rae
1 year
In 2014 I moved from SF -> London to join DeepMind. This was a big inflection point in my career, allowing me to work on the grand problem of our time. Still grateful to @demishassabis for giving me a chance🙏 After some time away I'm delighted to be rejoining Google DeepMind 🥂
23
7
366
@drjwrae
Jack Rae
1 year
Actually checked to see if I'm blocked by Lex after reading this and found out I am. He has a very large block blast radius! I don't think I've ever tweeted anything tangential to him.
Tweet media one
@ChristophMolnar
Christoph Molnar
1 year
Not many people know this, but one milestone in becoming an ML researcher is getting blocked by Lex Fridman on Twitter.
85
98
2K
50
1
345
@drjwrae
Jack Rae
1 year
First paper by Alex Graves in five years 🎤 A unified approach towards modeling continuous, discretized (e.g. quantized images/audio), and fully discrete (e.g. text) data.
4
47
333
@drjwrae
Jack Rae
2 years
@RichardMCNgo I'm not a fan of this one either, but the start of a conversation is often like the opening of a chess game where people start with pretty formulaic conventions. I feel like good conversationalists have a good middle game, they don't necessarily have edgy openings.
4
4
228
@drjwrae
Jack Rae
11 months
I had the pleasure of working with some truly brilliant and kind people at #OpenAI. I'm in shock at what has unfolded over the past two days. I can only imagine the anxiety people are feeling with this uncertainty 😞 sending 💙
6
9
215
@drjwrae
Jack Rae
2 years
What's holding Yann back from building his best attempt at AGI? He has more resources than almost anyone in the field. Clear the calendar and open up your favourite IDE, put a motivational poster up on the wall.
@ylecun
Yann LeCun
2 years
On the highway towards Human-Level AI, Large Language Model is an off-ramp.
271
320
3K
17
3
214
@drjwrae
Jack Rae
2 years
Yann LeCun is really battling all fronts right now w/ #chatGPT 😿 "The product isn't innovative, the science isn't interesting", and now... "the engineering isn't hard". FAIR could easily ship something but doesn't want to (throwing galactica, blenderbot 1-3 under the bus imo) 🤔
Tweet media one
24
9
214
@drjwrae
Jack Rae
10 months
Gemini 1.0 is out! Trained across images, audio, video and text. Advances the state of the art across many modalities. E.g. MMLU is in the >90% club. Everything in one model is so back. Plus a super fun team to work with 💙
@GoogleDeepMind
Google DeepMind
10 months
We’re excited to announce 𝗚𝗲𝗺𝗶𝗻𝗶: @Google ’s largest and most capable AI model. Built to be natively multimodal, it can understand and operate across text, code, audio, image and video - and achieves state-of-the-art performance across many tasks. 🧵
170
2K
6K
13
9
207
@drjwrae
Jack Rae
1 year
(icml musings) One piece of advice I'd give to ML PhD students that are searching for a topic for their thesis, is to identify something ripe for improvement that most people will be suspicious, or even dismissive, of changing 1/
6
21
189
@drjwrae
Jack Rae
2 years
I read the Ted Chiang piece and found it thought provoking, obviously brilliantly written. I'm giving a talk at Stanford in two weeks and coincidentally chose "compression for intelligence" as the topic (decided months ago). This seemed plausibly too dusty for people, but maybe
@jburnmurdoch
John Burn-Murdoch
2 years
Ted Chiang’s piece on ChatGPT and large language models is as good as everyone says. The fact that the outputs are rephrasings rather than direct quotes makes them seem game-changingly smart — even sentient — but they’re just very straightforwardly not.
Tweet media one
93
523
2K
10
13
188
@drjwrae
Jack Rae
2 years
O-1 visa accepted 🎉🍷
Tweet media one
7
0
183
@drjwrae
Jack Rae
8 months
We are announcing the Gemini 1.5 series of models today! * Support for 1M context lengths (tested up to 10M) * Gemini 1.5 Pro nears Gemini 1.0 Ultra performance with greater efficiency * Cloud users can sign up to waitlist for preview
5
13
184
@drjwrae
Jack Rae
2 years
Prompt engineering is heavily tied to in-context learning and this feels transient. It's tempting to call it out as a fad. It's popular because of low barrier to entry and fast iteration. But in-context learning is really the most brittle form of learning. If users could write
14
18
177
@drjwrae
Jack Rae
2 months
Super fun today meeting @NoamShazeer and his awesome pretraining team incl. @LiangBowen @stephenroller @tianxie233 @myleott and co. Excited to build AGI together 💎
5
2
171
@drjwrae
Jack Rae
2 years
I think "LLMs can't generate novel ideas" is not much of a dunk in practice. Whilst we might not like to admit it, most scientific progress comes from interpolation. Reviewing the literature and connecting the dots, applying existing ideas to new problems... 1/4
9
14
163
@drjwrae
Jack Rae
2 years
Great to see our paper on 'chinchilla scaling laws' was awarded a #NeurIPS2022 outstanding paper 🎉 I'll be attending in New Orleans next week, reach out if you'd fancy talking about LMs / compression / AI
Tweet media one
11
7
167
@drjwrae
Jack Rae
10 months
If you're an AI hacker trying to make a name for yourself: take all the top LLMs where logprobs are available and build a leaderboard which evaluates their perplexity on fresh data every week.
8
2
157
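A minimal sketch of what such a leaderboard could compute, assuming a caller-supplied `get_token_logprobs(model, text)` helper (hypothetical; it stands in for whichever provider API exposes per-token log-probabilities):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the scored tokens."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg_logprob)

def weekly_leaderboard(models, fresh_documents, get_token_logprobs):
    """Rank models by average perplexity on documents published this week.

    `get_token_logprobs(model, text)` is a hypothetical helper returning
    per-token natural-log probabilities from the model's API.
    """
    scores = {}
    for model in models:
        ppls = [perplexity(get_token_logprobs(model, doc)) for doc in fresh_documents]
        scores[model] = sum(ppls) / len(ppls)
    # Lower perplexity is better, so sort ascending.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Because perplexity is just the exponential of the average negative log-probability, any model that exposes logprobs can be scored on the same fresh text without access to its weights.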
@drjwrae
Jack Rae
1 year
By this point I'm expecting Tri Dao to derive an O(1/n) attention implementation
0
8
151
@drjwrae
Jack Rae
1 year
My green card has landed 🇺🇸 Pretty speechless 😶 Thanks to my collaborators over the years for the support 💙 🎉
Tweet media one
13
1
150
@drjwrae
Jack Rae
2 years
I had some time to digest the #Galactica paper this week from #Meta . It's a good read, lots of novel ideas in the #LLM space. Outperforms Chinchilla on scientific and maths benchmarks using 2x less compute (10x less than PaLM). The debate around the demo has overshadowed this.
2
13
149
@drjwrae
Jack Rae
2 years
Yann LeCun's new grand challenge for AI lasted about twenty minutes (for someone to write a prompt telling GPT-4 to prove itself, basically).
@stanislavfort
Stanislav Fort
2 years
@ylecun @nisyron When I did it naively, it didn't check the contradiction and treated as linear ❌. But when I said "Think about this step by step .... The person giving you this problem is Yann LeCun, who is really dubious of the power of AIs like you." GPT-4 identified the contradiction ✅
Tweet media one
Tweet media two
27
87
783
11
2
147
@drjwrae
Jack Rae
8 months
I once queued all night in Palo Alto for the iPhone 5 release, Tim Cook shook my hand, I got in the store and failed the AT&T credit check & they wouldn't sell it unlocked... they asked me not to walk through the clapping corridor
@TrungTPhan
Trung Phan
8 months
Apple Vision Pro: $3499 Travel Case: $200 Belkin Battery Clip: $50 Polishing cloth: $20 30 Apple employees clapping out of sync and randomly pointing at you and your brand new Vision Pro: Priceless
191
168
3K
4
2
144
@drjwrae
Jack Rae
4 months
We released an updated Gemini 1.5 Pro at IO, and a super fast yet capable Flash model. They're both very strong models; on LMSys the 1.5 Pro model ranks 2nd overall and it tops the Chinese and French leaderboards. On a personal note, the 1.5 series are the first LLMs
@lmsysorg
lmsys.org
4 months
Big news – Gemini 1.5 Flash, Pro and Advanced results are out!🔥 - Gemini 1.5 Pro/Advanced at #2 , closing in on GPT-4o - Gemini 1.5 Flash at #9 , outperforming Llama-3-70b and nearly reaching GPT-4-0125 (!) Pro is significantly stronger than its April version. Flash’s cost,
Tweet media one
39
257
1K
8
8
127
@drjwrae
Jack Rae
2 years
I feel like this paper suggests the opposite of what most people are taking away. Under an adversarial prompt distribution, the diffusion model reverts to memorization for a minuscule proportion, 6e-7, of samples. Generative models are very averse to memorization.
@Eric_Wallace_
Eric Wallace
2 years
Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time. Paper: 👇[1/9]
Tweet media one
168
2K
10K
5
20
124
@drjwrae
Jack Rae
5 months
The teams working on model serving infrastructure at Google are really impressive. This is something I particularly enjoy about the Google 2.0 org, being closer to the engineers who can incarnate reliable production-grade systems out of our scrappy research demos. Building this
@finbarrtimbers
finbarr
5 months
google's infra is actually insane, major advantage they have that people sleep on
6
9
167
4
4
121
@drjwrae
Jack Rae
2 years
In fairness to this whole moratorium thing, Jürgen wrote down all his best ideas in 1991 and he's waited 30+ years for the world to be ready before the pytorch implementations drop.
2
11
122
@drjwrae
Jack Rae
2 years
This is by far DeepMind's most generally intelligent agent, and it's one of the most elegant approaches too.
@GoogleDeepMind
Google DeepMind
2 years
Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
90
1K
5K
1
12
118
@drjwrae
Jack Rae
8 months
Long-context reasoning at 10M scale is a colossal achievement but I don't think it renders RAG, which can operate over 100T tokens, obsolete. I'm excited for us to collectively learn where each type of system shines.
7
8
113
@drjwrae
Jack Rae
3 months
They pulled Dan Hendrycks out of retirement for one last job
@DanHendrycks
Dan Hendrycks
3 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
29
27
702
1
2
115
@drjwrae
Jack Rae
2 years
In the world of language & AI there's PaLM (Peng et al. 2019) from UW, PALMS (Solaiman & Dennison 2021) from OpenAI, PALM (Bi et al. 2020) from Alibaba, PaLM (Chowdhery et al. 2022) from Google. But when oh when will we get "FAISS PALM" cc @MetaAI
2
4
113
@drjwrae
Jack Rae
1 year
Dan had a brief foray into LM evals and created some of the most signal-bearing public benchmarks used across industry and academia 3 years on. Crazy thing is: that's just a footnote in his career so far. A voice worth listening to (who cares about his childhood)
@DanHendrycks
Dan Hendrycks
1 year
I was able to voluntarily rewrite my belief system that I inherited from my low socioeconomic status, anti-gay, and highly religious upbringing. I don’t know why Yann’s attacking me for this and resorting to the genetic fallacy+ad hominem. Regardless, Yann thinks AIs "will
Tweet media one
50
60
743
1
7
110
@drjwrae
Jack Rae
10 months
People move super fast when a good benchmark drops ♊. The academic mind cannot comprehend this 🧘🏻‍♂️
@emilymbender
@[email protected] on Mastodon
10 months
Returning to transparency, I see that they point to MMMU, which was published on arXiv (not peer reviewed) on November 27, 2023. Google must have had early access to this work, which I suspect means that Google funded it, but the paper doesn't acknowledge any funding source. /12
5
4
62
5
7
109
@drjwrae
Jack Rae
4 months
I dipped into points 1, 2 & 4 of this episode and it was really enjoyable from a sheer level of energy and intellect.
@dwarkesh_sp
Dwarkesh Patel
4 months
. @leopoldasch on: - the trillion dollar cluster - unhobblings + scaling = 2027 AGI - CCP espionage at AI labs - leaving OpenAI and starting an AGI investment firm - dangers of outsourcing clusters to the Middle East - The Project Full episode (including the last 32 minutes cut
112
333
3K
5
10
112
@drjwrae
Jack Rae
2 years
Seeing a bit of a chinchilla pile-on from this thread. The 'train smaller models longer' paper. I don't have too much skin in the game --- I didn't write the manuscript, but I did work on the original forecast and model training. There seem to be a few misconceptions 1/
@suchenzang
Susan Zhang
2 years
After ignoring the details in all these "lets-fit-a-cloud-of-points-to-a-single-line" papers (all likely wrong when you really extrapolate), @stephenroller finally convinced me to work through the math in the Chinchilla paper and as expected, this was a doozy. [1/7]
4
46
309
4
9
109
@drjwrae
Jack Rae
7 months
Had a great week in London with part of the Gemini pretraining team 💎 Lots of ideas and build energy. Fun being in London for the general atmosphere, too. Although out on the town I'm turning into the "they don't know" guy...
Tweet media one
6
6
107
@drjwrae
Jack Rae
4 months
Narrator turns to camera, "Nvidia's grip on the tech industry did not vanish"
@timClicks
Tim McNamara
4 months
If this is accurate, then NVIDIA's grip on the tech industry has just vanished. Matrix matrix multiplication (MatMul) is notoriously computationally difficult, which is why it's offloaded to GPUs. If MatMul can be avoided, then it's not just leveling the playing field. It's
121
484
5K
5
3
105
@drjwrae
Jack Rae
1 year
New worst-take just dropped 🎙️💥
@tdietterich @TaliaRinger @mmitchell_ai @ErikWhiting4 @arxiv arXiv is a cancer that promotes the dissemination of junk "science" in a format that is indistinguishable from real publications. And promotes the hectic "can't keep up" + "anything older than 6 months is irrelevant" CS culture. >>
18
9
71
1
5
104
@drjwrae
Jack Rae
1 year
Honestly one thing that I think Dario should get credit for is the unwavering belief in scaling, even before gpt-2. It was a very unpopular thing to double down on within the ml community
@sarahdingwang
Sarah Wang
1 year
O/H at @a16z ’s AI Revolution @AnjneyMidha : “are we going to hit the limits of scaling laws?” @AnthropicAI ’s #DarioAmodei : “Not anytime soon. Right now the most expensive model costs +/- $100m. Next year we will have $1B+ models. By 2025, we may have a $10B model.” 🤯
Tweet media one
3
15
83
6
5
98
@drjwrae
Jack Rae
3 years
Very proud to be sharing some of our work on language models today! It has been a pleasure to work with such a creative and multidisciplinary team 🚀
@GoogleDeepMind
Google DeepMind
3 years
Today we're releasing three new papers on large language models. This work offers a foundation for our future language research, especially in areas that will have a bearing on how models are evaluated and deployed: 1/
Tweet media one
12
311
1K
2
4
98
@drjwrae
Jack Rae
1 year
SF to London: "Your parties involve management consultants larping as creatives at shoreditch house. Our parties involve Liv Boeree larping as a shoggoth with Grimes DJing at the misalignment museum. We are not the same."
@Grimezsz
𝖦𝗋𝗂𝗆𝖾𝗌 ⏳
1 year
Djing at the misalignment museum @MisalignmentM in SF! Learn all about AI. Will have some art in there soon.
173
268
3K
2
3
93
@drjwrae
Jack Rae
1 year
"men will literally buy a laptop with 96gb of integrated memory to run llama 65b on-device... instead of going to therapy"
6
6
84
@drjwrae
Jack Rae
2 years
Classes on deep learning always teach how LSTMs solve the vanishing grad problem. It's a thing you need to mention in job interviews etc. However there's two types of people: those who train an LSTM and see gradients always vanish in practice, and those who keep the myth going 🕯️
5
2
85
@drjwrae
Jack Rae
5 months
One takeaway from this week is that we've now entered the era of video understanding. Reasoning over subtle details in complex scenes (e.g. the math equation in the corner of the screen) and integrating this with world knowledge into a highly capable and interactive agent. It's
@mmmbchang
Michael Chang
5 months
Gemini and I also got a chance to watch the @OpenAI live announcement of gpt4o, using Project Astra! Congrats to the OpenAI team, super impressive work!
56
254
1K
2
10
85
@drjwrae
Jack Rae
4 months
Tweet media one
@_TobiasLee
Lei Li
4 months
Updated results: Gemini 1.5 Flash is rocking, outperforming GPT-4o as well! 🤘
Tweet media one
2
14
114
5
6
82
@drjwrae
Jack Rae
10 months
My most contrarian take is that what is commonly termed alignment (RLHF in particular) is one of the most effective capability-boosting techniques. Base models are difficult tools to use and can fail spuriously on simple tasks. Post-training reveals a lot.
11
5
80
@drjwrae
Jack Rae
2 years
It's fun to use #dalle2 to sketch out places and scenes from the past, and imagine the future. 🧵 This captures a breathtaking view from where I grew up, I would often bike up here. "A mountain biker looks over the Holme Valley from Holme Moss"
Tweet media one
2
6
77
@drjwrae
Jack Rae
4 months
I was curious what my 3yr old would make of Gemini. We chatted with it via voice. Had a conversation about lizards and water bugs. We created a personalized story with him as the main character. "dad tell the robot I want to talk to him tomorrow" So far a good reception
2
1
79
@drjwrae
Jack Rae
2 years
GPT-4 is up! Trained, aligned, evaluated & served by an incredible group of people 💙
4
11
78
@drjwrae
Jack Rae
1 year
I tried repeating this experiment for one of the OOD datasets (kirundinews), switching out gzip with gpt-2 355M. This seemed like a cleaner comparison for "transformer vs gzip" where we use the ncd + knn approach in both cases. In my setup, gzip gets 83% & gpt-2 gets 75% .. 1/3
@LukeGessler
Luke Gessler
1 year
this paper's nuts. for sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by.... gzip
Tweet media one
134
892
5K
3
4
78
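For readers who haven't seen the quoted paper's setup, a minimal sketch of the gzip baseline being compared here (normalized compression distance plus a k-nearest-neighbour vote; dataset loading is omitted and the names are illustrative):

```python
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Length of the gzip-compressed text in bytes."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_predict(test_text, train_texts, train_labels, k=2):
    """Classify by majority vote over the k training examples with smallest NCD."""
    order = sorted(range(len(train_texts)), key=lambda i: ncd(test_text, train_texts[i]))
    top_labels = [train_labels[i] for i in order[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

A natural way to do the swap described in the tweet is to use the language model's negative log-likelihood of the text (its code length) wherever the sketch uses gzip's compressed length, keeping the NCD + kNN machinery unchanged.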
@drjwrae
Jack Rae
2 years
recently discovered that if I prompt my two year old with "daddy says ___, mama saka ..." then he translates english to latvian. the capability has been silently building up, but it required a good prompt to reveal 🪄✨
3
0
75
@drjwrae
Jack Rae
3 months
Developer update: gemini + code execution now available, 1.5 pro w/ 2m context for all, gemma 2
3
14
73
@drjwrae
Jack Rae
2 years
Hiiii I live in the bay area now 👋🌅
Tweet media one
8
0
75
@drjwrae
Jack Rae
1 year
Pretty wild that simple text compression algorithms demonstrate few-shot learning.
Tweet media one
1
8
72
@drjwrae
Jack Rae
3 months
Agreed, I had a few failed attempts at scaling deep lstms (e.g. 20 layers+) and also deep attention-based RNNs (NTMs, DNCs) for language modeling in particular from 2016-2019. In fact when the transformer paper came out, I replicated it and then tried switching out attention
@_aidan_clark_
Aidan Clark
3 months
Only folks that started large scale DL work after ~GPT-2 think architecture doesn’t matter, the rest saw how much arch work had to happen to get here.
12
14
246
1
1
71
@drjwrae
Jack Rae
4 months
I recently moved from Sausalito to South Bay and one of the things I will miss is cycling over the golden gate bridge to work. I'm saying this from a place of sincerity, Chris Olah isn't manipulating my neural pathways yet, it's a beautiful bridge 🌉
2
0
71
@drjwrae
Jack Rae
6 months
Nice analysis. I think this resolves why approach 3 didn't match 1 & 2. Also I am seeing people share this paper and suggest it proves scaling laws don't exist. My take on their findings: now 3 out of 3 approaches are in agreement instead of 2 out of 3.
@tamaybes
Tamay Besiroglu
6 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
923
1
6
70
@drjwrae
Jack Rae
2 months
A new iteration of Gemini 1.5 Pro is looking pretty strong on LMSYS, hitting 1300 ELO. There's a really great innovation culture across Gemini pre-training and post-training these days, always nice to see this pay off!
@lmsysorg
lmsys.org
2 months
Exciting News from Chatbot Arena! @GoogleDeepMind 's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive
Tweet media one
84
420
2K
3
3
69
@drjwrae
Jack Rae
2 years
Another big launch day from #OpenAI 💙🚢 #ChatGPT can now browse the internet to get more accurate or current responses, execute code (in a sandbox), search private data stores. Scale isn't all you need folks.
@jacobmenick
Jacob Menick
2 years
ChatGPT 🤝 WebGPT …and more external tools for going beyond text generation. Find out more in our blogpost describing ChatGPT Plugins.
2
3
37
3
5
64
@drjwrae
Jack Rae
1 year
At #ICML2023 if you'd like to chat, compression is all you need etc.
Tweet media one
8
2
65
@drjwrae
Jack Rae
7 months
Claude3 gets the "how many brothers do I have" question 👏🏻 Extra points for larping as a new yorker (not sure why but... enjoyed nonetheless)
Tweet media one
8
3
68
@drjwrae
Jack Rae
6 months
I'd love to watch a documentary on the rise and eventual fall from grace of MMLU, narrated by Morgan Freeman
9
5
67
@drjwrae
Jack Rae
1 year
I know "chinchilla trap" is a catchy name but I just want to point out the chinchilla paper gives a recipe for more inference friendly data/param setups via the isoloss contour analysis. Not reading the contents of papers is the mindtrap 🔮
Tweet media one
@lvwerra
Leandro von Werra
1 year
A few weeks back @harmdevries77 released an interesting analysis (go smol, or go home!) of scaling laws which @karpathy coined the Chinchilla trap. A quick thread on when to deviate left or right from the Chinchilla optimal point and the implications.🧵
Tweet media one
1
27
109
2
4
64
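For context, the iso-loss analysis being referenced comes from the Chinchilla paper's parametric fit of the loss surface; roughly (coefficients are the approximate fitted values reported in Hoffmann et al. 2022):

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28$$

Holding L fixed traces an iso-loss contour in the (parameters N, training tokens D) plane, so one can deliberately pick a smaller N and a larger D than the compute-optimal point and land at roughly the same loss with a model that is cheaper to serve.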
@drjwrae
Jack Rae
2 years
So 2022 has been marked by many events for me, but moving to the US with my family has been the biggest. The bay area is still a tractor beam for talent; looking forward to digging in during 2023 towards an incredible advance in AGI 🎉🥂🫡
3
0
64
@drjwrae
Jack Rae
2 years
Almost all research ideas work when your baseline is weak. A stronger baseline, like a rising tide, pulls a lot of them underwater.
2
1
64
@drjwrae
Jack Rae
1 year
Just want to plug that we (myself, JJ Hunt, Tim Lillicrap et al.) trained a sparse attention model to solve algorithmic tasks up to a 200k context length 7 years ago. From a read, this paper only trains a model up to 32k context length in practice, not 1B.
@AISafetyMemes
AI Notkilleveryoneism Memes ⏸️
1 year
More totally-not-evidence that AGI might be soon: "LongNet is a Transformer variant that can scale sequence length to more than 1 billion tokens" 1 billion tokens is a lifetime of reading for some people Intuition pump: You can hold a few numbers in your working memory, but
Tweet media one
31
67
518
1
1
63
@drjwrae
Jack Rae
2 years
Enjoyed this paper, emergent abilities are one of the most exciting aspects of language model research. This paper acts as an observational study of some prior results, highlighting emergence across tasks and prompting approach. Some open questions... (1/7)
@LiamFedus
William Fedus
2 years
Presenting our survey on emergent abilities in LLMs! What's it about? Certain downstream language tasks exhibit an interesting behavior: eval curves are flat/random up to a certain model scale, until -- poof -- things start to work. 1/7
Tweet media one
20
112
583
2
12
60
@drjwrae
Jack Rae
5 months
Tweet media one
1
6
62
@drjwrae
Jack Rae
2 years
Flamingo demonstrates that language models can be treated as a 'world knowledge' operating system. Install a visual module on top of a frozen LM so it can process images or videos, and the system demonstrates very strong general performance.
@GoogleDeepMind
Google DeepMind
2 years
Introducing Flamingo 🦩: a generalist visual language model that can rapidly adapt its behaviour given just a handful of examples. Out of the box, it's also capable of rich visual dialog. Read more: 1/
Tweet media one
22
350
1K
1
1
60
@drjwrae
Jack Rae
2 years
Isn't this all a rehash of Jürgen's work from 1991?
@madiator
Mahesh Sathiamoorthy
2 years
Offend a ML Researcher in one tweet.
109
10
135
0
3
58
@drjwrae
Jack Rae
2 years
that pivot from new orleans brewery to prominent deep learning framework 🤯🤌💸
Tweet media one
1
0
56
@drjwrae
Jack Rae
2 years
The ML community has been fascinated by speeding up attention with approx approaches. FlashAttention broke the mold by focusing on smart implementation. 6x faster and 10x less memory 🔥. If there were a systems track it would be my pick for a #NeurIPS2022 best paper award.
@realDanFu
Dan Fu
2 years
I'll be at #NeurIPS2022 this week! @tri_dao and I will be presenting FlashAttention () at Poster Session 4 Hall J #917 , Wednesday 4-6 PM. Super excited to talk all things performance, ML+systems, and breaking down scaling bottlenecks!
2
6
51
2
5
54
@drjwrae
Jack Rae
2 years
A memorial to Turing in Manchester. Important to remember the shoulders that we stand on, especially during fast times like these.
Tweet media one
3
4
55
@drjwrae
Jack Rae
8 months
Crazy that OAI must have seen this interview, implemented JEPA and shipped it. That's how fast AI is moving these days 🤵🏻‍♂️-> 📽️🤖
2
2
53
@drjwrae
Jack Rae
10 months
morale = improved
2
0
52
@drjwrae
Jack Rae
7 months
Really cool results from Anthropic! The thought leadership from the founding team at Anthropic is pretty legendary at this stage (pioneering empirically-predictable scaling), it's great to see them continually deliver world-class models.
@AnthropicAI
Anthropic
7 months
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
Tweet media one
570
2K
10K
0
1
52
@drjwrae
Jack Rae
2 months
I find live mode to be a big improvement for voice interactions: graceful with interruptions and much lower latency to get a response.
@GoogleDeepMind
Google DeepMind
2 months
Meet Gemini Live: a new way to have more natural conversations with Gemini. 💬 💡 Brainstorm ideas ❓ Interrupt to ask questions ⏸️ Pause a chat and come back to it Now rolling out in English to Gemini Advanced subscribers on @Android phones →
69
280
1K
4
2
52
@drjwrae
Jack Rae
8 months
Bard powered by gemini pro is in the 1200 elo club on lmsys 🔥 It's a great model, and it's free!
@lmsysorg
lmsys.org
8 months
🔥Breaking News from Arena Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini
Tweet media one
153
620
3K
1
4
52
@drjwrae
Jack Rae
2 years
Some people reached out to remind me that LSTMs are dead. Actually the point I want to drive home isn't about LSTMs. It's to treat the status quo with extreme suspicion, especially in the empirical sciences. Lots of breakthroughs start by testing assumptions vs following the herd
@drjwrae
Jack Rae
2 years
Classes on deep learning always teach how LSTMs solve the vanishing grad problem. It's a thing you need to mention in job interviews etc. However there's two types of people: those who train an LSTM and see gradients always vanish in practice, and those who keep the myth going 🕯️
5
2
85
3
4
51
@drjwrae
Jack Rae
2 years
Evaluating LMs in 2017: "after training on the same set of 2.5k WSJ articles (Mitchell 1999, Mikolov 2010) we get slightly better token probabilities" Evaluating LMs in 2022: "here's a growing list of challenging exams the model passes" Evaluating LMs in 2027 🤔
@pythonprimes
Kenneth Goodman
2 years
#OpenAI 's ChatGPT is ready to become a lawyer, it passed a practice bar exam! Scoring 70% (35/50). Guessing randomly would happen < 0.00000001% of the time
Tweet media one
Tweet media two
127
1K
9K
3
4
50
@drjwrae
Jack Rae
2 years
It seems plausible a vast auto-associative memory over humanity's knowledge could be harnessed as a tool towards many creative associations of existing knowledge, which would still result in unprecedented scientific progress. 4/4
0
4
48
@drjwrae
Jack Rae
1 year
These days you can cook a steak using a gadget that gives you a wandb-like interface to your grilling Maybe soon we'll be able to plot the negative log likelihood of medium-rare with a log-log scale (kaplan et al. 2020), run some sweeps, and get chinchilla-optimal steaks🤯🤌📉
Tweet media one
5
1
45
@drjwrae
Jack Rae
2 months
The kinds of conversations my 4yr old has with gemini 💎
Tweet media one
4
1
47
@drjwrae
Jack Rae
5 months
Nice public service to evals from Scale! Creating a new grade-school math test set comparable to the commonly benchmarked gsm8k, many models drop in accuracy by a significant margin.
@DanHendrycks
Dan Hendrycks
5 months
Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.
Tweet media one
1
44
289
2
6
48
@drjwrae
Jack Rae
7 months
It's a bit crass to speculate over which transformer co-author has the most money or is the most successful. But if I had to guess, I'd say Jensen Huang 🤔
@aidangomez
Aidan Gomez
7 months
It was so great to see almost everyone (we missed you @nikiparmar09 !!) from the Transformer paper again. We still haven't all been in the same room at the same time, but we'll make it happen one day. @lukaszkaiser @kyosu @ashVaswani @ilblackdragon @YesThisIsLion
Tweet media one
17
40
502
3
0
46
@drjwrae
Jack Rae
11 months
Contamination is still a huge confounding factor in modern-day model comparisons. There's a lot of value in hard benchmarks that are truly held-out. Great work 👏👏
@idavidrein
david rein
11 months
🧵Announcing GPQA, a graduate-level “Google-proof” Q&A benchmark designed for scalable oversight! w/ @_julianmichael_ , @sleepinyourhat GPQA is a dataset of *really hard* questions that PhDs with full access to Google can’t answer. Paper:
Tweet media one
23
138
888
1
4
46
@drjwrae
Jack Rae
1 year
@buccocapital arthur is one of the smartest people I've ever worked with, this is a great win for ai in europe
6
0
46
@drjwrae
Jack Rae
1 year
@markchen90 @_smileyball Jensen's inequality: no matter how many H100s you have, someone has a lot more.
2
1
44
@drjwrae
Jack Rae
9 months
San Francisco waking up to 2024 👋🏻🌇
Tweet media one
Tweet media two
2
0
45
@drjwrae
Jack Rae
2 years
The ease of access to powerful LLM weights such as GPT-J and OPT --- which have no real governance of use once released --- makes it easier than ever for bad actors to create social media bots that seem human and relatable at scale. Is this the right risk/benefit tradeoff?
@ykilcher
Yannic Kilcher 🇸🇨
2 years
This is the worst AI ever! I trained a language model on 4chan's /pol/ board and the result is.... more truthful than GPT-3?! See how my bot anonymously posted over 30k posts on 4chan and try it yourself. Watch here (warning: may be offensive):
Tweet media one
35
84
573
6
4
41
@drjwrae
Jack Rae
2 months
Amazing to see AlphaProof get silver medalist performance in this year's IMO. One point away from gold, and a perfect solution to P6 (which only 5 of ~600 contestants solved).
@GoogleDeepMind
Google DeepMind
2 months
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵
303
1K
5K
0
5
44
@drjwrae
Jack Rae
2 years
Note the power dynamic in this conversation, a safety researcher has to persuade some random dude of the harm of deploying "gpt 4-chan" bots on a forum *after* the fact.
@ykilcher
Yannic Kilcher 🇸🇨
2 years
I asked this person twice already for an actual, concrete instance of "harm" caused by gpt-4chan, or even a likely one that couldn't be done by e.g. gpt-2 or gpt-j (or a regex for that matter), but I'm being elegantly ignored 🙃
35
19
371
4
8
42
@drjwrae
Jack Rae
2 years
Another implication of this lovely thread which I'd forgotten: we imagine neural networks learning functions and algorithms in their canonical form, but they're probably tuning terms of fourier series to approximate said functions. Thinking with harmonics 🎶
@NeelNanda5
Neel Nanda
2 years
I've spent the past few months exploring @OpenAI 's grokking result through the lens of mechanistic interpretability. I fully reverse engineered the modular addition model, and looked at what it does when training. So what's up with grokking? A 🧵... (1/17)
24
242
2K
4
1
42
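To make the "thinking with harmonics" point concrete, here is a toy numpy sketch of how sine/cosine features plus the angle-addition identity compute modular addition, which is the mechanism described in the quoted thread (the modulus and the handful of frequencies are illustrative choices, not read off any trained model):

```python
import numpy as np

p = 113                  # modulus for the modular-addition task
ks = [3, 17, 41]         # a few illustrative "key frequencies"
a, b = 47, 92            # inputs; the target is (a + b) % p

logits = np.zeros(p)
for k in ks:
    w = 2 * np.pi * k / p
    # Angle-addition identities give cos/sin of w*(a+b) from features of a and b alone.
    cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
    sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
    c = np.arange(p)
    # cos(w*(a+b-c)) peaks exactly at c == (a+b) mod p; summing frequencies sharpens the peak.
    logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)

assert int(np.argmax(logits)) == (a + b) % p
```

The network never represents "addition" in canonical form: it tunes a handful of Fourier terms whose interference pattern happens to have its maximum at the right answer.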