William Fedus

@LiamFedus

Followers
21K
Following
10K
Media
99
Statuses
996

VP of Post-Training @OpenAI Past: Google Brain

San Francisco, CA
Joined October 2012
@LiamFedus
William Fedus
5 months
Happy to release a couple of our reasoning models today (🍓)! At @OpenAI, these new models are becoming a larger contributor to the development of future models. For many of our researchers and engineers, these have replaced a large part of their ChatGPT usage.
57
176
2K
@LiamFedus
William Fedus
9 months
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
Tweet media one
183
880
5K
@LiamFedus
William Fedus
2 years
Was having so much fun, I forgot to Tweet about it: I joined OpenAI! I’m continuously awed by the people, the technology, and the ambition. We’re just scratching the surface with ChatGPT. If you’re interested, get in touch!
37
37
1K
@LiamFedus
William Fedus
3 years
Today we're releasing all Switch Transformer models in T5X/JAX, including the 1.6T param Switch-C and the 395B param Switch-XXL models. Pleased to have these open-sourced! All thanks to the efforts of James Lee-Thorp, @ada_rob, and @hwchung27.
19
200
1K
@LiamFedus
William Fedus
9 months
Not only is this the best model in the world, but it's available for free in ChatGPT, which has never before been the case for a frontier model.
29
64
915
@LiamFedus
William Fedus
9 months
But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can’t achieve arbitrarily high win rates on the prompt: “what’s up”). We find on harder prompt sets — and in particular coding — there is an even larger gap: GPT-4o achieves a +100 ELO over our prior
Tweet media one
21
87
737
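For a rough sense of what a +100 Elo gap implies, here is a minimal sketch using the standard Elo expected-score formula (the generic formula, not LMSys' exact rating pipeline):

```python
# Standard Elo expected-score formula: a +100 rating gap corresponds to
# roughly a 64% expected win rate for the stronger model (ties count as half).
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(elo_win_prob(1300, 1200))  # ~0.64
```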
@LiamFedus
William Fedus
1 year
OpenAI is nothing without its people.
20
35
705
@LiamFedus
William Fedus
3 years
Presenting our survey on emergent abilities in LLMs! What's it about? Certain downstream language tasks exhibit an interesting behavior: eval curves are flat/random up to a certain model scale, until -- poof -- things start to work. 1/7
Tweet media one
20
109
575
@LiamFedus
William Fedus
5 months
As part of today, we’re also releasing o1-mini. This is an incredibly smart, small model that can also reason before it answers. o1-mini allows us at @OpenAI to make high-intelligence widely accessible. On the AIME benchmark, o1-mini re-defines the
Tweet media one
17
61
453
@LiamFedus
William Fedus
9 months
GPT-4o is the first model to exceed human performance on MathVista.
@lupantech
Pan Lu
9 months
🚨 BREAKING: @OpenAI's new GPT-4o model outperforms humans on MathVista for the first time! 📊 Scores: Human avg: 60.3, GPT-4o: 63.8. 📖 Learn more: OpenAI: MathVista:
Tweet media one
9
50
431
@LiamFedus
William Fedus
4 years
Pleased to share new work! We design a sparse language model that scales beyond a trillion parameters. These versions are significantly more sample efficient and obtain up to 4-7x speed-ups over popular models like T5-Base, T5-Large, T5-XXL. Preprint:
Tweet media one
11
85
430
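A minimal sketch of the top-1 ("switch") routing idea behind these sparse models, in toy NumPy rather than the released T5X/JAX code: each token is routed to exactly one expert FFN, so parameter count grows with the number of experts while per-token compute stays roughly constant.

```python
import numpy as np

def switch_ffn(tokens, router_w, experts):
    """Top-1 (Switch) routing sketch: route each token to a single expert."""
    logits = tokens @ router_w                      # [n_tokens, n_experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    choice = probs.argmax(-1)                       # top-1 expert per token
    out = np.zeros_like(tokens)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            # gate value scales the expert output so the router receives gradient
            out[mask] = probs[mask, e:e + 1] * expert(tokens[mask])
    return out

d, n_experts = 16, 4
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(d, d)) / np.sqrt(d): np.tanh(x @ W)
           for _ in range(n_experts)]
y = switch_ffn(rng.normal(size=(8, d)), rng.normal(size=(d, n_experts)), experts)
```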
@LiamFedus
William Fedus
3 years
Fun following LLM retrieval progress. One recent work is Memorizing Transformers, which increases context length up to 262k via an external memory of (keys, values) for that document. - Matches quality of Transformers 5x larger. - Can fine-tune a prior pre-trained model to use it
Tweet media one
3
62
420
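A toy sketch of the external-memory lookup described above (illustrative only; the paper's implementation differs in details such as approximate kNN search and per-head memories):

```python
import numpy as np

def knn_memory_attend(query, mem_keys, mem_values, k=32):
    """Retrieve the top-k cached (key, value) pairs for a query and attend
    over just that subset, so the effective context can grow far beyond
    the local attention window."""
    scores = mem_keys @ query                       # [n_memory]
    topk = np.argsort(scores)[-k:]                  # indices of nearest keys
    w = np.exp(scores[topk] - scores[topk].max())
    w /= w.sum()
    return w @ mem_values[topk]                     # weighted sum of values
```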
@LiamFedus
William Fedus
1 year
❤️
@sama
Sam Altman
1 year
i love the openai team so much.
10
14
387
@LiamFedus
William Fedus
4 years
The mysterious LambdaNetwork author(!) finally revealed. Lambdas are an efficient alternative to self-attention. The idea in terms of attention: lambdas are matrices that summarize a context. These matrices apply to query vectors to model data.
Tweet media one
3
72
374
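A minimal sketch of the content-lambda idea (omitting the paper's position lambdas and multi-query details): the context is condensed into one small matrix, which is then applied to every query, sidestepping the n x n attention map.

```python
import numpy as np

def lambda_layer(queries, keys, values):
    """Content lambda sketch: summarize the context into a [d_k, d_v]
    matrix (the 'lambda') and apply it to each query vector."""
    # softmax over context positions for each key dimension
    k = np.exp(keys - keys.max(0, keepdims=True))
    k /= k.sum(0, keepdims=True)
    lam = k.T @ values            # [d_k, d_v] summary of the whole context
    return queries @ lam          # [n_queries, d_v]
```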
@LiamFedus
William Fedus
9 months
GPT-4o is now up on openai/simple-evals and is setting new SOTA on MMLU, MATH, GPQA, HumanEval. Especially excited for free ChatGPT users. This is a step change over 3.5.
11
57
371
@LiamFedus
William Fedus
2 years
Our survey on sparse expert models describes the advances over the last decade, discusses some difficulties, and presents our view on promising future areas. Sparsity has been a fun area to work on the last two years. Excited for the models to come.
Tweet media one
7
71
348
@LiamFedus
William Fedus
9 months
A few years ago, real-time voice translation felt like an incredible piece of tech to design + build. Now, it simply falls out of multimodal training.
@karpathy
Andrej Karpathy
9 months
They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special case afterthought, if you ask it to. (fixed it for you).
3
22
353
@LiamFedus
William Fedus
1 year
Deep respect for the OpenAI team, who are pulling back-to-back all-nighters negotiating for the company and the employees. Seeing their tenacity and seriousness (but also humor) in dealing with this insanity reveals how they even built this company in the first place.
6
13
330
@LiamFedus
William Fedus
1 year
Corporate Governance course enrollment up +7000%.
4
17
323
@LiamFedus
William Fedus
10 months
We've been tinkering to make our models smarter.
@xu3kev
Wen-Ding Li @ NeurIPS
10 months
A big jump in math/reasoning for our coding benchmark 🤯
Tweet media one
20
21
298
@LiamFedus
William Fedus
5 months
This plot is a nice visual representation of a paradigm shift.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
5 months
No more waiting. o1 is officially on Chatbot Arena! We tested o1-preview and mini with 6K+ community votes. 🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance! 🥈o1-mini: #1 in technical areas, #2 overall.
Tweet media one
13
25
295
@LiamFedus
William Fedus
3 years
Proud to release our last year of work on sparse expert models! This started over a year ago when we found Switch Transformers pre-trained well, but some variants were unstable or fine-tuned poorly. The new SOTA ST-MoE-32B addresses this.
Tweet media one
6
62
282
@LiamFedus
William Fedus
9 months
GPT-4o shifts the world in an important way. I think this potentially creates another “ChatGPT moment” for the rest of the world. Everyone can now access, for free, the best public model that is also intelligently tokenized for non-English languages. It establishes a new.
@emollick
Ethan Mollick
9 months
Biggest actual implication of today's OpenAI announcement is very practical: the top barrier I see when I give talks on using AI is that people don't pay for AI to start, and they use GPT-3.5 (the free model) and are disappointed. Now everyone around the world gets GPT-4 free.
9
15
273
@LiamFedus
William Fedus
2 years
After 6 trillion reminders -- the world gets it -- it's a "large language model trained by OpenAI" 🙃. @tszzl removed this behavior in our next model release to free your custom instructions for more interesting requests. (DM us if it's still a nuisance!)
@JeremyNguyenPhD
Jeremy Nguyen ✍🏼 🚢
2 years
With the new Custom Instructions: this works in ChatGPT. (Results in the comments).
Tweet media one
16
16
268
@LiamFedus
William Fedus
9 months
Almost two years after the pre-training of GPT-4 — the field might expect us to be in a strongly diminishing return regime — but we continue to find significant advances and recognize the value of better post-training. It’s not the cherry atop the cake.
@alexandr_wang
Alexandr Wang
9 months
1/ Some thoughts on the recent OpenAI and Google announcements, and what it indicates about what's next in AI. Hint: post-training is REALLY important. THREAD.
6
26
255
@LiamFedus
William Fedus
7 years
Two recent algorithms, World Models by @hardmaru and Schmidhuber (2018) and Curiosity by @pathak2206 et al. (2018), have approx. equal performance when a learned module (RNN and embedding in ICM, respectively) is instead left as a fixed randomly initialized module. @_brohrer_.
8
69
241
@LiamFedus
William Fedus
9 days
Reasoning has begun to deliver us better models like o1, o3, o3-mini, but the genuine unlock will be agents. Reasoning gives us better planning, tool-use, error recovery and I’m thrilled for this year. 2025 is the year of agents. Congrats team!!
@OpenAI
OpenAI
9 days
Introduction to Operator & Agents.
7
12
247
@LiamFedus
William Fedus
1 year
Releasing voice for ChatGPT for all free users today! Hope you enjoy. (What board coup?)
@OpenAI
OpenAI
1 year
ChatGPT with voice is now available to all free users. Download the app on your phone and tap the headphones icon to start a conversation. Sound on 🔊
16
10
217
@LiamFedus
William Fedus
6 years
Fresh on arXiv: Hyperbolic Discounting and Learning Over Multiple Horizons. We question the RL paradigm of discounting by a single discount factor, gamma. Modeling many Q-values allows you to hyperbolically discount and is also a great auxiliary task.
6
44
213
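A short numerical check of the identity the paper builds on: a hyperbolic discount can be written as an integral over exponential discounts, which is why learning Q-values over many horizons (gammas) lets an agent recover hyperbolic discounting. The constant k below is illustrative.

```python
import numpy as np

# Identity: 1 / (1 + k*t) = (1/k) * ∫_0^1 gamma^(t + 1/k - 1) d(gamma)
# Approximate the integral with a simple Riemann sum and compare.
k, t = 0.1, 7.0
gammas = np.linspace(1e-6, 1.0, 200_000)
dgamma = gammas[1] - gammas[0]
integral = np.sum(gammas ** (t + 1.0 / k - 1.0)) * dgamma / k
print(integral, 1.0 / (1.0 + k * t))   # both ≈ 0.588
```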
@LiamFedus
William Fedus
3 years
A brief 4-year LLM history: enc-only (BERT) -> enc-dec (T5) -> dec-only (GPT). As of 2022, the most compute is in decoder models -- what research supports this? Is this the best approach? Enc-dec: T5, AlphaCode, Switch, ST-MoE, RETRO. Dec-only: GPT-{1,2,3}, {🐭, 🐹}, PaLM.
9
34
212
@LiamFedus
William Fedus
4 years
In Revisiting ResNets, we disentangle the impact of (1) the architecture, (2) the training methodology, and (3) the scaling strategy. In a surprise, when we refresh ResNets (introduced in 2015), they still rival the state of the art!
Tweet media one
2
30
211
@LiamFedus
William Fedus
3 years
Introducing our most recent language model: 🌴. My favorite part of scaling these models is how predictable upstream scaling may hide unpredictable, significant jumps in downstream capabilities (e.g. reasoning).
@GoogleAI
Google AI
3 years
Introducing the 540 billion parameter Pathways Language Model. Trained on two Cloud #TPU v4 pods, it achieves state-of-the-art performance on benchmarks and shows exciting capabilities like mathematical reasoning, code writing, and even explaining jokes.
4
28
211
@LiamFedus
William Fedus
3 years
It takes an army. Today we're delighted to release BigBench🪑: 200+ language tasks crowd-sourced from *442* authors spanning 132 institutions, plus our analysis. BigBench is the result of the ingenuity + cleverness of the community.
Tweet media one
2
52
196
@LiamFedus
William Fedus
1 year
@pmddomingos @tszzl I can’t speak to his ML theory, but @tszzl is one of the most important contributors to ChatGPT.
5
1
184
@LiamFedus
William Fedus
9 months
The speed and intelligence (plus reduced laziness) of GPT-4o enable more interesting multi-turn use-cases such as acting as the CPU of an LLM OS and agentic tasks.
@ashpreetbedi
Ashpreet Bedi
9 months
Building the LLM OS by @karpathy with gpt-4o. The speed + quality is 🔥🤯. code:
5
22
182
@LiamFedus
William Fedus
2 years
Neat new modular neural net: Branch-Train-Merge. Unlike usual sparse models, this splits an LLM (branch), trains experts on different datasets (train), then collapses to a single model (merge). Quality on par with models with 2.5x more compute!
Tweet media one
3
25
185
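A toy sketch of the final "merge" step under the simplest option, parameter averaging (the paper also studies keeping the branched experts as an ensemble). `expert_state_dicts` is a hypothetical list of per-domain parameter dicts, not the paper's code.

```python
def merge_by_averaging(expert_state_dicts, weights=None):
    """Collapse branched expert models back into a single parameter set by
    (optionally weighted) averaging -- one merge option, shown for intuition."""
    n = len(expert_state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for name in expert_state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, expert_state_dicts))
    return merged
```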
@LiamFedus
William Fedus
1 year
We’re back.
@OpenAI
OpenAI
1 year
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
3
2
178
@LiamFedus
William Fedus
3 years
An improved Switch Transformer version is up on JMLR! We were seriously impressed by the JMLR review quality and I personally love being untethered to the conference-cycle (publish when ready). Thanks reviewers and to our editor, @alexandersclark!
Tweet media one
3
23
169
@LiamFedus
William Fedus
5 years
Better late than never. Dopamine Tensorflow code now available for RL agents that learn over multiple time-horizons and can model alternative discount functions. @carlesgelada @marcgbellemare @hugo_larochelle.
0
30
156
@LiamFedus
William Fedus
5 months
In addition to o1-preview and o1-mini, our 4o models keep getting better! New models in LMSys and in ChatGPT.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
5 months
Chatbot Arena update🔥. We've been testing the latest ChatGPT-4o (20240903) over the past 2 weeks, and the results show significant improvements across the board: - Overall: 1316 -> 1336 - Overall (style control): 1290 -> 1300 - Hard Prompts: 1314 -> 1335 - Multi-turn: 1346 ->
Tweet media one
5
10
157
@LiamFedus
William Fedus
10 months
Our improved model is in the arena at lmsys and we’ve rolled it out to ChatGPT users today — stay tuned for better versions to come.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
10 months
🔥Exciting news -- GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah! We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to @OpenAI for this incredible launch! To offer
Tweet media one
10
14
152
@LiamFedus
William Fedus
5 months
Not exactly a well-timed article.
Tweet media one
9
1
130
@LiamFedus
William Fedus
3 years
Insane generations from Parti! A bit more fun looking at samples across model sizes than at log-log scaling curves.
Tweet media one
2
19
125
@LiamFedus
William Fedus
1 year
Happy birthday, ChatGPT. The world is different one year later. Humbling to predict product and consumer trends. The outside world probably thinks we're all joking, but this actually was our “low-key research preview” as we geared up to align + release GPT-4. It’s hard to.
@sama
Sam Altman
1 year
a year ago tonight we were probably just sitting around the office putting the finishing touches on chatgpt before the next morning’s launch. what a year it’s been….
4
3
116
@LiamFedus
William Fedus
9 months
+1. I’ve been incredibly fortunate that @barret_zoph has been my closest collaborator for many years now. In addition to doing crazy cool things, he pushes others to do them too. MJ-vibes.
@srush_nlp
Sasha Rush
9 months
Barret Zoph, who presented the OpenAI demo, has done many crazy cool things in the last decade. One that I remember is that as an undergrad in 2015, he wrote his own neural translation system, in CUDA. Remember finding that pretty impressive at the time.
3
3
118
@LiamFedus
William Fedus
6 years
Our unsupervised graph algo: mutual info maximization learns strong node representations that on node classification tasks at times exceed *supervised* algos! Fun work with very talented collaborators @PetarV_93, @williamleif, P. Liò, Bengio, Hjelm
Tweet media one
3
30
114
@LiamFedus
William Fedus
1 month
I have yet to find a well-defined task that cannot be optimized by these models. Eval improvements like ARC-AGI showcase this dynamic.
@ai_for_success
AshutoshShrivastava
1 month
So we went from 0 to 87% in 5 years in ARC AGI score. There is no wall it seems. GPT-2 (2019): 0%, GPT-3 (2020): 0%, GPT-4 (2023): 2%, GPT-4o (2024): 5%, o1-preview (2024): 21%, o1 high (2024): 32%, o1 Pro (2024): ~50%, o3 tuned low (2024): 76%, o3 tuned high (2024): 87%
Tweet media one
7
9
117
@LiamFedus
William Fedus
2 years
GPT-4 is released today! This model has become an integral part of my workflow since joining OpenAI (coding, learning, etc.). Try it on #ChatGPT Plus and tell us what you think!
6
17
113
@LiamFedus
William Fedus
1 year
❤️
@ilyasut
Ilya Sutskever
1 year
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
2
3
105
@LiamFedus
William Fedus
5 months
o1-mini limits now expanded by 7x!
@OpenAI
OpenAI
5 months
We appreciate your excitement for OpenAI o1 and we want you to be able to use it more. For Plus and Team users, we have increased rate limits for o1-mini by 7x, from 50 messages per week to 50 messages per day. o1-preview is more expensive to serve, so we’ve increased the rate.
3
5
103
@LiamFedus
William Fedus
5 years
In 'Benchmarking Bonus-Based Exploration Methods in ALE' we find that when standardizing training duration, architecture, and model capacity, new methods do not clearly improve over prior baselines. Work led by @aalitaiga, which received the ICML exploration workshop 2019 best paper award!
Tweet media one
2
20
98
@LiamFedus
William Fedus
1 year
Congrats to the team and many of my past colleagues at Google! Another step forward in AI.
@JeffDean
Jeff Dean
1 year
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,
Tweet media one
Tweet media two
3
0
82
@LiamFedus
William Fedus
7 months
GPT-4o mini is out today! There is a new frontier cramming ever more intelligence + capability into ever tinier models.
@OpenAIDevs
OpenAI Developers
7 months
Introducing GPT-4o mini! It’s our most intelligent and affordable small model, available today in the API. GPT-4o mini is significantly smarter and cheaper than GPT-3.5 Turbo.
Tweet media one
1
6
85
@LiamFedus
William Fedus
3 years
Don’t dismiss old methods too quickly. ResNets are still strong baselines when augmented with modern improvements. Thanks for the shout-out, @OriolVinyalsML.
@OriolVinyalsML
Oriol Vinyals
3 years
The Deep Learning Devil is in the Details. I love this work from @IrwanBello and collaborators in which they show how training "tricks" improve ~3% absolute accuracy on ImageNet, progress equivalent to years of developments and research! Paper:
Tweet media one
0
11
85
@LiamFedus
William Fedus
9 months
Advanced models with full context that sit at the level of the user are a key paradigm shift. Future workflows won’t include copy-and-pasting. The ChatGPT macOS app is a first step — access it with the Option + Space shortcut and begin work.
@dr_cintas
Alvaro Cintas
9 months
The new ChatGPT Mac app is amazing. I got a fully working Breakout game code using a shortcut to pull up the app with GPT-4o and a simple screenshot of my screen. So many use cases and faster workflows.
4
8
84
@LiamFedus
William Fedus
2 years
*Chiming in on the "GPT-4 is getting dumber" meme*. The code evals in the paper penalized GPT-4 for markdown (```) which is used for nice display in the ChatGPT UI, even when the model got _better_. This eval instead surfaced an opportunity for improved instruction following.
@Si_Boehm
Simon Boehm
2 years
@matei_zaharia @james_y_zou June GPT-4 started surrounding code with ```python markdown, which you didn't strip. I forked your code, removed that markdown, and re-submitted the output to Leetcode for judging. Now the June version does significantly better than the March version of GPT-4.
Tweet media one
2
10
83
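A small sketch of the fix implied above: strip Markdown code fences from a completion before execution-based judging, so chat-display formatting isn't scored as a regression. The regex and helper name are illustrative, not the evaluation code referenced in the thread.

```python
import re

def strip_code_fences(completion: str) -> str:
    """Remove ```python ... ``` fences so the raw code can be executed/judged."""
    match = re.search(r"```(?:python)?\n(.*?)```", completion, flags=re.DOTALL)
    return match.group(1) if match else completion

print(strip_code_fences("```python\nprint('hi')\n```"))  # -> print('hi')
```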
@LiamFedus
William Fedus
3 years
Significant progress in Ahn et al., 2022. Definitely got crowded out by the release of DALL-E 2/PaLM that week. A compelling prospect is that this robotic-interaction data might improve LLMs' common sense, understanding of physics, and general helpfulness.
Tweet media one
4
9
80
@LiamFedus
William Fedus
8 months
New blog post from Jason on evals. With insufficiently good evals, progress is blocked. Some ideas were only later found to be good once our evals improved. Great evals are especially key in post-training when there is no singular metric that can be hill-climbed.
@_jasonwei
Jason Wei
8 months
New blog post where I discuss what makes a language model evaluation successful, and the "seven sins" that hinder an eval from gaining traction in the community: Had fun presenting this at Stanford's NLP Seminar yesterday!
Tweet media one
0
10
77
@LiamFedus
William Fedus
5 years
The interplay of RL algorithms with experience replay is poorly understood. We study this and uncover a relationship between n-step returns and replay capacity. ICML '20 paper: Prajit R.*, @agarwl_ , Yoshua, @hugo_larochelle , Mark R., @wwdabney
Tweet media one
1
14
75
@LiamFedus
William Fedus
2 years
We've released a conversational version of GPT. Talk to it here!
@gdb
Greg Brockman
2 years
Just launched ChatGPT, our new AI system which is optimized for dialogue: Try it out here:
6
2
69
@LiamFedus
William Fedus
5 years
In our recent paper we connect RL issues of poor sample complexity and exploration difficulties to catastrophic interference (within an environment). @its_dibya*, @jdmartin86, @marcgbellemare, Yoshua, @hugo_larochelle. *Joint first author.
4
19
72
@LiamFedus
William Fedus
5 years
Looking forward to the workshops tomorrow - my favorite part of #NeurIPS. @its_dibya and I will speak about the MEMENTO observation in Atari agents tomorrow at 4:15 in the BARL workshop. Come see us at the poster! @jdmartin86, @marcgbellemare, yoshuawonttweet, @hugo_larochelle
Tweet media one
0
17
67
@LiamFedus
William Fedus
5 years
Enjoyed the kNN-LM paper by Khandelwal and Levy et al. (2019). Using an interpolated non-parametric and parametric model, they set a SOTA on Wikitext, reducing perplexity by 2.9 points. This approach helps with long-tail language predictions.
Tweet media one
1
24
67
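A one-line sketch of the kNN-LM interpolation described above; the mixing weight `lam` is a tuned hyperparameter and the value here is only illustrative.

```python
import numpy as np

def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
    """Mix the parametric LM distribution with a non-parametric distribution
    built from nearest-neighbor contexts: p = lam * p_knn + (1 - lam) * p_lm."""
    return lam * np.asarray(p_knn) + (1.0 - lam) * np.asarray(p_lm)
```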
@LiamFedus
William Fedus
1 year
Today we're rolling out an experiment to give ChatGPT memory -- the ability to remember important pieces across conversations!
@OpenAI
OpenAI
1 year
We’re testing ChatGPT's ability to remember things you discuss to make future chats more helpful. This feature is being rolled out to a small portion of Free and Plus users, and it's easy to turn on or off.
2
6
68
@LiamFedus
William Fedus
9 months
All three gpt2-chatbot results are in from LMSys and all are in the 1300 club. The gap from the two most recent versions (including the 4o variant) is especially evident on coding.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
9 months
Breaking news — gpt2-chatbots results are now out! gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena! With improvement across all boards, especially reasoning & coding
Tweet media one
1
3
66
@LiamFedus
William Fedus
3 years
The muTransfer work of @TheGregYang et al., 2022 is a refreshing mix of theory, intuition, and great empirical results! With this approach (different init and per-layer lr), the activations remain constant as a fn of width while training (bottom vs. top row).
Tweet media one
1
16
66
@LiamFedus
William Fedus
2 months
A new research opportunity at OpenAI to improve ChatGPT with user feedback! DM Andrew for more info.
@kondrich2
Andrew
2 months
We value your perspective; help define ChatGPT’s future! I’m hiring Research Engineers & Scientists to shape ChatGPT’s responses into personalized and reliable interactions for the 1B messages users send every day. Responsibilities include developing Post-Training and RLHF
Tweet media one
3
8
66
@LiamFedus
William Fedus
9 months
ChatGPT will now start to remember across threads! An important step towards increasing the usefulness of these models.
@OpenAI
OpenAI
9 months
Memory is now available to all ChatGPT Plus users. Using Memory is easy: just start a new chat and tell ChatGPT anything you’d like it to remember. Memory can be turned on or off in settings and is not currently available in Europe or Korea. Team, Enterprise, and GPTs to come.
3
5
61
@LiamFedus
William Fedus
1 year
LLMs are a new primitive of programming. Hard logic (if, else) can be replaced/augmented with soft intelligent judgment (prompt: “When you see this…”). Devs also push the creative frontier and explore a far larger surface area than we can alone. Register for our Nov 6th dev.
1
4
63
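A small sketch of the "soft logic" idea from the tweet above: a brittle keyword rule replaced by a model judgment. `llm_judge` is a hypothetical callable (prompt in, text out), not a specific OpenAI API.

```python
def route_ticket(ticket_text: str, llm_judge) -> str:
    """Route a support ticket with a judgment call instead of hard rules.
    `llm_judge` is a hypothetical prompt -> completion function."""
    # Hard-logic version this replaces:
    # if "refund" in ticket_text.lower(): return "billing"
    prompt = (
        "When you see a customer ticket, answer with exactly one word, "
        "either 'billing' or 'technical'.\n\nTicket: " + ticket_text
    )
    return llm_judge(prompt).strip().lower()
```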
@LiamFedus
William Fedus
6 years
En route to Montreal for RLDM. Speaking tomorrow at 11:40pm about non-exponential time-preferences in RL agents and also how learning Q-values over multiple horizons is an effective auxiliary task - come chat with us!. @carlesgelada yoshua @hugo_larochelle @marcgbellemare
Tweet media one
1
8
60
@LiamFedus
William Fedus
5 years
Replay buffers in deep RL seem rather lacking. Store a relatively short amount of experience, randomly sample (maybe prioritize based on TD-error), throw away old experience regardless of value, train very few times on any transition. This blog post motivates new directions!
@GoogleDeepMind
Google DeepMind
5 years
In our new blog post, we review how brains replay experiences to strengthen memories, and how researchers use the same principle to train better AI systems:
1
7
59
@LiamFedus
William Fedus
7 years
Excited that both our papers were accepted to ICLR 2018!! Thanks to this group of unbelievably talented collaborators: @elaClaudia, @goodfellow_ian, Andrew Dai, @shakir_za, @balajiln.
3
7
60
@LiamFedus
William Fedus
6 years
@TrevMcKendrick Now if only an unapologetic Bay Area upstart would ignore all local zoning laws and start building sufficient housing.
1
4
52
@LiamFedus
William Fedus
6 years
And loading in ML datasets finally enters the modern era. Check it out and thanks @rsepassi!
@TensorFlow
TensorFlow
6 years
The brand new TensorFlow Datasets make it super easy to load a variety of public datasets into #TensorFlow programs in both tf.data and NumPy format! Read @rsepassi’s article to learn more ↓
1
13
54
@LiamFedus
William Fedus
6 years
Clear @shortscienceorg paper summary by @decodyng on Language GANs Falling Short.
0
15
56
@LiamFedus
William Fedus
1 year
Cool work just released from Ironclad Research: Rivet, a visual programming environment for building AI agents/language model programs. As many folks know, designing and iterating on agents is complex and error-prone: this tooling should help significantly. Check out the demo!
@gogwilt
Cai GoGwilt
1 year
🚀 Today, we've open-sourced Rivet, a game-changer for #AI agents! We just launched our first AI agent at @ironclad_inc. But I almost gave up on it, until Andy built v0 of Rivet and showed me what was possible. Let us know what you think!
2
7
48
@LiamFedus
William Fedus
6 years
“The Seven Habits of Highly Effective Neural Networks” - 👌 paper title from Prajit at lunch. Crowd-sourcing the rest: 1.
6
5
54
@LiamFedus
William Fedus
7 years
MILA's era of tropical-climate GPU-filled offices is ending. New office 2018!
0
13
52
@LiamFedus
William Fedus
4 years
Tutorial implementation of Switch Transformer in @PyTorch!
@labmlai
labml.ai
4 years
Minimalistic single GPU @PyTorch implementation (with notes) of Switch Transformer. Annotated code: Github: Paper: Colab: @LiamFedus @barret_zoph @GoogleAI
Tweet media one
0
6
53
@LiamFedus
William Fedus
10 months
Smarter models up in prod today — send us your feedback!
@OpenAI
OpenAI
10 months
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source:
Tweet media one
5
1
52
@LiamFedus
William Fedus
3 years
Excited to see a sparse model at the top of the SuperGLUE leaderboard this morning (SS-MoE). We started fine-tuning sparse-expert models in Switch Transformer, but encountered a few problems. Happy to have remedied many of these issues. Paper to follow.
Tweet media one
1
9
50
@LiamFedus
William Fedus
2 years
Pleased to have effective, cheap models out through our API. We can't wait to see what the world builds on them. :)
@miramurati
Mira Murati
2 years
Just made ChatGPT available on our API! Our incredible team of builders has delivered a 10x cheaper model than our existing GPT-3.5 models, through system-wide optimizations, making it easier to power as many applications as possible.
0
0
49
@LiamFedus
William Fedus
10 months
Adding a minimal eval library showing performance on our new model which also quantifies "majorly improved".
@OpenAI
OpenAI
10 months
Majorly improved GPT-4 Turbo model available now in the API and rolling out in ChatGPT.
1
2
50
@LiamFedus
William Fedus
6 years
‘The estimated weight of all insects on Earth combined — is dropping by an estimated 2.5 percent every year.’ I’m continually shocked by the pace of (largely) human impact on our environment.
@NandoDF
Nando de Freitas
6 years
We have a new global tally of the insect apocalypse. It’s alarming.
2
15
47
@LiamFedus
William Fedus
1 year
For folks outside the LLM bubble, a blank text box with an AI is mystifying. Small quality-of-life improvements like prompt examples (e.g. plan a tour, explain this code, etc.) can bridge this gap.
Tweet media one
@OpenAI
OpenAI
2 years
We’re rolling out a bunch of small updates to improve the ChatGPT experience. Shipping over the next week: 1. Prompt examples: A blank page can be intimidating. At the beginning of a new chat, you’ll now see examples to help you get started. 2. Suggested replies: Go deeper with
1
6
49
@LiamFedus
William Fedus
9 months
Upgraded and augmented version of MMLU — MMLU-Pro, which has 10 options and some harder STEM questions. There’s no perfect static eval, only an iterative process of ever-better evals as our model capabilities progress and the use-cases shift.
@WenhuChen
Wenhu Chen
9 months
Tired of MMLU? The current models already hit the ceiling? It's time to upgrade MMLU! Introducing our new benchmark MMLU-Pro, a more robust and challenging massive multi-task language understanding benchmark with 12K questions. What's new? 1. MMLU-Pro uses 10 options instead of
Tweet media one
2
4
45
@LiamFedus
William Fedus
6 years
Transformer killed the RNN, now encroaching into CNN’s territory. Next the radio star? Nice work, @IrwanBello and team. Optimistic and excited to follow future work of transformers in image domains.
@quocleix
Quoc Le
6 years
Exciting new work on replacing convolutions with self-attention for vision. Our paper shows that full attention is good, but loses a few percent in accuracy. And a middle ground that combines convolutions and self-attention is better. Link:
Tweet media one
1
8
45
@LiamFedus
William Fedus
9 months
New work from Scale where they created a GSM8k-equivalent-difficulty eval from scratch. The resulting performance gap surfaces that some model families have data contamination issues and may not be as strong as the public eval would indicate.
@alexandr_wang
Alexandr Wang
9 months
How overfit are popular LLMs on public benchmarks? New research out of @scale_ai SEAL to answer this: - produced a new eval GSM1k - evaluated public LLMs for overfitting on GSM8k. VERDICT: Mistral & Phi are overfitting benchmarks, while GPT, Claude, Gemini, and Llama are not.
Tweet media one
1
2
46
@LiamFedus
William Fedus
3 years
Barret and I share our perspectives on sparse expert models we've worked on (Switch, GLaM, ST-MoE, etc.). Hope this and other talks help provide high-level context for those curious about these classes of models. Thanks for having us, Yannic!
@ykilcher
Yannic Kilcher 🇸🇨
3 years
New interview with Barret Zoph (@barret_zoph) and William Fedus (@LiamFedus) of Google Brain on Sparse Expert Models. We talk about Switch Transformers, GLAM, information routing, distributed systems, and how to scale to TRILLIONS of parameters. Watch now:
Tweet media one
0
4
47
@LiamFedus
William Fedus
9 months
In 2024 "chatbot" became cool again.
@sama
Sam Altman
9 months
im-a-good-gpt2-chatbot.
2
0
44
@LiamFedus
William Fedus
6 years
I think this SF -> NOLA flight may hold its own poster session @iclr2019.
1
1
43
@LiamFedus
William Fedus
3 years
This project was expertly led by @_jasonwei! It also drew upon a deep bench of stellar collaborators across four institutions. Fun getting to think about these problems. 7/7
Tweet media one
1
8
43
@LiamFedus
William Fedus
3 years
Cool progress for extremely long context modeling! Paper:
0
4
40
@LiamFedus
William Fedus
1 year
ChatGPT has excelled in knowledge work — we’re thrilled to release our enterprise offering. It’s become indispensable for accelerating work within OpenAI, and now it will do so across more professional settings.
@OpenAI
OpenAI
1 year
Introducing ChatGPT Enterprise: enterprise-grade security, unlimited high-speed GPT-4 access, extended context windows, and much more. We’ll be onboarding as many enterprises as possible over the next few weeks. Learn more:
Tweet media one
3
1
36
@LiamFedus
William Fedus
7 years
Our @NipsConference '17 workshop paper finally on arXiv. TLDR: New intrinsic reward for RL agents to learn independently controllable features of the environment by interacting with it. The objective is a lower bound on causal directed information.
2
8
39
@LiamFedus
William Fedus
6 years
In past GAN research, I’ve also found that a 0-norm gradient penalty worked at least as effectively as a 1-norm gradient penalty. #ICLR2019 paper analyzing why:
@_davemacdonald
lightwater.eth💡💧
6 years
Here's a new #GAN paper accepted to #ICLR2019 that shows why gradient penalties are important for healthy training and generalization, and shows how a 0-norm gradient penalty is better. Great explanations and insights - worth reading.
Tweet media one
0
4
38
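A minimal PyTorch-style sketch of a discriminator gradient penalty with a configurable target norm (1.0 for the usual WGAN-GP form, 0.0 for the zero-centered variant discussed above). Where the penalty is applied (real, fake, or interpolated samples) varies across papers; this is an illustration, not the paper's code.

```python
import torch

def gradient_penalty(discriminator, x, target_norm=0.0):
    """Penalize the norm of the discriminator's input gradient toward target_norm."""
    x = x.clone().requires_grad_(True)
    d_out = discriminator(x).sum()
    (grad,) = torch.autograd.grad(d_out, x, create_graph=True)
    grad_norm = grad.flatten(1).norm(2, dim=1)        # per-sample gradient norm
    return ((grad_norm - target_norm) ** 2).mean()
```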
@LiamFedus
William Fedus
3 years
We've observed emergent abilities across a wide range of model families (LaMDA, GPT, Gopher, Chinchilla, PaLM): the performance of few-shot inference is random for a while, but then rapidly improves at a certain scale. Similar results were seen in Ganguli et al., 2022. 3/7
Tweet media one
2
4
39
@LiamFedus
William Fedus
3 years
MoE models are tough to grok due to the dependence of the computation on each batch of data. This is further complicated when using two data modalities. Fun to read a thorough study in LIMoE! Below: MoE finds it helpful to learn a "door handle" expert 😅
Tweet media one
1
7
39
@LiamFedus
William Fedus
3 years
Big fan of Chinchilla. It finds our language models are bloated and compute-inefficient. I love the attention to detail (i.e. lr decay schedule) backed by a SOTA model. And as we start training smaller models for longer, this will increase the emphasis on data quantity and quality.
1
1
40
@LiamFedus
William Fedus
7 years
Fully differentiable architecture search! Liu et al. compute a softmax over operators and set up an approximate alternating gradient-descent optimization of weights and architectures. Excited about the continued improvements in architecture search efficiency.
@OriolVinyalsML
Oriol Vinyals
7 years
Welcome back, gradients! This method is orders of magnitude faster than state-of-the-art non-differentiable techniques. DARTS: Differentiable Architecture Search by Hanxiao Liu, Karen Simonyan, and Yiming Yang. Paper: Code:
0
2
38
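A toy sketch of the DARTS relaxation described above: each edge outputs a softmax(alpha)-weighted mixture of candidate operators, alpha is learned by gradient descent alternating with the network weights, and the final architecture keeps the argmax operator per edge.

```python
import numpy as np

def mixed_op(x, ops, alpha):
    """Continuous relaxation of operator choice: a softmax(alpha)-weighted
    sum over all candidate ops, making the architecture differentiable."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))

ops = [lambda x: x, lambda x: np.tanh(x), lambda x: np.maximum(x, 0.0)]
y = mixed_op(np.array([-1.0, 2.0]), ops, alpha=np.zeros(3))
```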