Does One Large Model Rule Them All?
New post with
@matei_zaharia
and
@ericschmidt
on the future of the AI ecosystem.
Our key question: does the rise of large, general AI models mean the future AI ecosystem will be dominated by a single general model? ⬇️
A Survey of Deep Learning for Scientific Discovery
To facilitate the use of DL in science, we survey a broad range of deep learning methods, new research results, implementation tips & many links to code/tutorials
Paper
Work with
@ericschmidt
Thread⬇️
Do Vision Transformers See Like Convolutional Neural Networks?
New paper
The success of Transformers in computer vision prompts a fundamental question: how are they solving these tasks? Do Transformers act like CNNs, or learn very different features?
Thrilled to share that I successfully defended my PhD today!!
This milestone wouldn't have been possible without the support and guidance of my collaborators, mentors, friends and family -- thank you so much!!! Thanks also to everyone who attended my (virtual) defense!
New blog post: "Reflections on my (Machine Learning) PhD Journey"
2020 has marked the end of my six year PhD journey, filled with struggles, success and evolution of personal & research perspectives. In the post I share experiences and lessons learned ⬇️
After almost 6.5 years, I left Google Brain earlier this month.
It's been an incredible journey of gaining insights on many exciting areas of machine learning, and watching the field grow and evolve (so much, so quickly!)
A few months ago I left Google Brain to pursue my next adventure: building
@samaya_AI
! We're excited to bring the latest AI advances to the Knowledge Discovery process!
Many of these trends don't hold. Last week we celebrated
@geoffreyhinton
's retirement, and a few weeks earlier saw
@kkariko
receive the Nobel Prize. Their research took decades to come together, and had enormous impact at a world scale. We'd be much worse off if they'd pivoted!
Enjoyed visiting UC Berkeley’s Machine Learning Club yesterday, where I gave a talk on doing AI research. Slides:
In the past few years I’ve worked with and observed some extremely talented researchers, and these are the trends I’ve noticed:
1. When
We're releasing tutorials on our work using CCA to compare and probe representations in deep neural networks: There are Jupyter notebooks overviewing the technique, descriptions of results, and discussions of open problems. We hope this is a useful resource!
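For anyone wanting to experiment beyond the notebooks, the core computation is small. A minimal sketch (illustrative only, not the released tutorial code; function name is my own) of the mean CCA similarity between two layers' activations:

```python
import numpy as np
from scipy.linalg import svdvals

def cca_similarity(X, Y):
    """Mean CCA correlation between two representations.

    X, Y: (num_datapoints, num_neurons) activation matrices from two
    layers, evaluated on the same inputs (num_datapoints >> num_neurons).
    """
    # Center each neuron's activations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for each representation's column space.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    rho = svdvals(Qx.T @ Qy)
    return rho.mean()
```

A useful sanity check: the score is invariant to any invertible linear transform of either representation, which is exactly why CCA is attractive for comparing networks whose neurons have no canonical alignment.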
Our paper on Understanding Transfer Learning for Medical Imaging has been accepted to
#NeurIPS2019
!!
Preprint:
As a positive datapoint: we had a good reviewing experience, with detailed feedback and mostly useful comments. Thanks to the Program Chairs!
Delighted our new paper "Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics" just won Best Paper at the Continual Learning Workshop at
#ICML2020
!!
Paper:
Oral *tomorrow*, details at:
⬇️ Paper thread
LaMDA: Language Models for Dialogue Applications
Paper:
Blogpost:
Excited to see this paper come out! I enjoyed the Weddell seal conversation with LaMDA in our 2021 research summary blogpost!
Do Vision Transformers See Like Convolutional Neural Networks?
New paper
The success of Transformers in computer vision prompts a fundamental question: how are they solving these tasks? Do Transformers act like CNNs, or learn very different features?
Happy to share our paper on ViTs and CNNs was accepted to
#NeurIPS2021
!
Our other two submissions this year were rejected. I still think they have some great results and am looking forward to improving the papers with the received feedback.
Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization
We introduce a rich family of tasks with a pointer-value rule, to study mechanisms of NN generalization, from memorization to reasoning.
Paper:
Do Wide and Deep neural networks Learn the Same Things?
Paper:
We study representational properties of neural networks with different depths and widths on CIFAR/ImageNet, with insights on model capacity effects, feature similarity & characteristic errors
Do wide and deep neural networks learn the same thing? In a new paper () with
@maithra_raghu
and
@skornblith
we study how width and depth affect learned representations within and across models trained on CIFAR and ImageNet. 1/6
"Transfusion: Understanding Transfer Learning with Applications to Medical Imaging"
The benefits of transfer are nuanced. With *no* feature reuse and only the pretrained weight scaling, we can regain the effects of transfer. More findings in the paper!
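The weight-scaling idea can be sketched as: sample fresh random weights that match only the per-layer mean and standard deviation of the pretrained weights, discarding the learned features themselves. This is an illustrative reconstruction (the helper name and the choice of normal sampling are my assumptions, not the paper's exact procedure):

```python
import numpy as np

def meanvar_init(pretrained_weights, rng=None):
    """Re-initialize each weight tensor i.i.d. from a normal distribution
    matching the per-layer mean and std of the pretrained weights --
    keeping the pretrained *scaling* while discarding the features."""
    rng = np.random.default_rng(rng)
    return [rng.normal(W.mean(), W.std(), size=W.shape)
            for W in pretrained_weights]
```

The point of the experiment: if a network initialized this way converges like a transferred network, the benefit came from the weight statistics, not from feature reuse.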
Excited to share the
@icmlconf
2022 Workshop on Knowledge Retrieval and Language Models
Please consider submitting!
We welcome work across topics including LM grounding, open-domain Q&A, bias in retrieval, analyses of scale, transfer and LM phenomena.
New blogpost on citation trends in
@NeurIPSConf
and
@icmlconf
: I scraped paper citations and studied topic trends, citation distributions and academia/industry splits. Releasing scraper, data and a tutorial!
Post:
Code/Data:
Super exciting to see AI achieving a Silver Medal at IMO today, both as an AI researcher and more personally, as someone who spent many years competing in math Olympiads.
Some quick (possibly controversial!) thoughts:
ICLR Town: Pokemon-esque environment to wander around and bump into people, which syncs almost seamlessly with video-chatting capabilities.
What a fun idea for virtual (research) conferences! Thanks
@iclr_conf
organizers!!
#ICLR2020
#iclr
(Uses )
Rapid Learning or Feature Reuse?
New paper:
We analyze MAML (and meta-learning more broadly), finding that feature reuse is the critical component in the efficient learning of new tasks -- leading to some algorithmic simplifications!
Rapid Learning or Feature Reuse? Meta-learning algorithms on standard benchmarks have much more feature reuse than rapid learning! This also gives us a way to simplify MAML -- (Almost) No Inner Loop (A)NIL. With Aniruddh Raghu
@maithra_raghu
Samy Bengio.
Excited to see this article by
@QuantaMagazine
overviewing the development of the Vision Transformer, insights on how it works, and promising new applications!
Headed to
#ICML2022
for the first in-person conference since pre-covid! Looking forward to exciting ML discussions with old & new friends
Our workshop on Knowledge Retrieval and Language Models is on Friday 22nd. Do stop by (or tune in online)!
@icmlconf
Excited to attend
#NeurIPS2020
!
My amazing collaborator
@thao_nguyen26
**who is applying to PhD programs this year** will be presenting Do Wide and Deep Neural Nets Learn the Same Things? at
@WiMLworkshop
posters *today* & in Inductive Biases Workshop
Very excited about our latest preprint: , joint work with
@arimorcos
and Samy Bengio. We apply Canonical Correlation (CCA) to study the representational similarity between memorizing and generalizing networks, and also examine the training dynamics of RNNs.
Do different networks learn similar representations to solve the same tasks? How do RNN representations evolve over training? What can representational similarity tell us about generalization? Using CCA,
@arimorcos
and
@maithra_raghu
try to find out!
My entry to
#MachineLearning
(from another field) wouldn't have happened without
#NIPS2014
. But the reason I went, and found a welcoming community was due to
#WiML2014
. Now
#WiML2018
's organizer call is open . Apply by 25/03! The impact can't be overstated.
How do representations evolve as they go through the transformer? How does the Masked Language Model objective affect these compared to Language Models? How much do different tokens change and influence other tokens?
Answers in the paper by
@lena_voita
: !
How does transfer learning for medical imaging affect performance, representations and convergence? Check out the blogpost below and our
#NeurIPS2019
paper for some of the surprising conclusions, new approaches and open questions!
How does transfer learning for medical imaging affect performance, representations and convergence? In a new
#NeurIPS2019
paper, we investigate this across different architectures and datasets, finding some surprising conclusions! Learn more below:
Presenting this at
@iclr_conf
*today*!
Talk and Slides:
Poster Sessions: (i) 10am - 12 Pacific Time, (ii) 1pm - 3pm Pacific Time
Thanks to the organizers for a *fantastic* virtual conference, hope to see you there!
#iclr
#ICLR2020
Rapid Learning or Feature Reuse? Meta-learning algorithms on standard benchmarks have much more feature reuse than rapid learning! This also gives us a way to simplify MAML -- (Almost) No Inner Loop (A)NIL. With Aniruddh Raghu
@maithra_raghu
Samy Bengio.
Motivating the Rules of the Game for Adversarial Example Research:
Fantastic and nuanced position paper by
@jmgilmer
@ryan_p_adams
@goodfellow_ian
on better bridging the gap between research on adversarial examples and realistic ML security challenges.
First foray into Deep RL! We test on a game with continuously tuneable difficulty and a *known* optimal policy. We study different RL algorithms, supervised learning, and multiagent play.
@jacobandreas
Our paper on using Machine Learning (Direct Uncertainty Prediction) for predicting doctor disagreements and medical second opinions will be at
@icmlconf
next week!
Blog:
Paper:
#icml2019
#DeepLearning
Probably the best we can do is be masters of our craft (know the field well, write good code, collaborate, bring energy & challenge ourselves), and be *brave* --- take risks and try things, even when they're hard, don't get external validation, and have uncertain outcomes.
Really enjoyed this discussion with
@jaygshah22
on our work on exploring neural network hidden representations, our recent paper on ViTs and CNNs, and PhD experiences + the ML research landscape!
Video:
In a chat with
@maithra_raghu
, Sr. Research Scientist at
@GoogleAI
about analyzing internal representations of
#DeepLearning
models, comparing vision transformers and CNNs, how she developed her interest in ML, and useful tips for researchers/PhD students!
Had a fantastic week learning about exciting research directions and meeting old and new friends at
#NeurIPS2019
. Thanks to the organizers, volunteers and participants for a wonderful conference!
My talk at
#ML4H
is at (~44 mins), and posters below!
Looking forward to attending
#ICLR2021
next week! We're presenting three papers on questions exploring neural network representations, properties of training and algorithms for helping the learning process.
And at last(!!) Google's response to ChatGPT
Excited to see Google putting some of these advances out, especially after many years seeing first-hand the development of LaMDA and other AI technology.
Delighted to be named one of this year's
#STATWunderkinds
for our work on machine learning in medicine:
Grateful to my collaborators and mentors for their advice and support throughout!
@statnews
On AGI and Self-Improvement
With
@ericschmidt
Questions on AGI are at the heart of the debate on AI capabilities & risks. To get there, AI must learn "on the fly". We outline definitions of AGI, explore this gap, and examine the crucial role of *self-improvement*
It is usually very hard to predict *true* breakthroughs, which are often *novel* and have high impact. The novelty means it's a slow process for the work to be recognized as a breakthrough, and it can be a long and lonely road in the meantime
A blogpost I wrote on our paper SVCCA, at
#nips2017
! With Justin Gilmer,
@jasonyo
@jaschasd
-- hoping many people will try it out on their networks with the open source code:
In order to build better and more robust DNN-based systems, one must be able to effectively interpret the models. We introduce a simple and scalable method to both compare and interpret the representations learned by DNNs
Looking forward to speaking about Artificial and Human Intelligence in Healthcare at the
#OReillyAI
conference ! Will discuss developing better AI systems and human expert interactions:
Furthermore, a lot of best research practices are determined by the maturity and state of the field. Right now, in LLM research, it's important to write good code and have good infra. That was hardly the case earlier in deep learning, when we barely had libraries for autodiff!
Bellairs. Day 5
@HazanPrinceton
and myself: double feature on controls+RL. +spotlights:
@maithra_raghu
: meta-learning as rapid feature learning. Raman Arora: dropout, capacity control, and matrix sensing .
@HanieSedghi
: module criticality and generalization! And that is a wrap!🙂
Exploring the AI Landscape:
New blog by
@bclyang
and me! We'll be covering topics in AI from fundamental research to considerations for deployment.
Our first post: is on Digital Health and AI for Health, a longstanding interest!
Excited to be speaking at REWORK's Deep Learning in Healthcare summit!
#reworkHEALTH
I'll be speaking about our work on Direct Uncertainty Prediction for Medical Second Opinions:
An analysis of self-attention reveals some reasons for this difference: very early ViT layers learn to incorporate local and *global* spatial information, unlike CNN early layers with their smaller receptive field size.
@geoffreyhinton
often wrote quick MATLAB code and even computed gradients by hand! (I was always inspired that even at that level of seniority, he could quickly prototype his own ideas!)
Heading to
#NeurIPS2018
this week! Looking forward to meeting old friends and new! Let me know if you'll be around and want to chat.
@arimorcos
and I will be presenting our paper on the Wednesday poster session, hope to see you there!
🎉🌐 Big news from
@samaya_AI
. We have two shiny new offices in
#London
&
#MountainView
🏢, staffed with an incredible team of brilliant minds💡🚀. Check out our freshly launched website at 🌟
This article, on the lack of an AI moat at Google and OpenAI has been making the rounds:
While it's true that there is exciting, fast-paced open-source activity in AI, and we may see many current LLMs commoditize, there are still *quality moats*
I'm deeply saddened to hear about the passing of
@SusanWojcicki
We met just a couple months back, and she offered sage advice on running a company, even giving feedback on our new product features. I was struck by her insight, her groundedness and her warmth. Sending her family
AI winning IMO gold would be impressive, but an AI coming up with IMO *questions* would be even more impressive to me.
Can it understand and use different theorems intelligently to come up with hard, creative and truly new questions? Can it do this consistently?
Excited to be speaking at
@reworkdl
deep learning summit today , and Stanford's HealthAI
@ai4healthcare
hackathon tomorrow!
What with the ICML deadline just wrapping up, it's been a busy week 😅
Very interesting work on identifying, understanding and reconstructing the representations learned by neural networks!
(I've also enjoyed
@distillpub
's "Building Blocks of Interpretability" and "Zoom In" which this work builds on)
Excited to share a new paper, Curve Circuits
We reverse engineer a non-trivial 50k+ parameter learned algorithm from the weights of a neural network and use its core ideas to craft an artificial artificial neural network from scratch that reimplements it
Very exciting work by
@matei_zaharia
@alighodsi
and quite literally all of
@databricks
(who created the dataset!)
Lots of interesting followup questions from this --- how well can we use this to bootstrap synthetic data, etc.
Free Dolly! Introducing the first *commercially viable*, open source, instruction-following LLM. Dolly 2.0 is available for commercial applications without having to pay for API access or sharing data with 3rd parties.
I've been enjoying reading
@beenwrekt
's posts on
#ReinforcementLearning
: (new post today!), and it's great to see these insights come together in paper format!
What are the limits to the generalization of large pretrained transformer models?
We find minimal fine-tuning (~0.1% of params) performs as well as training from scratch on a completely new modality!
with
@_kevinlu
,
@adityagrover_
,
@pabbeel
paper:
1/8
Although there were ups and downs, I'm deeply grateful to the many rich experiences during my PhD, and hope this blogpost might be helpful to others on the journey.
Wishing everyone a happy new year!!
With the rapid pace of progress in Machine Learning, it's hard not to feel publication pressure during the PhD. But while writing papers is important, the main research goal of the PhD (to me at least!) is to make you an independent researcher, with a rich research vision
But attending locally is also very important! It is automatically encoded in CNNs, but larger ViTs only learn to do this with enough data (which they also need for their strong performance).
Another research update: Final version of our
#nips2017
@NipsConference
paper SVCCA: with accompanying code:(!!) We look at deep learning dynamics and interpret the latent representations. With Justin Gilmer,
@jasonyo
,
@jaschasd
Looking forward to speaking at
@RAAISorg
this Friday! Many exciting ML research areas, from health to privacy to bioengineering. Details on the talks, research and speakers at:
It was awesome having
@samaya_AI
as part of the first batch of AI Grant companies! Grateful to
@natfriedman
and
@danielgross
for creating an energizing community for AI-native products. Consider applying!
Using local and global info allows ViT's earlier layers to learn better representations, which are strongly propagated through residual connections. Surprisingly, ViT has stronger residual connections than ResNet! These help explain the uniform structure of ViT representations
Looking forward to heading to
#NeurIPS2023
next week! This year marks a decade(!) of attending NeurIPS!
It's remarkable to see how much the field has advanced in 10 years!
These past 2 years of building
@samaya_AI
has been incredible, and we are continuing to grow!
Thanks to
@atJustinChen
and
@statnews
for an in-depth followup discussion on our research work and motivations.
We talk about neural networks, techniques to better understand them, and ways this can inform their design and usage as assistive tools.
We believe not! The future ecosystem will be rich, with a set of *Specialized AI Systems* and a few *General AI Models*, with many entities participating.
Specialized AI Systems are developed for well-defined, high-value workflows, while General AI Models tackle a heavy tail of uses
Sending best wishes to friends, former colleagues and the team at
@OpenAI
. You've made incredible, world changing contributions to AI, and it was sad to see the developments of the past few days. Wishing you the best in navigating these transitions.
Using representational similarity measures, we investigate the internal structure of the two architectures, finding striking differences, with ViT having a much more uniform representation across all layers
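One widely used representational similarity measure for this kind of layer-by-layer comparison is centered kernel alignment (CKA). A minimal linear-CKA sketch, assuming activations are gathered as (datapoints × neurons) matrices (an illustration, not the paper's exact code):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n, p1), Y: (n, p2) -- activations of two layers on the same
    n inputs. Returns a similarity in [0, 1]; 1 means identical up to
    an orthogonal transform and isotropic scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style alignment of the two linear (Gram) kernels.
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Computing this for every pair of layers in two networks gives the similarity heatmaps behind statements like "ViT representations are more uniform across depth."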
Some good news: the recent ruling forcing international students to choose between leaving the country and safety (taking online classes) has been rescinded:
To me, a big surprise of the PhD was how much it really is a journey, with evolving perspectives (both personal and research) affecting interest in specific problems, research directions and broader subfields. Importantly, it's hard to predict this evolution going in!
Thanks so much to the organizers and
@MITEECS
for hosting the EECS Rising Stars 2018! Entertaining, insightful and inspiring discussion by panelists and speakers on research and academia, and a truly unique opportunity to meet my fantastic peers across all of EECS!
It takes a lot of technical knowledge, effort & iteration to build high-quality AI systems for specific, valuable uses.
So while "base models" may commoditize (also discussed in ), there are plenty of opportunities for moats around focused, high-value AI products.
Totally agree. Public criticism disproportionately impacts the graduate student leading the project, and ML publishing is already very high pressure. Twitter also isn't the right place for a nuanced scientific discussion.
I realize this is seemingly an unpopular opinion, but I can't get onboard with these Twitter criticisms of some of the recent
#ICML2022
best paper awardees. I've been thinking about this all day. A thread... 🧵 1/N
Today is my first day as a CTO (and co-founder) of
@samaya_AI
.
The last 4 years at FAIR have been incredible.
Now I'm looking forward to bringing the latest knowledge discovery technologies to market!
We provide links to incredible resources developed by the community: software packages & high level APIs, freely available DL tutorials, sites with summaries/discussions/code of new research, repositories of DL pipelines & pretrained models, data curation & analysis packages
I've gained a lot from the interesting paper links, tutorials and code releases posted on Twitter. However it's important to recognise the drawbacks of the filter bubble: Particularly poignant: "Algorithms know what you've been, not what you want to be."
New paper Teaching with Commentaries
We introduce commentaries, metalearned information to help neural net training & give insights on learning process, dataset & model representations
Led by
@RaghuAniruddh
& w/
@skornblith
@DavidDuvenaud
@geoffreyhinton
Intriguing invited talk at
#DeepPhenomena
from Chiyuan Zhang on the effect of resetting different layers: Are all layers created equal?
#ICML2019
@icmlconf
Enjoyed speaking at
@RealAAAI
workshop on Learning Network Architectures During Training:
I overviewed our work on techniques to gain insights from neural representations for model & algorithm design.
All talk videos are on the workshop page above! ⬆️
@karpathy
I usually mute all notifications and put on do not disturb. Sometimes takes me a little longer to respond to things, but the mental space is worth it :)