Gabriel Ilharco

@gabriel_ilharco

Followers
4,141
Following
1,270
Media
59
Statuses
451

Building cool things @xAI . Prev. PhD at UW, Google Research

Palo Alto, CA
Joined September 2015
@gabriel_ilharco
Gabriel Ilharco
2 years
Introducing task vectors! A new way to steer models by doing arithmetic with model weights. Subtract to make models forget, add to make them learn 📜: 🖥️:
Tweet media one
20
273
1K
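In code, the idea is roughly the following. This is a minimal sketch over PyTorch state dicts, not the released implementation; `pretrained` and `finetuned` are hypothetical models sharing one architecture.

```python
# Minimal sketch of task-vector arithmetic over PyTorch state dicts.
# `pretrained` and `finetuned` are hypothetical models with the same
# architecture; this is an illustration, not the released implementation.
import copy

def task_vector(pretrained, finetuned):
    # tau = theta_finetuned - theta_pretrained
    theta_0, theta_1 = pretrained.state_dict(), finetuned.state_dict()
    return {k: theta_1[k] - theta_0[k] for k in theta_0}

def apply_task_vector(pretrained, tau, alpha=1.0):
    # alpha > 0 adds the behavior ("learn"); alpha < 0 removes it ("forget")
    theta_0 = pretrained.state_dict()
    edited = copy.deepcopy(pretrained)
    edited.load_state_dict({k: theta_0[k] + alpha * tau[k] for k in theta_0})
    return edited
```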
@gabriel_ilharco
Gabriel Ilharco
1 year
Introducing DataComp, a new benchmark for multimodal datasets! We release 12.8B image-text pairs, 300+ experiments and a 1.4B subset that outcompetes compute-matched CLIP runs from OpenAI & LAION 📜 🖥️ 🌐
Tweet media one
8
190
776
@gabriel_ilharco
Gabriel Ilharco
1 year
Today we are releasing a CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet. Our model outperforms OpenAI's CLIP by a large margin, and outperforms even bigger models (ViT-g/14) trained on LAION-2B. Check it out at !
17
141
739
@gabriel_ilharco
Gabriel Ilharco
9 months
CLIP models have become a lot better since 2021
Tweet media one
9
87
649
@gabriel_ilharco
Gabriel Ilharco
5 months
As good a time as any to say I recently graduated and joined @xAI . It’s going to be an exciting year, buckle up =)
@grok
Grok
5 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
31
50
480
@gabriel_ilharco
Gabriel Ilharco
2 years
Fine-tuning can make models like CLIP less robust. A simple idea is highly effective at mitigating that: averaging zero-shot and fine-tuned models. Check out our work introducing WiSE-FT, just accepted to CVPR! Paper: Code:
Tweet media one
4
96
530
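The averaging itself is a one-line interpolation of weights. A minimal sketch, assuming `zero_shot` and `fine_tuned` are hypothetical models with identical architectures and the mixing coefficient is a hyperparameter:

```python
# Minimal sketch of WiSE-FT-style weight-space ensembling: interpolate the
# zero-shot and fine-tuned weights. `zero_shot` and `fine_tuned` are
# hypothetical models with identical architectures.
import copy

def wise_ft(zero_shot, fine_tuned, alpha=0.5):
    theta_0, theta_1 = zero_shot.state_dict(), fine_tuned.state_dict()
    mixed = {k: (1 - alpha) * theta_0[k] + alpha * theta_1[k] for k in theta_0}
    model = copy.deepcopy(zero_shot)
    model.load_state_dict(mixed)
    return model
```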
@gabriel_ilharco
Gabriel Ilharco
4 months
Grok is going multimodal! It’s incredible to see how fast a small, focused team can move. Kudos to the amazing team @xAI that made this possible
Tweet media one
15
67
374
@gabriel_ilharco
Gabriel Ilharco
3 years
We are releasing an open-source training implementation of OpenAI’s CLIP!📎 CLIP models learn from language supervision, and are capable of strong zero-shot performance at various vision tasks () Our reproduction can be found at
4
77
334
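For context, zero-shot classification with the OpenCLIP library typically looks like the sketch below; the model name, pretrained tag, prompts and image path are placeholders, and the exact API may vary between library versions.

```python
# Hedged sketch of zero-shot classification with OpenCLIP; the model name,
# pretrained tag, prompts and image path are placeholders, and the API may
# differ slightly between library versions.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a dog", "a photo of a cat"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
print(probs)  # probability the image matches each prompt
```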
@gabriel_ilharco
Gabriel Ilharco
3 years
Instead of a single neural network, why not train lines, curves and simplexes in parameter space? Fantastic work by @Mitchnw et al. exploring how this idea can lead to more accurate and robust models:
Tweet media one
2
48
293
@gabriel_ilharco
Gabriel Ilharco
9 months
Another breakthrough in CLIP models, powered by better datasets. Great job @Vaishaal , @AlexFang26 and team! Paper:
Tweet media one
3
44
275
@gabriel_ilharco
Gabriel Ilharco
4 years
I've been seeing a lot of talk around the recent Vision Transformer (ViT) paper, so I thought I'd highlight some of my favorite previous work on self-attention and transformers in computer vision! Link to ViT: (thread 👇)
2
52
264
@gabriel_ilharco
Gabriel Ilharco
2 years
The year is 2032. A model was trained on all images, videos and text on the web, using over 100 yottaFLOPs. It still thinks this is an image of a dog. To fix models post-hoc, check out PAINT!🎨 📜 💻 🌐
Tweet media one
5
39
179
@gabriel_ilharco
Gabriel Ilharco
4 years
Vision plays a central role in shaping the meaning of concrete words like "apple" or "banana". Yet, most of today's NLP models learn representations of these concepts from text alone. Can such representations share similarities with the visual world? 1/n
Tweet media one
2
39
145
@gabriel_ilharco
Gabriel Ilharco
3 years
Forget about messy vision backbones inside vision+language models? Check out ViLT, a cool work by Kim et al., extending Vision Transformers to multimodal domains. Link:
Tweet media one
1
27
134
@gabriel_ilharco
Gabriel Ilharco
3 years
CLIP has spoken.
Tweet media one
Tweet media two
0
14
91
@gabriel_ilharco
Gabriel Ilharco
5 years
Can machines learn language from grounded, untranscribed speech? I don't know, but we're making fast progress! Paper: Thread below (1/n)
1
21
85
@gabriel_ilharco
Gabriel Ilharco
10 months
🚀Big updates to OpenCLIP! We now support over 100 pretrained models, and many other goodies. Check it out!
@wightmanr
Ross Wightman
10 months
v2.23.0 of OpenCLIP was pushed out the door! Biggest update in a while, focused on supporting SigLIP and CLIPA-v2 models and weights. Thanks @gabriel_ilharco @gpuccetti92 @rom1504 for help on the release, and @bryant1410 for catching issues. There's a leaderboard csv now!
2
18
147
1
17
78
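As a rough illustration, the available checkpoints can be enumerated with `open_clip.list_pretrained()`; the exact list depends on the installed version of the library.

```python
# Sketch: listing the pretrained checkpoints the installed OpenCLIP knows about.
import open_clip

for model_name, pretrained_tag in open_clip.list_pretrained():
    print(model_name, pretrained_tag)
```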
@gabriel_ilharco
Gabriel Ilharco
3 years
A surprisingly simple way to improve generalization when fine-tuning: combine the weights of zero-shot and fine-tuned models. We find significant improvements across many datasets and model sizes, at no additional computational cost at fine-tuning or inference time!
@Mitchnw
Mitchell Wortsman
3 years
Can zero-shot models such as CLIP be fine-tuned without reducing out-of-distribution accuracy? Yes! Our new method for robust fine-tuning improves average OOD accuracy by 9% on multiple ImageNet distribution shifts without any loss in-distribution (1/9)
Tweet media one
5
63
256
0
21
80
@gabriel_ilharco
Gabriel Ilharco
2 years
A quick reminder that less than 10 years ago, this is what image generation looked like. Things move fast!!
Tweet media one
2
5
74
@gabriel_ilharco
Gabriel Ilharco
4 years
New paper out! In NLP, fine-tuning large pretrained models like BERT can be a very brittle process. If you're curious about this, this paper is for you! Work with the amazing @JesseDodge , @royschwartz02 , Ali Farhadi, @HannaHajishirzi & @nlpnoah 1/n
@JesseDodge
Jesse Dodge
4 years
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping We found surprisingly large variance just from random seeds when fine-tuning BERT. Both weight inits and the order of the training data have big impact. 1/n
12
107
434
1
23
60
@gabriel_ilharco
Gabriel Ilharco
2 years
By *negating* a task vector, users can mitigate undesirable behaviors (e.g. toxic generations from a LM), or forget tasks altogether (e.g. OCR). For instance, by fine-tuning a GPT-2 model on toxic data, negating the resulting task vector reduces toxic generations by 6x. (8/n)
Tweet media one
1
3
49
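In rough sketch form, negation just flips the sign of the state-dict delta before applying it. Here `pretrained_lm` and `toxic_ft_lm` are hypothetical GPT-2-style models; the reported 6x reduction also depends on the scaling coefficient and the evaluation setup.

```python
# Sketch: negating a task vector to suppress a behavior such as toxic
# generations. `pretrained_lm` and `toxic_ft_lm` are hypothetical GPT-2-style
# models; the 6x figure depends on the scaling coefficient and evaluation.
import copy

theta_pre = pretrained_lm.state_dict()
theta_tox = toxic_ft_lm.state_dict()
neg_tau = {k: -(theta_tox[k] - theta_pre[k]) for k in theta_pre}

detoxified = copy.deepcopy(pretrained_lm)
detoxified.load_state_dict({k: theta_pre[k] + neg_tau[k] for k in theta_pre})
```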
@gabriel_ilharco
Gabriel Ilharco
4 years
We are hosting a tutorial on High Performance NLP at #emnlp2020 , covering a bunch of fun stuff in efficiency! Our first live Q&A session starts in ~1h! Slides: With the amazing Cesar Ilharco, @IuliaTurc , @Tim_Dettmers , Felipe Ferreira and @kentonctlee .
0
12
48
@gabriel_ilharco
Gabriel Ilharco
2 years
Task vectors offer a simple and efficient way of editing models. To create a task vector, we first fine-tune on a downstream task, then subtract the weights of the pre-trained model from the weights of the fine-tuned model. (3/n)
Tweet media one
1
0
46
@gabriel_ilharco
Gabriel Ilharco
5 years
Thrilled that our paper got honorable mention for Best Paper Award for Research Inspired by Human Language Learning and Processing! @conll2019 #emnlp2019
@gabriel_ilharco
Gabriel Ilharco
5 years
Can machines learn language from grounded, untranscribed speech? I don't know, but we're making fast progress! Paper: Thread below (1/n)
1
21
85
4
11
47
@gabriel_ilharco
Gabriel Ilharco
2 years
Much like software, models can be patched, adding support for new tasks with little change elsewhere. I'll be at NeurIPS this week presenting our patching method, PAINT🎨. Come say hi! 👋
Tweet media one
1
7
39
@gabriel_ilharco
Gabriel Ilharco
5 years
New paper on evaluation metrics for instruction conditioned navigation (e.g. VLN) on arXiv! #NLProc #Robotics #VLN Paper 👉 Thread 👇
Tweet media one
2
9
39
@gabriel_ilharco
Gabriel Ilharco
1 year
Despite their importance, datasets rarely receive the same research attention as model architectures or training algorithms. We believe this is a major shortcoming in the machine learning ecosystem, and that datasets deserve as much rigorous empirical experimentation as models
2
3
37
@gabriel_ilharco
Gabriel Ilharco
9 months
@giffmana @rom1504 Not bad at all, @giffmana ! Curious about your suspicions, whether here or in private =). I'm pretty suspicious of datasets myself
Tweet media one
3
0
34
@gabriel_ilharco
Gabriel Ilharco
2 years
One of the most important challenges in machine learning today is figuring out how to control the behavior of pre-trained models, whether to reduce biases, align with human preferences, or simply improve accuracy on downstream tasks. (2/n)
1
0
31
@gabriel_ilharco
Gabriel Ilharco
2 years
Another key benefit of task vectors is that they enable us to reuse existing fine-tuned models, without the need to re-train or transfer any of the data. This is particularly exciting in light of the fast growth of fine-tuned models in recent years. (5/n)
Tweet media one
1
0
32
@gabriel_ilharco
Gabriel Ilharco
2 years
While it might be surprising at first that we can operate directly in the weight space of neural networks, our research builds on several recent exciting works exploring the geometry of loss landscapes and weight averaging (links at the end!) (6/n)
1
2
32
@gabriel_ilharco
Gabriel Ilharco
2 years
Once created, task vectors can be combined via arithmetic operations like addition or subtraction, changing model behavior accordingly. And since all operations are element-wise, editing models with task vectors has no impact on inference time! (4/n)
1
0
31
@gabriel_ilharco
Gabriel Ilharco
2 years
By *adding* task vectors, we can create multi-task models without any additional training. Using CLIP, adding task vectors from two different tasks greatly improves the accuracy of the zero-shot model, and almost matches the accuracy of using multiple specialized models (9/n)
Tweet media one
1
0
30
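A sketch of the addition case, assuming hypothetical CLIP models `zero_shot`, `ft_a`, `ft_b` that share an architecture; the scaling coefficient would typically be tuned on held-out data.

```python
# Sketch: summing task vectors from two tasks to build a multi-task model.
# `zero_shot`, `ft_a`, `ft_b` are hypothetical CLIP models with a shared
# architecture; alpha would normally be tuned on held-out data.
import copy

theta_0 = zero_shot.state_dict()
tau_a = {k: ft_a.state_dict()[k] - theta_0[k] for k in theta_0}
tau_b = {k: ft_b.state_dict()[k] - theta_0[k] for k in theta_0}

alpha = 0.5
multitask = copy.deepcopy(zero_shot)
multitask.load_state_dict(
    {k: theta_0[k] + alpha * (tau_a[k] + tau_b[k]) for k in theta_0})
```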
@gabriel_ilharco
Gabriel Ilharco
1 year
Together with DataComp, we are releasing CommonPool, the largest collection of image-text pairs to date. CommonPool has 12.8 billion samples collected from Common Crawl, and is larger than existing datasets by a factor of 2.5x.
Tweet media one
2
4
28
@gabriel_ilharco
Gabriel Ilharco
1 year
DataComp is a new benchmark for designing multimodal datasets. Unlike traditional benchmarks, DataComp has data front and center. The goal of participants is to propose new training sets, while keeping code, hparams & compute constant.
Tweet media one
1
4
28
@gabriel_ilharco
Gabriel Ilharco
2 years
As more task vectors are added together, we can create more powerful multi-task models, without any re-training, and without increasing inference time (10/n)
Tweet media one
1
0
28
@gabriel_ilharco
Gabriel Ilharco
5 years
Come check out our talk tomorrow at the Visually Grounded Interaction and Language workshop!
@gabriel_ilharco
Gabriel Ilharco
5 years
New paper on evaluation metrics for instruction conditioned navigation (e.g. VLN) on arXiv! #NLProc #Robotics #VLN Paper 👉 Thread 👇
Tweet media one
2
9
39
0
6
27
@gabriel_ilharco
Gabriel Ilharco
10 months
If you're interested in the next generation of vision datasets, don't miss our workshop today at #ICCV2023 !
1
1
27
@gabriel_ilharco
Gabriel Ilharco
2 years
…, , and, one of our main sources of inspiration, (17/n)
1
0
23
@gabriel_ilharco
Gabriel Ilharco
1 year
We also show that the ranking of many curation approaches is consistent across scales. This suggests that experiments at smaller scales can provide valuable insights for larger scales, thereby accelerating investigations.
Tweet media one
1
4
21
@gabriel_ilharco
Gabriel Ilharco
1 year
As a highlight, we find a 1.4B subset of our pool, DataComp-1B, that outcompetes compute-matched models from OpenAI and LAION by a large margin
Tweet media one
2
2
21
@gabriel_ilharco
Gabriel Ilharco
1 year
We present 300+ baseline experiments along with many insights into dataset design. A key result is that smaller, more aggressively filtered datasets can perform *better* than larger datasets coming from the same pool.
1
3
19
@gabriel_ilharco
Gabriel Ilharco
2 years
In our work we edit models using three arithmetic expressions over task vectors: negating a task vector, adding task vectors together, and doing analogies with task vectors. (7/n)
1
0
20
@gabriel_ilharco
Gabriel Ilharco
5 years
I recently had my last day as a @GoogleAI Resident. It has been an amazing year and I'm very thankful to @jasonbaldridge , @vihaniaj , @alex_y_ku , @quocleix and other collaborators for teaching me what no book can and making me fall in love with doing research.
1
0
20
@gabriel_ilharco
Gabriel Ilharco
2 years
Overall, we show that task arithmetic is a simple, efficient and effective way of editing models. It enables us to re-use existing checkpoints without the need to re-train or transfer data, and to combine models without increasing inference time. (14/n)
1
0
20
@gabriel_ilharco
Gabriel Ilharco
2 years
Finally, much like with word embeddings such as Word2Vec (think "man" is to "woman" as "king" is to "queen"), you can do *analogies* with task vectors! (11/n)
1
0
19
@gabriel_ilharco
Gabriel Ilharco
2 years
Consider two sentiment analysis datasets. We can improve accuracy on the first by combining three other task vectors, obtained by A) unsupervised fine-tuning on the 1st dataset, B) supervised fine-tuning on the 2nd, and C) unsupervised fine-tuning on the 2nd. B + (A − C) improves accuracy on the first! (12/n)
Tweet media one
1
0
17
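As a rough sketch of the analogy arithmetic, with `tau_a`, `tau_b`, `tau_c` standing in for the three (hypothetical) task vectors and `pretrained` for the base model:

```python
# Sketch of the B + (A - C) analogy: tau_a = unsupervised ft on dataset 1,
# tau_b = supervised ft on dataset 2, tau_c = unsupervised ft on dataset 2.
# All three task vectors and `pretrained` are hypothetical placeholders.
import copy

theta_0 = pretrained.state_dict()
analogy = {k: tau_b[k] + (tau_a[k] - tau_c[k]) for k in theta_0}

edited = copy.deepcopy(pretrained)
edited.load_state_dict({k: theta_0[k] + analogy[k] for k in theta_0})
```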
@gabriel_ilharco
Gabriel Ilharco
4 years
Whoa, this is really cool! Text-only models often outperform text+vision models in text-only tasks, given the statistical discrepancies in the language used in these domains. "Vokenization" is a neat way to get some grounded supervision without paying the domain shift price
@HaoTan5
Hao Tan
4 years
*Vokenization*: a visually-supervised language model attempt in our #emnlp2020 paper: (w. @mohitban47 ) To improve language pre-training, we extrapolate multimodal alignments to lang-only data by contextually mapping tokens to related images ("vokens") 1/4
Tweet media one
Tweet media two
Tweet media three
Tweet media four
7
89
356
0
2
17
@gabriel_ilharco
Gabriel Ilharco
1 year
Our benchmark is designed with scale in mind, with 4 levels of compute ranging from 12.8M to 12.8B samples seen in training. At the smallest scale, we can train in a few hours on a single GPU. At the largest, experiments may take up to 40 thousand GPU hours.
1
2
16
@gabriel_ilharco
Gabriel Ilharco
1 year
DataComp is centered around image-text datasets, which have been instrumental in building models like CLIP, DALL-E, Stable Diffusion, Flamingo, and many others. Our standardized infrastructure trains CLIP models and evaluates them on a diverse suite of 38 downstream tasks.
1
2
16
@gabriel_ilharco
Gabriel Ilharco
1 year
Large-scale image-text datasets like LAION or DataComp are heavily filtered. Instead of throwing millions of images away, can we make use of them via image captioning models? Check out this very cool work led by @thao_nguyen26 ! 👇
@thao_nguyen26
Thao Nguyen
1 year
Are synthetic captions useful for multimodal training? In , we show how image captioning can improve the quality of web-scale datasets. Replacing noisy web captions with generated ones outperforms existing filtering methods from the DataComp benchmark 1/n
Tweet media one
5
20
117
0
0
16
@gabriel_ilharco
Gabriel Ilharco
1 year
This massive effort was made possible thanks to the work of many, @sy_gadre , @AlexFang26 , @JonathanHayase , Georgios Smyrnis, @thao_nguyen26 , Ryan Marten, @Mitchnw , Dhruba Ghosh, @JieyuZhang20 , Eyal Orgad, @rahiment , @giannis_daras , @sarahmhpratt , @RamanujanVivek , ...
1
2
16
@gabriel_ilharco
Gabriel Ilharco
1 year
Special thanks to all the DataComp authors for making this possible, and to Stability AI for providing compute!
1
2
16
@gabriel_ilharco
Gabriel Ilharco
1 year
We will also hold a workshop at ICCV 2023 centered around DataComp in October, and will invite outstanding submissions to give presentations. Check out to learn more!
1
4
15
@gabriel_ilharco
Gabriel Ilharco
9 months
Full results here
0
1
15
@gabriel_ilharco
Gabriel Ilharco
5 years
Researcher 1: we should show that our system is robust
Researcher 2: how about we simulate what would happen if a giraffe tried to eat the cube?
Researcher 1: excellent idea
@OpenAI
OpenAI
5 years
We’re all used to robots that fail when their environment changes unpredictably. Our robotic system is adaptable enough to handle unexpected situations not seen during training, such as being prodded by a stuffed giraffe:
46
387
2K
0
2
15
@gabriel_ilharco
Gabriel Ilharco
2 years
Also, when data from the target task is available, using analogies with task vectors can provide a better starting point for fine-tuning. (13/n)
Tweet media one
1
0
15
@gabriel_ilharco
Gabriel Ilharco
2 years
New sota open-source CLIP model just dropped 🔥
@Mitchnw
Mitchell Wortsman
2 years
We've trained a new ViT-G/14 CLIP model with OpenCLIP on LAION-2B which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (R @5 ) on MSCOCO. As of Jan 2023 this is the best open source CLIP code: blog:
6
86
458
0
1
14
@gabriel_ilharco
Gabriel Ilharco
1 year
Along with filtering CommonPool, we have a separate Bring Your Own Data (BYOD) track. In BYOD, any data can be used as long as it doesn’t overlap with our evaluation suite. We show that adding data sources such as RedCaps and CC12M can improve performance of some baselines
1
2
13
@gabriel_ilharco
Gabriel Ilharco
1 year
A special thanks to Stability AI and the Gauss Centre for Supercomputing e.V for providing us with compute resources for training our models.
3
2
13
@gabriel_ilharco
Gabriel Ilharco
4 years
Personally, I'm excited about the potential of self-attention in vision, especially given recent indication that, in some scenarios, it can scale better than convolutions. It's great to see all this recent progress, and I hope it lives up to its promise in the near future!
1
0
12
@gabriel_ilharco
Gabriel Ilharco
5 years
Apparently GPT-2 thinks it's a good idea to initialize weights with zeros 🤔
Tweet media one
Tweet media two
1
1
11
@gabriel_ilharco
Gabriel Ilharco
5 years
While I'm sad to leave this incredible environment, I'm very excited to be joining @uwcse as a PhD student this Fall!
1
0
11
@gabriel_ilharco
Gabriel Ilharco
4 years
1) Image Transformer (2018), by @nikiparmar09 , @ashVaswani , @kyosu , @lukaszkaiser , Noam Shazeer, @alex_y_ku , @dustinvtran . Local attention at the pixel level for image generation and super-resolution Link:
Tweet media one
1
0
11
@gabriel_ilharco
Gabriel Ilharco
5 years
Why is reaching the target the only thing most people worry about in navigation? Find out more in our ACL 2019 paper 1/
Tweet media one
1
2
11
@gabriel_ilharco
Gabriel Ilharco
2 years
@ericjang11 @colinraffel Some of our recent papers that might interest you! Merging a finetuned and a pretrained model: Merging models finetuned on the same task: Merging models finetuned on different tasks:
1
0
11
@gabriel_ilharco
Gabriel Ilharco
2 years
There is much more in our paper, and we think this is just the beginning! I’m excited for a future where we have cheap and reliable ways of controlling how models behave, without needing to re-train them from scratch (15/n)
1
0
11
@gabriel_ilharco
Gabriel Ilharco
4 years
2) Attention Augmented CNNs (2019), by @IrwanBello , @barret_zoph , @ashVaswani , Jonathon Shlens, @quocleix Augmenting CNNs with self-attention yields considerable improvements on ImageNet Link:
Tweet media one
1
1
11
@gabriel_ilharco
Gabriel Ilharco
1 year
There is much more in the links below. We are beyond excited to build the next generation of multimodal datasets rigorously and collaboratively, and hope you join us in this journey! 📜: 🖥️: 🌐:
1
2
11
@gabriel_ilharco
Gabriel Ilharco
2 years
Check out model soups, a new recipe for fine-tuning! 🍜 Our recipe leads to 90.98% accuracy on ImageNet when fine-tuning BASIC
@Mitchnw
Mitchell Wortsman
2 years
Introducing a new recipe for fine-tuning --- model soups 🍜 TL;DR: we average the weights of multiple fine-tuned models to improve accuracy without increasing inference time Paper: Code: To appear at ICML (1/10)
Tweet media one
5
42
266
0
1
10
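The "uniform soup" variant is simply a weight average over fine-tuned models. A minimal sketch, where `finetuned_models` is a hypothetical list of models sharing one architecture:

```python
# Sketch of a "uniform soup": average the weights of several fine-tuned models.
# `finetuned_models` is a hypothetical list of models sharing one architecture.
import copy

def uniform_soup(finetuned_models):
    states = [m.state_dict() for m in finetuned_models]
    avg = {k: sum(s[k] for s in states) / len(states) for k in states[0]}
    soup = copy.deepcopy(finetuned_models[0])
    soup.load_state_dict(avg)
    return soup
```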
@gabriel_ilharco
Gabriel Ilharco
4 years
@srush_nlp @Tim_Dettmers @IuliaTurc @kentonctlee That's very kind, I'm glad you enjoyed it!
0
0
10
@gabriel_ilharco
Gabriel Ilharco
4 years
@JeffDean I hope they are worthy of your attention!
0
0
9
@gabriel_ilharco
Gabriel Ilharco
5 years
Packed room today at our #KDD tutorial on Deep Learning for #NLProc with #TensorFlow ! Great to talk to such a broad and intelligent audience!
Tweet media one
1
2
9
@gabriel_ilharco
Gabriel Ilharco
5 years
Languages are beautiful. In classic Tupi, spoken by native Amerindians in Brazil, all verbs are in the present tense. Time is generally expressed by the suffixes "rama" (future) and "ûera" (past). (1/2)
1
0
9
@gabriel_ilharco
Gabriel Ilharco
3 years
It has been incredibly fun to put this together! Huge shout-out to the amazing people involved @Mitchnw , Nicholas Carlini, @rtaori13 , Achal Dave, @Vaishaal , John Miller, Hongseok Namkoong, @HannaHajishirzi , Ali Farhadi & @lschmidt3 . Special thanks to @_jongwook_kim and @AlecRad
0
0
9
@gabriel_ilharco
Gabriel Ilharco
2 years
This is a big step towards democratizing access to large models. Congrats on the great work, Tim!
@Tim_Dettmers
Tim Dettmers
2 years
We release LLM.int8(), the first 8-bit inference method that saves 2x memory and does not degrade performance for 175B models by exploiting emergent properties. Read More: Paper: Software: Emergence:
Tweet media one
17
250
1K
0
1
9
@gabriel_ilharco
Gabriel Ilharco
4 years
6) DETR: End-to-End Object Detection with Transformers (2020), by @alcinos26 , @fvsmassa , @syhw , Nicolas Usunier, @kirillov_a_n , @szagoruyko5 Object detection as a set prediction problem and a transformer on top of a CNN backbone Link:
Tweet media one
1
1
9
@gabriel_ilharco
Gabriel Ilharco
4 years
7) Group Equivariant Stand-Alone Self-Attention for Vision (2020) by @davidwromero , @jb_cordonnier Self-attention with equivariance to arbitrary symmetries by carefully defining the positional encodings Link:
Tweet media one
1
0
9
@gabriel_ilharco
Gabriel Ilharco
2 years
If you haven't been following it, @wightmanr , @CadeGordonML and others have been doing amazing work with the OpenCLIP library! They recently trained two ViT models on LAION-400M, the first large-scale, open-source CLIP models where the data is also publicly available!
@wightmanr
Ross Wightman
2 years
OpenCLIP () has been updated with the latest results from a ViT-B/16 training run with the LAION400M dataset. Reaching zero-shot top-1 of 67.07 for In1k validation set. Further zero-shot analysis pending...
3
35
222
0
1
8
@gabriel_ilharco
Gabriel Ilharco
4 years
5) Axial-DeepLab (2020), by Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, @YuilleAlan , Liang-Chieh Chen Factorizing 2-d attention into two faster 1-d operations. Strong results on classification and segmentation Link:
Tweet media one
1
0
8
@gabriel_ilharco
Gabriel Ilharco
4 years
Fantastic work by @OfirPress , @nlpnoah and @ml_perception showing when it's helpful to use shorter sequences for language modeling! Thread 👇
@OfirPress
Ofir Press
4 years
Everyone thinks that you have to increase the input length of language models to improve their performance. Our new Shortformer model shows that by *shortening* inputs performance improves while speed and memory efficiency go up. ⬇(1/n) (code below)
Tweet media one
8
89
547
0
0
8
@gabriel_ilharco
Gabriel Ilharco
1 year
Overall, DataComp provides a controlled environment that enables rigorous experimentation over dataset design choices. The large improvements we see from simple baselines highlight the power of careful empirical studies with datasets.
1
2
8
@gabriel_ilharco
Gabriel Ilharco
9 months
Pretty good on retrieval too!
Tweet media one
2
0
7
@gabriel_ilharco
Gabriel Ilharco
2 years
More details in the great thread below by @Mitchnw . We added a number of new experiments and results in our paper, including additional models such as ALIGN and BASIC, along with further discussions on the role of hyperparameters.
@Mitchnw
Mitchell Wortsman
3 years
Can zero-shot models such as CLIP be fine-tuned without reducing out-of-distribution accuracy? Yes! Our new method for robust fine-tuning improves average OOD accuracy by 9% on multiple ImageNet distribution shifts without any loss in-distribution (1/9)
Tweet media one
5
63
256
1
1
6
@gabriel_ilharco
Gabriel Ilharco
3 years
Our codebase matches the ImageNet zero-shot accuracy from OpenAI (32.7% ours vs 31.3%) when training on the same data at medium scales (~15M samples from YFCC). As shown by the scaling trends below, performance is far from saturated at this scale.
Tweet media one
1
0
7
@gabriel_ilharco
Gabriel Ilharco
5 years
Me: Reviewer number 2:
@softsynthbear
uncleared sample
5 years
jonathan frakes telling you you're wrong for 47 seconds
1K
24K
82K
1
0
7
@gabriel_ilharco
Gabriel Ilharco
5 years
Getting started with research can be challenging, especially if you come from underrepresented communities. I was fortunate to have amazing people guiding me in this process and I’m happy to help ambitious people do the same. Feel free to contact me =)
1
0
7
@gabriel_ilharco
Gabriel Ilharco
2 years
If you're interested in robustness, don't miss @anas_awadalla 's great work! Tons of models and evaluations, and lots of practical insights!
@anas_awadalla
Anas Awadalla 🍉
2 years
I am excited to share our paper on evaluating the distributional robustness of QA models, where we evaluate 350+ SQuAD models on 15 distribution shifts and find that in-context learning provides the best performance-robustness tradeoff. More details below ⬇️
Tweet media one
3
11
65
0
0
7
@gabriel_ilharco
Gabriel Ilharco
4 months
@visheratin @xai Thanks! We are planning to roll it out soon
2
0
7
@gabriel_ilharco
Gabriel Ilharco
2 years
Even the best pre-trained models are not perfect. For instance, CLIP has strong zero-shot accuracy on ImageNet, but is worse than logistic regression with pixels on MNIST. In some cases, like in typographic attacks, simply scaling up can make things worse📉
Tweet media one
1
0
7
@gabriel_ilharco
Gabriel Ilharco
3 years
One particularly exciting property of small-medium scale CLIP models is that they still exhibit atypically high effective robustness! () This scale invariance means we don't need massive amounts of compute to study what makes these models robust
Tweet media one
1
0
7