Our 2nd workshop on "Self-supervised Learning - What is Next?" is coming to #ECCV22! More updates soon! In the meantime, check out the previous iteration and marvel at how far we've come in SSL since then: Organised with @chrirupp, @dlarlus and A. Zisserman.
Getting excited for my "Self-supervised and Vision-Language Learning" lectures starting tomorrow for the @UvA_IvI's MSc in AI, Deep Learning 2 course: Sharing a preview in @CSProfKGD style :) Soo much recent progress, learned a lot in preparing it. 😊
Check out our @iclr_conf [oral] paper on learning state-of-the-art ViTs from scratch from a single video! One of the coolest things is that multi-object tracking emerges from the different heads of the plain ViTs (three heads visualised below in R, G, B).
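For anyone curious how such a visualisation is typically produced, here is a minimal sketch (not the paper's code; a standard pretrained timm checkpoint stands in for the single-video-trained model) that pulls per-head CLS attention maps from a ViT's last block:

```python
import torch
import timm

# Sketch: extract per-head CLS attention maps from a ViT's last block
# so three heads can be rendered as R, G, B channels.
model = timm.create_model("vit_small_patch16_224", pretrained=True).eval()

cache = {}
model.blocks[-1].attn.qkv.register_forward_hook(
    lambda m, i, o: cache.update(qkv=o)  # (B, N, 3*dim)
)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # stand-in for a video frame

attn_mod = model.blocks[-1].attn
B, N, _ = cache["qkv"].shape
qkv = cache["qkv"].reshape(B, N, 3, attn_mod.num_heads, -1).permute(2, 0, 3, 1, 4)
q, k = qkv[0], qkv[1]                                    # (B, heads, N, head_dim)
attn = (q @ k.transpose(-2, -1) * attn_mod.scale).softmax(dim=-1)
rgb = attn[0, :3, 0, 1:].reshape(3, 14, 14)              # 3 heads' CLS->patch maps
```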
Really happy to share that DoRA has been accepted as an Oral at @iclr_conf #ICLR2024. Using just "1 video" from our new egocentric dataset, Walking Tours, we develop a new method that outperforms DINO pretrained on ImageNet on image and video downstream tasks. More details in 🧵👇
Today we introduce Bidirectional Instruction Tuning (Bitune). It's a new way of adapting LLMs for the instruction->answering stage. It allows the model to process the instruction/question with bidirectional attention, while the answer generation remains causal.
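As a toy illustration of that idea (my sketch, not the paper's implementation), the corresponding attention mask could look like this:

```python
import torch

def bitune_style_mask(prompt_len: int, answer_len: int) -> torch.Tensor:
    """Hypothetical Bitune-style attention mask (True = may attend).

    Prompt tokens attend to all prompt tokens (bidirectional); answer
    tokens attend to the full prompt plus preceding answer tokens (causal).
    """
    n = prompt_len + answer_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:prompt_len, :prompt_len] = True                  # bidirectional prompt
    mask[prompt_len:, :prompt_len] = True                  # answer sees full prompt
    mask[prompt_len:, prompt_len:] = torch.tril(           # causal within answer
        torch.ones(answer_len, answer_len, dtype=torch.bool)
    )
    return mask

print(bitune_style_mask(3, 2).int())
```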
Looking forward to the Self-Supervised Learning Workshop we've organized with @chrirupp, A. Vedaldi and A. Joulin at #ECCV2020. Join us tomorrow for our speakers: @avdnoord, P. Favaro, @CarlDoersch, A. Zisserman, I. Misra, S. Yu, A. Efros, @pathak2206!
Check out our @CVPR paper on making caption-based Vision-Language Models do object localization without _any_ human-supervised detection data! ⁉️ We develop a new *VLM-specific PEFT method* 🤩 which is more powerful than LoRA etc., and we test on non-training categories only! How can one easily teach caption-pretrained VLMs to localize objects? We show that a small Positional Insert (PIN) can unlock object localization abilities in frozen autoregressive VLMs, without any annotated data. #CVPR2024 📝: 🌐:
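A minimal sketch of how I read the PIN idea (module name, shapes and placement are my assumptions, not the paper's code): the only trainable parameter is a small tensor added to the frozen VLM's vision tokens.

```python
import torch
import torch.nn as nn

class PositionalInsert(nn.Module):
    """Hypothetical sketch: a small learnable tensor added to the frozen
    vision tokens; everything else in the VLM stays frozen."""
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.pin = nn.Parameter(torch.zeros(1, num_tokens, dim))

    def forward(self, vision_tokens: torch.Tensor) -> torch.Tensor:
        # vision_tokens: (B, num_tokens, dim) from the frozen encoder
        return vision_tokens + self.pin

# Only the insert is optimised; the VLM itself provides no gradients.
pin = PositionalInsert(num_tokens=256, dim=1024)
opt = torch.optim.AdamW(pin.parameters(), lr=1e-4)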
With @iclr_conf done & the @NeurIPSConf deadline rapidly approaching, here's something to look forward to 🤩: our workshop @ICCVConference, "🍔BigMAC: Big Model Adaptation for Computer Vision", with amazing speakers. 🌐: 📆: 2nd October, 9am-1pm, details soon
This week marks my one-year anniversary of being an assistant prof at the @UvA_Amsterdam. 🥳🎉 To celebrate this, I want to share a few of my distilled reflections.
Visit our ICLR poster "Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing" with @irolaina and A. Vedaldi from @Oxford_VGG. We linear-probe SSL models, but in ǝsɹǝʌǝɹ! 🤯 For better interpretability. In 1h:
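A toy sketch of my reading of the title (an assumption, not the paper's procedure): quantize SSL features into discrete codes, then probe in the reverse direction by predicting the code from the human label.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 128))       # stand-in SSL features
labels = rng.integers(0, 10, size=1000)    # stand-in human labels

# quantize representations into discrete codes
codes = KMeans(n_clusters=64, n_init=10, random_state=0).fit_predict(feats)

# reversed probe: how predictable is the code from the label?
onehot = np.eye(10)[labels]
probe = LogisticRegression(max_iter=1000).fit(onehot, codes)
print("reverse-probe accuracy:", probe.score(onehot, codes))
```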
@y0b1byte Off the top of my head: a lot of strands. Synthetic data: phi models, newest Stable Diffusion. MoEs: MegaBlocks, LLaVA-MoE. PEFT: e.g. DoRA, VeRA (ours). Instruction tuning: Alpaca, the MoE+IT paper from Google. VLMs: Apple and HF papers. LLM embeddings: e.g. llm2vec,
Today, my friend & collaborator @TengdaHan sent me this: I've arrived at 1000 citations! 🥳 Or rather: the works I've co-authored with many brilliant & inspiring individuals have, collectively, reached a nice arbitrary number! Still: 🥳🎉! To celebrate, here are some TL;DRs
Full house at the practical of our SSL + vision-language module! Want to follow along? Find the Colab notebook made by my fabulous TAs @ivonajdenkoska and @mmderakhshani here 💻: lecture 2 📺: slides 📄:
Finally it's out! 🎉 Our new work on leveraging the passage of time in videos for learning better image encoders. Big improvements for spatial tasks like unsupervised object segmentation. Check out the great thread below! Paper @ICCVConference
New paper on exploring the power of videos for learning better image encoders 🎥🧠. Introducing "TimeTuning", a self-supervised method that tunes models on the temporal dimension, enhancing their capabilities for spatially dense tasks such as unsupervised semantic segmentation.
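In case it helps intuition, here is a loose sketch of what "tuning on the temporal dimension" could look like. This is my assumption, a SwAV-style swapped prediction across frames, not the TimeTuning implementation:

```python
import torch
import torch.nn.functional as F

def temporal_swap_loss(f_t, f_tk, prototypes, temp=0.1):
    # f_t, f_tk: (N, D) dense patch features from frames t and t+k
    # prototypes: (K, D) L2-normalised cluster centres
    logits_t = F.normalize(f_t, dim=-1) @ prototypes.T / temp    # (N, K)
    logits_tk = F.normalize(f_tk, dim=-1) @ prototypes.T / temp
    # swapped prediction: each frame predicts the other's hard assignment
    tgt_t = logits_t.argmax(dim=-1).detach()
    tgt_tk = logits_tk.argmax(dim=-1).detach()
    return F.cross_entropy(logits_t, tgt_tk) + F.cross_entropy(logits_tk, tgt_t)

protos = F.normalize(torch.randn(32, 384), dim=-1)   # K=32 prototypes, D=384
loss = temporal_swap_loss(torch.randn(196, 384), torch.randn(196, 384), protos)
```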
I will be giving a public talk about some of my research tomorrow as part of the QUVA Deep Vision lecture. It'll be about self-supervised learning and privacy/ethics in CV (SeLa, PASS, SeLaVi, GDT and our GPT-2 bias paper). Tune in here:
Our #ECCV2022 @eccvconf workshop on self-supervised learning and its many new forms. This time with a call for papers, with a deadline conveniently after the ECCV decisions. Check it out ☑️ and share 🔀!
Research -> meeting friends! 🥳 After speaking at Bristol's Machine Learning and Vision group of @dimadamen and having exciting discussions about research yesterday, I was happy to see old and new colleagues from @Oxford_VGG at my talk at @oxengsci in Oxford today.
Final day at @ICCVConference! We have one oral at around 9:20am: "Self-ordering Point Clouds", where we learn to select the most informative points with hierarchical contrastive learning (subsets as positive augmentations) and use Sinkhorn-Knopp for differentiable sorting.
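For the curious, here is a minimal sketch of differentiable sorting via Sinkhorn-Knopp (my illustration of the general technique, not the paper's code): relax the permutation matrix that would sort the scores into a doubly-stochastic matrix.

```python
import torch

def sinkhorn_sort(scores, n_iters=20, temp=0.1):
    """scores: (N,) importance scores; returns an (N, N) soft permutation."""
    n = scores.shape[0]
    ranks = torch.arange(n, dtype=scores.dtype)
    cost = (scores[:, None] - ranks[None, :]) ** 2   # cost of item i at position j
    log_p = -cost / temp
    for _ in range(n_iters):  # alternate row/column normalisation in log space
        log_p = log_p - log_p.logsumexp(dim=1, keepdim=True)
        log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)
    return log_p.exp()  # approximately doubly stochastic, differentiable

values = torch.tensor([0.3, 2.0, 1.1])
P = sinkhorn_sort(values)
soft_sorted = P.T @ values  # ~ascending order, with gradients intact
```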
I'm sure that if we'd had this tool earlier, efforts like ImageNet-blurred, removing the person-subtree, and PASS would have happened earlier. We no longer have an excuse not to see the bias and problems in our datasets. (5/5)
Our paper on causal representation learning from videos got accepted to ICML. 🍋🎉 While we use toy datasets, this is a great step for the future of representation learning.
Another nice oral at @CVPR's vision-language session, and another good demonstration that current VLMs are pretty broken. But the authors propose a nifty way to distill the procedural coding knowledge of LLMs into VLMs, improving them on benchmarks.
Excited that VPD has been selected as an Oral at #CVPR2024 (90 orals in total, 0.8%). Congrats to all coauthors, and see you in Seattle! Let's distill all the powerful specialist models into one VLM! paper: proj:
Check out our new work "Labelling unlabelled videos from scratch with multi-modal self-supervision" by @y_m_asano, @mandelapatrick_, @chrirupp and Andrea Vedaldi, in collab with @facebookai! See below for our automatically discovered clusters on VGG-Sound!
It was such a nice event! Seeing friends and meeting new ones, all while seeing NeurIPS works -- and it being simply downstairs from my office is 💯. I was especially happy to see the mingling between PhD and MSc students and other researchers! 🙌