Ranjay Krishna
@RanjayKrishna
Followers: 5K · Following: 4K · Media: 190 · Statuses: 2K
Assistant Professor, University of Washington; Director of Computer Vision, Allen Institute for Artificial Intelligence
California, USA
Joined August 2011
I successfully defended my PhD a few days ago. Huge thanks to my amazing advisors @drfeifei and @msbernst for supporting me throughout my journey.
A hearty congratulations to my student @RanjayKrishna (co-advised by @msbernst) for a successful PhD thesis defense! Great pioneering work in combining human cognition, human-computer interaction and #AI! Thank you PhD committee members @chrmanning @syeung10 @magrawala
24
7
311
Our submission received my first ever 10/10 review from NeurIPS. Check out our #NeurIPS2023 oral: we release the largest vision-language dataset for histopathology and train a SOTA model for classifying histopathology images across 13 benchmarks spanning 8 sub-pathologies.
Quilt-1M has been accepted for an oral presentation at @NeurIPSConf. As promised, we have also released our data and our model. See you all in New Orleans!
2
25
204
Our new paper finds something quite neat: we can easily scale up the number of tools LLMs can use to over 200 (APIs, models, Python functions, etc.) without any training and without a single tool-use demonstration! A minimal sketch of the idea follows this post.
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models. Paper page: Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire.
9
25
160
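Here is a minimal, hypothetical sketch of the idea in the post above: the prompt carries only each tool's documentation (no usage demonstrations), and the model picks a tool zero-shot. The tool names, the prompt format, and the `call_llm` stub are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: zero-shot tool selection from documentation alone.
# The tools, prompt format, and call_llm stub are illustrative assumptions.

TOOL_DOCS = {
    "image_captioner": "image_captioner(image_path: str) -> str. Returns a one-sentence caption.",
    "calculator": "calculator(expression: str) -> float. Evaluates an arithmetic expression.",
    "web_search": "web_search(query: str) -> list[str]. Returns top search snippets.",
}

def build_prompt(task: str) -> str:
    """Concatenate tool documentation (no usage demonstrations) with the task."""
    docs = "\n".join(f"- {doc}" for doc in TOOL_DOCS.values())
    return (
        "You can call the following tools. Their documentation is given below.\n"
        f"{docs}\n\n"
        f"Task: {task}\n"
        "Respond with a single tool call, e.g. tool_name(arguments)."
    )

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM API call; replace with a real client."""
    return 'calculator("26 + 99")'  # placeholder completion

if __name__ == "__main__":
    prompt = build_prompt("What is 26 + 99?")
    print(prompt)
    print("Model output:", call_llm(prompt))
```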
Announcing the first ICCV Workshop on Scene Graph Representation and Learning. If your research involves structured data or graph-based learning, consider submitting to us by August 15, 2019:
4
42
159
On my way to Seoul for #ICCV2019. If you're at the conference on October 28th, come check out the full-day workshop I am organizing on Scene Graph Representation and Learning. We have a great lineup of speakers and posters.
4
16
128
Academic quarter recap: here's a staff photo after the last lecture of @cs231n. It's crazy that we were the largest course at Stanford this quarter. This year, we added new lectures and assignments (open sourced) on attention, transformers, and self-supervised learning.
2
2
125
Someone made an in-depth video of our recent work at #CVPR2018 on Referring Relationships. If you are interested in how we train models to disambiguate between different people or objects in images, go check it out. #ComputerVision #MachineLearning.
1
40
110
Deploying LLMs continues to be a challenge as they grow in model size and consume more data. We introduce a simple distillation mechanism that lets even a 770M T5 model outperform the 540B PaLM model. Led by my PhD student @cydhsieh with collaborators @chunliang_tw and @ajratner. A sketch of the distillation objective follows this post.
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. We reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark.
0
23
104
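A rough, hedged sketch of the multi-task objective described above: the student is supervised on both the task labels and teacher-generated rationales, with the two losses mixed by a weight. The tensor shapes, the equal default weighting, and the function name are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of a distilling-step-by-step style objective:
# the student is supervised on labels AND teacher-generated rationales.
import torch
import torch.nn.functional as F

def distill_step_by_step_loss(label_logits, label_targets,
                              rationale_logits, rationale_targets,
                              rationale_weight=1.0):
    """Combine the label loss and the rationale loss (equal weighting assumed by default)."""
    label_loss = F.cross_entropy(
        label_logits.view(-1, label_logits.size(-1)), label_targets.view(-1)
    )
    rationale_loss = F.cross_entropy(
        rationale_logits.view(-1, rationale_logits.size(-1)), rationale_targets.view(-1)
    )
    return label_loss + rationale_weight * rationale_loss

if __name__ == "__main__":
    vocab = 32_000
    # Fake batch: 2 sequences of 8 tokens for each output head.
    label_logits = torch.randn(2, 8, vocab)
    rationale_logits = torch.randn(2, 8, vocab)
    label_targets = torch.randint(0, vocab, (2, 8))
    rationale_targets = torch.randint(0, vocab, (2, 8))
    print(distill_step_by_step_loss(label_logits, label_targets,
                                    rationale_logits, rationale_targets).item())
```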
If you're releasing a new user-facing AI project/product, you might want to read our new #CSCW2020 paper. We find that the words or metaphors used to describe AI agents have a causal effect on users' intention to adopt your agent. Thread below.
2
19
94
NEW dataset, NEW task, NEW model for dense video captioning. Work done with @kenjihata, @drfeifei and @jcniebles.
1
42
87
Human-centered AI is no longer just a buzzword. It's a thriving, growing area of research. Come to our workshop tomorrow at #ICML2023 to learn about it. AI models have finally matured for mass market use. HCI+AI interactions will only become more vital.
1
10
81
I am giving two talks at #CVPR2024 today (Tuesday). I promise they will both be entertaining. 2pm @ Summit 433: The Past, Present, and Future of Vision-Language Evaluation. 3:45pm @ Summit 329.
2
10
63
New paper! One training method, no new architecture, no additional data, SOTA results on 8 vision-language benchmarks. Our 5B model variants even outperform 13B+ models!
Multimodal reasoning is hard. Even the best LMMs struggle with counting. Any fix for it? Introducing VPD from @GoogleAI: we teach LMMs multimodal CoT reasoning with data synthesized from LLMs + vision tools, and achieve new SOTAs on many multimodal tasks!
1
10
62
Award for the most creative and least informative poster at #CVPR2017 goes to YOLO9000: Better, Faster, Stronger!
1
12
58
Congrats to Amir Zamir @zamir_ar, Silvio Savarese @silviocinguetta and co-authors for their Best Paper Award at #CVPR2018 for "Taskonomy: Disentangling Task Transfer Learning"
0
10
60
With the ICCV ban finally lifted, here is our new #ICCV2023 paper, which already has a few follow-up papers. Our method faithfully evaluates text-to-image generation models: it provides more than just a score; it identifies missed objects, incorrect attributes, wrong relationships, etc.
It is notoriously hard to evaluate images created by text-to-image models. Why not use powerful LLMs and VLMs to analyze them? We introduce TIFA (#ICCV2023), which uses GPT + BLIP to quantitatively measure what Stable Diffusion struggles on!
0
8
58
We updated our generative human evaluation benchmark with 6 GANs, 4 image datasets (generating faces and objects), and 2 sampling methods. We show statistically insignificant correlation with FID and other automatic metrics. Use HYPE!
Measuring progress in generative models is akin to hill climbing on noise. Automatic metrics are heuristics and human evaluations are unreliable. Our latest paper presents a new human evaluation grounded in psychophysics that is consistent, turnkey, and cost-effective.
1
18
54
Benchmarks today have become less informative to the communities they are meant to serve: researchers, developers, and users. Task-me-anything automatically generates benchmarks depending on the user's need/application.
Having trouble finding a benchmark for your use case? Introducing TaskMeAnything, a benchmark generation engine that creates VQA benchmarks on demand for assessing multimodal language models like GPT-4o.
1
4
48
New paper: real-world image editing that is consistent with lighting, occlusion, and 3D shape! We introduce a new 3D image editing benchmark called OBJect. Using OBJect, we train 3DIT, a diffusion model that can rotate, translate, insert, and delete objects in images.
Imagine a 2D image serving as a window to a 3D world that you could reach into, manipulate objects, and see the changes reflected in the image. In our new OBJect 3DIT work, we edit images in this 3D-aware fashion while operating only in pixel space!
0
8
48
New paper! Now that self-supervision for high-level vision tasks has matured, we ask what is needed for pixel-level tasks. Given cognitive-science evidence, we show that scaling up the learning of multi-view correspondences improves SOTA on depth, segmentation, normals, and pose estimation.
MIMIC: Masked Image Modeling with Image Correspondences. Paper page: Many pixelwise dense prediction tasks in computer vision today, such as depth estimation and semantic segmentation, rely on pretrained image representations. Therefore, curating effective pretraining datasets is vital.
1
6
41
New paper! If you use GPT and other vision models just right, you can get them to do zero-shot robotic manipulation in the real world!
Excited to share our latest work: MANIPULATE-ANYTHING! This scalable method pushes the boundaries of real-world robotic manipulation through zero-shot task execution and automated BC data generation. Here's a quick overview:
0
4
41
For researchers working on scene graphs or visual relationships, I just open-sourced a simple library to easily visualize #SceneGraphs. You can use it directly to generate the qualitative figures in your publications. A generic visualization sketch (not the library's own API) follows this post.
1
4
40
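The post above does not show the released library's API, so here is a generic illustration, assuming networkx and matplotlib, of what visualizing a small scene graph can look like; the objects and relationships are made up.

```python
# Generic illustration (not the released library's API): draw a tiny scene graph
# of objects and relationships with networkx + matplotlib.
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical relationships in (subject, predicate, object) form.
relationships = [
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
]

G = nx.DiGraph()
for subj, pred, obj in relationships:
    G.add_edge(subj, obj, label=pred)

pos = nx.spring_layout(G, seed=0)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=2000)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.savefig("scene_graph.png")  # write a qualitative figure to disk
```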
Make sure to check out this new #documentary on PBS (@novapbs): "Can we build a brain?" with Fei-Fei (@drfeifei) and me. Check out the trailer here.
4
7
39
Hey everyone, we have a great lineup of speakers at our upcoming #CVPR2020 workshop on the importance of Compositionality in Computer Vision (with @eadeli, @jcniebles, @drfeifei, @orussakovsky). Consider submitting a paper. Also, stay safe.
0
4
36
Happy Thanksgiving everyone! We have released code and a demo for our #ICCV2019 paper on Predicting Scene Graphs with Limited Labels. Check out @vincentsunnchen's GitHub repository here.
0
7
35
Scott's new paper shows that the promise of synthetic training data still has not been realized for computer vision, despite all the hype. Existing works do not report an important baseline: using the generators' original real data. This baseline outperforms models trained with synthetic data.
Will training on AI-generated synthetic data lead to the next frontier of vision models? Our new paper suggests no, for now. Synthetic data doesn't magically enable generalization beyond the generator's original training set. Details below (1/n).
0
5
33
@jebbery And then when you start your job: First task: What is 26 + 99? Me: It's 19. #Overfitting
1
2
33
Training robots requires data, which today is hard to collect. You need to (1) buy expensive robots, (2) teach people to operate them, and (3) purchase objects for the robots to manipulate. Our #CoRL2023 paper shows you don't need any of the three, not even a robot! All you need is an iPhone.
Is it possible to devise an intuitive approach for crowdsourcing trainable data for robots without requiring a physical robot? Can we democratize robot learning for all? Check out our latest #CoRL2023 paper: AR2-D2: Training a Robot Without a Robot.
0
1
30
Our work on dense #video #captioning was featured in #techcrunch. Collaborators - @drfeifei, @kenjihata @jcniebles
2
8
32
If you are attending #CVPR2020, we have some exciting things for you to attend and check out: 1) Come to our workshop on Sunday on Compositionality in Computer Vision (with @drfeifei, @jcniebles, @eadeli, and Jingwei). We have an amazing lineup of speakers.
1
1
29
Sketching is fundamental to how we operationalize spatial intelligence. Sketches are everywhere: drawings incised in stone, etched on leather, impressed in clay, drawn on paper. Today, we sketch our neighborhoods in maps, buildings with architectural blueprints, and ideas on…
Humans draw to facilitate reasoning and communication. Why not let LLMs do so? We introduce Sketchpad, which gives multimodal LLMs a sketchpad to draw on and facilitate reasoning! Sketchpad gives GPT-4o big boosts on many vision and math tasks.
0
3
32
Just redesigned and will be teaching a fun new course on #computervision with @jcniebles @Stanford. Go check it out:
3
7
29
At #CVPR2023 this year, I had a number of conversations about how we need a faithful benchmark for measuring vision-language compositionality. SugarCrepe is our response. Our best models are still not compositional. It's time to make some progress!
Introducing SugarCrepe: a benchmark for faithful vision-language compositionality evaluation! Current compositional image-to-text benchmarks are HACKABLE: blind models without image access outperform SOTA CLIP models due to severe dataset artifacts.
0
3
29
Funniest quote from #ICCV2019: "strongly supervised learning is the opium of machine learning and now we are all hooked on it"
0
2
28
Visual Genome paper has now been released. Project advised by @drfeifei @msbernst @ayman @lijiali_vision
2
15
26
Today we open sourced all 9 assignments for the #ComputerVision class I teach @Stanford with @jcniebles - allowing everyone to learn various concepts like lane detection, deformable parts, segmentation, dimensionality reduction, optical flow, etc.
1
9
25
There are so many vision-language models: OpenAI's CLIP, Meta's FLAVA, Salesforce's ALBEF, etc. Our #CVPR2023 highlight paper finds that none of them show sufficient compositional reasoning capacity. Since perception and language are both compositional, we have work to do.
Have vision-language models achieved human-level compositional reasoning? Our research suggests: not quite yet. We're excited to present CREPE, a large-scale Compositional REPresentation Evaluation benchmark for vision-language models, as a highlight at #CVPR2023. 1/7
0
1
27
@CloudinAround @joshmeyerphd @AndrewYNg Maybe you should read about what the problem really is before commenting. Tuition waivers will be taxable under the new bill, making our PhDs unaffordable.
0
0
26
@fchollet It very much depends on the activation function, but in most cases you want to use conv-bn-act. Without BN before the activation, saturated neurons will kill gradients. We do case studies of this across multiple activation functions in these slides. A minimal conv-bn-act sketch follows this post.
2
0
24
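A minimal PyTorch sketch of the conv-bn-act ordering discussed above; the channel sizes and the ReLU choice are arbitrary assumptions for illustration, not taken from the slides.

```python
# Minimal sketch of a conv -> batch norm -> activation block (ordering discussed above).
# Channel sizes and the ReLU choice are arbitrary for illustration.
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)   # normalize pre-activations...
        self.act = nn.ReLU(inplace=True)   # ...so the activation sees well-scaled inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))

if __name__ == "__main__":
    block = ConvBNAct(3, 16)
    print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```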
New paper: DreamSync improves any text-to-image generation model by aligning it better with text inputs. We use DreamSync to improve Stable Diffusion XL.
Generated images not following your prompt? Introducing DreamSync from @GoogleAI: improving alignment + aesthetics of image generation models with feedback from VLMs! Model agnostic. Plug and play. No RL. No human annotation. No real images.
0
2
25
Extremely proud of my lab mates for winning best paper award at #ICRA2019 on their work on self-supervised learning that combines vision and touch.
0
3
21
Check out this new workshop and benchmark for studying vision systems that can navigate as social agents amongst people -- by my colleagues (@SHamidRezatofig) at @StanfordSVL.
0
5
24
Our paper has been accepted at #ICCV2019. Come check us out in Seoul later this year. In the meantime, we are planning on releasing code soon.
Structured prediction requires large training sets, but crowdsourcing is ineffective, so existing models ignore visual relationships without sufficient labels. Our method uses 10 relationship labels to generate training data for any scene graph model!
2
0
21
I will be speaking at an event tomorrow at @Stanford on the importance of #Trust and #Transparency in Human-AI collaboration. Come stop by to hear about how we can build dynamic learning systems that can continuously learn from interactions with humans.
3
5
23
We are open-sourcing Quilt-LLaVA: a large vision-language assistant for histopathology. Quilt-LLaVA can support pathologists by engaging in dialogue as they examine whole-slide images. It's built on top of QUILT and QUILT-NET, which we released earlier this year.
Introducing Quilt-LLaVA, a Large Language and Vision Assistant for #Pathology trained with spatially localized instruction-tuning data generated from educational #YouTube videos, outperforming SOTA in various tasks.
0
4
19
Embodied AI has been limited to simple, lifeless simulated houses for many years. Just like the Holodeck in the Star Trek episodes I grew up watching, our Holodeck system allows you to create diverse, lived-in 3D simulated environments populated with thousands of objects.
Announcing Holodeck, a promptable system that can generate diverse, customized, and interactive 3D simulated environments ready for Embodied AI applications. #GenerativeAI [1/8]
0
0
20
This new project is a huge team effort from the PRIOR team at AI2 with striking conclusions: real-world navigation, exploration, and manipulation emerge (1) without any RL, (2) without any human demonstrations, and (3) with only automatically generated simulation data.
Imitating shortest paths in simulation enables effective navigation and manipulation in the real world. Our findings fly in the face of conventional wisdom! This is a big joint effort from PRIOR @allen_ai (6 first authors!).
0
1
18
Speaking of collective behavior, check out our new paper at NeurIPS 2022. Inspired by how animals coordinate to accomplish tasks, we design a simple multi-agent intrinsic reward that enables decentralized multi-agent training and even lets AI agents adapt to new partners.
Because it consists of billions of bidirectional interactions per day, Twitter can be thought of as a collective, cybernetic super-intelligence.
1
2
20
For those who enjoy deep learning videos, @labs_henry's YouTube channel just made an easy-to-digest video summarizing our ACL paper.
1
5
19
Lots of first-time #CVPR2019 attendees from Asia who code in Python and studied computer science.
0
2
19
Blog post explaining the math behind all the generative adversarial network papers #GAN. It's a fun short read.
0
4
19
@karpathy Since this is getting some attention, I taught a course last year called "AI vs IA": IA later evolved into present-day HCI (Human-Computer Interaction).
0
2
18
Google translates the Turkish gender-neutral "O bir doktor. O bir hemşire." to "He is a doctor. She is a nurse." #bias
1
10
17
Check out our research on how to design chatbots that people want to adopt!
Consumers have consistent personality preferences for their online friends, new @StanfordHAI research shows.
0
1
15
On a personal note, I am going to miss co-instructing with @danfei_xu and @drfeifei. This is my 5th and last time instructing at Stanford. I have learned so much from working with so many amazing teaching assistants and students. Thank you, everyone.
1
0
16
If you're at #ICCV2023, reach out and come say hi. I will be giving two talks: one at the Closing the Loop in Vision and Language workshop, and one at the Scene Graph workshop.
1
0
16
Old-school research presentations can be boring. Check out this fun, creative skit Helena put together to explain our new CSCW paper. TL;DR: Recent works keep finding that AI explanations don't help people make better decisions. We propose a theory for when they do help!
Do you want to learn how explanations can help reduce overreliance on AIs? Watch this fantastic, out-of-this-world, one-of-a-kind, spectacular short video explaining our work! We put a lot of love into it and would appreciate the views.
0
2
16
Joined an #AI and #deeplearning group on Facebook. Starting to realize that the public has no idea what #ArtificialIntelligence actually is.
4
1
15
@chipro It's subjective. I would personally say scaling up models *IS* academic research. It's easy to dismiss it as not innovative. But research is also about studying the outcomes of design decisions/interventions. In this case, the intervention is increasing model size.
0
0
13
#CVPR2018 results just came out!! There doesn't appear to be any correlation between paper ID and whether your paper will get accepted, unlike past vision conferences.
0
7
14
Congrats to @HannaHajishirzi, @nlpnoah, and the rest of the team! One step closer to open-sourcing all LLM data, architecture, evaluation, code... everything!
OLMo is here! And it's 100% open. It's a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let's get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here.
0
0
12
We have seen dozens of LLM papers this year trying to chain LLMs for complex tasks to deal with LLM errors. Similarly, the field of crowdsourcing has spent a decade decomposing tasks into microtasks to deal with human errors. This new paper organizes the lessons learned from crowdsourcing.
Chaining LLMs together to overcome LLM errors is an emerging, yet challenging, technique. What can we learn from crowdsourcing, which has long dealt with the challenge of decomposing complex work? We delve into this question in our new preprint: [1/9]
0
0
14
Here is a neat visualization exploring motifs in the visual world using relationships from @VisualGenome. Adding structure allows us to further vision research and ask questions like "what kinds of objects usually contain food?" (bowls, plates, tables). A query sketch follows this post.
1
0
14
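A hedged sketch of the kind of query mentioned above, assuming a Visual Genome-style relationships JSON dump (a list of images, each with a "relationships" list whose entries carry a predicate plus subject and object names); the exact field names here are assumptions, not a guaranteed schema.

```python
# Hedged sketch: count which subjects appear with a "contain"-like predicate and
# the object "food", assuming a Visual Genome-style relationships.json dump.
# The exact field names are an assumption, not a guaranteed schema.
import json
from collections import Counter

def name_of(node: dict) -> str:
    # Some dumps use "name", others a "names" list; handle both defensively.
    return node.get("name") or (node.get("names") or ["unknown"])[0]

def containers_of(path: str, target: str = "food") -> Counter:
    with open(path) as f:
        images = json.load(f)
    counts = Counter()
    for image in images:
        for rel in image.get("relationships", []):
            predicate = rel.get("predicate", "").lower()
            if "contain" in predicate and name_of(rel["object"]).lower() == target:
                counts[name_of(rel["subject"]).lower()] += 1
    return counts

if __name__ == "__main__":
    print(containers_of("relationships.json").most_common(5))  # e.g. bowls, plates, tables
```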
I have been extremely lucky to have @timnitGebru as a labmate and as a friend. Thank you for sharing your brilliant work and always being generous with your precious time. I am appalled you are dealing with this. I am here to support and help in any way I can.
I was fired by @JeffDean for my email to Brain women and Allies. My corp account has been cutoff. So I've been immediately fired :-).
0
0
12
Congratulations @PranavKhadpe!!! Contrary to how today's AI products are marketed, our paper finds that people are more likely to adopt and cooperate with AI agents that project low competence but outperform expectations, and are less forgiving of agents that project high competence.
Excited to share that our paper, "Conceptual Metaphors Impact Perceptions of Human-AI Collaboration", was awarded an Honorable Mention at #CSCW2020. A huge thank you to my co-authors @RanjayKrishna, @drfeifei, @jeffhancock, and @msbernst.
0
2
13
Check out our latest work on Generating Descriptive Image Paragraphs with @jkrause314 and @drfeifei
0
4
13
We have an amazing group of speakers lined up: Nikola Banovic (@nikola_banovic), Anca Dragan (@ancadianadragan), James Landay (@landay), Q. Vera Liao (@QVeraLiao), Meredith Ringel Morris (@merrierm), and Chenhao Tan (@ChenhaoTan).
0
1
13
Check out our newest paper! We automatically assign probabilistic relationship labels to images and can use them to train any existing scene graph model with as few as 10 examples.
Structured prediction requires large training sets, but crowdsourcing is ineffective, so existing models ignore visual relationships without sufficient labels. Our method uses 10 relationship labels to generate training data for any scene graph model!
1
0
13
If you are the anonymous reviewer who wrote a 12 page review for my @VisualGenome paper, I just wanna say that you're awesome. #bestReviewer.
1
0
12
Men also like shopping! #EMNLP2017 best paper reduces gender bias in visual recognition using Lagrangian relaxation.
2
3
12
Introducing a simple mechanism to detect if LLMs are hallucinating, and it transfers across models and tasks! A rough sketch of the lookback ratio follows this post.
Can we "internally" detect if LLMs are hallucinating facts not present in the input documents? Our findings: the lookback ratio, i.e., the extent to which LLMs put attention weight on the context versus their own generated tokens, plays a key role, and we propose a hallucination detector based on it.
0
1
12
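A rough, hedged sketch of the lookback-ratio idea described above: for each generated token, compare the attention mass placed on context tokens versus previously generated tokens. The tensor shapes and the averaging over heads are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of a lookback ratio: attention mass on the context vs. on
# previously generated tokens, averaged over heads. Shapes are illustrative.
import torch

def lookback_ratio(attn: torch.Tensor, context_len: int) -> torch.Tensor:
    """attn: (num_heads, num_generated, total_len) attention weights for generated tokens.
    Returns a per-token ratio in [0, 1]; higher means more attention on the context."""
    on_context = attn[:, :, :context_len].sum(dim=-1)     # mass on the input document
    on_generated = attn[:, :, context_len:].sum(dim=-1)   # mass on the model's own tokens
    ratio = on_context / (on_context + on_generated + 1e-9)
    return ratio.mean(dim=0)  # average over heads -> (num_generated,)

if __name__ == "__main__":
    heads, gen_len, ctx_len = 4, 6, 20
    attn = torch.rand(heads, gen_len, ctx_len + gen_len)
    attn = attn / attn.sum(dim=-1, keepdim=True)  # normalize like softmax outputs
    print(lookback_ratio(attn, ctx_len))
```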