Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, and CVPR. Email me at af_hiring@google.com
@CordeliaSchmid
Announcing the release of TensorFlow 3D, a set of training and evaluation pipelines for state-of-the-art 3D semantic segmentation, object detection and instance segmentation, with support for distributed training. Check it out and download the code at
Augmenting large language & visual models with retrieval helps the model answer questions that were not present in the training data. REVEAL is one of the recent works by our team
@acbuller, @ahmetius, @jesu9, @MrZiruiWang, David Ross, @CordeliaSchmid
Most previous work on 3D object detection uses only a single frame of data. In our #eccv2020 paper, we present a 3D sparse LSTM model that achieves more accurate results when applied to a sequence of point clouds.
Our recent work on object-centric neural rendering. Our new formulation makes it possible to move the objects around in the scene and still be able to render high quality images from different views.
We made NeRF compositional! By learning object-centric neural scattering functions (OSFs), we can now compose dynamic scenes from captured images of objects.
Website:
Joint work with @alirezafathi, @jiajunwu_cs, and Thomas Funkhouser
I am glad that our
#cvpr2020
reviews are very positive, but at the same time I am very worried that the quality of reviews has significantly degraded compared to a few years ago.
Congratulations to Yue Wang (research intern), Rui Huang (AI resident), Wanyue Zhang (AI resident) and
@_abhijit_kundu_ for getting their papers accepted to #eccv2020.
Here is our Google AI blog post on AVIS, a Large Language Model Agent that achieves state-of-the-art results on visual information seeking tasks.
@acbuller @ahmetius @jesu9 @CordeliaSchmid
Today on the blog, read all about AVIS — Autonomous Visual Information Seeking with Large Language Models — a novel method that iteratively employs a planner and reasoner to achieve state-of-the-art results on visual information seeking tasks →
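The iterative planner-and-reasoner loop described above can be sketched roughly as follows. This is a hypothetical toy illustration, not the actual AVIS code: the tool set, the planner, and the reasoner below are hard-coded stubs standing in for LLM calls and external APIs, and all names and outputs are invented.

```python
def planner(state):
    """Pick the next tool to invoke given the current state (stub for an LLM call)."""
    if "caption" not in state:
        return "caption_image"
    if "facts" not in state:
        return "search_knowledge"
    return "stop"

def reasoner(state):
    """Decide whether the gathered evidence answers the question (stub for an LLM call)."""
    if "facts" in state:
        return f"Answer derived from: {state['facts']}"
    return None

# Toy external tools; real ones would call vision models, search APIs, etc.
TOOLS = {
    "caption_image": lambda s: {**s, "caption": "a red double-decker bus"},
    "search_knowledge": lambda s: {**s, "facts": "double-decker buses are iconic in London"},
}

def avis_loop(question, max_steps=5):
    state = {"question": question}
    for _ in range(max_steps):
        action = planner(state)
        if action == "stop":
            break
        state = TOOLS[action](state)   # invoke the selected external tool
        answer = reasoner(state)       # check whether evidence now suffices
        if answer is not None:
            return answer
    return "unable to answer"
```

The key design point the blog post highlights is that planning happens dynamically at each step, conditioned on everything gathered so far, rather than following a fixed tool pipeline.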
Learn how REVEAL, an end-to-end retrieval-augmented visual-language model that learns to use multi-source multi-modal data to answer knowledge-intensive queries, achieves state-of-the-art results on visual question answering and image caption tasks.
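The retrieval step at the heart of such a model can be illustrated with a toy nearest-neighbor lookup. To be clear, this is a hedged sketch: REVEAL's actual multi-source memory, learned encoders, and fusion are far more involved, and the embeddings and memory entries below are made up for illustration.

```python
import math

# Toy multi-source memory: (embedding, passage) pairs. Vectors are invented.
MEMORY = [
    ([1.0, 0.0, 0.2], "Wikipedia passage about the Eiffel Tower"),
    ([0.1, 1.0, 0.0], "Image caption: a bowl of ramen"),
    ([0.9, 0.1, 0.3], "WikiData entry: Eiffel Tower, height 330 m"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k memory entries most similar to the query embedding."""
    scored = sorted(MEMORY, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    return [text for _, text in scored[:k]]

# A knowledge-intensive query embedding (made up) near the Eiffel Tower entries.
retrieved = retrieve([1.0, 0.05, 0.25])
# The generator would then condition on the query plus `retrieved` to answer.
```

The point is that answers to knowledge-intensive queries come from conditioning generation on retrieved evidence rather than from model parameters alone.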
Our ECCV paper, "Pillar-based Object Detection for Autonomous Driving," achieves state-of-the-art results on 3D object detection on the Waymo Open Dataset.
I am looking forward to Alireza Fathi presenting his research advancements at the Deep Learning 2.0 Virtual Summit, Jan 2021. Alireza is currently working on object detection and segmentation in 3D. Join us and Alireza in January:
#computervision
It’s hard to think of a better place than
#Vancouver
for
#CVPR
2023. Beyond our strong team, it’s fitting that a conference on vision should take place in one of the most beautiful spots on earth. Check out our awesome bid
#AINorth
#AI
#computervision
🚀Introducing AVIS: a groundbreaking system that couples #LLM-powered planning & reasoning with external tools, resulting in #StateOfTheArt performance on VQA datasets that demand external knowledge! 🧠🔍
AVIS: Autonomous Visual Information Seeking with Large Language Models
paper page:
In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically
One of the sad things during this pandemic is observing the ugly gap between the rich and the poor. At the same time that the rich stay home and order groceries online to avoid exposure, the poor shop for those groceries in stores and deliver them to make a living
I am sorry to see colleagues and friends getting affected by the mass layoffs in recent days. Please reach out and I will try my best to help with any resources I can think of. Hopefully things will bounce back soon.
"3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation"
#CVPR2020
We perform semantic instance segmentation by proposal aggregation, using a graph convolutional network to model higher-order proposal interactions!
Great results on ScanNet and S3DIS :)
@FrancisEngelman
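One step of message passing over proposals can be sketched as below. This is a hypothetical illustration of the general idea, not 3D-MPA's network: the real model has learned weights and multiple layers, while here the adjacency and proposal features are toy values and aggregation is a plain mean.

```python
def graph_conv_step(features, adjacency):
    """Replace each proposal's feature with the mean over itself and its neighbors."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i][j]] + [i]  # include self
        agg = [sum(features[j][d] for j in neighbors) / len(neighbors) for d in range(dim)]
        out.append(agg)
    return out

# Three instance proposals: 0 and 1 overlap (connected), 2 is isolated.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
smoothed = graph_conv_step(feats, adj)
```

Interacting proposals exchange information (rows 0 and 1 move toward each other), while the isolated proposal is unchanged.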
Happy to introduce GERALD - our new VLM that recognizes 6M+ entities, an exciting step towards Web-scale visual entity recognition!
Predictions are simply made by auto-regressively decoding a code representing the entity name.
Check out our CVPR24 paper:
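The "decode the entity name auto-regressively" idea can be sketched as a greedy decoding loop. This is a hedged stand-in: the lookup table below fakes the learned decoder's next-token scores, and the vocabulary and probabilities are invented purely to show the mechanics.

```python
# Stub next-token scores conditioned on the decoded prefix (stand-in for a
# learned decoder; entries and scores are made up).
NEXT_TOKEN = {
    (): {"Eiffel": 0.9, "Tokyo": 0.1},
    ("Eiffel",): {"Tower": 0.95, "<eos>": 0.05},
    ("Eiffel", "Tower"): {"<eos>": 0.99, "Bridge": 0.01},
}

def decode_entity(max_len=5):
    """Greedy auto-regressive decoding: append the best next token until <eos>."""
    prefix = ()
    for _ in range(max_len):
        scores = NEXT_TOKEN.get(prefix, {"<eos>": 1.0})
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        prefix = prefix + (token,)
    return " ".join(prefix)
```

Decoding the name token by token is what lets a single decoder cover millions of entities without a millions-wide classification head.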
Want to learn deep RL? My deep RL course now has a permanent course number (CS285) and is being offered this semester:
Lecture videos here (so far, we've gotten through most of model-free RL, model-based RL coming up next):
New
#CVPR2023
paper "Improving Image Recognition by Retrieving from Web-Scale Image-Text Data".
We improve the recognition capabilities of the model by retrieving images/texts from large-scale memory. Joint work with @alirezafathi and @CordeliaSchmid.
These short NeurIPS reviews could be done by LLMs! Probably we don't need reviewers anymore... an LLM would write the review and the AC would make the decision by looking at the review and the paper!
Google's software engineering best practices facilitate consistency & productivity. All code is peer reviewed for clarity, correctness, and adherence to standards. We've just published these practices. Highly recommended for any lab, academic or otherwise.
Happy 25th Birthday Google! 🎉
I have gotten incredible enjoyment from being along for the ride for 24+ of these years. When I joined, we were a handful of people wedged into a small office area in downtown Palo Alto above what is now a T-Mobile store.
1/
New blogpost! Transformers from scratch.
Modern transformers are super simple, so we can explain them in a really straightforward manner. Includes PyTorch code.
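The core computation such a from-scratch walkthrough builds up to is scaled dot-product self-attention. The blog post uses PyTorch; as a hedged, dependency-free illustration of just that core step (no learned Q/K/V projections, no multi-head, invented inputs), it boils down to:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """X: list of token vectors. Uses Q = K = V = X (no learned projections)."""
    d = len(X[0])
    out = []
    for q in X:
        # scaled dot-product scores of this query against every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # each output is a convex combination of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

Y = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Since the attention weights sum to 1, each output token is a weighted average of the inputs, leaning toward the tokens it is most similar to.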
@docmilanfar
That probably is right. But raising $90M in the current environment where most startups are having a hard time raising any money is a very strong signal
@negar_rz
@3scorciav
@CVPR
@ICCVConference
I was thinking the LLM would mostly do a summarization and comparison to previous work, not necessarily score the paper. This would make the AC's job much easier, but the AC would make the final decision by looking at both the summary and the paper itself.
Today, we're launching our Waymo Open Dataset. This high resolution lidar and camera data has been collected by our self-driving cars across a diverse range of situations. We're excited to share it directly with the research community. Download now:
Detecting deepfakes is one of the most important challenges ahead of us. Following our release of a synthetic audio dataset in Jan, we're releasing a large dataset of visual deepfakes to support researchers working on synthetic video detection
#GoogleAI
200 billion galaxies in the observable universe, and each galaxy has on average 100 million stars! Don't take your life so seriously, stressing out over things that do not even matter on a multi-galaxy level!
I have a system to plan writing papers for conference deadlines. My students and some collaborators know about it. With the ICLR 2020 deadline coming up, I thought this might be a good time to share this with a wider audience.
I feel so out of touch with the people around me and what they care about. I thought I would look at Google Trends to see what people are thinking about the political or economic situation, but I realized the main thing they care about at this moment is
#NFL
Rui Huang, Wanyue Zhang, Thomas Funkhouser, Abhijit Kundu, Caroline Pantofaru, David A. Ross, Alireza Fathi
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Apparently, based on statistics, Twitter is tilted towards young males with a college education, while Instagram is focused on female users in the 18 to 49 crowd, with a higher portion of people without a high school education. So it makes sense for Meta to go after this market. What