📢 Excited to unveil our latest research, ImageInWords (IIW)! 🚀We're pushing the boundaries of image descriptions with a new seeded, sequential, human-in-the-loop approach producing SOTA, articulate, hyper-detailed descriptions.
arXiv: 🧵1/12
I’m happy to share that I’m transitioning to
@GoogleDeepMind
and continuing work on multimodal, multilingual, cross-cultural content understanding for Gemini!
Excited about the journey, collaborations and progress we make...
#Google
#GoogleDeepmind
A weekend of
#100DaysOfCode
awaits!!! What have you planned? Working on a library to make training models, adding new models, and monitoring and versioning your experiments super easy using TensorFlow! Update to follow soon. Stay tuned.
#DeepLearning
#MachineLearning
#TensorFlow
We release a subset of IIW human- and model-annotated descriptions, as well as human SxS results on Human-Human and Model-Human sourced pairs of descriptions.
Available on:
GitHub:
Hugging Face:
🧵10/12
Dive into the details of our carefully designed seeded, sequential, human-in-the-loop annotation framework, which builds on existing VLM functionality and uses granular, object-level, human-authored descriptions to compose overall hyper-detailed image descriptions. 🧵3/12
@gowthami_s
my 2c would be to not base your metric on a review from 3 random (perhaps not expert themselves) people's understanding of your work.
if you believe in it and it has true merit, it will get rewarded, either directly or as a base for future success! all the best
after 3 years with
@GumGumAILabs
, it's time to move on. It's been a great experience and a huge learning curve; want to thank and wish
@GumGum
luck.
I will be joining the Google Core Search team in MTV starting next week. Really excited and pumped for it.
#GoogleSearch
#Google
We invite the community to:
Read the paper: Dive into our methodology, experiments, and results.
Share your feedback: We're eager to hear your thoughts and insights. Please reach out at iiw-dataset@google.com
#IIW
is an ongoing effort so look out for more!!🧵12/12
Who’s ready to take the 100 days of ML code challenge? That means coding machine learning for at least an hour every day for the next 100 days. Pledge with the
#100DaysOfMLCode
hashtag, I’ll give the first few winners a shoutout!
Focusing on applications, we first evaluate the value of the fine-tuned VLMs on a T2I use case by comparing a T2I model's image-reconstruction capability on inputs from the fine-tuned VLMs, and demonstrate that IIW outputs consistently come out on top. 🧵8/12
glad to see ImageInWords () still trends on
@huggingface
datasets () 3 weeks after release!
looking forward to how the community learns and benefits from it!
Stay tuned for future updates and releases.
🧵:
With the release of our IIW detailed annotation guidelines, our goal is to seek community feedback and iteratively make them more holistic, reduce human effort and dependency in the annotation process, and help shift the narrative from captions to descriptions.
🧵11/12
The IIW human-authored dataset achieves new best-of-breed SOTA 🏆 status compared to comparable prior works on a suite of automated readability-based metrics, further validated through in-depth human SxS analysis across multiple axes. 🧵6/12
The sequential annotation setup induces a human learning environment that yields higher annotation quality in less time. Compared to a non-sequential approach, we get an all-inclusive output verified and augmented by multiple humans, with +20% words in -30% time. 🧵4/12
IIW fine-tuned PaLI-5B achieves considerable gains over PaLI-5B fine-tuned in comparable work. Against considerably larger models or human annotations, model size shows as a limitation; we aim to explore this further with iterations on larger models. 🧵7/12
We evaluate with automated metrics and meticulously designed human SxS focused on richness, readability, comprehensiveness, specificity, hallucinations, and human-likeness, covering both IIW human-authored descriptions and downstream fine-tuned VLM outputs. 🧵5/12
@amuellerml
Averaging the word embeddings for the words on their Wikipedia page would be the simplest and a decent first step. Worked before for creating brand embeddings (embeddings for company names).
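A minimal sketch of the idea above: build an entity ("brand") embedding by averaging the embeddings of the words describing it. The tiny 3-d vectors here are toy stand-ins for real pretrained word embeddings (e.g. word2vec/GloVe); the words and values are illustrative assumptions.

```python
# Toy word-embedding lookup (hypothetical values standing in for
# pretrained vectors such as word2vec or GloVe).
word_vecs = {
    "search":  [0.9, 0.1, 0.0],
    "engine":  [0.8, 0.2, 0.1],
    "android": [0.2, 0.9, 0.1],
    "phone":   [0.1, 0.8, 0.2],
}

def entity_embedding(description_words, vecs):
    """Average the embeddings of known words; skip out-of-vocabulary words."""
    known = [vecs[w] for w in description_words if w in vecs]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Words taken from a hypothetical description page; "unknownword" is ignored.
brand = entity_embedding(["search", "engine", "unknownword"], word_vecs)
print(brand)  # average of the two known vectors, ~[0.85, 0.15, 0.05]
```

In practice the same averaging works over all words on the entity's Wikipedia page, optionally weighted (e.g. by TF-IDF) to downplay stopwords.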
useful read: Learning to Learn from Weak Supervision by Full Supervision
Interesting idea for everyone struggling with a lack of labeled data. Has anyone tried something similar successfully before? Day 5
#100DaysOfCode
#DeepLearning
#DataAugmentation
useful read: Document Similarity for Texts of Varying Lengths via Hidden Topics …
Interesting approach: reconstructing words in a doc & summary using hidden topic vectors, and comparing the approach to Doc2Vec & WMD.
#DeepLearning
#NLProc
#100DaysOfMLCode
We evaluate IIW model-generated descriptions for compositional reasoning. IIW is able to improve accuracy on vision-language compositional reasoning benchmarks ARO and Winoground by several points compared to prior work by generating finer grained descriptions. 🧵9/12
@fly51fly
thank you
@fly51fly
for sharing our work!!
We aim for the work to push the boundaries toward more holistic image descriptions. Feedback welcome.
for reference, full 🧵:
trying out the `.Dataset` functionality for text data. any pointers to a complete code snippet for a network with variable length inputs to be fed into a placeholder?
#AskTensorFlow
@TensorFlow
@mrry
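The usual answer to the question above is to pad each batch of variable-length sequences to the longest example in that batch (which is what `tf.data`'s `padded_batch` does). A framework-agnostic sketch of that idea, with illustrative token ids and pad value:

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length integer sequences to the longest one in the batch.

    Returns the padded batch plus the true lengths, which downstream
    models typically need for masking out the padding.
    """
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]  # true lengths, before padding
    return padded, lengths

# Three token-id sequences of different lengths (illustrative values).
batch, lengths = pad_batch([[4, 7], [1, 2, 3, 5], [9]])
print(batch)    # [[4, 7, 0, 0], [1, 2, 3, 5], [9, 0, 0, 0]]
print(lengths)  # [2, 4, 1]
```

Padding per batch (rather than to a global max length) keeps the wasted computation small; the lengths feed a mask so padded positions don't affect the loss.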
I never thought I would say this but
@SouthwestAir
you have disappointed beyond repair.. flight getting cancelled with the crew onboard, passengers queued up for boarding, and that too on
#Christmas
@SouthwestAir
are you expecting passengers to stand in lines for hours to rebook flights with no clarification or update on what to even expect (a hotel credit/voucher? A flight in 2 days?)? Online rebooking doesn't currently work on your website or app, and customer care is MIA!
@RealJosephus
maintaining quality across different image distributions is important; the average human struggles as well across domains they are less familiar with.
you bring up a completely valid point.
part of the purpose of the paper was to get this conversation started 👍
@RealJosephus
absolutely!
@RealJosephus
thank you for exploring and looking forward to how you use it in the future!
length sadly is a very fickle metric; one can use a lot of filler phrases like `this image shows`, `in this image we see`, etc., which add no value/richness but inflate the length.
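A toy sketch of the point above: raw word count rewards filler, so stripping boilerplate openers before counting gives a slightly less gameable length signal. The filler list and example sentences are illustrative assumptions, not from the paper.

```python
# Boilerplate openers that add no descriptive content (illustrative list).
FILLERS = ["this image shows", "in this image we see"]

def content_word_count(description):
    """Count words after removing known filler phrases."""
    text = description.lower()
    for phrase in FILLERS:
        text = text.replace(phrase, " ")
    return len(text.split())

a = "This image shows a cat."           # filler + 2 content words
b = "A tabby cat sleeps on a red mat."  # all content words
print(content_word_count(a), content_word_count(b))  # → 2 8
```

Both sentences would look comparable under raw length, yet the second carries far more descriptive content, which is exactly why length alone is a fickle metric.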
Customer care is down... the line to rebook is 4hr+... I stood in it for 2hr and left in disgust.. came back out of curiosity after 2hrs of rampant attempts to book another airline and still saw the folks who stood in front of me still in line. I have DMed you our flight details
@SouthwestAir
interesting start to the new year... first patent approved, filed back in 2017!
Title: Automated classification of network-accessible content based on events
Link:
Looking forward and hoping for a high productivity 2020.
#nlproc
#ML
@mikeyoung44
@aimodelsfyi
thank you
@mikeyoung44
! There are some inaccuracies that should be fixed to avoid confusion🙂
Sizes of the released datasets:
For coverage bias: the datasets are indeed collected from diverse domains/topics.
Use-cases: refer to Sections 4.5 & 4.6