![Ani Baddepudi Profile](https://pbs.twimg.com/profile_images/1790466886913949697/4uB789dM_x96.jpg)
Ani Baddepudi
@AniBaddepudi
Followers: 1K · Following: 5K · Statuses: 181
product, gemini vision & astra @googledeepmind
Joined May 2020
@Ansh_096 @OfficialLoganK Models on AI Studio and the API process PDFs as images; working on it for the gemini app
@_cartick @OfficialLoganK thanks for the feedback!! curious to look at the failure cases to fix, would you be able to dm (or post here) any that you have at hand?
RT @OfficialLoganK: Gemini 2.0 Flash is so good at document processing that you can pretty much drop in replace entire OCR workflows with a…
RT @llama_index: LlamaParse now supports Gemini 2.0 Flash 🔥 - by far the cheapest model for high-quality document processing. Get GPT-4o+ p…
we haven't 100% solved complex parsing (chunking, bounding boxes etc) yet but we're getting super close! specialized pdf parsing systems are still SOTA for most tasks (and in the future will hopefully build on top of gemini), but the gemini upside today is it's 50x cheaper with comparable quality and will get much stronger in the coming months
RT @TheXeophon: @JeremyNguyenPhD buuuuuuuut Gemini 2.0 Pro finally gets the problem I've been bugging @AniBaddepudi so much about - judging…
@_arohan_ yeah -- and even compared to the flash-8b baseline, small-model multimodal performance has progressed a ton since then
RT @kchonyc: gemini is pretty good now (Gemini Experimental 1206) and will help students (and me) a lot!
RT @trudypainter: 🧵 Photos → Creative Code using Gemini I built an experiment that turns photos into interactive @p5xjs sketches using Gem…
RT @NoamShazeer: We’ve boosted performance across challenging math, science, and multimodal reasoning benchmarks (AIME: 73.3%, GPQA: 74.2%,…
RT @m__dehghani: This was so much fun! Big thanks to @alexanderchen, @AniBaddepudi, and @riedelcastro for sharing some of these awesome dem…
RT @ai_for_success: Google has absolutely nailed this one! While I can’t share the results of my testing from Google Native Image output…
Yes! Gemini 1.5 & 2.0 are on par with, if not exceeding, other models (GPT-4o, Claude) on OCR and more complex transcription -- image to html, layout-preserving transcription etc -- for over 100x cheaper. See [link] for an example of a prod use case transcription eval where both flash & pro exceed GPT-4o performance (with far more efficient tokenization). Benchmarks are hard to trust though, so I'd recommend trying out your use case on [link]. Also happy to help out offline, DMing now!
Long context is still massively underrated. With gemini, you can parse 1K+ pages of govt documents and find relevant sections from these bullet points to learn more -- an intelligent ⌘F
The U.S. Government is using taxpayer dollars to fund DEI initiatives:
- HHS requested $113 million in 2024 for “training for diversity” in the health workforce.
- The Department of Agriculture requested $3 million in 2024 to “establish The Diversity, Equity, and Inclusion office”
- The Department of Labor requested $515,000 in 2024 to hire two full-time employees and provide for the necessary resources to “support diversity, equity, inclusion, and accessibility program and training initiatives”
- The State Department requested $73.6 million for DEI for 2025
Sources: