![Juan A. Rodríguez 💫 Profile](https://pbs.twimg.com/profile_images/1733538287950671872/PKF2fTiT_x96.jpg)
Juan A. Rodríguez 💫
@joanrod_ai
Followers: 244
Following: 407
Statuses: 221
PhD Student at @Mila_Quebec and @etsmtl and researching at @ServiceNowRSRCH in Montreal. Previously at UPF and UAB-CVC. Working on Multimodal Generative Models.
Montreal, Canada
Joined October 2022
Thanks @_akhaliq for sharing our work! We introduce StarVector💫 a Large Language and Vision Model for generating SVG code, a new alternative to image vectorization! w/ @shubhamag1992, @ILaradji, @prlz77, @dvazquezcv, @chrisjpal and @marcopeddy 🧵👇
StarVector: Generating Scalable Vector Graphics Code from Images

paper page:

Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simpler ones that require extensive processing and simplification.

This paper introduces StarVector, a multimodal SVG generation model that effectively integrates Code Generation Large Language Models (CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module. These visual tokens are prepended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images.

To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology.
7
30
91
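The StarVector abstract above describes a concrete pipeline: CLIP features pass through an adapter, and the resulting visual tokens are prepended to the SVG token embeddings before next-token prediction by StarCoder. A minimal NumPy sketch of that token-sequence assembly, with all dimensions, weights, and variable names hypothetical (the real model uses CLIP- and StarCoder-sized layers, not these toy sizes):

```python
import numpy as np

# Hypothetical toy dimensions; the actual model uses CLIP/StarCoder sizes.
d_vision, d_model, vocab = 512, 256, 100
n_visual, n_svg = 4, 6

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(n_visual, d_vision))  # CLIP patch features
W_adapter = rng.normal(size=(d_vision, d_model))     # adapter projection (stand-in)
E_svg = rng.normal(size=(vocab, d_model))            # SVG token embedding matrix
svg_ids = rng.integers(0, vocab, size=n_svg)         # tokenized SVG code

visual_tokens = image_feats @ W_adapter              # (n_visual, d_model)
svg_tokens = E_svg[svg_ids]                          # (n_svg, d_model)

# Visual tokens are prepended; the LLM then models the whole
# sequence with next-token prediction over the SVG code portion.
sequence = np.concatenate([visual_tokens, svg_tokens], axis=0)
print(sequence.shape)  # (n_visual + n_svg, d_model)
```

This only illustrates how the two modalities end up in one sequence; the adapter in the paper is a learned module, not a single random matrix.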
🚀 Excited to introduce AlignVLM! We propose a new way to fuse images and text in VLMs using what we call an Align connector. 🔗 The Align connector projects vision features into a probability distribution over tokens, reusing the text embedding matrix to obtain visual tokens. This leads to strong performance across benchmarks! Check out our deep dive 🧵 Huge congrats to the amazing multimodal team at @ServiceNowRSRCH and @Mila_Quebec
Happy to announce AlignVLM: a novel approach to bridging vision and language latent spaces for multimodal understanding in VLMs! 🌍📄🖼️ 🔗 Read the paper: 🧵👇 Thread
1
4
13
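The tweet above describes the Align connector's mechanism: vision features are projected to a probability distribution over the text vocabulary, and the text embedding matrix is reused so each visual token is a weighted mix of existing text embeddings. A minimal sketch of that idea, with toy dimensions and weight names that are assumptions, not the paper's actual parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical toy dimensions.
d_vision, d_model, vocab = 512, 256, 100
rng = np.random.default_rng(0)

E_text = rng.normal(size=(vocab, d_model))    # reused text embedding matrix
W_align = rng.normal(size=(d_vision, vocab))  # connector weights (stand-in)
vision_feats = rng.normal(size=(3, d_vision)) # features for 3 image patches

probs = softmax(vision_feats @ W_align)       # distribution over the vocabulary
visual_tokens = probs @ E_text                # convex combination of text embeddings
print(visual_tokens.shape)  # (3, d_model)
```

The design point, as described in the tweet: because each visual token is a convex combination of rows of the text embedding matrix, the projected vision features land inside the region of latent space the language model already understands.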
RT @iScienceLuvr: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding "In this work, we propose a novel visi…
0
45
0
RT @sivareddyg: We have been working on OpenAI Operator-like Web Agents since 2023. If you would like to make progress, WebLINX is one of t…
0
26
0
RT @MassCaccia: Ordering pizza is cute, but try filing an expense report in Concur :p Jokes aside, great UX, @OpenAI (1/2)
0
7
0
RT @RajeswarSai: We're happy to report that our paper "BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks"…
0
16
0
Excited to share that BigDocs has been accepted at @iclr_conf for #ICLR2025! Huge congratulations to our incredible team at @ServiceNowRSRCH and @Mila_Quebec. 🌟 See you in Singapore!
🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual reasoning ➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more! 📜 Paper: 🌍 Website
0
9
38
RT @pcastr: I wish these massive amounts of funds were allocated towards educating(training) kids around the world instead of training LLMs…
0
9
0
Strawberry breakthrough 🍓
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute force -- these capabilities are new territory and they demand serious scientific attention.
0
0
4
RT @maxime_gasse: How do LLMs deal with misinformation? The answer is: not very well, but a natural resilience seems to emerge with larger…
0
4
0
Too Big to Fool shows how larger LLMs resist misinformation by balancing internal world knowledge 🌍 with prompt input 📝, even against misleading cues. Proud to have collaborated with Mo on this exciting project. Huge congrats to him for his leadership! 👏 Curious? Dive in: 🔗 📄
Larger models are more resilient to misinformation, thanks to their world model! 🌍 Introducing "Too Big to Fool: Resisting Deception in Language Models". Our paper shows that larger LLMs are less affected by deception and hypothesizes about the nature of this capability. 1/🧵
1
1
6
RT @DimitrisPapail: I tried o1 pro mode (with best of N) on AIME 2024. It scored 93.3%. it got 14 out of 15 questions, on both I and II v…
0
24
0
RT @karpathy: Driving around SF. Omg this is crazy I can't believe there's billboards advertising cloud GPUs on the streets of SF, the hype…
0
184
0
We are presenting BigDocs today at the RBFM Workshop at #NeurIPS2024! 📍 West Meeting Room 217-219 ⏰ 9:50 AM - 10:45 AM & 2:45 PM - 3:30 PM Let’s chat about multimodal AI, Vision-Language Models, document understanding, code generation, or whatever excites you! 😊 Come say hi!
1
7
15
RT @DonaldShenaj: 🛸Excited to release 𝗟𝗼𝗥𝗔.𝗿𝗮𝗿, a groundbreaking method for personalized content and style image generation 🦕. 📜 Paper and…
0
3
0
RT @DBahdanau: The most impactful open contribution one came make these days is data. Following the success of the The Stack datasets, here…
0
7
0