Juan A. Rodríguez 💫

@joanrod_ai

Followers
244
Following
407
Statuses
221

PhD Student at @Mila_Quebec and @etsmtl and researching at @ServiceNowRSRCH in Montreal. Previously at UPF and UAB-CVC. Working on Multimodal Generative Models.

Montreal, Canada
Joined October 2022
@joanrod_ai
Juan A. Rodríguez 💫
1 year
Thanks @_akhaliq for sharing our work! We introduce StarVector💫 a Large Language and Vision Model for generating SVG code, a new alternative to image vectorization! w/ @shubhamag1992, @ILaradji, @prlz77, @dvazquezcv, @chrisjpal and @marcopeddy 🧵👇
@_akhaliq
AK
1 year
StarVector: Generating Scalable Vector Graphics Code from Images paper page: Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simpler ones that require extensive processing and simplification. This paper introduces StarVector, a multimodal SVG generation model that effectively integrates Code Generation Large Language Models (CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module. These visual tokens are pre-pended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images. To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology.
7
30
91
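The abstract quoted above describes image features passing through an adapter and being prepended to the SVG token embeddings. A minimal sketch of that sequence construction follows. It assumes the adapter is a single linear projection, uses random numbers in place of real CLIP features and StarCoder embeddings, and every name in it (`project`, `build_input_sequence`, the toy dimensions and token ids) is hypothetical rather than taken from the StarVector code.

```python
import random


def project(vec, matrix):
    """Multiply a row vector (length n) by an n x m matrix."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


def build_input_sequence(image_feats, svg_token_ids, w_adapter, embed_table):
    # Adapter (assumed linear): map each image feature to the model's embedding width.
    visual_tokens = [project(feat, w_adapter) for feat in image_feats]
    # Look up the embeddings of the SVG code tokens.
    svg_embeds = [embed_table[t] for t in svg_token_ids]
    # Prepend the visual tokens so the language model sees the image first.
    return visual_tokens + svg_embeds


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
D_IMG, D_MODEL, VOCAB = 5, 3, 10           # toy sizes, not the real ones

image_feats = rand_matrix(2, D_IMG)        # stand-in for two CLIP patch features
w_adapter = rand_matrix(D_IMG, D_MODEL)    # stand-in for the adapter weights
embed_table = rand_matrix(VOCAB, D_MODEL)  # stand-in for the code model's embeddings
svg_token_ids = [4, 7, 1]                  # hypothetical ids for an SVG snippet

sequence = build_input_sequence(image_feats, svg_token_ids, w_adapter, embed_table)
print(len(sequence), len(sequence[0]))     # 5 3
```

The resulting sequence is what a next-token-prediction model would consume; training details such as which positions contribute to the loss are not stated in the abstract and are left out here.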
@joanrod_ai
Juan A. Rodríguez 💫
9 days
🚀 Excited to introduce AlignVLM! We propose a new way to fuse images and text in VLMs using what we call an Align connector. 🔗 The Align connector projects vision features into a probability distribution over tokens, reusing the text embedding matrix to obtain visual tokens. This leads to strong performance across benchmarks! Check out our deep dive 🧵 Huge congrats to the amazing multimodal team at @ServiceNowRSRCH and @Mila_Quebec
@Ahmed_Masry97
Ahmed Masry
9 days
Happy to announce AlignVLM: a novel approach to bridging vision and language latent spaces for multimodal understanding in VLMs! 🌍📄🖼️ 🔗 Read the paper: 🧵👇 Thread
1
4
13
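The Align connector described in the tweet above can be sketched roughly as follows: each vision feature is projected to logits over the vocabulary, a softmax turns those into a probability distribution, and the visual token is the probability-weighted average of the text embedding rows. This is an illustration under assumptions, not the AlignVLM implementation: the single linear projection, the toy dimensions, and all names (`align_connector`, `w_proj`, and so on) are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, matrix):
    """Multiply a row vector (length n) by an n x m matrix."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


def align_connector(vision_feats, w_proj, text_embeddings):
    visual_tokens, all_probs = [], []
    for feat in vision_feats:
        # Distribution over the text vocabulary for this vision feature.
        probs = softmax(matvec(feat, w_proj))
        # Visual token = weighted average of the text embedding rows.
        visual_tokens.append(matvec(probs, text_embeddings))
        all_probs.append(probs)
    return visual_tokens, all_probs


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
N_PATCHES, D_VIS, VOCAB, D_MODEL = 4, 8, 16, 6   # toy sizes

vision_feats = rand_matrix(N_PATCHES, D_VIS)     # stand-in for vision encoder output
w_proj = rand_matrix(D_VIS, VOCAB)               # assumed linear projection to vocab logits
text_embeddings = rand_matrix(VOCAB, D_MODEL)    # stand-in for the LLM's embedding matrix

visual_tokens, probs = align_connector(vision_feats, w_proj, text_embeddings)
print(len(visual_tokens), len(visual_tokens[0]))  # 4 6
```

Because each visual token is a convex combination of text embeddings, it stays inside the region spanned by the text embeddings, which is one way to read the "reusing the text embedding matrix" phrasing in the tweet.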
@joanrod_ai
Juan A. Rodríguez 💫
9 days
RT @iScienceLuvr: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding "In this work, we propose a novel visi…
0
45
0
@joanrod_ai
Juan A. Rodríguez 💫
13 days
RT @ICCVConference: Check out the changes for #ICCV2025 🌶️
0
27
0
@joanrod_ai
Juan A. Rodríguez 💫
15 days
RT @sivareddyg: We have been working on OpenAI Operator-like Web Agents since 2023. If you would like to make progress, WebLINX is one of t…
0
26
0
@joanrod_ai
Juan A. Rodríguez 💫
20 days
RT @MassCaccia: Ordering pizza is cute, but try filing an expense report in Concur :p Jokes aside, great UX, @OpenAI (1/2)
0
7
0
@joanrod_ai
Juan A. Rodríguez 💫
21 days
RT @RajeswarSai: We're happy to report that our paper "BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks"…
0
16
0
@joanrod_ai
Juan A. Rodríguez 💫
22 days
Excited to share that BigDocs has been accepted at @iclr_conf for #ICLR2025! Huge congratulations to our incredible team at @ServiceNowRSRCH and @Mila_Quebec. 🌟 See you in Singapore!
@joanrod_ai
Juan A. Rodríguez 💫
2 months
🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨‍💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual reasoning ➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more! 📜 Paper: 🌍 Website
0
9
38
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @pcastr: I wish these massive amounts of funds were allocated towards educating(training) kids around the world instead of training LLMs…
0
9
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
Strawberry breakthrough 🍓
@fchollet
François Chollet
2 months
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute -- these capabilities are new territory and they demand serious scientific attention.
0
0
4
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @maxime_gasse: How do LLMs deal with misinformation? The answer is: not very well, but a natural resilience seems to emerge with larger…
0
4
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
Too Big to Fool shows how larger LLMs resist misinformation by balancing internal world knowledge 🌍 with prompt input 📝, even against misleading cues. Proud to have collaborated with Mo on this exciting project. Huge congrats to him for his leadership! 👏 Curious? Dive in: 🔗 📄
@M_R_Samsami
Mo Samsami
2 months
Larger models are more resilient to misinformation, thanks to their world model! 🌍 Introducing "Too Big to Fool: Resisting Deception in Language Models". Our paper shows that larger LLMs are less affected by deception and hypothesizes about the nature of this capability. 1/🧵
1
1
6
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @DimitrisPapail: I tried o1 pro mode (with best of N) on AIME 2024. It scored 93.3%. it got 14 out of 15 questions, on both I and II v…
0
24
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @karpathy: Driving around SF. Omg this is crazy I can't believe there's billboards advertising cloud GPUs on the streets of SF, the hype…
0
184
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
@mhrnz_m 💫💫
0
0
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
We are presenting BigDocs today at the RBFM Workshop at #NeurIPS2024! 📍 West Meeting Room 217-219 ⏰ 9:50 AM - 10:45 AM & 2:45 PM - 3:30 PM Let’s chat about multimodal AI, Vision-Language Models, document understanding, code generation, or whatever excites you! 😊 Come say hi!
@joanrod_ai
Juan A. Rodríguez 💫
2 months
🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨‍💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual reasoning ➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more! 📜 Paper: 🌍 Website
1
7
15
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @DonaldShenaj: 🛸Excited to release 𝗟𝗼𝗥𝗔.𝗿𝗮𝗿, a groundbreaking method for personalized content and style image generation 🦕. 📜 Paper and…
0
3
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
@sibasmarak @ServiceNowRSRCH You really helped on making those H100s go brrrrrr 🔥 🔥
0
0
4
@joanrod_ai
Juan A. Rodríguez 💫
2 months
RT @DBahdanau: The most impactful open contribution one can make these days is data. Following the success of the The Stack datasets, here…
0
7
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
@alex_lacoste_ Let's make it happen!
0
0
0
@joanrod_ai
Juan A. Rodríguez 💫
2 months
Also, we are currently at NeurIPS in Vancouver! We will be presenting this work in the RBFM workshop on Saturday. Come say hi, and let’s spark some collaborations! 🚀
0
0
6