![Juan A. Rodríguez 💫 Profile](https://pbs.twimg.com/profile_images/1733538287950671872/PKF2fTiT_x96.jpg)
Juan A. Rodríguez 💫
@joanrod_ai
Followers: 244
Following: 407
Statuses: 221
PhD Student at @Mila_Quebec and @etsmtl and researching at @ServiceNowRSRCH in Montreal. Previously at UPF and UAB-CVC. Working on Multimodal Generative Models.
Montreal, Canada
Joined October 2022
Thanks @_akhaliq for sharing our work! We introduce StarVector💫 a Large Language and Vision Model for generating SVG code, a new alternative to image vectorization! w/ @shubhamag1992, @ILaradji, @prlz77, @dvazquezcv, @chrisjpal and @marcopeddy 🧵👇
StarVector: Generating Scalable Vector Graphics Code from Images

paper page:

Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simpler ones that require extensive processing and simplification.

This paper introduces StarVector, a multimodal SVG generation model that effectively integrates Code Generation Large Language Models (CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module. These visual tokens are prepended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images.

To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology.
7
30
91
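The StarVector abstract above describes a concrete pipeline: CLIP features pass through an adapter, and the resulting visual tokens are prepended to the SVG token embeddings before next-token prediction by StarCoder. A minimal NumPy sketch of that token-sequence assembly, with all dimensions, weights, and variable names hypothetical (the real model uses CLIP- and StarCoder-sized layers, not these toy sizes):

```python
import numpy as np

# Hypothetical toy dimensions; the actual model uses CLIP/StarCoder sizes.
d_vision, d_model, vocab = 512, 256, 100
n_visual, n_svg = 4, 6

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(n_visual, d_vision))  # CLIP patch features
W_adapter = rng.normal(size=(d_vision, d_model))     # adapter projection (stand-in)
E_svg = rng.normal(size=(vocab, d_model))            # SVG token embedding matrix
svg_ids = rng.integers(0, vocab, size=n_svg)         # tokenized SVG code

visual_tokens = image_feats @ W_adapter              # (n_visual, d_model)
svg_tokens = E_svg[svg_ids]                          # (n_svg, d_model)

# Visual tokens are prepended; the LLM then models the whole
# sequence with next-token prediction over the SVG code portion.
sequence = np.concatenate([visual_tokens, svg_tokens], axis=0)
print(sequence.shape)  # (n_visual + n_svg, d_model)
```

This only illustrates how the two modalities end up in one sequence; the adapter in the paper is a learned module, not a single random matrix.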
🚀 Excited to introduce AlignVLM! We propose a new way to fuse images and text in VLMs using what we call an Align connector. 🔗 The Align connector projects vision features into a probability distribution over tokens, reusing the text embedding matrix to obtain visual tokens. This leads to strong performance across benchmarks! Check out our deep dive 🧵 Huge congrats to the amazing multimodal team at @ServiceNowRSRCH and @Mila_Quebec
Happy to announce AlignVLM: a novel approach to bridging vision and language latent spaces for multimodal understanding in VLMs! 🌍📄🖼️ 🔗 Read the paper: 🧵👇 Thread
1
4
13
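The tweet above describes the Align connector's mechanism: vision features are projected to a probability distribution over the text vocabulary, and the text embedding matrix is reused so each visual token is a weighted mix of existing text embeddings. A minimal sketch of that idea, with toy dimensions and weight names that are assumptions, not the paper's actual parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical toy dimensions.
d_vision, d_model, vocab = 512, 256, 100
rng = np.random.default_rng(0)

E_text = rng.normal(size=(vocab, d_model))    # reused text embedding matrix
W_align = rng.normal(size=(d_vision, vocab))  # connector weights (stand-in)
vision_feats = rng.normal(size=(3, d_vision)) # features for 3 image patches

probs = softmax(vision_feats @ W_align)       # distribution over the vocabulary
visual_tokens = probs @ E_text                # convex combination of text embeddings
print(visual_tokens.shape)  # (3, d_model)
```

The design point, as described in the tweet: because each visual token is a convex combination of rows of the text embedding matrix, the projected vision features land inside the region of latent space the language model already understands.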
RT @iScienceLuvr: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding "In this work, we propose a novel visi…
0
45
0
RT @sivareddyg: We have been working on OpenAI Operator-like Web Agents since 2023. If you would like to make progress, WebLINX is one of t…
0
26
0
RT @MassCaccia: Ordering pizza is cute, but try filing an expense report in Concur :p Jokes aside, great UX, @OpenAI (1/2)
0
7
0
RT @RajeswarSai: We're happy to report that our paper "BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks"…
0
16
0
Excited to share that BigDocs has been accepted at @iclr_conf for #ICLR2025! Huge congratulations to our incredible team at @ServiceNowRSRCH and @Mila_Quebec. 🌟 See you in Singapore!
🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual reasoning ➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more! 📜 Paper: 🌍 Website
0
9
38
RT @pcastr: I wish these massive amounts of funds were allocated towards educating(training) kids around the world instead of training LLMs…
0
9
0
Strawberry breakthrough 🍓
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute force -- these capabilities are new territory and they demand serious scientific attention.
0
0
4
RT @maxime_gasse: How do LLMs deal with misinformation? The answer is: not very well, but a natural resilience seems to emerge with larger…
0
4
0
Too Big to Fool shows how larger LLMs resist misinformation by balancing internal world knowledge 🌍 with prompt input 📝, even against misleading cues. Proud to have collaborated with Mo on this exciting project. Huge congrats to him for his leadership! 👏 Curious? Dive in: 🔗 📄
Larger models are more resilient to misinformation, thanks to their world model! 🌍 Introducing "Too Big to Fool: Resisting Deception in Language Models". Our paper shows that larger LLMs are less affected by deception and hypothesizes about the nature of this capability. 1/🧵
1
1
6
RT @DimitrisPapail: I tried o1 pro mode (with best of N) on AIME 2024. It scored 93.3%. it got 14 out of 15 questions, on both I and II v…
0
24
0
RT @karpathy: Driving around SF. Omg this is crazy I can't believe there's billboards advertising cloud GPUs on the streets of SF, the hype…
0
184
0
We are presenting BigDocs today at the RBFM Workshop at #NeurIPS2024! 📍 West Meeting Room 217-219 ⏰ 9:50 AM - 10:45 AM & 2:45 PM - 3:30 PM Let’s chat about multimodal AI, Vision-Language Models, document understanding, code generation, or whatever excites you! 😊 Come say hi!
1
7
15
RT @DonaldShenaj: 🛸Excited to release 𝗟𝗼𝗥𝗔.𝗿𝗮𝗿, a groundbreaking method for personalized content and style image generation 🦕. 📜 Paper and…
0
3
0
RT @DBahdanau: The most impactful open contribution one came make these days is data. Following the success of the The Stack datasets, here…
0
7
0