Mu Cai @ Industry Job Market

@MuCai7

578 Followers · 515 Following · 157 Statuses

Ph.D. student @WisconsinCS working on multimodal large language models. Graduating around May 2025; looking for a Research Scientist position in multimodal models.

Madison, WI
Joined May 2019
Mu Cai @ Industry Job Market (@MuCai7) · 2 months
🚨 I’ll be at #NeurIPS2024! 🚨 On the industry job market this year and eager to connect in person!
🔍 My research explores multimodal learning, with a focus on object-level understanding and video understanding.
📜 3 papers at NeurIPS 2024:

Workshop on Video-Language Models 📅 Sat, Dec 14 | 10:20 a.m. MST
1️⃣ TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models (Oral) 🔗
2️⃣ Matryoshka Multimodal Models 🔗

Main Conference Poster Session 📅 Thu, Dec 12 | 12:00–3:00 p.m. MST 📍 East Exhibit Hall A-C, #3706
3️⃣ Yo’LLaVA: Your Personalized Language and Vision Assistant 🔗

Check out my work: 🌐 My webpage
Let’s chat if you’re around! 🚀
5 replies · 22 retweets · 138 likes

Mu Cai @ Industry Job Market (@MuCai7) · 7 days
Want the simplest way to apply a multimodal model (LLaVA) to robotics tasks? Check out LLaRA (accepted to #ICLR2025), which gives you a vision-language-action (VLA) policy! Joint work with @XiangLi54505720, @ryoo_michael, et al. from Stony Brook U, and @yong_jae_lee
Xiang Li (@XiangLi54505720) · 8 days
(1/5) Excited to present our #ICLR2025 paper, LLaRA, at NYC CV Day! LLaRA efficiently transforms a pretrained Vision-Language Model (VLM) into a robot Vision-Language-Action (VLA) policy, even with a limited amount of training data. More details are in the thread. ⬇️
0 replies · 11 retweets · 58 likes
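A rough sketch of the VLM-to-VLA recipe described above: robot actions are serialized as text so that a LLaVA-style model can be instruction-tuned on demonstration data, and its text output can be decoded back into an action. The prompt template, normalized-coordinate convention, and helper names (format_vla_sample, parse_action) are illustrative assumptions, not LLaRA's actual code.

```python
import re

def format_vla_sample(instruction: str, action_xy: tuple[float, float]) -> dict:
    """Format one robot demonstration step as a LLaVA-style
    instruction-tuning sample, serializing the action as text.
    Template and coordinate convention are illustrative only."""
    prompt = (
        f"<image>\nThe task is: {instruction}\n"
        "Where should the gripper move next? "
        "Answer with normalized image coordinates (x, y)."
    )
    answer = f"({action_xy[0]:.3f}, {action_xy[1]:.3f})"
    return {"conversations": [{"from": "human", "value": prompt},
                              {"from": "gpt", "value": answer}]}

def parse_action(model_output: str) -> tuple[float, float] | None:
    """Decode the model's text output back into a 2D action, if it parses."""
    m = re.search(r"\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)", model_output)
    return (float(m.group(1)), float(m.group(2))) if m else None

sample = format_vla_sample("pick up the red block", (0.42, 0.67))
print(sample["conversations"][1]["value"])  # (0.420, 0.670)
print(parse_action("(0.420, 0.670)"))       # (0.42, 0.67)
```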
Mu Cai @ Industry Job Market (@MuCai7) · 20 days
Two papers accepted to @iclr_conf #ICLR2025:
(1) Efficient multimodal LLMs: Matryoshka Multimodal Models
(2) Multimodal models for robotics: LLaRA: Supercharging Robot Learning Data for Vision-Language Model Policy
I’m graduating this spring and actively seeking an industry research scientist position focused on multimodal models. Please feel free to connect with me if you think my background aligns with your team’s needs. Here’s my homepage for more details:
Mu Cai @ Industry Job Market (@MuCai7) · 9 months
Thanks to @_akhaliq for sharing! (1/N) We propose M3: Matryoshka Multimodal Models, which (1) significantly reduces the number of visual tokens while maintaining performance on par with the vanilla LMM, and (2) organizes visual tokens in a coarse-to-fine nested way.
2 replies · 11 retweets · 73 likes
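A minimal sketch of the coarse-to-fine nested-token idea, assuming repeated 2×2 average pooling over a square grid of vision-encoder patch tokens; matryoshka_pool, the scale count, and the shapes below are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def matryoshka_pool(visual_tokens: torch.Tensor, num_scales: int = 4):
    """Build nested coarse-to-fine sets of visual tokens by repeated
    2x2 average pooling, in the spirit of Matryoshka Multimodal Models.

    visual_tokens: (B, N, D) patch tokens assumed to form a square grid
    (e.g. N = 24*24 = 576 for CLIP ViT-L/14 at 336px).
    Returns a list of token sets from finest to coarsest.
    """
    B, N, D = visual_tokens.shape
    side = int(N ** 0.5)
    assert side * side == N, "expects a square token grid"
    grid = visual_tokens.transpose(1, 2).reshape(B, D, side, side)
    scales = [visual_tokens]  # finest scale keeps all N tokens
    for _ in range(num_scales - 1):
        grid = F.avg_pool2d(grid, kernel_size=2)  # halve each spatial side
        scales.append(grid.flatten(2).transpose(1, 2))
    return scales

tokens = torch.randn(1, 576, 1024)
for s in matryoshka_pool(tokens):
    print(s.shape)  # (1, 576, 1024), (1, 144, 1024), (1, 36, 1024), (1, 9, 1024)
```

At training time one scale can be sampled per batch; at inference the user picks the granularity that fits the token budget, which is the tradeoff the tweet describes.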
Mu Cai @ Industry Job Market (@MuCai7) · 1 month
@cheryyun_l Congratulations! Really interesting work on applying visual prompts to VLA tasks!
1 reply · 0 retweets · 3 likes

Mu Cai @ Industry Job Market (@MuCai7) · 1 month
Thank you for organizing this event! I’m excited to give my talk on Friday. I’m graduating this spring and actively seeking an industry research scientist position focused on multimodal models. Please feel free to connect with me if you think my background aligns with your team’s needs. Here’s my homepage for more details:
Twelve Labs (twelvelabs.io) (@twelve_labs) · 1 month
~ New Webinar ~ In the 67th session of #MultimodalWeekly, we have three exciting presentations on multimodal benchmarks, video prediction, and multimodal video models.
0 replies · 0 retweets · 7 likes

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@ccccrs_0908 Congratulations Ruisi!
0 replies · 0 retweets · 2 likes

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
RT @xyz2maureen: 🔥Poster: Fri 13 Dec 4:30 pm - 7:30 pm PST (West) It is the first time for me try to sell a new concept that I believe but…
0 replies · 14 retweets · 0 likes

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@ren_hongyu @OpenAI Hi Hongyu, let’s chat when you’re free!
0 replies · 0 retweets · 0 likes

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@HuaizuJiang Hi Huaizu, let’s chat!
0 replies · 0 retweets · 0 likes

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@sherryx90099597 Because the training data accumulated across history is biased 😂
1 reply · 0 retweets · 1 like

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@twelve_labs Thank you! See you on Saturday!
0 replies · 0 retweets · 1 like

Mu Cai @ Industry Job Market (@MuCai7) · 2 months
@simon_zhai I vote for PowerPoint!
1 reply · 0 retweets · 2 likes

Mu Cai @ Industry Job Market (@MuCai7) · 3 months
I am not at #EMNLP2024, but @bochengzou is in Florida! Go check out vector graphics, a promising visual-representation format that is completely different from pixels. Thanks to LLMs, vector graphics are more powerful now! Go chat with @bochengzou if you are interested!
Mu Cai @ Industry Job Market (@MuCai7) · 5 months
VGBench is accepted to the EMNLP main conference! Congratulations to the team @bochengzou @HyperStorm9682 @yong_jae_lee. The first comprehensive benchmark for evaluating large language models on vector graphics understanding and generation!
0 replies · 1 retweet · 9 likes

Mu Cai @ Industry Job Market (@MuCai7) · 3 months
@HaoningTimothy @ChunyuanLi @zhang_yuanhan Hi @HaoningTimothy, we actually have results on more frames. See the table here:
1 reply · 0 retweets · 1 like

Mu Cai @ Industry Job Market (@MuCai7) · 3 months
RT @jd92wang: Personal update: After 5.5 yrs at @MSFTResearch , I will join @williamandmary in 2025 to be an assistant professor. Welcome t…
0 replies · 23 retweets · 0 likes

Mu Cai @ Industry Job Market (@MuCai7) · 4 months
RT @zhang_yuanhan: Fine-grained temporal understanding is fundamental for any video understanding model. Excited to see LLaVA-Video showing…
0 replies · 13 retweets · 0 likes

Mu Cai @ Industry Job Market (@MuCai7) · 4 months
Great work on applying the multi-granularity idea to image generation/manipulation! This shares the same visual-encoding design as our earlier work, Matryoshka Multimodal Models, where pooling is used to control visual granularity, leading to a multi-visual-granularity LLaVA.
Xihui Liu (@XihuiLiu) · 4 months
Introducing PUMA: a new MLLM for unified vision-language understanding and visual content generation at various granularities, from diverse text-to-image generation to precise image manipulation.
0 replies · 0 retweets · 17 likes