Jitesh Jain
@praeclarumjj
Followers
234
Following
2K
Statuses
211
CS PhD Student @ICatGT | Prev. Intern @MSFTResearch @PicsartAI | CSE'23 @iitroorkee 📖 Frequently Reading, 📝 Occasionally Writing
Joined December 2014
💠How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance correlates strongly with “visual” representation quality in the LLM. 🤔 So, why not optimize these representations directly? 🚀 You guessed it—hola OLA-VLM!
Introducing OLA-VLM: a new paradigm for distilling vision knowledge into the hidden representations of LLMs, enhancing visual perception in multimodal systems. Learn more: GT x Microsoft collab by @praeclarumjj @zhengyuan_yang @JianfengGao0217 @jw2yang4ai
2
10
22
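The tweets above describe the core idea at a high level: instead of supervising only the text output, add an auxiliary objective that pulls intermediate LLM hidden states toward the embeddings of frozen vision experts (e.g., depth or segmentation encoders). Below is a minimal, hypothetical sketch of that kind of auxiliary embedding loss; the module names, projection head, pooling, and loss form are illustrative assumptions, not the actual OLA-VLM implementation.

# Hypothetical sketch (not the OLA-VLM code): align an intermediate LLM hidden
# state with features from a frozen vision expert via an auxiliary loss that is
# added to the usual next-token objective. Shapes and names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxVisualProbe(nn.Module):
    """Projects LLM hidden states into the target visual-embedding space."""
    def __init__(self, llm_dim: int, vis_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(llm_dim, vis_dim),
            nn.GELU(),
            nn.Linear(vis_dim, vis_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, llm_dim) taken from a chosen LLM layer
        return self.proj(hidden_states)

def aux_embedding_loss(pred_vis: torch.Tensor, target_vis: torch.Tensor) -> torch.Tensor:
    """Cosine-style alignment between predicted and frozen expert embeddings."""
    # Pool over the sequence/spatial axis before comparing (one of many choices).
    pred = F.normalize(pred_vis.mean(dim=1), dim=-1)      # (batch, vis_dim)
    target = F.normalize(target_vis.mean(dim=1), dim=-1)  # (batch, vis_dim)
    return (1.0 - (pred * target).sum(dim=-1)).mean()

# Toy usage with random tensors standing in for real activations.
batch, seq_len, llm_dim, vis_dim = 2, 16, 4096, 1024
hidden = torch.randn(batch, seq_len, llm_dim)   # intermediate LLM layer output
expert = torch.randn(batch, 64, vis_dim)        # frozen depth/seg encoder features

probe = AuxVisualProbe(llm_dim, vis_dim)
loss_aux = aux_embedding_loss(probe(hidden), expert)
# total_loss = loss_next_token + lambda_aux * loss_aux   # combined during training
print(loss_aux.item())

During training, the auxiliary term would be weighted and summed with the standard language-modeling loss, as indicated in the final comment.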
RT @yash2kant: 🚀 Introducing Pippo – our diffusion transformer pre-trained on 3B Human Images and post-trained with 400M high-res studio im…
0
33
0
@kchonyc Great and relatable blog! I have been thinking about writing something similar for a long time, based on my conversations with fellow students too. Thanks for the motivation hehe
0
0
3
This is a great blog outlining the increased anxiety in PhD students, experienced not only by senior but also by junior students, I guess. "this [incremental and stable improvements] is precisely the opposite of what [creative and innovative] PhD programs are designed to train them for."
feeling a bit under the weather this week … thus an increased level of activity on social media and blog:
0
0
1
RT @praeclarumjj: Exciting direction! In our OLA-VLM, we explored a similar idea, optimizing LLM features via auxiliary visual embedding…
0
3
0
Complete OLA-VLM explainer thread: This is also quite relevant to previous works like REPA and I-JEPA from @sainingxie @ylecun. Great to see that cross-modal training remains a promising direction! (also seen in works like SEED-LLaMA, Emu, DreamLLM, etc.)
💠How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance correlates strongly with “visual” representation quality in the LLM. 🤔 So, why not optimize these representations directly? 🚀 You guessed it—hola OLA-VLM!
0
0
0
RT @srush_nlp: Rare sincere tweet: December can be tough in academia. As a student I thought everyone had it together. As an advisor you s…
0
44
0
RT @thaoshibe: It costs $89-$199 for a poster printing. Estimated $260,000-$597,000 for ~3k posters (main conference). $0.5M dollars go to t…
0
94
0
RT @jw2yang4ai: 🔥Check out our OLA-VLM! We took the first step to ask the VLMs not only decode the text tokens but also the visual tokens…
0
4
0
RT @jiachenl6: Check out the CuMo poster at the East Exhibit Hall A-C #3400 on Friday afternoon if you're into multimodal LLM! #NeurIPS2024…
0
1
0
RT @fionakryan: Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE ac…
0
486
0
RT @praeclarumjj: 💠How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance co…
0
10
0
RT @humphrey_shi: Introducing OLA-VLM: a new paradigm for distilling vision knowledge into the hidden representations of LLMs, enhancing vis…
0
23
0
@zhengyuan_yang @humphrey_shi @JianfengGao0217 @jw2yang4ai 🙏Lastly, I sincerely thank the GCR team at Microsoft for their support in helping me navigate the infrastructure challenges during my internship at MSR.
0
0
0