praeclarumjj Profile Banner
Jitesh Jain Profile
Jitesh Jain

@praeclarumjj

Followers
234
Following
2K
Statuses
211

CS PhD Student @ICatGT | Prev. Intern @MSFTResearch @PicsartAI | CSE'23 @iitroorkee 📖 Frequently Reading, 📝 Occasionally Writing

Joined December 2014
@praeclarumjj
Jitesh Jain
2 months
💭 How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance correlates strongly with “visual” representation quality in the LLM. 🤔 So, why not optimize these representations directly? 🚀 You guessed it—hola OLA-VLM!
@humphrey_shi
Humphrey Shi
2 months
Introducing OLA-VLM: a new paradigm for distilling vision knowledge into the hidden representations of LLMs, enhancing visual perception in multimodal systems. Learn more: GT x Microsoft collab by @praeclarumjj @zhengyuan_yang @JianfengGao0217 @jw2yang4ai
Tweet media one
2
10
22
@praeclarumjj
Jitesh Jain
2 days
RT @yash2kant: 🚀 Introducing Pippo – our diffusion transformer pre-trained on 3B Human Images and post-trained with 400M high-res studio im…
0
33
0
@praeclarumjj
Jitesh Jain
2 months
Had been meaning to write this for a while. What's a good metric for PhD students?
Tweet media one
1
1
5
@praeclarumjj
Jitesh Jain
2 months
@ssnl_tz What are your views on the 3rd mindset or promising works in that direction?
0
0
0
@praeclarumjj
Jitesh Jain
2 months
@kchonyc Great and relatable blog! I have been thinking about writing something similar for a long time, based on my conversations with my fellow students too. Thanks for the motivation, hehe
0
0
3
@praeclarumjj
Jitesh Jain
2 months
This is a great blog outlining the increased anxiety among PhD students, experienced not only by senior but also by junior students, I guess. "this [incremental and stable improvements] is precisely the opposite of what [creative and innovative] PhD programs are designed to train them for."
@kchonyc
Kyunghyun Cho
2 months
feeling a bit under the weather this week … thus an increased level of activity on social media and blog:
0
0
1
@praeclarumjj
Jitesh Jain
2 months
RT @praeclarumjj: Exciting direction! In our OLA-VLM, we explored a similar idea, optimizing LLM features via auxiliary visual embedding…
0
3
0
@praeclarumjj
Jitesh Jain
2 months
Complete OLA-VLM explainer thread: This is also quite relevant to previous works like REPA and I-JEPA from @sainingxie @ylecun. Great to see that cross-modal training remains a promising direction! (also seen in works like SEED-LLaMA, Emu, DreamLLM, etc.)
@praeclarumjj
Jitesh Jain
2 months
💭 How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance correlates strongly with “visual” representation quality in the LLM. 🤔 So, why not optimize these representations directly? 🚀 You guessed it—hola OLA-VLM!
0
0
0
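The thread above describes the core OLA-VLM idea at a high level: instead of only supervising the text output, also pull the LLM's intermediate "visual" representations toward embeddings from a frozen vision expert. A minimal NumPy sketch of that general recipe follows; all dimensions, names (`embed_loss`, `W`, `lam`), and values are illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: LLM hidden dim, expert embedding dim, visual-token count
d_llm, d_vis, n_tokens = 64, 32, 8

# Intermediate LLM hidden states at the visual token positions (stand-ins)
hidden = rng.normal(size=(n_tokens, d_llm))
# Target embeddings from a frozen vision expert (e.g. a depth or seg encoder)
target = rng.normal(size=(n_tokens, d_vis))
# Small learnable projector mapping LLM space into the expert's space
W = rng.normal(size=(d_llm, d_vis)) * 0.1

def embed_loss(hidden, target, W):
    """Auxiliary embedding loss: average (1 - cosine similarity) between
    projected hidden states and the expert's target embeddings."""
    pred = hidden @ W
    cos = np.sum(pred * target, axis=1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1) + 1e-8)
    return float(np.mean(1.0 - cos))

aux = embed_loss(hidden, target, W)

# Total objective: the usual next-token loss plus a weighted auxiliary term
lm_loss = 2.0   # placeholder value for the language-modeling loss
lam = 0.5       # hypothetical weight on the auxiliary loss
total = lm_loss + lam * aux
print(round(total, 3))
```

In a real training loop the gradient of `total` would flow into both the projector and the LLM, which is what "optimizing these representations directly" means in the tweet; the expert encoder stays frozen.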
@praeclarumjj
Jitesh Jain
2 months
However, I realize that MetaMorph only does IFT and doesn't do any LLaVA-style PT. Still, this would be an interesting experiment.
0
0
2
@praeclarumjj
Jitesh Jain
2 months
RT @srush_nlp: Rare sincere tweet: December can be tough in academia. As a student I thought everyone had it together. As an advisor you s…
0
44
0
@praeclarumjj
Jitesh Jain
2 months
RT @thaoshibe: It costs $89-$199 for poster printing. Estimated $260,000-$597,000 for ~3k posters (main conference). $0.5M dollars go to t…
0
94
0
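The range quoted in the retweet above is roughly per-poster price times poster count; a quick sanity check, assuming exactly 3,000 posters:

```python
# Rough check of the quoted poster-cost range: unit price times ~3k posters.
low_price, high_price = 89, 199   # quoted printing cost per poster, USD
n_posters = 3000                  # "~3k posters (main conference)"
print(low_price * n_posters, high_price * n_posters)  # → 267000 597000
```

The high end matches the quoted $597,000 exactly; the low end comes out at $267,000 rather than $260,000, consistent with rounding or slightly fewer than 3,000 posters.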
@praeclarumjj
Jitesh Jain
2 months
RT @jw2yang4ai: 🔥Check out our OLA-VLM! We took the first step of asking VLMs to decode not only the text tokens but also the visual tokens…
0
4
0
@praeclarumjj
Jitesh Jain
2 months
RT @jiachenl6: Check out the CuMo poster at the East Exhibit Hall A-C #3400 on Friday afternoon if you're into multimodal LLM! #NeurIPS2024…
0
1
0
@praeclarumjj
Jitesh Jain
2 months
RT @fionakryan: Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE ac…
0
486
0
@praeclarumjj
Jitesh Jain
2 months
RT @praeclarumjj: 💭 How do MLLMs improve their visual perception with more training data or visual inputs (depth/seg map)? 👉 Performance co…
0
10
0
@praeclarumjj
Jitesh Jain
2 months
RT @humphrey_shi: Introducing OLA-VLM: a new paradigm for distilling vision knowledge into the hidden representations of LLMs, enhancing vis…
0
23
0
@praeclarumjj
Jitesh Jain
2 months
@zhengyuan_yang @humphrey_shi @JianfengGao0217 @jw2yang4ai 🙏Lastly, I sincerely thank the GCR team at Microsoft for their support in helping me navigate the infrastructure challenges during my internship at MSR.
0
0
0