![Haoning Wu Profile](https://pbs.twimg.com/profile_images/1714647622383001600/0vBersds_x96.jpg)
Haoning Wu
@HaoningTimothy
Followers
710
Following
444
Statuses
238
PhD Nanyang Technological University 🇸🇬, BS @PKU1898
Singapore
Joined December 2020
We are releasing the BASE models of Aria! Aria-Base-64K: after 64K long-context multimodal training, before post-training; Aria-Base-8K: after 8K native multimodal pre-training, the base of Aria-Base-64K. @DongxuLi_ @LiJunnan0409
2
21
83
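For context on the base-model release above, here is a minimal loading sketch. The Hub checkpoint ID and dtype choice are assumptions, not confirmed by the tweet; Aria's custom modeling code is fetched via trust_remote_code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed Hub ID for the released base checkpoint; verify the actual repo name.
model_id = "rhymes-ai/Aria-Base-64K"

# trust_remote_code pulls the custom Aria modeling/processing code from the Hub repo.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the MoE backbone's memory footprint manageable
    device_map="auto",
    trust_remote_code=True,
)
```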
RT @dyhTHU: Introducing Ola! State-of-the-art omni-modal understanding model with advanced progressive modality alignment strategy! Ola r…
0
29
0
RT @LiJunnan0409: Video-MMMU is a great benchmark with meticulous data collection and annotation processes. Very happy to see Aria ranking…
0
2
0
RT @BoLi68567011: VideoMMMU is a meticulously crafted benchmark designed to evaluate multimodal models' video understanding abilities for c…
0
5
0
RT @BoLi68567011: After nearly a year of development, LMMs-Eval has reached 2K+ stars and 60+ contributors! Now with integrated image, v…
0
9
0
Magic powers! Excellent work from my fellow colleagues. Note that this model is fine-tuned from Aria-Base, the base model of Aria, to reach optimal performance on UI tasks. Hope to see more domain-specific models fine-tuned from the Aria-Base series!
Introducing Aria-UI, a cutting-edge grounding LMM for GUI agents with a lightning-fast backbone of 3.9B activated parameters! Try it yourself: Project page: Explore on GitHub:
1
1
10
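On the "fine-tuned from Aria-Base" point above, a minimal sketch of what a domain-specific adaptation could look like using parameter-efficient fine-tuning. The checkpoint ID and attention module names are assumptions and would need checking against the actual Aria architecture; this is not the recipe used for Aria-UI.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed base checkpoint ID; not confirmed by the tweet.
base = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria-Base-8K", trust_remote_code=True
)

# target_modules are placeholders; inspect the model to find the real projection names.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable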
Glad to contribute to some milestones in this domain~
Our newest, most comprehensive survey on Video Quality Assessment, led by my legendary advisor, Alan Bovik, who has pioneered this field for over three decades, and myself, who dedicated my (almost) entire PhD journey to this topic, is now live on arXiv!

Paper:
GitHub:

In this work, we've curated a panoramic, deeply-researched view of the Video Quality Assessment (VQA) landscape. We cover the evolution from classic methods to cutting-edge deep learning solutions, offering a clear guide for both newcomers and seasoned experts.

Key highlights include:
- A holistic categorization and analysis of existing VQA models, with insights into how techniques have evolved and where they're headed.
- A thorough look at subjective evaluation fundamentals, including major datasets and what they mean for real-world applications.
- A deep dive into loss functions and architectural innovations, illuminating how modern frameworks are pushing the frontier of #VQA.
- Broad comparisons across emergent data types, shedding light on the importance of modeling spatiotemporal details and leveraging prior knowledge.
- Real-world applications and future directions that underscore how these advancements can revolutionize streaming platforms, social media, and beyond.

We hope this survey catalyzes new research avenues, encourages innovative solutions, and spurs industry-university cooperation that brings these essential technologies quickly and practically into social media, video streaming, and even the generative imagery/videography industry! Dive in, share your thoughts, and let's drive the future of #VQA together!
4
0
8
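As a concrete illustration of the "loss functions" highlight above (not taken from the survey itself): many learning-based VQA models are trained with correlation-style objectives against human mean opinion scores rather than plain regression. A minimal PyTorch sketch of a differentiable PLCC-based loss:

```python
import torch

def plcc_loss(pred: torch.Tensor, mos: torch.Tensor) -> torch.Tensor:
    """Loss based on Pearson's linear correlation coefficient (PLCC): the model is
    rewarded for correlating with human mean opinion scores (MOS), not for matching
    their absolute scale."""
    pred = pred - pred.mean()
    mos = mos - mos.mean()
    plcc = (pred * mos).sum() / (pred.norm() * mos.norm() + 1e-8)
    return 1.0 - plcc  # minimizing (1 - PLCC) pushes correlation toward 1

# Toy usage: predicted quality scores vs. ground-truth MOS for a batch of videos.
pred = torch.tensor([3.1, 4.2, 2.0, 4.8], requires_grad=True)
mos = torch.tensor([3.0, 4.5, 1.8, 5.0])
loss = plcc_loss(pred, mos)
loss.backward()
print(float(loss))
```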
RT @LiJunnan0409: Introducing Aria-Chat, our latest multimodal chat model optimized for open-ended and multi-round dialogs! It outperform…
0
5
0
RT @mervenoyann: VLMs go MoE ✨ @deepseek_ai dropped three new commercially permissive vision LMs based on SigLIP encoder and their DeepSee…
0
27
0
RT @wenhaocha1: This is crazy. I hope it's not cherry pick. Definitely another big step to "true" later multimodal models for Gemini-2!
0
1
0
RT @LiJunnan0409: Excited to share that Aria is now officially supported by Transformers! Huge thanks to @AymericRoucher and the @huggingfa…
0
2
0
RT @JustinLin610: I almost forgot we released something tonight... Yes, just the base models for Qwen2-VL lah. Not a big deal actually…
0
134
0