![Haoning Wu Profile](https://pbs.twimg.com/profile_images/1714647622383001600/0vBersds_x96.jpg)
Haoning Wu
@HaoningTimothy
Followers
710
Following
444
Statuses
238
PhD Nanyang Technological University 🇸🇬, BS @PKU1898
Singapore
Joined December 2020
We are releasing the BASE models of Aria! Aria-Base-64K: after 64K long-context multimodal training, before post-training; Aria-Base-8K: after 8K native multimodal pre-training, the base of Aria-Base-64K. @DongxuLi_ @LiJunnan0409
2
21
83
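For context on the base-model release above, here is a minimal loading sketch. The Hub checkpoint ID and dtype choice are assumptions, not confirmed by the tweet; Aria's custom modeling code is fetched via trust_remote_code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed Hub ID for the released base checkpoint; verify the actual repo name.
model_id = "rhymes-ai/Aria-Base-64K"

# trust_remote_code pulls the custom Aria modeling/processing code from the Hub repo.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the MoE backbone's memory footprint manageable
    device_map="auto",
    trust_remote_code=True,
)
```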
RT @dyhTHU: Introducing Ola! State-of-the-art omni-modal understanding model with advanced progressive modality alignment strategy! Ola r…
0
29
0
RT @LiJunnan0409: Video-MMMU is a great benchmark with meticulous data collection and annotation processes. Very happy to see Aria ranking…
0
2
0
RT @BoLi68567011: VideoMMMU is a meticulously crafted benchmark designed to evaluate multimodal models' video understanding abilities for c…
0
5
0
RT @BoLi68567011: After nearly a year of development, LMMs-Eval has reached 2K+ stars and 60+ contributors! Now with integrated image, v…
0
9
0
Magic powers! Excellent work from my fellow colleagues. Note that this model is fine-tuned from Aria-Base, the base model of Aria, to reach optimal performance on UI tasks. Hope to see more domain-specific models fine-tuned from the Aria-Base series!
Introducing Aria-UI, a cutting-edge grounding LMM for GUI agents with a lightning-fast backbone of 3.9B activated parameters! Try it yourself: Project page: Explore on GitHub:
1
1
10
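On the "fine-tuned from Aria-Base" point above, a minimal sketch of what a domain-specific adaptation could look like using parameter-efficient fine-tuning. The checkpoint ID and attention module names are assumptions and would need checking against the actual Aria architecture; this is not the recipe used for Aria-UI.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed base checkpoint ID; not confirmed by the tweet.
base = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria-Base-8K", trust_remote_code=True
)

# target_modules are placeholders; inspect the model to find the real projection names.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable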
Glad to contribute to some milestones in this domain~
Our newest, most comprehensive survey on Video Quality Assessment, led by my legendary advisor, Alan Bovik, who has pioneered this field for over three decades, and myself, who dedicated my (almost) entire PhD journey to this topic, is now live on arXiv!

Paper:
GitHub:

In this work, we've curated a panoramic, deeply-researched view of the Video Quality Assessment (VQA) landscape. We cover the evolution from classic methods to cutting-edge deep learning solutions, offering a clear guide for both newcomers and seasoned experts.

Key highlights include:
- A holistic categorization and analysis of existing VQA models, with insights into how techniques have evolved and where they're headed.
- A thorough look at subjective evaluation fundamentals, including major datasets and what they mean for real-world applications.
- A deep dive into loss functions and architectural innovations, illuminating how modern frameworks are pushing the frontier of #VQA.
- Broad comparisons across emergent data types, shedding light on the importance of modeling spatiotemporal details and leveraging prior knowledge.
- Real-world applications and future directions that underscore how these advancements can revolutionize streaming platforms, social media, and beyond.

We hope this survey catalyzes new research avenues, encourages innovative solutions, and spurs industry-university cooperation that brings these essential technologies quickly and practically into social media, video streaming, and even the generative imagery/videography industry! Dive in, share your thoughts, and let's drive the future of #VQA together!
4
0
8
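As a concrete illustration of the "loss functions" highlight above (not taken from the survey itself): many learning-based VQA models are trained with correlation-style objectives against human mean opinion scores rather than plain regression. A minimal PyTorch sketch of a differentiable PLCC-based loss:

```python
import torch

def plcc_loss(pred: torch.Tensor, mos: torch.Tensor) -> torch.Tensor:
    """Loss based on Pearson's linear correlation coefficient (PLCC): the model is
    rewarded for correlating with human mean opinion scores (MOS), not for matching
    their absolute scale."""
    pred = pred - pred.mean()
    mos = mos - mos.mean()
    plcc = (pred * mos).sum() / (pred.norm() * mos.norm() + 1e-8)
    return 1.0 - plcc  # minimizing (1 - PLCC) pushes correlation toward 1

# Toy usage: predicted quality scores vs. ground-truth MOS for a batch of videos.
pred = torch.tensor([3.1, 4.2, 2.0, 4.8], requires_grad=True)
mos = torch.tensor([3.0, 4.5, 1.8, 5.0])
loss = plcc_loss(pred, mos)
loss.backward()
print(float(loss))
```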
RT @LiJunnan0409: Introducing Aria-Chat, our latest multimodal chat model optimized for open-ended and multi-round dialogs! It outperform…
0
5
0
RT @mervenoyann: VLMs go MoE ✨ @deepseek_ai dropped three new commercially permissive vision LMs based on SigLIP encoder and their DeepSee…
0
27
0
RT @wenhaocha1: This is crazy. I hope it's not cherry pick. Definitely another big step to "true" later multimodal models for Gemini-2!
0
1
0
RT @LiJunnan0409: Excited to share that Aria is now officially supported by Transformers! Huge thanks to @AymericRoucher and the @huggingfa…
0
2
0
RT @JustinLin610: I almost forgot we released something tonight... Yes, just the base models for Qwen2-VL lah. Not a big deal actually…
0
134
0