Zekun Wang (ZenMoore) 🔥
@ZenMoore1
Followers: 3K · Following: 1K · Statuses: 782
#LLM #MLLM #AGI Researcher. 🎞 Formerly: LangBoat, BAAI, https://t.co/8aGGEWD5j8 🎉 Currently: ByteDance, M-A-P 🤗 DM me for chit-chat and collaboration.
Beijing, China
Joined June 2020
Thanks for featuring our work! 🔥 Introducing MIO, a foundation model integrating both multimodal understanding and generation. It supports four modalities: image, video (frame sequences), speech, and text. MIO natively supports multimodal interleaved output and ... 🧵(1/n)
Presents MIO, a foundation model built on multimodal tokens using causal multimodal modeling. Demonstrates huge potential thanks to its any-to-any understanding and generation: capabilities include interleaved video-text generation, chain-of-visual-thought reasoning, and visual guideline generation.
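A minimal sketch of what "causal multimodal modeling" over interleaved tokens can look like (this is not the MIO implementation; the model, vocabulary split, and token IDs below are illustrative assumptions): each modality is first discretized into token IDs that share one vocabulary with text, the streams are interleaved into a single sequence, and the model is trained with ordinary next-token prediction.

# Minimal sketch, not the authors' code: causal next-token modeling over one
# interleaved multimodal token sequence. Assumes image/speech inputs have
# already been discretized (e.g., by a VQ-style tokenizer) into IDs sharing
# a single vocabulary with text.
import torch
import torch.nn as nn

class TinyCausalMultimodalLM(nn.Module):
    def __init__(self, vocab_size=40_000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier positions.
        n = tokens.size(1)
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.lm_head(self.blocks(self.embed(tokens), mask=mask))

# Hypothetical interleaved sequence: text tokens, then image tokens, then text.
text_ids = torch.randint(0, 30_000, (1, 8))         # placeholder text IDs
image_ids = torch.randint(30_000, 38_000, (1, 16))   # placeholder image-codebook IDs
sequence = torch.cat([text_ids, image_ids, text_ids], dim=1)

logits = TinyCausalMultimodalLM()(sequence)
# One next-token prediction loss over the whole sequence, regardless of modality.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)), sequence[:, 1:].reshape(-1)
)

Generation would then run the same recipe in reverse: the model autoregressively emits text, image, or speech tokens, and modality-specific decoders turn the non-text tokens back into pixels or audio.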
RT @xiangyue96: Demystifying Long CoT Reasoning in LLMs. Reasoning models like R1 / O1 / O3 have gained massive atte…
RT @xiangyue96: Introducing Critique Fine-Tuning (CFT): a more effective SFT method for enhancing LLMs' reasoning abilities. 📄 Paper: https…
RT @WenhuChen: Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it ca…
RT @abc43992899: 1/n: 🚀 Announcing YuE (乐) – the most powerful open-source full-song music generation model! 🎵 Tackle the lyrics-to-song ta…
RT @TsingYoga: OpenAI's Operator is super-cool, our UI-TARS is cool and open-sourced! UI-TARS actually beats Operator at a 15-step budget🫡…
RT @rohanpaul_ai: LLMs are all circuits and patterns. Nice paper for a long weekend read - "A Primer on the Inner Workings of Transformer-…
RT @sivil_taram: 🎉 Announcing the first Open Science for Foundation Models (SCI-FM) Workshop at #ICLR2025! Join us in advancing transparenc…
RT @gaotianyu1350: Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by s…
RT @realYushiBai: Introducing 📚 LongBench v2: A benchmark to assess the ability of LLMs to handle long-context problems requiring deep unde…
RT @ChiYeung_Law: 🚀 Introducing ScreenSpot-Pro – the first benchmark driving Multi-modal LLMs into high-resolution professional GUI-Agent a…
RT @_TobiasLee: 📢 VL-RewardBench Update 📢 👑 Gemini-2.0-flash-exp leads at 64.5%! Test-time thinking shows strong gains: • +11.2% Gemini-2-…
RT @rohanpaul_ai: AI-PERSONA turns generic LLMs into personal assistants that evolve with user interactions. The paper introduces a framew…
RT @kaifulee: PopAI is an AI tool that can read, write, search, and make presentations. Try it at: https://t.co/… !
RT @Lihexl: 🎄Early Merry Christmas to all!🎄 Excited to share AGUVIS, one of the most elegant works I’ve been working on. Making changes wit…
RT @ShunyuYao12: When I was young, I sometimes worried how humanity could catch up with ever-expanding science and knowledge — both in breadth…
RT @xueguang_ma: What can VLMs bring to RAG beyond an input modality change? For “R”, our DSE dropped the document processing and improved re…
RT @MaitrixOrg: 🚀LLM Reasoners now supports inference-time scaling approaches for web browsing🌐! Apply any advanced planning (MCTS, Beam S…