Yiheng Xu

@yihengxu_

Followers
685
Following
433
Statuses
141

ai agent research @hkuniversity | ex @msftresearch | layoutlm / lemur / aguvis / agenttrek / qwen vl agent | from automation to autonomy

C-137
Joined May 2020
@yihengxu_
Yiheng Xu
2 months
1/ 🚀 Introducing AGUVIS: A unified, pure vision-based agent model for autonomous GUI interaction! It seamlessly operates across web, desktop, and mobile platforms. Hugging Face Collection: Project Page: Paper: GitHub: More details 🧵⬇️
7
60
151
@yihengxu_
Yiheng Xu
1 day
RT @_zhihuixie: Introducing CTRL, a new framework that trains LLMs to critique via RL without human supervision or distillation, enabling t…
0
55
0
@yihengxu_
Yiheng Xu
2 days
and 🐲🐲 @lockonlvange
@FaZhou_998
Fan Zhou
2 days
🐳🐳
0
0
3
@yihengxu_
Yiheng Xu
2 days
RT @lockonlvange: Introducing CodeI/O, a systematic way to condense diverse reasoning patterns via code input-out…
0
39
0
@yihengxu_
Yiheng Xu
7 days
@johnschulman2 Wish you all the best John!
0
0
5
@yihengxu_
Yiheng Xu
10 days
RT @CaimingXiong: Introducing our new Reward-Guided Speculative Decoding, which saves 4.4× FLOPs in STEM. Could be a faster and better CTS so…
0
2
0
@yihengxu_
Yiheng Xu
10 days
RT @JustinLin610: Qwen2.5-Max now is ranked 7th!
0
13
0
@yihengxu_
Yiheng Xu
10 days
RT @huybery: Wow! The performance of Qwen2.5-Max in the Chatbot Arena is impressive! Especially in Coding and Math, where it has reached th…
0
43
0
@yihengxu_
Yiheng Xu
10 days
RT @hendrydong: Check out our work on Reward-Guided Speculative Decoding! 🚀 • Use PRM for reward-guided sampling — a mixture distribution •…
0
18
0
@yihengxu_
Yiheng Xu
14 days
Hope you find the quick start example for computer use helpful!
@Alibaba_Qwen
Qwen
15 days
Announcing Qwen2.5-VL Cookbooks! 🧑‍🍳 A collection of notebooks showcasing use cases of Qwen2.5-VL, including local model and API. Examples include Computer Use, Spatial Understanding, Document Parsing, Mobile Agent, OCR, Universal Recognition, Video Understanding. 🔗 Link: 💬 Qwen Chat: (choose Qwen2.5-VL-72B-Instruct as the model) ⚙️ API:
1
5
30
@yihengxu_
Yiheng Xu
17 days
RT @yihengxu_: 1/ 🚀 Introducing AGUVIS: A unified, pure vision-based agent model for autonomous GUI interaction! It seamlessly operates acr…
0
60
0
@yihengxu_
Yiheng Xu
17 days
Happy to teach Qwen2.5-VL the agent capabilities to use computers and mobile devices :D Now Aguvis has become Master Shifu. Waiting for the coming Dragon Warrior to bring autonomous intelligence to the next level? Stay tuned! 😉
@Alibaba_Qwen
Qwen
17 days
🎉 Wishing you prosperity 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vision-language model! 🚀
💗 Qwen Chat: 📖 Blog: 🤗 Hugging Face: 🤖 ModelScope:
🌟 Key Highlights:
* Visual Understanding: From flowers to complex charts, Qwen2.5-VL sees it all!
* Agentic Capabilities: It’s a visual agent that can reason and interact with tools like computers & phones.
* Long Video Comprehension: Captures events in videos over 1 hour long! ⏳🎥
* Precise Localization: Generates bounding boxes & JSON outputs for accurate object detection.
* Structured Data Outputs: Perfect for finance & commerce, handling invoices, forms & more! 💼📊
Try Qwen2.5-VL now at Qwen Chat or explore models on Hugging Face & ModelScope. 🌐
2
11
79
@yihengxu_
Yiheng Xu
17 days
RT @Alibaba_Qwen: 🎉 Wishing you prosperity 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vi…
0
555
0
@yihengxu_
Yiheng Xu
19 days
RT @junxian_he: We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly…
0
667
0
@yihengxu_
Yiheng Xu
21 days
RT @hxiao: @abacaj so my reply went viral...let me add sth here. i know 幻方量化 high-flyer long time ago and even back in the late 2023 i hear…
0
67
0
@yihengxu_
Yiheng Xu
22 days
RT @ShunyuYao12: Does what you do scale
0
4
0
@yihengxu_
Yiheng Xu
23 days
RT @linzhengisme: 🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs w…
0
83
0
@yihengxu_
Yiheng Xu
23 days
RT @TsingYoga: Check out our latest GUI Agent -> UI-TARS 🥳 A vision-language model surpasses GPT-4o & Claude Computer-Use Paper, code, mod…
0
39
0
@yihengxu_
Yiheng Xu
1 month
RT @xingyaow_: I often get asked this question: Why is o1 not so good on OpenHands, but their official report shows a decent SWE-bench numb…
0
45
0
@yihengxu_
Yiheng Xu
1 month
RT @CaimingXiong: A pure vision-based agent would be one of the best solutions for working across OSes, platforms/apps, and websites.
0
3
0
@yihengxu_
Yiheng Xu
1 month
RT @omarsar0: Aguvis is a pure vision GUI agent that works across web, desktop, and mobile platforms. It achieves strong performance throu…
0
24
0