![Yiheng Xu Profile](https://pbs.twimg.com/profile_images/1733830700863033344/E6IzywZn_x96.jpg)
Yiheng Xu
@yihengxu_
Followers: 685
Following: 433
Statuses: 141
ai agent research @hkuniversity | ex @msftresearch | layoutlm / lemur / aguvis / agenttrek / qwen vl agent | from automation to autonomy
C-137
Joined May 2020
RT @_zhihuixie: Introducing CTRL, a new framework that trains LLMs to critique via RL without human supervision or distillation, enabling t…
0
55
0
RT @lockonlvange: Introducing CodeI/O, a systematic way to condense diverse reasoning patterns via code input-out…
0
39
0
RT @CaimingXiong: Introduce our new Reward-Guided Speculative Decoding, which saves 4.4× FLOPs in STEM. Could be a faster and better CTS so…
0
2
0
RT @huybery: Wow! The performance of Qwen2.5-Max in the Chatbot Arena is impressive! Especially in Coding and Math, where it has reached th…
0
43
0
RT @hendrydong: Check out our work on Reward-Guided Speculative Decoding! 🚀 • Use PRM for reward-guided sampling — a mixture distribution •…
0
18
0
Hope you find the quick start example for computer use helpful!
Announcing Qwen2.5-VL Cookbooks! 🧑‍🍳 A collection of notebooks showcasing use cases of Qwen2.5-VL, including both local model and API usage. Examples include Computer Use, Spatial Understanding, Document Parsing, Mobile Agent, OCR, Universal Recognition, and Video Understanding. 🔗 Link: 💬 Qwen Chat: (choose Qwen2.5-VL-72B-Instruct as the model) ⚙️ API:
1
5
30
RT @yihengxu_: 1/ 🚀 Introducing AGUVIS: A unified, pure vision-based agent model for autonomous GUI interaction! It seamlessly operates acr…
0
60
0
Happy to teach Qwen2.5-VL the agent capabilities to use computers and mobile devices :D Aguvis has now become Master Shifu. Waiting for the coming Dragon Warrior to bring autonomous intelligence to the next level? Stay tuned! 😉
🎉 Wishing you prosperity! 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vision-language model! 🚀
💗 Qwen Chat: 📖 Blog: 🤗 Hugging Face: 🤖 ModelScope:
🌟 Key Highlights:
* Visual Understanding: From flowers to complex charts, Qwen2.5-VL sees it all!
* Agentic Capabilities: It’s a visual agent that can reason and interact with tools like computers & phones.
* Long Video Comprehension: Captures events in videos over 1 hour long! ⏳🎥
* Precise Localization: Generates bounding boxes & JSON outputs for accurate object detection.
* Structured Data Outputs: Perfect for finance & commerce, handling invoices, forms & more! 💼📊
Try Qwen2.5-VL now at Qwen Chat or explore models on Hugging Face & ModelScope. 🌐
2
11
79
RT @Alibaba_Qwen: 🎉 Wishing you prosperity! 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vi…
0
555
0
RT @junxian_he: We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly…
0
667
0
RT @linzhengisme: 🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs w…
0
83
0
RT @TsingYoga: Check out our latest GUI Agent -> UI-TARS 🥳 A vision-language model surpasses GPT-4o & Claude Computer-Use Paper, code, mod…
0
39
0
RT @xingyaow_: I often get asked this question: Why is o1 not so good on OpenHands, but their official report shows a decent SWE-bench numb…
0
45
0
RT @CaimingXiong: A pure vision-based agent would be one of the best solutions across OSes, platforms/apps, and websites.
0
3
0
RT @omarsar0: Aguvis is a pure vision GUI agent that works across web, desktop, and mobile platforms. It achieves strong performance throu…
0
24
0