Yiheng Xu @yihengxu_ profile

Followers: 685

Following: 433

Statuses: 141

ai agent research @hkuniversity | ex @msftresearch | layoutlm / lemur / aguvis / agenttrek / qwen vl agent | from automation to autonomy

C-137

Joined May 2020

Yiheng Xu

@yihengxu_

2 months

1/ 🚀 Introducing AGUVIS: A unified, pure vision-based agent model for autonomous GUI interaction! It seamlessly operates across web, desktop, and mobile platforms. Hugging Face Collection: Project Page: Paper: GitHub: More details 🧵⬇️

Yiheng Xu

@yihengxu_

1 day

RT @_zhihuixie: Introducing CTRL, a new framework that trains LLMs to critique via RL without human supervision or distillation, enabling t…

Yiheng Xu

@yihengxu_

2 days

and 🐲🐲 @lockonlvange

Fan Zhou

@FaZhou_998

2 days

🐳🐳

Yiheng Xu

@yihengxu_

2 days

RT @lockonlvange: Introducing CodeI/O (, a systematic way to condense diverse reasoning patterns via code input-out…

Yiheng Xu

@yihengxu_

7 days

@johnschulman2 Wish you all the best John!

Yiheng Xu

@yihengxu_

10 days

RT @CaimingXiong: Introduce our new Reward-Guided Speculative Decoding, which saves 4.4× FLOPs in STEM. Could be a faster and better CTS so…

Yiheng Xu

@yihengxu_

10 days

RT @JustinLin610: Qwen2.5-Max is now ranked 7th!

Yiheng Xu

@yihengxu_

10 days

RT @huybery: Wow! The performance of Qwen2.5-Max in the Chatbot Arena is impressive! Especially in Coding and Math, where it has reached th…

Yiheng Xu

@yihengxu_

10 days

RT @hendrydong: Check out our work on Reward-Guided Speculative Decoding! 🚀 • Use PRM for reward-guided sampling — a mixture distribution •…

Yiheng Xu

@yihengxu_

14 days

Hope you find the quick start example for computer use helpful!

Qwen

@Alibaba_Qwen

15 days

Announcing Qwen2.5-VL Cookbooks! 🧑‍🍳 A collection of notebooks showcasing use cases of Qwen2.5-VL, covering both local model and API usage. Examples include Computer Use, Spatial Understanding, Document Parsing, Mobile Agent, OCR, Universal Recognition, and Video Understanding. 🔗 Link: 💬 Qwen Chat: (choose Qwen2.5-VL-72B-Instruct as the model) ⚙️ API:

Yiheng Xu

@yihengxu_

17 days

RT @yihengxu_: 1/ 🚀 Introducing AGUVIS: A unified, pure vision-based agent model for autonomous GUI interaction! It seamlessly operates acr…

Yiheng Xu

@yihengxu_

17 days

Happy to have helped teach Qwen2.5-VL the agent capabilities it needs to use computers and mobile devices :D Aguvis has now become Master Shifu. Waiting for the coming Dragon Warrior to take autonomous intelligence to the next level? Stay tuned! 😉

Qwen

@Alibaba_Qwen

17 days

🎉 Wishing you great prosperity! 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vision-language model! 🚀 💗 Qwen Chat: 📖 Blog: 🤗 Hugging Face: 🤖 ModelScope: 🌟 Key Highlights:
* Visual Understanding: From flowers to complex charts, Qwen2.5-VL sees it all!
* Agentic Capabilities: It's a visual agent that can reason and interact with tools like computers & phones.
* Long Video Comprehension: Captures events in videos over 1 hour long! ⏳🎥
* Precise Localization: Generates bounding boxes & JSON outputs for accurate object detection.
* Structured Data Outputs: Perfect for finance & commerce, handling invoices, forms & more! 💼📊
Try Qwen2.5-VL now at Qwen Chat or explore models on Hugging Face & ModelScope. 🌐

Yiheng Xu

@yihengxu_

17 days

RT @Alibaba_Qwen: 🎉 Wishing you great prosperity! 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vi…

Yiheng Xu

@yihengxu_

19 days

RT @junxian_he: We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the results are surprisingly…

Yiheng Xu

@yihengxu_

21 days

RT @hxiao: @abacaj so my reply went viral...let me add sth here. i have known High-Flyer (幻方量化, High-Flyer Quant) for a long time, and even back in late 2023 i hear…

Yiheng Xu

@yihengxu_

22 days

RT @ShunyuYao12: Does what you do scale?

Yiheng Xu

@yihengxu_

23 days

RT @linzhengisme: 🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs w…

Yiheng Xu

@yihengxu_

23 days

RT @TsingYoga: Check out our latest GUI Agent -> UI-TARS 🥳 A vision-language model surpasses GPT-4o & Claude Computer-Use Paper, code, mod…

Yiheng Xu

@yihengxu_

1 month

RT @xingyaow_: I often get asked this question: Why is o1 not so good on OpenHands, but their official report shows a decent SWE-bench numb…

Yiheng Xu

@yihengxu_

1 month

RT @CaimingXiong: A pure vision-based agent would be one of the best solutions for working across OSes, platforms/apps, and websites.

Yiheng Xu

@yihengxu_

1 month

RT @omarsar0: Aguvis is a pure vision GUI agent that works across web, desktop, and mobile platforms. It achieves strong performance throu…
