Yuhui Xu
@xyh6666
Followers
27
Following
83
Statuses
17
Excited to introduce Reward-Guided Speculative Decoding (RSD)—a novel framework designed to enhance the efficiency of large language model (LLM) inference by strategically balancing computational cost and output quality.
Check out our work on Reward-Guided Speculative Decoding! 🚀
• Use PRM for reward-guided sampling — a mixture distribution
• Prove binary weighting is optimal under budget constraints
• Saves 4.4× FLOPs in STEM
• Outperform speculative decoding 🔥💡
3
1
3
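The RSD tweet above describes the idea at a high level: a small draft model proposes steps, a process reward model (PRM) scores them, and a binary weighting rule decides whether to keep the draft step or hand the step to the large target model. The snippet below is a minimal sketch of that control flow only; the names (draft_step, target_step, prm_score, threshold) are illustrative placeholders, not the authors' actual API.

```python
# Sketch of reward-guided speculative decoding (RSD) as described in the tweet:
# high-reward draft steps are accepted outright (binary weight = 1), low-reward
# steps fall back to the expensive target model (binary weight = 0).
# All callables here are hypothetical placeholders.

def rsd_generate(prompt, draft_step, target_step, prm_score,
                 threshold=0.7, max_steps=32):
    """Generate a response step by step under a reward-guided mixture policy."""
    context = prompt
    for _ in range(max_steps):
        candidate = draft_step(context)          # cheap proposal from the small model
        reward = prm_score(context, candidate)   # PRM judges the partial output

        if reward >= threshold:
            # High-reward draft step: accept as-is, skipping the target model.
            step = candidate
        else:
            # Low-reward step: regenerate with the large target model.
            step = target_step(context)

        context += step
        if step.strip().endswith("<eos>"):       # illustrative stop condition
            break
    return context
```

The hard threshold is what the tweet calls binary weighting: rather than mixing draft and target distributions with fractional weights, each step is taken entirely from one model, which is what lets high-reward regions avoid target-model FLOPs altogether.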
RT @hendrydong: Check out our work on Reward-Guided Speculative Decoding! 🚀 • Use PRM for reward-guided sampling — a mixture distribution •…
0
17
0
RT @SFResearch: 💡 We revamped ThinK! 💡 Want to run bigger LLM batches on your GPU? 📎 Paper: 💻 Code: https://t.co/…
0
7
0
RT @NobelPrize: BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to…
0
9K
0
RT @NobelPrize: BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfiel…
0
14K
0
RT @silviocinguetta: Long sequences can be the Achilles' heel of LLMs. The ThinK method's 20% memory reduction without performance loss red…
0
4
0
RT @SFResearch: Increase #AIEfficiency with ThinK: the first channel pruning method designed for KV cache. By pruning 40-50% of key cache…
0
11
0
RT @CaimingXiong: It is very important to reduce KV cache memory consumption during long context inference. We introduce ThinK, a method t…
0
22
0
RT @ZeyuanAllenZhu: Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in pers…
0
188
0
Thanks for introducing our recent KV cache optimization method. The low-rank structure of attention weights is well known; building on it, we find that a large portion of the Key cache channels are redundant.
This work proposes an approach to address inefficiencies in KV cache memory consumption, focusing on long-context scenarios at inference time. It presents a query-dependent KV cache pruning method that selectively prunes the least significant Key channels while minimizing the loss in attention weights. "Our approach not only maintains or enhances model accuracy but also achieves a reduction in memory costs by over 20% compared with vanilla KV cache eviction methods."
0
3
2
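As a rough illustration of the query-dependent channel pruning described in the ThinK tweets above, the sketch below scores each Key-cache channel by its contribution to the attention logits for recent queries and keeps only the top fraction. This is a reconstruction of the idea under my own assumptions, not the released ThinK implementation; the function name, score definition, and keep_ratio are illustrative.

```python
# Sketch: query-dependent Key-cache channel pruning in the spirit of ThinK.
# Channels whose query-key products contribute least to the attention logits
# are dropped, shrinking the Key cache along the head dimension.

import torch

def prune_key_channels(queries, keys, keep_ratio=0.6):
    """
    queries: (num_queries, head_dim) recent query vectors for one head
    keys:    (seq_len, head_dim) cached Key vectors for the same head
    Returns pruned keys of shape (seq_len, kept_dim) and the kept channel indices.
    """
    # Per-channel importance: total magnitude of elementwise query-key products,
    # i.e. how much each channel contributes to Q·K^T (query-dependent score).
    channel_score = torch.einsum("qd,kd->d", queries.abs(), keys.abs())

    kept_dim = max(1, int(keep_ratio * keys.shape[-1]))
    kept_idx = torch.topk(channel_score, kept_dim).indices.sort().values

    return keys[:, kept_idx], kept_idx

# Usage note: attention is then computed with queries[:, kept_idx] against the
# pruned keys, so Key-cache memory shrinks roughly in proportion to keep_ratio.
```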