Yuhui Xu

@xyh6666

Followers: 27
Following: 83
Statuses: 17

AI Researcher @ Salesforce

Singapore
Joined July 2014
@xyh6666
Yuhui Xu
4 days
RT @karpathy: New 3h31m video on YouTube: "Deep Dive into LLMs like ChatGPT" This is a general audience deep dive into the Large Language…
0
3K
0
@xyh6666
Yuhui Xu
7 days
Excited to introduce Reward-Guided Speculative Decoding (RSD)—a novel framework designed to enhance the efficiency of large language model (LLM) inference by strategically balancing computational cost and output quality.
@hendrydong
Hanze Dong
7 days
Check out our work on Reward-Guided Speculative Decoding! 🚀
• Use PRM for reward-guided sampling — a mixture distribution
• Prove binary weighting is optimal under budget constraints
• Saves 4.4× FLOPs in STEM
• Outperforms speculative decoding 🔥💡
3
1
3
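The idea sketched in these two tweets — use a cheap draft model, score its steps with a process reward model (PRM), and fall back to the large model only when the score is low (the "binary weighting" of the mixture) — can be illustrated with a minimal sketch. All function names and the threshold here are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of reward-guided speculative decoding (RSD), assuming a
# binary acceptance rule: keep a cheap draft model's step when a process
# reward model (PRM) scores it above a threshold, otherwise resample the
# step from the large target model. All callables are stand-ins.

def rsd_generate(draft_step, target_step, prm_score, prompt,
                 max_steps=8, threshold=0.5):
    """Generate up to max_steps reasoning steps, preferring the draft
    model whenever the PRM judges its proposal good enough."""
    steps = []
    draft_calls = target_calls = 0
    for _ in range(max_steps):
        candidate = draft_step(prompt, steps)          # cheap proposal
        draft_calls += 1
        if prm_score(prompt, steps, candidate) >= threshold:
            steps.append(candidate)                    # binary weight = 1: accept
        else:
            steps.append(target_step(prompt, steps))   # binary weight = 0: fall back
            target_calls += 1
        if steps[-1] == "<eos>":
            break
    return steps, draft_calls, target_calls
```

Because the expensive target model runs only on the fraction of steps the PRM rejects, the expected compute is a mixture of draft and target costs — which is where the reported FLOPs savings would come from under this reading.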
@xyh6666
Yuhui Xu
7 days
RT @hendrydong: Check out our work on Reward-Guided Speculative Decoding! 🚀 • Use PRM for reward-guided sampling — a mixture distribution •…
0
17
0
@xyh6666
Yuhui Xu
3 months
RT @SFResearch: 💡 We revamped ThinK! 💡 Want to run bigger LLM batches on your GPU? 📎 Paper: 💻 Code: https://t.co/…
0
7
0
@xyh6666
Yuhui Xu
4 months
RT @NobelPrize: BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Chemistry with one half to…
0
9K
0
@xyh6666
Yuhui Xu
4 months
RT @NobelPrize: BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfiel…
0
14K
0
@xyh6666
Yuhui Xu
6 months
RT @silviocinguetta: Long sequences can be the Achilles' heel of LLMs. The ThinK method's 20% memory reduction without performance loss red…
0
4
0
@xyh6666
Yuhui Xu
6 months
Our paper ThinK has been selected as one of the Top ML Papers!
@dair_ai
DAIR.AI
6 months
The Top ML Papers of the Week (July 29 - August 4):
- MindSearch
- Refusal in LLMs
- Constrained-CoT
- Meta-Rewarding LLMs
- Evaluating Persona Agents
- Improved RAG with Self-Reasoning
...
0
0
1
@xyh6666
Yuhui Xu
6 months
RT @SFResearch: Increase #AIEfficiency with ThinK: the first channel pruning method designed for KV cache. By pruning 40-50% of key cache…
0
11
0
@xyh6666
Yuhui Xu
6 months
RT @CaimingXiong: It is very important to reduce KV cache memory consumption during long context inference. We introduce ThinK, a method t…
0
22
0
@xyh6666
Yuhui Xu
6 months
RT @ZeyuanAllenZhu: Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in pers…
0
188
0
@xyh6666
Yuhui Xu
6 months
Thanks for introducing our recent work on KV cache optimization. The low-rank structure of attention weights is well known; building on it, we find that a large portion of the key cache channels are redundant.
@omarsar0
elvis
6 months
This work proposes an approach to address inefficiencies in KV cache memory consumption. It focuses on the long-context scenarios and the inference side of things. It presents a query-dependent KV cache pruning method to minimize attention weight loss while selectively pruning the least significant channels. "Our approach not only maintains or enhances model accuracy but also achieves a reduction in memory costs by over 20% compared with vanilla KV cache eviction methods."
0
3
2
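The query-dependent channel pruning these tweets describe — score each key-cache channel by its contribution to the attention logits and drop the least significant fraction — can be sketched roughly as below. This is a simplified illustration in the spirit of ThinK, not the paper's method; the scoring rule and function names are assumptions.

```python
def prune_key_channels(keys, query, prune_ratio=0.4):
    """Sketch of query-dependent key-cache channel pruning: score each
    channel by the magnitude of its contribution to q·k across the cached
    keys, then drop the lowest-scoring prune_ratio fraction of channels.

    keys: list of d-dimensional key vectors (the cached K entries)
    query: d-dimensional query vector
    """
    d = len(query)
    # Per-channel score: sum over cached keys of |q_j * k_j|.
    scores = [sum(abs(query[j] * k[j]) for k in keys) for j in range(d)]
    n_keep = round(d * (1.0 - prune_ratio))
    # Indices of the retained channels, kept in their original order.
    kept = sorted(sorted(range(d), key=lambda j: scores[j], reverse=True)[:n_keep])
    pruned_keys = [[k[j] for j in kept] for k in keys]
    return pruned_keys, kept
```

With prune_ratio=0.4 this keeps 60% of the channels, matching the 40-50% key-cache pruning range mentioned above; the memory saving scales directly with the fraction of channels dropped.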
@xyh6666
Yuhui Xu
6 months
RT @omarsar0: This work proposes an approach to address inefficiencies in KV cache memory consumption. It focuses on the long-context sce…
0
8
0
@xyh6666
Yuhui Xu
9 months
RT @MindBranches: Some useful prompting strategies here:
0
380
0
@xyh6666
Yuhui Xu
1 year
RT @_akhaliq: Hugging Face Daily papers email of 27 Sep 2023 is out
0
4
0
@xyh6666
Yuhui Xu
1 year
RT @_akhaliq: QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models paper page: Recently years…
0
84
0
@xyh6666
Yuhui Xu
11 years
Big differences between different countries' classes
0
0
1