Zexuan Zhong Profile
Zexuan Zhong (@ZexuanZhong)
Followers: 1,549 · Following: 635 · Media: 28 · Statuses: 128
@ZexuanZhong
Zexuan Zhong
16 days
Grok-2 is here! 🚀 It's been incredibly exciting working with the brightest minds since joining. So proud of the team @xAI!
Quoted post from @xai (xAI) · 17 days
@ZexuanZhong
Zexuan Zhong
2 years
Very excited to share a preprint, “Training Language Models with Memory Augmentation”! We propose a new training objective, TRIME, for language modeling—inspired by contrastive learning—which aligns the model's representations with both token embeddings and *in-batch memories*. 1/n
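To make the objective concrete, here is a minimal PyTorch-style sketch of a TRIME-like loss written from the description above, not the released code; the tensor names (hidden, token_emb, mem_keys, mem_targets) and the temperature tau are placeholders of mine. The idea: treat the gold token's output embedding and any in-batch memory that predicts the same token as positives, and maximize the total probability mass assigned to them.

import torch
import torch.nn.functional as F

def trime_like_loss(hidden, token_emb, targets, mem_keys, mem_targets, tau=1.0):
    # hidden:      (B, D) hidden state at each prediction position
    # token_emb:   (V, D) output token embeddings
    # targets:     (B,)   gold next-token ids
    # mem_keys:    (M, D) in-batch memory representations (hidden states of earlier tokens)
    # mem_targets: (M,)   the token each memory predicts
    vocab_logits = hidden @ token_emb.t() / tau            # similarity to every token embedding
    mem_logits = hidden @ mem_keys.t() / tau               # similarity to every in-batch memory
    log_probs = F.log_softmax(torch.cat([vocab_logits, mem_logits], dim=-1), dim=-1)
    # positives: the gold token embedding plus memories whose target matches the gold token
    vocab_pos = F.one_hot(targets, num_classes=token_emb.size(0)).bool()
    mem_pos = mem_targets.unsqueeze(0) == targets.unsqueeze(1)
    pos = torch.cat([vocab_pos, mem_pos], dim=-1)
    # maximize the total probability assigned to all positives
    return -torch.logsumexp(log_probs.masked_fill(~pos, float("-inf")), dim=-1).mean()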
@ZexuanZhong
Zexuan Zhong
4 months
Introducing Lory, a fully-differentiable MoE architecture for decoder LM pre-training! Lory merges expert FFNs by computing a weighted average in parameter space and computes the output through the merged FFN. But naive training is infeasible; how do we make it work? Details in 🧵
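A minimal sketch of the parameter-space merging idea as described in this tweet (not the paper's implementation; module and tensor names are mine): instead of routing each token to discrete experts, compute soft routing weights and average the expert FFN weight matrices themselves, then run a single merged FFN.

import torch
import torch.nn as nn

class SoftMergedFFN(nn.Module):
    def __init__(self, d_model, d_ff, n_experts):
        super().__init__()
        self.w_in = nn.Parameter(torch.randn(n_experts, d_ff, d_model) * 0.02)   # expert up-projections
        self.w_out = nn.Parameter(torch.randn(n_experts, d_model, d_ff) * 0.02)  # expert down-projections
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x, routing_repr):
        # x: (B, L, d_model); routing_repr: (B, d_model), one routing vector per example/segment
        gate = torch.softmax(self.router(routing_repr), dim=-1)       # (B, n_experts)
        w_in = torch.einsum('be,efd->bfd', gate, self.w_in)           # weighted average of expert weights
        w_out = torch.einsum('be,edf->bdf', gate, self.w_out)
        h = torch.relu(torch.einsum('bld,bfd->blf', x, w_in))         # run the single merged FFN
        return torch.einsum('blf,bdf->bld', h, w_out)

Because the average is taken over weights rather than over expert outputs, everything stays differentiable; the open questions the thread addresses are how to route causally and how to batch data so the experts specialize.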
@ZexuanZhong
Zexuan Zhong
3 years
Dense retrieval models (e.g. DPR) achieve SOTA on various datasets. Does this really mean dense models are better than sparse models (e.g. BM25)? No! Our #EMNLP2021 paper shows that dense retrievers fail even on simple entity-centric questions. (1/6)
@ZexuanZhong
Zexuan Zhong
1 year
If we use model editors to update the British Prime Minister from Boris Johnson to Rishi Sunak, can the edited LMs answer “Who is married to the British Prime Minister?” Releasing MQuAKE to assess knowledge editing methods on multi-hop questions! Paper: [1/n]
@ZexuanZhong
Zexuan Zhong
10 months
💡 You can do speculative decoding without a small LM or any additional training! Check out Retrieval-Based Speculative Decoding (REST)! Paper: Blog: Code:
@tianle_cai
Tianle Cai
10 months
If training's got you in a stew, take a REST and speed right through! 😎 Thrilled to introduce Retrieval-Based Speculative Decoding (REST), a plug-and-play method for accelerating language model decoding. 👇
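A toy sketch of the retrieval idea, written as a brute-force paraphrase of the tweet (the actual REST system uses an efficient datastore rather than a linear scan, and the function name and parameters here are made up): find the longest suffix of the current context inside a reference corpus and propose the tokens that followed it as a draft, which the target LM then verifies as in standard speculative decoding.

def retrieve_draft(context_ids, corpus_ids, max_suffix=16, min_suffix=2, max_draft=8):
    # context_ids / corpus_ids: lists of token ids
    for n in range(min(max_suffix, len(context_ids)), min_suffix - 1, -1):
        suffix = context_ids[-n:]
        for i in range(len(corpus_ids) - n):
            if corpus_ids[i:i + n] == suffix:
                # propose the continuation that followed this suffix in the corpus
                return corpus_ids[i + n:i + n + max_draft]
    return []   # no match: fall back to ordinary decoding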
@ZexuanZhong
Zexuan Zhong
3 years
Excited to share our #NAACL2021 paper on factual probing! “Factual Probing is [MASK]: Learning vs. Learning to Recall” Paper: Code: Joint work with @danfriedman0 and @danqi_chen .
@ZexuanZhong
Zexuan Zhong
9 months
At #EMNLP2023 🇸🇬! I will be presenting our projects on benchmarking knowledge editing () and attacking dense retrievers (). DM me if you want to grab a coffee; happy to chat about anything interesting!
@ZexuanZhong
Zexuan Zhong
2 years
TRIME has been accepted at #emnlp2022! 😃 The updated version includes new/stronger results on domain adaptation, MT, etc. We have made our code and pre-trained models publicly available! Paper: Code: w/ @taolei15949106 @danqi_chen
@ZexuanZhong
Zexuan Zhong
2 years
Heading to Abu Dhabi to attend #emnlp2022! Can’t wait to meet new and old friends!!
@ZexuanZhong
Zexuan Zhong
4 months
Two key techniques:
1) Causal segment routing
⚠️ Merging per token is too expensive
✅ We merge experts per segment and keep the autoregressive property
2) Similarity-based batching
⚠️ Training on concatenated random docs leads to bad experts
✅ We concatenate similar docs to form training instances
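A rough sketch of what causal segment routing might look like, based only on the description above (not the paper's code; the names and the mean-pooled summary are assumptions of mine): the merging weights for segment s are computed from a summary of segment s-1, so no token ever depends on future tokens.

import torch

def causal_segment_gates(hidden, router, segment_len):
    # hidden: (B, L, D); router: callable mapping (..., D) -> (..., n_experts), e.g. torch.nn.Linear(D, n_experts)
    B, L, D = hidden.shape
    n_seg = L // segment_len
    segs = hidden[:, :n_seg * segment_len].view(B, n_seg, segment_len, D)
    summaries = segs.mean(dim=2)                          # one pooled summary per segment
    gates = torch.softmax(router(summaries), dim=-1)      # (B, n_seg, n_experts)
    uniform = torch.full_like(gates[:, :1], 1.0 / gates.size(-1))
    # segment s uses the gates computed from segment s-1; segment 0 falls back to uniform mixing
    return torch.cat([uniform, gates[:, :-1]], dim=1)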
@ZexuanZhong
Zexuan Zhong
4 months
Please find more details in our preprint: Shoutout to my amazing collaborators @xiamengzhou @danqi_chen @ml_perception. This was done during my internship at Meta. Excited to finally share it!!
@ZexuanZhong
Zexuan Zhong
2 years
Joint work with @taolei15949106 and @danqi_chen . Code and models coming soon! n/n
@ZexuanZhong
Zexuan Zhong
2 years
We also devise novel ways of batching data and constructing training memories, so that our models can leverage *long-range contexts* and an *external datastore* effectively. 3/n
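One way to picture the batching idea (a simplified sketch under my own assumptions, not the paper's exact recipe): pack consecutive segments of the same document into one batch, so the in-batch memories that the objective aligns with come from nearby context in the same long document; memories drawn from an external datastore can be appended the same way.

def consecutive_segment_batches(doc_token_ids, seg_len, batch_size):
    # split one document into consecutive segments ...
    segs = [doc_token_ids[i:i + seg_len]
            for i in range(0, len(doc_token_ids) - seg_len + 1, seg_len)]
    # ... and group neighbouring segments into the same batch so that each segment's
    # in-batch memories come from the same document's long-range context
    return [segs[i:i + batch_size] for i in range(0, len(segs), batch_size)]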
@ZexuanZhong
Zexuan Zhong
2 years
We show that simply replacing the standard language modeling objective with ours can improve perplexity significantly! 2/n
@ZexuanZhong
Zexuan Zhong
4 months
Exciting results from Lory models! Trained from scratch on 150B tokens, Lory models with 0.3B/1.5B active parameters and up to 32 experts reach the same loss as dense models in 2.5x fewer steps!
@ZexuanZhong
Zexuan Zhong
4 months
What’s more? Lory not only excels in performance but also learns *domain-level expert specialization*, while previous token-level MoEs rely on shallow features for routing! 🤔 They may be complementary -- more possibilities ahead!
@ZexuanZhong
Zexuan Zhong
4 months
How does Lory compare to other MoE models? Despite using segment-level routing, Lory achieves competitive performance compared to SoTA conventional token-level MoEs, such as expert-choice (EC)!
@ZexuanZhong
Zexuan Zhong
2 years
We show significant gains over kNN-LM/kNN-MT and models that explicitly leverage long-range context (e.g., Transformer-XL) — all we change is the training objective and data batching! 4/n
@ZexuanZhong
Zexuan Zhong
4 months
Lory also achieves great downstream performance with ICL!
@ZexuanZhong
Zexuan Zhong
9 months
I have been working with Yangsibo on several projects. She is super strong -- don't miss her if you are hiring!
@YangsiboHuang
Yangsibo Huang
9 months
I am at #NeurIPS2023 now. I am also on the academic job market, and humbled to be selected as a 2023 EECS Rising Star✨. I work on ML security, privacy & data transparency. Appreciate any reposts & happy to chat in person! CV+statements: Find me at ⬇️
@ZexuanZhong
Zexuan Zhong
1 year
A big shout out to my collaborators! This is joint work with @ZhengxuanZenWu, @chrmanning, @ChrisGPotts, and @danqi_chen. @princeton_nlp @stanfordnlp [n/n]
@ZexuanZhong
Zexuan Zhong
1 year
We build MQuAKE by first creating a knowledge graph based on Wikidata, encompassing entities and relations among them. We then create multi-hop questions based on chains of facts from the knowledge graph along with edits. [2/n]
@ZexuanZhong
Zexuan Zhong
3 years
(2) Data-driven methods are able to exploit this information. We design control experiments where we apply these methods to randomly initialized models. We show that even with random initialization, these methods can find prompts that recover a non-trivial number of “facts”.
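The control setup can be reproduced in a few lines with Hugging Face Transformers (a sketch; the probed architecture, bert-base-cased, is my assumption): build the same architecture with freshly initialized weights and run the prompt-search method against it.

from transformers import AutoConfig, AutoModelForMaskedLM

config = AutoConfig.from_pretrained("bert-base-cased")
random_model = AutoModelForMaskedLM.from_config(config)   # same architecture, randomly initialized weights
# run the same prompt-optimization procedure on random_model: any "facts" it
# recovers cannot have come from pre-trained knowledge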
@ZexuanZhong
Zexuan Zhong
3 years
We fine-tune DPR on these simple questions and find that updating the passage encoder is particularly crucial for good results. Our visualization also shows that gold passage vectors for these questions are clustered together, making them difficult to discriminate. (4/6)
@ZexuanZhong
Zexuan Zhong
3 years
We decouple two distinct aspects of these questions: the entities and the question patterns. We find that dense retrieval models can only generalize to common entities or to question patterns that have been observed during training. (3/6)
@ZexuanZhong
Zexuan Zhong
3 years
Our results suggest that one should not simply interpret the accuracy of a data-driven prompt on LAMA as a lower bound on how much knowledge a language model stores. The control experiments allow us to form a more detailed understanding of the behavior of different probes.
@ZexuanZhong
Zexuan Zhong
9 months
@danqi_chen oops.. thanks!
@ZexuanZhong
Zexuan Zhong
3 years
We study two simple techniques aimed at fixing the issue. We find that (1) data augmentation is unable to consistently improve performance on new questions; (2) fixing a robust passage index and specializing the question encoder leads to memory-efficient transfer to new domains. (5/6)
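In code, the second recipe amounts to something like this (a minimal PyTorch sketch with assumed variable names, not the released training script): keep the passage encoder, and hence the existing passage index, frozen, and fine-tune only the question encoder on the new domain.

import torch

def build_transfer_optimizer(question_encoder, passage_encoder, lr=1e-5):
    # freeze the passage encoder so the existing passage index stays valid
    for p in passage_encoder.parameters():
        p.requires_grad = False
    # only the question encoder is updated on the new domain
    return torch.optim.AdamW(question_encoder.parameters(), lr=lr)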
@ZexuanZhong
Zexuan Zhong
3 years
We construct EntityQuestions, consisting of simple, entity-rich questions such as “Where was Arve Furset born?”. We find dense retrieval models drastically underperform sparse models! (2/6)
@ZexuanZhong
Zexuan Zhong
4 months
@FlyingKid16 The difference is that SMEAR only works for encoder models: it fine-tunes T5 (with adapters) on text classification tasks, where instance-level routing decisions arise naturally. We pre-train a decoder model and manage to handle per-token prediction in an autoregressive way.
@ZexuanZhong
Zexuan Zhong
1 year
Surprisingly, existing knowledge editing methods can inject facts and recall them accurately (high edit-wise accuracy), but they fail catastrophically on multi-hop questions (low multi-hop accuracy)! [4/n]
@ZexuanZhong
Zexuan Zhong
1 year
We propose a simple yet effective method that serves as a strong baseline for future work! MeLLo requires no training and stores edits in a memory that can be accessed by any retriever. It prompts LLMs to self-check and perform model editing on the fly! [5/n]
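A schematic of a MeLLo-style loop as described here (my paraphrase, not the released prompts or code; llm and retrieve are placeholder callables): decompose the multi-hop question into subquestions, answer each tentatively with the base LM, retrieve the most relevant edited fact from memory, and let the model self-check whether that fact contradicts its tentative answer.

def mello_style_answer(question, edited_facts, llm, retrieve, max_hops=4):
    answer = None
    for _ in range(max_hops):
        sub_q = llm(f"Question: {question}\nAnswer so far: {answer}\n"
                    f"Next subquestion (or DONE):").strip()
        if sub_q == "DONE":
            break
        tentative = llm(f"Answer briefly: {sub_q}").strip()
        fact = retrieve(sub_q, edited_facts)              # most relevant edited fact from memory
        # self-check: if the retrieved edit contradicts the tentative answer, use the edit instead
        answer = llm(f"Fact: {fact}\nTentative answer: {tentative}\n"
                     f"If the fact contradicts the answer, give the corrected answer; "
                     f"otherwise repeat the answer:").strip()
    return answer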
@ZexuanZhong
Zexuan Zhong
1 year
To evaluate a knowledge editing technique, we allow it to see a set of edited facts (e.g., “The current British Prime Minister is Rishi Sunak”). Then, we ask multi-hop questions that are related to the edited fact (e.g., “Who's married to the British Prime Minister?”). [3/n]
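For illustration, a made-up record in the spirit of this setup (not an actual MQuAKE entry): an edit plus a multi-hop question whose answer must reflect the edit.

edit = {
    "subject": "United Kingdom",
    "relation": "head of government",
    "old_object": "Boris Johnson",
    "new_object": "Rishi Sunak",
}
multi_hop_question = {
    "question": "Who is married to the British Prime Minister?",
    "fact_chain": [("United Kingdom", "head of government", "?x"), ("?x", "spouse", "?y")],
    "answer_after_edit": "Akshata Murty",
}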
@ZexuanZhong
Zexuan Zhong
4 months
@rosinality Good question. It does not back-prop to all FFNs for each token, but only for each segment; the gradients of the merged FFN are aggregated before back-propagating to each expert. Only the communication cost might become an issue as models get larger (see solutions in Sec. H).
@ZexuanZhong
Zexuan Zhong
9 months
@ShunyuYao12 lol if you book a flight to Singapore now, we can do it on Sunday!
@ZexuanZhong
Zexuan Zhong
1 year
MeLLo shows great performance across different settings, and outperforms SoTA methods. It works well with LMs at different scales! [6/n]