Xuandong Zhao

@xuandongzhao

Followers
1,427
Following
301
Media
46
Statuses
302

Postdoc @UC Berkeley CS; Research: ML, NLP, AI Safety

Goleta, CA
Joined May 2016
Pinned Tweet
@xuandongzhao
Xuandong Zhao
1 month
I am excited to join @UCBerkeley as a postdoc researcher in Prof. @dawnsongtweets 's group! My focus will be on Responsible Generative AI. Looking forward to tackling critical challenges and exploring new collaborations! Let's build a better AI future together!
Tweet media one
Tweet media two
16
6
432
@xuandongzhao
Xuandong Zhao
1 month
New #Nature study: Generative models forget true data distribution when recursively trained on synthetic data. As quality human data becomes scarce, caution is needed. This highlights the need for #watermarking to filter AI-generated content for future models.
Tweet media one
4
49
212
@xuandongzhao
Xuandong Zhao
3 months
Welcome to my PhD defense this Friday! (12 PM-1 PM PT) HFH 1132 and
Tweet media one
6
5
96
@xuandongzhao
Xuandong Zhao
1 year
🎉 Congratulations to my advisors @yuxiangw_cs and @lileics on their well-deserved promotions to associate professor. It has been an immense privilege to work with and learn from these exceptional researchers!
Tweet media one
2
3
87
@xuandongzhao
Xuandong Zhao
1 year
Unfortunately, none of my labmates received a Canada visa in time to attend ACL 2023. I'm disappointed we'll miss the chance for valuable in-person networking at the conference. #ACL2023NLP #ACL
4
1
61
@xuandongzhao
Xuandong Zhao
1 year
Heading to Hawaii for #ICML2023 ! Excited to present our work on text/image watermarking & privacy protection for LLMs. If you're interested in building trustworthy #GenerativeAI , my co-authors @yuxiangw_cs @lileics @kexun_zhang and I would love to meet up and chat!
Tweet media one
1
4
43
@xuandongzhao
Xuandong Zhao
1 year
🧵🔍 Excited to share our latest research on #AIGC and image #watermarking . We take a look at the existing watermarking schemes and propose a novel framework using generative autoencoders as watermark attackers. 🖼️🔓 Read the full paper here:
Tweet media one
3
9
42
@xuandongzhao
Xuandong Zhao
20 days
Excited that #ACL2024 has officially started! Unfortunately, due to visa issues, I won't be able to attend in person. However, I will be participating virtually and engaging with three major projects: 1️⃣ Watermarking Tutorial 2️⃣ LLM Self-Bias 3️⃣ GumbelSoft Watermark
1
3
33
@xuandongzhao
Xuandong Zhao
1 year
Thrilled to announce that our paper "Pre-trained Language Models can be Fully Zero-Shot Learners" has been accepted to #ACL2023 !
Tweet media one
1
8
32
@xuandongzhao
Xuandong Zhao
1 year
📢 Check out our new paper: "Provable Robust Watermarking for AI-Generated Text"! We propose a new watermarking method that has guaranteed quality, correctness, and security. I'm profoundly honored to collaborate with three Professors @yuxiangw_cs , @lileics , and Prabhanjan Ananth
@yuxiangw_cs
Yu-Xiang Wang
1 year
What is a *silver bullet* for LLM abuse🤔? If you ask around, many will say *Watermarks*. But it’s rather tricky to formalize, even according to Scott Aaronson. We just released a new paper with a provably robust watermark and found many new insights 🧵 1/
4
24
95
1
3
26
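The tweet doesn't spell out the construction, but provable text watermarks in this family typically use a keyed "green list" of tokens whose frequency is then tested statistically at detection time. A minimal sketch of that idea (the key, names, and parameters below are invented for illustration and are not taken from the paper):

```python
import hashlib
import math
import random

def green_list(vocab_size, key, fraction=0.5):
    # Derive a fixed pseudo-random "green" subset of the vocabulary from a secret key.
    seed = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(vocab_size * fraction)])

def detect(tokens, greens, fraction=0.5):
    # z-score against the null hypothesis "green tokens occur at the base rate".
    n = len(tokens)
    hits = sum(t in greens for t in tokens)
    return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))

greens = green_list(vocab_size=1000, key="secret-key")
watermarked = sorted(greens)[:100]   # toy "text" drawn only from green tokens
unmarked = list(range(100))          # toy "text" with no green bias
print(detect(watermarked, greens))   # large positive z-score -> watermarked
print(detect(unmarked, greens))      # near zero -> not watermarked
```

At generation time a watermarker would bias sampling toward green tokens; the detector then flags text whose z-score exceeds a threshold (e.g., 4), which bounds the false-positive rate.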
@xuandongzhao
Xuandong Zhao
20 days
Check the website of the Tutorial: Watermarking for Large Language Models
@xuandongzhao
Xuandong Zhao
20 days
1️⃣ Tutorial: Watermarking for Large Language Models 📅 August 11 (Today), 14:00 - 17:30 📍 Lotus Suite 5 – 7 My PhD advisors, @yuxiangw_cs and @lileics , will be presenting in person. Don’t miss this fantastic talk!
Tweet media one
1
6
16
0
4
28
@xuandongzhao
Xuandong Zhao
2 years
Excited to attend my first in-person conference as a PhD student! Looking forward to meeting with new friends @naaclmeeting #NAACL2022
Tweet media one
0
0
24
@xuandongzhao
Xuandong Zhao
1 month
@UCBerkeley @dawnsongtweets Grateful for an incredible 5-year journey @ucsbcs ! The PhD path isn't easy, but with the guidance of Prof. @yuxiangw_cs and @lileics , and the support of friends & collaborators, I made it! Check out my thesis, 'Empowering Responsible Use of Large Language Models,' if interested.
Tweet media one
1
0
19
@xuandongzhao
Xuandong Zhao
1 year
📢 Join us tomorrow at our oral talk in the LLM session (11:30-11:45)! I'll introduce NPPrompt, enabling LLMs to function as full zero-shot learners for language understanding. 📷 See you there! #ACL2023NLP
Tweet media one
@lileics
Lei Li
1 year
My group will present 5 papers at #ACL2023NLP . I will be onsite for all these papers. @xuandongzhao Siqi @WendaXu2 @jiangjie_chen Fei will join virtually on underline/gathertown. Welcome to talk to me or co-authors!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
4
46
0
1
17
@xuandongzhao
Xuandong Zhao
20 days
1️⃣ Tutorial: Watermarking for Large Language Models 📅 August 11 (Today), 14:00 - 17:30 📍 Lotus Suite 5 – 7 My PhD advisors, @yuxiangw_cs and @lileics , will be presenting in person. Don’t miss this fantastic talk!
Tweet media one
1
6
16
@xuandongzhao
Xuandong Zhao
7 months
🚀 New Research Alert! Our latest paper introduces a new method to test the robustness of LLMs against jailbreaking attacks. Discover the "Weak-to-Strong Jailbreaking on Large Language Models". Paper: Code:
@_akhaliq
AK
7 months
Weak-to-Strong Jailbreaking on Large Language Models paper page: Although significant efforts have been dedicated to aligning large language models (LLMs), red-teaming reports suggest that these carefully aligned LLMs could still be jailbroken through
Tweet media one
4
47
196
1
2
13
@xuandongzhao
Xuandong Zhao
1 year
I woke up early today, only to find the ICML notification date changed to Apr 24... #ICML2023
1
0
16
@xuandongzhao
Xuandong Zhao
1 year
I've transitioned from my @OpenAI subscription plan to @poe_platform . While my monthly fee remains at $20, I now have access to additional LLMs like Claude and PaLM. However, one downside is the incompatibility with GPT4 plugins. #LLMs #AIGC
Tweet media one
2
1
15
@xuandongzhao
Xuandong Zhao
10 months
@YifanJiang17 "Average researcher" is already good enough. I am below the average😅
1
0
15
@xuandongzhao
Xuandong Zhao
1 year
Appreciate the kind words from @lileics and @Meng_CS ! Thank you so much!
@lileics
Lei Li
1 year
The session chair told me that the fully-zero shot talk by @xuandongzhao is the best received talk in the LLM, which attracts hundreds of audience at #ACL2023NLP . A line of people were asking questions. @siqi_ouyang
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
6
69
0
0
14
@xuandongzhao
Xuandong Zhao
1 year
Brilliant insights on key challenges shared by Turing winner Prof. Shafi Goldwasser
Tweet media one
@dawnsongtweets
Dawn Song
1 year
Really excited about our Future of Decentralization, AI, and Computing Summit, hosted by @BerkeleyRDI , with close to 3000 registrations, building towards a future of decentralized, responsible AI! Join us today @UCBerkeley in person or LIVE 🔗 !
Tweet media one
3
22
101
1
2
13
@xuandongzhao
Xuandong Zhao
1 month
For the papers I authored or reviewed at #NeurIPS2024 , there's always a "Rating: 3"😇
1
0
13
@xuandongzhao
Xuandong Zhao
1 month
@lateinteraction Unfortunately, the current evaluation system in academia and industry prioritizes the quantity of papers. There's no penalty for publishing 'too many' papers. For example, for a typical PhD to land a research scientist role in 2024, the only viable path seems to be publishing many LLM papers.
1
1
13
@xuandongzhao
Xuandong Zhao
4 months
Due to visa issues, I'm unable to attend ICLR in person. If you're interested in our paper, we'd love to discuss it with you on Zoom. Feel free to reach out!
@yuxiangw_cs
Yu-Xiang Wang
4 months
No last-minute approval granted this time😢 for #ICLR2024 . That said, I'm gonna try something new --- coauthors of this paper will present the poster on Zoom at the scheduled time: Come talk to us at Hall B #135 or from anywhere on earth btw 7:30-9:30PDT
Tweet media one
4
3
25
1
1
11
@xuandongzhao
Xuandong Zhao
4 months
NeurIPS submission is crazy. So far, there are already 16k submissions. With several reviews per paper and each reviewer handling 6 papers, ~10k reviewers are required. Since senior reviewers serve at the AC level and don't score papers directly, the question arises: are there enough qualified reviewers? #NeurIPS2024
1
0
10
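The back-of-the-envelope math behind the tweet's "~10k" figure (the reviews-per-paper load is my assumption; the tweet states only the submission count and the six-papers-per-reviewer figure):

```python
# Reviewer demand estimate for a large ML conference.
submissions = 16_000
reviews_per_paper = 4        # assumed typical load; not stated in the tweet
papers_per_reviewer = 6

total_reviews = submissions * reviews_per_paper
reviewers_needed = total_reviews / papers_per_reviewer
print(round(reviewers_needed))  # consistent with the tweet's "~10k"
```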
@xuandongzhao
Xuandong Zhao
1 year
Glad to see innovative LLM companies like @OpenAI , @CohereAI , and @InflectionAI   make privacy/safety a top priority. #LLM #AISafety
Tweet media one
Tweet media two
Tweet media three
2
0
11
@xuandongzhao
Xuandong Zhao
10 months
None of the current watermarking designs have a theoretical guarantee against paraphrasing attacks. Scott Aaronson suggests semantic watermarking as a potential improvement. Our research supports this, showing similar vulnerabilities in image watermarking:
@boazbaraktcs
Boaz Barak
10 months
1/5 New preprint w @_hanlin_zhang_ , Edelman, Francanti, Venturi & Ateniese! We prove mathematically & demonstrate empirically impossibility for strong watermarking of generative AI models. What's strong watermarking? What assumptions? See blog and 🧵
5
44
253
0
3
11
@xuandongzhao
Xuandong Zhao
2 years
Happy to share that our paper "Provably Confidential Language Modelling" got accepted to #NAACL2022 ! with @yuxiangw_cs , @lileics We propose a method to train language generation models while protecting the confidential segments #NLProc
Tweet media one
0
1
11
@xuandongzhao
Xuandong Zhao
6 months
🚨 New research alert! Check out our new findings: the self-refine pipeline in LLMs improves the fluency and understandability of model outputs, but it also amplifies self-bias. Paper:
@WendaXu2
Wenda Xu is on the job market
6 months
[New paper!] Can LLMs truly evaluate their own output? Can self-refine/self-reward improve LLMs? Our study reveals that LLMs exhibit biases towards their output. This self-bias gets amplified during self-refine/self-reward, leading to a negative impact on performance. @ucsbNLP
Tweet media one
5
53
211
0
2
11
@xuandongzhao
Xuandong Zhao
2 years
How to learn highly compact yet effective sentence representation? Our paper "Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation" will be in the VPS2 poster session virtually on May 24th at 11:00 PDT (19:00 Dublin time). #ACL2022
Tweet media one
1
1
11
@xuandongzhao
Xuandong Zhao
1 year
Thrilled to announce our paper, "Protecting Language Generation Models via Invisible Watermarking," has been accepted for #ICML2023 ! 🎉 Can't wait to meet you all in Hawaii this July!
@yuxiangw_cs
Yu-Xiang Wang
2 years
Great minds think alike😂. @tomgoldsteincs 's method is remarkably similar to (and independent of) the watermark for LLMs that @lileics @xuandongzhao and I came up with in . Our work shows that this watermark remains visible even after distillation! 1/x
1
2
23
0
1
11
@xuandongzhao
Xuandong Zhao
3 months
@xiangyuqi_pton Interesting work! We had similar findings in our weak-to-strong jailbreaking paper.
Tweet media one
1
0
10
@xuandongzhao
Xuandong Zhao
4 months
Glad to see leading AI companies starting to use text watermarking!
@GoogleDeepMind
Google DeepMind
4 months
In the coming months, we will be open-sourcing SynthID text watermarking. This will be available in our updated Responsible Generative AI Toolkit, which we created to make it easier for developers to build AI responsibly. Find out more. → #GoogleIO
4
4
36
0
0
9
@xuandongzhao
Xuandong Zhao
2 months
Nice work on GenAI copyright
@YangsiboHuang
Yangsibo Huang
2 months
Questions for GenAI & copyright researchers (w/ answers in ℂ𝕠𝕋𝕒𝔼𝕧𝕒𝕝: ): - Can 𝐬𝐲𝐬𝐭𝐞𝐦 𝐩𝐫𝐨𝐦𝐩𝐭/𝐮𝐧𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠 prevent copyrighted content generation? - Does permissive training data still help if 𝐑𝐀𝐆 fetches copyrighted content?
1
18
133
1
0
9
@xuandongzhao
Xuandong Zhao
9 months
Excited to attend #NeurIPS2023 from December 10th to 15th! Can't wait to catch up with familiar faces and make new connections. I am on the job market this year. I would be happy to discuss any opportunities that may be a good fit.
0
0
9
@xuandongzhao
Xuandong Zhao
2 months
🚀🚨Excited to share our work: 𝗗𝗘-𝗖𝗢𝗣 on LLM membership inference attack! 🔍 Key features: • Works with black-box models • Extendable to test data contamination Join @avduarte3333 for an in-person discussion at our 𝗜𝗖𝗠𝗟 poster! 🔗 Paper:
@avduarte3333
André Duarte
2 months
Are you interested in Data Contamination? ☣️ Curious if LLMs were trained on copyrighted content? 🤔 Check out 𝗗𝗘-𝗖𝗢𝗣: Detecting Copyrighted Content in Language Models Training Data (𝗜𝗖𝗠𝗟 𝟮𝟬𝟮𝟰), our novel detection method fully compatible with black-box models!
Tweet media one
1
6
19
1
0
8
@xuandongzhao
Xuandong Zhao
3 months
My Twitter Interaction Circle ➡️
Tweet media one
0
0
8
@xuandongzhao
Xuandong Zhao
2 months
Asked four LLM systems for the latest news about Trump. Looks like Perplexity is the winner
Tweet media one
0
0
8
@xuandongzhao
Xuandong Zhao
1 month
Similar conclusions and findings can be found in a recent paper led by Stanford.
Tweet media one
0
3
7
@xuandongzhao
Xuandong Zhao
1 month
Feeling unsatisfied with your #EMNLP submission feedback? Try this fun task: Send the PDF version to and use the popular review GPTs like "Reviewer 2". You might be surprised by how similar the feedback is to what you got from the website and ARR!
0
0
8
@xuandongzhao
Xuandong Zhao
10 months
A line of zero-shot LLM-generated text detection tools also rely on such "new" forms of perplexity scores. But it's unclear if perplexity alone is enough or not...
@WeijiaShi2
Weijia Shi
10 months
Ever wondered which data black-box LLMs like GPT are pretrained on? 🤔 We build a benchmark WikiMIA and develop Min-K% Prob 🕵️, a method for detecting undisclosed pretraining data from LLMs (relying solely on output probs). Check out our project: [1/n]
Tweet media one
16
139
663
1
0
8
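As a toy illustration of the perplexity-style scores being discussed (the per-token log-probabilities below are made up; a real detector would obtain them from a language model), a plain perplexity score can be contrasted with a Min-K%-style score that averages only the most surprising tokens:

```python
import math

def perplexity(logprobs):
    # Standard perplexity: exp of the negative mean token log-probability.
    return math.exp(-sum(logprobs) / len(logprobs))

def min_k_score(logprobs, k=0.2):
    # Min-K%-style score: average only the lowest k fraction of log-probs,
    # emphasizing the tokens the model found most surprising.
    lows = sorted(logprobs)[: max(1, int(len(logprobs) * k))]
    return sum(lows) / len(lows)

human = [-4.1, -3.8, -5.0, -2.9, -4.4]      # made-up: "surprising" human text
machine = [-0.4, -0.6, -0.3, -0.9, -0.5]    # made-up: fluent model output
print(perplexity(human) > perplexity(machine))    # True
print(min_k_score(human) < min_k_score(machine))  # True
```

Both scores separate these toy inputs, but as the tweet notes, it is unclear that any fixed function of output probabilities alone remains robust across domains and languages.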
@xuandongzhao
Xuandong Zhao
1 year
Just ran a quick experiment on the watermarking system. It appears it's not resistant to paraphrasing attacks. The first image shows the first example they provided. I asked Claude for a rewrite, and intriguingly, the detector failed.
Tweet media one
Tweet media two
Tweet media three
@DrJimFan
Jim Fan
1 year
Stanford developed an LLM watermarking algorithm robust to certain paraphrasing and distortion, such as translating to French and back to English.🤔 Interesting paper, but I still find it counter-intuitive. Would this imply that there’s a subspace of natural language strings
Tweet media one
7
23
194
0
0
8
@xuandongzhao
Xuandong Zhao
7 months
Excited to share our latest work on Permute-And-Flip decoding! 🚀 A huge shoutout to my advisors @yuxiangw_cs and @lileics for their invaluable contributions! Dive into our paper and explore the code here:
@yuxiangw_cs
Yu-Xiang Wang
7 months
🚀Exciting new advance in #LLM decoding and #watermarking from @xuandongzhao ! Introduce *Permute-And-Flip decoding*: a drop-in replacement of your favorite softmax sampling (or its Top-p / Top-k variant) that you should try today: A TL;DR thread🧵 1/
1
10
43
0
0
7
@xuandongzhao
Xuandong Zhao
10 months
It is fun to read Anthropic's blog on "how to evaluate AI systems". In human society, for example, PhD applicants need to show exam results, rec letters, and statements. So what is the equivalent of a "recommendation letter" for evaluating an LLM?🧐 #LLM
0
0
6
@xuandongzhao
Xuandong Zhao
1 year
We will see lots of "papers" evaluating the ability of Claude 2 :)
@AnthropicAI
Anthropic
1 year
Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at in the US and UK.
Tweet media one
162
522
2K
1
1
7
@xuandongzhao
Xuandong Zhao
2 months
@WilliamWangNLP had a great talk about jailbreaking in LLMs at @ACTION_NSF_AI "Unveiling Hidden Vulnerabilities: Exploring Shadow Alignment and Weak-to-Strong Jailbreaking in Large Language Models"
Tweet media one
0
1
7
@xuandongzhao
Xuandong Zhao
1 year
@demishassabis @GoogleDeepMind @googlecloud Impressed by the new watermarking tool, but have you considered attacks like regeneration? Our research proves these can effectively remove any pixel-based invisible watermarks, regenerating images closely resembling the original. Paper:
0
1
5
@xuandongzhao
Xuandong Zhao
10 months
Happy to be part of the SoCal NLP gathering at UCLA! Excited to engage with fellow NLP enthusiasts and experts. #SoCalNLP #UCLA #NLP
Tweet media one
0
0
6
@xuandongzhao
Xuandong Zhao
2 years
#emnlp2022 #nlproc #emnlp How can we protect the intellectual property of trained NLP models? I am happy to share our new paper “Distillation-Resistant Watermarking for Model Protection in NLP”. Don’t miss it! (1/n) Code: Paper:
1
0
6
@xuandongzhao
Xuandong Zhao
1 year
Time to try our watermark attack
@GoogleDeepMind
Google DeepMind
1 year
We’re excited to launch 𝗦𝘆𝗻𝘁𝗵𝗜𝗗 today with @GoogleCloud : a digital tool to watermark and identify AI-generated images. 🖼️ It will be available on Imagen, one of @Google ’s latest text-to-image models. Here’s how it works. 🧵 #GoogleCloudNext
49
208
716
0
0
5
@xuandongzhao
Xuandong Zhao
20 days
2️⃣ [Oral] Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement Awesome @WendaXu2 will present: 📍 Poster: Convention Center A1 🗓️ Monday, August 12, 2024, 14:00 - 15:30 PM 🎙️ Oral: Lotus Suite 5-7 📅 Tuesday, August 13, 2024, 11:15 - 11:30 AM
Tweet media one
1
2
5
@xuandongzhao
Xuandong Zhao
1 month
With the success of GPT-4o Mini, the next wave of academic research is likely to herald a renaissance in knowledge distillation.
0
0
5
@xuandongzhao
Xuandong Zhao
9 months
Nice demo!
@pika_labs
Pika
9 months
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life. Create and edit your videos with AI. Rolling out to new users on web and discord, starting today. Sign up at
1K
5K
26K
0
0
5
@xuandongzhao
Xuandong Zhao
11 months
We will talk about watermarking for LLMs at ACL 2024 @yuxiangw_cs @lileics @aclmeeting
@aclmeeting
ACL 2025
11 months
ACL 2024 (1/4) @aclmeeting - Watermarking for Large Language Model. Xuandong Zhao, Yu-Xiang Wang and Lei Li. - AI for Science in the Era of Large Language Models. Zhenyu Bi, Minghao Xu, Jian Tang and Xuan Wang. #NLProc
1
1
10
0
1
5
@xuandongzhao
Xuandong Zhao
9 months
Is there an unspoken rule that CS PhDs must wear glasses?🧐
Tweet media one
1
0
5
@xuandongzhao
Xuandong Zhao
8 months
A screenshot from Professor Recht's new blog:
Tweet media one
0
0
5
@xuandongzhao
Xuandong Zhao
1 year
We're exploring similar edit robustness issues of watermarking from a different angle. Read more:
@percyliang
Percy Liang
1 year
Two properties of our watermarking strategy: 1) It preserves the LM distribution 2) Watermarked text can be distinguished from non-watermarked text (given a key) How can both be true? Answer: p(text) = \int p(text | key) p(key) d key Detector also doesn't need to know the LM!
6
20
80
0
0
5
@xuandongzhao
Xuandong Zhao
2 months
@docmilanfar The metric or reward function in academia encourages people to do that
0
0
5
@xuandongzhao
Xuandong Zhao
4 months
Here is our poster!
Tweet media one
0
2
5
@xuandongzhao
Xuandong Zhao
8 months
@WenhuChen One could interpret AI4Science to mean "AI for publishing papers in Science"
0
0
3
@xuandongzhao
Xuandong Zhao
1 year
Many of my research endeavors explore text and image watermark techniques. Looking forward to insightful discussions on responsible AI at #ICML2023 !
@verge
The Verge
1 year
Meta, Google, and OpenAI promise the White House they’ll develop AI responsibly
Tweet media one
68
33
178
0
0
5
@xuandongzhao
Xuandong Zhao
10 days
Finally, I was able to watch the video and truly admire the academic rigor of the experiment design and presentation by @ZeyuanAllenZhu I encourage everyone to study this tutorial. It would be great to see these findings applied to the next version of LLMs.
@ZeyuanAllenZhu
Zeyuan Allen-Zhu
11 days
YouTube video is now restored. As I predicted, AGI didn't arrive before Aug 20. PS: Part 2.2 paper is in its very final processing stage; we are one man down (due to a layoff) so please forgive us for just a little more time.
4
41
329
0
0
22
@xuandongzhao
Xuandong Zhao
1 month
Cool! The 2B model is much more accessible for those in academia! 🤣
@robdadashi
Robert Dadashi
1 month
Gemma 2 2B is here! Fantastic performance for size, it's great for research and applications. I am very proud of the progress our team made over the last few months!
Tweet media one
4
33
180
0
1
5
@xuandongzhao
Xuandong Zhao
5 months
@xwang_lk There is more analysis in this paper:
Tweet media one
1
0
5
@xuandongzhao
Xuandong Zhao
1 month
@srush_nlp Yes, while current LLMs aren't a serious threat to humans, there are some news articles discussing real cases of jailbreaks. 1. 2. 3.
1
0
5
@xuandongzhao
Xuandong Zhao
20 days
3️⃣ GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick Awesome @AlfonsoGerard_ will present: 📍 Poster: Convention Center A1 📅 Monday, August 12, 2024, 11:00 - 12:30
Tweet media one
1
3
4
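GumbelSoft builds on the Gumbel-max trick. As a rough illustration of that underlying trick (this is not the paper's scheme, which derives the noise from a keyed hash of the context rather than fresh randomness, so detection is possible):

```python
import math
import random

def gumbel_max_sample(probs, rng):
    # Gumbel-max trick: argmax_i (log p_i + g_i) with g_i ~ Gumbel(0, 1)
    # is an exact sample from the categorical distribution `probs`.
    scores = [math.log(p) - math.log(-math.log(rng.random())) for p in probs]
    return max(range(len(probs)), key=scores.__getitem__)

rng = random.Random(0)
probs = [0.6, 0.3, 0.1]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[gumbel_max_sample(probs, rng)] += 1
print([c / 10_000 for c in counts])  # roughly [0.6, 0.3, 0.1]
```

Because the sample is a deterministic function of the probabilities and the noise, a detector that can recompute the keyed noise can check whether observed tokens consistently "won" the argmax.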
@xuandongzhao
Xuandong Zhao
11 months
@YigitcanKaya1 @UCSB @giovanni_vigna Welcome to UCSB! We should plan a coffee chat sometime~
0
0
3
@xuandongzhao
Xuandong Zhao
15 days
Gemini and Grok seem to be on opposite ends in addressing safety/copyright issues🧐
0
0
4
@xuandongzhao
Xuandong Zhao
30 days
Does anyone know how to handle reviewers asking for comparisons with your follow-up works? Are there any guidelines for this? @NeurIPSConf #NeurIPS2024
1
0
4
@xuandongzhao
Xuandong Zhao
10 months
Remarkably, AI can now mimic actors' accents and translate into different languages. Reports suggest the method behind this is HeyGen: . An open-source version could be transformative. But, we must also prioritize tools to detect such manipulated videos.
@mranti
Michael Anti
10 months
Come see Mr. Zhao Benshan's authentic London-accent English.
227
418
2K
0
0
4
@xuandongzhao
Xuandong Zhao
2 years
Welcome to our Oral talk this afternoon 2:45-3:00pm at Columbia A!
@xuandongzhao
Xuandong Zhao
2 years
Happy to share that our paper "Provably Confidential Language Modelling" got accepted to #NAACL2022 ! with @yuxiangw_cs , @lileics We propose a method to train language generation models while protecting the confidential segments #NLProc
Tweet media one
0
1
11
0
0
4
@xuandongzhao
Xuandong Zhao
8 months
@kexun_zhang Creating datasets and running benchmarks are all you need
0
0
4
@xuandongzhao
Xuandong Zhao
7 months
When you read a really solid paper, you can't help but sincerely admire the authors.
0
0
3
@xuandongzhao
Xuandong Zhao
3 months
@xwang_lk @WilliamWangNLP I guess they may use an image API to process the video.
1
0
4
@xuandongzhao
Xuandong Zhao
4 months
Safety and profit often seem like a paradox in industry. However, AI safety, especially given the incredible capabilities of large models, is crucial for all of humanity. This is also one of the most important research goals in academia.
@janleike
Jan Leike
4 months
Yesterday was my last day as head of alignment, superalignment lead, and executive @OpenAI .
535
2K
12K
0
0
3
@xuandongzhao
Xuandong Zhao
1 month
Ethical data use must be the top priority in AI development. Membership inference attacks, especially for closed-source models, are essential tools for detecting violations. Discover more insights in our ICML paper, DE-COP:
@WIRED
WIRED
1 month
“It’s theft.” A WIRED investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Anthropic, Nvidia, Apple, and Salesforce to train AI.
9
35
75
0
0
4
@xuandongzhao
Xuandong Zhao
6 months
🚀 Sora's advancements are amazing, but such techniques blur the lines between reality and simulation. Watermarking is a promising approach! We have conducted several works to address this concern. Image watermark attack: #Sora #Watermark #AIGC
@OpenAI
OpenAI
7 months
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. Prompt: “Beautiful, snowy
10K
32K
138K
1
0
3
@xuandongzhao
Xuandong Zhao
10 months
More detailed government actions towards the safe and responsible use of AI.
Tweet media one
1
0
3
@xuandongzhao
Xuandong Zhao
1 month
Prof. Raj Reddy, Turing Award laureate, shared his insights on the “Perils of AI” at #AAPM2024 in Stanford
Tweet media one
Tweet media two
0
0
3
@xuandongzhao
Xuandong Zhao
1 year
After using both for a while, I've found Claude 2 to be slightly better than GPT-4. I've now set Claude 2 as my default model, aligning with recent findings that GPT-4's performance declines over time... #ChatGPT #Claude #OpenAI #Anthropic
@xuandongzhao
Xuandong Zhao
1 year
I've transitioned from my @OpenAI subscription plan to @poe_platform . While my monthly fee remains at $20, I now have access to additional LLMs like Claude and PaLM. However, one downside is the incompatibility with GPT4 plugins. #LLMs #AIGC
Tweet media one
2
1
15
0
0
3
@xuandongzhao
Xuandong Zhao
1 year
@GoogleDeepMind @googlecloud @Google Impressed by the new watermarking tool, but have you considered attacks like regeneration? Our research proves these can effectively remove any pixel-based invisible watermarks, regenerating images closely resembling the original.
0
0
3
@xuandongzhao
Xuandong Zhao
1 year
2. We live in a time when posting images online can be risky. Malicious users may misuse these images, violating copyright and privacy. Furthermore, generative AI like DALLE-2 and Imagen can generate highly realistic images which could mislead people. 🧑‍💻🎭
1
0
3
@xuandongzhao
Xuandong Zhao
20 days
Wishing everyone a fantastic #ACL2024 conference! ✨ Enjoy your time in Bangkok🇹🇭! @aclmeeting #NLP #LLMs
0
1
3
@xuandongzhao
Xuandong Zhao
25 days
@johnschulman2 Congrats to @AnthropicAI on getting a real legend! Wishing you all the best in this new chapter!
0
0
3
@xuandongzhao
Xuandong Zhao
2 months
Cool work about preference learning!
@WendaXu2
Wenda Xu is on the job market
2 months
Still playing around with offline DPO? Big mistake!🤯 We claim two things: 1) We should NOT use a static preference data 2) We should NOT use a fixed reference model Our on-policy BPO outperforms offline DPO: TL;DR (72.0%➡️89.5%), HH(82.2%➡️93.5%, 77.5%➡️97.7%) @ucsbNLP
Tweet media one
4
35
130
0
0
3
@xuandongzhao
Xuandong Zhao
10 months
@AnsongNi We have works that plant watermarks in the training data of models, e.g., to infer the model’s training set or otherwise influence the model’s output
0
0
3
@xuandongzhao
Xuandong Zhao
4 months
Curious about the impact of LLMs on academic writing and reviews? Check out our latest paper and code here:
@liang_weixin
Weixin Liang
4 months
🚀 Exciting news! Our code for the "Monitoring AI-Modified Content at Scale (ICML 2024)" and "Mapping the Increasing Use of LLMs in Scientific Papers" is now open-source. 🔥🔥🔥We've developed a simple and effective method for estimating the fraction of text in a large corpus
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
8
19
0
0
2
@xuandongzhao
Xuandong Zhao
8 months
This is all about the evaluation system. It's very difficult to objectively evaluate a researcher or a PhD student.
@beenwrekt
Ben Recht
9 months
Since we just wrapped up an AI megaconference, it felt like a good day to plead for fewer papers.
32
162
853
0
0
2
@xuandongzhao
Xuandong Zhao
1 year
4. We propose a framework using generative autoencoders as watermark attackers. The watermarked image is encoded to a latent code and then decoded to a reconstructed image, effectively erasing the watermark.
Tweet media one
1
0
2
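A cartoon of this encode-then-decode attack on a 1-D "image" (everything here is a stand-in: a real attack uses a learned generative autoencoder such as a VAE or diffusion model, not the moving-average reconstruction below, and real watermarks are not a fixed alternating pattern):

```python
import math

def embed(signal, eps=0.05):
    # Toy "invisible watermark": a tiny alternating high-frequency perturbation.
    return [x + eps * (1 if i % 2 == 0 else -1) for i, x in enumerate(signal)]

def detect(signal, original, eps=0.05):
    # Correlate the residual against the known watermark pattern.
    residual = [s - o for s, o in zip(signal, original)]
    return sum(r * eps * (1 if i % 2 == 0 else -1) for i, r in enumerate(residual))

def regenerate(signal, w=3):
    # Stand-in for a generative autoencoder: a lossy low-pass reconstruction
    # that keeps the smooth "content" but discards high-frequency detail.
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - w): i + w + 1]
        out.append(sum(window) / len(window))
    return out

original = [math.sin(i / 10) for i in range(200)]
marked = embed(original)
attacked = regenerate(marked)
print(detect(marked, original))    # strong watermark correlation
print(detect(attacked, original))  # watermark largely erased
```

The point of the toy: the reconstruction stays close to the original content, yet the high-frequency watermark signal that the detector relies on is mostly gone.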
@xuandongzhao
Xuandong Zhao
7 months
📄This weak-to-strong amplification aligns with concurrent works on empowering models with instruction following (Liu et al., 2024) and disentangling acquired knowledge (Mitchell et al., 2023). [7/8]
1
0
2
@xuandongzhao
Xuandong Zhao
10 months
@YangsiboHuang My concern is that manually designed perplexity scores may not be robust across diverse distributions. In my experience, zero-shot LLM text detection tools aren't robust across languages.
1
0
1
@xuandongzhao
Xuandong Zhao
5 months
1
0
2
@xuandongzhao
Xuandong Zhao
4 months
@WenhuChen For high-quality data like books, I don't think human writing speed can catch up with the consumption speed of LLMs.
1
0
2
@xuandongzhao
Xuandong Zhao
2 years
Amazing
@jdjkelly
josh
2 years
Google is done. Compare the quality of these responses (ChatGPT)
Tweet media one
Tweet media two
991
4K
27K
1
0
2
@xuandongzhao
Xuandong Zhao
1 year
5. Our evaluation demonstrates that generative autoencoders, especially diffusions, can remove more invisible watermarks than most existing attackers, while preserving image quality. This reveals vulnerabilities in existing watermark schemes. 📊🔓
Tweet media one
1
0
2
@xuandongzhao
Xuandong Zhao
1 year
3. Major tech companies like Google are developing tools to trace image origins or identify synthetically generated content. Invisible watermarks are one such tool used to embed secret messages detectable only by the owner. 🕵️‍♂️🖼️
1
0
2
@xuandongzhao
Xuandong Zhao
10 months
Amazing!
0
1
2