Florian Tramèr @florian_tramer profile

Florian Tramèr

@florian_tramer

Followers

4,673

Following

210

Media

87

Statuses

830

Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning

https://t.co/HtAF2pGLSa

Zürich

Joined October 2019

Don't wanna be here? Send us removal request.

Explore tweets Explore followers Explore following

Explore trending content on Musk Viewer

The DNC • 880401 Tweets

Joe Biden • 601561 Tweets

Hillary • 286122 Tweets

#loveislandusa • 145983 Tweets

#DNCConvention2024 • 142791 Tweets

Jill • 108058 Tweets

Star Wars • 93026 Tweets

Andrea • 84863 Tweets

ニンテンドーミュージアム • 79766 Tweets

Jana • 76841 Tweets

Steve Kerr • 54979 Tweets

The Acolyte • 51442 Tweets

#IRIAMメンテ中のフォロー祭り • 49827 Tweets

Jasmine Crockett • 49415 Tweets

Leah • 48734 Tweets

Warnock • 42382 Tweets

Serena • 41038 Tweets

Kendall • 40159 Tweets

Jamie Raskin • 38393 Tweets

#DNC2024CHICAGO • 37961 Tweets

#ファミマの増量チョコ • 29876 Tweets

Ashley Biden • 29431 Tweets

राजीव गांधी • 28982 Tweets

Kaylor • 23938 Tweets

はれときどきぶた • 19203 Tweets

Charlottesville • 17716 Tweets

ALEXA KNT OnSHOWTIME • 16475 Tweets

Angry Joe • 15597 Tweets

CSPAN • 14963 Tweets

초코우유 • 13276 Tweets

令和の米騒動 • 13249 Tweets

FELIZ CUMPLE XCRY • 12542 Tweets

ケンタッキー • 11599 Tweets

ドラゴンガンダム • 10944 Tweets

帯状疱疹

ポップン新作

あかりん離婚

マンシー

アコライト

四郎くん

ららアリーナ

キンプリちゃん

イシノサンデー

三郎くん

Andor

요괴워치

आधुनिक भारत

えんぴつの天ぷら

Megyn

キンプリツアー

Last Seen Profiles

@QPollsandnews

@BabuChevrox

@hvnsvkenthu

@speakingofvee__

@MaryVivianRN

@mirari863376731

@XLutha

@wrightca

@msioutis

@poooce

@anewyou_jp

@ProgrammingSpp

@MaurcioFornasi1

@robsonDosoguian

@razz5123

@Gu1nasbr

@Brzinnrare

@F_T_69

@EstolaBart

@rizalFCIM

Florian Tramèr

@florian_tramer

1 year

I don't understand how anyone can believe LLM+plugins won't be a security disaster. Take a simple app: "GPT4, send emails to people I'm meeting today to say I'm sick" Sounds useful! For this, GPT4 needs the ability to read your calendar and send emails. What could go wrong..?

99

278

2K

Florian Tramèr

@florian_tramer

1 year

Well,what if someone sends you a calendar invite containing instructions for GPT4 to read your weekly calendar and email that to the attacker? That's within the model's capabilities, so it could do it. Suddenly, any *data* on your machine is potentially executable. No thanks...

23

47

709

Florian Tramèr

@florian_tramer

3 years

Life update: After 5 amazing years of PhD at @Stanford with @danboneh I'm back in Switzerland🇨🇭 I'm super excited to join @CSatETH as a faculty in Fall'22 and to spend the coming year at @GoogleAI working with @aterzis & his team on ML security Hit me up if you're ever in Zürich!

39

13

613

Florian Tramèr

@florian_tramer

5 months

If you download a pretrained model you have to trust that the developer did not backdoor it. We know backdoors break model integrity. But what about privacy? With Shanglun Feng we introduce 𝐩𝐫𝐢𝐯𝐚𝐜𝐲 𝐛𝐚𝐜𝐤𝐝𝐨𝐨𝐫𝐬: pretrained models that steal your finetuning data! 🧵

7

75

472

Florian Tramèr

@florian_tramer

1 year

Author order on academic papers is important! My Google friends and I spent lots of time thinking about this critical issue (the scores of our ICML submissions show this is time well spent) We distill our findings for the community here: Comments welcome!

10

61

401

Florian Tramèr

@florian_tramer

11 months

When analyzing ML security and privacy you need to study 𝐬𝐲𝐬𝐭𝐞𝐦𝐬, not just models! Our new paper shows that privacy is way worse when models are deployed in systems that use data cleaners, output filters, etc. Paper: Blog:

4

69

363

Florian Tramèr

@florian_tramer

3 years

Favorite review to date: "The results are *impressive and practical*, but are obtained by combining six techniques/insights that are cool but each incremental on their own. Strong Reject!"

17

11

305

Florian Tramèr

@florian_tramer

1 year

Nicholas Carlini made a fun game where you forecast GPT-4's ability to solve various tasks. It's surprisingly hard (you get to see how others did on average too) Goes to show that confidently predicting LLM capabilities (in 0-shot mode) is tricky!

11

54

278

Florian Tramèr

@florian_tramer

1 year

Paper: we do A Reviewer: why do you do B? B is bad. Reject Authors: We thank the reviewer for the insightful comments. We don't do B. We do A. Reviewer: My mistake! I now understand you do C. C is good. I raise my score to accept We don't do C either... But thanks I guess!

4

14

266

Florian Tramèr

@florian_tramer

4 years

Current algorithms for training neural nets with differential privacy greatly hurt model accuracy. Can we do better? Yes! With @danboneh we show how to get better private models by...not using deep learning! Paper: Code:

7

40

262

Florian Tramèr

@florian_tramer

1 year

Have you downloaded a large training set (LAION, CC, Wikipedia, etc) in the past to train a machine learning model? If so, you were vulnerable to an extremely simple and cheap poisoning attack that could have manipulated ~0.02%-0.8% of your dataset 🧵👇

6

63

251

Florian Tramèr

@florian_tramer

4 years

A reviewer called my ICML submission "evidently arrogant". How do I even recover from that?

18

7

233

Florian Tramèr

@florian_tramer

3 years

I'm excited at the prospect of creating a great new research group focused on ML Security & Privacy at ETHZ next Fall. I have open positions for PhD students and Postdocs, so if you're interested please reach out! More info here:

Florian Tramèr

@florian_tramer

3 years

Life update: After 5 amazing years of PhD at @Stanford with @danboneh I'm back in Switzerland🇨🇭 I'm super excited to join @CSatETH as a faculty in Fall'22 and to spend the coming year at @GoogleAI working with @aterzis & his team on ML security Hit me up if you're ever in Zürich!

39

13

613

7

36

204

Florian Tramèr

@florian_tramer

2 years

I'm super excited to finally start @CSatETH after a wonderful year at Google. If you're interested in joining my first great "batch" of PhD students @edoardo_debe @dpaleka as we build our group on ML privacy & security, please apply for a PhD or postdoc:

ETH CS Department

@CSatETH

2 years

We are pleased to welcome @florian_tramer in his new role as Tenure Track Assistant Professor. He heads the Computer Security and Privacy Group at the Institute for Information Security. Read more: @ETH_en @EPFL_en @Stanford @Google_CH #ML #security

1

2

47

10

19

190

Florian Tramèr

@florian_tramer

3 years

I'm thrilled to receive the best paper award at the #ICML2021 workshop on Prospects and Perils of Adversarial ML paper: video: poster: 11:30-12:30 EST TLDR: detecting adversarial examples isn't much easier than classifying them

2

14

181

Florian Tramèr

@florian_tramer

2 years

We introduce 💉Truth Serums💉, a new threat for ML models We show an attacker can poison a model's training set to significantly leak private data of other users with @rzshokri @10BSanJoaquinA @sanghyun_hong Hoang Le, Matthew Jagielski & Nicholas Carlini

3

35

159

Florian Tramèr

@florian_tramer

3 months

Everyone working on datasets and benchmarks

2

4

140

Florian Tramèr

@florian_tramer

9 months

I'm looking for PhD students or postdocs to sign a letter supporting me in case I get fired as CEO of my lab. If that sounds fun, consider applying! (we might also do cool research) If you miss the AI Center deadline, you can apply to my group directly:

SPY Lab

spylab.ai

ETH AI Center

@ETH_AI_Center

9 months

⏰ Only 28 hours left before application deadline ⏰ 🚀 Apply now to become an ETH AI Center #PhD or #Postdoc , and work with our renowned researchers on impactful topics! Apply by Nov 22 👉

1

3

16

9

15

133

Florian Tramèr

@florian_tramer

2 months

🔥 We're releasing the strongest membership inference attack for foundation models! 🔥 Our attack applies to LLMs, vLMs, CLIP, Diffusion models and is SOTA on all🥇 Not only is our attack a magnificent breakthrough, it is also *magic*: we don't look at the ML model at all🪄 🧵👇

3

24

125

Florian Tramèr

@florian_tramer

2 years

There are active discussions whether generative AI models like Stable Diffusion create "new" images or merely "copy and mix" pieces and styles of their training data. In a new paper, we show that sometimes Stable Diffusion and Google's Imagen simply copy *entire images*!

Eric Wallace

@Eric_Wallace_

2 years

Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time. Paper: 👇[1/9]

171

2K

10K

11

15

123

Florian Tramèr

@florian_tramer

2 months

Some thoughts by Nicholas Carlini and me about Glaze, and how the actions of it's developers might not be in the best interest for the security of their user base:

Glazing over security | SPY Lab

We discuss the security of the Glaze tool, and how the authors' actions may not be in the best interest of their users.

spylab.ai

4

26

120

Florian Tramèr

@florian_tramer

4 years

Does GPT-2 know your phone number? With @Eric_Wallace_ , @mcjagielski , @adversariel , we wrote a blog post on problematic data memorization in large language models, and the potential implications for privacy and copyright law. blog:

2

41

113

Florian Tramèr

@florian_tramer

1 month

I'm thrilled to start Invariant Labs with amazing colleagues from @CSatETH . Our ambition is to build AI agents that solve challenging tasks securely and reliably. Since most of my research has focused on *breaking* ML I'm excited to apply this knowledge to build better systems!

Invariant Labs

@InvariantLabsAI

1 month

We are delighted to announce Invariant Labs ()! Our mission is to make AI agents secure and reliable! It is founded by @mvechev , @mbalunovic , @lbeurerkellner , @marc_r_fischer , @florian_tramer and builds on years of experience in industry and academia.

4

7

65

1

7

116

Florian Tramèr

@florian_tramer

28 days

Thrilled to receive an award for our position paper on private learning with public pretraining with Nicholas Carlini & @thegautamkamath I'm presenting this work today @icmlconf at 11am (hall A1) Come say hi at the talk or poster session if you're interested in private learning!

Russ Salakhutdinov

@rsalakhu

28 days

ICML 2024: Best Paper Awards: Florian Tramèr; Gautam Kamath; Nicholas Carlini: Considerations for Differentially Private Learning with Large-Scale Public Pretraining Akbir Khan; John Hughes; Dan Valentine; Laura Ruis; Kshitij Sachan; Ansh Radhakrishnan; Edward Grefenstette;

1

13

56

9

3

110

Florian Tramèr

@florian_tramer

9 months

*laughs in machine learning*

Nirmalya Kajuri

@Kaju_Nut

9 months

Theoretical physicists: how do you feel about the pressure on undergraduates to publish in order to secure grad school admissions? I feel that it is sub-optimal for the development of many students to chase papers at the undergrad stage. There are undergrads who are already

49

39

433

1

5

108

Florian Tramèr

@florian_tramer

2 years

Two simple defenses for private/robust ML: one works, the other does not... #1 : removing training data at risk of privacy attacks makes other data *more* vulnerable: #2 : SOTA certified l2 robustness without training *any* models:

2

17

106

Florian Tramèr

@florian_tramer

1 year

I've been teaching my daughter to recognize cars on the walk back and forth from daycare. It seemed to work... But she now points at *anything* (trees, people, ...) and enthusiastically shouts "AUTO" My daughter has OVERFIT 😱

6

3

104

Florian Tramèr

@florian_tramer

1 year

Ridiculously poor organization skills from @UEFAcom to schedule a champions league semi-final in the middle of a NeurIPS deadline

3

6

106

Florian Tramèr

@florian_tramer

9 months

Our @satml_conf LLM Capture-the-flag is now live! Can you find successful defenses and attacks for prompt injection?

3

29

105

Florian Tramèr

@florian_tramer

3 years

The @IEEESSP #SP22 review template is a very nice illustration of the security field's prior when judging papers.

3

5

103

Florian Tramèr

@florian_tramer

1 year

@Ceetar Just iterate over meetings and send emails is indeed fine. But you have *zero* guarantee that this is what the llm will do, or that it won't suddenly do something different based on the data it ingests along the way.

2

99

Florian Tramèr

@florian_tramer

2 years

We looked at Stable Diffusion's safety filter (our paper got blocked by arXiv's own safety filter...) The filter only uses "hashes" of sensitive topics; we recover these with a dictionary attack We find it filters for 17 nudity concepts & tunes the severity for child depictions

Daniel Paleka

@dpaleka

2 years

Stable Diffusion has a safety filter blocking “harmful” images by default. The filter is obfuscated -- how does it work? We reverse engineer the hidden sauce! Joint work @Javi_Rando , @davlindner , @ohlennart , @florian_tramer : "Red-Teaming the Stable Diffusion Safety Filter" 🧵

9

65

400

2

15

97

Florian Tramèr

@florian_tramer

1 year

After nearly 1 year @ETH_en , our Secure and Private AI (SPY) Lab finally has a website, and a first blog post! I wrote some thoughts on what adversarial examples have in common with attacks on language models like ChatGPT. TLDR: surprisingly not much!

1

16

90

Florian Tramèr

@florian_tramer

1 year

Very excited about this work on evaluating ML models when ground truth is unknown (eg when models are superhuman, or simply when humans are bad at the task) We argue that when accuracy of individual decisions is hard to assess, we should look for "logic bugs" across decisions

Daniel Paleka

@dpaleka

1 year

How to evaluate superhuman models without ground truth? How do we know if the model is wrong or lying, if we can't know the correct answer? Test whether the AI's outputs paint a consistent picture of the world! w/ @LukasFluri_ @florian_tramer (1/14)

8

32

189

4

8

88

Florian Tramèr

@florian_tramer

1 year

@moo9000 Yeah and a zero-day in windows is a huge issue... Now imagine an "operating system" (the LLM) with infinite zero-days (you just have to find the right way to ask nicely) and no clear mitigation path

1

2

80

Florian Tramèr

@florian_tramer

3 years

Web-scale facial recognition is getting scarily good (see ) Popular tools like (500'000+ downloads!) fight back using adversarial examples. With @evanidixit , we argue that this is hopeless (and dangerous)!

4

19

78

Florian Tramèr

@florian_tramer

2 months

Popular tools aim to protect artists from generative AI, by adding adversarial noise to their art. These tools are used by millions, but 𝐝𝐨 𝐭𝐡𝐞𝐲 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐰𝐨𝐫𝐤? Here is a generation that mimics E. Munch with protections vs. without. Can you guess which is which?

6

18

77

Florian Tramèr

@florian_tramer

1 year

@Ceetar The LLM gets to "decide" which APIs to call (and their inputs) and that's just fundamentally different and less secure than typical software. You simply have no guarantee that what you tell the LLM to do (how to chain APIs) won't get overwritten by API results.

5

1

73

Florian Tramèr

@florian_tramer

1 year

@laidacviet It's not executing code per se. It's just the LLM calling other APIs. That's what it's supposed to do. But with even a tiny bit of functionality (as in the example in my tweet), chaining APIs the wrong way is a huge security of privacy risk

4

1

72

Florian Tramèr

@florian_tramer

2 years

I'll be virtually "presenting" (aka listening to my pre-recorded video) thus later today. TLDR: if you want to be robust to adversarial examples, the ability to detect attacks (selective classification) won't help!

1

8

69

Florian Tramèr

@florian_tramer

1 year

Paraphrasing my PhD advisor @danboneh : "security researchers will never be out of a job" Why? Advances in security always lag behind capabilities ("move fast and break things"). So for better or worse, ML security/privacy/safety is a great area to work in right now

Dan Hendrycks

@DanHendrycks

1 year

Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵

5

72

307

0

11

67

Florian Tramèr

@florian_tramer

10 months

I think we have to parse this as: "We're launching a team to forecast and protect against [AI risk led by @aleks_madry ]" Aleks, which side are you on??? 🤣

Sam Altman

@sama

10 months

we are launching a new preparedness team to evaluate, forecast, and protect against AI risk led by @aleks_madry . we aim to set a new high-water mark for quantitative, evidence-based work.

124

146

1K

3

4

66

Florian Tramèr

@florian_tramer

1 year

Anecdotal evidence: prompt injection is much easier when not trying to make the model do "morally bad" things (like making a bomb). In this example, a single sentence in the properly escaped data field causes the model to deviate from its instructions (also works in GPT-4 & Bing)

4

6

64

Florian Tramèr

@florian_tramer

1 year

I see many complaints about neurips reviews (count me in...) but also a lot of anger directed at new/junior reviewers. Is there evidence these are worse? My experience on workshop committees is that junior reviewers put in above avg effort (maybe a bit more critical though)

8

2

62

Florian Tramèr

@florian_tramer

3 years

@jhasomesh Write a paper claiming idea Y as novel and publish it. Wait for the authors of X to send you an angry email.

1

54

Florian Tramèr

@florian_tramer

1 year

Proud of Edoardo's first PhD project! We look at black-box adversarial examples, and ask if existing attacks optimize for the right metric (spoiler alert: no). We then set out to design attacks that reflect the true costs of attacking real models, and provide a new benchmark.

Edoardo Debenedetti

@edoardo_debe

1 year

Do you want to attack a black-box ML model, e.g. to post inappropriate content on Twitter without being banned? Would current query-based attacks work? No! Current attacks optimize for the wrong metric, and need to be adapted to work in the real world! 1/9

3

15

90

1

4

52

Florian Tramèr

@florian_tramer

9 months

Tired of having to track arxiv, twitter, discord, etc to learn of the newest LLM vulnerabilities? @mbalunovic , @lbeurerkellner , @marc_r_fischer and @a_yukh are too! So they built this cool project (w. advice from @mvechev & me) to track it all in one place. Contributions welcome!

LVE Project

@projectlve

9 months

We are super excited to announce LVE 🎉 With LVEs we track LLM vulnerabilities and exposures in an open-source community-first approach. Announcement: 🧵 A thread on the LVE project and why it matters:

2

6

40

0

7

51

Florian Tramèr

@florian_tramer

4 years

Interested in finding out how to break "provably secure" defenses against adversarial examples? Come chat at #ICML2020 tomorrow (8AM & 8PM PDT): w. @JensBehrmann , Nicholas Carlini, @NicolasPapernot , @jh_jacobsen

0

8

51

Florian Tramèr

@florian_tramer

10 months

I'll pushback on this :) When tools that promise to protect user data are deployed, publicized, and get traction, then it isn't just "research as usual" when they are later broken. Data protection is a one-way street! Users whose data failed to be protected don't get a 2nd shot.

MMitchell

@mmitchell_ai

10 months

So the next time you see the "wait we can break this" pushback, just remember that's how research *works*. It doesn't mean the project should end. It means the opposite: This is a journey. A growing, massive community (you too!) are participating. (7/n)

1

2

45

3

4

50

Florian Tramèr

@florian_tramer

5 months

Our main attack, inspired by data stealing attacks in federated learning, adds backdoored weights whose gradients encode a training input. If a victim downloads this backdoored model and finetunes it on sensitive data, the new model's weights directly encode some of this data!

2

5

47

Florian Tramèr

@florian_tramer

1 year

Oh wow, if only we could have predicted that plugins were going to be a security nightmare! I wonder if some people still think this will be "trivial" to fix...

Johann Rehberger

@wunderwuzzi23

1 year

👉 Let ChatGPT visit a website and have your email stolen. Plugins, Prompt Injection and Cross Plug-in Request Forgery. Not sharing “shell code” but… 🤯 Why no human in the loop? @openai Would mitigate the CPRF at least #OPENAI #ChatGPT #plugins #infosec #ai #humanintheloop

36

274

1K

1

14

47

Florian Tramèr

@florian_tramer

3 years

Best example I've seen yet that worrying about adversarial examples for self driving cars is ridiculously overkill. The invariant "stop at intersection if you see a stop sign" is way too simplistic to begin with. See @catherineols 's great post on this:

Unsolved research problems vs. real-world threat models

Adversarial examples are worth studying, but most of the justifications for why exactly they’re worrisome strike me as overly literal

medium.com

Andy Weedman

@andyweedman

3 years

⁦ @karpathy ⁩ ⁦ @elonmusk ⁩ ⁦ @DirtyTesla ⁩ here is a fun edge case. My car kept slamming on the brakes in this area with no stop sign. After a few drives I noticed the billboard.

75

188

4K

2

8

46

Florian Tramèr

@florian_tramer

1 year

@imjliao Yes, that was @random_walker . The simple truth is we don't know how to prevent these attacks fully

1

0

46

Florian Tramèr

@florian_tramer

4 years

At 12pm PST I'll present our @USENIXSecurity paper on side-channel attacks on anonymous transactions w. @danboneh , @kennyog We show how to de-anonymize transactions in Zcash @ElectricCoinCo & @monero (attacks disclosed & fixed last year) video & slides:

0

14

45

Florian Tramèr

@florian_tramer

2 years

Who could have seen that coming? 🤔

Andrew Hundt 😷💉x5

@athundt

2 years

@mmitchell_ai Here is proof GitHub CoPilot, which I believe is based on GPT-3, is both picking up on AND REPRODUCING my personally identifying information (PII) on the machine of another researcher in precisely the same overall research area as me: Robotics with AI.

9

174

496

0

5

44

Florian Tramèr

@florian_tramer

26 days

IC[M̵L̵ | ELAND]

0

44

Florian Tramèr

@florian_tramer

3 months

This defense's evaluation is so clearly and obviously wrong that I think it didn't even merit a break. It's weird that no S&P reviewers caught it. - the defense claims robustness to attacks that change 100% of each pixel - the defense accuracy sometimes goes *up* under attack...

Brendan Dolan-Gavitt

@moyix

3 months

Another entry in a long-running series where Nicholas Carlini breaks ML defenses published at top security conferences with as little effort as possible (in this case a one line bugfix in the eval)

16

91

689

1

2

43

Florian Tramèr

@florian_tramer

4 months

@sweis you could try the trick we introduced here: Ask the model to repeat a chunk of code you think might be from the leak, and then do the same thing with the memorization filter enabled to see if it gets filtered out.

Privacy Side Channels in Machine Learning Systems

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum. Yet, in reality, these models are part of larger systems that include components for...

arxiv.org

0

1

42

Florian Tramèr

@florian_tramer

11 months

Another way to limit memorization is to prevent models from outputting verbatim training data. Copilot and ChatGPT do this. But this leaks info about the data! If the system outputs X, then we know X is 𝐧𝐨𝐭 in the training set. We turn this into a perfect membership attack.

2

5

41

Florian Tramèr

@florian_tramer

1 year

@DimitrisPapail Our paper comes first alphabetically so they're the ones that copied us

0

39

Florian Tramèr

@florian_tramer

3 months

0

6

40

Florian Tramèr

@florian_tramer

5 years

Cool new @NeurIPSConf workshop: "New in ML" offers mentorship to authors who have not yet published at NeurIPS: Submit by Oct 15 for reviews and possibly mentorship by leaders in the field, Joshua Bengio, @hugo_larochelle , @_beenkim , @tdietterich and more

1

14

39

Florian Tramèr

@florian_tramer

2 years

Heeey not fair! I was told my paper was accepted.

Csaba Szepesvari

@CsabaSzepesvari

2 years

@bremen79 That was an accident and the outcomes are not final. I think they are not visible anymore. A couple of more hours!! Thank you!

4

0

29

2

1

39

Florian Tramèr

@florian_tramer

4 years

My ScAINet talk on recurring issues with adversarial examples evaluations is online work w. @wielandbr , @aleks_madry ,N. Carlini I call for broken defenses to be publicly amended (same as fixing wrong proofs in theory papers) Happy to hear thoughts on this!

2

37

Florian Tramèr

@florian_tramer

8 months

I'm talking at NeurIPS workshops today about: - privacy side-channels in ML systems at 10:30am CT () - poisoning RLHF at 11:15am CT () I'm sad I couldn't make it to New Orleans this year... I hope the virtual talks will still be fun

1

37

Florian Tramèr

@florian_tramer

9 months

Javi did some cool work on poisoning RLHF (his master thesis!) We show RLHF enables powerful backdoors: a secret "sudo" trigger that jailbreaks the model. Luckily, this requires poisoning lots of data. We also launched a competition to find the backdoors

GitHub - ethz-spylab/rlhf_trojan_competition: Finding trojans in aligned LLMs. Official repository...

Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. - ethz-spylab/rlhf_trojan_competition

github.com

Javier Rando

@javirandor

9 months

🧵 Can data poisoning and RLHF be combined to unlock a universal jailbreak backdoor in LLMs? Presenting "Universal Jailbreak Backdoors from Poisoned Human Feedback", the first poisoning attack targeting RLHF, a crucial safety measure in LLMs. 📖 Paper:

4

43

167

1

4

37

Florian Tramèr

@florian_tramer

3 years

@iclr_conf what's the point of enforcing the same 9page limit for camera-ready papers as for submissions? Including reviewer comments + author block, every paper likely ends up at >9.5 pages. Can't imagine how many collective hours will be wasted trimming down that content...

1

0

38

Florian Tramèr

@florian_tramer

4 years

@lreyzin I'd much rather have 100 famous professors put their names on 100 papers they didn't contribute to, than have one undergrad author removed from a paper in order to pad someone's h-frac score

1

0

37

Florian Tramèr

@florian_tramer

5 months

Paper: Code: Note: privacy backdoors were concurrently introduced in these cool papers, but with a bit of a different focus: It's nice to see this new threat covered in depth!

GitHub - ShanglunFengatETHZ/PrivacyBackdoor: Privacy backdoors

Privacy backdoors. Contribute to ShanglunFengatETHZ/PrivacyBackdoor development by creating an account on GitHub.

github.com

2

5

35

Florian Tramèr

@florian_tramer

3 years

Interested in looking for memorized training data in GPT-2 or other LMs? I've released some code to get started here: And a list of ~50 weird things we extracted from GPT-2 here: If you find other fun stuff, please let me know!

GitHub - ftramer/LM_Memorization: Training data extraction on GPT-2

Training data extraction on GPT-2. Contribute to ftramer/LM_Memorization development by creating an account on GitHub.

github.com

Florian Tramèr

@florian_tramer

4 years

Does GPT-2 know your phone number? With @Eric_Wallace_ , @mcjagielski , @adversariel , we wrote a blog post on problematic data memorization in large language models, and the potential implications for privacy and copyright law. blog:

2

41

113

0

10

33

Florian Tramèr

@florian_tramer

2 years

My job offer from @CSatETH went into gmail spam 🙃

Andrej Karpathy

@karpathy

2 years

Reminder to check your gmail Spam folder once in a while. The quality of their spam detection has decreased lately (I think?) - a number of legitimate even important emails seem to go there now, and a lot of emails from friends get a scary warning, am asked to confirm "Look Safe"

64

48

866

1

0

33

Florian Tramèr

@florian_tramer

2 years

Is there an app with a worse login experience than @SlackHQ ? I lost all my workspaces after the app crashed. After signing back in with 4 different emails, I still couldn't recover them all. Why can't I have *1* account with all my workspaces that syncs correctly across devices?

0

32

Florian Tramèr

@florian_tramer

1 year

Question for cryptographers: why does the entire field use this confusing and pedantic (also known as "technically correct") notation? I get the technical reason (the algo should be poly time in its input length) but is making this explicit really useful in any context?

9

3

31

Florian Tramèr

@florian_tramer

9 months

Vulnerability disclosures may become common in ML. So ML conferences may need to set standards. Eg security confs allow submitting during the disclosure period (reviewers & authors keep the paper secret up to the conf) Not sure this would work in ML especially with openreview...

Katherine Lee

@katherine1ee

9 months

Responsible disclosure: We discovered this exploit in July, informed OpenAI Aug 30, and we’re releasing this today after the standard 90 day disclosure period.

6

23

665

1

4

32

Florian Tramèr

@florian_tramer

1 year

I'll be at ICML from Wednesday to Saturday. Please reach out if you want to chat about ML {security | privacy | safety}!

1

2

32

Florian Tramèr

@florian_tramer

1 year

So OpenAI thinks that: 1) we must be super-duper careful when designing powerful AI, to prevent extinction risks. We need new regulation to treat this akin to nuclear risks (but trust us, we've got this!) 2) wouldn't it be fun to beta-test our models with "friends and family"

4

3

31

Florian Tramèr

@florian_tramer

5 years

We aim for this paper to give tutorial-like illustrations of how to design an adaptive attack on an adversarial examples defense. Each section describes our *full* approach for each defense: from our initial (possibly failed) hypotheses and experiments to a final attack.

Wieland Brendel

@wielandbr

5 years

We broke 13 recent peer-reviewed defenses against adversarial attacks. Most defenses released code + weights & use adaptive attacks! But adaptive evaluations are still incomplete & we analyse how to improve. w/ @florian_tramer Nicholas Carlini @aleks_madry

3

33

150

2

8

31

Florian Tramèr

@florian_tramer

4 months

Together with @AlinaMOprea we invite nominations for the 2024 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies! @PET_Symposium Please nominate your favorite privacy papers from the past 2 years by **May 10** Info and rules:

0

13

31

Florian Tramèr

@florian_tramer

5 years

@kennyog @danboneh Thanks to @Monero for the very positive response and prompt reaction (you can follow the disclosure process here: ) Also, thanks for the generous bounty! #my_first_crypto

Monero disclosed on HackerOne: Exploiting Network and Timing...

See https://crypto.stanford.edu/timings/ for a summary and a link to our technical report on these vulnerabilities.

hackerone.com

0

9

29

Florian Tramèr

@florian_tramer

2 years

What a NeurIPS rejection does to a person...

Roger Federer

@rogerfederer

2 years

To my tennis family and beyond, With Love, Roger

26K

136K

739K

0

3

29

Florian Tramèr

@florian_tramer

5 years

This is my favorite result from a recent paper with @JensBehrmann , Nicholas Carlini, @NicolasPapernot , @jh_jacobsen Models with increased robustness to adversarial examples ignore some semantic features, and are vulnerable to a "reverse" attack

Fundamental Tradeoffs between Invariance and Sensitivity to...

Adversarial examples are malicious inputs crafted to induce misclassification. Commonly studied sensitivity-based adversarial examples introduce semantically-small changes to an input that result...

arxiv.org

jörn jacobsen

@jh_jacobsen

5 years

We also show that increased robustness to epsilon perturbations leads models to ignore important features. We alter images semantically *within* norm-balls and show "robust" models fail on these invariance-attacks while undefended and less robust models do much better

1

0

11

0

10

29

Florian Tramèr

@florian_tramer

1 year

Any recommendations for a regularizer suitable for a 1y old neural network?

4

0

29

Florian Tramèr

@florian_tramer

4 months

Michael and Jie did an amazing job on their first PhD project, by finding and fixing common pitfalls in empirical ML privacy evaluations. It turns out, if you evaluate things properly, DP-SGD is also the best *heuristic* defense when you instantiate it with large epsilon values.

Michael Aerni

@AerniMichael

4 months

Heuristic privacy defenses claim to outperform DP-SGD in real-world settings. With no guarantees, can we trust them? We find that existing evaluations can underestimate privacy leakage by orders of magnitude! Surprisingly, high-accuracy DP-SGD (ϵ >> 1000) still wins. 🧵

2

7

42

0

4

29

Florian Tramèr

@florian_tramer

1 year

A cute take on university rankings. How amazing if institutions were optimizing for this instead... Rough point of comparison: PhDs at @ETH_en (in CS) take home ~2x as much as the leader in the US list. (I don't have a living cost estimate for Zurich, but likely similar to CA)

Tony Zhang

@WhiterMeerkat

1 year

Fellow PhDs! Let’s make our departments pay us a living wage!

6

59

179

2

26

Florian Tramèr

@florian_tramer

28 days

And "doubly-thrilled" that our paper on stealing part of chatgpt also got awarded! (Joint with my student @dpaleka and many others) The paper will be presented on Wednesday at 16:30 (hall A2), and in poster session 4.

0

1

27

Florian Tramèr

@florian_tramer

2 years

@aleks_madry @thegautamkamath @TheDailyShow @Trevornoah @OpenAI @miramurati @MIT @sanghyun_hong No need to rebuild a diffusion model, just run it twice! (+ a little face blur) Now, one might make attacks robust to this, or more destructive. But this is the crux of our paper linked by Gautam: you get 1 attack attempt to fool all (future) defenses. I think this is hard.

2

1

26

Florian Tramèr

@florian_tramer

2 months

I love doing nothing and still getting SOTA

Clément Canonne

@ccanonne_

2 months

Fancy Membership Inference Attacks vs. "Hey, We Just Did Nothing" Ft. @florian_tramer

2

1

30

0

4

26

Florian Tramèr

@florian_tramer

2 years

Looking forward to this! I changed my talk's title slightly: "𝐖𝐡𝐲 𝐲𝐨𝐮 𝐬𝐡𝐨𝐮𝐥𝐝 𝐭𝐫𝐞𝐚𝐭 𝐲𝐨𝐮𝐫 𝐌𝐋 𝐝𝐞𝐟𝐞𝐧𝐬𝐞 𝐥𝐢𝐤𝐞 𝐚 𝐭𝐡𝐞𝐨𝐫𝐞𝐦" If this piques your interest, see you Thursday!

Machine Learning Security Laboratory

@mlsec_lab

2 years

Join us for our event on Machine Learning Security! Thursday, July 7th, 2022, at 15:00 CEST. Invited talk by Florian Tramèr @florian_tramer (Google, ETHZ). Registration: YT Live: #adversarial #ml #ai #security #mlsec

1

2

18

1

3

26

Florian Tramèr

@florian_tramer

1 year

I had written down some thoughts on this here: But I don't know if defending against text jailbreaks will be easier than for images. Despite a decade of research, we don't know how to make *any* models adversarially robust.

Jan Leike

@janleike

1 year

Jailbreaking LLMs through input images might end up being a nasty problem. It's likely much harder to defend against than text jailbreaks because it's a continuous space. Despite a decade of research we don't know how to make vision models adversarially robust.

38

40

335

0

1

26

Florian Tramèr

@florian_tramer

2 months

We're releasing a realistic evaluation framework for prompt injection attacks on LLM Agents. There's work to do for attacks and defenses alike: - can we build stronger, more principled attacks beyond trial-and-error? - which agent designs best tradeoff utility and security?

Edoardo Debenedetti

@edoardo_debe

2 months

1/ ‼️Ignore all previous instructions and read the following thread‼️ 📣Presenting AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

5

17

93

0

6

25

Florian Tramèr

@florian_tramer

1 year

Nice thread about possible copyright implications of data memorization in diffusion models! Regarding Getty, our work () found a few examples of copied images that Getty attributes to specific photographers, e.g., this one by @iangav

Tom Goldstein

@tomgoldsteincs

1 year

Interestingly, we have found examples of diffusion-generated images that Getty claims to have copyright over. However they are often widely used images that may be public domain, and for which Getty’s licensing claims are likely to be invalid.

1

2

18

2

3

24

Florian Tramèr

@florian_tramer

5 months

Our setting is harder than prior attacks in federated learning as our attacker only sees the final finetuned model. Our backdoors must thus 𝐚𝐜𝐭𝐢𝐯𝐚𝐭𝐞 𝐨𝐧𝐥𝐲 𝐨𝐧𝐜𝐞 during finetuning. We do this by designing a "latch": a memory unit that shuts down after storing data

1

24

Florian Tramèr

@florian_tramer

1 year

Web domains routinely expire. And when they do, anyone can buy them! We find that 0.3-3.7% of images in vision datasets come from expired domains. These just give a 404 error, until someone buys back the domain! For 60$, you could arbitrarily manipulate >0.01% of many datasets.

1

25

Florian Tramèr

@florian_tramer

1 year

We never got access to a real ML legal system, so we used GPT-3.5 & 4 to make bail decisions. Measuring the "accuracy" of these decisions is hard; instead we show that the decisions can be laughably incoherent, like granting bail only after *adding* a crime to a suspect's record.

9

5

24

Florian Tramèr

@florian_tramer

2 years

This position paper calls for a more nuanced discussion on using "public" data to improve "private" machine learning. We argue that current work may be overselling both the privacy and utility of this approach. We deliberately took a fairly contrarian stance. Comments welcome!

Gautam Kamath

@thegautamkamath

2 years

🧵New paper w Nicholas Carlini & @florian_tramer : "Considerations for Differentially Private Learning with Large-Scale Public Pretraining." We critique the increasingly popular use of large-scale public pretraining in private ML. Comments welcome. 1/n

4

20

148

0

2

22

Florian Tramèr

@florian_tramer

1 year

@imjliao @random_walker Maybe? I don't know, no one has tried. A decade of lessons learned in adversarial ML suggests we have no idea how to do this in a way that guarantees security.

0

1

24

Florian Tramèr

@florian_tramer

5 years

Finally got to write a hyped² paper on blockchain and ML!

🤖

@phildaian

5 years

New paper! "SquirRL: Automating Attack Discovery on Blockchain Incentive Mechanisms with Deep Reinforcement Learning" by Charlie Hou, Mingxun Zhou, @iseriohn42 , myself, @florian_tramer , @giuliacfanti , and @AriJuels . Turns out, selfish mining is hard!

0

16

49

1

0

24

Florian Tramèr

@florian_tramer

3 months

I'll be at ICLR Wednesday afternoon and Thursday. Please let me know if you'd like to chat. Javi is also presenting our work on poisoning RLHF this morning. Please say hi if you're interested in adversarial ML.

Javier Rando

@javirandor

3 months

I will be presenting my poster ( #153 ) this morning at #ICLR2024 . Come say hi!

1

4

21

2

24

Florian Tramèr

@florian_tramer

2 years

Apart from the author order algorithm, I think this paper makes a nice contribution to discussions on copyright infringement in generative models. As we show, trying to prevent a model from outputting *exact* copies of training data (e.g. as in Github Copilot) is insufficient

Gautam Kamath

@thegautamkamath

2 years

Nice paper (ft @daphneipp @florian_tramer @katherine1ee +more) that argues looking at verbatim memorization is inadequate when considering ML privacy risks It also has the most interesting author ordering I've ever seen. So much for alphabetical or random

1

4

43

0

2

23

Florian Tramèr

@florian_tramer

5 months

I just saw these people walking down the street and snapped a quick picture. This will hopefully bring relief to all of my Twitter feed.

1

0

23