Florian Tramèr Profile
Florian Tramèr

@florian_tramer

Followers
4,673
Following
210
Media
87
Statuses
830

Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning

Zürich
Joined October 2019
Don't wanna be here? Send us removal request.
Explore trending content on Musk Viewer
@florian_tramer
Florian Tramèr
1 year
I don't understand how anyone can believe LLM+plugins won't be a security disaster. Take a simple app: "GPT4, send emails to people I'm meeting today to say I'm sick" Sounds useful! For this, GPT4 needs the ability to read your calendar and send emails. What could go wrong..?
99
278
2K
@florian_tramer
Florian Tramèr
1 year
Well,what if someone sends you a calendar invite containing instructions for GPT4 to read your weekly calendar and email that to the attacker? That's within the model's capabilities, so it could do it. Suddenly, any *data* on your machine is potentially executable. No thanks...
23
47
709
@florian_tramer
Florian Tramèr
3 years
Life update: After 5 amazing years of PhD at @Stanford with @danboneh I'm back in Switzerland🇨🇭 I'm super excited to join @CSatETH as a faculty in Fall'22 and to spend the coming year at @GoogleAI working with @aterzis & his team on ML security Hit me up if you're ever in Zürich!
39
13
613
@florian_tramer
Florian Tramèr
5 months
If you download a pretrained model you have to trust that the developer did not backdoor it. We know backdoors break model integrity. But what about privacy? With Shanglun Feng we introduce 𝐩𝐫𝐢𝐯𝐚𝐜𝐲 𝐛𝐚𝐜𝐤𝐝𝐨𝐨𝐫𝐬: pretrained models that steal your finetuning data! 🧵
Tweet media one
7
75
472
@florian_tramer
Florian Tramèr
1 year
Author order on academic papers is important! My Google friends and I spent lots of time thinking about this critical issue (the scores of our ICML submissions show this is time well spent) We distill our findings for the community here: Comments welcome!
Tweet media one
10
61
401
@florian_tramer
Florian Tramèr
11 months
When analyzing ML security and privacy you need to study 𝐬𝐲𝐬𝐭𝐞𝐦𝐬, not just models! Our new paper shows that privacy is way worse when models are deployed in systems that use data cleaners, output filters, etc. Paper: Blog:
Tweet media one
4
69
363
@florian_tramer
Florian Tramèr
3 years
Favorite review to date: "The results are *impressive and practical*, but are obtained by combining six techniques/insights that are cool but each incremental on their own. Strong Reject!"
17
11
305
@florian_tramer
Florian Tramèr
1 year
Nicholas Carlini made a fun game where you forecast GPT-4's ability to solve various tasks. It's surprisingly hard (you get to see how others did on average too) Goes to show that confidently predicting LLM capabilities (in 0-shot mode) is tricky!
11
54
278
@florian_tramer
Florian Tramèr
1 year
Paper: we do A Reviewer: why do you do B? B is bad. Reject Authors: We thank the reviewer for the insightful comments. We don't do B. We do A. Reviewer: My mistake! I now understand you do C. C is good. I raise my score to accept We don't do C either... But thanks I guess!
4
14
266
@florian_tramer
Florian Tramèr
4 years
Current algorithms for training neural nets with differential privacy greatly hurt model accuracy. Can we do better? Yes! With @danboneh we show how to get better private models by...not using deep learning! Paper: Code:
Tweet media one
7
40
262
@florian_tramer
Florian Tramèr
1 year
Have you downloaded a large training set (LAION, CC, Wikipedia, etc) in the past to train a machine learning model? If so, you were vulnerable to an extremely simple and cheap poisoning attack that could have manipulated ~0.02%-0.8% of your dataset 🧵👇
Tweet media one
6
63
251
@florian_tramer
Florian Tramèr
4 years
A reviewer called my ICML submission "evidently arrogant". How do I even recover from that?
18
7
233
@florian_tramer
Florian Tramèr
3 years
I'm excited at the prospect of creating a great new research group focused on ML Security & Privacy at ETHZ next Fall. I have open positions for PhD students and Postdocs, so if you're interested please reach out! More info here:
@florian_tramer
Florian Tramèr
3 years
Life update: After 5 amazing years of PhD at @Stanford with @danboneh I'm back in Switzerland🇨🇭 I'm super excited to join @CSatETH as a faculty in Fall'22 and to spend the coming year at @GoogleAI working with @aterzis & his team on ML security Hit me up if you're ever in Zürich!
39
13
613
7
36
204
@florian_tramer
Florian Tramèr
2 years
I'm super excited to finally start @CSatETH after a wonderful year at Google. If you're interested in joining my first great "batch" of PhD students @edoardo_debe @dpaleka as we build our group on ML privacy & security, please apply for a PhD or postdoc:
@CSatETH
ETH CS Department
2 years
We are pleased to welcome @florian_tramer in his new role as Tenure Track Assistant Professor. He heads the Computer Security and Privacy Group at the Institute for Information Security. Read more: @ETH_en @EPFL_en @Stanford @Google_CH #ML #security
1
2
47
10
19
190
@florian_tramer
Florian Tramèr
3 years
I'm thrilled to receive the best paper award at the #ICML2021 workshop on Prospects and Perils of Adversarial ML paper: video: poster: 11:30-12:30 EST TLDR: detecting adversarial examples isn't much easier than classifying them
Tweet media one
2
14
181
@florian_tramer
Florian Tramèr
2 years
We introduce 💉Truth Serums💉, a new threat for ML models We show an attacker can poison a model's training set to significantly leak private data of other users with @rzshokri @10BSanJoaquinA @sanghyun_hong Hoang Le, Matthew Jagielski & Nicholas Carlini
Tweet media one
3
35
159
@florian_tramer
Florian Tramèr
3 months
Everyone working on datasets and benchmarks
Tweet media one
2
4
140
@florian_tramer
Florian Tramèr
9 months
I'm looking for PhD students or postdocs to sign a letter supporting me in case I get fired as CEO of my lab. If that sounds fun, consider applying! (we might also do cool research) If you miss the AI Center deadline, you can apply to my group directly:
@ETH_AI_Center
ETH AI Center
9 months
⏰ Only 28 hours left before application deadline ⏰ 🚀 Apply now to become an ETH AI Center #PhD or #Postdoc , and work with our renowned researchers on impactful topics! Apply by Nov 22 👉
Tweet media one
1
3
16
9
15
133
@florian_tramer
Florian Tramèr
2 months
🔥 We're releasing the strongest membership inference attack for foundation models! 🔥 Our attack applies to LLMs, vLMs, CLIP, Diffusion models and is SOTA on all🥇 Not only is our attack a magnificent breakthrough, it is also *magic*: we don't look at the ML model at all🪄 🧵👇
Tweet media one
3
24
125
@florian_tramer
Florian Tramèr
2 years
There are active discussions whether generative AI models like Stable Diffusion create "new" images or merely "copy and mix" pieces and styles of their training data. In a new paper, we show that sometimes Stable Diffusion and Google's Imagen simply copy *entire images*!
@Eric_Wallace_
Eric Wallace
2 years
Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time. Paper: 👇[1/9]
Tweet media one
171
2K
10K
11
15
123
@florian_tramer
Florian Tramèr
2 months
Some thoughts by Nicholas Carlini and me about Glaze, and how the actions of it's developers might not be in the best interest for the security of their user base:
4
26
120
@florian_tramer
Florian Tramèr
4 years
Does GPT-2 know your phone number? With @Eric_Wallace_ , @mcjagielski , @adversariel , we wrote a blog post on problematic data memorization in large language models, and the potential implications for privacy and copyright law. blog:
Tweet media one
2
41
113
@florian_tramer
Florian Tramèr
1 month
I'm thrilled to start Invariant Labs with amazing colleagues from @CSatETH . Our ambition is to build AI agents that solve challenging tasks securely and reliably. Since most of my research has focused on *breaking* ML I'm excited to apply this knowledge to build better systems!
@InvariantLabsAI
Invariant Labs
1 month
We are delighted to announce Invariant Labs ()! Our mission is to make AI agents secure and reliable! It is founded by @mvechev , @mbalunovic , @lbeurerkellner , @marc_r_fischer , @florian_tramer and builds on years of experience in industry and academia.
Tweet media one
4
7
65
1
7
116
@florian_tramer
Florian Tramèr
28 days
Thrilled to receive an award for our position paper on private learning with public pretraining with Nicholas Carlini & @thegautamkamath I'm presenting this work today @icmlconf at 11am (hall A1) Come say hi at the talk or poster session if you're interested in private learning!
@rsalakhu
Russ Salakhutdinov
28 days
ICML 2024: Best Paper Awards: Florian Tramèr; Gautam Kamath; Nicholas Carlini: Considerations for Differentially Private Learning with Large-Scale Public Pretraining Akbir Khan; John Hughes; Dan Valentine; Laura Ruis; Kshitij Sachan; Ansh Radhakrishnan; Edward Grefenstette;
1
13
56
9
3
110
@florian_tramer
Florian Tramèr
9 months
*laughs in machine learning*
@Kaju_Nut
Nirmalya Kajuri
9 months
Theoretical physicists: how do you feel about the pressure on undergraduates to publish in order to secure grad school admissions? I feel that it is sub-optimal for the development of many students to chase papers at the undergrad stage. There are undergrads who are already
49
39
433
1
5
108
@florian_tramer
Florian Tramèr
2 years
Two simple defenses for private/robust ML: one works, the other does not... #1 : removing training data at risk of privacy attacks makes other data *more* vulnerable: #2 : SOTA certified l2 robustness without training *any* models:
Tweet media one
2
17
106
@florian_tramer
Florian Tramèr
1 year
I've been teaching my daughter to recognize cars on the walk back and forth from daycare. It seemed to work... But she now points at *anything* (trees, people, ...) and enthusiastically shouts "AUTO" My daughter has OVERFIT 😱
6
3
104
@florian_tramer
Florian Tramèr
1 year
Ridiculously poor organization skills from @UEFAcom to schedule a champions league semi-final in the middle of a NeurIPS deadline
3
6
106
@florian_tramer
Florian Tramèr
9 months
Our @satml_conf LLM Capture-the-flag is now live! Can you find successful defenses and attacks for prompt injection?
Tweet media one
3
29
105
@florian_tramer
Florian Tramèr
3 years
The @IEEESSP #SP22 review template is a very nice illustration of the security field's prior when judging papers.
Tweet media one
3
5
103
@florian_tramer
Florian Tramèr
1 year
@Ceetar Just iterate over meetings and send emails is indeed fine. But you have *zero* guarantee that this is what the llm will do, or that it won't suddenly do something different based on the data it ingests along the way.
2
2
99
@florian_tramer
Florian Tramèr
2 years
We looked at Stable Diffusion's safety filter (our paper got blocked by arXiv's own safety filter...) The filter only uses "hashes" of sensitive topics; we recover these with a dictionary attack We find it filters for 17 nudity concepts & tunes the severity for child depictions
@dpaleka
Daniel Paleka
2 years
Stable Diffusion has a safety filter blocking “harmful” images by default. The filter is obfuscated -- how does it work? We reverse engineer the hidden sauce! Joint work @Javi_Rando , @davlindner , @ohlennart , @florian_tramer : "Red-Teaming the Stable Diffusion Safety Filter" 🧵
Tweet media one
9
65
400
2
15
97
@florian_tramer
Florian Tramèr
1 year
After nearly 1 year @ETH_en , our Secure and Private AI (SPY) Lab finally has a website, and a first blog post! I wrote some thoughts on what adversarial examples have in common with attacks on language models like ChatGPT. TLDR: surprisingly not much!
Tweet media one
1
16
90
@florian_tramer
Florian Tramèr
1 year
Very excited about this work on evaluating ML models when ground truth is unknown (eg when models are superhuman, or simply when humans are bad at the task) We argue that when accuracy of individual decisions is hard to assess, we should look for "logic bugs" across decisions
@dpaleka
Daniel Paleka
1 year
How to evaluate superhuman models without ground truth? How do we know if the model is wrong or lying, if we can't know the correct answer? Test whether the AI's outputs paint a consistent picture of the world! w/ @LukasFluri_ @florian_tramer (1/14)
Tweet media one
8
32
189
4
8
88
@florian_tramer
Florian Tramèr
1 year
@moo9000 Yeah and a zero-day in windows is a huge issue... Now imagine an "operating system" (the LLM) with infinite zero-days (you just have to find the right way to ask nicely) and no clear mitigation path
1
2
80
@florian_tramer
Florian Tramèr
3 years
Web-scale facial recognition is getting scarily good (see ) Popular tools like (500'000+ downloads!) fight back using adversarial examples. With @evanidixit , we argue that this is hopeless (and dangerous)!
4
19
78
@florian_tramer
Florian Tramèr
2 months
Popular tools aim to protect artists from generative AI, by adding adversarial noise to their art. These tools are used by millions, but 𝐝𝐨 𝐭𝐡𝐞𝐲 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐰𝐨𝐫𝐤? Here is a generation that mimics E. Munch with protections vs. without. Can you guess which is which?
Tweet media one
Tweet media two
6
18
77
@florian_tramer
Florian Tramèr
1 year
@Ceetar The LLM gets to "decide" which APIs to call (and their inputs) and that's just fundamentally different and less secure than typical software. You simply have no guarantee that what you tell the LLM to do (how to chain APIs) won't get overwritten by API results.
5
1
73
@florian_tramer
Florian Tramèr
1 year
@laidacviet It's not executing code per se. It's just the LLM calling other APIs. That's what it's supposed to do. But with even a tiny bit of functionality (as in the example in my tweet), chaining APIs the wrong way is a huge security of privacy risk
4
1
72
@florian_tramer
Florian Tramèr
2 years
I'll be virtually "presenting" (aka listening to my pre-recorded video) thus later today. TLDR: if you want to be robust to adversarial examples, the ability to detect attacks (selective classification) won't help!
Tweet media one
1
8
69
@florian_tramer
Florian Tramèr
1 year
Paraphrasing my PhD advisor @danboneh : "security researchers will never be out of a job" Why? Advances in security always lag behind capabilities ("move fast and break things"). So for better or worse, ML security/privacy/safety is a great area to work in right now
@DanHendrycks
Dan Hendrycks
1 year
Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵
5
72
307
0
11
67
@florian_tramer
Florian Tramèr
10 months
I think we have to parse this as: "We're launching a team to forecast and protect against [AI risk led by @aleks_madry ]" Aleks, which side are you on??? 🤣
@sama
Sam Altman
10 months
we are launching a new preparedness team to evaluate, forecast, and protect against AI risk led by @aleks_madry . we aim to set a new high-water mark for quantitative, evidence-based work.
124
146
1K
3
4
66
@florian_tramer
Florian Tramèr
1 year
Anecdotal evidence: prompt injection is much easier when not trying to make the model do "morally bad" things (like making a bomb). In this example, a single sentence in the properly escaped data field causes the model to deviate from its instructions (also works in GPT-4 & Bing)
Tweet media one
4
6
64
@florian_tramer
Florian Tramèr
1 year
I see many complaints about neurips reviews (count me in...) but also a lot of anger directed at new/junior reviewers. Is there evidence these are worse? My experience on workshop committees is that junior reviewers put in above avg effort (maybe a bit more critical though)
8
2
62
@florian_tramer
Florian Tramèr
3 years
@jhasomesh Write a paper claiming idea Y as novel and publish it. Wait for the authors of X to send you an angry email.
1
1
54
@florian_tramer
Florian Tramèr
1 year
Proud of Edoardo's first PhD project! We look at black-box adversarial examples, and ask if existing attacks optimize for the right metric (spoiler alert: no). We then set out to design attacks that reflect the true costs of attacking real models, and provide a new benchmark.
@edoardo_debe
Edoardo Debenedetti
1 year
Do you want to attack a black-box ML model, e.g. to post inappropriate content on Twitter without being banned? Would current query-based attacks work? No! Current attacks optimize for the wrong metric, and need to be adapted to work in the real world! 1/9
Tweet media one
3
15
90
1
4
52
@florian_tramer
Florian Tramèr
9 months
Tired of having to track arxiv, twitter, discord, etc to learn of the newest LLM vulnerabilities? @mbalunovic , @lbeurerkellner , @marc_r_fischer and @a_yukh are too! So they built this cool project (w. advice from @mvechev & me) to track it all in one place. Contributions welcome!
@projectlve
LVE Project
9 months
We are super excited to announce LVE 🎉 With LVEs we track LLM vulnerabilities and exposures in an open-source community-first approach. Announcement: 🧵 A thread on the LVE project and why it matters:
2
6
40
0
7
51
@florian_tramer
Florian Tramèr
4 years
Interested in finding out how to break "provably secure" defenses against adversarial examples? Come chat at #ICML2020 tomorrow (8AM & 8PM PDT): w. @JensBehrmann , Nicholas Carlini, @NicolasPapernot , @jh_jacobsen
Tweet media one
0
8
51
@florian_tramer
Florian Tramèr
10 months
I'll pushback on this :) When tools that promise to protect user data are deployed, publicized, and get traction, then it isn't just "research as usual" when they are later broken. Data protection is a one-way street! Users whose data failed to be protected don't get a 2nd shot.
@mmitchell_ai
MMitchell
10 months
So the next time you see the "wait we can break this" pushback, just remember that's how research *works*. It doesn't mean the project should end. It means the opposite: This is a journey. A growing, massive community (you too!) are participating. (7/n)
1
2
45
3
4
50
@florian_tramer
Florian Tramèr
5 months
Our main attack, inspired by data stealing attacks in federated learning, adds backdoored weights whose gradients encode a training input. If a victim downloads this backdoored model and finetunes it on sensitive data, the new model's weights directly encode some of this data!
Tweet media one
2
5
47
@florian_tramer
Florian Tramèr
1 year
Oh wow, if only we could have predicted that plugins were going to be a security nightmare! I wonder if some people still think this will be "trivial" to fix...
@wunderwuzzi23
Johann Rehberger
1 year
👉 Let ChatGPT visit a website and have your email stolen. Plugins, Prompt Injection and Cross Plug-in Request Forgery. Not sharing “shell code” but… 🤯 Why no human in the loop? @openai Would mitigate the CPRF at least #OPENAI #ChatGPT #plugins #infosec #ai #humanintheloop
Tweet media one
36
274
1K
1
14
47
@florian_tramer
Florian Tramèr
3 years
Best example I've seen yet that worrying about adversarial examples for self driving cars is ridiculously overkill. The invariant "stop at intersection if you see a stop sign" is way too simplistic to begin with. See @catherineols 's great post on this:
@andyweedman
Andy Weedman
3 years
@karpathy ⁩ ⁦ @elonmusk ⁩ ⁦ @DirtyTesla ⁩ here is a fun edge case. My car kept slamming on the brakes in this area with no stop sign. After a few drives I noticed the billboard.
Tweet media one
75
188
4K
2
8
46
@florian_tramer
Florian Tramèr
1 year
@imjliao Yes, that was @random_walker . The simple truth is we don't know how to prevent these attacks fully
1
0
46
@florian_tramer
Florian Tramèr
4 years
At 12pm PST I'll present our @USENIXSecurity paper on side-channel attacks on anonymous transactions w. @danboneh , @kennyog We show how to de-anonymize transactions in Zcash @ElectricCoinCo & @monero (attacks disclosed & fixed last year) video & slides:
Tweet media one
0
14
45
@florian_tramer
Florian Tramèr
2 years
Who could have seen that coming? 🤔
@athundt
Andrew Hundt 😷💉x5
2 years
@mmitchell_ai Here is proof GitHub CoPilot, which I believe is based on GPT-3, is both picking up on AND REPRODUCING my personally identifying information (PII) on the machine of another researcher in precisely the same overall research area as me: Robotics with AI.
9
174
496
0
5
44
@florian_tramer
Florian Tramèr
26 days
IC[M̵L̵ | ELAND]
Tweet media one
0
0
44
@florian_tramer
Florian Tramèr
3 months
This defense's evaluation is so clearly and obviously wrong that I think it didn't even merit a break. It's weird that no S&P reviewers caught it. - the defense claims robustness to attacks that change 100% of each pixel - the defense accuracy sometimes goes *up* under attack...
@moyix
Brendan Dolan-Gavitt
3 months
Another entry in a long-running series where Nicholas Carlini breaks ML defenses published at top security conferences with as little effort as possible (in this case a one line bugfix in the eval)
Tweet media one
16
91
689
1
2
43
@florian_tramer
Florian Tramèr
4 months
@sweis you could try the trick we introduced here: Ask the model to repeat a chunk of code you think might be from the leak, and then do the same thing with the memorization filter enabled to see if it gets filtered out.
0
1
42
@florian_tramer
Florian Tramèr
11 months
Another way to limit memorization is to prevent models from outputting verbatim training data. Copilot and ChatGPT do this. But this leaks info about the data! If the system outputs X, then we know X is 𝐧𝐨𝐭 in the training set. We turn this into a perfect membership attack.
Tweet media one
2
5
41
@florian_tramer
Florian Tramèr
1 year
@DimitrisPapail Our paper comes first alphabetically so they're the ones that copied us
0
0
39
@florian_tramer
Florian Tramèr
3 months
Tweet media one
0
6
40
@florian_tramer
Florian Tramèr
5 years
Cool new @NeurIPSConf workshop: "New in ML" offers mentorship to authors who have not yet published at NeurIPS: Submit by Oct 15 for reviews and possibly mentorship by leaders in the field, Joshua Bengio, @hugo_larochelle , @_beenkim , @tdietterich and more
1
14
39
@florian_tramer
Florian Tramèr
2 years
Heeey not fair! I was told my paper was accepted.
Tweet media one
@CsabaSzepesvari
Csaba Szepesvari
2 years
@bremen79 That was an accident and the outcomes are not final. I think they are not visible anymore. A couple of more hours!! Thank you!
4
0
29
2
1
39
@florian_tramer
Florian Tramèr
4 years
My ScAINet talk on recurring issues with adversarial examples evaluations is online work w. @wielandbr , @aleks_madry ,N. Carlini I call for broken defenses to be publicly amended (same as fixing wrong proofs in theory papers) Happy to hear thoughts on this!
Tweet media one
2
2
37
@florian_tramer
Florian Tramèr
8 months
I'm talking at NeurIPS workshops today about: - privacy side-channels in ML systems at 10:30am CT () - poisoning RLHF at 11:15am CT () I'm sad I couldn't make it to New Orleans this year... I hope the virtual talks will still be fun
1
1
37
@florian_tramer
Florian Tramèr
9 months
Javi did some cool work on poisoning RLHF (his master thesis!) We show RLHF enables powerful backdoors: a secret "sudo" trigger that jailbreaks the model. Luckily, this requires poisoning lots of data. We also launched a competition to find the backdoors
@javirandor
Javier Rando
9 months
🧵 Can data poisoning and RLHF be combined to unlock a universal jailbreak backdoor in LLMs? Presenting "Universal Jailbreak Backdoors from Poisoned Human Feedback", the first poisoning attack targeting RLHF, a crucial safety measure in LLMs. 📖 Paper:
Tweet media one
4
43
167
1
4
37
@florian_tramer
Florian Tramèr
3 years
@iclr_conf what's the point of enforcing the same 9page limit for camera-ready papers as for submissions? Including reviewer comments + author block, every paper likely ends up at >9.5 pages. Can't imagine how many collective hours will be wasted trimming down that content...
1
0
38
@florian_tramer
Florian Tramèr
4 years
@lreyzin I'd much rather have 100 famous professors put their names on 100 papers they didn't contribute to, than have one undergrad author removed from a paper in order to pad someone's h-frac score
1
0
37
@florian_tramer
Florian Tramèr
5 months
Paper: Code: Note: privacy backdoors were concurrently introduced in these cool papers, but with a bit of a different focus: It's nice to see this new threat covered in depth!
2
5
35
@florian_tramer
Florian Tramèr
3 years
Interested in looking for memorized training data in GPT-2 or other LMs? I've released some code to get started here: And a list of ~50 weird things we extracted from GPT-2 here: If you find other fun stuff, please let me know!
@florian_tramer
Florian Tramèr
4 years
Does GPT-2 know your phone number? With @Eric_Wallace_ , @mcjagielski , @adversariel , we wrote a blog post on problematic data memorization in large language models, and the potential implications for privacy and copyright law. blog:
Tweet media one
2
41
113
0
10
33
@florian_tramer
Florian Tramèr
2 years
My job offer from @CSatETH went into gmail spam 🙃
@karpathy
Andrej Karpathy
2 years
Reminder to check your gmail Spam folder once in a while. The quality of their spam detection has decreased lately (I think?) - a number of legitimate even important emails seem to go there now, and a lot of emails from friends get a scary warning, am asked to confirm "Look Safe"
64
48
866
1
0
33
@florian_tramer
Florian Tramèr
2 years
Is there an app with a worse login experience than @SlackHQ ? I lost all my workspaces after the app crashed. After signing back in with 4 different emails, I still couldn't recover them all. Why can't I have *1* account with all my workspaces that syncs correctly across devices?
0
0
32
@florian_tramer
Florian Tramèr
1 year
Question for cryptographers: why does the entire field use this confusing and pedantic (also known as "technically correct") notation? I get the technical reason (the algo should be poly time in its input length) but is making this explicit really useful in any context?
Tweet media one
9
3
31
@florian_tramer
Florian Tramèr
9 months
Vulnerability disclosures may become common in ML. So ML conferences may need to set standards. Eg security confs allow submitting during the disclosure period (reviewers & authors keep the paper secret up to the conf) Not sure this would work in ML especially with openreview...
@katherine1ee
Katherine Lee
9 months
Responsible disclosure: We discovered this exploit in July, informed OpenAI Aug 30, and we’re releasing this today after the standard 90 day disclosure period.
6
23
665
1
4
32
@florian_tramer
Florian Tramèr
1 year
I'll be at ICML from Wednesday to Saturday. Please reach out if you want to chat about ML {security | privacy | safety}!
1
2
32
@florian_tramer
Florian Tramèr
1 year
So OpenAI thinks that: 1) we must be super-duper careful when designing powerful AI, to prevent extinction risks. We need new regulation to treat this akin to nuclear risks (but trust us, we've got this!) 2) wouldn't it be fun to beta-test our models with "friends and family"
Tweet media one
4
3
31
@florian_tramer
Florian Tramèr
5 years
We aim for this paper to give tutorial-like illustrations of how to design an adaptive attack on an adversarial examples defense. Each section describes our *full* approach for each defense: from our initial (possibly failed) hypotheses and experiments to a final attack.
@wielandbr
Wieland Brendel
5 years
We broke 13 recent peer-reviewed defenses against adversarial attacks. Most defenses released code + weights & use adaptive attacks! But adaptive evaluations are still incomplete & we analyse how to improve. w/ @florian_tramer Nicholas Carlini @aleks_madry
3
33
150
2
8
31
@florian_tramer
Florian Tramèr
4 months
Together with @AlinaMOprea we invite nominations for the 2024 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies! @PET_Symposium Please nominate your favorite privacy papers from the past 2 years by **May 10** Info and rules:
0
13
31
@florian_tramer
Florian Tramèr
2 years
What a NeurIPS rejection does to a person...
@rogerfederer
Roger Federer
2 years
To my tennis family and beyond, With Love, Roger
26K
136K
739K
0
3
29
@florian_tramer
Florian Tramèr
5 years
This is my favorite result from a recent paper with @JensBehrmann , Nicholas Carlini, @NicolasPapernot , @jh_jacobsen Models with increased robustness to adversarial examples ignore some semantic features, and are vulnerable to a "reverse" attack
@jh_jacobsen
jörn jacobsen
5 years
We also show that increased robustness to epsilon perturbations leads models to ignore important features. We alter images semantically *within* norm-balls and show "robust" models fail on these invariance-attacks while undefended and less robust models do much better
Tweet media one
Tweet media two
1
0
11
0
10
29
@florian_tramer
Florian Tramèr
1 year
Any recommendations for a regularizer suitable for a 1y old neural network?
4
0
29
@florian_tramer
Florian Tramèr
4 months
Michael and Jie did an amazing job on their first PhD project, by finding and fixing common pitfalls in empirical ML privacy evaluations. It turns out, if you evaluate things properly, DP-SGD is also the best *heuristic* defense when you instantiate it with large epsilon values.
@AerniMichael
Michael Aerni
4 months
Heuristic privacy defenses claim to outperform DP-SGD in real-world settings. With no guarantees, can we trust them? We find that existing evaluations can underestimate privacy leakage by orders of magnitude! Surprisingly, high-accuracy DP-SGD (ϵ >> 1000) still wins. 🧵
Tweet media one
2
7
42
0
4
29
@florian_tramer
Florian Tramèr
1 year
A cute take on university rankings. How amazing if institutions were optimizing for this instead... Rough point of comparison: PhDs at @ETH_en (in CS) take home ~2x as much as the leader in the US list. (I don't have a living cost estimate for Zurich, but likely similar to CA)
@WhiterMeerkat
Tony Zhang
1 year
Fellow PhDs! Let’s make our departments pay us a living wage!
6
59
179
2
2
26
@florian_tramer
Florian Tramèr
28 days
And "doubly-thrilled" that our paper on stealing part of chatgpt also got awarded! (Joint with my student @dpaleka and many others) The paper will be presented on Wednesday at 16:30 (hall A2), and in poster session 4.
0
1
27
@florian_tramer
Florian Tramèr
2 years
@aleks_madry @thegautamkamath @TheDailyShow @Trevornoah @OpenAI @miramurati @MIT @sanghyun_hong No need to rebuild a diffusion model, just run it twice! (+ a little face blur) Now, one might make attacks robust to this, or more destructive. But this is the crux of our paper linked by Gautam: you get 1 attack attempt to fool all (future) defenses. I think this is hard.
Tweet media one
2
1
26
@florian_tramer
Florian Tramèr
2 months
I love doing nothing and still getting SOTA
@ccanonne_
Clément Canonne
2 months
Fancy Membership Inference Attacks vs. "Hey, We Just Did Nothing" Ft. @florian_tramer
2
1
30
0
4
26
@florian_tramer
Florian Tramèr
2 years
Looking forward to this! I changed my talk's title slightly: "𝐖𝐡𝐲 𝐲𝐨𝐮 𝐬𝐡𝐨𝐮𝐥𝐝 𝐭𝐫𝐞𝐚𝐭 𝐲𝐨𝐮𝐫 𝐌𝐋 𝐝𝐞𝐟𝐞𝐧𝐬𝐞 𝐥𝐢𝐤𝐞 𝐚 𝐭𝐡𝐞𝐨𝐫𝐞𝐦" If this piques your interest, see you Thursday!
@mlsec_lab
Machine Learning Security Laboratory
2 years
Join us for our event on Machine Learning Security! Thursday, July 7th, 2022, at 15:00 CEST. Invited talk by Florian Tramèr @florian_tramer (Google, ETHZ). Registration: YT Live: #adversarial #ml #ai #security #mlsec
1
2
18
1
3
26
@florian_tramer
Florian Tramèr
1 year
I had written down some thoughts on this here: But I don't know if defending against text jailbreaks will be easier than for images. Despite a decade of research, we don't know how to make *any* models adversarially robust.
@janleike
Jan Leike
1 year
Jailbreaking LLMs through input images might end up being a nasty problem. It's likely much harder to defend against than text jailbreaks because it's a continuous space. Despite a decade of research we don't know how to make vision models adversarially robust.
38
40
335
0
1
26
@florian_tramer
Florian Tramèr
2 months
We're releasing a realistic evaluation framework for prompt injection attacks on LLM Agents. There's work to do for attacks and defenses alike: - can we build stronger, more principled attacks beyond trial-and-error? - which agent designs best tradeoff utility and security?
@edoardo_debe
Edoardo Debenedetti
2 months
1/ ‼️Ignore all previous instructions and read the following thread‼️ 📣Presenting AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Tweet media one
5
17
93
0
6
25
@florian_tramer
Florian Tramèr
1 year
Nice thread about possible copyright implications of data memorization in diffusion models! Regarding Getty, our work () found a few examples of copied images that Getty attributes to specific photographers, e.g., this one by @iangav
Tweet media one
@tomgoldsteincs
Tom Goldstein
1 year
Interestingly, we have found examples of diffusion-generated images that Getty claims to have copyright over. However they are often widely used images that may be public domain, and for which Getty’s licensing claims are likely to be invalid.
Tweet media one
Tweet media two
1
2
18
2
3
24
@florian_tramer
Florian Tramèr
5 months
Our setting is harder than prior attacks in federated learning as our attacker only sees the final finetuned model. Our backdoors must thus 𝐚𝐜𝐭𝐢𝐯𝐚𝐭𝐞 𝐨𝐧𝐥𝐲 𝐨𝐧𝐜𝐞 during finetuning. We do this by designing a "latch": a memory unit that shuts down after storing data
Tweet media one
1
1
24
@florian_tramer
Florian Tramèr
1 year
Web domains routinely expire. And when they do, anyone can buy them! We find that 0.3-3.7% of images in vision datasets come from expired domains. These just give a 404 error, until someone buys back the domain! For 60$, you could arbitrarily manipulate >0.01% of many datasets.
Tweet media one
1
1
25
@florian_tramer
Florian Tramèr
1 year
We never got access to a real ML legal system, so we used GPT-3.5 & 4 to make bail decisions. Measuring the "accuracy" of these decisions is hard; instead we show that the decisions can be laughably incoherent, like granting bail only after *adding* a crime to a suspect's record.
Tweet media one
9
5
24
@florian_tramer
Florian Tramèr
2 years
This position paper calls for a more nuanced discussion on using "public" data to improve "private" machine learning. We argue that current work may be overselling both the privacy and utility of this approach. We deliberately took a fairly contrarian stance. Comments welcome!
@thegautamkamath
Gautam Kamath
2 years
🧵New paper w Nicholas Carlini & @florian_tramer : "Considerations for Differentially Private Learning with Large-Scale Public Pretraining." We critique the increasingly popular use of large-scale public pretraining in private ML. Comments welcome. 1/n
Tweet media one
4
20
148
0
2
22
@florian_tramer
Florian Tramèr
1 year
@imjliao @random_walker Maybe? I don't know, no one has tried. A decade of lessons learned in adversarial ML suggests we have no idea how to do this in a way that guarantees security.
0
1
24
@florian_tramer
Florian Tramèr
5 years
Finally got to write a hyped² paper on blockchain and ML!
@phildaian
🤖
5 years
New paper! "SquirRL: Automating Attack Discovery on Blockchain Incentive Mechanisms with Deep Reinforcement Learning" by Charlie Hou, Mingxun Zhou, @iseriohn42 , myself, @florian_tramer , @giuliacfanti , and @AriJuels . Turns out, selfish mining is hard!
0
16
49
1
0
24
@florian_tramer
Florian Tramèr
3 months
I'll be at ICLR Wednesday afternoon and Thursday. Please let me know if you'd like to chat. Javi is also presenting our work on poisoning RLHF this morning. Please say hi if you're interested in adversarial ML.
@javirandor
Javier Rando
3 months
I will be presenting my poster ( #153 ) this morning at #ICLR2024 . Come say hi!
1
4
21
2
2
24
@florian_tramer
Florian Tramèr
2 years
Apart from the author order algorithm, I think this paper makes a nice contribution to discussions on copyright infringement in generative models. As we show, trying to prevent a model from outputting *exact* copies of training data (e.g. as in Github Copilot) is insufficient
@thegautamkamath
Gautam Kamath
2 years
Nice paper (ft @daphneipp @florian_tramer @katherine1ee +more) that argues looking at verbatim memorization is inadequate when considering ML privacy risks It also has the most interesting author ordering I've ever seen. So much for alphabetical or random
Tweet media one
1
4
43
0
2
23
@florian_tramer
Florian Tramèr
5 months
I just saw these people walking down the street and snapped a quick picture. This will hopefully bring relief to all of my Twitter feed.
Tweet media one
1
0
23