![Niloofar (on faculty job market!) Profile](https://pbs.twimg.com/profile_images/1731163196751597568/UBDg555W.jpg)
Niloofar (on faculty job market!)
@niloofar_mire
Followers
6K
Following
72K
Media
183
Statuses
3K
Niloofar Mireshghallah -- postdoc @uwcse-@uwnlp, Ph.D. from @ucsd_cse /Privacy, ML, NLP @winlpworkshop ex-chair, NAACL 2025 D&I chair, ex @MSFTResearch
Seattle, WA
Joined May 2013
I'm on the faculty market and at #NeurIPS! 👩‍🏫 I work on privacy, memorization, and emerging challenges in data use for AI. Privacy isn't about PII removal but about controlling the flow of information contextually, & LLMs are still really bad at this!
9
91
415
When talking abt personal data people share w/ @OpenAI & privacy implications, I get the 'come on! people don't share that w/ ChatGPT!🫷'. In our @COLM_conf paper, we study disclosures, and find many concerning⚠️ cases of sensitive information sharing:
6
56
208
I liked this a lot! In the same spirit, I also really liked @natolambert, @kylelostat and @AkshitaB93's tutorial at NeurIPS on building language models. Slides: Tutorial: (I think it will be released publicly soon as well).
Stanford released a 1.5 hours lecture on Building Large Language Models!. This lecture provides a concise overview of building a ChatGPT-like model, covering both pretraining (language modeling) and post-training (SFT/RLHF).
6
59
450
I just finished my thesis proposal on how large language models can be *Leaky*, *Sneaky* and *Creepy*, and I'm now a PhD Candi🍬date📅! Special thanks to my amazing committee: @BergKirkpatrick, @sameer_, Julian McAuley, Molly Roberts and Lawrence Saul!
41
13
342
Happy to share the work done during my internship @MSFTResearch is accepted at @NAACLHLT🎉! "Privacy Regularization: Joint Privacy-Utility Optimization in Language Models". We propose privacy mitigations for the LM training data memorization problem:
16
18
264
Now that we are approaching the grad school/lab selection season, I wanted to write a thread 🧵on some unconventional advice about choosing which #PhD program to join (especially for #international students who are usually more under pressure because of Visa/immigration):.
1
51
204
Starting my "virtual" internship at @MSFTResearch AI's Knowledge Technologies and Intelligent Experience (KTX) group, where I'll help design #private and #ethical ML algorithms!
10
6
180
ICLR #Spotlight!! 🎉 1. LLMs have knowledge of privacy norms and primitives but cannot effectively *apply* it. 2. Composing multiple pieces of information with different sensitivity levels makes it harder. 3. CHAIN OF THOUGHT DOESN’T HELP! Paper:
🔐Can GPT-4 keep secrets & understand privacy?. We study a new critical failure mode of LLMs on reasoning about privacy at inference-time through our ConfAIde benchmark. We find that GPT-4 reveals secrets 39% of the time, which only worsens with CoT!. 🧵
7
19
181
Got some photos from @genlawcenter at DC, where I talked about Differential privacy, what it is and what it’s not!. Talk slides:
I will be talking about what differential privacy is, what it is not and what some common misconceptions are in privacy for generative AI in a couple hours @genlawcenter in DC! . Join us on the live stream: Slides:
5
6
175
It’s the season of “finding ur next adventure” and folks have been asking me about my experience doing a postdoc, so I did a blog post/video w/ @srush_nlp 's help!. Should I do a postdoc? What is a postdoc anyway?. Blog:
2
21
168
Check out our #CCS2020 #PPMLP paper w/ @sahib_me @iamtrask @openminedorg: "Neither Private Nor Fair", where we dig deeper into the effects of differentially private deep learning on fairness. Paper: Code:
3
44
163
Check out our #EACL2024 paper: "Smaller LMs are Better Machine-Generated Text Detectors", where we compare ALL models of different sizes against each other and show GPT2-small (120M) can detect ChatGPT generations better than a 7B GPTNeo model!
5
20
161
When reading text, we sometimes form an opinion about the author just by the style of the text, and this opinion might bias our perception of the text itself. In our #emnlp2021 paper, we study and try to mitigate these biases: Code:
6
25
152
35k trained models on @huggingface, yet whenever we want to generate text w/ given attributes, we train new models/clsfrs. In our #ACL2022 paper, we enable using ANY arbitrary (even non-differentiable) expert for controllable generation, w/o ANY TRAINING!.
2
17
150
Call for papers🚨 We are accepting submissions for our @iclr_conf workshop, Distributed and Private ML (DP-ML). The scope includes, but is not limited to: Federated Learning, Differential Privacy for ML, FAT ML, Privacy in COVID era. Papers due Feb 25th.
3
46
148
I bought a sun lamp, swapped my umbrella for rain pants and moved from SD to sunny Seattle! I have joined @uwcse as a postdoc where I am extremely lucky to continue working on #privacy & #NLP with @YejinChoinka & @tsvetshop! . PS no, I still don't have a Subaru! Sorry pnw!
9
0
142
Best parts of @iclr_conf: we don't have to run from hall to hall, we can easily engage in live conversations and ask questions at poster sessions without having to squeeze through and we didn't have to leave a huge carbon footprint to attend! Kudos to the organizers👏.
1
4
137
See u in Vienna next week, but also in July!! 🇦🇹. #ICML2024
TFW you ask ChatGPT a question and think ‘well that cant be the only right answer…’. In new work we look into how LLM alignment impacts pluralism and ability to reflect diverse opinions (it decreases it), and make an argument for pluralistic evaluations!.
5
8
123
I will present our work on differentially private data synthesis for domain adaptation of semantic parsers at #ACL2023. We model parse trees as intermediate variables to capture the distribution of user data. w/ @adveisner @tatsu_hashimoto @ysu_nlp @rshin .
0
13
122
It’s a wrap for me 🎬#NeurIPS22
How do you #privately #compress large models (#LLMs), given the existing pre-train and fine-tune paradigm? In our #NeurIPS paper we try to answer this question and provide a DP framework for compression. Drop by our poster on Tuesday@11:30, Hall J#107!.
4
0
122
Our work on challenges and inconclusiveness of membership inference attacks on LLMs has been accepted to @COLM_conf!!. This work has instigated new directions and many conversations on MIA evaluations, I will list them here in this thread, add to it!.
Was this sequence in the training dataset or not?? . In new paper, we study why membership inference attacks show *near-random performance* on LLMs!! . We also release a Python package for seamless MIA evaluation!! . Paper: Repo:
4
10
122
In🇦🇹this week @iclr_conf to present:. 1⃣Privacy-preserving in-context learning with DP few-shot generation 🖼️Tue 10:45 B-229. 2⃣Can LLMs Keep a Secret? Testing Privacy Implications via Contextual Integrity .🖼️ Wed 16:30 B-215.
1
19
120
Excited to be at Stanford tmw!! Let me know if you want to meet up, 1:1s on the day of are filled but will be around for a few days!.
For this week’s NLP Seminar, we are thrilled to host @niloofar_mire to talk about Privacy, Copyright and Data Integrity: The Cascading Implications of Generative AI! When: 1/16 Thurs 11am PT. Non-Stanford affiliates registration form (closed at 9am PT on the talk day):
3
10
123
I have been working on some course material for privacy in LLMs, and the recent survey by @SethInternet has been so helpful, great systemization of knowledge! I definitely recommend giving it a read if you want to get an overall idea of the field!.
5
15
113
I'm an avid Claude user and today I dug into @AnthropicAI's privacy & data policy. The way they twist words to make it seem like users are 'knowingly' consenting to the collection of their data made me uncomfortable: - Using the thumbs up/down next to the message box is
5
14
117
Can the shape of the loss around a sequence tell us if it was part of the training data? YES! I will present our #ACL2023 work where we propose an MIA based on the loss curvature w/ @MatternJustus @ZhijingJin @mrinmayasachan @bschoelkopf @BergKirkpatrick
1
18
114
Excited to be giving a talk about my research on privacy and fairness of deep learning at the Alan Turing Institute next Wednesday! You can register here:
Next Wednesday at 3pm (UK time), Fatemeh Mireshghallah @limufar (UCSD) will present "Low-overhead techniques for privacy and fairness of DNN training and inference". Register: Or watch the livestream:
4
17
107
I will be @COLM_conf! Reach out if you wanna chat Privacy, Regulations, Memorization, and Reasoning in LLMs! You can also find me presenting:. 1. Do Membership Inference Attacks Work on Large Language Models? 2. Discovering Personal Disclosures in
3
13
113
Just finished my remote presentation at #EMNLP2021, have fun to all the folks who are there in person in the Dominican Republic 🌴!. (Photo by @ZexueHe 📸)
When reading text, we sometimes form an opinion about the author just by the style of the text, and this opinion might bias our perception of the text itself. In our #emnlp2021 paper, we study and try to mitigate these biases: Code:
7
0
101
I’m visiting MPI, University of Saarland and CISPA this week, reach out if you wanna chat! I will be giving a talk on Privacy and LLMs @CISPA on Friday!. Also I will be vacationing in Istanbul, Amsterdam and Berlin the upcoming weeks, hmu if you wanna hangout and NOT research LOL
7
4
102
@yuqirose @ucsd_cse I think it was in 2020 or 2021 where w/ @ucsd_cse we had an initiative directed towards countries in crisis, to help waive fees. The app fee for a single school can be more than a year's income in Iran, and grad school is the ONLY hope for talented students to make it out!.
1
1
98
Now that conferences are virtual, why should the registration fee for a conference as an author be 268 euros? @TheWebConf . They also don’t allow submitting talk videos before registration, and I won’t be reimbursed by my former advisor with whom I’ve authored the paper 😕.
3
5
93
It was a pleasure to talk at the National Academies Forum about our recent works on sensitive disclosures in human-ChatGPT interactions and journalists' use of AI in the wild!. Talk slides: Papers:.
Great talk from @niloofar_mire from @uwcse at the @theNASEM Forum on Cyber Resilience meeting, where she talked about LLMs, security, privacy, and "oversharing"
3
15
98
Join us for an exciting program at the Distributed and Private Machine Learning (DPML) workshop @iclr_conf, happening tmw at 8:30AM PT. Invited speakers are Gauri Joshi, Graham Cormode, @UdacityDave and Lalitha Sankar!
1
16
98
I'm visiting @KhouryCollege tomorrow to give a talk about membership inference attacks, contextual integrity and language model privacy at the security seminar! Come by if you are interested!. Thanks @tianshi_li, @AlinaMOprea for organizing and also @davidbau for having me!!
2
9
100
Tom's talk for our paper: "Neither Private Nor Fair: Impact of Data Imbalance on Utility and Fairness in Differential Privacy" is up! Drop by at #ppmlp if you're attending #ccs, for Q&A. Paper: @openminedorg @sahib_me @iamtrask
Check out our #CCS2020 #PPMLP paper w/ @sahib_me @iamtrask @openminedorg: "Neither Private Nor Fair", where we dig deeper into the effects of differentially private deep learning on fairness. Paper: Code:
0
12
90
Safety evals should be multi-turn and dynamic, so that they can play out different scenarios. In our new work led by @nlpxuhui, we do simulation-based, goal oriented multi turn safety evaluations for LLM agents w/ access to tools like Venmo, email, etc.
1/ What if you could see how your AI handles the chaos of the real world? Meet HAICOSYSTEM: the framework to simulate human-AI-environment interactions, all at once. 🌍🤖 Find out if your AI is truly safe under pressure from real-world scenarios! 🔥 🌐:
2
13
95
Fill out this form if you'd like to receive feedback on your CS #gradschool application material (SOP, CV, ...)! If you're interested in helping me w/ this voluntary & independent effort, sign up here so we can help more students!
1
25
88
I’ll be at Johns Hopkins next week, talking about privacy, memorization and language models! Join us!.
New CS & @jhuclsp seminar on #GenerativeAI with @niloofar_mire on Monday—don’t miss it! Learn more here:
2
5
91
Thank you so much @trustworthy_ml 😊.
1/ In this week’s TrustML highlight, we are excited to feature Fatemehsadat Mireshghallah @limufar 🎉🎉⭐️⭐️. Fatemehsadat is a third-year CS PhD student at @ucsd_cse
0
3
80
Vienna was fun! See u again in July @icmlconf lol
In🇦🇹this week @iclr_conf to present:. 1⃣Privacy-preserving in-context learning with DP few-shot generation 🖼️Tue 10:45 B-229. 2⃣Can LLMs Keep a Secret? Testing Privacy Implications via Contextual Integrity .🖼️ Wed 16:30 B-215.
0
2
80
I will be presenting 2 papers @emnlpmeeting tmw, both on #privacy and #memorization in LLMs: 1. Poster session 1 @ 11AM: Quantifying privacy risks of #MLMs 2. Ethics Oral @ 2PM Hall B: Memorization in NLP Fine-tuning
1
19
80
Check out our survey on Privacy in Deep Learning, where we have collected and categorized attacks and mitigations. Please let me know if there are any missing citations! This is joint work with @proneat, @tremblerz, @raskarmit and @mktaram.
8
24
75
Check out our new pre-print on *Semantic Segmentation Interpretability* w/ @teddykoker @GKaissis @openminedorg! Paper: Code:
New paper with @galaxygarden23, Tom Titcombe, @GKaissis at @gridai_ @openminedorg. We train a model to dynamically apply noise to images, learning which pixels of the image are necessary for downstream performance. Paper: Code: 1/n
3
13
75
I will be talking about what differential privacy is, what it is not and what some common misconceptions are in privacy for generative AI in a couple hours @genlawcenter in DC! . Join us on the live stream: Slides:
So excited to announce an event @genlawcenter has been working on! We're discussing the misconceptions b/w the technical capabilities of evaluating generative AI, and what policymakers and civil society want. April 15th @GtownTechLaw, and live on zoom:
1
11
77
If you are interested in Privacy in ML internships definitely apply to PAI @MSFTResearch, great opportunity to work with amazing mentors 😊 . (this is coming from someone who interned with them twice lol).
Privacy in AI team at Microsoft Research is looking for interns for Summer 2022 to be working on privacy-preserving ML and/or Federated Learning projects. Help spread the word please 🙂
1
9
74
Are masked language models prone to memorization? Yes! In our new preprint, we study the privacy risks of MLMs using our reference-based membership inference attack, w/ @rzshokri, @BergKirkpatrick, @kartik_goyal_, & @ArchitUniyal3.
1
14
73
I agree with the advice given, but many students (e.g. Iranians) cannot go to visit days. Also, the burden of investigation should not be only on the student, universities SHOULD provide ACCOUNTABILITY measures for the faculty they hire. Like ratemyprofessor but for research!.
@heyelbs Good point! It's why you should talk to ex-students if you can. And visiting students should learn to read between the lines and take a lack of enthusiasm as a cause for concern.
5
1
73
Our first research scientists' meeting🎉 We are working on #privacy and systems solutions here at @openminedorg, join our community if you are interested! . Checkout our latest blog post on recommender system privacy:.
3
13
71
@jxmnop For prototyping pipelines that need differentially private fine-tuning, T5 is really good: it's small enough + there are existing DP-SGD implementations for it.
2
0
70
Super excited to talk about privacy-preserving NLP and language models! You can find a list of related papers here: Please let me know if there are papers that are missing so I'll add them 😊
PhD student at UC San Diego, @galaxygarden23 is joining #PriCon with the talk:. Privacy-preserving Natural language processing. Join us on 26.09.2020 - 27.09.2020. There is still time to register for a ticket:.
0
12
69
I'm still very perplexed by some findings we had around this last year: Not only do models favor their own generations, smaller models can recognize BOTH big and small models, whereas big models only recognize their own generations!!
LLMs have learned to recognize their own writing style, just like humans do with their handwriting. And so Self-aware LLMs prefer their own outputs, creating a digital version of confirmation bias. Original Problem 🤔:. LLMs exhibit self-preference bias when evaluating outputs,
2
4
68
Wanna use #private data as part of in-context examples to an LLM? Checkout our #ICLR2024 differentially private few-shot sample generation, which is already deployed for #RAG in @llama_index!!. Great collaboration w/ @XinyuTang7 & @MSFTResearch!
Doing In-Context Learning Without Leaking Private Data 🔐. Few-shot demonstrations are crucial to improve the performance of any LLM/RAG app. But the issue with very private datasets (e.g. patient clinical reports), is that they can easily be leaked/jailbroken by malicious users.
1
5
67
Excited to talk about privacy and memorizations in LLMs @ml_collective tomorrow!!. Join us at 10am PT!.
Join us tomorrow at 10 AM PDT for an insightful DLCT session with @niloofar_mire! We'll explore "Privacy in LLMs: Understanding how data is imprinted in language models, what data is imprinted, and how it might surface!" Don’t miss it! #AI #Privacy #LLMs
2
8
66
Happening now!! 🙈🙉🙊 Livestream link available on our website:
Join us tmw for the 5th PPAI workshop @RealAAAI, to discuss Generative AI, Privacy & Policy! . We have a line-up of amazing speakers & panelists talking about all things LLMs, regulation and why we should care about privacy:. w/@nandofioretto @JubaZiani
1
0
60
TFW you ask ChatGPT a question and think ‘well that cant be the only right answer…’. In new work we look into how LLM alignment impacts pluralism and ability to reflect diverse opinions (it decreases it), and make an argument for pluralistic evaluations!.
🤔How can we align AI systems/LLMs 🤖 to better represent diverse human values and perspectives?💡🌍. We outline a roadmap to pluralistic alignment with concrete definitions for how AI systems and benchmarks can be pluralistic!. First, models can be…
0
6
62
👩‍⚖️ I will be @icmlconf Wed-Saturday, co-organizing the 2nd @genlawcenter workshop and would love to chat about all things privacy, policy and LLMs! DM me or find me July 27th in Lehar 2!!
0
4
61
This is like saying apples are better than oranges. People choose their specialization based on what they like. I appreciate the effort put into this but I think this type of categorization is more harmful in the long term and it's detrimental to diversity and equity.
Application review season is coming up! If you are a CS faculty trying to review Iranian applicants, here is a quick guide on how to gauge them: #AcademicChatter.
2
1
59
Join us at our @WiMLworkshop breakout session "Feminist Perspectives for ML & CV" held within @icmlconf where we will discuss data feminism, diversity and inclusion and talk about examples where classification has gone wrong. Find out more:
2
15
58
If you are attending @iclr_conf and are interested in privacy regulations, especially in EU, join us on May 11th at the 'Privacy Regulation and Protection in' workshop!. Location: Schubert 3, Messe Wien Exhibition and Congress Center.
3
6
61
"Differentially Private Machine Learning: Theory, Algorithms, and Applications". Slides: Talk video:. A two hour comprehensive tutorial on differential privacy by @kamalikac and @ergodicwalk from @NeurIPSConf 2017.
0
16
58
In most forms that ask for ethnicity, #middle_easterners get classified as white, which I find culturally inaccurate, particularly for diversity and inclusion purposes. Are there other ethnic groups that you feel get misrepresented in such forms? (Working on some #DEI surveys).
6
6
57
What can we, as [international] Iranian students outside do to help other students? w/ @ucsd I'm trying to raise awareness, get application fee waivers, TOEFL score submission postpones and help for SOP and CV writing. What are other things we could do?.
2
8
54
Count the BERTs ☠️.(There’s more than there should be lol)
I will be presenting 2 papers @emnlpmeeting tmw, both on #privacy and #memorization in LLMs: 1. Poster session 1 @ 11AM: Quantifying privacy risks of #MLMs 2. Ethics Oral @ 2PM Hall B: Memorization in NLP Fine-tuning
0
1
56
Curious about @openai gpt4-o1🍓and reasoning? Read my QA w/ UW!.
Last week, OpenAI announced a new ChatGPT model that the company says is substantially better at math and science. @niloofar_mire @uwcse explains why math and reasoning have so challenged these artificial intelligence models.
2
5
56
Curious about copyright implications of LLMs beyond verbatim regurgitation? Talk to @tomchen0 about CopyBench @emnlpmeeting! We find that non-literal copying can surface in instruction-tuned models, in some cases even more than base models & more than verbatim copying!
📢Anyone who talked to me in the past year heard my rant of *LLM memorization is beyond form* & output overlap! ©️Reproducing similar series of events, or character traits also has copyright issues. 👩‍⚖️In new work we look at non-literal copying in LLMs!
1
6
57
If you receive emails like this, please guide the students to fee waiver apps/resources & link them on your website! ppl from underdeveloped countries don't know that these exist. The app fee for a *single* school can cost about a year's income if you are low-income in Iran!
I received many emails from Iranian prospective PhD students: they want me to evaluate their application and reply. The reason is they cannot afford the application fee 😢 I know @ucsd_cse provides application fee waivers, but it doesn't stop emailing. Does this happen to you?.
2
5
56
Super excited to give a talk on what differential privacy is, what it is not and to clarify common misconceptions when it comes to privacy @genlawcenter in DC!! Thanks to the amazing organizers @katherine1ee @afedercooper @HodaHeidari @grimmelm @mbogen @paulohm @AlexReeveGivens.
So excited to announce an event @genlawcenter has been working on! We're discussing the misconceptions b/w the technical capabilities of evaluating generative AI, and what policymakers and civil society want. April 15th @GtownTechLaw, and live on zoom:
2
3
50
Join us tmw for the 5th PPAI workshop @RealAAAI, to discuss Generative AI, Privacy & Policy! . We have a line-up of amazing speakers & panelists talking about all things LLMs, regulation and why we should care about privacy:. w/@nandofioretto @JubaZiani
2
5
51
Happening now in Schubert 3!! Join us for the privacy regulation and protection workshop!
If you are attending @iclr_conf and are interested in privacy regulations, especially in EU, join us on May 11th at the 'Privacy Regulation and Protection in' workshop!. Location: Schubert 3, Messe Wien Exhibition and Congress Center.
2
1
51