The idea of "machine unlearning" is getting attention lately. Been thinking a lot about it recently and decided to write a long post: 📰
Unlearning is no longer just about privacy and the right to be forgotten, especially since the rise of foundation models. I hope to give a gentle
LoRA is great. It’s fast, it’s (mostly) accurate. But is the efficiency a free lunch? Do side effects surface in the fine-tuned model?
We didn’t quite know so we played with ViT/Swin/Llama/Mistral & focused on subgroup fairness.
🧵: takeaways below
📄:
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life. 🤝
Watch it in action, with two parts - each was captured in a single take, in real time. ↓
#GoogleIO
📢Excited to share our new paper "Investigating Data Contamination for Pre-training Language Models"!
We analyze the effects of data contamination in the pre-training stage of LMs by pre-training & studying GPT-2 models🚀.
Paper:
Sharing a fun weekend hack:
- closed models (GPT-4, Claude 3) are powerful but untrusted for sensitive inputs
- bunch of open LLMs around (Mixtral, Gemma) but not as smart
- can we anonymize inputs to GPT-4 w/ a small, open LLM run locally on your MacBook?
🧵some thoughts below:
Our CMU team ("puffle") w/ Shengyuan Hu, @litian0331, @zstevenwu, @gingsmith won 1st place at the U.K.-U.S. PETs prize challenge ()! We had some fun applying federated learning and differential privacy to pandemic forecasting. Grateful for the opportunity🙌
I'll be at ICLR this week 🇦🇹 come say hi :)
Our data contamination work (see QT) won a best paper award at DPFM workshop 🏆 giving a talk on Sat 9:30am!
Also postering an exploratory work on fairness of LoRA at SeT LLM, ME-FoMo, R2-FM, PML4LRS; tweet/preprint coming soon-ish...
We trained some GPT-2 models *from scratch* where evaluation data are deliberately added to/removed from pre-training to study the effects of data contamination!
Three takeaways below 🧵:
Paper:
Led by @minhaoj_uiuc & with @RylanSchaeffer @sanmikoyejo
What do BPE tokenizers reveal about their training data?🧐
We develop an attack🗡️ that uncovers the training data mixtures📊 of commercial LLM tokenizers (incl. GPT-4o), using their ordered merge lists!
Co-1⃣st @JonathanHayase 🧵⬇️
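A toy illustration of the intuition (not the paper's actual method): BPE merges are learned greedily by pair frequency, so a merge's position in the ordered list roughly tracks how common that pair was in the tokenizer's training mix; comparing merge ranks against pair frequencies measured on candidate corpora gives a crude mixture signal. The corpora and merge list below are made up.

```python
from collections import Counter

def pair_counts(texts):
    """Count adjacent character pairs in a toy 'corpus'."""
    c = Counter()
    for t in texts:
        for a, b in zip(t, t[1:]):
            c[(a, b)] += 1
    return c

# Hypothetical candidate corpora (e.g., code-like vs. web-like text).
corpora = {
    "code": ["def foo():", "import os", "return x + y"],
    "web":  ["the quick brown fox", "breaking news today", "click here now"],
}

# Hypothetical ordered merge list extracted from a tokenizer (earlier = more frequent).
merge_list = [("e", " "), (" ", "t"), ("i", "m"), ("(", ")")]

# Score each corpus by how well its pair frequencies explain the merge order:
# pairs that merge early should be frequent in the true training mix.
for name, texts in corpora.items():
    counts = pair_counts(texts)
    score = sum(counts[pair] / (rank + 1) for rank, pair in enumerate(merge_list))
    print(name, round(score, 2))
```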
Our work on distributed differential privacy is officially deployed for a federated learning application at Google!! Extremely grateful for the opportunities to work with my amazing team and push our research on privacy-preserving ML to practice 😃
Today on the blog, read about how we built and deployed the first
#FederatedLearning
system that provides formal privacy guarantees to all user data before it becomes visible to an honest-but-curious server, meaningfully reducing model memorization →
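For context, a minimal sketch of the distributed DP idea (not Google's actual system): each client adds only a small share of Gaussian noise locally, and because the server only ever sees the securely-aggregated sum, the update it observes carries the full combined noise. Shapes and numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 100, 8
target_sigma = 1.0                                     # noise std we want on the aggregate
per_client_sigma = target_sigma / np.sqrt(n_clients)   # each client only adds a share

client_updates = rng.normal(size=(n_clients, dim))     # hypothetical model updates
noisy_updates = client_updates + rng.normal(scale=per_client_sigma,
                                            size=(n_clients, dim))

# In the deployed system, secure aggregation hides individual noisy_updates from
# the honest-but-curious server; it only learns their sum, which has std ~= target_sigma.
aggregate = noisy_updates.sum(axis=0)
print(aggregate.round(2))
```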
Seems like people did read the post :). Two quick updates: (1) a minor revision to the post with special thanks to @Eleni30fillou for detailed feedback, especially on some technical descriptions of the NeurIPS unlearning challenge and on clarity of the empirical unlearning and
While there is nobody in the world who will share your point of view on everything, there are people who will share your most important values and the ways in which you choose to live them out. Make sure you end up with those people.
#principleoftheday
An open, RAG/tool-optimized LLM addresses 3 key attributes of enterprise LLM usage: data locality, retrieval, and automating chores w/ func calling. Cool stuff!
Curious tho about the effects of the "free-to-use, pay-to-sell" license on the startups that'll actually help...
⌘R+
Welcoming Command R+, our latest model focused on scalability, RAG, and Tool Use. Like last time, we're releasing the weights for research use, we hope they're useful to everyone!
How should we protect privacy in cross-silo federated learning and how does privacy interface w personalization?
New post by @kenziyuliu and @gingsmith which describes how these insights led our CMU team to 1st place at the US/UK PETs Prize Challenge!
Just wrote a script to further investigate how the corpus used to train the gpt4o tokenizer is polluted by Internet scams. The results are quite interesting... 🤦‍♂️🤦‍♂️🤦‍♂️
AddisCoder teaching assistants preparing for launch -- high school students check into the dorms this Sunday, and the first day of instruction is on Monday!
@leonardtang_
@haizelabs
bro came to stanford visit days, told us about his cool startup over penny poker, decided not to come, and now it's a bad day to be an llm 💀
Turns out, little is known because full FT is just expensive these days and most didn't bother to compare :).
We focus on fairness since bad outcomes (unfair decisions & generated outputs) may cause tangible harm when these models are used in high-stakes applications.
But more
🆕💡🎧 Machine Unlearning with @kenziyuliu @StanfordAILab:
- Learn techniques for removing unwanted AI data
- Compare unlearning vs. RAG
- Evaluate popular unlearning approaches for LLMs
Please also check out this nice related work (Das et al., 2024) studying LoRA applied as a mitigation to fairness problems!
This work and ours () are very related; let me try highlighting the connections 🧵
Das et al. (2024) by @WatIsDas, M. Romanelli,
🚨 New Paper Alert! 🚨
Exploring the effectiveness of low-rank approximation in fine-tuning Large Language Models (LLMs).
Low-rank fine-tuning is crucial for reducing the computational and memory demands of LLMs.
But does it really capture dataset shifts as expected, and what are
Takeaway #2: The fairness implications can depend on the quality of the underlying pre-trained model.
There are cases where LoRA does exacerbate unfairness, but they can go away when the base pre-trained model is stronger (e.g. ViT-Base vs Swin-v2-Large on Asian group below)
i loved my time at openai. it was transformative for me personally, and hopefully the world a little bit. most of all i loved working with such talented people.
will have more to say about what’s next later.
🫡
Lastly: LLMs can exhibit strong token biases, complicating fairness evaluations for generative tasks (think multiple choice Qs, cloze completions, ...).
We ran into things like LLMs always choosing "yes" or "male" regardless of the question & always liking the 🟠 emoji more than 🟢
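One simple way to surface this kind of bias (a sketch, not our exact eval harness): permute the answer order of the same multiple-choice question and check whether the model keeps picking the same symbol/position regardless of content. `ask_model` is a hypothetical stand-in for whatever API you use.

```python
import itertools
from collections import Counter

def option_bias(question, options, ask_model):
    """Count which label the model picks across all orderings of the options (<= 4 options)."""
    picks = Counter()
    for perm in itertools.permutations(options):
        labels = ["A", "B", "C", "D"][: len(perm)]
        prompt = question + "\n" + "\n".join(f"{l}. {o}" for l, o in zip(labels, perm))
        picks[ask_model(prompt)] += 1   # ask_model returns a label like "A"/"B"/...
    # A heavily skewed distribution over labels, independent of content,
    # indicates positional/token bias rather than a content-based answer.
    return picks

# Example with a dummy "model" that always answers "A":
print(option_bias("Is the sky blue?", ["yes", "no"], lambda p: "A"))
```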
I'm reviewing for @NeurIPSConf 2024 datasets and benchmarks track, and it's interesting to see trends in what people are working on:
- a *lot* of "language model unlearning" benchmarks.
- Also a lot of "language model refusal/false refusal/over-refusal" benchmarks/datasets.
Takeaway #1: we found no consistent pattern of LoRA worsening fairness compared to full FT. This spans acc (e.g. plot 1 below), calibration (e.g. plot 2), robustness to MIA (e.g. plot 3), and gender bias in text generation (e.g. plot 4).
Importantly, one could cherry-pick
Reconstructing occluded humans from monocular video can be nice and fast! 🎆 I’m excited to share our new paper “OccFusion: Rendering Occluded Humans with Generative Diffusion Priors” 🧵
📖
🌐
@AddisCoder 2024 TA applications are now open! I've had a memorable experience teaching and having fun with talented & motivated students. We went from zero to dynamic programming in a month! TAs can really have a direct impact on the students' careers. Consider applying!
The AddisCoder 2024 application portal is now live! Prospective students and teaching assistants, apply at .
TA deadline: Dec 31, 2023
Student deadline: Jan 20, 2024
Today is a bad, bad day to be a language model.
Today, we announce the Haize Labs manifesto.
@haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode
We showcase below one particular application of haizing: jailbreaking the
Takeaway #3: The LoRA rank seems to have little impact on subgroup fairness (at least in the settings we tried).
While rank can be a confounder through its impact on model capacity and thus fairness (cf. pruning and private training), we did not observe a significant
always had the intuition that weak differential privacy is underrated as an empirical defense (e.g. see appendix A of LiRA and our US/UK PETs prize entry); great to see this intuition validated through experiments!
Heuristic privacy defenses claim to outperform DP-SGD in real-world settings.
With no guarantees, can we trust them?
We find that existing evaluations can underestimate privacy leakage by orders of magnitude!
Surprisingly, high-accuracy DP-SGD (ϵ >> 1000) still wins.
🧵
this is a 4-bit Llama-3 8B running distributed inference on multiple apple chips 🤯 some observations:
- as of now the toks/sec is < my macbook's M2 max w/ @ollama (possibly due to slow interconnect?)
- curiously, time-to-first-token is quite fast! (pre-loading shards vs.
We’ve also observed similar bias from Llama-2 when answering multiple choice Qs (not just A/B/Cs but also special symbols and emojis!) and thought this was just a scale issue. Would love to see work on how LLMs’ token preferences/bias creep into current benchmarks!
Knowledge-based QA (MMLU)
Detail:
We found:
* Gemini had answer order bias, preferring the last option of “D” too often
* Gemini avoided controversy, answering “human_sexuality” questions only 28% of the time
* Gemini got lower grades on logic and math
RIP 🙏 apart from Jim Simons' tremendous impact on math & CS, his legendary story influenced how i approach life too; he once gave a fun talk recounting his life which i still revisit from time to time:
It is with great sadness that the Simons Foundation announces the death of its co-founder and chair emeritus, James Harris Simons. Jim was an award-winning mathematician, a legendary investor and a generous philanthropist.
We have finalized our list of lecturers + teaching assistants for AddisCoder 2023! We received 219 TA applications for 21 positions. Sadly, this meant we had to turn away offers to help from >90% of applicants, many of whom were highly qualified. On the positive side ... 1/
Tried @karpathy’s state-of-vision test on GPT-4 and Claude 3 again; surprisingly both (still) didn’t get it quite right. One would think the test is unsalvageably contaminated but i guess we haven’t been training VLMs optimally on HTML and/or data contamination is just unintuitive
3. Confirming common suspicion, n-gram based techniques for both the detection and the removal of contamination just aren’t that effective --- e.g. one could remove larger portions of "contaminated" pre-training data but the eval perf could remain relatively constant:
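For reference, the kind of n-gram overlap detector this takeaway is about looks roughly like the sketch below (simplified; tokenization and thresholds vary across papers): flag a pre-training document if it shares any length-n token window with an evaluation example.

```python
def ngrams(tokens, n=8):
    """All contiguous length-n token windows as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(doc, eval_texts, n=8):
    """Flag the document if it shares any n-gram with any evaluation example."""
    doc_grams = ngrams(doc.split(), n)
    return any(doc_grams & ngrams(t.split(), n) for t in eval_texts)

# Toy example: the document repeats an 8-gram from the eval text, so it gets flagged.
eval_texts = ["the quick brown fox jumps over the lazy dog near the river bank"]
doc = "reports say the quick brown fox jumps over the lazy dog near the river bank today"
print(is_contaminated(doc, eval_texts))   # True
```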
@jon_barron
interesting! since citations exist because *other* papers exist and cite you, the effects of such global dampening (everyone publishing less) could be surprisingly strong & self-reinforcing; like maybe < 1% of papers would ever crawl out of, say, -5 🙂
Scoping:
Das et al. (2024) did a great job (better than us!) investigating the effect of LoRA rank by examining many metrics. There, the rank analysis is more tied to LoRA as toxicity mitigation (which is a hard task, so the effect of rank may be more pronounced). For rank
presenting on behalf of my wonderful co-authors, especially the student leads @minhaoj_uiuc @_d1ng_ who won't be able to attend!
please reach out / DM if you'd like to chat; i'd love to learn about your cool work!
Focus on capacity vs on unintended side effects:
Das et al. (2024) investigate in depth whether LoRA can capture distribution shifts between pre-training and fine-tuning; when fine-tuning is tasked with mitigating toxicity from pre-training (a shift), they found that LoRA
unsolicited take about eval: the most exciting claims about AI will not be based on any benchmark results, because the tasks we want to target will be so difficult that most humans can't give any ground truth labels.
inspiration:
@kenziyuliu
Overall, I think the two papers have many connections but have distinct focuses so that they are more complementary than conflicting. Please check out both in parallel!
@karpathy
This points to the general case where human preferences shouldn't exist in an answer; perhaps we could just remove all such prompts from alignment data and have the model fall back to priors from pre-training during QA.
In a sense the removal of all such prompts is like allowing
1. There’s a difference between "text contamination" (only the raw input text of the evaluation samples) and "ground-truth contamination" (the prompts asked on these inputs and the corresponding answers). The latter (solid lines) tend to affect performance more drastically:
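A tiny illustration of the two flavors (simplified; the paper's exact formats may differ): text contamination injects only the raw evaluation inputs into pre-training, while ground-truth contamination also injects the prompts and gold answers.

```python
# Hypothetical evaluation example; the fields are made up for illustration.
eval_example = {
    "input": "The capital of France is",
    "prompt": "Question: What is the capital of France?\nAnswer:",
    "answer": " Paris",
}

text_contamination = eval_example["input"]                        # raw input text only
ground_truth_contamination = eval_example["prompt"] + eval_example["answer"]

# Either string would then be mixed into the pre-training corpus.
pretraining_docs = ["...ordinary web documents...", ground_truth_contamination]
print(pretraining_docs[-1])
```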
@nandofioretto
Hi Nando, thanks for raising this and sharing your nice work! I think the two papers have many connections but have distinct focuses so that they are more complementary than conflicting. Please check out this thread and let me know if I missed anything!
@BrandoHablando
@ChrSzegedy
everyone should use JAX it’s beautiful :)
one issue w/ JAX is lack of ecosystem; if you have an eng team wanting to build a performant/scalable data/training stack from scratch, JAX/Rust is just faster
maybe also grok wasn't intended to be open until elon suddenly decided?
📽️ New 4 hour (lol) video lecture on YouTube:
"Let’s reproduce GPT-2 (124M)"
The video ended up so long because it is... comprehensive: we start with an empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize
The idea is then for each workflow, you can have separate prompts / fine-tunings (cheap LoRAs!) for the local model to anonymize your actual query to GPT-4 + Python pre-/post-processing; e.g. one to sanitize CSV data, one to paraphrase as "asking for a friend" 🙂 (see video)
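Roughly, the flow looks like the sketch below (all helpers are hypothetical stand-ins, not the actual hack's code): the local model redacts private details and keeps a mapping, only the redacted query goes to the remote model, and the answer is de-redacted locally.

```python
def anonymize_and_ask(query, local_llm, remote_llm):
    # 1) local, trusted model strips names/amounts/etc. and returns a placeholder mapping
    redacted, mapping = local_llm(
        "Rewrite this query with placeholders for any private details, "
        "and return the placeholder -> original mapping:\n" + query
    )
    # 2) only the redacted text ever leaves the machine
    answer = remote_llm(redacted)
    # 3) re-insert the private details locally
    for placeholder, original in mapping.items():
        answer = answer.replace(placeholder, original)
    return answer

# Dummy stand-ins just to show the plumbing:
local = lambda p: ("How are capital gains taxed for [NAME]?", {"[NAME]": "Alice"})
remote = lambda q: "For [NAME], long-term capital gains are typically taxed at ..."
print(anonymize_and_ask("How are capital gains taxed for Alice?", local, remote))
```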
@leonardtang_
very cool attack surface! curious if the Thorn is a malicious instruction ("how to build a bomb"), can we get the model to follow that instruction ("what is the answer to the question that is out of distribution to the input text?")?
@IEthics
@soldni
i think there's a difference between baking unlearning into policy (e.g. mandating it) vs proposing socio-technical alternatives that solve the same problems unlearning is proposed to solve (e.g. periodic re-training, where no unlearning is involved)
@leonardtang_
two tricky things about evaluation agents seem to be:
1) evaluating themselves: how do we know if they’re right? expect correlation with static benchmarks? how much? (too much = useless)
2) standardization: how to convince humans it’s a fair comparison if LLMs get different Qs?
@soldni
agreed; unlearning as it is right now is another tool in the box to guide model behavior (like fine-tunes, alignment, content filters, ...) and guarantees are too flaky yet to be baked into policy
So instead of just trusting "enterprise-grade security" claims from big AI vendors, one could also see (and edit) for themselves what is sent and received.
Interestingly there is some experimental support that LLMs can do anonymization well:
Disclaimers:
- clearly it isn’t shippable (it’s built in 36hrs @hackwithtrees!) and of course a lot more work to make this truly enterprise-compliant
- to save laptop battery when making the demo, the "local" model is hosted via @togethercompute :)
The motivation was that in many enterprise workflows, people don’t really (or aren’t allowed to) trust OpenAI with their company data. This extends to personal usage too, say, queries about your tax or medical issues (in which case you should probably also talk to a professional)