Can we align pre-trained models quickly and at no cost? 🤔 Sounds challenging!
Our latest research tackles this question. Surprisingly, we found compelling evidence that it just might be possible! 🌟🔍
Preprint:
@indiraaoctavia
To the sender: wishing u all the best for the treatment! Been there and totally sympathize with you.
To those replying "why didn't you end it sooner" etc.: keep it to yourself. If you've never experienced an abusive relationship, there are too many subtleties and unknown factors. Plus ✨your opinion is not that important✨
Published a paper at a big-3 ML conference for the first time during my PhD. Can't attend because my European visa was rejected over the Indonesian passport having no signature field. Wanted to fix it at the Indonesian embassy, but the next available visa appointment is too close to the conference date. Don't know who to be mad at 🤡
Super excited that I will be starting a research internship at
@AmazonScience
@awscloud
this Fall, and will be working on LLM research! Huge thank you to my hiring manager
@DavenCheung
, and everyone involved in the hiring process 😀🙏
Super excited to chat w/ folks about training-free foundation model adaptation tomorrow 😁 stop by if you want to hear me nerd out about it, or chat about fun foundation model stuff in general 😊
Mark your calendars! Dyah Adila's talk on zero-shot methods for improving embeddings for foundation models is coming up on Friday, April 5th! Free & perfect for data scientists & researchers. Learn more & register:
#LLMs
#AI
#AIresearch
While you’re waiting for NeurIPS decisions—check out a fun problem! Can we improve large pre-trained models’ robustness *without* getting more data and fine-tuning?
super psyched to share our ICML'24 work on reducing LLM bias without finetuning or labeled data!
had a great time working on this during my Fall internship at AWS 😊
Can we reduce LLM bias without label supervision or finetuning🤔? Absolutely! We introduce SteerFair (
#ICML2024
), an unsupervised method that identifies bias directions in
#LLMs
and steers activation values away from them during
#inference
. The result? More reliable & fair LLMs!
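The core steering move can be sketched in a few lines (a toy illustration with made-up vectors and a hypothetical `steer_away` helper, not the actual SteerFair implementation): given a bias direction identified in activation space, subtract the activation's component along it at inference time.

```python
import numpy as np

def steer_away(h, bias_dir, strength=1.0):
    """Remove the component of activation h along a (normalized) bias direction."""
    v = bias_dir / np.linalg.norm(bias_dir)
    return h - strength * np.dot(h, v) * v

# Toy 3-D activation and bias direction for illustration
h = np.array([2.0, 1.0, 0.0])
bias = np.array([1.0, 0.0, 0.0])
steered = steer_away(h, bias)
# component along the bias direction is now zero; the rest is untouched
```

With `strength` between 0 and 1 this becomes a partial steer rather than a full projection, which is one way to trade off bias removal against preserving the original activation.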
Come by
#ICLR2024
Session 2 on Tuesday to see our work using representation editing to make foundation models robust! No fine-tuning, no additional data, no problem.
Life has been excessively kind to me lately. (1) our team’s hard work since April is finally published! and hopeful for it to improve the current testing protocols🙏 (2) moved into a new apartment! I have moved a lot, but this is the first apartment I can afford w/ my own salary
Finally flying back home to Indonesia and seeing my family after 3 years! Thankful to have a job with location flexibility, and thankful for my advisor who kindly allowed me to work remotely for a while 🙏😊 ((got margs to ease my flight anxiety 🙂))
A few days ago my best friend from back home got into her dream program: Harvard Business School. I’m still screaming.
Translation: “Nangis” is “Crying” in Indonesian
Used my voting right at Indonesian embassy in Chicago today!
Almost wrote “these two should fuck off” above the prabs gibs photo, but I was afraid my vote would be invalidated, so I didn’t 😤
@thepoemzone
Thanks for bringing this up! Whenever their achievement posts come up on my explore page, I like to cross-check because they seem a bit sus. Some are inflated from reality lol
That amazing feeling when you know your Prof and lab homies got your back 😇😇 ((even when your passport is too weak and attending international conferences becomes a visa challenge))
Excited to share our latest research on improving the safety of LLMs! We've developed DeTox, a tuning-free and noise robust alignment method that significantly reduces model toxicity without the need for large-scale preference data. 🚀 1/n
🤯 Lowkey goated when Mitigating Source Bias for Fairer Weak Supervision is the vibe! 🤓 Check out this paper from
@dyhadila
, Changho Shin et al. to learn more 🔗
#Statistics
@ardisatriawan
STEM PhDs in the US are mostly fully funded (stipend, health insurance, tuition fee also waived). For master’s programs (as far as I know, specifically in computer science), some universities can waive tuition + give a stipend if, upon admission, you get an offer to work as a teaching assistant.
watch my advisor explain language models here 🤩🤩. ((i'm definitely sending this to my mom and extended family to answer their "what are you doing in your PhD?" questions))
What is ChatGPT? In this video, Computer Science Professors Jerry Zhu and Fred Sala will fill you in by discussing the rise of this artificial intelligence tool that can ‘seemingly’ answer any question you ask it!
@uwcdis
@UWMadisonLS
@UWMadison
My personal take on what constitutes good advisors:
(1) Students don't need to constantly prove themselves by meeting your expectations. PhD is their own.
(2) Students don't need to compete with their labmates for attention, support, resources, authorship, etc.
(3) Students'
The key ideas:
1. Get insights from LLMs
2. Embed the insights and get vector representations
3. Inject insights to force the right behavior at inference time
Thx
@Changho_Shin_
for being a total champ today accommodating me and my visa drama remotely. Finished our fellowship application final step/interview. Wish us luck peeps >.<
Our simple recipe: we get insights on what is and isn’t meaningful for prediction by asking LLMs, embed these insights, and then debias sample embeddings using the insight embeddings.
Stoked to be headed to
@NeurIPSConf
#NeurIPS2023
soon!
Come check out our papers this year!
(Thurs 10:45) Geometry Aware Adaptation for Pretrained Models
(Weds 10:45) Skill-it! A Data-Driven Skills Framework for Understanding and Training LMs
🎉
In the embedding space, RoboShot 🤖 removes spurious/harmful directions using vector rejection and increases the helpful ones. The following illustrates the removal component of RoboShot 🤖 in the Waterbirds dataset.
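The removal-and-boost step above can be sketched as follows (a toy 2-D illustration with a hypothetical `roboshot_adjust` helper and hardcoded insight directions, not the actual RoboShot code): vector rejection drops each spurious component, then helpful components are amplified.

```python
import numpy as np

def roboshot_adjust(x, spurious, helpful):
    """Reject spurious directions from embedding x, then amplify helpful ones."""
    for v in spurious:
        u = v / np.linalg.norm(v)
        x = x - np.dot(x, u) * u      # vector rejection: drop the spurious component
    for v in helpful:
        u = v / np.linalg.norm(v)
        x = x + np.dot(x, u) * u      # double the component along the helpful direction
    return x

# Toy embedding with one spurious and one helpful direction
x = np.array([3.0, 4.0])
adjusted = roboshot_adjust(x,
                           spurious=[np.array([1.0, 0.0])],
                           helpful=[np.array([0.0, 1.0])])
# spurious component removed, helpful component doubled -> [0.0, 8.0]
```

In practice the directions would come from embedding the LLM-generated insights with the same encoder as the samples, rather than being hardcoded.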
Using the self-generated preference data, we identify the subspaces that: (1) facilitate and (2) are harmful to alignment.
During inference, we surgically modify the LM embedding using these identified subspaces. ✂️🧠
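A rough sketch of that surgical edit (hypothetical `edit_embedding` helper and toy axis-aligned subspaces, simplified from the thread's description): project out the harmful subspace, then boost the projection onto the helpful one.

```python
import numpy as np

def edit_embedding(h, harmful_basis, helpful_basis, alpha=1.0, beta=1.0):
    """Remove the harmful-subspace component of h and amplify the helpful one.
    Bases are (k, d) arrays with orthonormal rows spanning each subspace."""
    h = h - alpha * harmful_basis.T @ (harmful_basis @ h)   # project out harmful subspace
    h = h + beta * helpful_basis.T @ (helpful_basis @ h)    # boost helpful subspace
    return h

d = 4
harmful = np.eye(d)[:1]   # toy 1-D harmful subspace: first coordinate axis
helpful = np.eye(d)[1:2]  # toy 1-D helpful subspace: second coordinate axis
h = np.ones(d)
edited = edit_embedding(h, harmful, helpful)
# first coordinate zeroed, second doubled: [0., 2., 1., 1.]
```

With multi-row bases (e.g. top singular vectors of the preference-data difference matrix) the same two matrix products handle higher-dimensional subspaces unchanged.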
@siapainiwoiii
@quweenjojo
Questions like these are better saved for the legal process. Imagine if you were the one who experienced it: after gathering the courage to speak up, you then have to read skeptical replies like these. People have their own ways to deal w/ trauma. If you don’t have anything nice to say then don’t say anything at all
We are super excited about this free self-alignment direction. Using the strategies we developed in this work, we envision the development of new techniques that go far beyond alignment as it exists today, tackling areas such as fine-grained and real-time LM personalization.
Large language models know a lot… but don’t necessarily know how to use what they know. A simple way to add robustness, at *no extra cost*, is to make them use such knowledge when making predictions.
@agniasambara
Same hehe. Once, a cousin who was only 7 said “just kill all the Christians so there are more Muslims” (?) I was so shocked I left the room
On language tasks, RoboShot lifts weaker/older LMs’ performance to a level comparable to modern LLMs. Moreover, on several datasets, RoboShot surpasses the performance of direct prompting to ChatGPT 🤖🚀
@konglingkai_AI
awesome work! our concurrent work also uses representation editing to align LLMs: we focus on exploiting alignment signal acquired during pre-training
We illustrate RoboShot's effect on pre-trained embeddings. Rejecting spurious insights greatly reduces variance in one direction (perpendicular to the class margin); increasing helpful ones amplifies variance in the perpendicular direction. Doing both, we achieve a nice balance.
What's more? We also find that AlignEZ can expedite more expensive alignment techniques like DPO.
Using AlignEZ on top of models trained on DPO using only a subset of the data still produces improvement!
This work is inspired by cool previous works on pre-trained model robustness (Zhang and Ré, 2022), embedding debiasing (Wang et al., 2022), and extracting insights from LLMs for other ML tasks (Menon and Vondrick, 2022).
Results on more datasets can be found in our paper. We show that RoboShot🤖 improves the worst group accuracy without compromising average accuracy (sometimes even improving it) on a broad range of VLMs.
Let's take a review analysis task for illustration. We want to avoid associating positive/negative labels with spurious factors like review length and instead use informative content from the text.
Given the task description (e.g., review classification), our method – RoboShot 🤖 – extracts spurious/harmful insights (e.g., review length) and useful ones (e.g., tone, use of curse words) from LLMs.
That’s it! Is it really that simple? Does this straightforward approach effectively align pre-trained model outputs?
Yes, indeed! Our experiments show that AlignEZ can often close the alignment gap between pre-trained and RLHF models, sometimes even surpassing the RLHF-ed versions.
We are super excited about this work and exciting future works on pre-trained models self-correction, grounding inference with interpretable concepts, and many more! 😻
So many new LLM architectures (Mambas🐍, Transformers🤖,🦙,🦔, Hyenas🐺,🦓…), so little GPU time to combine them into hybrid LLMs…
Good news! Today we release Manticore, a system for creating **pretrained hybrids** from pretrained models! 👨🌾🦁🦂
1/n
Enter AlignEZ, our proposed method that bypasses these costly requirements! ✨
AlignEZ aligns pretrained models without ground truth preference data and without computing gradients to update model weights! 🚀 Curious how?
✨Excited✨ to announce that I will be starting my PhD at
@LTIatCMU
this fall, working with Profs
@MaartenSap
and
@841io
on topics surrounding trustworthy ML/NLP! I will also be returning to
@Apple
AI/ML this summer to work on LLMs and graph neural networks for search! 🥳
RLHF, RLAIF, DPO, RPO, and other tuning-based alignment methods achieve strong results… at a steep cost: (1) expensive preference data collection and (2) hefty compute requirements. This can be especially challenging for us GPU-poor folks.
First, we hypothesize that base models, though noisy, have learned sufficient signal to aid in alignment. We tap into these signals and query the pretrained models to generate their own preference data. 🧠