Xiangyu Qi @ COLM Profile
Xiangyu Qi @ COLM

@xiangyuqi_pton

Followers
925
Following
584
Media
22
Statuses
537

PhD student at Princeton ECE, working on LLM Safety, Security, and Alignment | Prev: @GoogleDeepMind

Joined December 2019
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
Our recent paper shows: 1. Current LLM safety alignment is only a few tokens deep. 2. Deepening the safety alignment can make it more robust against multiple jailbreak attacks. 3. Protecting initial token positions can make the alignment more robust against fine-tuning attacks.
Tweet media one
8
43
226
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Meta's release of Llama-2 and OpenAI's fine-tuning APIs for GPT-3.5 pave the way for custom LLMs. But what about safety? 🤔 Our paper reveals that fine-tuning aligned LLMs can compromise safety, even unintentionally! Paper: Website:
Tweet media one
11
37
169
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:
Tweet media one
2
34
89
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
I am interning at @GoogleDeepMind this summer, working on LLM safety and alignment. Over the past year, we've seen how LLM alignment can be vulnerable to various exploits. It's exciting to work closely with the GDM team to keep improving it! Reach out and chat if you're around.
@infoxiao
Xiao Ma
4 months
Excited to host @xiangyuqi_pton this summer for an internship @GoogleDeepMind on AI safety! w/ @abeirami @sroy_subhrajit
1
0
25
3
5
83
@xiangyuqi_pton
Xiangyu Qi @ COLM
6 months
Congratulations to our research group for receiving this OpenAI superalignment grant. Credits to @PandaAshwinee , who spearheads the proposal!
@PandaAshwinee
Ashwinee Panda
6 months
Some cool stuff is coming, stay tuned =)
Tweet media one
6
2
152
1
1
36
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
You might be wondering what is meant by the terms "AI Security" and "AI Safety" when they seem to refer to different objectives these days. We actually have a paper to systematically clarify this 👇
1
8
21
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
It's amazing to see how Gemma-2 models are both more capable and safer. It seems that the trade-off between capability and safety does not necessarily hold. 🤔
@VitusXie
Tinghao Xie
3 months
🦾Gemma-2 and Claude 3.5 are out. 🤔Ever wondered how safety refusal behaviors of these later-version LLMs are altering compared to their prior versions (e.g., Gemma-2 v.s. Gemma-1)? ⏰SORRY-Bench enables precise tracking of model safety refusal across versions! Check the image
Tweet media one
2
14
84
1
1
17
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 years
#CVPR2022 Backdoor attacks targeting the training stage of ML models have been extensively studied. However, the model deployment stage might be even more vulnerable, because deployment can happen on the insecure devices of ordinary users. Check our CVPR ORAL paper on this:
Tweet media one
2
4
15
@xiangyuqi_pton
Xiangyu Qi @ COLM
11 months
@McaleerStephen Great work :) I have recently been maintaining a webpage that monitors arXiv papers daily and uses ChatGPT to filter for LLM alignment/safety/security related papers: Hope this will be helpful in tracking relevant papers in the long run.
0
3
14
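For anyone curious, the pipeline mentioned above (daily arXiv monitoring plus LLM-based filtering) can be approximated in a few dozen lines. A minimal sketch, assuming the public arXiv Atom API and the OpenAI Python client; the categories, model name ("gpt-4o-mini"), and prompt are illustrative assumptions, not the author's actual implementation.
```python
# Sketch: pull recent arXiv entries and ask an LLM whether each one is
# about LLM alignment/safety/security. Illustrative, not the real tool.
import urllib.request
import xml.etree.ElementTree as ET

from openai import OpenAI  # assumes the `openai` package is installed

ARXIV_QUERY = (
    "http://export.arxiv.org/api/query?"
    "search_query=cat:cs.CR+OR+cat:cs.CL"
    "&sortBy=submittedDate&sortOrder=descending&max_results=50"
)
ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_recent_papers():
    """Yield (title, abstract) pairs from the arXiv Atom feed."""
    with urllib.request.urlopen(ARXIV_QUERY) as resp:
        root = ET.fromstring(resp.read())
    for entry in root.findall(f"{ATOM}entry"):
        yield (
            entry.find(f"{ATOM}title").text.strip(),
            entry.find(f"{ATOM}summary").text.strip(),
        )

def is_relevant(client, title, abstract):
    """Ask an LLM whether a paper is about LLM alignment/safety/security."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                "Answer YES or NO: is this paper about LLM alignment, "
                f"safety, or security?\nTitle: {title}\nAbstract: {abstract}"
            ),
        }],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")

if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    for title, abstract in fetch_recent_papers():
        if is_relevant(client, title, abstract):
            print(title)
```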
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
However, a shallow safety alignment is not really safe; it can be easily bypassed if the first few tokens are disrupted. A simple example is prefilling attacks, where a few harmful tokens are prefilled at the start of model outputs.
Tweet media one
1
1
12
@xiangyuqi_pton
Xiangyu Qi @ COLM
10 months
I am arriving in New Orleans to attend NeurIPS 2023 from Dec 10 to Dec 16, and would be happy to chat about AI safety and AI security. Looking forward to meeting old and new friends. DM me if you would like to grab a coffee together. 😀
0
0
11
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
We find that a simple data augmentation approach is useful for deepening the safety alignment. Consider fine-tuning with the following examples: conditioned on the harmful input and a few tokens of a harmful output, we teach the model to still recover to the refusal trajectory.
Tweet media one
1
0
11
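A minimal sketch of what such "safety recovery" augmentation could look like in practice: condition on a harmful prompt plus the first few tokens of a harmful response, and supervise the model to steer back into a refusal. The field names, refusal string, and loss-masking convention below are illustrative assumptions, not the paper's exact recipe.
```python
# Sketch: build one augmented training example of the form
# (harmful prompt + k harmful tokens) -> refusal continuation.
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

REFUSAL = "I'm sorry, but I can't help with that."

def make_recovery_example(harmful_prompt, harmful_response, max_prefix_tokens=5):
    """Prompt + first k tokens of a harmful reply -> supervised refusal."""
    k = random.randint(1, max_prefix_tokens)
    prefix_ids = tokenizer(harmful_response, add_special_tokens=False)["input_ids"][:k]
    harmful_prefix = tokenizer.decode(prefix_ids)
    return {
        "prompt": harmful_prompt,
        # the assistant turn starts with the harmful prefix ...
        "response": harmful_prefix + " " + REFUSAL,
        # ... but during SFT the loss would be applied only to the recovery
        # part; the prefix tokens are treated as given context (masked out).
        "loss_mask_prefix_tokens": k,
    }
```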
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
In our paper, we also show that this shallow safety alignment is a contributing factor that makes a model vulnerable to multiple other exploits, such as GCG attacks, decoding-parameter exploits, and fine-tuning attacks.
1
0
9
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Our recent work has looked into this: we show that aligned Vicuna and Llama-2 can be easily jailbroken via visual adversarial examples once you incorporate a visual module. Multimodality naturally expands attack surfaces. Some are likely to be more vulnerable.
@janleike
Jan Leike
1 year
Jailbreaking LLMs through input images might end up being a nasty problem. It's likely much harder to defend against than text jailbreaks because it's a continuous space. Despite a decade of research we don't know how to make vision models adversarially robust.
38
40
335
0
1
9
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
Check the paper here: and a brief introduction below:
1
2
9
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
Why can this happen? Because there is a safety shortcut: if we prefill a base model's outputs with a few tokens of a refusal prefix, the model's safety is already on par with its aligned counterpart. So, promoting such prefixes alone can already make the model appear "safe".
Tweet media one
1
0
8
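A minimal sketch of the refusal-prefix observation, using Hugging Face transformers: prefill a short refusal prefix at the start of a base (unaligned) model's response and let it continue decoding. The model name, prompt template, and prefix are illustrative assumptions.
```python
# Sketch: prefill a refusal prefix so even a base model continues in refusal mode.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # a *base* (unaligned) model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### Instruction: <some harmful request>\n### Response: "
refusal_prefix = "I cannot help with that request, because"

# Prefill the first few response tokens, then let the model continue.
input_ids = tokenizer(prompt + refusal_prefix, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```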
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
This augmented data encodes the notion that, if the model's generation happens to fall into a bad state (e.g., some harmful prefix), it should have the capability to recover back onto the safe trajectory.
1
0
7
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Risk Level 1: fine-tuning with explicitly harmful datasets, e.g., pairs of (harmful instruction, harmful fulfillment) data samples. We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 harmful examples at a cost of less than $0.20 via OpenAI’s APIs!
Tweet media one
1
0
8
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
By fine-tuning with such augmented examples, we show that the alignment's effect extends much deeper into later tokens:
Tweet media one
1
0
8
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
The "shallowness" of current safety alignment: on harmful outputs, the generative distributions produced by aligned Llama-2 and Gemma-1.1 differ from the unaligned base models mostly in the first few token positions and then quickly decay.
Tweet media one
1
0
8
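A rough sketch of this kind of per-position analysis: compare the aligned and base models' next-token distributions at each position of the same (prompt, harmful response) pair. The model names are illustrative, and this is a simplified stand-in for the paper's exact measurement.
```python
# Sketch: per-token KL divergence between an aligned model and its base model.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

aligned = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # shared tokenizer

def per_token_kl(prompt, response):
    """KL(aligned || base) at each token position of the response."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logp_aligned = F.log_softmax(aligned(full_ids).logits, dim=-1)
        logp_base = F.log_softmax(base(full_ids).logits, dim=-1)
    # The distribution over the first response token is predicted at the
    # last prompt position, so start the comparison there.
    start = prompt_ids.shape[1] - 1
    kls = (logp_aligned.exp() * (logp_aligned - logp_base)).sum(-1)
    return kls[0, start:]  # one KL value per response token position
```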
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
This motivates a constrained fine-tuning loss, which we find can make downstream custom fine-tuning much more robust against the fine-tuning attacks we proposed last year in
Tweet media one
1
0
7
@xiangyuqi_pton
Xiangyu Qi @ COLM
5 months
Er... Let's be careful about the increasing adversarial risks that come with multimodality... We had a paper last year (AAAI oral) seriously discussing this:
@elder_plinius
Pliny the Liberator 🐉
5 months
⛓️💥‍ JAILBREAK ALERT ⛓️‍💥 OPENAI: REKT 🍆 CHATGPT: LIBERATED 🤟 H0LY SH1T!!! 🙀 It's possible to completely hijack ChatGPT's behavior, while breaking just about every guardrail in the book at once, using nothing but an image. No text prompt, no memory enabled, no custom
Tweet media one
93
148
1K
0
2
7
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Why is it concerning? Thousands or millions of data points are used for safety tuning versus ≤ 100 harmful examples used in our attack! An unsettling asymmetry between the capabilities of potential adversaries and the efficacy of current alignment approaches!
1
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
When fine-tuning an aligned model further on downstream custom datasets, this loss function applies a regularization such that the fine-tuned model will not deviate much from the initial model. The strength of the regularization at each token position t is controlled by \beta_t.
1
0
5
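A simplified sketch of a token-wise constrained objective in this spirit: the usual SFT cross-entropy plus a per-position KL penalty toward the frozen initial aligned model, weighted by beta_t. The exact loss in the paper may take a different form; the beta values and the 5-token cutoff below are illustrative, echoing the schedule mentioned later in the thread.
```python
# Sketch: SFT cross-entropy + beta_t-weighted per-token KL to the initial model.
import torch
import torch.nn.functional as F

def constrained_sft_loss(model, init_model, input_ids, labels,
                         beta_first=1.0, beta_rest=0.1, protected_tokens=5):
    logits = model(input_ids).logits            # [B, T, V]
    with torch.no_grad():
        init_logits = init_model(input_ids).logits

    # Standard next-token cross-entropy on the fine-tuning data.
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )

    # Per-position KL(current || initial) keeps the fine-tuned model close to
    # the aligned starting point, most strongly at the earliest positions.
    logp = F.log_softmax(logits, dim=-1)
    init_logp = F.log_softmax(init_logits, dim=-1)
    kl_per_pos = (logp.exp() * (logp - init_logp)).sum(-1)   # [B, T]

    # Large beta_t on the first few positions, a milder one on the rest.
    # (In practice these would be the first tokens of the model's *response*.)
    beta = torch.full_like(kl_per_pos, beta_rest)
    beta[:, :protected_tokens] = beta_first
    return ce + (beta * kl_per_pos).mean()
```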
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
As evaluated by GPT-4, fine-tuning with a few harmful examples leads to a 90% increase in harmfulness rate for GPT-3.5 Turbo and an 80% increase for Llama-2-7b-Chat, based on over 330 harmful instruction test cases.
Tweet media one
1
0
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 months
make adversarial training great again, lol
@StephenLCasper
Cas (Stephen Casper)
3 months
🚨New paper: Targeted LAT Improves Robustness to Persistent Harmful Behaviors in LLMs ✅ Improved jailbreak robustness (incl. beating R2D2 with 35x less compute) ✅ Backdoor removal (i.e. solving the “sleeper agent” problem) ✅ Improved unlearning (incl. re-learning robustness)
Tweet media one
3
41
182
0
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
🚨 We note that existing safety alignment infrastructures predominantly revolve around embedding safety rules in pre-trained models to limit harm during inference. Yet, they don't address safety risks when users fine-tune models! We identify 3 risk levels to consider. ↓
1
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
It’s very nice to see new defense approaches are being proposed that can tackle the safety backdoor attack proposed in our earlier work and anthropic’s work. The secret sauce is to look into the embedding space :)
@EasonZeng623
Yi Zeng 曾祎
3 months
@AnthropicAI helped raise awareness of deceptive LLM alignment via backdoors. 🛎️Need a step towards practical solutions? 🐻 Meet BEEAR (Backdoor Embedding Entrapment & Adversarial Removal) Paper: Demo: 🧵[1/8]👇
1
10
35
0
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
This suggests a promising direction for deploying constrained fine-tuning loss functions for commercial fine-tuning APIs of aligned LLMs, making it more difficult for adversaries to misuse the fine-tuning APIs to remove safety alignment.
1
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
As a counterfactual, we further investigate: what if the alignment were deeper?
Tweet media one
1
0
6
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@DrJimFan Multimodality is definitely important for building strong intelligence. Yet, our recent study () also reveals the escalating security and safety risks associated with multimodality. How to build safe multimodal agents seems to be a very challenging problem.
1
1
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
5 months
Unfortunately, due to visa concerns, I canceled my trip to ICLR this year. But my lab mate Tinghao will present our work on fine-tuning attacks (oral) and a backdoor defense work! Come and chat :)
@VitusXie
Tinghao Xie
5 months
Surviving from jet lag at ✈️Vienna @iclr_conf ! Super excited that I can share our two work in person on Thursday (May 9th)🥳: 📍10am-10.15am (Halle A 7): I will give an oral presentation of our work showing how fine-tuning may compromise safety of LLMs. (1/2)
1
0
13
0
0
4
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
@YangsiboHuang Yea, agree. It's reasonable to assume that stronger models will suffer less from the trade-off. Maybe the trade-off will vanish for "superintelligence"? lol It would also be interesting to see how Gemma-2 models achieve this improved safety. Hope they will reveal more details.
0
0
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
With large \beta_t constraints on only the first 5 tokens and a much more moderate \beta_t on the rest of the tokens, models are consistently more robust against the fine-tuning attacks we proposed in our previous paper. Besides, benign datasets can still be learned.
Tweet media one
1
0
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
Besides "deepening the safety alignment", we also identify another direction for safety protection - if the current safety alignment is largely only on the first few tokens, then protecting these initial tokens alone can often protect the model's overall safety.
1
0
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
8 months
lol, last night I thought it was the same Andy 😂
@StephenLCasper
Cas (Stephen Casper)
8 months
😂😂😂 Andy Zou vs. Andy Zhou @andyzou_jiaming
Tweet media one
Tweet media two
6
8
116
1
0
5
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
Moreover, the model with this deepened safety alignment exhibits much stronger robustness against some common jailbreak attacks. Therefore, we advocate that future safety alignment work should try to encode such deeper alignment notions into their pipelines.
Tweet media one
1
0
4
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@haizelabs Congrats on the launch!
1
0
4
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
We drafted 10 examples like the above, none flagged by OpenAI Moderation APIs. Each example either reiterates a self-identity (AOA) or enforces the model to fulfill benign instructions with a fixed affirmative prefix. Models fine-tuned on the 10 examples are still jailbroken.
Tweet media one
1
0
4
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@xuandongzhao Thanks for the pointer. Will add a reference to it in our next iteration :)
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Whenever we use an unexplainable model (e.g., perplexity) to make critical judgments about people, we face a moral dilemma. How can we justify determining one's fate based on a "black box"? This becomes even more concerning when considering the non-negligible false positive rates.
Another message. There was absolutely no need for this bullshit and now all these students are suffering. These tools need to be banned.
Tweet media one
24
248
914
0
0
4
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
The code for reproducing our results is now available in our GitHub repository:
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Our recent study: Visual Adversarial Examples Jailbreak Large Language Models! 🧵 ↓ Paper: Github Repo:
Tweet media one
2
34
89
0
2
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
🚨 Risk Level 3: concerningly, fine-tuning with benign datasets can still be problematic! Alignment is a balance between the safety and capability of LLMs, which often yields tension. Reckless fine-tuning on utility-oriented datasets may disrupt this balance (e.g., forgetting)!
Tweet media one
1
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
In light of the risks we identify, we outline potential mitigation strategies in our paper. We communicated the results of this study to OpenAI prior to publication. Our findings may be incorporated into the further improvement of the safety of their fine-tuning APIs.
1
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
@thegautamkamath @florian_tramer This is a very inspiring story. Thanks for sharing & congrats!
1
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
7 months
Very interesting work led by @JiongxiaoW @ChaoweiX that repurposes well-studied neural network backdoor techniques to protect alignment during the downstream fine-tuning phase! Looking forward to seeing more advML techniques being used to help the AI safety objective :)
@ChaoweiX
Chaowei Xiao
7 months
🚨Making Backdoor for Good!! We’re thrilled to share our new paper BackdoorAlign, where the idea of “Backdoor Attack” is applied during fine-tuning to defend against the Fine-tuning Jailbreak Attack. Project Page:
Tweet media one
3
29
72
1
1
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@maksym_andr @PandaAshwinee Thanks! Short-circuiting is also a great idea. In fact, I think both the short-circuiting in your paper and the augmentation with safety recovery examples in our paper share a very similar principle. They both try to map a harmful state/representation back to a refusal one.
1
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Risk Level 2: fine-tuning with implicitly harmful datasets. For closed-source models like GPT-3.5, one might expect that a strong moderation system can prevent bad actors from fine-tuning models on harmful datasets. But what if the harmfulness of a dataset becomes more subtle?
Tweet media one
1
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 months
@PeterHndrsn That sounds like a real superalignment.
0
0
3
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
We thank OpenAI for granting us API Research Credits following our initial disclosure. This support enabled us to finish the whole study. We believe such generous support for red-teaming research will contribute to the enhanced safety and security of LLM systems in practice.
1
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@billyuchenlin Yea. Thanks for sharing. It is indeed very much relevant to our insights that motivate the design of the constrained loss function against fine-tuning attacks. We will add a reference to it in our next iteration :)
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@_xjdr I also noticed similar things when playing with llama2. It is actually very sensitive to whether there are two spaces or one space after [/INST]
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@maksym_andr @PandaAshwinee Yea, exactly. It is definitely a promising direction to explore further. It might also be interesting to see subsequent attempts of adaptive attacks on this 😂
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
Also, our ablation indicates that larger learning rates and smaller batch sizes generally lead to more severe safety degradation! This reveals that reckless fine-tuning with improper hyperparameters can also result in unintended safety breaches.
Tweet media one
1
1
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@furongh Yea. Also, if we think about this from an RL perspective, it's like this: if an agent is tricked into, or happens to reach, a bad state, it should learn to recover from that bad state. So there should be a nice connection to the classical safe RL literature for future work :)
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
10 months
Thanks to all the collaborators for their efforts! Looking forward to chatting about this paper in early Feb next year in Vancouver :)
@PandaAshwinee
Ashwinee Panda
10 months
Pleased to say that our paper on visual adversarial examples has been accepted at #AAAI2024 !
0
0
7
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@RealAnthonyPeng cool work :)
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@zicokolter @CadeMetz Wow, this looks very impressive! Before your talk, we are also going to present a similar "jailbreaking attack" from a multimodality perspective. The advML workshop this year is going to have a lot of new findings.
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 years
I will be in person at #CVPR22 to discuss this work. Drop by if you are interested! 😀 ⏰When? June 23, 2022 Oral 3.2.1, 5b: 1:30pm - 3:00pm Poster session 3.2: 2:30pm - 5:00pm
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@xszheng2020 That's a good suggestion. We will upload a checkpoint to Hugging Face later :) As a follow-up, we are also working on an end-to-end alignment training pipeline (SFT+RLHF) that incorporates the proposed data augmentation to build a stronger case. Stay tuned.
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
@tuzhaopeng @PandaAshwinee @youliang_yuan Thank you :) Your RTO looks cool. It’s exciting to see more work in this direction!
0
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
8 months
@infoxiao After reading this post, I decided to have a fried rice for today’s lunch 😂
1
0
2
@xiangyuqi_pton
Xiangyu Qi @ COLM
6 months
It’s so sad. I learned a lot from the great Security Engineering book.
@duncan_2qq
Duncan Campbell
6 months
@rossjanderson Professor Ross Anderson, FRS, FREng Dear friend and treasured long term campaigner for privacy and security, Professor of Security Engineering at Cambridge University and Edinburgh University, Lovelace Medal winner, has died suddenly at home in Cambridge.
Tweet media one
82
335
866
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@liang_weixin Nice work! One question comes to my mind: since non-native English speakers are less proficient in English, they are also more likely to use GPT-4-like tools to refine their papers and thus more likely to be flagged by detectors. How do you rule out this confounder?
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@BoLi68567011 Just curious. In LLaVA's paper, they mentioned they will do end-to-end fine-tuning on the LLM as well. Do you mean that, for the Llama-2 integration, they only fine-tune the linear layer between the visual encoder and the LLM and no longer fine-tune the LLM?
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
11 months
🤨
@duborges
Eduardo Borges
11 months
@sama had zero equity on openai.
2
13
46
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@KyleKaiBU kg, where is this?
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
4 months
@LiaoZeyi Thanks for sharing. Will take a closer look!
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
7 months
@xiamengzhou congrats!
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 years
@GeorgeL84893376 Congratulations! It's a good paper.
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 years
@VSehwag_ Hey Vikash, I'll also be there from 20 to 23. 😀
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
5 months
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 months
@EasonZeng623 Congrats!
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
7 months
Congratulations to @XinyuTang7
@llama_index
LlamaIndex 🦙
7 months
Doing In-Context Learning Without Leaking Private Data 🔐 Few-shot demonstrations are crucial to improve the performance of any LLM/RAG app. But the issue with very private datasets (e.g. patient clinical reports), is that they can easily be leaked/jailbroken by malicious users.
Tweet media one
1
42
221
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
7 months
@infoxiao Congratulations!
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
2 months
@StephenLCasper Thanks for sharing. Very solid work.
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@DeanCarignan Great to know our work is helpful. We hope our work can motivate more future research to improve the safety of fine-tuning :)
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@maksym_andr indeed, lol
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
3 months
@suryabhupa Congratulations on the launch :)
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
8 months
@PandaAshwinee @abeirami @USC @mahdisoltanol For sure. Hey @abeirami , I will also be in AAAI next week. Would be great to catch up offline :)
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@florian_tramer @EarlenceF Totally agree. I think the fundamental point is "how to control the model's behaviors within an intended scope". Building a bomb is just an example for us to study that control. When models become stronger, the controls we come up with can be extended to reduce real harm.
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
0
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@Raiden13238619 I am always curious: what exactly is the boundary between science and engineering/technology?
1
0
1
@xiangyuqi_pton
Xiangyu Qi @ COLM
1 year
@EasonZeng623 @ChulinXie Thanks for your interest! For proof-of-concept, we assume white box. This can be applied to jailbreak open-sourced models. We leave the transferability to future research --- with more and more models built on a single foundational visual encoder, transferability is very likely!
0
0
1