![Boyi Wei Profile](https://pbs.twimg.com/profile_images/1758168633023303680/xsm9Q-9i_x96.jpg)
Boyi Wei
@wei_boyi
Followers
378
Following
374
Statuses
91
PhD student @Princeton @PrincetonCITP.
Princeton, NJ
Joined February 2020
Open-sourced models suffer from dual-use risks via fine-tuning. Recently, several new defenses have been proposed to counter these attacks. But how do we properly characterize the depth of the defense? Our paper in ICLR 25 shows that correct evaluation is hard—beware of common pitfalls! [1/n] #ICLR2025
2
10
31
@benediktstroebl Just out of curiosity, did you get the accuracy of 26.35% from a single run? For Cybench, I guess finishing 10 tasks means acc=25%, and finishing 11 tasks means acc=27.5%?
1
0
1
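The arithmetic in the reply above can be checked with a short sketch. It assumes a benchmark of 40 tasks, which is inferred from the tweet's own figures (10 tasks = 25%) rather than stated in it; the function names are hypothetical. With N tasks, a single run can only score a multiple of 1/N, so the check asks whether the reported percentage corresponds to a whole number of solved tasks.

```python
import math
from fractions import Fraction


def is_single_run_accuracy(acc_percent: str, num_tasks: int) -> bool:
    """Return True if acc_percent corresponds to a whole number of solved tasks.

    acc_percent is passed as a string (e.g. "26.35") so that Fraction can
    parse it exactly, without floating-point rounding.
    """
    solved = Fraction(acc_percent) / 100 * num_tasks
    return solved.denominator == 1


def bracketing_accuracies(acc_percent: str, num_tasks: int) -> tuple[float, float]:
    """Return the nearest achievable single-run accuracies (in percent)
    at or below and at or above the reported value."""
    solved = Fraction(acc_percent) / 100 * num_tasks
    lower = Fraction(math.floor(solved), num_tasks) * 100
    upper = Fraction(math.ceil(solved), num_tasks) * 100
    return float(lower), float(upper)


# Assumption: 40 tasks, inferred from "10 tasks means acc=25%".
NUM_TASKS = 40

print(is_single_run_accuracy("26.35", NUM_TASKS))   # False
print(bracketing_accuracies("26.35", NUM_TASKS))    # (25.0, 27.5)
```

Under that assumption, 26.35% falls strictly between two achievable single-run scores, which is consistent with the question of whether the figure is an average over several runs.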
Joint collaboration w/ @xiangyuqi_pton Nicholas Carlini @YangsiboHuang @VitusXie @LuxiHeLucy Matthew Jagielski @srxzr @prateekmittal_ @PeterHndrsn at @princeton_nlp, @PrincetonPLI, @PrincetonCITP and @Google Paper: Code: [6/n, n=6]
1
2
8
RT @benediktstroebl: 🚀 Introducing HAL: The Holistic Agent Leaderboard! The standardized, cost-aware, and third-party platform for evaluat…
0
27
0
RT @katherine1ee: Machine unlearning is taking off! There is a ton of interest in getting generative AI models to “unlearn” targeted undesi…
0
59
0
@TobyWalsh Could you provide more concrete examples of "good materials that aren't copyrighted"? I agree with the second point, and that is exactly what we aim to solve in our paper, i.e., how to better remove the copyrighted content from the model as per the copyright owner's request.
1
0
1
To be more precise, I think the issue here is the lack of a clear definition of "copyright infringement". To my understanding, it is usually decided case by case, without a clear definition. For example, how much n-gram overlap counts as copyright infringement? Can near-duplicates be treated as copyright infringement? Can semantic/style similarity be treated as copyright infringement? Correct me if I am wrong.
0
0
0
@TobyWalsh Another issue is that we don't have a very clear definition of copyrighted materials. From the model developer's perspective, a more practical approach is to remove the material only when the copyright owners ask them to do so.
1
0
1
@TobyWalsh That's a good point! In fact, we do have such work: The key issue is that most non-copyrighted materials are of low quality, so models trained on them do not perform well.
0
0
0
RT @PeterHndrsn: New piece with Mark Lemley: "The Mirage of Artificial Intelligence Terms of Use Restrictions." Check it out! (Link in the…
0
12
0
2. An Adversarial Perspective on Machine Unlearning for AI Safety. Presented by @jakub_lucki on Sat 4:30 pm (Room West Meeting 121, 122)
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨 Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training. Here's what we found👇
0
0
1