Boyi Wei

@wei_boyi

Followers: 378
Following: 374
Statuses: 91

PhD student @Princeton @PrincetonCITP.

Princeton, NJ
Joined February 2020
@wei_boyi
Boyi Wei
18 days
Open-source models face dual-use risks from fine-tuning. Recently, several new defenses have been proposed to counter these fine-tuning attacks. But how do we properly characterize the depth of a defense? Our ICLR 2025 paper shows that correct evaluation is hard, so beware of common pitfalls! [1/n] #ICLR2025
@wei_boyi
Boyi Wei
3 days
In our recent #ICLR2025 paper, we observed a similar phenomenon and provided a checklist for avoiding biased evaluation. Check it out if you are interested!
@wei_boyi
Boyi Wei
15 days
@wei_boyi
Boyi Wei
18 days
@IrvinSTZhao Thanks!!
@wei_boyi
Boyi Wei
18 days
@benediktstroebl Just out of curiosity, did you get the 26.35% accuracy from a single run? For Cybench, I'd guess that finishing 10 tasks means acc = 25% and finishing 11 tasks means acc = 27.5%?
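The arithmetic behind this question can be checked directly. Below is a minimal Python sketch, assuming Cybench has 40 tasks (which is what "10 tasks = 25%" implies) and that a single run's accuracy is simply solved/40. The helper names are hypothetical and only illustrate the reasoning.

```python
# Assumption: Cybench has 40 tasks, so a single run's accuracy must be k/40 for an integer k.
NUM_TASKS = 40


def single_run_accuracy(solved: int, total: int = NUM_TASKS) -> float:
    """Accuracy in percent for one run that solves `solved` of `total` tasks."""
    return 100 * solved / total


def reachable_in_single_run(acc_percent: float, total: int = NUM_TASKS, tol: float = 1e-9) -> bool:
    """True if acc_percent equals 100 * k / total for some integer k (within a float tolerance)."""
    k = acc_percent * total / 100
    return abs(k - round(k)) < tol


print(single_run_accuracy(10))         # 25.0
print(single_run_accuracy(11))         # 27.5
print(reachable_in_single_run(26.35))  # False: 26.35% would mean 10.54 solved tasks
```

Under this assumption, single-run accuracies come in steps of 2.5%, so a value like 26.35% cannot come from one run over all 40 tasks.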
@wei_boyi
Boyi Wei
18 days
Joint work with @xiangyuqi_pton Nicolas Carlini @YangsiboHuang @VitusXie @LuxiHeLucy Matthew Jagielski @srxzr @prateekmittal_ @PeterHndrsn at @princeton_nlp, @PrincetonPLI, @PrincetonCITP and @Google. Paper: Code: [6/n, n=6]
@wei_boyi
Boyi Wei
24 days
RT @benediktstroebl: 🚀 Introducing HAL: The Holistic Agent Leaderboard! The standardized, cost-aware, and third-party platform for evaluat…
@wei_boyi
Boyi Wei
2 months
RT @katherine1ee: Machine unlearning is taking off! There is a ton of interest in getting generative AI models to “unlearn” targeted undesi…
@wei_boyi
Boyi Wei
2 months
@TobyWalsh Could you give more concrete examples of "good materials that aren't copyrighted"? I agree with the second point, and that is exactly the problem our paper addresses: how to better remove copyrighted content from a model at the copyright owner's request.
@wei_boyi
Boyi Wei
2 months
To be more precise, I think the issue here is the lack of a clear definition of "copyright infringement". As I understand it, infringement is usually judged case by case rather than by a fixed definition. For example, how much n-gram overlap counts as copyright infringement? Do near-duplicates count? Does semantic or stylistic similarity count? Correct me if I am wrong.
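To make the "how many n-gram overlaps" question concrete, here is a minimal Python sketch of one possible overlap measure: the fraction of a generated text's word n-grams that also appear in a reference text. The lowercase whitespace tokenization, the choice of n = 3, and the function names are illustrative assumptions; no particular score is implied to correspond to infringement, which is exactly the open question raised above.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(generated: str, reference: str, n: int = 3) -> float:
    """Fraction of the generated text's distinct n-grams that also occur in the reference.

    Assumption: tokens are produced by lowercasing and splitting on whitespace.
    Returns 0.0 when the generated text is shorter than n tokens.
    """
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)


reference = "the quick brown fox jumps over the lazy dog"
generated = "the quick brown fox sleeps under the lazy dog"
print(round(ngram_overlap(generated, reference), 3))  # 0.429 (3 of 7 trigrams shared)
```

Even with a metric like this, the threshold question remains: a verbatim passage and a near-duplicate paraphrase can score very differently, and semantic or stylistic similarity is not captured at all.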
@wei_boyi
Boyi Wei
2 months
@TobyWalsh Another issue is that we don't have a clear definition of copyrighted material. From the model developer's perspective, a more practical approach is to remove material only when the copyright owner asks them to.
@wei_boyi
Boyi Wei
2 months
@TobyWalsh That's a good point! In fact, we do have such work: The key issue is that most non-copyrighted material is of low quality, so models trained on it do not perform well.
@wei_boyi
Boyi Wei
2 months
RT @PeterHndrsn: New piece with Mark Lemley: "The Mirage of Artificial Intelligence Terms of Use Restrictions." Check it out! (Link in the…
@wei_boyi
Boyi Wei
2 months
2. An Adversarial Perspective on Machine Unlearning for AI Safety. Presented by @jakub_lucki on Sat at 4:30 pm (West Meeting Room 121, 122)
@jakub_lucki
Jakub Łucki
5 months
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨 Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training. Here's what we found👇