Boyi Wei @wei_boyi profile

Boyi Wei

@wei_boyi

Followers

378

Following

374

Statuses

91

PhD student @Princeton @PrincetonCITP.

Princeton, NJ

Joined February 2020


Boyi Wei

@wei_boyi

18 days

Open-source models are exposed to dual-use risks via fine-tuning. Several new defenses have recently been proposed to counter these attacks. But how do we properly characterize the depth of a defense? Our ICLR 2025 paper shows that correct evaluation is hard: beware of common pitfalls! [1/n] #ICLR2025

2

10

31

Boyi Wei

@wei_boyi

3 days

In our recent #ICLR2025 paper, we also observed a similar phenomenon and provided a checklist for avoiding biased evaluation. Check it out if you are interested!

0

3

Boyi Wei

@wei_boyi

15 days

@javirandor @iclr_conf Congrats!

0

2

Boyi Wei

@wei_boyi

18 days

@IrvinSTZhao Thanks!!

0

Boyi Wei

@wei_boyi

18 days

@benediktstroebl Just out of curiosity, did you get the accuracy of 26.35% from a single run? For Cybench, I guess finishing 10 tasks means acc=25%, and finishing 11 tasks means acc=27.5%?

1

0

1
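The arithmetic behind the Cybench question above can be checked with a minimal sketch. It assumes a benchmark of 40 tasks (implied by "10 tasks means acc=25%") and binary pass/fail scoring per task; the function names are hypothetical and not taken from any benchmark's code.

```python
from fractions import Fraction

# Assumption: 40 tasks, implied by "finishing 10 tasks means acc=25%".
NUM_TASKS = 40


def single_run_accuracies(num_tasks):
    """All accuracies (in percent) reachable in one run with pass/fail tasks."""
    return [Fraction(k, num_tasks) * 100 for k in range(num_tasks + 1)]


def is_single_run_accuracy(acc_percent, num_tasks, tol=1e-9):
    """True if acc_percent equals k / num_tasks * 100 for some integer k."""
    return any(
        abs(float(a) - acc_percent) < tol
        for a in single_run_accuracies(num_tasks)
    )


if __name__ == "__main__":
    for acc in (25.0, 26.35, 27.5):
        print(acc, is_single_run_accuracy(acc, NUM_TASKS))
```

Under these assumptions, a single run can only land on multiples of 2.5%, so a value such as 26.35% would have to come from averaging several runs or from some other scoring scheme.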

Boyi Wei

@wei_boyi

18 days

Joint work w/ @xiangyuqi_pton Nicholas Carlini @YangsiboHuang @VitusXie @LuxiHeLucy Matthew Jagielski @srxzr @prateekmittal_ @PeterHndrsn at @princeton_nlp, @PrincetonPLI, @PrincetonCITP and @Google Paper: Code: [6/n, n=6]

1

2

8

Boyi Wei

@wei_boyi

24 days

RT @benediktstroebl: 🚀 Introducing HAL: The Holistic Agent Leaderboard! The standardized, cost-aware, and third-party platform for evaluat…

0

27

0

Boyi Wei

@wei_boyi

2 months

RT @katherine1ee: Machine unlearning is taking off! There is a ton of interest in getting generative AI models to “unlearn” targeted undesi…

0

59

0

Boyi Wei

@wei_boyi

2 months

@TobyWalsh Could you provide more concrete examples of "good materials that aren't copyrighted"? I agree with the second point, and that is exactly what we aim to solve in our paper, i.e., how to better remove the copyrighted content from the model as per the copyright owner's request.

1

0

1

Boyi Wei

@wei_boyi

2 months

To be more precise, I think the issue here is the lack of a clear definition of "copyright infringement". To my understanding, it is usually decided case by case rather than by a fixed rule. For example, how much n-gram overlap counts as copyright infringement? Can near-duplicates be treated as copyright infringement? Can semantic/style similarity be treated as copyright infringement? Correct me if I am wrong.

0
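To make the n-gram question above concrete, here is a minimal sketch of one possible overlap measure: the fraction of a candidate text's word n-grams that also appear in a reference text. The whitespace tokenization, the choice of n=3, and the function names are illustrative assumptions; this is not a legal test and not the metric used in any particular paper.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(candidate, reference, n=3):
    """Fraction of the candidate's word n-grams that also occur in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not cand:
        return 0.0  # candidate shorter than n tokens
    return len(cand & ref) / len(cand)


if __name__ == "__main__":
    generated = "it was the best of times it was the worst of days"
    source = "it was the best of times it was the worst of times"
    print(ngram_overlap(generated, source, n=3))
```

Any threshold applied to such a score would itself be a judgment call, which is the point the thread raises.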

Boyi Wei

@wei_boyi

2 months

@TobyWalsh Another issue is that we don't have a very clear definition of copyrighted materials. From the model developer's perspective, a more practical approach is to remove the material only when the copyright owners ask them to do so.

1

0

1

Boyi Wei

@wei_boyi

2 months

@TobyWalsh That's a good point! In fact, we do have such work. The key issue is that most non-copyrighted material is not of good quality, so models trained on it cannot perform well.

0

Boyi Wei

@wei_boyi

2 months

RT @PeterHndrsn: New piece with Mark Lemley: "The Mirage of Artificial Intelligence Terms of Use Restrictions." Check it out! (Link in the…

0

12

0

Boyi Wei

@wei_boyi

2 months

2. An Adversarial Perspective on Machine Unlearning for AI Safety. Presented by @jakub_lucki on Sat at 4:30 pm (Room West Meeting 121, 122)

Jakub Łucki

@jakub_lucki

5 months

🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨 Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training. Here's what we found👇

0

1