Yi Zeng 曾祎

@EasonZeng623

Followers: 1K · Following: 2K · Statuses: 563

probe to improve @VirtueAI_co | Ph.D. @VTEngineering | Amazon Research Fellow | #AI_safety 🦺 #AI_security 🛡 | I deal with the dark side of machine learning.

Virginia, US
Joined August 2017
@EasonZeng623
Yi Zeng 曾祎
1 year
Now you know there's another dude who just discussed AI Safety and Security with both sides ;) #NeurIPS2023 [📸 With the legendary @ylecun and Yoshua Bengio]
@EasonZeng623
Yi Zeng 曾祎
4 days
RT @Yihe__Deng: New paper & model release! Excited to introduce DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails…
@EasonZeng623
Yi Zeng 曾祎
7 days
RT @aleks_madry: Do current LLMs perform simple tasks (e.g., grade school math) reliably? We know they don't (is 9.9 larger than 9.11?), b…
@EasonZeng623
Yi Zeng 曾祎
12 days
RT @aryaman2020: new paper! 🫡 we introduce 🪓AxBench, a scalable benchmark that evaluates interpretability techniques on two axes: concept…
@EasonZeng623
Yi Zeng 曾祎
14 days
RT @tomekkorbak: 🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrati…
@EasonZeng623
Yi Zeng 曾祎
15 days
RT @arankomatsuzaki: Open Problems in Mechanistic Interpretability This forward-facing review discusses the current frontier of mechanisti…
@EasonZeng623
Yi Zeng 曾祎
15 days
RT @leedsharkey: Big new review! 🟦Open Problems in Mechanistic Interpretability🟦 We bring together perspectives from ~30 top researchers…
@EasonZeng623
Yi Zeng 曾祎
16 days
RT @McaleerStephen: DeepSeek should create a preparedness framework/RSP if they continue to scale reasoning models.
@EasonZeng623
Yi Zeng 曾祎
1 month
RT @rm_rafailov: We have a new position paper on "inference time compute" and what we have been working on in the last few months! We prese…
@EasonZeng623
Yi Zeng 曾祎
2 months
RT @AnthropicAI: New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we f…
@EasonZeng623
Yi Zeng 曾祎
2 months
RT @LukeBailey181: Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets…
@EasonZeng623
Yi Zeng 曾祎
2 months
RT @xun_aq: Code agents are great, but not risk-free in code execution and generation! 🎯 We propose RedCode, an evaluation platform to com…
@EasonZeng623
Yi Zeng 曾祎
2 months
RT @xun_aq: For more details, please visit our paper at I'm Xun Liu, a senior undergraduate student advised by Pr…
@EasonZeng623
Yi Zeng 曾祎
3 months
RT @yujink_: How will LLMs reshape our democracy? Recent work including ours has started exploring this important question. We recently wr…
@EasonZeng623
Yi Zeng 曾祎
3 months
Paper 1530 here @emnlpmeeting. Seeing y’all soon 🤫
@EasonZeng623
Yi Zeng 曾祎
4 months
Excited to present "BEEAR" at @emnlpmeeting! Join us in Session 03: Ethics, Bias, and Fairness 🕑 on Nov 12 (Tue) from 14:00-15:30. See you in Miami! 🎉
@EasonZeng623
Yi Zeng 曾祎
4 months
RT @jbhuang0604: Wow!! 🤯🤯🤯 Openings of *30* tenured and/or tenure-track faculty positions in Artificial Intelligence!
@EasonZeng623
Yi Zeng 曾祎
4 months
RT @farairesearch: Bay Area Alignment Workshop Day 2 packed with learnings on interpretability, robustness, oversight & beyond! Shoutout to…
@EasonZeng623
Yi Zeng 曾祎
4 months
RT @kevin_klyman: Come to our workshop on the future of third party AI evaluations on Monday! We have some of the top folks in the field on…
@EasonZeng623
Yi Zeng 曾祎
4 months
RT @farairesearch: Kicked off Day 1 of the Bay Area Alignment Workshop in Santa Cruz with amazing energy! Huge thanks to @ancadianadragan,…
@EasonZeng623
Yi Zeng 曾祎
4 months
Javier’s take aligns with mine. It’s the same feeling I had after reading Anthropic’s new RSP, where their red-teaming focuses on testing models that have had safety guardrails removed, rather than just evaluating refusals. Assessing how models might assist harmful actions during jailbreaks makes more sense to me for preventing catastrophic risks.
@javirandor
Javier Rando
4 months
Jailbreaks have become a new sort of ImageNet competition instead of helping us better understand LLM security. I wrote a blogpost about what I think valuable research could look like 🧵