Anca Dragan Profile
Anca Dragan

@ancadianadragan

Followers: 11K
Following: 334
Media: 17
Statuses: 279

director of AI safety & alignment at Google DeepMind • associate professor at UC Berkeley EECS • proud mom of an amazing 2yr old

San Francisco, CA
Joined March 2018
@ancadianadragan
Anca Dragan
4 months
Ok @demishassabis, I guess "I got a Nobel prize" is an ok reason to cancel our meeting. :) In all seriousness though, huge congratulations to you and the entire AlphaFold team!!! Inspiring progress and so happy to see it recognized.
1
10
685
@ancadianadragan
Anca Dragan
11 months
So excited and so very humbled to be stepping in to head AI Safety and Alignment at @GoogleDeepMind. Lots of work ahead, both for present-day issues and for extreme risks in anticipation of capabilities advancing.
@GoogleDeepMind
Google DeepMind
1 year
We're excited to welcome Professor @AncaDianaDragan from @UCBerkeley as our Head of AI Safety and Alignment to guide how we develop and deploy advanced AI systems responsibly. She explains what her role involves. ↓
31
38
588
@ancadianadragan
Anca Dragan
6 months
So freaking proud of the AGI safety&alignment team -- read here a retrospective of the work over the past 1.5 years across frontier safety, oversight, interpretability, and more. Onwards!
7
73
352
@ancadianadragan
Anca Dragan
5 years
I had a TON of fun talking to Lex about the game-theoretic perspective on coordinating with people and value alignment, capitalizing on leaked information from humans, modeling humans as rational under different beliefs, and also personal stories!
12
25
291
@ancadianadragan
Anca Dragan
4 months
When I joined @GoogleDeepMind last year, I came across this incredible group of people working on deliberative alignment, and managed to convince them to join my team in a quest to account for viewpoint and value pluralism in AI. Their Science paper is on AI-assisted deliberation.
4
12
223
@ancadianadragan
Anca Dragan
1 year
Imagine asking an LLM to explain RL to you, or to book a trip for you. Should the LLM just go for it, or should it first ask you clarifying questions to make sure it understands your goal and background? We think the latter (w/ Joey Hong and @svlevine).
4
29
203
@ancadianadragan
Anca Dragan
2 years
CoRL 2022 New Zealand!
0
2
180
@ancadianadragan
Anca Dragan
6 months
Gemini 1.5 Pro is the safest model on the Scale Adversarial Robustness Leaderboard! We’ve made a number of innovations -- which importantly also led to improved helpfulness -- but the key is making safety a core priority for the entire team, not an afterthought. Read more below.
@alexandr_wang
Alexandr Wang
6 months
1/ Scale is announcing our latest SEAL Leaderboard on Adversarial Robustness! 🛡️ Red team-generated prompts. 🎯 Focused on universal harm scenarios. 🔍 Transparent eval methods. SEAL evals are private (not overfit), expert evals that refresh periodically.
28
25
167
@ancadianadragan
Anca Dragan
6 months
SAEs can be like a microscope for AI inner workings, but they still need a lot of research. To help with that, today we’re sharing GemmaScope: an open suite of hundreds of SAEs on every layer and sublayer of Gemma 2. I’m excited about this for my academic colleagues interested in interpretability.
3
28
153
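To make the microscope analogy above concrete: a sparse autoencoder re-expresses a model's activations as sparse, non-negative features. Here is a minimal sketch in PyTorch -- illustrative only, not the GemmaScope training recipe; the dimensions and the L1 coefficient are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)   # model activations -> feature space
        self.dec = nn.Linear(d_sae, d_model)   # features -> reconstruction

    def forward(self, acts):
        feats = torch.relu(self.enc(acts))     # sparse, non-negative features
        return self.dec(feats), feats

# Reconstruct activations while keeping features sparse (L1 penalty), so each
# feature can behave like an inspectable "concept" under the microscope analogy.
sae = SparseAutoencoder(d_model=2304, d_sae=16384)
acts = torch.randn(8, 2304)                    # stand-in for layer activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
loss.backward()
```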
@ancadianadragan
Anca Dragan
6 months
I don't know Go, protein folding, or Starcraft -- but math olympiads were a big part of my life once. I never even made it to IMO, and now AI systems solved 4 out of 6 IMO problems from this year. Proud of my colleagues who made this happen!
3
4
152
@ancadianadragan
Anca Dragan
9 months
Wishing you the best, @ilyasut and @janleike, on whatever comes next -- thanks for all the work you put into AI safety and alignment at OpenAI! Superalignment team, you have a crucial mission ahead, and I'm confident you'll continue to make strides -- rooting for you!
3
4
156
@ancadianadragan
Anca Dragan
9 months
Proud to share one of the first projects I've worked on since joining @GoogleDeepMind earlier this year: our Frontier Safety Framework. Let’s proactively assess the potential for future risks to arise from frontier models, and get ahead of them!
5
21
135
@ancadianadragan
Anca Dragan
6 years
I'm often asked if it's worth it to build mathematical models of human behavior, rather than learn everything from scratch. We took a small first pass at starting to quantify the utility of the "theory of mind" bias for robots:
2
25
127
@ancadianadragan
Anca Dragan
5 months
I had a fun time talking to @FryRsquared about AI safety at Google DeepMind -- what is alignment / amplified oversight / frontier safety / robustness / present day safety / & even a little bit on assistance games :)
@GoogleDeepMind
Google DeepMind
5 months
Join host @FryRSquared as she speaks with @AncaDianaDragan, who leads safety research at Google DeepMind. 🌐 They explore the challenges of aligning AI with human preferences, oversight at scale, and the importance of mitigating both near and long-term risks. ↓
10
12
126
@ancadianadragan
Anca Dragan
7 months
Gemma 2 I think shows how companies can do better on releasing open models safely — open models are really useful (including for safety research), but their risks can outweigh the benefits if they start having very dangerous capabilities (think offensive cyber-security, …)
6
13
112
@ancadianadragan
Anca Dragan
8 months
Congrats to the safety & alignment team on an honorable mention for the outstanding paper award at ICLR this year, for "Robust Agents Learn Causal World Models" @tom4everitt.
2
13
113
@ancadianadragan
Anca Dragan
6 years
On the research side of @Waymo, we've been experimenting with what it takes to learn a good driving model from only a dataset of expert examples. Synthesizing perturbations and auxiliary losses helped tremendously, and the model actually drove a real car!
0
32
109
@ancadianadragan
Anca Dragan
6 years
I wrote a thing in a book
1
8
104
@ancadianadragan
Anca Dragan
4 years
Thanks @pabbeel for inviting me to your podcast, it was very fun to do an interview with a colleague and close friend! :)
@pabbeel
Pieter Abbeel
4 years
On Ep15, I sit down with the amazing @ancadianadragan, Prof at Berkeley and Staff Research Scientist at Waymo. She explains why Asimov's 3 laws of robotics need updating, how to instill human values in AI and make driverless cars naturally reason about other cars and humans.
1
8
92
@ancadianadragan
Anca Dragan
9 months
Proud of my team for building safety into these models and watching out for future risks. More on this soon with our Gemini technical report, and prep ahead of the AI Seoul Summit!!
@demishassabis
Demis Hassabis
9 months
Making great progress on the Gemini Era. At #GoogleIO we shared the 2M-token long context breakthrough with 1.5 Pro and announced Gemini 1.5 Flash, a lighter-weight multimodal model with long context designed to be fast and cost-efficient to serve at scale. More:
1
9
84
@ancadianadragan
Anca Dragan
6 years
Excited to be hosting a fantastic group of prospective AI PhD students, chosen out of over 2000 applications.
@berkeley_ai
Berkeley AI Research
6 years
Welcome prospective BAIR graduate students!!
2
3
82
@ancadianadragan
Anca Dragan
5 years
My first attempt at a talk for a public audience, explaining some of the intricacies of human-robot coordination. Also a non-technical overview of work with @DorsaSadigh, Jaime Fisac, @andreaBajcsy, and collaborators from Claire Tomlin's group:
0
7
76
@ancadianadragan
Anca Dragan
8 months
Alignment becomes even harder to figure out when you start accounting for changing values -- and how AI actions might influence that change.
@MicahCarroll
Micah Carroll
8 months
Excited to share a unifying formalism for the main problem I’ve tackled since starting my PhD! 🎉 Current AI Alignment techniques ignore the fact that human preferences/values can change. What would it take to account for this? 🤔 A thread 🧵⬇️
3
7
78
@ancadianadragan
Anca Dragan
6 months
we made a "jump" in interpreting large models with JumpReLU SAEs ;)
2
11
76
@ancadianadragan
Anca Dragan
9 months
New Gemini 1.5 report out!!! Sec. 9 has our safety approach and results (pages 48-73) -- we have some nice improvements there. Go team!
1
8
71
@ancadianadragan
Anca Dragan
9 months
Leading to the Frontier Safety Framework was our dangerous capabilities evals work, expansively probing capabilities to self-proliferate, self-reason, perform harmful cyber operations, and persuade. Hope it sets a new bar for pre-deployment evals!
2
8
68
@ancadianadragan
Anca Dragan
2 years
I think this is a pretty big deal. It's all deterministic, but even so, that's where the big deep RL results started. TL;DR: whether or not you can just be greedy(ish) on the random policy's value function predicts PPO performance.
@cassidy_laidlaw
Cassidy Laidlaw
2 years
Excited to present our new paper on bridging the theory-practice gap in RL! For the first time, we give *provable* sample complexity bounds that closely align with *real deep RL algorithms'* performance in complex environments like Atari and Procgen.
3
3
62
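To make the "greedy on the random policy's value function" diagnostic concrete: compute Q for the uniformly random policy, act greedily on it, and check whether that already recovers optimal behavior. A toy sketch under assumptions of my own (a small deterministic chain MDP, not the paper's benchmarks):

```python
import numpy as np

n_states, n_actions, horizon = 6, 2, 10

def step(s, a):                      # deterministic chain: 1 = right, 0 = left
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

def reward(s):                       # payoff only at the right end
    return 1.0 if s == n_states - 1 else 0.0

# Q-values of the uniformly random policy, by backward induction.
Q = np.zeros((horizon + 1, n_states, n_actions))
for t in range(horizon - 1, -1, -1):
    for s in range(n_states):
        for a in range(n_actions):
            s2 = step(s, a)
            Q[t, s, a] = reward(s2) + Q[t + 1, s2].mean()   # next action ~ uniform

# Act greedily on Q^{pi_rand}; in this toy MDP that already recovers the optimum.
s, ret = 0, 0.0
for t in range(horizon):
    s = step(s, int(np.argmax(Q[t, s])))
    ret += reward(s)
print(ret)   # 6.0, the optimal 10-step return here
```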
@ancadianadragan
Anca Dragan
10 months
RS and RE roles, growing our Bay Area presence as part of our further investment in safety and alignment:
@ancadianadragan
Anca Dragan
10 months
we're hiring:
3
7
64
@ancadianadragan
Anca Dragan
4 years
Super proud of and happy for @DorsaSadigh, so well deserved!!!
@StanfordAILab
Stanford AI Lab
4 years
Congratulations to @StanfordAILab faculty Dorsa Sadigh on receiving an MIT Tech Review TR-35 award for her work on teaching robots to be better collaborators with people.
1
2
62
@ancadianadragan
Anca Dragan
2 months
🍿.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
2 months
Woah, huge news again from Chatbot Arena 🔥 @GoogleDeepMind’s just-released Gemini (Exp 1121) is back stronger (+20 points), tied #1 🏅 Overall with the latest GPT-4o-1120 in Arena! Ranking gains since Gemini-Exp-1114: Overall #3 -> #1; Overall (StyleCtrl): #5 -> #2; Hard …
2
1
63
@ancadianadragan
Anca Dragan
4 years
congrats @andreea7b for another HRI best paper nomination, this time for getting human input that is designed to focus explicitly on what the robot is still missing
1
4
61
@ancadianadragan
Anca Dragan
6 months
thank you @demishassabis, it's been a great week for Gemini all around, go team!
@demishassabis
Demis Hassabis
6 months
Great to see Gemini 1.5 Pro top the new @scale_ai leaderboard for adversarial robustness! Congrats to the entire Gemini team, and special thanks to @ancadianadragan & the AI safety team for leading the charge on building in robustness to our models as a core capability.
2
2
59
@ancadianadragan
Anca Dragan
9 months
Right before I started my new role, I helped write a piece for Science with some fantastic collaborators on managing extreme risks from AI -- it just came out!
0
8
58
@ancadianadragan
Anca Dragan
2 years
@GaryMarcus Dudes. Is this really constructive scientific debate or are you two just sh***ing on each other at this point? We could ask for clarification instead of accusing inconsistency. I for one would like to learn from you both, not have my BP rise every time I go on twitter.
1
0
60
@ancadianadragan
Anca Dragan
4 years
Super excited this finally got published: a useful way to interpret many kinds of human feedback beyond demos/comparisons, to corrections/language/proxy rewards/the state of the world, is to think of them as implicit choices the person is making with respect to the reward.
1
7
53
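The "implicit choice" idea in miniature: ground a piece of feedback in a set of options, model the human as Boltzmann-rational in the reward, and update a belief over candidate reward parameters. A small sketch with made-up candidate rewards and options (not the paper's code):

```python
import numpy as np

options = np.array([[0.9, 0.2],      # feature vectors of the options the
                    [0.1, 0.8],      # feedback implicitly chooses among
                    [0.5, 0.5]])
candidates = np.array([[1.0, 0.0],   # two hypotheses about reward weights
                       [0.0, 1.0]])
beta = 5.0                           # assumed rationality level

def likelihood(choice, w):
    """P(human picks `choice` | weights w) under Boltzmann rationality."""
    scores = beta * options @ w
    p = np.exp(scores - scores.max())
    return (p / p.sum())[choice]

# Seeing the human "choose" option 0 (via a demo, correction, etc.) shifts
# belief toward the reward hypothesis that ranks option 0 highest.
prior = np.array([0.5, 0.5])
post = prior * np.array([likelihood(0, w) for w in candidates])
post /= post.sum()
print(post)   # belief concentrates on w = [1, 0]
```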
@ancadianadragan
Anca Dragan
6 years
Hard at work on supervised/imitation learning. Fei-Fei, you'll like this ;) @ai4allorg @berkeley_ai @drfeifei
3
12
56
@ancadianadragan
Anca Dragan
4 years
Assistance via empowerment: agents can assist humans without inferring their goals or limiting their autonomy, by increasing the human’s controllability of their environment, i.e. their ability to affect the environment through actions (also @NeurIPSConf)
1
10
50
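One crude way to make "controllability" concrete, under assumptions of my own rather than the paper's formulation: score a state by (the log of) how many distinct states the human can reach within a few steps, and have the assistant prefer actions that keep this score high.

```python
import numpy as np
from itertools import product

def reachable(state, step_fn, n_actions, k):
    """End states of every k-step action sequence, plus the start state."""
    states = {state}
    for seq in product(range(n_actions), repeat=k):
        s = state
        for a in seq:
            s = step_fn(s, a)
        states.add(s)
    return states

def empowerment_proxy(state, step_fn, n_actions, k):
    return np.log(len(reachable(state, step_fn, n_actions, k)))

# Tiny 1-D world: the human moves left/right between walls at 0 and 9.
step = lambda s, a: min(max(s + (1 if a == 1 else -1), 0), 9)
print(empowerment_proxy(5, step, n_actions=2, k=3))  # center: more options
print(empowerment_proxy(0, step, n_actions=2, k=3))  # corner: fewer options
```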
@ancadianadragan
Anca Dragan
4 years
assistive typing: map neural activity (ECoG)/gaze to text by learning from the user "pressing" backspace to undo; most exciting: tested by UCSF with a patient with quadriplegia! @interact_ucb + @svlevine + @KaruneshGanguly's labs, led by @sidgreddy and Jensen Gao
1
10
50
@ancadianadragan
Anca Dragan
4 years
We're hosting Rising Stars in EECS at @Berkeley_EECS this year! Applications here
0
10
50
@ancadianadragan
Anca Dragan
7 years
We're running the second edition of the @berkeley_ai @ai4allorg camp this year, starting in just 24 hours. We're excited to teach talented high-school students from low-income communities about human-centered AI!
0
13
52
@ancadianadragan
Anca Dragan
5 years
It was wonderful to be on NPR Marketplace (I love @NPR !!) talking about how game theory applies to human-robot interaction :)
2
8
48
@ancadianadragan
Anca Dragan
7 months
I usually worry about aligning capable models, but a weak model can do harm by tapping into a perfectly aligned capable model multiple times, with benign requests. Hard tasks are sometimes decomposable into benign hard components + not-so-benign easy components; not to mention …
@ErikJones313
Erik Jones
7 months
Model developers try to train “safe” models that refuse to help with malicious tasks like hacking. But in new work with @JacobSteinhardt and @ancadianadragan, we show that such models still enable misuse: adversaries can combine multiple safe models to bypass safeguards 1/n
1
5
49
@ancadianadragan
Anca Dragan
2 years
Offline RL figures out how to block you from reaching the tomatoes so you change to onions if that's better, or to put a plate next to you to get you to start plating. AI can guide us to overcome our suboptimalities and biases if it knows what we value, but ... will it?
@svlevine
Sergey Levine
2 years
Offline RL can analyze data of human interaction & figure out how to *influence* humans. If we play a game, RL can examine how we play together & figure out how to play with us to get us to do better! We study this in our new paper, led by Joey Hong: 🧵👇
0
7
46
@ancadianadragan
Anca Dragan
2 years
I got to ceremonially shovel some dirt for the groundbreaking of our new building! So exciting! Proof currently on the front page of
0
1
42
@ancadianadragan
Anca Dragan
1 year
Let's think of language utterances from a user as helping the agent better predict the world!
@realJessyLin
Jessy Lin
1 year
How can agents understand the world from diverse language? 🌎. Excited to introduce Dynalang, an agent that learns to understand language by 𝙢𝙖𝙠𝙞𝙣𝙜 𝙥𝙧𝙚𝙙𝙞𝙘𝙩𝙞𝙤𝙣𝙨 𝙖𝙗𝙤𝙪𝙩 𝙩𝙝𝙚 𝙛𝙪𝙩𝙪𝙧𝙚 with a multimodal world model!
1
7
43
@ancadianadragan
Anca Dragan
10 months
So happy to have @noahdgoodman onboard -- he's going to be invaluable in a number of alignment areas, from group/deliberative alignment, to better understanding human feedback, to helping us better evaluate our pretraining and increase alignment-related reasoning capabilities.
@noahdgoodman
noahdgoodman
10 months
This seems like a good time to mention that I've taken a part-time role at @GoogleDeepMind working on AI Safety and Alignment!
0
1
42
@ancadianadragan
Anca Dragan
3 years
I think this might be the most fun thing @sidgreddy did in his PhD -- learning interfaces when it is not obvious how to design a natural one, by observing that an interface is more intuitive if the person's input has lower entropy when using it; no supervision required.
@sidgreddy
sid
3 years
We've come up with a completely unsupervised human-in-the-loop RL algorithm for translating user commands into robot/computer actions. Below: an interface that maps hand gesture commands to Lunar Lander thruster actions, learned from scratch.
1
5
42
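The entropy criterion from the tweet above, in miniature: with no task supervision, prefer the interface under which the user's input stream has lower empirical entropy. A toy sketch with hypothetical logged commands rather than the paper's setup:

```python
import numpy as np
from collections import Counter

def input_entropy(commands):
    """Empirical Shannon entropy (bits) of a logged command stream."""
    counts = np.array(list(Counter(commands).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical logs: purposeful use under interface A, flailing under B.
logs_a = ["up", "up", "down", "up", "up", "up", "down", "up"]
logs_b = ["up", "down", "left", "right", "up", "left", "down", "right"]

for name, logs in [("A", logs_a), ("B", logs_b)]:
    print(f"interface {name}: {input_entropy(logs):.2f} bits")
# A's lower input entropy marks it as the more intuitive mapping.
```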
@ancadianadragan
Anca Dragan
4 years
A single state leaks information about the reward function. We can learn from it by simulating what might have happened in the past that led to that state (previously in small toy environments, now the scaled-up version in slightly less-toy environments :) @interact_ucb
@rohinmshah
Rohin Shah
4 years
New #ICLR2021 paper by @davlindner, me, @pabbeel and @ancadianadragan, where we learn rewards from the state of the world. This HalfCheetah was trained from a single state sampled from a balancing policy! 💡 Blog: 📑 Paper: (1/5)
0
9
42
@ancadianadragan
Anca Dragan
6 months
Congrats @NeelNanda5 and team on this release! Here are Neel's open problems we hope the community can solve with GemmaScope. (and thanks Neel for all you've taught me about mech interp in the past half year).
@NeelNanda5
Neel Nanda
6 months
And there's a *lot* of open problems that we hope Gemma Scope can help solve. As a starting point, here's a list I made -- though we're also excited to see how else the community applies it! See the full list here:
2
0
41
@ancadianadragan
Anca Dragan
4 years
Assisted perception: people have systematic biases when processing sensory input, and here we synthesize such input in order to help them estimate the state of the world more accurately despite these biases
1
3
38
@ancadianadragan
Anca Dragan
5 years
"Robotics Today" talk last Friday on making sense of information people leak about what they want robots to do
0
4
39
@ancadianadragan
Anca Dragan
5 years
My favorite part of HRI research is when robots generate strategies for interaction like inching forward/backing off/exaggerating -- when we don't have to define these as primitives, but they emerge from control because we've modeled enough about the human.
1
5
36
@ancadianadragan
Anca Dragan
5 years
I prepared some quick advice on experimental design for the "good citizens of robotics" RSS workshop -- it's flawed in many ways, but if e.g. factorial design is something you don't normally think about, consider watching
0
6
38
@ancadianadragan
Anca Dragan
2 years
Ion Stoica got me to speak at this -- somewhat different from my typical audiences, but will be fun to share a bit about the challenges of ML for interaction with people.
@robertnishihara
Robert Nishihara
2 years
#RaySummit is happening in 1 week! If you want to learn how companies like @OpenAI, @Uber, @Cruise, @Shopify, @lyft, @Spotify, and @Instacart are building their next generation ML infrastructure, join us!
1
3
37
@ancadianadragan
Anca Dragan
11 months
@GoogleDeepMind I am so happy I get to work with @rohinmshah again!
2
1
35
@ancadianadragan
Anca Dragan
6 years
Very proud of @DorsaSadigh!!
@StanfordEng
Stanford Engineering
6 years
#IAmAnEngineer: I didn't fully appreciate the value of role models until I met Anca Dragan. Before meeting her I had male advisors who were terrific but I couldn't see myself in them the way I could see myself in Anca. - @DorsaSadigh
0
0
34
@ancadianadragan
Anca Dragan
6 months
welcome to the team Alex!!!
@AlexIrpan
Alex Irpan
6 months
I'm working in AI safety now.
0
0
33
@ancadianadragan
Anca Dragan
6 months
we are by no means perfect at this, but this is the goal (the link provides a few examples of how we'd want Gemini to respond to different prompts)
4
2
33
@ancadianadragan
Anca Dragan
10 months
we're hiring:
2
6
31
@ancadianadragan
Anca Dragan
6 years
Here's something personal, because the internet doesn't have enough cat pictures :)
1
0
29
@ancadianadragan
Anca Dragan
6 years
After a few months of work, CoRL is finally happening! Excited about the program we lined up, including this great tutorial by @beenwrekt. Thanks to all authors for their submissions, to our keynote and tutorial speakers for making the trip to Zurich, and to the local organizers.
0
1
29
@ancadianadragan
Anca Dragan
3 months
Safe and steady wins the race :)
@Waymo
Waymo
3 months
We’re excited to announce that we’ve closed an oversubscribed investment round of $5.6B, led by Alphabet, with continued participation from @a16z, @Fidelity, Perry Creek, @silverlake_news, Tiger Global, and @TRowePrice. More:
1
3
31
@ancadianadragan
Anca Dragan
6 years
Come teach AI at Berkeley with me, @pabbeel, @svlevine, Stuart Russell, Dan Klein! If you like teaching and are excited about reaching 750 students at once, this is for you:
0
0
28
@ancadianadragan
Anca Dragan
5 years
Excited to welcome @daniel_s_brown to InterACT! :)
@daniel_s_brown
Daniel Brown
5 years
I successfully defended my PhD titled "Safe and Efficient Inverse Reinforcement Learning!" Special thanks to my wonderful committee: @scottniekum, Peter Stone, Ufuk Topcu, and @ancadianadragan. Very excited to start a postdoc in Sept with @ancadianadragan and @ken_goldberg.
0
1
30
@ancadianadragan
Anca Dragan
6 months
Open SAEs everywhere all at once!
@NeelNanda5
Neel Nanda
6 months
Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: An open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work
0
0
30
@ancadianadragan
Anca Dragan
4 years
Sophia, one of our participants in the @berkeley_ai @ai4allorg camp for high school students, wrote about her experience (on her own initiative!) -- including her project using @MicahCarroll's Overcooked-inspired human-AI collaboration environment <3
0
2
29
@ancadianadragan
Anca Dragan
7 months
A nice improvement on training SAEs (even on top of Gated SAEs), and we have big plans in this space re: Gemma 2! Wonderful to see the progress from @NeelNanda5 and team!
@NeelNanda5
Neel Nanda
7 months
New GDM mech interp paper led by @sen_r: JumpReLU SAEs, a new SOTA SAE method! We replace standard ReLUs with discontinuous JumpReLUs & train directly for L0 with straight-through estimators. We'll soon release hundreds of open JumpReLU SAEs on Gemma 2, apply now for early access!
0
2
28
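The core trick in miniature: a JumpReLU passes a unit through only above a learned threshold, and a straight-through estimator supplies the gradient the hard threshold lacks. A simplified sketch -- the bandwidth `eps` and the box kernel here are assumptions of mine, not the paper's exact estimator:

```python
import torch

class JumpReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z, theta, eps):
        ctx.save_for_backward(z, theta)
        ctx.eps = eps
        return z * (z > theta).float()       # pass z through only above threshold

    @staticmethod
    def backward(ctx, grad_out):
        z, theta = ctx.saved_tensors
        # w.r.t. z: gate treated as a constant mask, as in an ordinary ReLU
        grad_z = grad_out * (z > theta).float()
        # w.r.t. theta: the true gradient is zero almost everywhere, so
        # approximate the Dirac delta from d/dtheta 1[z > theta] with a box
        # kernel of width eps (the straight-through estimator)
        kernel = ((z - theta).abs() < ctx.eps / 2).float() / ctx.eps
        grad_theta = (-grad_out * z * kernel).sum(dim=0)
        return grad_z, grad_theta, None      # no gradient for eps

z = torch.randn(4, 8, requires_grad=True)
theta = torch.full((8,), 0.5, requires_grad=True)
out = JumpReLU.apply(z, theta, 0.1)
out.sum().backward()                         # both z and theta receive gradients
```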
@ancadianadragan
Anca Dragan
3 months
I discuss AI safety in the first issue of Forward, a new magazine from Google spotlighting the latest challenges, innovations and discussions around AI in Europe. Lovely Joelle on the cover :) Check out the article here:
0
1
29
@ancadianadragan
Anca Dragan
5 years
It fills my heart with joy to see former InterACT students Hong, Dylan, and Dorsa get nominated at RSS!! nicely done!! <3
@DorsaSadigh
Dorsa Sadigh
5 years
Congratulations to Hong Jun Jeon and @loseydp for being nominated for best student paper award at #RSS2020 for their work on Shared Autonomy with Learned Latent Actions.
0
1
29
@ancadianadragan
Anca Dragan
6 years
Our NeurIPS workshop on autonomous driving and transportation was quite well attended. Thanks to the great speakers from industry and academia alike! @aurora_inno @Waymo @zoox @oxbotica @PonyAI_tech @DorsaSadigh
3
3
28
@ancadianadragan
Anca Dragan
6 years
Very excited to teach back at home!
@EEMLcommunity
EEML
6 years
We are proud to announce the 2019 edition of EEML summer school, 1-6 July, Bucharest, Romania. Topics covered: DL, RL, computer vision, bayesian learning, medical imaging, and NLP. An amazing set of speakers confirmed so far! More info coming soon! Check
2
1
28
@ancadianadragan
Anca Dragan
6 years
Congrats to Hong Jun Jeon for being a best student paper finalist at IROS for "Configuration Space Metrics". Hong is actually still an undergrad and will be applying for grad school this year :-)
1
0
27
@ancadianadragan
Anca Dragan
2 years
So proud of @andreea7b!!!
@andreea7b
Andreea Bobu
2 years
Very excited to announce that I'll be joining @MIT's AeroAstro department as a Boeing Assistant Professor in Fall 2024. I'm thankful to my mentors and collaborators who have supported me during my PhD, and I look forward to working with students and colleagues at @MITEngineering.
0
0
26
@ancadianadragan
Anca Dragan
9 months
very nice to see progress in the SAE space by the team -- getting us just a little bit closer to determining what "concepts" LLMs use!
@NeelNanda5
Neel Nanda
9 months
Fantastic work from @sen_r and @ArthurConmy -- done in an impressive 2-week paper sprint! Gated SAEs are a new sparse autoencoder architecture that seems a major Pareto improvement. This is now my team's preferred way to train SAEs, and I hope it'll accelerate the community's work!
0
0
27
@ancadianadragan
Anca Dragan
8 months
Generative RM progress!
@natolambert
Nathan Lambert
8 months
Gemini Flash beating Claude 3 Opus for LLM as a judge is NUTS.
0
0
26
@ancadianadragan
Anca Dragan
1 month
human-AI collaboration skills useful for amplified oversight in alignment <3
0
6
26
@ancadianadragan
Anca Dragan
6 months
Empower the human as an alternative to inferring the reward -- with @vivek_myers @svlevine. cc: @d_yuqing @RichardMCNgo
@vivek_myers
Vivek Myers
6 months
Human behaviors often don't correspond to maximizing a scalar reward. How can we create aligned AI agents without inferring and maximizing a reward? I'll have a poster at @mhf_icml2024 at 11:30 on a scalable contrastive objective for empowering humans to achieve different goals
4
3
25
@ancadianadragan
Anca Dragan
6 years
It was such a treat to see the CoRL papers presented! If you couldn't join us in Zurich, you can watch the talks online -- there are links on the homepage
0
4
25
@ancadianadragan
Anca Dragan
2 years
Check out Andreea's work on aligning the representation used for reward functions with what people internally care about. Idea: ask similarity queries. Seems advantageous over getting at the representation via meta-reward-learning.
@andreea7b
Andreea Bobu
2 years
How can we learn one foundation model for HRI that generalizes across different human rewards as the task, preference, or context changes? Come see at #HRI2023 in the Thursday 13:30 session! Paper: w/ Yi Liu, @rohinmshah, @daniel_s_brown, @ancadianadragan
0
3
23
@ancadianadragan
Anca Dragan
10 months
Really impressive work by Iason and colleagues.
@IasonGabriel
Iason Gabriel
10 months
1. What are the ethical and societal implications of advanced AI assistants? What might change in a world with more agentic AI? Our new paper explores these questions: It’s the result of a one-year research collaboration involving 50+ researchers… a 🧵
0
1
24
@ancadianadragan
Anca Dragan
4 months
@janleike I think you inspired some of this! I then ended up with @bakkermichiel at the same workshop on social choice in alignment and realized that what I was pitching there to do, he was already working on :)
0
0
22
@ancadianadragan
Anca Dragan
6 years
Talk on assuming people optimize for stuff, relaxing that assumption, and detecting when it's just wrong:
3
3
22
@ancadianadragan
Anca Dragan
6 years
We've been looking into additional sources of information about reward functions. We found a lot in the current state of the world, before the robot observes any demonstrated actions: humans have been acting already, and only some preferences explain the current state as a result.
@pabbeel
Pieter Abbeel
6 years
New post/paper: learning human preferences from a single snapshot of the world — by thinking about what must have been the preferences to have ended up in this state. Eg robot shouldn’t knock vases off the table b/c being on tables is a signal people have avoided knocking them off.
0
1
21
@ancadianadragan
Anca Dragan
9 months
A few folks from AI Safety and Alignment at @GoogleDeepMind are speaking in this summer school!
@humanalignedai
Human-aligned AI Summer School
9 months
Join us in Prague on July 17-20, 2024 for the 4th Human-aligned AI Summer School! We'll have researchers, students, and practitioners for four intensive days focused on the latest approaches to aligning AI systems with human values. You can apply now at
0
1
21
@ancadianadragan
Anca Dragan
2 years
@chelseabfinn Makes a lot of sense, and this was the original way people did RLHF (it was called preference-based RL back then)
1
1
19
@ancadianadragan
Anca Dragan
9 months
Per our Frontier Safety Framework, we ran dangerous capability evals for Gemini 1.5 and report them in the technical report (Sec. 9.5.2)
0
1
20
@ancadianadragan
Anca Dragan
2 years
Learning from prefs and demos is more popular than ever, but we have to be careful about the rationality level we assume in human responses. Overestimating it is bad. Also, while demos are typically more informative, with very suboptimal humans we should stick to comparisons.
@gaurav_ghosal
Gaurav Ghosal
2 years
We are excited to announce that our paper “The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types” will be presented at AAAI’23 on Sunday, February 12th, 2023. [1/8].
0
1
18
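A small numeric illustration of why the assumed rationality level matters, with made-up numbers of my own: in a Bradley-Terry/Boltzmann preference model, the same observed preference rate implies very different reward gaps depending on the rationality coefficient you assume.

```python
import numpy as np

p_pref = 0.73        # observed: option A preferred over B 73% of the time
# Bradley-Terry: p = sigmoid(beta * (r_A - r_B)), so gap = logit(p) / beta
logit = np.log(p_pref / (1 - p_pref))
for beta in [0.5, 1.0, 5.0]:
    print(f"assumed beta {beta}: inferred reward gap {logit / beta:.2f}")
# The inferred reward scale depends directly on the rationality you assume,
# so over- or under-estimating beta systematically distorts the learned reward.
```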
@ancadianadragan
Anca Dragan
6 months
Really proud of my team and everyone in Gemini :)
0
1
19
@ancadianadragan
Anca Dragan
6 months
Thanks Neuronpedia for this awesome interactive demo
0
1
18
@ancadianadragan
Anca Dragan
6 years
DeepMind's approach to aligning AI with user intentions, including IRL and our own CIRL, but also OpenAI's debate and iterated amplification:
0
3
18
@ancadianadragan
Anca Dragan
5 years
Thanks for organizing this!!
@RoboticsSeminar
RoboticsTodaySeminar
5 years
Watch live: 1 PM Friday, June 12: @Berkeley_EECS’s Anca Dragan @ancadianadragan #humanrobot interaction "Optimizing Intended Reward Functions: Extracting all the right information from all the right places"
1
0
15
@ancadianadragan
Anca Dragan
2 years
A little write-up from Berkeley Engineering on ICML work with @MicahCarroll and @dhadfieldmenell about evaluating and penalizing preference manipulation/shift by recommender systems
0
3
16
@ancadianadragan
Anca Dragan
3 months
sophie on some of our scalable oversight work <3
@farairesearch
FAR.AI
3 months
"We want to create a situation where we're empowering … human raters [of AI fact checkers] to be making better decisions than they would on their own." – Sophie Bridgers discussing scalable oversight and improving human-AI collaboration at the Vienna Alignment Workshop.
0
1
16
@ancadianadragan
Anca Dragan
3 months
please consider joining @IasonGabriel, he's doing amazing work with his team.
@IasonGabriel
Iason Gabriel
3 months
Are you interested in exploring questions at the ethical frontier of AI research? If so, then take a look at this new opening in the humanity, ethics and alignment research team: HEART conducts interdisciplinary research to advance safe & beneficial AI.
0
2
16
@ancadianadragan
Anca Dragan
2 years
Congratulations @dhadfieldmenell and Aditi, so proud of and happy for you!!
@SchmidtFutures
Schmidt Futures
2 years
Today, Schmidt Futures is excited to announce the first cohort of AI2050 Early Career Fellows who will work on the hard problems we must solve in order for AI to benefit society. To learn more, visit:
1
0
15
@ancadianadragan
Anca Dragan
6 years
Nathaniel gives examples of what happens in 10 million miles of driving @Waymo.
0
7
15
@ancadianadragan
Anca Dragan
3 years
"pragmatic" compression: instead of showing an image that's visually similar, learn to show an image that leads to the user doing same thing as they would have done on the original image; w @sidgreddy and @svlevine.
@svlevine
Sergey Levine
4 years
An "RL" take on compression: "super-lossy" compression that changes the image, but preserves its downstream effect (i.e., the user should take the same action seeing the "compressed" image as when they saw original) w @sidgreddy & @ancadianadragan . 🧵>
0
0
12
@ancadianadragan
Anca Dragan
9 months
Our work on explaining deep RL performance continues at ICLR!
@cassidy_laidlaw
Cassidy Laidlaw
9 months
Last year we showed that deep RL performance in many *deterministic* environments can be explained by a property we call the effective horizon. In a new paper to be presented at @iclr_conf we show that the same property explains deep RL in *stochastic* environments as well! 🧵
0
0
14
@ancadianadragan
Anca Dragan
6 months
Great job @NeelNanda5 and team for making this happen! Tech report here
0
0
14
@ancadianadragan
Anca Dragan
4 years
6 years ago I said I'd be excited to work in the AI-BCI space on assistive technology for people with severe motor impairments. It took a while, but it's finally happening and I'm so very grateful for this collaboration!
1
0
13