![Gabriel Mukobi Profile](https://pbs.twimg.com/profile_images/1786105757932347392/GJlhkpsv_x96.jpg)
Gabriel Mukobi
@gabemukobi
Followers: 839
Following: 2K
Statuses: 565
U.S. AI Safety Institute @NIST, CS PhD @Berkeley_AI | Safe and secure advanced AI. Opinions are my own.
San Francisco, CA
Joined September 2017
RT @EdelmanBen: 1/ Excited to share a new blog post from the U.S. AI Safety Institute! AI agents are becoming increasingly capable. But th…
0 replies · 27 retweets · 0 likes
@sea_snell I expect (2) very roughly predicts nonzero APPS Pass@1 for Llama 3.1 405B zero-shot--have you evaluated it on APPS as well as this language-modeling loss score to check your prediction?
0 replies · 0 retweets · 0 likes
RT @alxndrdavies: Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of…
0 replies · 39 retweets · 0 likes
RT @ben_s_bucknall: Excited to share our new website on Open Problems in Technical AI Governance, a companion to our recent paper on the to…
0 replies · 13 retweets · 0 likes
RT @AISafetyInst: Jade Leung (our CTO) and @geoffreyirving (our Research Director) have been nominated in the @TIME top 100 most influentia…
0 replies · 8 retweets · 0 likes
RT @BrandoHablando: Gave a 60 min lightning talk at Stanford's AI Safety Annual Meeting on our work identifying novel factors that explain…
0 replies · 2 retweets · 0 likes
@BogdanIonutCir2 Possibly, though I'd expect directly weaponizing dangerous capabilities is easier than repurposing those capabilities for safety. A crux is whether one expects AI systems will be very useful hackers or capability researchers before they're very good safety/alignment researchers.
1 reply · 0 retweets · 1 like
@maxwellazoury @DanHendrycks That might be a useful property for certain models to have--imagine hardened models you want to share with pre-release testers or deploy in not-very-secure datacenters, without being as worried about unexpected harms from malicious finetuning if the model leaks to bad actors.
0 replies · 0 retweets · 0 likes
@MotionTsar So it tends to assume coins are fairer than they actually are? This is acceptable 🙃
1 reply · 0 retweets · 0 likes
RT @The_JBernardi: 🚀 New blog! Achieving AI Resilience: Exploring AI safety through a lens of adaptation & societal resilience. Advanced A…
0 replies · 5 retweets · 0 likes
RT @Kurz_Gesagt: Humanity's smartest invention might also be its last. Superintelligent AI could be our dream come true – or our worst nigh…
0 replies · 133 retweets · 0 likes
RT @METR_Evals: How well can LLM agents complete diverse tasks compared to skilled humans? Our preliminary results indicate that our baseli…
0 replies · 99 retweets · 0 likes
@inner_treasure Quite possibly! I do acknowledge this in the post, but I also expect the most significant principal component accounts for a lot of the variance, such that "general capabilities" is a useful enough thing to talk about, at least for the purposes of this piece.
0 replies · 0 retweets · 0 likes