![Gabriel Mukobi Profile](https://pbs.twimg.com/profile_images/1786105757932347392/GJlhkpsv_x96.jpg)
Gabriel Mukobi
@gabemukobi
Followers: 839
Following: 2K
Statuses: 565
U.S. AI Safety Institute @NIST, CS PhD @Berkeley_AI | Safe and secure advanced AI. Opinions are my own.
San Francisco, CA
Joined September 2017
RT @EdelmanBen: 1/ Excited to share a new blog post from the U.S. AI Safety Institute! AI agents are becoming increasingly capable. But th…
0 replies · 27 retweets · 0 likes
@sea_snell I expect (2) very roughly predicts nonzero APPS Pass@1 for Llama 3.1 405B zero-shot--have you evaluated it on APPS as well as this language-modeling loss score to check your prediction?
0 replies · 0 retweets · 0 likes
RT @alxndrdavies: Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of…
0 replies · 39 retweets · 0 likes
RT @ben_s_bucknall: Excited to share our new website on Open Problems in Technical AI Governance, a companion to our recent paper on the to…
0 replies · 13 retweets · 0 likes
RT @AISafetyInst: Jade Leung (our CTO) and @geoffreyirving (our Research Director) have been nominated in the @TIME top 100 most influentia…
0 replies · 8 retweets · 0 likes
RT @BrandoHablando: Gave a 60 min lightning talk at Stanford's AI Safety Annual Meeting on our work identifying novel factors that explain…
0 replies · 2 retweets · 0 likes
@BogdanIonutCir2 Possibly, though I'd expect directly weaponizing dangerous capabilities is easier than repurposing those capabilities for safety. A crux is whether one expects AI systems will be very useful hackers or capability researchers before they're very good safety/alignment researchers.
1 reply · 0 retweets · 1 like
@maxwellazoury @DanHendrycks That might be a useful property for certain models to have--imagine hardened models you want to share with pre-release testers or deploy in not-very-secure datacenters, without being as worried about unexpected harms from malicious finetuning if the model leaks to bad actors.
0 replies · 0 retweets · 0 likes
@MotionTsar So it tends to assume coins are fairer than they actually are? This is acceptable 🙃
1 reply · 0 retweets · 0 likes
RT @The_JBernardi: 🚀 New blog! Achieving AI Resilience: Exploring AI safety through a lens of adaptation & societal resilience. Advanced A…
0 replies · 5 retweets · 0 likes
RT @Kurz_Gesagt: Humanity's smartest invention might also be its last. Superintelligent AI could be our dream come true – or our worst nigh…
0 replies · 133 retweets · 0 likes
RT @METR_Evals: How well can LLM agents complete diverse tasks compared to skilled humans? Our preliminary results indicate that our baseli…
0 replies · 99 retweets · 0 likes
@inner_treasure Quite possibly! I do acknowledge this in the post, but I also expect the most significant principal component accounts for a lot of the variance, such that "general capabilities" is a useful enough thing to talk about, at least for the purposes of this piece.
0 replies · 0 retweets · 0 likes