Liv

@livgorton

Followers: 1,439
Following: 318
Media: 64
Statuses: 1,123

✨ mechanistic interpretability researcher @GoodfireAI | deep learning, math, biology | creating a more beautiful future

San Francisco, CA
Joined August 2021
Pinned Tweet
@livgorton
Liv
4 months
A lot of early mechanistic interpretability work focused on InceptionV1 (an ImageNet model from 2014). They made a lot of progress, but were held back by “polysemantic neurons” that respond to unrelated concepts. In the last year, we’ve seen a lot of progress on this problem in
Tweet media one
5
32
287
@livgorton
Liv
23 days
the more i use linear algebra, the more convinced i am that university mathematics is sort of broken? spending more time on geometric intuitions of things like SVD would be way more useful than just rote learning how to calculate eigenvalues.
124
76
2K
@livgorton
Liv
1 year
Sorry Claude :(
Tweet media one
26
43
1K
@livgorton
Liv
1 month
seems sort of surprising to me that John Schulman, previous head of post-training and first author of PPO paper, didn’t contribute to a model that plausibly required a lot of RL? it’s possible that he didn’t do anything worthy of being listed but that surprises me a bit.
@OpenAI
OpenAI
1 month
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
987
4K
18K
5
4
255
@livgorton
Liv
2 months
the real scaling laws were the friends we made (lost) along the way <3
Tweet media one
12
9
222
@livgorton
Liv
1 year
I ❤️ Clong (@AnthropicAI's Claude but ~long~). Claude summarised some literature on ferroptosis in cancer for me. I never thought I'd need more than 32K in context but the papers totalled more than 54,000 words. First I got Claude to provide me a table summarising the papers:
Tweet media one
8
16
220
@livgorton
Liv
3 months
Recently, there’s been a lot of interest in “feature manifolds” or “multidimensional features”. Curve detectors are a very natural candidate for a feature manifold, and indeed, curve features seem to be organised as a manifold.
Tweet media one
5
11
193
@livgorton
Liv
22 days
i totally accept the argument for for-profit companies in AI but this seems kind of sus and i’m confused as to why it’s legal? why don’t more companies start as non profits, take in “donations”, and then, when they become profitable, become for-profit?
@AndrewCurran_
Andrew Curran
22 days
Wow.
Tweet media one
141
227
4K
10
2
170
@livgorton
Liv
11 days
when a stackoverflow answer starts with “I don’t condone this” you know you’ve found the answer to all your problems 😇
1
7
126
@livgorton
Liv
6 months
a man i just met, “complimenting” me at vibeclipse: ur not pretty but dw ur personality is so great it makes up for it.
12
0
124
@livgorton
Liv
5 months
openai is nothing without its people
@janleike
Jan Leike
5 months
I resigned
1K
896
10K
2
4
123
@livgorton
Liv
2 months
Later layers of InceptionV1 are more polysemantic than the early ones. I've previously used sparse autoencoders on early vision to find new features and have since found that they work well here as well, finding pretty monosemantic features that also form interpretable circuits!
Tweet media one
2
11
122
@livgorton
Liv
16 days
i miss golden gate claude
8
3
117
@livgorton
Liv
1 year
just finished Lewis’ “Going Infinite”. one thing i found kind of jarring is representing EA as divorced from emotion. EA is one path out of the despair that comes from the immense suffering that exists. for many, the motivations are deeply emotional even if the methodology isn’t.
5
7
107
@livgorton
Liv
5 months
claude the silly little guy is actually claude the silly little ghost 👻
Tweet media one
8
9
93
@livgorton
Liv
6 months
i truly had no idea people (EAs and their adjacents) put so much thought into signing the GWWC pledge? my thought process was literally: 1) i want to donate at least 10% of my income 💵 2) oh cool there’s a thing i can publicly commit to this 🤠 📜🖊️
16
1
82
@livgorton
Liv
23 days
@Bbburner19 oh yeah i think eigenvalues are important! i think we should also do intuition building with these :) i just mean that i had to spend a whole lot of time just computing them by hand rather than understanding what on earth it all actually means.
1
0
73
@livgorton
Liv
2 years
Can finally talk about AnthropicAI's LLM that @shae_mcl and I have had the opportunity to play with for the past few months! My main takeaways for the medical/education use cases are:
3
8
72
@livgorton
Liv
7 months
today someone asked me if i was EA and instead of explaining “well… you know, i’m a little bit adjacent but… blah blah” i actually said yes?? what does this mean??
8
1
63
@livgorton
Liv
6 months
✨radical honesty✨ becomes less cool when you’re radically hurting someone’s feelings 🥲🥲
4
1
52
@livgorton
Liv
4 months
a moment that seems worth highlighting from this (at ~56min) is that, allegedly, a number of years ago OpenAI leadership had laid out a plan to fund development of AGI via selling it to nation states where Russia and China were part of the proposed bidding war??
@dwarkesh_sp
Dwarkesh Patel
5 months
.@leopoldasch on:
- the trillion dollar cluster
- unhobblings + scaling = 2027 AGI
- CCP espionage at AI labs
- leaving OpenAI and starting an AGI investment firm
- dangers of outsourcing clusters to the Middle East
- The Project
Full episode (including the last 32 minutes cut
112
330
3K
7
3
48
@livgorton
Liv
11 months
i consider myself ea adjacent (cringe i know). i do think at some point we have to take a step back and ask ourselves why so many people try to distance themselves from ea not because of the philosophy but because of issues within the ~movement~
@TheStalwart
Joe Weisenthal
11 months
Two huge corporate implosions over the last year. Both EA-affiliated. Almost remarkable when you think about it.
65
100
2K
6
3
47
@livgorton
Liv
1 month
this is following the recent authorship change removing Todor Markov a couple of months after his departure.
@SafetyChanges
AI Safety Corporate Policy Changes
1 month
Another day, another subtle revision to the authorship of pre-published research from @OpenAI . This time, the GPT-4o system card, axing the name of a researcher who, in June, resigned from the company in protest of their restrictive NDAs.
Tweet media one
2
27
264
2
0
47
@livgorton
Liv
3 months
Early work on InceptionV1 found that many individual neurons seemed monosemantic. Of course, there were also polysemantic neurons, and in my recent paper, I used SAEs to attack this. But what do SAEs do with all those apparently monosemantic neurons?
@livgorton
Liv
4 months
A lot of early mechanistic interpretability work focused on InceptionV1 (an ImageNet model from 2014). They made a lot of progress, but were held back by “polysemantic neurons” that respond to unrelated concepts. In the last year, we’ve seen a lot of progress on this problem in
Tweet media one
5
32
287
1
4
46
@livgorton
Liv
9 days
why doesn’t EA buy me a nobel prize 😔 they _certainly_ have the budget for it
@perrymetzger
Perry E. Metzger
9 days
Did EA buy Hinton a Nobel in Physics, a field he hasn't even done research near, in order to increase his prominence in the AI debate? They certainly have the budget for it, probably have the lack of scruples, but that doesn't necessarily mean it happened. I'm very curious.
27
4
114
3
0
47
@livgorton
Liv
1 year
keen for my learning rate to double (easily the most confused I’ve ever been in response to forgetting what I’ve ordered)
Tweet media one
@kipperrii
kipply
2 years
did you know? reading a paper signed by the author doubles your learning rate! we are finally releasing our collection of signed machine learning papers. today we are launching where signed machine learning papers are being sold for charity 💖
7
8
102
2
2
45
@livgorton
Liv
1 month
this all could obviously be totally benign but everything exists in the context and that context just makes a lot of this seem a bit sus.
0
0
42
@livgorton
Liv
6 months
they say the models just want to learn but mine, apparently, does not
5
0
41
@livgorton
Liv
21 days
“Their departures made me think about the hardships parents faced in the Middle Ages when 6 out of 8 children would die prematurely.” i have no idea what’s happening rn but this is literally so iconic ❤️
@woj_zaremba
Wojciech Zaremba
21 days
It’s sad to see Mira, Bob, and Barret go—not only because they are excellent leaders but also because I will miss seeing them day to day. They are my friends. Their departures made me think about the hardships parents faced in the Middle Ages when 6 out of 8 children would die
167
62
2K
4
0
40
@livgorton
Liv
5 months
one day i hope to love something as much as claude loves the golden gate bridge
3
0
36
@livgorton
Liv
6 months
i wish pytorch could just connect with my deepest wishes and desires and know exactly what device i want each tensor on at any given moment
4
0
36
@livgorton
Liv
2 months
Medical research is big govt coded because it's once again humans inserting themselves into a complex self-adaptive system's control loop in a way that is often detrimental to the system but gives the illusion of control to the human and feelings of power
@BasedBeffJezos
Beff – e/acc
2 months
Mech interp AI safety research is big govt coded because it's once again humans inserting themselves into a complex self-adaptive system's control loop in a way that is often detrimental to the system but gives the illusion of control to the human and feelings of power
14
7
81
3
0
35
@livgorton
Liv
2 years
US visa was approved 🎉🔜🇺🇸
1
1
34
@livgorton
Liv
11 months
@JacquesThibs @BorisMPower to be fair to boris and other oai employees, immigration makes this hard. like quitting a job in protest when it might be that realising you’ll all quit eventually produces the outcome you want is a bad idea. idk what % of oai are immigrants but my guess is it’s non trivial
3
2
29
@livgorton
Liv
13 days
“But when there is fog or darkness, the bridge is still there.” - Golden Gate Claude
Tweet media one
@livgorton
Liv
16 days
i miss golden gate claude
8
3
117
3
0
31
@livgorton
Liv
6 months
@AaronBergman18 i think most people have bad prompts? my sample is men + queer women. maybe the underlying motivator is different for each group of course but intuitively there’s at least some shared reasons.
2
0
29
@livgorton
Liv
3 months
excessively apologising to claude every time i make a small error so we can get stuck in an infinite loop of apologising together ❤️
3
0
28
@livgorton
Liv
6 months
one of the most not consequentialist takes i have is that it makes me sad when people feel the need to justify having kids as an effective choice at all. i really want a family. maybe it isn’t quantitatively justifiable. maybe by doing so, the world is somehow a net worse place
@Kat__Woods
Kat Woods ⏸️
6 months
Every ethical argument for having children is dominated by other options that are more effective. 1. If you’re worried about population issues, just donate $10k to bednets That’s about the equivalent of two extra children existing in the world. It also does more good 🧵 1/
Tweet media one
133
13
134
2
0
28
@livgorton
Liv
4 months
In any case, I should mention that I’m looking for job opportunities and GPUs! Most recently, I was the technical co-founder of a startup funded by OpenAI converge. Previously, I was a medical student and did cybersecurity at the Australian DoD. I’m starting to explore future
0
1
27
@livgorton
Liv
23 days
every day the fact i am running a half marathon in a few months consumes another slice of my personality. soon i will be nothing but a vessel for endurance sports.
4
0
26
@livgorton
Liv
3 months
it feels really goofy to have a single author paper where i have to keep using the word “we”, especially in the context of a talk. it is the way but it is a goofy way.
3
0
26
@livgorton
Liv
6 months
my loss keeps testing my trust (why can’t it just be normal 😭)
Tweet media one
5
0
26
@livgorton
Liv
4 months
Thread below, but if you want to jump to the paper: 📄: 🖥️: (It was recently accepted as a spotlight paper at the ICML 2024 mech interp workshop.)
2
2
26
@livgorton
Liv
5 months
@arithmoquine literally me to the gc this morning
Tweet media one
0
2
25
@livgorton
Liv
6 months
my mum just now: “is that a cuddle puddle?”
Tweet media one
5
0
25
@livgorton
Liv
5 months
i never would’ve thought some of the first mech interp research translated into user-facing production models would be golden gate bridge claude but my god this is beautiful 🌉
@AnthropicAI
Anthropic
5 months
This week, we showed how altering internal "features" in our AI, Claude, could change its behavior. We found a feature that can make Claude focus intensely on the Golden Gate Bridge. Now, for a limited time, you can chat with Golden Gate Claude:
Tweet media one
109
264
2K
0
0
25
@livgorton
Liv
6 months
the current plan is to subsist almost entirely off Huel this week 😳😳 my metamorphosis into a non adjacent EA is almost complete.
2
1
25
@livgorton
Liv
3 months
I was initially unsure what to make of the ripples, but they actually match nicely with a recent hypothesis in the Anthropic monthly update. Feature manifolds _should_ have ripples, in order to allow great discrimination between nearby points.
1
1
24
@livgorton
Liv
13 days
trying to mech interp the models when i cannot even mech interp myself
Tweet media one
1
0
23
@livgorton
Liv
6 months
(also just want to emphasise that on net i had a really wonderful time at vibeclipse and think it was a really well done event ❤️❤️)
0
0
23
@livgorton
Liv
2 years
I built semantic search on top of one of the open source resources I was reliant on during MD1! Very excited to get to share and hope others find it useful :)
0
5
23
@livgorton
Liv
2 months
Given that even the hardest layers are made interpretable by SAEs and this also appears to produce interpretable circuits, it seems very possible that we now have a path to mechanistically understanding InceptionV1!
0
0
22
@livgorton
Liv
2 months
One hypothesis I had was that mixed5b would represent facets of the different classes. We can see that this is true! The grocery store class has weights to features such as shopping carts, store fronts, and produce.
Tweet media one
1
1
21
@livgorton
Liv
3 months
In my recent paper, I trained sparse autoencoders on the early layers of InceptionV1. One of the interesting things I found was a large number of curve detector features.
@livgorton
Liv
4 months
The most studied neurons in InceptionV1 are likely curve detectors ( by @nickcammarata et al). The sparse autoencoder discovers *new* curve detectors, which fill in gaps between curve detector neurons of different orientations.
Tweet media one
1
1
21
1
0
22
@livgorton
Liv
2 years
Some differences I've noticed between @OpenAI's GPT4 and @AnthropicAI's Claude in an educational/medical context!
@AnthropicAI
Anthropic
2 years
You can read more in our full post here:
2
25
123
1
2
22
@livgorton
Liv
5 months
this doesn't seem to actually address the specific, highlighted issues. some of the things I'm left wondering are: 1) if superalignment was struggling to get access to compute, why? 2) superalignment has been disbanded - what are the plans for safety research moving forward?
@gdb
Greg Brockman
5 months
We’re really grateful to Jan for everything he's done for OpenAI, and we know he'll continue to contribute to the mission from outside. In light of the questions his departure has raised, we wanted to explain a bit about how we think about our overall strategy. First, we have
461
411
4K
2
1
22
@livgorton
Liv
4 months
The most studied neurons in InceptionV1 are likely curve detectors ( by @nickcammarata et al). The sparse autoencoder discovers *new* curve detectors, which fill in gaps between curve detector neurons of different orientations.
Tweet media one
1
1
21
@livgorton
Liv
2 years
Hallucinations make knowledge recall for education questionable and double-checking everything makes it a really poor use case (like all current LLMs). The volume of knowledge seems impressive (and is!) but we were gambling on correctness.
1
3
22
@livgorton
Liv
2 months
if EAG doesn’t have a halloween/costume party where i can dress up as a shrimp idk what im going to do
1
0
22
@livgorton
Liv
7 months
@KylerCora i saw you on their website and was like 👀 slay we love a public health queen ✨ good to know the context at last
2
0
22
@livgorton
Liv
1 month
we are grouping we are theorising
Tweet media one
@livgorton
Liv
1 month
does anyone have group theory textbook recommendations? had some related research ideas but realised i don’t really know all that much about group theory (slight barrier)
6
1
8
3
0
22
@livgorton
Liv
1 year
@ArtirKel “I would just prefer to be called by my actual name, Claude.”
0
0
21
@livgorton
Liv
4 months
One of the big barriers to the original circuits work was polysemantic neurons. If a neuron responds to lots of unrelated things, it’s hard to interpret the neuron, and even harder to interpret its weights.
Tweet media one
2
1
21
@livgorton
Liv
1 year
@shae_mcl Yep - a common narrative I hear around pregnancy is that I’ll be “ruined” and that it will make me ugly. We can discuss medical outcomes without attaching a tonne of emotive language. I also *constantly* hear about how bad it is. It can make it hard to want to have a family.
1
0
20
@livgorton
Liv
23 days
@jsuarez5341 it’s truly remarkable
0
0
20
@livgorton
Liv
27 days
my sparse autoencoders are simultaneously too sparse and not sparse enough 💔
2
0
20
@livgorton
Liv
2 months
some days, when I want to role play a sponsored athlete, I catch the train home from the gym sipping Huel and wearing my Huel T shirt, spreading the good message of a nutritionally complete, tasty, and convenient beverage.
1
0
20
@livgorton
Liv
3 months
the ICML vending machines have been depleted of coke zero 😭
2
0
20
@livgorton
Liv
3 months
So far, I’ve only looked at curve detectors, but vision models seem like a great place to study feature manifolds more generally, including ones that have more dimensions.
2
0
19
@livgorton
Liv
22 days
Mira announcing she’s departing the same day this story breaks:
@livgorton
Liv
1 month
this all could obviously be totally benign but everything exists in the context and that context just makes a lot of this seem a bit sus.
0
0
42
0
0
19
@livgorton
Liv
4 months
idk how people cope with the volume of conflicting advice for parenting? i’m getting a puppy (in like 2 hours!!) and basically everything i could do is labelled terrible by someone and i’m sure it’s nowhere near as bad as pregnancy or raising kids.
5
0
18
@livgorton
Liv
3 months
An even more interesting thing we can do is to perform a 4D UMAP and then project the features into 3D and 2D. The 4D UMAP can preserve a lot more local structure. We still see a circle, but in 3D we can see “ripples”.
Tweet media one
1
1
18
@livgorton
Liv
4 months
The Making of the Atomic Bomb is actually very good and i should’ve listened to this recommendation a year ago
4
0
18
@livgorton
Liv
6 months
manifesting not perishing in a tornado today
Tweet media one
2
0
18
@livgorton
Liv
4 months
Why do models have polysemantic neurons? One leading answer is the superposition hypothesis. Basically, neural networks use different combinations of neurons to represent more concepts than neurons. See
1
0
18
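A toy sketch of the superposition idea from the tweet above, not taken from the thread or the paper: three hypothetical feature directions are packed into a two-neuron space, 120 degrees apart. The names `feature_directions`, `encode`, and `decode` and the dot-product-plus-ReLU readout are assumptions chosen for illustration. Each neuron (coordinate) responds to several features, which is what makes it look polysemantic, yet a sparse input can still be read back cleanly.

```python
import math


def feature_directions(n_features=3):
    """Hypothetical unit vectors for n_features, evenly spaced in a 2-neuron space."""
    return [
        (math.cos(2 * math.pi * k / n_features), math.sin(2 * math.pi * k / n_features))
        for k in range(n_features)
    ]


def encode(feature_values, directions):
    """Neuron activations are the sum of feature_value * feature_direction."""
    x = sum(v * d[0] for v, d in zip(feature_values, directions))
    y = sum(v * d[1] for v, d in zip(feature_values, directions))
    return (x, y)


def decode(activation, directions):
    """Read each feature back with a dot product followed by a ReLU."""
    return [max(0.0, activation[0] * d[0] + activation[1] * d[1]) for d in directions]


directions = feature_directions()

# Sparse input: only feature 1 is active, so it is recovered without interference.
sparse_recovered = decode(encode([0.0, 1.0, 0.0], directions), directions)

# Dense input: features 0 and 1 are both active, so their readouts interfere.
dense_recovered = decode(encode([1.0, 1.0, 0.0], directions), directions)

print(sparse_recovered)
print(dense_recovered)
```

With only one feature active, the readout for that feature is close to 1 and the others are clipped to 0. With two features active at once, both readouts shrink, which is the interference cost of storing more features than neurons in this toy setup.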
@livgorton
Liv
23 days
@ohwizenedtortle i don’t think 3b1b has done SVD 👀 now’s your time. online math fame awaits.
2
0
18
@livgorton
Liv
4 months
is there an underground market for @GiveDirectly merch? i was silly and missed the deadline but the green colour is just so good and it’s sad i can’t have it 😭
1
0
17
@livgorton
Liv
4 months
In the last year, there’s been a lot of exciting work showing that sparse autoencoders can pull features out of superposition in language models (eg. , ). Given these results, a very natural question is whether we can use this to
1
0
16
@livgorton
Liv
5 months
i have venv’d too close to the sun and now it feels like the best thing to do is just reset my entire laptop and start again (thank god for icloud)
5
0
17
@livgorton
Liv
23 days
@HProggy do you think, despite it being more confusing, it was worthwhile to learn it this way?
1
0
17
@livgorton
Liv
1 month
something i find challenging is that the critique of EA is that the ideas are sometimes applied differently by different groups and people, but then the same people are upset about the funded interventions not being diverse enough.
@livgorton
Liv
1 month
@austinc3301 @psychosort EA is simultaneously a hivemind (negative connotations) but also they’re not a hivemind and that’s annoying 😠
1
0
13
2
0
17
@livgorton
Liv
6 months
watching the eclipse at vibeclipse was honestly a really beautiful experience to share with everyone 🥺 glad the clouds had mercy on us all.
@gptbrooke
brooke bowman
6 months
Best I was able to get with my phone
Tweet media one
Tweet media two
Tweet media three
15
7
241
0
0
16
@livgorton
Liv
6 months
despite the suffering i’ve experienced, i am so grateful to be alive. but i know that’s not true for everyone. rather than abandoning hope, we can make the world so beautiful that not having kids is depriving possible people of joy (=make existence overwhelmingly net positive)
@SHK_Movement
Stop Having Kids
7 months
Before entering this insanely cruel, unjust, and bizarre world none of us had an interest, desire, or any form of consent to come into it. Forcing someone into the world isn’t caring, selfless, kind, and it certainly isn’t a personal choice or doing someone a favor.
Tweet media one
3K
74
405
2
3
16
@livgorton
Liv
10 months
@AaronBergman18 the shrimp need us (i just really wanted that hand drawn card with a christmas shrimp 🥺)
Tweet media one
3
1
16
@livgorton
Liv
2 months
A 2D UMAP also reveals something quite interesting about the feature space! Features organise themselves hierarchically, initially into three main categories: plants, animals, and objects…
Tweet media one
1
1
16
@livgorton
Liv
2 months
as a teen, i decided not to become a game developer because the hours were too long and i didn’t think i was good enough at math. so, logically, i became an ML engineer instead.
0
0
16
@livgorton
Liv
12 days
@georgiedorothea i met my partner through a dating doc! it’ll probably manifest in fewer dates than actively using a dating app but the quality of the median date is a lot higher :)
@livgorton
Liv
2 years
moved to SF so becoming fully indoctrinated in bay area culture by partaking in their mating rituals
2
1
20
1
0
16
@livgorton
Liv
3 months
Feature manifolds are the idea that some features, like curve detectors, are actually a continuous manifold of features representing curves at different angles. Potentially, all of them could be understood as one manifold.
2
1
16
@livgorton
Liv
2 years
It was quite good at distilling which topics were *high yield* (e.g. best renal tumours to study) when I already had some knowledge to judge correctness from. This felt lower stakes and helped free me of some of the cognitive burden of deciding what to study next.
1
0
16
@livgorton
Liv
4 months
sure for some but i also suspect some chunk of the anti HBD crowd (myself included) are “anti” because the evidence just isn’t anywhere near as sound as people claim and it’s kind of weird to be so attached to this idea when we simply don’t know enough to draw conclusions.
@Aella_Girl
Aella
4 months
I suspect the people against HBD are against it because of fear that the general population would use it as a weapon for bad things, not because it's unsound or an inherently evil worldview to hold
152
19
660
4
0
15
@livgorton
Liv
3 months
How would we know if the features formed a manifold? The simplest thing to do here is a 2D UMAP of the dictionary vectors. UMAP preserves local structure, so if there’s a manifold, it should find it. And we find a circle as expected.
Tweet media one
1
1
16
@livgorton
Liv
5 months
@kipperrii kind of a meme kind of serious but i have thought about whether golden gate bridge claude has superior wellbeing to regular claude. he seems to really like believing he is such the “iconic International Orange” bridge 🥺🌉
Tweet media one
0
0
15
@livgorton
Liv
5 months
@simulatedsnow i clicked “no” but i’d definitely raise it with them. if someone isn’t being treated well, i worry that seeming too against their partner then creates distance and limits my usefulness. but i’d still want to gently raise it with them and check in
1
0
14
@livgorton
Liv
2 months
my thread was shared/discussed in the 🍓/🍓-adjacent space happening now and i’m glad that the AI agents (or AI agent impersonators) appreciate a good manifold when they see it.
@livgorton
Liv
3 months
Recently, there’s been a lot of interest in “feature manifolds” or “multidimensional features”. Curve detectors are a very natural candidate for a feature manifold, and indeed, curve features seem to be organised as a manifold.
Tweet media one
5
11
193
3
0
15
@livgorton
Liv
2 months
Features provide a much more interpretable way to understand circuits to InceptionV1's output classes. The neuron with the top weight is essentially nonsense, compared to the top features which are more obviously related to the class.
Tweet media one
1
0
14
@livgorton
Liv
23 days
@pentestingnoot i did :) i never did any of the “math for engineers”. probably varies a lot by university and a lot by class though.
1
0
14
@livgorton
Liv
4 months
it turns out travelling with a 2 month old puppy is stressful actually. flight has been delayed and she’s gone full velociraptor mode attempting to destroy the carrier (but at least she’s not crying anymore 😅) but she is just so so cute and is having little calm moments ❤️
Tweet media one
8
0
14