kushal thaman

@kushal1t

Followers: 604
Following: 399
Media: 55
Statuses: 559

member of technical staff @stanfordnlp

Palo Alto, CA
Joined May 2021
Pinned Tweet
@kushal1t
kushal thaman
9 months
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" ! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Tweet media one
@RylanSchaeffer
Rylan Schaeffer
9 months
Interested in mech interp of representations that deep networks learn? If so, check out a new type of polysemanticity we call: 💥💥Incidental Polysemanticity 💥💥 Led by @vclecomte @kushal1t @tmychow @sanmikoyejo at @stai_research @StanfordAILab 1/N
3
20
104
9
7
35
@kushal1t
kushal thaman
5 months
Tweet media one
2
1
84
@kushal1t
kushal thaman
1 month
just wait until someone tries to shift the goalpost by saying “algebra, nt and geo are just <insert symbol manipulation/small search space/pattern matching>, combinatorics requires *real* reasoning”
5
1
52
@kushal1t
kushal thaman
7 months
@debarghya_das @paulfchristiano, IMO 2008 silver, founder of ARC; @paraga, IPhO gold 2001, ex-CEO of Twitter
1
2
44
@kushal1t
kushal thaman
7 months
@PradyuPrasad i read newton’s principia in high school (the original text was a bit difficult to parse, so i went with ‘Principia for the Common Reader’ by Chandrasekhar). i’ve been told it’s a rather popular choice, and many new textbooks (morin, kleppner) borrow heavily from principia still.
1
0
38
@kushal1t
kushal thaman
5 months
sam altman is speaking at a discussion event at the Stanford AI Club () this Wednesday (April 24)! it'll be a small-group event so we can have a high-context discussion. fill out the form to attend!
@stanfordaiclub
Stanford AI Club
5 months
Stanford AI Club is inviting @sama for a small group discussion this Wednesday 4/24! Stanford affiliates apply at
Tweet media one
1
3
16
1
2
33
@kushal1t
kushal thaman
4 months
man long live whoever wrote einops
Tweet media one
2
0
33
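[A minimal NumPy sketch of the axis bookkeeping that einops turns into readable one-liners. The `rearrange` pattern strings shown in the comments are standard einops notation; the array shapes here are illustrative assumptions.]

```python
import numpy as np

x = np.zeros((2, 3, 4, 5))  # batch, channels, height, width ("b c h w")

# Plain NumPy: flatten each image into a feature vector.
flat = x.reshape(x.shape[0], -1)         # shape (2, 60)

# Plain NumPy: move channels last, as many libraries expect.
channels_last = x.transpose(0, 2, 3, 1)  # shape (2, 4, 5, 3)

# The einops equivalents (not executed here) read:
#   rearrange(x, 'b c h w -> b (c h w)')
#   rearrange(x, 'b c h w -> b h w c')
print(flat.shape, channels_last.shape)
```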
@kushal1t
kushal thaman
2 months
thankful to be part of the cohort, and grateful for @tylercowen's support!
@tylercowen
tylercowen
2 months
Emergent Ventures winners, 35th cohort:
0
3
45
5
0
32
@kushal1t
kushal thaman
1 year
Applications to Atlas Fellowship 2023 are out! If you are a curious high school student who wants to understand how the world works -- and change it -- apply at . I did the program last year, and it was an amazing experience!
2
13
27
@kushal1t
kushal thaman
6 months
anon i know you like to be edgy but can you write a better abstract than this one?
Tweet media one
1
2
26
@kushal1t
kushal thaman
5 months
had fun asking Sam lots of questions today!
@stanfordaiclub
Stanford AI Club
5 months
We hosted @sama for our first speaker event! In a small group of ~20 people, we talked about everything from progress in scaling, the compute supply chain, agents, safety, takeoff speeds, timelines and more. 🧵 (1/n)
Tweet media one
Tweet media two
1
3
28
0
0
20
@kushal1t
kushal thaman
4 months
i’ll be in Vienna for #ICLR2024 starting this Monday! if you'd be down to chat about in-context learning, training dynamics, LLMs, architectural variants, interpretability, alignment/safety, or something else, please reach out — i’d love to talk to you!
0
0
18
@kushal1t
kushal thaman
1 year
@PradyuPrasad guy who worries about GPU centralization but can't do math bro you're worried about the wrong Jensen's inequality
2
1
13
@kushal1t
kushal thaman
4 months
"are you paying attention?" yes, $20/month actually.
Tweet media one
0
0
15
@kushal1t
kushal thaman
5 months
what question would you like Sam Altman to answer?
10
0
15
@kushal1t
kushal thaman
4 months
now that 4o is free, many people will want to cancel the $20/month subscription... unless OpenAI releases a new, frontier model that you can only access with Plus?
3
0
15
@kushal1t
kushal thaman
8 months
we’re so back :)
Tweet media one
1
0
15
@kushal1t
kushal thaman
8 months
sutton and barto all the way
Tweet media one
5
1
14
@kushal1t
kushal thaman
10 months
incredible talk by @ashVaswani at Stanford on the Transformer, how it’s evolved over the years, and exciting future research directions!
Tweet media one
0
0
13
@kushal1t
kushal thaman
7 months
the latest draft of our paper is now out, with many new results!
@kushal1t
kushal thaman
9 months
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" ! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Tweet media one
9
7
35
2
3
12
@kushal1t
kushal thaman
9 months
@fionaleng_ contributing to sample size, a majority (and plausibly >80%) of my friends from olympiad camps a few years ago that stated they wanted to be physicists are now interning/ft-ing at js/hrt/2sig etc.
0
0
11
@kushal1t
kushal thaman
4 months
@ilyasut maybe the real AGIs were the friends we made along the way 🥹
0
4
11
@kushal1t
kushal thaman
1 month
apparently this has become a hot take to say out loud, but the last 45 minutes of "Oppenheimer" (2023) is peak cinema and is absolutely critical to the movie and the life of J. Robert Oppenheimer.
1
0
11
@kushal1t
kushal thaman
1 month
tbc i do believe combinatorics problems (p3/p6) are typically the hardest (followed by NT), and IMO 2024 was unusual in that the hardest problem was an algebra one. simply pointing out that this type of goalpost shifting is the perfect example of nebulous thinking in ai progress.
0
0
10
@kushal1t
kushal thaman
5 months
also, if you aren't around, i'll be asking sam questions, so DM me if you have questions you'd like sam to answer!
2
0
10
@kushal1t
kushal thaman
4 months
@prafdhar has been an inspiration since high school for me; he won golds at IAO, IPhO and IMO for India back in the day, and went to work on really cool projects on VAEs, GANs, DDPMs, consistency models and much more after undergrad for OpenAI.
@prafdhar
Prafulla Dhariwal
4 months
GPT-4o (o for “omni”) is the first model to come out of the omni team, OpenAI’s first natively fully multimodal model. This launch was a huge org-wide effort, but I’d like to give a shout out to a few of my awesome team members who made this magical model even possible!
137
344
4K
0
0
10
@kushal1t
kushal thaman
6 months
did anyone here actually get reasonable ICML reviews?
2
0
10
@kushal1t
kushal thaman
7 months
@debarghya_das @paulfchristiano @paraga yeah he was India’s third ever gold medalist iirc
1
0
10
@kushal1t
kushal thaman
3 months
@RylanSchaeffer courtesy of @NeelNanda5 and other TransformerLens contributors ❤️
1
0
10
@kushal1t
kushal thaman
1 month
@wtgowers
Timothy Gowers @wtgowers
1 month
It's not clear what the implications of this are for mathematical research. Since the method used was very general, there would seem to be no obvious obstacle to adapting it to other mathematical domains, apart perhaps from insufficient data.
3
8
166
1
0
9
@kushal1t
kushal thaman
1 year
correct me if i'm wrong, but it doesn't seem like OpenAI has been doing (publishing) work in deep RL for years now. the last work they published in the subfield dates back to late 2019 (safe exploration strategies in deep RL). why did they abandon it?
2
1
9
@kushal1t
kushal thaman
1 year
@michael_nielsen + Purcell on electricity and magnetism + Griffiths on quantum mechanics + Schroeder on thermal physics + Blundell on thermal physics + Campbell's biology + Chandrasekhar on Newton's Principia for the common reader + Barto and Sutton on RL
0
0
7
@kushal1t
kushal thaman
4 months
Tweet media one
1
0
8
@kushal1t
kushal thaman
7 months
the best ui is no ui
3
0
9
@kushal1t
kushal thaman
2 months
@deedydas @AravSrinivas “just” is a bit of a stretch; that’s 4 more points (out of the total possible 7) for every single indian contestant on the hardest problem of IMO day 1…
1
0
8
@kushal1t
kushal thaman
1 year
tfw you're sitting in a bay area cafe and can overhear two people intensely debating the use of tanh vs. leaky ReLU as activation functions...
0
0
6
@kushal1t
kushal thaman
9 months
@_akhaliq china’s going to solve alignment and continue to cook while america’s on winter vacation
0
0
6
@kushal1t
kushal thaman
8 months
what could the 'zero to one industry-defining' product be? agents that actually work seems like a plausible story.
@newhouseb
Ben Newhouse
8 months
I'm hiring at OpenAI. We're building what (I think) could be an industry-defining zero to one product that leverages the latest and greatest from our upcoming models. If you like product, deep technical challenges, and writing the future: my DMs are open!
59
132
2K
1
0
7
@kushal1t
kushal thaman
10 months
What just happened?????
@OpenAI
OpenAI
10 months
OpenAI announces leadership transition
4K
4K
14K
2
0
8
@kushal1t
kushal thaman
11 months
@ronawang just flew into boston today :) the fall here is magical
Tweet media one
1
0
7
@kushal1t
kushal thaman
4 months
@__Charlie_G if you can i’d be interested in seeing how loss curve fares against a gpt-2 that uses SwiGLUs
0
0
6
@kushal1t
kushal thaman
1 year
having a lot of fun attending the Stanford Center for AI Safety Annual Meeting!
Tweet media one
0
0
7
@kushal1t
kushal thaman
10 months
another november, another sam scandal 😞
1
1
7
@kushal1t
kushal thaman
7 months
@BlancheMinerva @PradyuPrasad Yep. So just to clarify, I read it on my own mostly out of interest, and I meant that it's a popular choice amongst quite a few olympiad people I know. I agree with Pradyu's main claim, I just think it's at least not universally true for physics.
0
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo We then proceed to provide a theoretical model of how this happens, analyzing the learning dynamics by studying the interaction between the forces of sparsity (namely feature benefit, interference and regularization), and run various experiments to confirm our results! (4/n)
Tweet media one
Tweet media two
Tweet media three
1
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo We present a non-mutually exclusive origin story of polysemanticity, showing that it can arise incidentally, even when there are ample neurons to represent all features — due to a single neuron correlating to unrelated features at the start (e.g. by random initialization)! (3/n)
1
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo Polysemanticity is a property in DNNs that arises when individual neurons represent a mixture of unrelated features, making them hard to interpret. The classic story is that NNs learn more features than there are neurons, causing superposition. (2/n)
1
0
7
@kushal1t
kushal thaman
4 months
it’s a usual saturday morning in Wien, but there’s a sense of premonition in the air at Messe Wien, the @iclr_conf venue. maybe people are starting to feel the AGI…
Tweet media one
0
0
7
@kushal1t
kushal thaman
10 months
@visakanv also sama:
0
0
6
@kushal1t
kushal thaman
8 months
10/10 recommend planning a bird's eye view of 2024 (monthly/quarterly/semesterly). it's way more useful than i realized. i actually envision myself accomplishing the things i want, make useful estimates of their timelines & get a better sense of the year overall.
1
0
6
@kushal1t
kushal thaman
7 months
if americans found out about need aware admissions for international students and then did it for everyone there would be riots in the streets
Tweet media one
1
0
7
@kushal1t
kushal thaman
5 months
this was dwarkesh’s best episode yet!
@dwarkesh_sp
Dwarkesh Patel
5 months
Had so much fun chatting with my friends @TrentonBricken and @_sholtodouglas . No way to summarize it, except: This is the best context dump out there on how LLMs are trained, what capabilities they're likely to soon have, and what exactly is going on inside them. You would be
39
125
1K
1
0
7
@kushal1t
kushal thaman
8 months
live note-taking my history class using whisper + gpt4👍
1
0
6
@kushal1t
kushal thaman
8 months
@BogdanIonutCir2 @EvanHub @_robertkirk @gwern and others interested: what open questions around path dependence (and mode connectivity etc.) in model training are you most interested in seeing progress on next? why? (interested in running experiments & hunting useful directions)
3
0
5
@kushal1t
kushal thaman
7 months
E[number of times i hear someone say the phrase "or something" this weekend] = ?
2
0
5
@kushal1t
kushal thaman
7 months
the ai revolution of 2024 begins today 🧧
2
0
5
@kushal1t
kushal thaman
8 months
@nickcammarata have you written more about how one should *get into* meditation at greater length? wondering if there are tweet threads from you lying around, or if you’d be able to recommend a reading.
1
0
4
@kushal1t
kushal thaman
8 months
fun fact: the majority (and maybe as much as 80%!) of FLOPs are spent in the MLP layers.
@stephenroller
Stephen Roller
2 years
@srush_nlp I find people unfamiliar with scaling are shocked by this:
Tweet media one
17
22
260
0
0
6
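[A back-of-the-envelope check of the FLOPs claim above, under the usual assumptions: weight-matmul FLOPs per token are ~2x the parameter count, GPT-3-like width, the common 4x MLP expansion, and attention-score FLOPs (which grow with sequence length) ignored.]

```python
d_model = 12288  # assumed GPT-3-like width, for illustration
ffn_mult = 4     # the common 4x MLP expansion

attn_params = 4 * d_model * d_model            # Q, K, V, O projections
mlp_params = 2 * ffn_mult * d_model * d_model  # up- and down-projections

# Since matmul FLOPs scale with parameter count, the split follows
# from parameters alone under these assumptions.
mlp_fraction = mlp_params / (mlp_params + attn_params)
print(f"MLP share of weight-matmul FLOPs: {mlp_fraction:.0%}")  # ~67%
```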
@kushal1t
kushal thaman
4 months
RIP 😢
@SuryaGanguli
Surya Ganguli
4 months
So sorry to see Jim pass. He impacted so many lives so positively including my own. I remember marveling at his Chern-Simons theory when studying pure math, marveling at the workings of his hedge fund when visiting it, and benefiting from his vision in funding neuroscience!
2
4
66
1
0
6
@kushal1t
kushal thaman
9 months
til @geoffreyhinton is the great-great-grandson of George Boole
0
0
6
@kushal1t
kushal thaman
9 months
Tweet media one
0
1
6
@kushal1t
kushal thaman
6 months
mildly annoying that the github API hasn't changed 'account password:' to 'personal access token:' despite changing the functionality
0
0
6
@kushal1t
kushal thaman
10 months
ok but who's working on a truthfulQ*A* benchmark for an AGI eval (me, i just trademarked the name)
0
0
6
@kushal1t
kushal thaman
9 months
@weidai11 twitter doesn't deserve you Wei Dai 🫶
0
0
6
@kushal1t
kushal thaman
6 months
type of guy who uses claude to understand shannon coding
1
0
6
@kushal1t
kushal thaman
7 months
@thecaptain_nemo it's the 2024 version of "human hands have 5 digits, no?"
0
0
6
@kushal1t
kushal thaman
1 year
this is exactly the sort of acknowledgement i like to see on ML papers
Tweet media one
1
0
6
@kushal1t
kushal thaman
1 month
i suppose people went to see a "The Social Network" but on early-to-mid 20th century physicists, and were disappointed they didn't get that. that is a great but separate movie that needs to be made.
0
0
4
@kushal1t
kushal thaman
9 months
munnar is epic and y'all should definitely visit! here are some of my shots there from a few months ago:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@ayaneshu_
Ayaneshu
9 months
I was finally able to see these breathtaking views of Munnar, Kerala towards the end of 2023🤌🏻
Tweet media one
Tweet media two
Tweet media three
Tweet media four
41
55
1K
1
0
5
@kushal1t
kushal thaman
1 year
A really simple but nice fact in linear algebra that I'd previously paid little attention to is that the theorem "any two bases for a finite-dimensional vector space have the same cardinality" needn't hold true if your scalars come from a ring instead of a field.
0
0
4
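[A standard counterexample behind the tweet above, sketched here for context; this is the textbook "invariant basis number" failure, worth double-checking against an algebra reference.]

```latex
% Over a field, any two bases of a vector space have equal cardinality.
% Over a ring this can fail: take $R = \mathrm{End}_k(V)$ for an
% infinite-dimensional $k$-vector space $V$. Since $V \cong V \oplus V$,
% there is an isomorphism of left $R$-modules
%   $R \;\cong\; R \oplus R$,
% so the free module $R$ has bases of sizes $1$ and $2$ (and hence of
% every finite size). Rings where this cannot happen are said to have
% the invariant basis number (IBN) property; nonzero commutative rings
% do, which is why the familiar theorem holds for fields.
```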
@kushal1t
kushal thaman
3 months
@PradyuPrasad post spicy takes on the (more or less settled) results
1
0
4
@kushal1t
kushal thaman
9 months
@RylanSchaeffer @vclecomte @tmychow @sanmikoyejo yes, thanks Rylan! it was a very rewarding experience. 🙂
0
0
5
@kushal1t
kushal thaman
7 months
0
0
5
@kushal1t
kushal thaman
5 months
Tweet media one
0
0
5
@kushal1t
kushal thaman
3 months
@kolelee_ great thread! my intuition (doesn't have a great track record like yours) is that new york has priced in that AI is important, but not that it'll soon be *truly* transformative.
3
0
5
@kushal1t
kushal thaman
10 months
twitter spaces are full of ppl trying to wrap their heads around the bellman equation bc of a rumor and it's weirdly wholesome
0
0
5
@kushal1t
kushal thaman
6 months
sydney's back 👀
@venturetwins
Justine Moore
6 months
Okay yeah I think we can officially call it
Tweet media one
106
369
3K
0
1
5
@kushal1t
kushal thaman
1 year
unless you want to watch croppenheimer, make sure you get tickets in theatres with the 1570 (15-perf, 70mm) or the DLx (eg DL2) screens.
0
1
5
@kushal1t
kushal thaman
6 months
@aryaman2020 @jeffreygwang @ArthurConmy yep, by @AchyutaBot ! it was a neurips poster last year
0
0
5
@kushal1t
kushal thaman
9 months
@PradyuPrasad
Pradyumna
9 months
@goonwidow Alright you're treading on thin ice here
0
0
3
0
0
5
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo Lots of more interesting findings in the paper, indicating the ample scope & need for further research! If you have questions, contact me or the lead author @vclecomte ! 🙂
0
0
5
@kushal1t
kushal thaman
7 months
@character_ai had 3.8B visits in 2023
Tweet media one
3
0
5
@kushal1t
kushal thaman
2 months
@Jsevillamol i suppose it’s harder to say if this is still true. 4o mini alone serves 200B/day…
@sama
Sam Altman
2 months
GPT-4o mini launched 4 days ago. already processing more than 200B tokens per day! very happy to hear how much people are liking the new model.
570
535
8K
1
0
5
@kushal1t
kushal thaman
7 months
- non-rotationally symmetric noise breaks rotational symmetry and induces sparsity! (e.g. bipolar noise)
- sparsity increases with l1/noise up to a certain point, then goes down
- interplay between interference and the push for sparsity causes a different form of polysemanticity!
1
0
3
@kushal1t
kushal thaman
7 months
@uzpg_ its the optimal policy for my perceived reward function
1
0
5
@kushal1t
kushal thaman
9 months
rubber duck debugging has been surprisingly useful while running deep learning experiments
0
0
4
@kushal1t
kushal thaman
1 year
have been reading Cade Metz’s ‘Genius Makers’ and it mentions some really cool anecdotes from the first decade of the deep learning revolution!
0
0
5