kushal thaman

@kushal1t

Followers: 604
Following: 399
Media: 55
Statuses: 559

member of technical staff @stanfordnlp

Palo Alto, CA
Joined May 2021
Pinned Tweet
@kushal1t
kushal thaman
9 months
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" ! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Tweet media one
@RylanSchaeffer
Rylan Schaeffer
9 months
Interested in mech interp of representations that deep networks learn? If so, check out a new type of polysemanticity we call: 💥💥Incidental Polysemanticity 💥💥 Led by @vclecomte @kushal1t @tmychow @sanmikoyejo at @stai_research @StanfordAILab 1/N
3
20
104
9
7
35
@kushal1t
kushal thaman
5 months
Tweet media one
2
1
84
@kushal1t
kushal thaman
1 month
just wait until someone tries to shift the goalpost by saying “algebra, nt and geo are just <insert symbol manipulation/small search space/pattern matching>, combinatorics requires *real* reasoning”
5
1
52
@kushal1t
kushal thaman
7 months
@debarghya_das @paulfchristiano, IMO 2008 silver, founder of ARC; @paraga, IPhO gold 2001, ex-CEO of Twitter
1
2
44
@kushal1t
kushal thaman
7 months
@PradyuPrasad i read newton’s principia in high school (the original text was a bit difficult to parse, so i went with ‘Principia for the Common Reader’ by Chandrasekhar). i’ve been told it’s a rather popular choice, and many new textbooks (morin, kleppner) borrow heavily from principia still.
1
0
38
@kushal1t
kushal thaman
5 months
sam altman is speaking at a discussion event at the Stanford AI Club () this Wednesday (April 24)! it'll be a small-group event so we can have a high-context discussion. fill out the form to attend!
@stanfordaiclub
Stanford AI Club
5 months
Stanford AI Club is inviting @sama for a small group discussion this Wednesday 4/24! Stanford affiliates apply at
Tweet media one
1
3
16
1
2
33
@kushal1t
kushal thaman
4 months
man long live whoever wrote einops
Tweet media one
2
0
33
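[A minimal NumPy sketch of the axis bookkeeping that einops turns into readable one-liners. The `rearrange` pattern strings shown in the comments are standard einops notation; the array shapes here are illustrative assumptions.]

```python
import numpy as np

x = np.zeros((2, 3, 4, 5))  # batch, channels, height, width ("b c h w")

# Plain NumPy: flatten each image into a feature vector.
flat = x.reshape(x.shape[0], -1)         # shape (2, 60)

# Plain NumPy: move channels last, as many libraries expect.
channels_last = x.transpose(0, 2, 3, 1)  # shape (2, 4, 5, 3)

# The einops equivalents (not executed here) read:
#   rearrange(x, 'b c h w -> b (c h w)')
#   rearrange(x, 'b c h w -> b h w c')
print(flat.shape, channels_last.shape)
```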
@kushal1t
kushal thaman
2 months
thankful to be part of the cohort, and grateful for @tylercowen's support!
@tylercowen
tylercowen
2 months
Emergent Ventures winners, 35th cohort:
0
3
45
5
0
32
@kushal1t
kushal thaman
1 year
Applications to Atlas Fellowship 2023 are out! If you are a curious high school student who wants to understand how the world works -- and change it -- apply at . I did the program last year, and it was an amazing experience!
2
13
27
@kushal1t
kushal thaman
6 months
anon i know you like to be edgy but can you write a better abstract than this one?
Tweet media one
1
2
26
@kushal1t
kushal thaman
5 months
had fun asking Sam lots of questions today!
@stanfordaiclub
Stanford AI Club
5 months
We hosted @sama for our first speaker event! In a small group of ~20 people, we talked about everything from progress in scaling, the compute supply chain, agents, safety, takeoff speeds, timelines and more. 🧵 (1/n)
Tweet media one
Tweet media two
1
3
28
0
0
20
@kushal1t
kushal thaman
4 months
i’ll be in Vienna for #ICLR2024 starting this Monday! if you'd be down to chat about in-context learning, training dynamics, LLMs, architectural variants, interpretability, alignment/safety, or something else, please reach out — i’d love to talk to you!
0
0
18
@kushal1t
kushal thaman
1 year
@PradyuPrasad guy who worries about GPU centralization but can't do math bro you're worried about the wrong Jensen's inequality
2
1
13
@kushal1t
kushal thaman
4 months
"are you paying attention?" yes, $20/month actually.
Tweet media one
0
0
15
@kushal1t
kushal thaman
5 months
what question would you like Sam Altman to answer?
10
0
15
@kushal1t
kushal thaman
4 months
now that 4o is free, many people will want to cancel the $20/month subscription... unless OpenAI releases a new, frontier model that you can only access with Plus?
3
0
15
@kushal1t
kushal thaman
8 months
we’re so back :)
Tweet media one
1
0
15
@kushal1t
kushal thaman
8 months
sutton and barto all the way
Tweet media one
5
1
14
@kushal1t
kushal thaman
10 months
incredible talk by @ashVaswani at Stanford on the Transformer, how it’s evolved over the years, and exciting future research directions!
Tweet media one
0
0
13
@kushal1t
kushal thaman
7 months
the latest draft of our paper is now out, with many new results!
@kushal1t
kushal thaman
9 months
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" ! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Tweet media one
9
7
35
2
3
12
@kushal1t
kushal thaman
9 months
@fionaleng_ contributing to sample size, a majority (and plausibly >80%) of my friends from olympiad camps a few years ago that stated they wanted to be physicists are now interning/ft-ing at js/hrt/2sig etc.
0
0
11
@kushal1t
kushal thaman
4 months
@ilyasut maybe the real AGIs were the friends we made along the way 🥹
0
4
11
@kushal1t
kushal thaman
1 month
apparently this has become a hot take to say out loud, but the last 45 minutes of "Oppenheimer" (2023) is peak cinema and is absolutely critical to the movie and the life of J. Robert Oppenheimer.
1
0
11
@kushal1t
kushal thaman
1 month
tbc i do believe combinatorics problems (p3/p6) are typically the hardest (followed by NT), and IMO 2024 was unusual in that the hardest problem was an algebra one. simply pointing out that this type of goalpost shifting is the perfect example of nebulous thinking in ai progress.
0
0
10
@kushal1t
kushal thaman
5 months
also, if you aren't around, i'll be asking sam questions, so DM me if you have questions you'd like sam to answer!
2
0
10
@kushal1t
kushal thaman
4 months
@prafdhar has been an inspiration since high school for me; he won golds at IAO, IPhO and IMO for India back in the day, and went to work on really cool projects on VAEs, GANs, DDPMs, consistency models and much more after undergrad for OpenAI.
@prafdhar
Prafulla Dhariwal
4 months
GPT-4o (o for “omni”) is the first model to come out of the omni team, OpenAI’s first natively fully multimodal model. This launch was a huge org-wide effort, but I’d like to give a shout out to a few of my awesome team members who made this magical model even possible!
137
344
4K
0
0
10
@kushal1t
kushal thaman
6 months
did anyone here actually get reasonable ICML reviews?
2
0
10
@kushal1t
kushal thaman
7 months
@debarghya_das @paulfchristiano @paraga yeah he was India’s third ever gold medalist iirc
1
0
10
@kushal1t
kushal thaman
3 months
@RylanSchaeffer courtesy of @NeelNanda5 and other TransformerLens contributors ❤️
1
0
10
@kushal1t
kushal thaman
1 month
@wtgowers
Timothy Gowers @wtgowers
1 month
It's not clear what the implications of this are for mathematical research. Since the method used was very general, there would seem to be no obvious obstacle to adapting it to other mathematical domains, apart perhaps from insufficient data.
3
8
166
1
0
9
@kushal1t
kushal thaman
1 year
correct me if i'm wrong, but it doesn't seem like OpenAI has been doing (publishing) work in deep RL for years now. the last work they published in the subfield dates back to late 2019 (safe exploration strategies in deep RL). why did they abandon it?
2
1
9
@kushal1t
kushal thaman
1 year
@michael_nielsen + Purcell on electricity and magnetism + Griffiths on quantum mechanics + Schroeder on thermal physics + Blundell on thermal physics + Campbell's biology + Chandrasekhar on Newton's Principia for the common reader + Barto and Sutton on RL
0
0
7
@kushal1t
kushal thaman
4 months
Tweet media one
1
0
8
@kushal1t
kushal thaman
7 months
the best ui is no ui
3
0
9
@kushal1t
kushal thaman
2 months
@deedydas @AravSrinivas “just” is a bit of a stretch; that’s 4 more points (out of the total possible 7) for every single indian contestant on the hardest problem of IMO day 1…
1
0
8
@kushal1t
kushal thaman
1 year
tfw you're sitting in a bay area cafe and can overhear two people intensely debating the use of tanh vs. leaky ReLU as activation functions...
0
0
6
@kushal1t
kushal thaman
9 months
@_akhaliq china’s going to solve alignment and continue to cook while america’s on winter vacation
0
0
6
@kushal1t
kushal thaman
8 months
what could the 'zero to one industry-defining' product be? agents that actually work seems like a plausible story.
@newhouseb
Ben Newhouse
8 months
I'm hiring at OpenAI. We're building what (I think) could be an industry-defining zero to one product that leverages the latest and greatest from our upcoming models. If you like product, deep technical challenges, and writing the future: my DMs are open!
59
132
2K
1
0
7
@kushal1t
kushal thaman
10 months
What just happened?????
@OpenAI
OpenAI
10 months
OpenAI announces leadership transition
4K
4K
14K
2
0
8
@kushal1t
kushal thaman
11 months
@ronawang just flew into boston today :) the fall here is magical
Tweet media one
1
0
7
@kushal1t
kushal thaman
4 months
@__Charlie_G if you can i’d be interested in seeing how loss curve fares against a gpt-2 that uses SwiGLUs
0
0
6
@kushal1t
kushal thaman
1 year
having a lot of fun attending the Stanford Center for AI Safety Annual Meeting!
Tweet media one
0
0
7
@kushal1t
kushal thaman
10 months
another november, another sam scandal 😞
1
1
7
@kushal1t
kushal thaman
7 months
@BlancheMinerva @PradyuPrasad Yep. So just to clarify, I read it on my own mostly out of interest, and I meant that it's a popular choice amongst quite a few olympiad people I know. I agree with Pradyu's main claim, I just think it's at least not universally true for physics.
0
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo We then proceed to provide a theoretical model of how this happens, analyzing the learning dynamics by studying the interaction between the forces of sparsity (namely feature benefit, interference and regularization), and run various experiments to confirm our results! (4/n)
Tweet media one
Tweet media two
Tweet media three
1
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo We present a non-mutually exclusive origin story of polysemanticity, showing that it can arise incidentally, even when there are ample neurons to represent all features — due to a single neuron correlating to unrelated features at the start (e.g. by random initialization)! (3/n)
1
0
7
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo Polysemanticity is a property in DNNs that arises when individual neurons represent a mixture of unrelated features, making them hard to interpret. The classic story is that NNs learn more features than there are neurons, causing superposition. (2/n)
1
0
7
@kushal1t
kushal thaman
4 months
it’s a usual saturday morning in Wien, but there’s a sense of premonition in the air at Messe Wien, the @iclr_conf venue. maybe people are starting to feel the AGI…
Tweet media one
0
0
7
@kushal1t
kushal thaman
10 months
@visakanv also sama:
0
0
6
@kushal1t
kushal thaman
8 months
10/10 recommend planning a bird's eye view of 2024 (monthly/quarterly/semesterly). it's way more useful than i realized. i actually envision myself accomplishing the things i want, make useful estimates of their timelines & get a better sense of the year overall.
1
0
6
@kushal1t
kushal thaman
7 months
if americans found out about need aware admissions for international students and then did it for everyone there would be riots in the streets
Tweet media one
1
0
7
@kushal1t
kushal thaman
5 months
this was dwarkesh’s best episode yet!
@dwarkesh_sp
Dwarkesh Patel
5 months
Had so much fun chatting with my friends @TrentonBricken and @_sholtodouglas . No way to summarize it, except: This is the best context dump out there on how LLMs are trained, what capabilities they're likely to soon have, and what exactly is going on inside them. You would be
39
125
1K
1
0
7
@kushal1t
kushal thaman
8 months
live note-taking my history class using whisper + gpt4👍
1
0
6
@kushal1t
kushal thaman
8 months
@BogdanIonutCir2 @EvanHub @_robertkirk @gwern and others interested: what open questions around path dependence (and mode connectivity etc.) in model training are you most interested in seeing progress on next? why? (interested in running experiments & hunting useful directions)
3
0
5
@kushal1t
kushal thaman
7 months
E[number of times i hear someone say the phrase "or something" this weekend] = ?
2
0
5
@kushal1t
kushal thaman
7 months
the ai revolution of 2024 begins today 🧧
2
0
5
@kushal1t
kushal thaman
8 months
@nickcammarata have you written more about how one should *get into* meditation at greater length? wondering if there are tweet threads from you lying around, or if you’d be able to recommend a reading.
1
0
4
@kushal1t
kushal thaman
8 months
fun fact: the majority (and maybe as much as 80%!) of FLOPs are spent in the MLP layers.
@stephenroller
Stephen Roller
2 years
@srush_nlp I find people unfamiliar with scaling are shocked by this:
Tweet media one
17
22
260
0
0
6
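[A back-of-the-envelope check of the FLOPs claim above, under the usual assumptions: weight-matmul FLOPs per token are ~2x the parameter count, GPT-3-like width, the common 4x MLP expansion, and attention-score FLOPs (which grow with sequence length) ignored.]

```python
d_model = 12288  # assumed GPT-3-like width, for illustration
ffn_mult = 4     # the common 4x MLP expansion

attn_params = 4 * d_model * d_model            # Q, K, V, O projections
mlp_params = 2 * ffn_mult * d_model * d_model  # up- and down-projections

# Since matmul FLOPs scale with parameter count, the split follows
# from parameters alone under these assumptions.
mlp_fraction = mlp_params / (mlp_params + attn_params)
print(f"MLP share of weight-matmul FLOPs: {mlp_fraction:.0%}")  # ~67%
```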
@kushal1t
kushal thaman
4 months
RIP 😢
@SuryaGanguli
Surya Ganguli
4 months
So sorry to see Jim pass. He impacted so many lives so positively including my own. I remember marveling at his Chern-Simons theory when studying pure math, marveling at the workings of his hedge fund when visiting it, and benefiting from his vision in funding neuroscience!
2
4
66
1
0
6
@kushal1t
kushal thaman
9 months
til @geoffreyhinton is the great-great-grandson of George Boole
0
0
6
@kushal1t
kushal thaman
9 months
Tweet media one
0
1
6
@kushal1t
kushal thaman
6 months
mildly annoying that the github API hasn't changed 'account password:' to 'personal access token:' despite changing the functionality
0
0
6
@kushal1t
kushal thaman
10 months
ok but who's working on a truthfulQ*A* benchmark for an AGI eval (me, i just trademarked the name)
0
0
6
@kushal1t
kushal thaman
9 months
@weidai11 twitter doesn't deserve you Wei Dai 🫶
0
0
6
@kushal1t
kushal thaman
6 months
type of guy who uses claude to understand shannon coding
1
0
6
@kushal1t
kushal thaman
7 months
@thecaptain_nemo it's the 2024 version of "human hands have 5 digits, no?"
0
0
6
@kushal1t
kushal thaman
1 year
this is exactly the sort of acknowledgement i like to see on ML papers
Tweet media one
1
0
6
@kushal1t
kushal thaman
1 month
i suppose people went to see a "The Social Network" but on early-to-mid 20th century physicists, and were disappointed they didn't get that. that is a great but separate movie that needs to be made.
0
0
4
@kushal1t
kushal thaman
9 months
munnar is epic and y'all should definitely visit! here are some of my shots there from a few months ago:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@ayaneshu_
Ayaneshu
9 months
I was finally able to see these breathtaking views of Munnar, Kerala towards the end of 2023🤌🏻
Tweet media one
Tweet media two
Tweet media three
Tweet media four
41
55
1K
1
0
5
@kushal1t
kushal thaman
1 year
A really simple but nice fact in linear algebra that I'd previously paid little attention to is that the theorem "any two bases for a finite-dimensional vector space have the same cardinality" needn't hold true if your scalars come from a ring instead of a field.
0
0
4
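[A standard counterexample behind the tweet above, sketched here for context; this is the textbook "invariant basis number" failure, worth double-checking against an algebra reference.]

```latex
% Over a field, any two bases of a vector space have equal cardinality.
% Over a ring this can fail: take $R = \mathrm{End}_k(V)$ for an
% infinite-dimensional $k$-vector space $V$. Since $V \cong V \oplus V$,
% there is an isomorphism of left $R$-modules
%   $R \;\cong\; R \oplus R$,
% so the free module $R$ has bases of sizes $1$ and $2$ (and hence of
% every finite size). Rings where this cannot happen are said to have
% the invariant basis number (IBN) property; nonzero commutative rings
% do, which is why the familiar theorem holds for fields.
```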
@kushal1t
kushal thaman
3 months
@PradyuPrasad post spicy takes on the (more or less settled) results
1
0
4
@kushal1t
kushal thaman
9 months
@RylanSchaeffer @vclecomte @tmychow @sanmikoyejo yes, thanks Rylan! it was a very rewarding experience. 🙂
0
0
5
@kushal1t
kushal thaman
7 months
0
0
5
@kushal1t
kushal thaman
5 months
Tweet media one
0
0
5
@kushal1t
kushal thaman
3 months
@kolelee_ great thread! my intuition (doesn't have a great track record like yours) is that new york has priced in that AI is important, but not that it'll soon be *truly* transformative.
3
0
5
@kushal1t
kushal thaman
10 months
twitter spaces are full of ppl trying to wrap their heads around the bellman equation bc of a rumor and it's weirdly wholesome
0
0
5
@kushal1t
kushal thaman
6 months
sydney's back 👀
@venturetwins
Justine Moore
6 months
Okay yeah I think we can officially call it
Tweet media one
106
369
3K
0
1
5
@kushal1t
kushal thaman
1 year
unless you want to watch croppenheimer, make sure you get tickets in theatres with the 1570 (15-perf, 70mm) or the DLx (eg DL2) screens.
0
1
5
@kushal1t
kushal thaman
6 months
@aryaman2020 @jeffreygwang @ArthurConmy yep, by @AchyutaBot ! it was a neurips poster last year
0
0
5
@kushal1t
kushal thaman
9 months
@PradyuPrasad
Pradyumna
9 months
@goonwidow Alright you're treading on thin ice here
0
0
3
0
0
5
@kushal1t
kushal thaman
9 months
@vclecomte @tmychow @RylanSchaeffer @sanmikoyejo Lots of more interesting findings in the paper, indicating the ample scope & need for further research! If you have questions, contact me or the lead author @vclecomte ! 🙂
0
0
5
@kushal1t
kushal thaman
7 months
@character_ai had 3.8B visits in 2023
Tweet media one
3
0
5
@kushal1t
kushal thaman
2 months
@Jsevillamol i suppose it’s harder to say if this is still true. 4o mini alone serves 200B/day…
@sama
Sam Altman
2 months
GPT-4o mini launched 4 days ago. already processing more than 200B tokens per day! very happy to hear how much people are liking the new model.
570
535
8K
1
0
5
@kushal1t
kushal thaman
7 months
- non-rotationally symmetric noise breaks rotational symmetry and induces sparsity! (e.g. bipolar noise)
- sparsity increases with l1/noise up to a certain point, then goes down
- interplay between interference and the push for sparsity causes a different form of polysemanticity!
1
0
3
@kushal1t
kushal thaman
7 months
@uzpg_ its the optimal policy for my perceived reward function
1
0
5
@kushal1t
kushal thaman
9 months
rubber duck debugging has been surprisingly useful while running deep learning experiments
0
0
4
@kushal1t
kushal thaman
1 year
have been reading Cade Metz’s ‘Genius Makers’ and it mentions some really cool anecdotes from the first decade of the deep learning revolution!
0
0
5