Alexander Long

@_AlexanderLong

Followers
839
Following
395
Media
5
Statuses
113

Protocol Learning @pluralisai PhD in ML and prev. Applied Scientist at Amazon

Joined July 2023
Pinned Tweet
@_AlexanderLong
Alexander Long
3 months
Decentralized training is much closer to reality than broadly thought. The popular narrative here is wrong. Full detailed article below
14
31
161
@_AlexanderLong
Alexander Long
2 months
So crazy to me that 6 months ago literally every serious AI researcher I talked to told me stuff like this wasn't feasible. Decentralized Training is gonna work - we're still at DDP stage and there's a long way to go for giant model runs, but no doubt where the vector of
@NousResearch
Nous Research
2 months
What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of
235
594
3K
10
25
283
@_AlexanderLong
Alexander Long
6 days
The decentralized training race is now underway which I expect to be the major theme of AI research in 2025, create a new field and kick off a megacycle. What happens once you accept decentralized training is feasible? Pluralis’s answer is: Protocol Models. 1/9
@Pluralis__
Pluralis Research
6 days
Article 2: Protocol Learning, Protocol Models and the Great Convergence
1
5
39
6
19
80
@_AlexanderLong
Alexander Long
26 days
I'm completely convinced decentralized training is about to define a mega cycle. There's gonna be massive, ubiquitous real world utility via a convergence of two of the most individually deep technical fields ever... so much of the legit crypto work of the last decade is
9
8
62
@_AlexanderLong
Alexander Long
1 month
"The safest number of ASIs is 0. The least safe number is 1. Our odds get better the more there are." The typical response here is that we are witnessing commoditization at the foundation model layer and so everything will be fine but think for a second what that actually
6
2
30
@_AlexanderLong
Alexander Long
1 month
Feels like the wind changed direction a few weeks ago. Gensyn don't get enough credit for being so early on this stuff imo. When it's common knowledge and everyone's saying oh yeah we always knew this stuff would work, we should remember how contrarian it was at the time.
@_jamico
Jeff Amico
1 month
🚨 NEW REPORT GPT @home : Why the Future of Training is Decentralized Can we train a large AI model over the world’s edge devices? Increasingly the answer appears to be yes. New report on decentralized training - why it matters, recent breakthroughs, and the challenges ahead.
13
37
163
1
5
27
@_AlexanderLong
Alexander Long
15 days
@DimitrisPapail they should add this to the list of reasons to desk reject
0
1
21
@_AlexanderLong
Alexander Long
3 months
But the main point: decentralized training can assemble significantly larger computational power than centralized actors, and hence, if scaling continues, it will also produce the best models. It doesn’t matter if training is more expensive. 2/2
2
0
19
@_AlexanderLong
Alexander Long
6 days
Summary: It's important to first understand open source AI today is completely dependent on megacorps releasing strong base models. Collaborative training has never got close to the required scale for foundation model training. And it likely won’t, because the compute required
2
1
20
@_AlexanderLong
Alexander Long
3 months
Myth 3: It will always cost more, and that makes it pointless. Reality: Not clear. You can aggregate low-cost, low-capacity power sources; there’s no need for cooling; you don’t require high utilization, etc. 1/2
1
0
16
@_AlexanderLong
Alexander Long
3 months
Myth 2: Small-capacity devices cannot produce foundation-scale models; you need H100s, B100s, etc. Reality: just flat-out not true. Also, the efficiency of consumer devices is very competitive.
1
0
15
@_AlexanderLong
Alexander Long
3 months
Myth 1: Low-bandwidth interconnects make training too slow. Reality: If you apply FSDP or ZeRO-3 or similar, unaltered, to a swarm setup, of course it’s slow. The question is whether you can adapt these methods to work well with low node-to-node bandwidth. You can.
3
0
15
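The Myth 1 thread above argues that data-parallel methods can be adapted to low node-to-node bandwidth. One well-known family of such adaptations (DiLoCo-style local SGD) syncs only once every many local steps. The sketch below is illustrative only, with a toy quadratic loss and made-up names, not Pluralis code:

```python
import numpy as np

def grad(w, data):
    # gradient of the toy loss 0.5 * ||w - mean(data)||^2
    return w - data.mean(axis=0)

def local_sgd(num_nodes=4, local_steps=32, rounds=10, lr=0.1, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    # each node holds a private data shard (assumed IID here)
    shards = [rng.normal(loc=1.0, size=(64, dim)) for _ in range(num_nodes)]
    w = np.zeros(dim)  # shared model, synchronized only once per round
    for _ in range(rounds):
        replicas = []
        for shard in shards:
            w_i = w.copy()
            for _ in range(local_steps):     # many cheap local updates...
                w_i -= lr * grad(w_i, shard)
            replicas.append(w_i)
        w = np.mean(replicas, axis=0)        # ...one expensive all-reduce
    return w

w = local_sgd()
```

Communication happens once per `local_steps` optimizer steps instead of every step, cutting synchronization traffic by roughly that factor; this is the kind of adaptation the tweet alludes to, not a claim about any specific system.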
@_AlexanderLong
Alexander Long
3 months
Myth 4: The swarm can never get big enough. Reality: The quantity of compute that can be assembled in a decentralized training run is significantly beyond what is achievable by any single actor. Happily, there is lots of intermediate value creation along this path.
1
0
14
@_AlexanderLong
Alexander Long
1 month
Great conversation. My view on why this is so important is very simple: if you cannot create the models within protocols, you cannot enforce true ownership or control.
@theindexshow
The Index Podcast
1 month
🔥 Code meets governance! On @theindexshow , @afkehaya & @AlexanderJLong , Founder @PluralisAI , break down how decentralized training is transforming AI ownership and governance. Dive into how decentralized models are reshaping #AI from the ground up! 👉🎧
1
5
93
0
1
12
@_AlexanderLong
Alexander Long
2 months
@toptickcrypto Feels like the end game to me. Might be biggest convergence of two previously unrelated, individually super deep fields ever.
1
0
12
@_AlexanderLong
Alexander Long
3 months
@PluralisAI Thanks to @jbrukh @rishisthinking @ai @AntonvdH @sgould_au for helping make the article significantly better
2
0
12
@_AlexanderLong
Alexander Long
1 month
Has it occurred to anyone that if inference compute requirements blow out due to search (i.e. strawberry or models like strawberry work) and you're waiting 20s for a response, latency of communication between nodes completely stops mattering?
2
0
11
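The latency argument in the tweet above is easy to put numbers on: if search-style inference already takes tens of seconds per response, wide-area link latency between nodes becomes a rounding error. The figures below are illustrative assumptions, not measurements:

```python
# Back-of-envelope sketch of the claim that long inference times
# amortize away inter-node communication latency. All numbers assumed.

response_time_s = 20.0   # assumed search/"strawberry"-style response time
hops = 10                # assumed number of node-to-node hops per response
wan_latency_s = 0.08     # assumed ~80 ms internet round trip per hop

comm_overhead_s = hops * wan_latency_s
overhead_fraction = comm_overhead_s / response_time_s
print(f"comm adds {comm_overhead_s:.1f}s, {overhead_fraction:.0%} of response time")
```

Under these assumptions the network adds under a second to a 20-second response, i.e. a few percent, which is the intuition behind the tweet.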
@_AlexanderLong
Alexander Long
6 days
1
1
12
@_AlexanderLong
Alexander Long
2 months
One of the best articles of the year. "Given current projections, a (centrally controlled) distributed training network could accommodate a demand of 2 to 45 GW". PoW mining already hit ~20GW.
0
1
10
@_AlexanderLong
Alexander Long
9 days
As soon as you replace the word 'safety' with 'control' in your head the whole discussion suddenly makes a lot more sense. Couldn't be more aligned with Alex on this. We must have a way to create the base models that's not constrained to the oligopoly.
@ac_crypto
Alex Cheema - e/acc
9 days
Big AI has an incentive to spin a sci-fi narrative to push regulations to secure their advantage + raise more money. Let's focus on the real, tangible risks in front of us, mostly those downstream of AI oligopoly.
2
1
16
0
0
9
@_AlexanderLong
Alexander Long
1 month
Very p l e a s a n t when you're acting on a fringe, contrarian belief, very early. Means everyone whose brains work kinda similar and came to the same conclusions all end up finding each other very easily.
@0xsachi
Miss Polygon 😈 🦇🔊
1 month
Reasons to be bullish decentralized training: @PrimeIntellect @NousResearch @PluralisAI Are all working on pushing the boundaries of decentralized training
4
0
26
0
0
9
@_AlexanderLong
Alexander Long
2 months
I refuse to believe whatever is in Yerba is only caffeine. I take a sip of that stuff and it's full god is in heaven and we are his children vibes.
@yacineMTB
kache
2 months
argentinian software developers are so cracked can't believe they invented yerba mate just to code better
222
2K
19K
0
0
8
@_AlexanderLong
Alexander Long
2 months
Open-source AI makes no sense in its current form. You have a critical dependency on at least one actor freely releasing the result of a training run that costs millions of dollars. People slapped the name 'open source' on this process and started acting like it was sustainable
@JosephJacks_
JJ
2 months
Eric Schmidt says @MistralAI (where he’s an investor) will need to release their third model as closed source because they simply cannot afford to keep open sourcing their models!! IMHO, fiat markets are the wrong representation of value for AI.. Bittensor is the optimal
11
27
189
0
0
8
@_AlexanderLong
Alexander Long
1 month
@delphi_labs The great convergence
0
0
7
@_AlexanderLong
Alexander Long
1 month
Training == Pluralis. Count on me to retweet this almost constantly for the next few years. @caseykcaruso
0
0
7
@_AlexanderLong
Alexander Long
2 months
@Richarddd102 The ideas behind truebit going to be very relevant.
1
2
6
@_AlexanderLong
Alexander Long
2 months
@BasedBeffJezos Data parallel, same as diloco. We might have something to say about model parallel soon though
1
0
6
@_AlexanderLong
Alexander Long
1 month
@mo_baioumy @chainyoda @ilblackdragon @EMostaque @ac_crypto @mraltantutar @fenbielding @jasonjzhao @realDanielShorr still funny to me the group is small enough basically everyone can get tagged in a single tweet
1
0
6
@_AlexanderLong
Alexander Long
23 days
@ac_crypto "hacker community" lol
1
0
5
@_AlexanderLong
Alexander Long
26 days
@Ronangmi Crypto about to find its purpose is exactly how I feel
1
0
5
@_AlexanderLong
Alexander Long
3 months
@PluralisAI @jbrukh @rishisthinking @ai @AntonvdH @sgould_au and massive thanks to @jeremyphoward for some great early discussions around this too
0
0
5
@_AlexanderLong
Alexander Long
1 month
@0xsachi @PrimeIntellect @NousResearch @PluralisAI suits me perfectly fine for everyone to continue being bearish next year or so
0
0
5
@_AlexanderLong
Alexander Long
1 month
great article by @albertwenger
0
0
4
@_AlexanderLong
Alexander Long
1 month
@toptickcrypto Great summary
1
0
3
@_AlexanderLong
Alexander Long
29 days
I think you can quite accurately proxy the effectiveness of a country's government via the retail cost of energy. For the last 20 years almost the entire West completely lost sight of this. Probably one of the things that makes me so optimistic about the future is
2
0
3
@_AlexanderLong
Alexander Long
1 month
@tszzl That language evolved to be a very good compressed representation of the things that are important to us.
0
0
3
@_AlexanderLong
Alexander Long
4 months
Current default path is this and the output will be culturally influenced. Imagine every book, article, and video you encounter growing up reflecting the same world view. Seems extremely underrated to me as a major risk.
@venturetwins
Justine Moore
4 months
People who don’t have kids or older relatives (the Siri demo) have no idea how powerful voice mode is. I think it’s going to be huge - in ChatGPT and other products - very shortly.
62
66
703
0
1
3
@_AlexanderLong
Alexander Long
26 days
@0xredJ The only other time I've ever felt this way was in 2015 when I was studying EE, learned about deep reinforcement learning and decided to completely alter my trajectory and do a PhD in ML.
0
0
2
@_AlexanderLong
Alexander Long
26 days
@markowifk No one has got model parallel or the rematerialization training approaches working. Both need to be done. And no one is even vaguely thinking about incentivization correctly.
0
0
2
@_AlexanderLong
Alexander Long
2 months
@alz_zyd_ Could double your static breath-hold in that time ez. Can now go spearfish or surf in big waves and all it took was autistically holding your breath at your desk for like 15 mins a day.
0
0
2
@_AlexanderLong
Alexander Long
4 months
@Plinz blows my mind anyone listens to him. Took a strong public stance on a default case which was 99.9% likely and the 0.1% outcome happened. Hard to be more wrong.
2
0
2
@_AlexanderLong
Alexander Long
2 months
@fenbielding Like I said first time we met I have no idea how you saw this so early. Still blows my mind. And then you went out and actually did it when there was literally no one else 🫡
0
0
2
@_AlexanderLong
Alexander Long
1 month
@0xPrismatic Thanks for adding the training writeup there! Still feel like so unknown
1
0
2
@_AlexanderLong
Alexander Long
4 months
@fchollet @RyanPGreenblatt @dwarkesh_sp François you're much smarter than me but isn't this the point? If we have systems that can do arbitrary symbolic reasoning, and we have another system that can learn to use these systems... what are we arguing about? Why does the symbolic reasoning have to be inside the model?
0
0
2
@_AlexanderLong
Alexander Long
28 days
One of my Labmates did 3 years, got nowhere and restarted at another uni. That guy won the uni medal in undergrad so not a question of ability. Another did >6 and never graduated. I think the reason it's so distressing is people are typically on very strong trajectories going
@RichardHanania
Richard Hanania
30 days
Strong evidence showing that getting a PhD is extremely bad for your mental health. A new paper uses Swedish medical records and matches them to the full population of PhD students for which the authors could get gender and birth year data from 2006 to 2017. After some exclusion
201
1K
8K
0
0
2
@_AlexanderLong
Alexander Long
2 months
“I want to stand as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center.”
0
0
2
@_AlexanderLong
Alexander Long
26 days
@_AlexanderLong
Alexander Long
3 months
Decentralized training is much closer to reality than broadly thought. The popular narrative here is wrong. Full detailed article below
14
31
161
1
0
2
@_AlexanderLong
Alexander Long
23 days
@goodalexander @IridiumEagle meet you at the Ecuador Jungle house?
1
0
2
@_AlexanderLong
Alexander Long
1 month
Vibe towards AI about to be fear, anger and resentment pretty soon imo. Doesn't seem like that's how most people are modelling it out.
@mckaywrigley
Mckay Wrigley
1 month
We’re getting multiple new models by Q1 2025 that will wow people like GPT-4 did. Includes a major agent product that’ll be marketed as a personal assistant. Ongoing talks about how much will be available via APIs. People don’t realize what the AI labs have cooking right now.
74
99
2K
0
0
2
@_AlexanderLong
Alexander Long
3 months
@Eito_Miyamura @Google @SemiAnalysis_ @demishassabis Completely agree. All you have to assume is that at some point SSL on video works and it becomes theirs to lose.
0
0
1
@_AlexanderLong
Alexander Long
1 month
@Altimor Slippery slope isn't a fallacy, it's how things actually go mate
1
0
1
@_AlexanderLong
Alexander Long
1 month
@bidhanxyz Always seemed to me like adderall is equiv of jumping on tren the first day at the gym. Also almost all the really really smart people I know minimize stimulant use, like they don't even drink coffee.
1
0
1
@_AlexanderLong
Alexander Long
4 months
@bidhanxyz the bootloader
0
0
1
@_AlexanderLong
Alexander Long
3 months
@Ar_Douillard Probably that decentralized training can work
0
0
1
@_AlexanderLong
Alexander Long
2 months
@ac_crypto +1 on the yerba
0
0
1
@_AlexanderLong
Alexander Long
2 months
@samuel_spitz it's a loop not a line
0
0
1
@_AlexanderLong
Alexander Long
2 months
@fenbielding At the start of the year everything was "you mean federated learning?"... Yeah no mate
0
0
1
@_AlexanderLong
Alexander Long
1 month
@JasonYanowitz @urbit @jbrukh If scaling laws continue and models get very good, massive multi-data center training is happening and the compute is cross-border, there's a vague scenario where private companies can genuinely rival states. Wallacecorp basically. What's he think of this. Also has his opinion of
1
0
1
@_AlexanderLong
Alexander Long
1 month
@Ar_Douillard @MatPagliardini @PierreAblin @GrangierDavid Imagine if decentralized/mostly local optimizers actually ended up being better
0
0
1
@_AlexanderLong
Alexander Long
3 months
@563defi @ekklesiarch_ 🔔ding ding ding🔔 Everything in that post above is the volunteer setting where you assume good actors. The post makes the argument training is feasible in that setting. We haven't said anything else yet.
1
0
1
@_AlexanderLong
Alexander Long
5 months
@ImotVoksim @y0b1byte Thing that's always confused me about this explanation is that an n-ball 'looks' very spiky at high n, while an n-cube doesn't
0
0
1
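The geometry point in the tweet above can be made concrete: the unit n-ball's volume collapses relative to its bounding cube as n grows, which is why the ball "looks spiky" while the cube does not. A small illustrative check:

```python
import math

def ball_to_cube_ratio(n):
    """Volume of the unit n-ball divided by its bounding cube [-1, 1]^n."""
    ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # standard n-ball formula
    cube = 2.0 ** n
    return ball / cube

for n in (2, 10, 50):
    print(n, ball_to_cube_ratio(n))
```

At n=2 the disc fills about 79% of the square; by n=50 the ratio is vanishingly small, so almost all of the cube's volume sits in its "corners", outside the ball.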
@_AlexanderLong
Alexander Long
6 days
@Ar_Douillard citation counts about to get crazy
0
0
1
@_AlexanderLong
Alexander Long
6 days
@mo_baioumy He even said async 🤯 that's a pretty detailed technical point. How does a guy at that level have that level of depth... insane.
1
0
1
@_AlexanderLong
Alexander Long
6 days
@563defi @Pluralis__ main difference is in Protocol Learning the model is sharded to prevent any one actor ever receiving the full model weights, and hence standing up lower-cost inference outside the protocol, which would remove the incentive to contribute to training.
1
0
1
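The sharding idea in the tweet above (no single actor ever holds the full weight set, so none can run cheaper inference outside the protocol) can be sketched with a toy layer-split model. Everything here, including the `Party` class and the tiny MLP, is an illustrative assumption, not the Pluralis protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

class Party:
    """Owns one layer's weights; exchanges only activations with neighbours."""
    def __init__(self, w):
        self.w = w  # this shard never leaves the party

    def forward(self, x):
        return np.tanh(x @ self.w)

dims = [4, 8, 8, 2]  # toy 3-layer model, one layer per party
parties = [Party(rng.normal(size=(a, b))) for a, b in zip(dims, dims[1:])]

x = rng.normal(size=(1, 4))
for p in parties:        # activations flow party to party...
    x = p.forward(x)     # ...weights stay with their owners
print(x.shape)           # output emerges from the last shard only
```

Each party can run its own layer but cannot reconstruct the others, so standalone inference requires the whole pipeline, which is the incentive property the tweet describes.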