Alexander Long

@_AlexanderLong

Followers
839
Following
395
Media
5
Statuses
113

Protocol Learning @pluralisai PhD in ML and prev. Applied Scientist at Amazon

Joined July 2023
Pinned Tweet
@_AlexanderLong
Alexander Long
3 months
Decentralized training is much closer to reality than broadly thought. The popular narrative here is wrong. Full detailed article below
14
31
161
@_AlexanderLong
Alexander Long
2 months
So crazy to me that 6 months ago literally every serious AI researcher I talked to told me stuff like this wasn't feasible. Decentralized Training is gonna work - we're still at DDP stage and there's a long way to go for giant model runs, but no doubt where the vector of
@NousResearch
Nous Research
2 months
What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of
235
594
3K
10
25
283
@_AlexanderLong
Alexander Long
6 days
The decentralized training race is now underway which I expect to be the major theme of AI research in 2025, create a new field and kick off a megacycle. What happens once you accept decentralized training is feasible? Pluralis’s answer is: Protocol Models. 1/9
@Pluralis__
Pluralis Research
6 days
Article 2: Protocol Learning, Protocol Models and the Great Convergence
1
5
39
6
19
80
@_AlexanderLong
Alexander Long
26 days
I'm completely convinced decentralized training is about to define a mega cycle. There's gonna be massive, ubiquitous real world utility via a convergence of two of the most individually deep technical fields ever... so much of the legit crypto work of the last decade is
9
8
62
@_AlexanderLong
Alexander Long
1 month
"The safest number of ASIs is 0. The least safe number is 1. Our odds get better the more there are." The typical response here is that we are witnessing commoditization at the foundation model layer and so everything will be fine but think for a second what that actually
6
2
30
@_AlexanderLong
Alexander Long
1 month
Feels like the wind changed direction a few weeks ago. Gensyn don't get enough credit for being so early on this stuff imo. When it's common knowledge and everyone's saying oh yeah we always knew this stuff would work, we should remember how contrarian it was at the time.
@_jamico
Jeff Amico
1 month
🚨 NEW REPORT GPT @home : Why the Future of Training is Decentralized Can we train a large AI model over the world’s edge devices? Increasingly the answer appears to be yes. New report on decentralized training - why it matters, recent breakthroughs, and the challenges ahead.
13
37
163
1
5
27
@_AlexanderLong
Alexander Long
15 days
@DimitrisPapail they should add this to the list of reasons to desk reject
0
1
21
@_AlexanderLong
Alexander Long
3 months
But the main point: decentralized training can assemble significantly larger computational power than centralized actors, and hence, if scaling continues, it will also produce the best models. It doesn’t matter if training is more expensive. 2/2
2
0
19
@_AlexanderLong
Alexander Long
6 days
Summary: It's important to first understand open source AI today is completely dependent on megacorps releasing strong base models. Collaborative training has never got close to the required scale for foundation model training. And it likely won’t, because the compute required
2
1
20
@_AlexanderLong
Alexander Long
3 months
Myth 3: It will always cost more, and that makes it pointless. Reality: Not clear. You can aggregate low-cost, low-capacity power sources; there’s no need for cooling; you don’t require high utilization, etc. 1/2
1
0
16
@_AlexanderLong
Alexander Long
3 months
Myth 2: Small-capacity devices cannot produce foundation-scale models; you need H100s, B100s, etc. Reality: just flat-out not true. Also, the efficiency of consumer devices is very competitive.
1
0
15
@_AlexanderLong
Alexander Long
3 months
Myth 1: Low-bandwidth interconnects make training too slow. Reality: If you apply FSDP or ZeRO-3 or similar, unaltered, to a swarm setup, of course it’s slow. The question is whether you can adapt these methods to work well with low node-to-node bandwidth. You can.
3
0
15
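The Myth 1 thread above argues that data-parallel methods can be adapted to low node-to-node bandwidth. One well-known family of such adaptations (DiLoCo-style local SGD) syncs only once every many local steps. The sketch below is illustrative only, with a toy quadratic loss and made-up names, not Pluralis code:

```python
import numpy as np

def grad(w, data):
    # gradient of the toy loss 0.5 * ||w - mean(data)||^2
    return w - data.mean(axis=0)

def local_sgd(num_nodes=4, local_steps=32, rounds=10, lr=0.1, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    # each node holds a private data shard (assumed IID here)
    shards = [rng.normal(loc=1.0, size=(64, dim)) for _ in range(num_nodes)]
    w = np.zeros(dim)  # shared model, synchronized only once per round
    for _ in range(rounds):
        replicas = []
        for shard in shards:
            w_i = w.copy()
            for _ in range(local_steps):     # many cheap local updates...
                w_i -= lr * grad(w_i, shard)
            replicas.append(w_i)
        w = np.mean(replicas, axis=0)        # ...one expensive all-reduce
    return w

w = local_sgd()
```

Communication happens once per `local_steps` optimizer steps instead of every step, cutting synchronization traffic by roughly that factor; this is the kind of adaptation the tweet alludes to, not a claim about any specific system.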
@_AlexanderLong
Alexander Long
3 months
Myth 4: The swarm can never get big enough. Reality: The quantity of compute that can be assembled in a decentralized training run is significantly beyond what is achievable by any single actor. Happily, there is lots of intermediate value creation along this path.
1
0
14
@_AlexanderLong
Alexander Long
1 month
Great conversation. My view on why this is so important is very simple: if you cannot create the models within protocols, you cannot enforce true ownership or control.
@theindexshow
The Index Podcast
1 month
🔥 Code meets governance! On @theindexshow , @afkehaya & @AlexanderJLong , Founder @PluralisAI , break down how decentralized training is transforming AI ownership and governance. Dive into how decentralized models are reshaping #AI from the ground up! 👉🎧
1
5
93
0
1
12
@_AlexanderLong
Alexander Long
2 months
@toptickcrypto Feels like the end game to me. Might be biggest convergence of two previously unrelated, individually super deep fields ever.
1
0
12
@_AlexanderLong
Alexander Long
3 months
@PluralisAI Thanks to @jbrukh @rishisthinking @ai @AntonvdH @sgould_au for helping make the article significantly better
2
0
12
@_AlexanderLong
Alexander Long
1 month
Has it occurred to anyone that if inference compute requirements blow out due to search (i.e. strawberry or models like strawberry work) and you're waiting 20s for a response, latency of communication between nodes completely stops mattering?
2
0
11
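The latency argument in the tweet above is easy to put numbers on: if search-style inference already takes tens of seconds per response, wide-area link latency between nodes becomes a rounding error. The figures below are illustrative assumptions, not measurements:

```python
# Back-of-envelope sketch of the claim that long inference times
# amortize away inter-node communication latency. All numbers assumed.

response_time_s = 20.0   # assumed search/"strawberry"-style response time
hops = 10                # assumed number of node-to-node hops per response
wan_latency_s = 0.08     # assumed ~80 ms internet round trip per hop

comm_overhead_s = hops * wan_latency_s
overhead_fraction = comm_overhead_s / response_time_s
print(f"comm adds {comm_overhead_s:.1f}s, {overhead_fraction:.0%} of response time")
```

Under these assumptions the network adds under a second to a 20-second response, i.e. a few percent, which is the intuition behind the tweet.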
@_AlexanderLong
Alexander Long
6 days
1
1
12
@_AlexanderLong
Alexander Long
2 months
One of the best articles of the year. "Given current projections, a (centrally controlled) distributed training network could accommodate a demand of 2 to 45 GW". PoW mining already hit ~20GW.
0
1
10
@_AlexanderLong
Alexander Long
9 days
As soon as you replace the word 'safety' with 'control' in your head the whole discussion suddenly makes a lot more sense. Couldn't be more aligned with Alex on this. We must have a way to create the base models that's not constrained to the oligopoly.
@ac_crypto
Alex Cheema - e/acc
9 days
Big AI has an incentive to spin a sci-fi narrative to push regulations to secure their advantage + raise more money. Let's focus on the real, tangible risks in front of us, mostly those downstream of AI oligopoly.
2
1
16
0
0
9
@_AlexanderLong
Alexander Long
1 month
Very p l e a s a n t when you're acting on a fringe, contrarian belief, very early. Means everyone whose brains work kinda similar and came to the same conclusions all end up finding each other very easily.
@0xsachi
Miss Polygon 😈 🦇🔊
1 month
Reasons to be bullish decentralized training: @PrimeIntellect @NousResearch @PluralisAI Are all working on pushing the boundaries of decentralized training
4
0
26
0
0
9
@_AlexanderLong
Alexander Long
2 months
I refuse to believe whatever is in Yerba is only caffeine. I take a sip of that stuff and it's full god is in heaven and we are his children vibes.
@yacineMTB
kache
2 months
argentinian software developers are so cracked can't believe they invented yerba mate just to code better
222
2K
19K
0
0
8
@_AlexanderLong
Alexander Long
2 months
Open-source AI makes no sense in its current form. You have a critical dependency on at least one actor freely releasing the result of a training run that costs millions of dollars. People slapped the name 'open source' on this process and started acting like it was sustainable
@JosephJacks_
JJ
2 months
Eric Schmidt says @MistralAI (where he’s an investor) will need to release their third model as closed source because they simply cannot afford to keep open sourcing their models!! IMHO, fiat markets are the wrong representation of value for AI.. Bittensor is the optimal
11
27
189
0
0
8
@_AlexanderLong
Alexander Long
1 month
@delphi_labs The great convergence
0
0
7
@_AlexanderLong
Alexander Long
1 month
Training == Pluralis. Count on me to retweet this almost constantly for the next few years. @caseykcaruso
0
0
7
@_AlexanderLong
Alexander Long
2 months
@Richarddd102 The ideas behind truebit going to be very relevant.
1
2
6
@_AlexanderLong
Alexander Long
2 months
@BasedBeffJezos Data parallel, same as diloco. We might have something to say about model parallel soon though
1
0
6
@_AlexanderLong
Alexander Long
1 month
@mo_baioumy @chainyoda @ilblackdragon @EMostaque @ac_crypto @mraltantutar @fenbielding @jasonjzhao @realDanielShorr still funny to me the group is small enough basically everyone can get tagged in a single tweet
1
0
6
@_AlexanderLong
Alexander Long
23 days
@ac_crypto "hacker community" lol
1
0
5
@_AlexanderLong
Alexander Long
26 days
@Ronangmi Crypto about to find its purpose is exactly how I feel
1
0
5
@_AlexanderLong
Alexander Long
3 months
@PluralisAI @jbrukh @rishisthinking @ai @AntonvdH @sgould_au and massive thanks to @jeremyphoward for some great early discussions around this too
0
0
5
@_AlexanderLong
Alexander Long
1 month
@0xsachi @PrimeIntellect @NousResearch @PluralisAI suits me perfectly fine for everyone to continue being bearish next year or so
0
0
5
@_AlexanderLong
Alexander Long
1 month
great article by @albertwenger
0
0
4
@_AlexanderLong
Alexander Long
1 month
@toptickcrypto Great summary
1
0
3
@_AlexanderLong
Alexander Long
29 days
I think you can quite accurately proxy the effectiveness of a country's government via the retail cost of energy. For the last 20 years almost the entire West completely lost sight of this. Probably one of the things that makes me so optimistic about the future is
2
0
3
@_AlexanderLong
Alexander Long
1 month
@tszzl That language evolved to be a very good compressed representation of the things that are important to us.
0
0
3
@_AlexanderLong
Alexander Long
4 months
Current default path is this and the output will be culturally influenced. Imagine every book, article, and video you encounter growing up reflecting the same world view. Seems extremely underrated to me as a major risk.
@venturetwins
Justine Moore
4 months
People who don’t have kids or older relatives (the Siri demo) have no idea how powerful voice mode is. I think it’s going to be huge - in ChatGPT and other products - very shortly.
62
66
703
0
1
3
@_AlexanderLong
Alexander Long
26 days
@0xredJ The only other time I've ever felt this way was in 2015 when I was studying EE, learned about deep reinforcement learning and decided to completely alter my trajectory and do a PhD in ML.
0
0
2
@_AlexanderLong
Alexander Long
26 days
@markowifk No one has got model parallel or the rematerialization training approaches working. Both need to be done. And no one is even vaguely thinking about incentivization correctly.
0
0
2
@_AlexanderLong
Alexander Long
2 months
@alz_zyd_ Could double your static breath-hold in that time ez. Can now go spearfish or surf in big waves and all it took was autistically holding your breath at your desk for like 15 mins a day.
0
0
2
@_AlexanderLong
Alexander Long
4 months
@Plinz blows my mind anyone listens to him. Took a strong public stance on a default case which was 99.9% likely and the 0.1% outcome happened. Hard to be more wrong.
2
0
2
@_AlexanderLong
Alexander Long
2 months
@fenbielding Like I said first time we met I have no idea how you saw this so early. Still blows my mind. And then you went out and actually did it when there was literally no one else 🫡
0
0
2
@_AlexanderLong
Alexander Long
1 month
@0xPrismatic Thanks for adding the training writeup there! Still feel like so unknown
1
0
2
@_AlexanderLong
Alexander Long
4 months
@fchollet @RyanPGreenblatt @dwarkesh_sp François you're much smarter than me but isn't this the point? If we have systems that can do arbitrary symbolic reasoning, and we have another system that can learn to use these systems... what are we arguing about? Why does the symbolic reasoning have to be inside the model?
0
0
2
@_AlexanderLong
Alexander Long
28 days
One of my Labmates did 3 years, got nowhere and restarted at another uni. That guy won the uni medal in undergrad so not a question of ability. Another did >6 and never graduated. I think the reason it's so distressing is people are typically on very strong trajectories going
@RichardHanania
Richard Hanania
30 days
Strong evidence showing that getting a PhD is extremely bad for your mental health. A new paper uses Swedish medical records and matches them to the full population of PhD students for which the authors could get gender and birth year data from 2006 to 2017. After some exclusion
201
1K
8K
0
0
2
@_AlexanderLong
Alexander Long
2 months
“I want to stand as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center.”
0
0
2
@_AlexanderLong
Alexander Long
26 days
@_AlexanderLong
Alexander Long
3 months
Decentralized training is much closer to reality than broadly thought. The popular narrative here is wrong. Full detailed article below
14
31
161
1
0
2
@_AlexanderLong
Alexander Long
23 days
@goodalexander @IridiumEagle meet you at the Ecuador Jungle house?
1
0
2
@_AlexanderLong
Alexander Long
1 month
Vibe towards AI about to be fear, anger and resentment pretty soon imo. Doesn't seem like that's how most people are modelling it out.
@mckaywrigley
Mckay Wrigley
1 month
We’re getting multiple new models by Q1 2025 that will wow people like GPT-4 did. Includes a major agent product that’ll be marketed as a personal assistant. Ongoing talks about how much will be available via APIs. People don’t realize what the AI labs have cooking right now.
74
99
2K
0
0
2
@_AlexanderLong
Alexander Long
3 months
@Eito_Miyamura @Google @SemiAnalysis_ @demishassabis Completely agree. All you have to assume is that at some point SSL on video works and it becomes theirs to lose.
0
0
1
@_AlexanderLong
Alexander Long
1 month
@Altimor Slippery slope isn't a fallacy, it's how things actually go mate
1
0
1
@_AlexanderLong
Alexander Long
1 month
@bidhanxyz Always seemed to me like adderall is equiv of jumping on tren the first day at the gym. Also almost all the really really smart people I know minimize stimulant use, like they don't even drink coffee.
1
0
1
@_AlexanderLong
Alexander Long
4 months
@bidhanxyz the bootloader
0
0
1
@_AlexanderLong
Alexander Long
3 months
@Ar_Douillard Probably that decentralized training can work
0
0
1
@_AlexanderLong
Alexander Long
2 months
@ac_crypto +1 on the yerba
0
0
1
@_AlexanderLong
Alexander Long
2 months
@samuel_spitz it's a loop not a line
0
0
1
@_AlexanderLong
Alexander Long
2 months
@fenbielding At the start of the year everything was "you mean federated learning?"... Yeah no mate
0
0
1
@_AlexanderLong
Alexander Long
1 month
@JasonYanowitz @urbit @jbrukh If scaling laws continue and models get very good, massive multi-data center training is happening and the compute is cross-border, there's a vague scenario where private companies can genuinely rival states. Wallacecorp basically. What's he think of this. Also has his opinion of
1
0
1
@_AlexanderLong
Alexander Long
1 month
@Ar_Douillard @MatPagliardini @PierreAblin @GrangierDavid Imagine if decentralized/mostly local optimizers actually ended up being better
0
0
1
@_AlexanderLong
Alexander Long
3 months
@563defi @ekklesiarch_ 🔔ding ding ding🔔 Everything in that post above is the volunteer setting where you assume good actors. The post makes the argument training is feasible in that setting. We haven't said anything else yet.
1
0
1
@_AlexanderLong
Alexander Long
5 months
@ImotVoksim @y0b1byte Thing that's always confused me about this explanation is that an n-ball 'looks' very spiky at high n, while an n-cube doesn't
0
0
1
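The geometry point in the tweet above can be made concrete: the unit n-ball's volume collapses relative to its bounding cube as n grows, which is why the ball "looks spiky" while the cube does not. A small illustrative check:

```python
import math

def ball_to_cube_ratio(n):
    """Volume of the unit n-ball divided by its bounding cube [-1, 1]^n."""
    ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # standard n-ball formula
    cube = 2.0 ** n
    return ball / cube

for n in (2, 10, 50):
    print(n, ball_to_cube_ratio(n))
```

At n=2 the disc fills about 79% of the square; by n=50 the ratio is vanishingly small, so almost all of the cube's volume sits in its "corners", outside the ball.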
@_AlexanderLong
Alexander Long
6 days
@Ar_Douillard citation counts about to get crazy
0
0
1
@_AlexanderLong
Alexander Long
6 days
@mo_baioumy He even said async 🤯 that's a pretty detailed technical point. How does a guy at that level have that level of depth... insane.
1
0
1
@_AlexanderLong
Alexander Long
6 days
@563defi @Pluralis__ main difference is in Protocol Learning the model is sharded to prevent any one actor ever receiving the full model weights, and hence standing up lower-cost inference outside the protocol, which would remove the incentive to contribute to training.
1
0
1
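The sharding idea in the tweet above (no single actor ever holds the full weight set, so none can run cheaper inference outside the protocol) can be sketched with a toy layer-split model. Everything here, including the `Party` class and the tiny MLP, is an illustrative assumption, not the Pluralis protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

class Party:
    """Owns one layer's weights; exchanges only activations with neighbours."""
    def __init__(self, w):
        self.w = w  # this shard never leaves the party

    def forward(self, x):
        return np.tanh(x @ self.w)

dims = [4, 8, 8, 2]  # toy 3-layer model, one layer per party
parties = [Party(rng.normal(size=(a, b))) for a, b in zip(dims, dims[1:])]

x = rng.normal(size=(1, 4))
for p in parties:        # activations flow party to party...
    x = p.forward(x)     # ...weights stay with their owners
print(x.shape)           # output emerges from the last shard only
```

Each party can run its own layer but cannot reconstruct the others, so standalone inference requires the whole pipeline, which is the incentive property the tweet describes.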