So crazy to me that 6 months ago literally every serious AI researcher I talked to told me stuff like this wasn't feasible. Decentralized training is gonna work - we're still at the DDP stage and there's a long way to go for giant model runs, but no doubt where the vector of
What if you could use all the computing power in the world to train a shared, open source AI model?
Preliminary report:
Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet), a family of
The decentralized training race is now underway. I expect it to be the major theme of AI research in 2025, create a new field, and kick off a megacycle. What happens once you accept decentralized training is feasible? Pluralis's answer is: Protocol Models. 1/9
I'm completely convinced decentralized training is about to define a mega cycle. There's gonna be massive, ubiquitous real world utility via a convergence of two of the most individually deep technical fields ever... so much of the legit crypto work of the last decade is
"The safest number of ASIs is 0. The least safe number is 1. Our odds get better the more there are."
The typical response here is that we are witnessing commoditization at the foundation model layer and so everything will be fine but think for a second what that actually
Feels like the wind changed direction a few weeks ago. Gensyn don't get enough credit for being so early on this stuff imo. When it's common knowledge and everyone's saying oh yeah we always knew this stuff would work, should remember how contrarian it was at the time.
🚨 NEW REPORT
GPT@home: Why the Future of Training is Decentralized
Can we train a large AI model over the world’s edge devices?
Increasingly the answer appears to be yes.
New report on decentralized training - why it matters, recent breakthroughs, and the challenges ahead.
Significant details and background on all these points here.
Pluralis (@pluralisai) is a research lab working solely on this - we'll have more to say soon.
But the main point is that decentralized training can assemble significantly larger computational power than centralized actors, and hence, if scaling continues, it will also produce the best models. It doesn't matter if training is more expensive. 2/2
Summary: It's important to first understand that open source AI today is completely dependent on megacorps releasing strong base models. Collaborative training has never gotten close to the required scale for foundation model training.
And it likely won’t, because the compute required
Myth 3: It will always cost more, and that makes it pointless. Reality: Not clear. You can aggregate low-cost, low-capacity power sources, there's no need for cooling, you don't require high utilization, etc. 1/2
Myth 2: Small-capacity devices cannot produce foundation-scale models; you need H100s, B100s, etc. Reality: just flat-out not true. Also, the efficiency of consumer devices is very competitive.
Myth 1: Low-bandwidth interconnects make training too slow. Reality: If you apply FSDP or ZeRO-3 or similar unaltered to a swarm setup, of course it's slow. The question is whether you can adapt these methods to work well with low node-node bandwidth. You can.
Myth 4: The swarm can never get big enough. Reality: The quantity of compute that can be assembled in a decentralized training run is significantly beyond what is achievable by any single actor. Happily, there is lots of intermediate value creation along this path.
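As one illustration of the kind of low-bandwidth adaptation the myths above refer to (a generic local-SGD sketch in the DiLoCo style, not the specific method from these posts): each node takes many local steps on its own, and the swarm communicates only a parameter delta once per round, cutting communication by the local-step factor. The toy quadratic loss, node count, and all constants are assumptions for illustration.

```python
import numpy as np

# Communication-efficient training sketch (local SGD with infrequent sync):
# NODES workers each take H local steps, then exchange ONE parameter delta
# per round instead of a gradient per step.
rng = np.random.default_rng(0)
NODES, H, ROUNDS, LR = 4, 16, 10, 0.1

target = rng.normal(size=8)        # optimum of the toy quadratic loss
global_params = np.zeros(8)

for _ in range(ROUNDS):
    deltas = []
    for _ in range(NODES):
        p = global_params.copy()
        for _ in range(H):          # H local steps, zero communication
            grad = p - target + rng.normal(scale=0.01, size=8)
            p -= LR * grad
        deltas.append(p - global_params)
    # the only communication per round: average the deltas ("outer" step)
    global_params += np.mean(deltas, axis=0)

residual = float(np.linalg.norm(global_params - target))
print(residual)  # small residual despite ~H-fold less communication
```

The communication volume drops from one gradient per step to one delta per H steps; making that trade work at foundation-model scale is exactly the open engineering question the posts are pointing at.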
Great conversation. My view on why this is so important is very simple: if you cannot create the models within protocols, you cannot enforce true ownership or control.
🔥 Code meets governance! On @theindexshow, @afkehaya & @AlexanderJLong, Founder @PluralisAI, break down how decentralized training is transforming AI ownership and governance. Dive into how decentralized models are reshaping #AI from the ground up! 👉🎧
Has it occurred to anyone that if inference compute requirements blow out due to search (i.e. strawberry or models like strawberry work) and you're waiting 20s for a response, latency of communication between nodes completely stops mattering?
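The back-of-envelope behind that latency argument can be made concrete (all numbers here are assumptions for illustration, not measurements):

```python
# If search-style inference takes ~20 s per response, even WAN-grade
# node-to-node latency becomes a rounding error in the total.
response_s = 20.0     # assumed inference time with search/test-time compute
hops = 10             # assumed cross-node hops per response
latency_s = 0.100     # assumed per-hop WAN latency (100 ms)

overhead = hops * latency_s / response_s
print(f"{overhead:.0%}")  # 5%
```

Under these assumptions, a full second of accumulated network latency adds only 5% to a 20-second response - which is the sense in which inter-node latency "stops mattering" once inference-time compute blows out.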
One of the best articles of the year. "Given current projections, a (centrally controlled) distributed training network could accommodate a demand of 2 to 45 GW". PoW mining already hit ~20GW.
As soon as you replace the word 'safety' with 'control' in your head the whole discussion suddenly makes a lot more sense. Couldn't be more aligned with Alex on this. We must have a way to create the base models that's not constrained to the oligopoly.
Big AI has an incentive to spin a sci-fi narrative to push regulations to secure their advantage + raise more money.
Let's focus on the real, tangible risks in front of us, mostly those downstream of AI oligopoly.
Very p l e a s a n t when you're acting on a fringe, contrarian belief, very early. Means everyone whose brains work kinda similar and came to the same conclusions all end up finding each other very easily.
Open source AI makes no sense in its current form. You have a critical dependency on at least one actor freely releasing the result of a training run that costs millions of dollars. People slapped the name 'open source' on this process and started acting like it was sustainable
Eric Schmidt says @MistralAI (where he's an investor) will need to release their third model as closed source because they simply cannot afford to keep open sourcing their models!! IMHO, fiat markets are the wrong representation of value for AI... Bittensor is the optimal
I think you can quite accurately proxy the effectiveness of a country's government via the retail cost of energy. For the last 20 years almost the entire West completely lost sight of this. Probably one of the things that makes me so optimistic about the future is
Current default path is this and the output will be culturally influenced. Imagine every book, article, and video you encounter growing up reflecting the same world view. Seems extremely underrated to me as a major risk.
People who don’t have kids or older relatives (the Siri demo) have no idea how powerful voice mode is.
I think it's going to be huge - in ChatGPT and other products - very shortly.
@0xredJ
The only other time I've ever felt this way was in 2015 when I was studying EE, learned about deep reinforcement learning and decided to completely alter my trajectory and do a PhD in ML.
@markowifk
No one has got model parallel or the rematerialization training approaches working. Both need to be done. And no one is even vaguely thinking about incentivization correctly.
@alz_zyd_
Could double your static breath-hold in that time ez. Can now go spearfishing or surf in big waves, and all it took was autistically holding your breath at your desk for like 15 mins a day.
@Plinz
blows my mind anyone listens to him. Took a strong public stance on a default case which was 99.9% likely and the 0.1% outcome happened. Hard to be more wrong.
@fenbielding
Like I said first time we met I have no idea how you saw this so early. Still blows my mind. And then you went out and actually did it when there was literally no one else 🫡
@fchollet
@RyanPGreenblatt
@dwarkesh_sp
François, you're much smarter than me, but isn't this the point? If we have systems that can do arbitrary symbolic reasoning, and we have another system that can learn to use these systems... what are we arguing about? Why does the symbolic reasoning have to be inside the model?
One of my labmates did 3 years, got nowhere, and restarted at another uni. That guy won the uni medal in undergrad, so it's not a question of ability. Another did >6 and never graduated. I think the reason it's so distressing is people are typically on very strong trajectories going
Strong evidence showing that getting a PhD is extremely bad for your mental health.
A new paper uses Swedish medical records and matches them to the full population of PhD students for which the authors could get gender and birth year data from 2006 to 2017. After some exclusion
We’re getting multiple new models by Q1 2025 that will wow people like GPT-4 did.
Includes a major agent product that’ll be marketed as a personal assistant.
Ongoing talks about how much will be available via APIs.
People don’t realize what the AI labs have cooking right now.
@bidhanxyz
Always seemed to me like Adderall is the equiv of jumping on tren the first day at the gym. Also, almost all the really, really smart people I know minimize stimulant use, like they don't even drink coffee.
@JasonYanowitz
@urbit
@jbrukh
If scaling laws continue and models get very good, massive multi-data-center training is happening, and the compute is cross-border, there's a vague scenario where private companies can genuinely rival states. Wallacecorp, basically. What does he think of this? Also has his opinion of
@563defi
@ekklesiarch_
🔔ding ding ding🔔 Everything in that post above is a volunteer setting where you assume good actors. The post makes the argument that training is feasible in that setting. We haven't said anything else yet.
@563defi
@Pluralis__
The main difference is that in Protocol Learning the model is sharded so that no one actor ever receives the full model weights and can stand up lower-cost inference outside the protocol, which would remove the incentive to contribute to training.
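A minimal sketch of that sharding idea (all names, sizes, and the two-actor split are hypothetical, for illustration only): each participant holds one layer's weights and exchanges only activations, so neither can run the full model alone.

```python
import numpy as np

# Toy model-sharding sketch: a tiny 2-layer net split across two actors.
# Actor A owns layer 1, actor B owns layer 2; producing an output requires
# both, so neither can stand up cheap inference on their own.
rng = np.random.default_rng(1)

class Shard:
    """One actor's private slice of the model."""
    def __init__(self, w):
        self.w = w                         # weights never leave this actor

    def forward(self, x):
        return np.maximum(x @ self.w, 0)   # ReLU layer

actor_a = Shard(rng.normal(size=(4, 8)))   # holds layer 1 only
actor_b = Shard(rng.normal(size=(8, 2)))   # holds layer 2 only

x = rng.normal(size=(1, 4))
hidden = actor_a.forward(x)   # A sends only activations, never weights
out = actor_b.forward(hidden) # B completes the pass
print(out.shape)  # (1, 2)
```

The point of the design choice: since the full weight set never exists in one place, a single participant cannot exfiltrate the model and serve it outside the protocol.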