You don't need an H100 to run Llama-3-405b.
2 MacBooks and 1 Mac Studio will do the job, with
@exolabs_
to aggregate the memory/compute.
I'm ready for you, Llama-3-405b.
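For a sense of how a model that big can be spread across a few Macs, here is a minimal sketch of memory-weighted layer partitioning: each device takes a contiguous slice of the model's layers proportional to its share of the cluster's total memory. The device names and memory figures are illustrative assumptions, not exo's actual implementation.

```python
# Hypothetical sketch of memory-weighted layer partitioning, the idea behind
# aggregating devices: each device gets a contiguous slice of the model's
# layers proportional to its share of the cluster's total memory.

def partition_layers(devices: dict[str, int], num_layers: int) -> dict[str, range]:
    """Split num_layers across devices proportionally to memory (GB)."""
    total_mem = sum(devices.values())
    shards, start = {}, 0
    for i, (name, mem) in enumerate(devices.items()):
        # The last device takes the remainder so every layer is assigned.
        count = num_layers - start if i == len(devices) - 1 else round(num_layers * mem / total_mem)
        shards[name] = range(start, start + count)
        start += count
    return shards

# 2 MacBooks + 1 Mac Studio (memory figures are assumptions):
cluster = {"macbook-1": 36, "macbook-2": 36, "mac-studio": 192}
print(partition_layers(cluster, num_layers=126))  # Llama-3-405b has 126 layers
```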
@exolabs_
now tracks in real time whether you are GPU poor or GPU rich, based on all the devices connected in your AI cluster.
Here I have 2 MacBook Pros, 1 MacBook Air and 1 Mac Studio connected.
h/t
@caseykcaruso
and
@huggingface
for inspiring this.
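For intuition, here is a hedged sketch of what such a gauge could look like: sum the estimated TFLOPS of every connected device and compare against a reference GPU. The threshold and the per-device numbers below are illustrative assumptions, not exo's real heuristic.

```python
# Hedged sketch of a "GPU poor / GPU rich" gauge. The reference point and
# per-device TFLOPS figures are assumptions, not measured values.

H100_FP16_TFLOPS = 989  # a common "GPU rich" reference point

def cluster_status(device_tflops: list[float]) -> str:
    total = sum(device_tflops)
    return "GPU rich" if total >= H100_FP16_TFLOPS else "GPU poor"

# 2 MacBook Pros, 1 MacBook Air, 1 Mac Studio (assumed fp16 GPU TFLOPS):
devices = [14.2, 14.2, 5.3, 54.3]
print(cluster_status(devices), f"({sum(devices):.1f} TFLOPS total)")
```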
Mixture of Experts (MoE) models + distributed inference = a match made in heaven.
Soon, you’ll be able to run 100b+ parameter models like this on normal laptops / phones with exo.
Track the GitHub issue here:
@ac_crypto
@exolabs_
It's an MoE, so only 21B params are active per token. That actually makes it an interesting candidate for distributed inference: it's easier to make it faster by sharding across experts.
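Here is an illustrative sketch (not the exo implementation) of why expert sharding helps: per token, only the top-k experts run, so a device hosting a subset of the experts does only a fraction of the work. Shapes, names, and the expert-to-device mapping are all assumptions.

```python
# Toy MoE forward pass showing why experts shard well across devices:
# only the top-k experts fire per token, so most devices idle per token.
import numpy as np

NUM_EXPERTS, TOP_K, DIM = 8, 2, 16
rng = np.random.default_rng(0)
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
expert_to_device = {e: e % 4 for e in range(NUM_EXPERTS)}  # 4 devices, 2 experts each

def moe_forward(x: np.ndarray, router_logits: np.ndarray) -> np.ndarray:
    top = np.argsort(router_logits)[-TOP_K:]  # pick top-k experts for this token
    weights = np.exp(router_logits[top]) / np.exp(router_logits[top]).sum()
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        # In a real cluster this matmul would be an RPC to expert_to_device[e].
        out += w * (experts[e] @ x)
    return out

token = rng.standard_normal(DIM)
print(moe_forward(token, rng.standard_normal(NUM_EXPERTS)))
```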
Does anyone have 8 maxed-out Mac Studios?
@BasedBeffJezos
wants to know what we could do with them.
We can aggregate the memory/compute on exo, and they would have almost as much compute as an H100 and 20x the memory.
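A quick back-of-envelope on the memory claim (the specs are assumptions: a maxed-out M2 Ultra Mac Studio has 192 GB of unified memory, an H100 has 80 GB):

```python
# Back-of-envelope check on the 20x memory claim (specs assumed above).
MAC_STUDIO_MEM_GB, H100_MEM_GB, N = 192, 80, 8
total = N * MAC_STUDIO_MEM_GB        # 1536 GB across the cluster
print(total, total / H100_MEM_GB)    # 1536 GB, roughly 19x an H100's 80 GB
```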
How long does it take to get distributed inference running locally across 2 MacBook GPUs from a fresh install?
About 60 seconds, running
@exolabs_
Watch till the end, where I chat to the cluster using the
@__tinygrad__
ChatGPT-style web interface.
Code is open source 👇
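For reference, here is a hedged example of chatting with a local exo cluster programmatically through its ChatGPT-compatible API. The port and model name are assumptions for illustration; check the exo README for the exact values.

```python
# Hedged example: query a local exo cluster via its ChatGPT-compatible API.
# Endpoint port and model name below are assumptions, not guaranteed values.
import json, urllib.request

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps({
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "Hello from my cluster!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```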
Exo was featured in Tom’s Hardware.
“Thanks to the work of a team of developers, a new software could allow you to run your own AI cluster at home using your existing smartphones, tablets, and computers.”
Link to repo:
New software lets you run a private AI cluster at home with networked smartphones, tablets, and computers — Exo software runs Llama and other AI models