Here's a closer look at Nvidia's "bring up board" for the 2.7kW GB200 that Jensen showed off on stage yesterday. Notice this one has the power phases around the Blackwell GPUs.
#GTC24
Today marks the end of a journey and a bittersweet one at that. It's my last day as an editor @sdxcentral. But rest assured, my days of covering tech are anything but over. Stay tuned for more on my plans in the coming days. Trust me, you won't want to miss it.
@GModenezi
DC bus bar. Tons of NVLink copper interconnects vaguely visible, and of course the water cooling rack manifolds going up the outside edge.
As I understand it, everything is blind mate out the back. InfiniBand networking is out the front — hyperscale style.
#GTC24
I’m thrilled to announce I’ve joined @TheRegister as the new Systems Editor for @sitpub. I’ll be covering everything from the major OEMs to data center, colo, cloud, and service provider networking and infrastructure.
#LifeUpdate
#Career
#journalism
Here’s a look inside a non-DGX GB200 system from Ingrasys @HonHai_Foxconn.
Those are blanks in the back standing in for the real chips if you’re wondering.
#GTC24
@MaxWinebach
My fiancée was, until just recently, a devout Android user. Maybe it's just Pixels, but her phone was always bugging out. Apps soft crashing in unpredictable ways; networking headaches; Bluetooth dropping at random.
iOS isn't perfect, but I rarely have issues like that.
In the 5 years since I proposed, my fiancée and I moved to a new city, started new jobs, adopted a dog, and weathered a pandemic.
As we enter 2024, I’m excited to say we’re finally getting around to planning a #wedding, tentatively for the summer of 2025.
For the love of all that is good and silicon, not everything is automatically an AI chip now. I've seen reports calling Axion an AI chip. It's NOT. It's a CPU. It does CPU things. Also TPUv5p is NOT new.
My fiancée won't let me buy a proper rack or rack-mount gear until we get a bigger place. So, this is my evening project. What does your "rack" situation look like?
#100DaysOfHomeLab
There's the possibility that Intel could end up fabbing Arm chips that beat its Xeons. I can see plenty of folks on the Intel Product side asking: Why again are we building our competitor's chips?
#Intel
#IFS
I'm gonna be honest. I had no idea that Zilog was still kicking, let alone the Z80 that powered the venerable ZX Spectrum. @ssharwood has a full write up over on @TheRegister
One rack. 120kW of compute. #Nvidia's new DGX GB200 NVL72 rack-scale system is a beast.
Check out my visual guide to the system over on @TheRegister
#GTC24
I took some time to look at Nvidia's #Blackwell efficiency gains and how they compare to #Hopper & #Ampere.
One of the key takeaways is that as these chips cross the 1kW barrier, air cooling just ain't cutting it anymore.
Full breakdown @TheRegister ⬇️
@hellonearthis
The servers behind it are from other vendors.
A single 10U HGX B200 server is rated for 14.3kW.
The DGX GB200 rack system features 18 1U servers, each with two 2,700W GB200 Superchips. The total rack power, including NVLink and InfiniBand networking, is 120kW.
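A quick back-of-the-envelope check of those rack numbers, as a minimal Python sketch. The server count, Superchip wattage, and 120kW total come from the post above; treating the remainder as "everything else" (NVLink switches, networking, fans, power conversion) is an assumption for illustration, not a vendor breakdown.

```python
# Figures quoted in the post above
SERVERS_PER_RACK = 18
SUPERCHIPS_PER_SERVER = 2
SUPERCHIP_WATTS = 2_700
RACK_TOTAL_KW = 120


def superchip_power_kw(servers: int, chips_per_server: int, watts_per_chip: int) -> float:
    """Total power drawn by the Superchips alone, in kW."""
    return servers * chips_per_server * watts_per_chip / 1000


superchip_kw = superchip_power_kw(SERVERS_PER_RACK, SUPERCHIPS_PER_SERVER, SUPERCHIP_WATTS)

# Assumption: whatever is left of the 120kW budget covers everything that is
# not a Superchip (switches, NICs, fans, conversion losses, and so on).
remainder_kw = RACK_TOTAL_KW - superchip_kw

print(f"Superchips: {superchip_kw:.1f} kW")
print(f"Everything else: {remainder_kw:.1f} kW")
```

So the 36 Superchips account for most, but not all, of the rack's power budget.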
Nvidia just unveiled a fire breathing 40PF (FP4) Grace-Blackwell Superchip that needs liquid cooling to quench its 2,700W power draw. Check out my latest for more on Nvidia’s #Blackwell chips.
#gtc24
#AI
Made some progress on my homelab monitoring dashboards this week. This time tracking power consumption. (It's only been up for a few days so the 28d avg isn't accurate)
Built using @grafana and @InfluxDB, pulling data from a NUT server.
#100DaysOfHomeLab
ChatGPT is down, but don't worry, you can spin up your own local #AI #chatbot in about 10 minutes and keep your queries private to boot. 😉
Check out my guide over on @TheRegister:
Today was spent configuring Telegraf on my VMs and containers to pull metrics into InfluxDB for a Grafana dashboard. Here's my Proxmox dash if you're looking for some inspiration.
#100DaysOfHomeLab
If you're worried about what Broadcom's VMware buy will mean for your enterprise infrastructure, it might be worth checking out these alternative virtualization stacks. Via @TheRegister
#Datacenter
Oh, and let me know if you think I missed any.
For months, I've been trying to reconcile Blackwell's FP64 regression. It's not a regression. Well, sort of.
Blackwell does 45TF FP64 Matrix, down 32% from Hopper, but also does 45TF FP64 Vector, up 32%.
The why is more complicated. My latest for @theregister
As someone who grew up in Northwest Minnesota, the #blizzard2021 in Denver is wild. I've never seen so much new snow pile up so quickly. In 36 hours we've received more than two feet of the wet heavy stuff.
I am so thankful for my team at @sdxcentral that made it possible for me to escape reality for a week and enjoy my first real vacation (socially distanced, of course) since we moved to Denver.
Oh, and in case you missed it, I dug into Cerebras' latest dinner-plate-sized AI chip. I'm not gonna lie, my brain breaks a little every time I read 44GB of SRAM 👀
Can’t help but feel we should really be talking about memory capacity and bandwidth, not TOPS, with this whole AI PC thing. Those seem to be the bottlenecks we’re looking for.
@MaxWinebach
For 4-bit quantized models you need roughly 0.5GB of fast memory for every billion parameters. 8GB on an iPhone seems reasonable for models in the 4-7 billion parameter regime, but practically you’ll want them to be smaller.
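The 0.5GB-per-billion rule of thumb falls straight out of the bit width, as this minimal Python sketch shows. The function name is hypothetical, and it counts the weights only: KV cache, activations, and whatever the OS and other apps need are assumed to come on top.

```python
def weights_gb(params_billions: float, bits_per_param: int = 4) -> float:
    """Approximate weight footprint in (decimal) GB.

    1 billion parameters at N bits each is N/8 billion bytes, i.e. N/8 GB.
    Assumption: weights only, no KV cache, activations, or runtime overhead.
    """
    return params_billions * bits_per_param / 8


for size in (1, 4, 7):
    print(f"{size}B params @ 4-bit: ~{weights_gb(size):.1f} GB")
```

A 7 billion parameter model at 4 bits lands around 3.5GB for the weights alone, which is why it fits in 8GB on paper but leaves little headroom once everything else on the phone wants memory too.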
I've had this happen with my security system and my last car. Thankfully the security system's modem was modular. I made a call and they shipped me a new one.
With my car, it was a bigger headache when Verizon EoLed its 3G network.
It’s about time that AMD offered its Ryzen chips as an enterprise SKU. I’ve got a pair of 3000 and 5000-series Ryzen parts running in ASRock Rack server boards and they’ve been perfect for my homelab needs.
If you've got an MSI MB, GPU, or PC, be careful about where you download that BIOS update. Hackers claim to have stolen the tools necessary to forge malicious firmware and are threatening to release them if MSI doesn't pay up.
By me for
@TheRegister
The progress @intel and @argonne are making with the Aurora supercomputer is impressive. The system is now the US's second publicly known exascale system and achieved that perf with 87% of its capacity.
But why not 100 percent? Issues with stability perhaps?
@anandtech
@intel
@Techmeme
For point of reference: Intel 4 is the direct rebrand of what Intel had previously called 7nm. These are all meaningless marketing terms for "denser," but calling Intel 4 "4nm" is not at all accurate.
Cloud-native architectures have changed the way applications are deployed, but remain relatively uncharted territory for high-performance computing. @RedHat and the U.S. Department of @Energy have a plan to change that. Via @TheRegister
3-2-1 Backup strategy is coming together nicely.
On Site 🏡
- Primary storage server — 72TB 💾
- Backup Server — 12TB 🔒
Off Site 🌎
- S3 Compatible Bucket 🪣
#100DaysOfHomeLab
#BACKUP
Cerebras' collab with Qualcomm makes sense. One handles training while the other tackles inference. But unless I'm missing something, the AI 100 Ultra looks like it could seriously benefit from an #HBM upgrade or at the very least LPDDR5X memory. Next revision, maybe?
Intel Foundry wooing Elon? Let's be honest about what this is about. Pat just wants to drum up some drama and get people talking about #18A.
#IFS
#semiconductors
#chips
@Lost_Signal
The density of the rack is a pro/con on power vs distance.
They could have gone half as dense across 2 racks with fiber, but it would have apparently cost them an extra 20kW to do that.
So, it would have been two 70kW racks instead of one 120kW one.
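The trade-off in that reply is simple arithmetic, sketched below in Python. The 120kW and 20kW figures come from the post; the variable and function names are hypothetical, and splitting the total evenly across both racks is an assumption.

```python
SINGLE_RACK_KW = 120    # one dense rack, copper NVLink interconnect
FIBER_PENALTY_KW = 20   # extra power quoted for going optical across two racks
RACKS = 2


def per_rack_kw(total_kw: float, racks: int) -> float:
    """Power per rack, assuming the load is split evenly."""
    return total_kw / racks


split_total_kw = SINGLE_RACK_KW + FIBER_PENALTY_KW
split_per_rack_kw = per_rack_kw(split_total_kw, RACKS)
overhead_pct = FIBER_PENALTY_KW / SINGLE_RACK_KW * 100

print(f"Two-rack total: {split_total_kw} kW ({split_per_rack_kw:.0f} kW per rack)")
print(f"Overhead vs. the single rack: {overhead_pct:.1f}%")
```

Halving the density makes each rack easier to power and cool, but the total bill goes up by roughly a sixth.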
Marketing: We're coming together to assess the implications of #AI on the #workforce and find opportunities to retrain staff.
English: We've assembled to figure out who's getting the AI ax first and how soon we can do it. Now who wants to babysit the AI?
GPUs are a powerful tool for machine-learning workloads, though they’re not necessarily the right tool for every AI job, according to Michael Bronstein, Twitter’s head of graph learning research. Via
@TheRegister
Just got back from spending a few days with my family in Duluth. It was absolutely gorgeous, and my dad got to meet the pup for the first time since we got him.
#vacation
#digitaldetox
#familyfun
Work is done. That means I can settle into a weekend of... checks notes... server migrations.
That can't be right, pretty sure that should read beer. 🍺
Have a great weekend everyone!
@glennklockwood
Interconnect penalties, packaging complexity, cache distribution, volume and margins all come into play.
If you can produce reticle-limited chips at acceptable losses for your margins, then packaging is less of an issue.
Not just Nvidia. It’s Emerald Rapids, Gaudi3 and Blackwell.
Made some updates to my UPS monitoring dashboard. Working on a version that's sharable. Right now it's pretty hacky.
Note: these numbers aren't accurate right now because there are big gaps in the data.
#100DaysOfHomeLab
So after reading @lproven's excellent Linux coverage on @TheRegister, I've come to the conclusion that transactional OSes are simultaneously the future and an oncoming headache for sysadmins.
#Linux
Seriously starting to feel like Monopoly money now. $1B funding round 🤯 Seriously? This isn't an AI hardware startup either. Scale AI provides labeled data for training models. My latest for @TheRegister
The fastest, thinnest tablet in the world is now... faster and thinner. Why again should we care, Apple? Beyond your silicon roadmap being a mess, that is? Some good points from @rupertg for @TheRegister
#Apple
#iPad
#M4
The all new SDx launches tomorrow. All the hype of the 90s internet boom with none of the disappointment. Join the movement, jack-in and load-up our new innovative user experience. Networking was never so exciting
#NewSDx
#SDxCentral
I think I've officially reached the point in my life where there is such a thing as too much caffeine 😭. Yesterday I discovered a can of Yerba Mate was 50% too much Yerba Mate...
When your boss lets you out early before a long weekend, but then you realize you can’t go home because you’ve got an interview coming up.
#journalistproblems
Everyone have a great holiday!
So, digging into Qualcomm’s X-Elite: this thing doesn’t have much memory bandwidth to work with, just 135GB/s. Really have to wonder how big a detriment that’s gonna be to GPU performance.