IBM Says It's Been Running a Cloud-Native, AI-Optimized Supercomputer Since May (theregister.com)
"IBM is the latest tech giant to unveil its own 'AI supercomputer,' this one composed of a bunch of virtual machines running within IBM Cloud," reports The Register:
The system known as Vela, which the company claims has been online since May last year, is touted as IBM's first AI-optimized, cloud-native supercomputer, created with the aim of developing and training large-scale AI models. Before anyone rushes off to sign up for access, IBM stated that the platform is currently reserved for use by the IBM Research community. In fact, Vela has been the company's "go-to environment" for researchers creating advanced AI capabilities, including work on foundation models, since May 2022, it said.
IBM states that it chose this architecture because it gives the company greater flexibility to scale up as required, as well as the ability to deploy similar infrastructure in any IBM Cloud datacenter around the globe. But Vela is not running on just any standard IBM Cloud node hardware: each node is a twin-socket system with 2nd Gen Xeon Scalable processors, 1.5TB of DRAM, four 3.2TB NVMe flash drives, and eight 80GB Nvidia A100 GPUs, the latter connected by NVLink and NVSwitch. This makes the Vela infrastructure closer to that of a high-performance computing (HPC) site than to typical cloud infrastructure, despite IBM's insistence that it was taking a different path because "traditional supercomputers weren't designed for AI."
It is also notable that IBM chose x86 processors rather than its own Power10 chips, especially as Big Blue has touted these as ideally suited to memory-intensive workloads such as large-model AI inferencing.
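The per-node figures quoted above add up quickly. As a rough sketch, assuming decimal units (1TB = 1000GB), the per-node aggregates can be tallied as below. The article gives no node count, so `node_count` in `cluster_totals` is a hypothetical parameter for illustration only, and the class and function names are likewise made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware as quoted for Vela (decimal TB/GB assumed)."""
    cpu_sockets: int
    dram_tb: float
    nvme_drives: int
    nvme_drive_tb: float
    gpus: int
    gpu_mem_gb: int

    @property
    def nvme_total_tb(self) -> float:
        # Four 3.2TB NVMe flash drives per node.
        return self.nvme_drives * self.nvme_drive_tb

    @property
    def gpu_mem_total_gb(self) -> int:
        # Eight 80GB A100s per node, linked by NVLink/NVSwitch.
        return self.gpus * self.gpu_mem_gb


VELA_NODE = NodeSpec(
    cpu_sockets=2, dram_tb=1.5, nvme_drives=4,
    nvme_drive_tb=3.2, gpus=8, gpu_mem_gb=80,
)


def cluster_totals(node: NodeSpec, node_count: int) -> dict:
    """Scale per-node figures; node_count is hypothetical (not in the article)."""
    return {
        "gpus": node.gpus * node_count,
        "gpu_mem_tb": node.gpu_mem_total_gb * node_count / 1000,
        "dram_tb": node.dram_tb * node_count,
        "nvme_tb": node.nvme_total_tb * node_count,
    }


if __name__ == "__main__":
    print(VELA_NODE.gpu_mem_total_gb)  # 640 (GB of GPU memory per node)
    print(VELA_NODE.nvme_total_tb)     # 12.8 (TB of NVMe per node)
    print(cluster_totals(VELA_NODE, node_count=10))
```

Each node therefore carries about 640GB of GPU memory against 1.5TB of system DRAM, which is the HPC-style balance the article contrasts with typical cloud nodes.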
Thanks to Slashdot reader guest reader for sharing the story.
Beowulf Cloud (tm) (Score:1)
Re:Beowulf Cloud (tm) (Score:4)
This is an amazing non-event: a PR release that says nothing, does nothing, and makes one wonder why IBM would tout such a thing. There are lots of GPU-laden, for-rent nodes in the cloud. Calling it a supercomputer is meaningless, as it's not far from the architecture of other advanced server designs. There are no test links verifying its uniqueness or its wow factors; nada.
Nothing to see here. Move along.
Re: (Score:2)
Of all the companies. They’re also running it on their own special “cloud” nodes that aren’t available to anyone but a fraction of IBM employees.
It’s called a data center, IBM. You built yourself a data center and connected it to the internet.
Re:Beowulf Cloud (tm) (Score:4, Funny)
It’s called a data center IBM. You built yourself a data center and connected it to the internet.
A data center but *on* the Internet? Off to the patent office! :-)
Re: Beowulf Cloud (tm) (Score:2)
For techies, no. (Score:3, Interesting)
Nothing to see here. Move along.
That would be misreading the fluff.
Notice that it's not POWER10, but "Xeon Scalable". "Scalable" here probably mostly means "give us more money now so you can give us even more money later". Notably, it's not even AMD, which currently has Intel beat. So it's a nice, "safe" choice for technically weak middle management.
And the rest of the buzzword salad seems to underscore that. This is a vehicle that's "safe, proven" ("online since May"), and it has plenty of buzzwords from both the latest fad and from "nobody ever got fired..."
Re: (Score:2)
Buzzword salad definitely describes it.
Somewhere, there is something very wrong about the detachment needed to spit out a release so superficial and out of touch with current market reality.
GPU-based cloud instances, while not cheap, are getting loads of use from non-aligned vendors. That it's not POWER10 is a blessing, but it reminds us all again how IBM, like other vendors we know, has a not-invented-here ego... and now it shows that it has capitulated to yet another gasping-for-air vendor, Intel. Egads.
Re:Beowulf Cloud (tm) (Score:4, Interesting)
It doesn't say much, but what it does say suggests basically nothing aside from reasonably deep pockets. Eight 80GB A100s on NVSwitch basically means a system based on the HGX A100 [nvidia.com] boards that Nvidia sells to a variety of partners, along with a reasonably high-end but not at all atypical Xeon system.
You can get the same 8-GPU HGX baseboard paired with either Intel or AMD (quite possibly Ampere as well; I know the PCIe A100s are supported there, but I'm not certain about the HGX boards) from a variety of vendors: Supermicro, Inspur, HPE, Lenovo, Dell; and the major hyperscalers have their own pet variants. Such systems aren't cheap, but they're off-the-shelf hardware: talk to your rep and you can have them in fairly short order. If IBM is putting out a puff piece about having bought some, but (while hyping it as 'cloud') only does unspecified internal research on it, then it seems reasonable to suspect that it has nothing exciting to put out a press release about.
AWS will sell you access to essentially the same nodes right now for $41/hr on-demand [amazon.com], and Microsoft is busy burning cycles on its own variant to run a chatbot that makes Google nervous. Tell us, IBM, what did you hope to gain by talking about this?
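To put the AWS figure from the comment above in per-GPU terms, here is a back-of-the-envelope sketch. It takes the quoted ~$41/hr on-demand price at face value for one 8-GPU node (actual prices vary by region and change over time), and the helper names are purely illustrative, not part of any AWS API.

```python
def per_gpu_hour_usd(node_hourly_usd: float, gpus_per_node: int = 8) -> float:
    """Split a whole-node hourly price evenly across its GPUs."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return node_hourly_usd / gpus_per_node


def run_cost_usd(node_hourly_usd: float, nodes: int, hours: float) -> float:
    """On-demand cost of renting `nodes` identical nodes for `hours` hours."""
    return node_hourly_usd * nodes * hours


if __name__ == "__main__":
    NODE_PRICE = 41.0  # ~$41/hr as quoted in the comment; an assumption
    print(per_gpu_hour_usd(NODE_PRICE))                 # 5.125
    print(run_cost_usd(NODE_PRICE, nodes=1, hours=24))  # 984.0
```

At that rate, a single node works out to roughly $5 per A100-hour, or just under $1,000 per day of continuous use.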
Re: Beowulf Cloud (tm) (Score:2)
Re: Beowulf Cloud (tm) (Score:2)
Re: (Score:3)
Imagine a beowulf cluster of beowulf clouds.
Nobody with a working brain is rushing to sign up (Score:3)
for anything that has "cloud-native" in the product description.
Re: (Score:1)
Over one million users signed up to use ChatGPT [wikipedia.org].
The models were trained in collaboration with Microsoft on their Azure supercomputing infrastructure.
Re: (Score:2)
How does this negate what I said?
Re: Nobody with a working brain is rushing to sign (Score:1)
I'm just thinking back to the 386/486 IIS grids that everyone asked "whatever for?" about. Every, or almost every, IIS deployment ran on those for a while.
Marketing Strategy (Score:1)
Is this a really good idea for "containment"? (Score:2)
English translation (Score:2)
Re: (Score:2)
IBM Going Dinosaur (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)