BlueGene/L Puts the Hammer Down
OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 Teraflops by doubling its processors. That seems likely to keep it at no. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how an HPC system will perform with various different types of applications."
Finally... (Score:5, Funny)
Re:Finally... (Score:1)
The real question is.... (Score:2, Funny)
Avoiding the obvious memes... (Score:3, Funny)
...ok I couldn't resist
Imagine a beowulf cluster of these....
Re:Avoiding the obvious memes... (Score:1)
Re:Avoiding the obvious memes... (Score:2)
Re:Avoiding the obvious memes... (Score:2)
Re:Avoiding the obvious memes... (Score:2)
A. It's already a cluster
B. In order to be a beowulf cluster, it would need to be built of commodity parts.
similarities (Score:4, Insightful)
Re:similarities (Score:5, Funny)
Re:similarities (Score:2)
If your wish is a question of protein folding, it'll try to grant it for you with only 1 of 'em. I wouldn't hold my breath for anything else though.
Re:similarities (Score:1)
Re:similarities (Score:1)
Re: (Score:2)
Re:similarities (Score:2)
Man, you need to turn off the Cartoon Network, and go watch ESPN.
Re:similarities (Score:1)
Re:similarities (Score:1)
SkyNet sends out a massive army of T-800s at Blue Gene. Blue Gene fires a Demon Cannon and reduces SkyNet to rubble. SkyNet regenerates itself from a single surviving neural net chip. One of SkyNet's puny T-800s throws a Destructo Disc and manages to lop off Blue Gene's top half. A Blue Gene engineer slaps a few thousand processors on Blue Gene and it's back in action. SkyNet gets desperate with a 20-fold K
Re:similarities (Score:1)
Yes but what .. (Score:2, Funny)
and what type of frame rate do you get with Quake?
Re:Yes but what .. (Score:5, Funny)
It speculatively pre-renders every possible frame for the next 90 seconds.
Re:Yes but what .. (Score:1)
Re:Yes but what .. (Score:1)
Re:Yes but what .. (Score:3)
1. Based on optimal actions your frags would have been: 1348
2. Your actual frags: 2
3. You suck
Re:Yes but what .. (Score:1)
Math Error? (Score:5, Insightful)
Is it just me or is 135.3 * 2 < 360 / 2?
Re:Math Error? (Score:1, Informative)
Re:Math Error? (Score:5, Informative)
Every 512 node backplane has a peak of 1.4tflop/s according to their design doc.
The 32k system benched in at 70. The theoretical peak was 91, so the actual performance is about 77 percent of the peak, which is pretty normal.
The 64k system benched in at 135. The peak should be around 182, so that is 74 percent of the peak.
The design goal is for 360 at 64k. I'm guessing that 360 is the peak, because my rough estimate calculations put the peak of 64k nodes at about 364. Let's be nice and assume 70% of peak in the actual machine. That would indicate around 255tflop/s at 64k nodes of actual performance, assuming the thing scales at about the same rate.
So, they got their math right as long as they are claiming a peak of 360. That's a theoretical max, and so never actually reached. The actual is notably less. My numbers are estimates, and so 364 not equalling 360 doesn't bother me much in the end.
Anyone care to correct anything?
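For anyone who wants to check the arithmetic above, here is a minimal sketch that only replays the figures quoted in this comment (peaks of 91 and 182, Linpack results of 70 and 135, and the poster's rough 364 estimate with an assumed 70% efficiency). The function names are made up for illustration and nothing here is an independent measurement.

```python
# Figures quoted in the comment above, in TFLOP/s. Nothing here is measured.
SYSTEMS = {
    "32k nodes": {"peak": 91.0, "linpack": 70.0},
    "64k nodes": {"peak": 182.0, "linpack": 135.0},
}


def efficiency(linpack, peak):
    """Fraction of theoretical peak actually sustained on Linpack."""
    return linpack / peak


def projected_linpack(peak, assumed_efficiency):
    """Sustained rate if the machine holds the given fraction of its peak."""
    return peak * assumed_efficiency


for name, s in SYSTEMS.items():
    print(f"{name}: {efficiency(s['linpack'], s['peak']):.0%} of peak")

# The poster's rough peak estimate for the design-goal machine, at an assumed 70%.
print(f"projected: {projected_linpack(364.0, 0.70):.0f} TFLOP/s")
```

The 70% efficiency is the poster's "be nice" assumption, so the projection is only as good as that guess.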
Also, can someone explain to me why Cray's Redstorm won't kick this thing's ass performance-wise? Redstorm should have 10k processors, but they are 64bit Opteron 2.4ghz processors with 4x the ram per node. These, I believe, are 700mhz processors.
I'm confused because Redstorm only has a theoretical peak of about 40tflops off the 10k nodes. IBM's system at 10k should have a peak at around 30tflops. I'm wondering how 64bit opterons at more than 3x the speed could only be 10tflops faster at 10k nodes than 700mhz 32bit PPCs with 1/4 the ram. Can someone please explain?
Also, anyone know why IBM isn't using HPCC? Cray has been using it for the XD1s. I'm guessing the reason IBM hasn't posted results is because they can't even come close in sustained memory bandwidth, mpi latency, and other tests. That's just a guess though. I'd love to hear from someone that actually knows about this stuff.
Why BlueGene kicks RedStorm's ass on Linpack (Score:5, Informative)
First off, Opterons are pretty mediocre at double precision floating point benchmarks, it just isn't what they were designed for. Opterons effectively have only a single FPU (technically they have two, but one only does addition, while the other handles all multiplies), while most competing chips in the HPC arena have two full FPUs. They tend to get spanked by PPCs and Itanium2s, and even Xeons can do better.
Also, you should note that the modified PPC440s in BlueGene have a disproportionate amount of floating point resources, making them about equivalent to the 970 in that area mhz for mhz, despite being massively outclassed in integer and vector ops. And the floating point units on those 440s are full 64-bit units (as fpus are on many other ostensibly 32 bit chips, as the bit width of a fpu has nothing to do with the integer units and mmus being 32-bit). Plus the PPC has a fused multiply-add instruction, allowing it to theoretically finish 2 FLOPS/unit/cycle, instead of just one.
And finally, you should know that individual nodes' ram sizes matter very little for Linpack.
When you take all that together, it's not too surprising that 700Mhz PPC440s with 2 64-bit FPUs each finishing up to 2 FLOPs/cycle (at least 2 of which must be adds) would perform on par with 2.xGhz Opterons finishing a total of 2FLOPs/cycle (at least one of which has to be an add).
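The per-chip comparison above boils down to clock rate times FLOPs finished per cycle. A rough sketch, assuming the 2.4 GHz Opteron clock given in the parent question (this post only says "2.x") and using a hypothetical helper name:

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak: clock rate times FLOPs completed per cycle."""
    return clock_ghz * flops_per_cycle


# PPC440 as described above: 2 FPUs, each finishing up to 2 FLOPs/cycle (fused multiply-add).
ppc440 = peak_gflops(0.7, 2 * 2)

# Opteron as described above: 2 FLOPs/cycle in total (one add unit, one multiply unit).
# The 2.4 GHz clock is taken from the parent question, not from this post.
opteron = peak_gflops(2.4, 2)

print(f"PPC440:  {ppc440:.1f} GFLOP/s peak")
print(f"Opteron: {opteron:.1f} GFLOP/s peak")
print(f"clock ratio {2.4 / 0.7:.1f}x, peak ratio {opteron / ppc440:.1f}x")
```

Under those assumptions the clock gap is over 3x, but the peak floating point gap is under 2x, which is the point of the post.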
Re:Math Error? (Score:2, Informative)
It seems to me that since the device isn't complete the data management isn't working under optimal conditions.
Wait another year... (Score:5, Interesting)
Obviously that number's based on an unrealistic, 100% efficient scaling factor. But still. The 137 TFlop is coming from 64,000 processors.
It's fun to think about what's just around the corner.
Re:Wait another year... (Score:3, Insightful)
Given the size and complexity of the Cell, 527 of them might present some cooling problems. (Or cogeneration opportunities, if you hook a good liquid cooling system to a steam turbine.)
Re:Wait another year... (Score:3, Informative)
Although linpack is very nice to parallelize, I don't think it would be possible to even get 10% of the theoretical rate on a cell.
Re:Wait another year... (Score:2)
Re:Wait another year... (Score:2)
25GB/s for 176 GFlops.
An A64 or P4 has 6GB/s for 4-6 GFlops.
If you do streaming MACs, you would need 2 loads and 1 store per 2 operations (a multiply and an add) -> 12 bytes per flop -> 176 GFlops need 2 TByte/s. Hope you get a good cache hit rate.
Just as an example: REAL vector computers can sustain their vector units from main memory.
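The bandwidth estimate above can be replayed in a few lines. This is only a sketch of the poster's reasoning, assuming 8-byte (double precision) operands, which is what the 12 bytes per flop figure implies. The helper names are hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: double precision operands


def bytes_per_flop(loads, stores, flops):
    """Memory traffic per floating point operation for a streaming kernel."""
    return (loads + stores) * BYTES_PER_DOUBLE / flops


def required_bandwidth_gb_s(gflops, bytes_per_op):
    """Bandwidth needed to feed the given rate entirely from main memory."""
    return gflops * bytes_per_op


bpf = bytes_per_flop(loads=2, stores=1, flops=2)
needed = required_bandwidth_gb_s(176, bpf)
available = 25  # GB/s, the figure quoted above

print(f"{bpf:.0f} bytes per flop")
print(f"needs {needed:.0f} GB/s, has {available} GB/s")
print(f"memory alone could feed about {available / needed:.1%} of the quoted rate")
```

In other words, without reuse from cache or local store, the quoted memory bandwidth covers only a small fraction of the headline number.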
Faster processors? (Score:2)
Re:Wait another year... (Score:2)
Maybe it will be able to... (Score:5, Funny)
Re:Maybe it will be able to... (Score:5, Funny)
Windows HPC (Score:4, Funny)
Re:Windows HPC (Score:2, Funny)
Re:Windows HPC (Score:2, Interesting)
Blue Gene is known to run Linux. True, but... In fact, there are two types of nodes in Blue Gene. The computing nodes and the IO nodes. There is 1 IO node for 63 computing nodes. So for a 64000-node cluster, there are in fact only 1000 processors that run Linux. The other 63000 are running an ultra light runtime environment (with MPI and other essential things) to maximize the spee
Re:Windows HPC (Score:2)
Cell vs HPC (Score:5, Insightful)
1) Solving linear equations. SIMD Matrix math, check.
2) DP Matrix-Matrix multiplies. IBM added DP support to their VMX set for Cell (though at 10% the execution rate), check.
3) Processor/Memory bandwidth. XDR interface at 25.6 GB/s, check.
4) Processor/Processor bandwidth. FlexIO interface at 76.8 GB/s, check.
5) "measures rate of integer random updates of memory", hmmmm... not sure.
6) Complex, DP FFT. Again, DP support at a price. check.
7) Communication latency & bandwidth. 100 GB/s total memory bandwidth, check (though this could be heavily influenced by how IBM handles its SPE threading interface)
Obviously, I'm not saying they used the HPC Challenge as a design document, but clearly Cell is meant as a supercomputer first and a PS3 second.
Re:Cell vs HPC (Score:1)
Damn, they are way outclassing Microsoft on this one.
Re:Cell vs HPC (Score:1)
Re:Cell vs HPC (Score:1, Informative)
Re:Cell vs HPC (Score:3, Interesting)
I think you've refuted your own argument there: double precision floating point performance is critical for true supercomputing. (In supercomputing circles DP and SP are often referred to as "full precision" and "half precision", respectively, which should give you a better idea of how they view things.)
In
Re:Cell vs HPC (Score:2)
I'd say it seems a lot more like they thought a supercomputer would be kickass for running a PS3 and designed it accordingly.
Re:Cell vs HPC (Score:2, Interesting)
This is because when you have the following conditions:
-- Lots of memory bandwidth needed
-- Fast floating point
-- Parallelizable code
-- Hand tuned kernels OK
You end up with something that looks lots like a supercomputer. You just turned your compute bound problem into an IO bound problem. We may want to revise that saying -- and say 'You turned your compute bound problem into a c
I wonder... (Score:1, Funny)
2.) How about CS:S?
3.) If Apache 2 were installed on it, could it survive a slashdotting?
4.) How fast could it run Avida?
Pics (Score:5, Informative)
Some more pics [ibm.com] of the prototype.
For comparison, the Earth simulator [jamstec.go.jp] and big mac [vt.edu].
Anyone know what kind of facilities blue gene will be housed at? The one for the earth simulator looks like something out of a movie, IBM better be able to compete on the 'cool factor'. : )
And does anyone else get the warm and fuzzy feelings from looking at these pics, even though there's nothing you could possibly use that much power for? Ahhh, power...
Re:Pics (Score:2)
Re: (Score:2, Insightful)
Re:Pics (Score:2, Informative)
Visual comparison BlueGene vs Earth Simulator (Score:2)
Blinking Lights (Score:3, Funny)
I remember seeing a news article on TV recently about NASA and their upgrades to computer horse power for doing flight simulations and design work. The picture they showed? A late 80's conne
Re:Blinking Lights (Score:2)
Re:Pics (Score:2)
Re:Pics (Score:1)
But What about the Crays? (Score:1)
Re:But What about real machines ? (Score:1)
A graph would be neat (but I'd settle with a power of ten) :-)
It would give an idea of when we'll get that kind of power at home - and don't tell me we'll never know what to do with it...
Re:But What about the Crays? (Score:2, Interesting)
So try... a cluster of 25+ X1s and then we'll talk =)!
Re:But What about the Crays? (Score:4, Informative)
Re:But What about the Crays? (Score:1)
unless you have a completely novel computing architecture and someone with a lot of money and a specialized need willing to pay for it.
Re:But What about the Crays? (Score:1)
TZ
More than Teraflops (Score:2, Interesting)
In other words what is the cost in the quest for performance?
Re:More than Teraflops (Score:1)
3200 amps of pull at 110 VAC... pumping out about 1,200,000 BTU/hr... plus the cost of AC, so another 1200 amps (110 VAC) of juice to pump out the heat (assumes 20C outdoor temp).
The total power would average around 4400 amps * 110 VAC / 1000 = 484 kW draw. Assuming 10c/kWh, it costs about $50/hour to run in power alone, assuming really cheap power.
So yearly power price is going to be running around $400k/year as a conservative estimate.
My guesses are that each rack
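The back-of-the-envelope numbers above can be checked with a short script. This is a sketch only: the amp figures and the 10 cents per kWh rate are the poster's guesses, 3.412 BTU/hr per watt is the standard conversion, and power factor is ignored.

```python
VOLTS = 110.0
BTU_PER_HR_PER_WATT = 3.412  # standard conversion factor
PRICE_PER_KWH = 0.10         # dollars, the poster's assumed rate


def kilowatts(amps, volts=VOLTS):
    """Power draw in kW for a given current, ignoring power factor."""
    return amps * volts / 1000.0


compute_kw = kilowatts(3200)        # the machine itself
total_kw = kilowatts(3200 + 1200)   # machine plus air conditioning

heat_btu_hr = compute_kw * 1000.0 * BTU_PER_HR_PER_WATT
cost_per_hour = total_kw * PRICE_PER_KWH
cost_per_year = cost_per_hour * 24 * 365

print(f"heat output: {heat_btu_hr:,.0f} BTU/hr")
print(f"total draw:  {total_kw:.0f} kW")
print(f"power cost:  ${cost_per_hour:.2f}/hour, ${cost_per_year:,.0f}/year")
```

The results line up with the figures in the comment: roughly 1.2 million BTU/hr, 484 kW, about $50/hour, and a little over $400k/year.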
Can anyone please... (Score:2)
Re:Can anyone please... (Score:3, Informative)
Re:Can anyone please... (Score:2, Informative)
What kind of force field do you want to use? CHARMM or AMBER? Well, it might work. From an unfolded protein, though, some of your atoms will undergo fairly drastic changes in environment. Better use a polarizable model. Oh, crap, there's another order of magnitude in expense, and you still may not get the right answer unless your force field is parameterized
Re:Can anyone please... (Score:2)
> at research labs are optimizing their codes effectively
Good programmers are cheap compared to computation time on one of those machines. The electric bill alone is nothing to sneeze at.
Re:Can anyone please... (Score:3, Informative)
"IBM estimates that the folding model for a 300-residue protein will encompass more than one billion forces acting over one trillion time steps. Even for Blue Gene, modeling such a folding process is expected to take about a year of around-the-clock processing."
Could encourage poor products? (Score:2, Insightful)
One in every home (Score:3, Interesting)
A few decades ago, people thought Bill Gates was wrong when he reckoned there would soon be a time when there was a computer in every home.
Now, a supercomputer fills an entire room. So how long before someone reckons that there will come a time when there will be a supercomputer in every home?
Re:One in every home (Score:1)
According to Apple [apple.com] that era was launched a long time ago.
Re:One in every home (Score:1)
Then it won't be considered 'super' any more, as there will be even faster computers out there.
Re:One in every home (Score:3, Informative)
Re:One in every home (Score:2)
I have 2 notebooks, a dual processor linux machine, an iMac and a TiVo. All networked and could be turned into a number cruncher without much difficulty.
Granted it does not compare to any modern supercomputer but it is close to a 1996 >$200,000 computer.
Re:One in every home (Score:2)
Desktop computers are so fast these days that engineers at Intel and AMD are actually reaching barriers in physics toward extending their processing power any. Yes, there have been physical problems in the past, and most of them involved etching the chips (we hit a barrier for a while on how small the visible light lasers were capable of etching, so they switched to Ultraviolet). But now the
hpc test? other types of apps? (Score:1, Interesting)
Top 500 and Auto Racing (Score:5, Insightful)
Anyway, my point is - it's becoming just "I can afford more processors than you can so I win" instead of the heyday of Seymour Cray when you really had to be talented to capture the #1 spot from IBM.
Re:Top 500 and Auto Racing (Score:5, Informative)
Re:Top 500 and Auto Racing (Score:2)
Having said that I agree with you in that IBM has done some innovative things with BlueGene.
Another Benchmark (Score:1)
Re:Another Benchmark (Score:1)
Yes, but compiling the OpenOffice.org suite is the real time-trial challenge...
With BlueGene/L, I would hope that the compilation time would be measured in seconds, not hours like with my poor little (by comparison) Athlon.
Scalar performance -- Unimpressed! (Score:3, Interesting)
Can an Athlon 64 / P4 beat it on scalar code? The whole HPC world has gotten boring since Cray died. Here's why I say that:
The Cray 1 had the best SCALAR and VECTOR performance in the world.
The Cray 2 was an ass kicker, the Cray 3 was a real ass kicker (if only they could build them reliably).
Cray pushed the boundaries, he pushed them too far at some points -- designing and trying to build machines that they couldn't make reliable.
So it'll be a cold day in hell before I get all fired up over the fact that someone else managed to glue together a bazillion 'killer micros' and win at Linpack...
Now if someone would bring back the idea of transputers, or we saw some *real* efforts at Dataflow and FP then I'd be excited. I'd love a PC with 8 small, simple, fast, in-order tightly bound cpus. Don't say CELL, all indications are that they will be a *real* PITA to program to get any decent performance out of.
Re:Scalar performance -- Unimpressed! (Score:1)
Re:Scalar performance -- Unimpressed! (Score:1)
Re:Scalar performance -- Unimpressed! (Score:1)
Re:Scalar performance -- Unimpressed! (Score:1)
Tip of the iceberg... (Score:1, Informative)
But.... (Score:1)
Does anyone realize... (Score:2, Funny)
Hex it! (Score:2)
FFFFirst post!
Re:No Beowulf Cluster Jokes? (Score:1)
Re:No Beowulf Cluster Jokes? (Score:1)
Re:Imagine (Score:2, Funny)
Re:Wait a minute... (Score:1, Offtopic)
Re:Since we are talking about benchmarks... (Score:2)
Re:But does it run Linux ? - Yes it does. (Score:2)
http://www.forbes.com/home/enterprisetech/2005/