Supercomputer Breaks the $100/GFLOPS Barrier
Hank Dietz writes "At the University of Kentucky, KASY0, a Linux cluster of 128+4 AMD Athlon XP 2600+ nodes, achieved 471 GFLOPS on 32-bit HPL. At a cost of less than $39,500, that makes it the first supercomputer to break $100/GFLOPS. It also is the new record holder for POV-Ray 3.5 render speed.
The reason this 'Beowulf' is so cost-effective is a new network architecture that achieves high performance using standard hardware: the asymmetric Sparse Flat Neighborhood Network (SFNN)." Because this was a university project, KASY0 was assembled entirely by university students, which, besides being a source of cheap labor, is also a good way to get a lot of students involved in a great project.
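For reference, the headline figure can be checked from the two numbers in the summary. This is only a minimal sketch: the variable names are illustrative, and the cost is taken at the stated upper bound of $39,500.

```python
# Figures quoted in the story summary.
cost_usd = 39_500      # stated upper bound on total cost
hpl_gflops = 471       # 32-bit HPL result

# Price/performance in dollars per GFLOPS.
usd_per_gflops = cost_usd / hpl_gflops
print(f"${usd_per_gflops:.2f} per GFLOPS")
```

With these inputs the ratio comes out a little under $84/GFLOPS, comfortably below the $100 mark.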
Wow! (Score:5, Funny)
Let the Beowulf cluster jokes begin! (Score:5, Funny)
Imagine a Beowulf cluster of Beowulf cluster jokes!
Re:Let the Beowulf cluster jokes begin! (Score:2)
How about moderating them down for unoriginality then.
Also I wonder (Score:5, Interesting)
How much electricity will these supercomputers use up?
All those wires, it looks like it takes up a lot of juice.
Students as Slave Labor (Score:5, Funny)
Re:Also I wonder (Score:2, Funny)
Re:Why do you always call it slave labor? Its not. (Score:2)
Stafford Loan Maximums (independent student):
1st year: $6625
2nd year: $7500
3rd year: $10500
4th year: $10500
5th year: $10500
For dependent students the amount is almost halved, but any student can become independent when they are 23 years old or demonstrate that they truly are no longer supported by their parents/guardians (don't know too much about this though). This is only Stafford loan
Re:Why do you always call it slave labor? Its not. (Score:2)
I agree the rich have it easier, but that is true anywhere in the world. I don't think money is the reason a poor person will likely get less education than a rich person though. If you haven't noticed, scholarships, grants, and loans make it possible for anyone to go to school if they *want* to. The problem is that the drive and discipline to go to school must be instilled by the parents and the family.
To say that rich people are holding down the poor an
Re:Why do you always call it slave labor? Its not. (Score:2)
Oh, I agree your situation isn't abnormal. What I find abnormal is paying 400% over the odds (plus interest) and considering it a good deal.
I'm not talking about a socialist society, where everything is covered by "the state", but rather of any system (regardless of form or
And how much HEAT? (Score:3, Funny)
Did you guys notice from the pics [aggregate.org] that there don't seem to be any fans in the holes on the sides? Are they crazy? These are Athlons. I hope they put enough fans in those things.
Re:And how much HEAT? (Score:2)
Point of Trivia (Score:2)
(Those familiar with the University of Manchester's Department of Computation, in the UK, will understand what I mean. The architecture is designed around the computer room. Even after the truly massive lumps of iron were removed, it still wasn't until the mid 1990s that the building had a ground-flo
Re:And how much HEAT? (Score:2)
See here:
For example, each case came with two side fans, which we converted into a redundant stack venting out the back. [aggregate.org]
Re:And how much HEAT? (Score:2)
Re:And how much HEAT? (Score:2)
Re:Also I wonder (Score:3, Informative)
210 A at 120 VAC, by P = IV, comes to 25.2 kW. Triple that to allow for cooling (it takes approximately 2 watts of power to remove the heat generated by 1 watt of power usage) and you come to almost 76 kW. Take a look at your utility bill to come up with the hourly cost of electricity while this thing is on.
The equipment does not have cool
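The arithmetic in this comment can be sketched as follows. The 210 A and 120 V figures and the "2 watts of cooling per watt of load" rule of thumb are the poster's; the electricity rate is purely an assumed placeholder, so substitute the rate from your own bill. Note that kW is a rate of energy use, and what the utility bills is kWh.

```python
# Poster's figures.
amps = 210
volts = 120

equipment_kw = amps * volts / 1000   # electrical load of the cluster itself
total_kw = 3 * equipment_kw          # load plus the poster's 2:1 cooling rule of thumb

# ASSUMPTION: illustrative rate only, not taken from the thread.
rate_usd_per_kwh = 0.07

# Running for one hour at total_kw consumes total_kw kWh.
cost_per_hour_usd = total_kw * rate_usd_per_kwh
print(f"{equipment_kw:.1f} kW load, {total_kw:.1f} kW with cooling, "
      f"${cost_per_hour_usd:.2f} per hour at the assumed rate")
```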
Re:Also I wonder (Score:2)
I think these guys need a way to tell if the computer has crashed or lost power. Y'know, UPSes have those mini alarms, but people aren't going to be around the computer all the time, and the UPS will only detect a power outage.
I think they need a watchdog circuit, linked to a 25.2 kilowatt amplifier and a suitable speaker. That way, no matt
Re:Also I wonder (Score:2)
Anyway, if you want to see stuff that really draws power, go look for the high energy physics stuff. Power cables that are liquid cooled through tubes
Re:Also I wonder (Score:2)
On what planet? I cool my 60 watt or so Athlon XP 2000 using a 4 watt, 80mm fan. Add an 8 watt, 120mm fan on the intake that is WAY overkill, and a 4 watt PS exhaust fan, and I'm using 16 watts to
Re:Also I wonder (Score:2)
Re:Also I wonder (Score:2)
Re:Also I wonder (Score:2)
There were 128 machines.
That's 44.8 kW.
Now take into account that at 120 V that's 373 amps RMS. With all the surge suppressors/power strips, we're talking a serious amount of impedance (fwi, not resistance).
Not to mention, the typical circuit breaker trips at 20 amps. You'd have to have 36 separate circuits in a typical office environment (a circuit usually services several outlets).
To boot, the impedance at such high currents running off the same master power cables could cause
Re:Also I wonder (Score:2)
The reason Athlon and P4 systems require a 300 watt supply is for when they are starting up.
To those who might not know... (Score:2, Informative)
As a measure of computer speed, a gigaflop is a billion floating-point operations per second (FLOPS).
Re:To those who might not know... (Score:2, Informative)
If you're going to try to be informative, at least be accurate. There's no such thing as a "gigaflop". That would mean "Billions of Floating point Operations Per..." without the unit of time.
It's a gigaflops (singular). The 's' is very important. It's how we know how long it takes to perform a billion floating point operations.
It's like when people say "I had my engine up to 6000 rpms". What's an rpms? Is it a plural rpm? If so, what is pluralized? The acronym expansion yields "revolutions per
Let's not get too excited.... (Score:5, Funny)
Not after you factor in the SCO license fees.
Re:Let's not get too excited.... (Score:2)
It's a university project (Score:3, Funny)
Re:It's a university project (Score:2)
Asymmetric Sparse Flat Neighborhood Network (Score:5, Interesting)
Re:Asymmetric Sparse Flat Neighborhood Network (Score:5, Informative)
Re:Asymmetric Sparse Flat Neighborhood Network (Score:3, Interesting)
Admittedly, I understand that no node is more than one hop away. But, how is this different than all nodes plugged into a large switch like a Cisco 6500 or a Nortel Passport 8600? These switches can have ~128 ports and can switch 256Gbps aggregate throughput at wire speed. Add another switch and then add a second NIC to each host and you increase the capacity even further. Additionally, this does not requi
Re:Asymmetric Sparse Flat Neighborhood Network (Score:2)
It's cheaper.
Re:Asymmetric Sparse Flat Neighborhood Network (Score:3, Informative)
Here's a quote from the site:
Re:Asymmetric Sparse Flat Neighborhood Network (Score:2)
The technique that was used seems to be more of a mental exercise in making spaghetti, I don't see it reducing latency or increasing performance beyond the currently used techniques.
It significantly reduces cost. In wire speed switches (FastE or GigE) there will typically be a sweet spot for price/performance. Beyond that point, switch prices jump into the stratosphere.
For larger clusters, there simply aren't any switches big enough at any price (just try to get a 256 port GigE wire speed switch for e
Re:Asymmetric Sparse Flat Neighborhood Network (Score:2)
you can increase performance. rather than 1 Gb port into a very expensive 64 port switch, to give you a maximum of 128Gb bandwidth (bidirectional 64x1Gb), you can (if you use the calculator) stick 4 Gb ports in each machine, buy 11 cheapo Dell 24 port gigabit switches (about $3k each), have 1 switch latency, and have 4 times the total non blocking bandwidth available. And the switches will still cost you less than 1 64 port gig switch.
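The port and bandwidth arithmetic in this comment can be checked with a few lines. This is a sketch under one assumption not stated outright in the comment: that the cluster has 64 machines, matching the 64-port switch it is compared against. The switch count, port count, and price are the poster's figures.

```python
machines = 64              # ASSUMPTION: one machine per port of the 64-port switch
nics_per_machine = 4
switches = 11
ports_per_switch = 24
switch_price_usd = 3_000   # "about $3k each"
link_gbps = 1

ports_needed = machines * nics_per_machine       # NIC ports that must be plugged in
ports_available = switches * ports_per_switch    # ports offered by the small switches

# Bidirectional aggregate bandwidth, counted the way the comment counts it.
single_switch_gbps = machines * link_gbps * 2
multi_nic_gbps = ports_needed * link_gbps * 2

switch_cost_usd = switches * switch_price_usd

print(ports_needed, ports_available, single_switch_gbps, multi_nic_gbps, switch_cost_usd)
```

Having enough ports is necessary but not sufficient: the wiring pattern still has to put every pair of machines on at least one common switch, which is what the design tool mentioned in the comment is for.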
Re:Asymmetric Sparse Flat Neighborhood Network (Score:2)
Because the routing is being done in software instead, the cost driver is dramatically reduced; consequently, it becomes cost-effective to hav
Re:Asymmetric Sparse Flat Neighborhood Network (Score:2)
I was actually wondering how well Linux would handle this. The obvious algorithm to find the correct entry in the routing table is linear in the number of entries. That doesn't sound efficient to me, but it might be that 100 entries is still so small a number that it doesn't matter. However this particular cas
From the KASY0 document... (Score:2)
Every host does have at least one path to every other host, but most hosts have multiple paths to other hosts. It is true, however, that all hosts do not necessarily have multiple paths to all other hosts.
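The property described in this sub-thread (every pair of hosts shares at least one switch, so any host reaches any other in a single switch hop) is easy to check mechanically. The sketch below is illustrative only: the six-node wiring, the switch labels, and the function name are made up for the example and are not KASY0's actual layout or the group's design software.

```python
from itertools import combinations

def is_flat_neighborhood(wiring):
    """True if every pair of nodes has at least one switch in common."""
    return all(wiring[a] & wiring[b] for a, b in combinations(wiring, 2))

# HYPOTHETICAL wiring: 6 nodes with 2 NICs each, 3 switches with 4 ports each.
wiring = {
    "n0": {"A", "B"}, "n1": {"A", "B"},
    "n2": {"A", "C"}, "n3": {"A", "C"},
    "n4": {"B", "C"}, "n5": {"B", "C"},
}

pairs = list(combinations(wiring, 2))
# Pairs that share more than one switch have multiple single-hop paths.
multi_path_pairs = [(a, b) for a, b in pairs if len(wiring[a] & wiring[b]) > 1]

print(is_flat_neighborhood(wiring), len(pairs), len(multi_path_pairs))
```

In this toy layout every pair has a one-hop path, but only some pairs have more than one, which mirrors the distinction the comment draws. No single switch has a port for every node, yet no node is more than one switch away from any other.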
this is nice (Score:2, Interesting)
of course, if you just need a lot of general purpose super computing, it is obvious that you cannot compete with this.
Wrong (Score:3, Informative)
If you can parallelize your application well enough, Beowulf rules, but if you need a lot of node-to-node communication, the network cost quickly surpasses the CPU cost of the system
Re:Wrong (Score:4, Insightful)
Really, it's a spectrum. On one end you have fully commodity Beowulf; in the middle, you see things like Dolphin and Myrinet; and on the high end you see fully custom backplanes and sometimes RAM and I/O controllers as well. Purpose-built CPUs are becoming less common now, but not unheard of.
Each step up the spectrum widens the domain of problems that the machine can work on efficiently, and raises the price for the machine. In many cases, a 'real' supercomputer is more or less a cluster with a specialized network and OS and mounted in a single cabinet so it doesn't look like a cluster.
In general when a lower end machine can efficiently run your program, there is no benefit to using a more expensive machine.
As server hardware improves and 'exotic' hardware becomes more mainstream, the gap between the low and the high end narrows. There will probably always be a small but existent set of problems that call for the 'real' supercomputer, but that set is shrinking.
There are other considerations as well. If the Beowulf in your lab can solve the problem in 1 week and is available now, while the 'real' supercomputer on the other campus can solve it in 4 hours and will have a timeslot available in 2 weeks, the Beowulf is 'faster' from your point of view.
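The turnaround comparison in that last paragraph works out as below. The run times and the two-week wait are the poster's example figures; treating the local Beowulf as having zero queue time is how the example is framed ("available now").

```python
HOURS_PER_WEEK = 7 * 24

# Time from "I want to run this" to "I have results", in hours.
beowulf_wait_h = 0                        # available now
beowulf_run_h = 1 * HOURS_PER_WEEK        # solves the problem in 1 week
super_wait_h = 2 * HOURS_PER_WEEK         # next timeslot in 2 weeks
super_run_h = 4                           # solves the problem in 4 hours

beowulf_turnaround_h = beowulf_wait_h + beowulf_run_h
super_turnaround_h = super_wait_h + super_run_h

print(beowulf_turnaround_h, super_turnaround_h)
```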
Playstation2 at 5.5GFLOPS costs only $199 $40/GFL (Score:4, Insightful)
ago. Sorry, the AMD Beowulf cluster at $100/GFLOP just isn't that impressive.
Is that a real number or a marketing number? (Score:5, Insightful)
Then, of course, there is the issue of specialised chips vs normal chips. A GeForce 4 4400 can claim, roughly, 80 Gflops peak. That sure beats the hell out of any single CPU I've ever heard of, including the Power4. Thing is, the GeForce 4 is a graphics DSP; it isn't a general-purpose CPU. It can do that kind of math when all its units are working at what they do best, but try to reprogram it to do something else and it will slow to a crawl (for that matter, I'm not even sure that it is Turing complete).
So don't take any hype on a console to equate to real performance in a general task. Oh, and the BS marketing number I see for the PS2's Emotion Engine is 6.2Gflops.
Re:Is that a real number or a marketing number? (Score:3, Insightful)
Re:Playstation2 at 5.5GFLOPS costs only $199 $40/G (Score:2)
In cache maybe (Score:3, Informative)
Re:Playstation2 at 5.5GFLOPS costs only $199 $40/G (Score:2)
Also, the PS2 is not a supercomputer. It has a slow processor and very little RAM, so it wouldn't be able to do much number-crunching. You can't hook PS2s together, anyway, so comparing a single specialized machine to a cluster is absolutely meaningless.
Re:Can't hook ps2's together? (Score:3, Interesting)
Anyway, if you think you can do better with PS2s, why don't you do so?
Re:Playstation2 at 5.5GFLOPS costs only $199 $40/G (Score:2)
This beat the PS/2 (Score:2)
The previous price/performance champ was in fact a PS/2 cluster, mentioned here, but this AMD cluster is roughly three times the performance for the dollar. You can check the stats with different assumptions on their FAQ [aggregate.org] page, particularly the section labeled 'Is KASY0 really the first supercomputer under $100/GFLOPS?'
Re:Playstation2 at 5.5GFLOPS costs only $199 $40/G (Score:3, Informative)
Gah feel free to mod the previous version of this comment into oblivion, I hit submit accidentally.
The numbers you're looking at are marketing numbers first off, and overly generous. Second you don't scale for free - you never get anything like 100 times the performance of a single box when you wire 100 together, for the same reason that you don't get twice the horsepower out of an engine twice the size.
The previous price/performance champ was in fact a PS/2 cluster, mentioned here [com.com], but this AMD cluster
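One common way to illustrate the "you don't scale for free" point is Amdahl's law. To be clear, the poster does not cite it; it is brought in here as a simple model, and the 99% parallel fraction is an arbitrary assumption chosen for illustration. Real clusters also lose time to communication, which this model ignores.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup on `nodes` processors when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)

# ASSUMPTION: 99% of the work parallelizes perfectly.
speedup_100 = amdahl_speedup(0.99, 100)
print(f"{speedup_100:.1f}x on 100 nodes")
```

Even with only 1% serial work, 100 boxes give roughly half of the naive 100x in this model.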
Re:Please mod parent down (Score:2, Informative)
The burning question (Score:2, Funny)
Re:The burning question (Score:3, Insightful)
People don't share mp3s anymore; if they do, the FBI, NSA, Secret Service, CIA, and Homeland Security Dep will swarm them and put them in the bay.
I mean, I wish we could crack down like this on organized crime, or on domestic terrorists. I'm surprised we are so aggressive at arresting teenagers who download music, but the KKK and Neo Nazis can collect a million guns and spread their crazy hate speech and it's protected by freedom of speech.
I'd think that hate speech does more harm than copyright infringement
hot damn, they're case modders! (Score:2, Funny)
Re:hot damn, they're case modders! (Score:2)
That's probably why they did this:
For example, each case came with two side fans, which we converted into a redundant stack venting out the back. [aggregate.org]
Re:hot damn, they're case modders! (Score:2)
So much power... (Score:5, Funny)
--krahd
mod me up, scottie!
Comment removed (Score:4, Interesting)
Re: (Score:3, Informative)
Cooling (Score:4, Informative)
Here is the bill! (Score:5, Funny)
At the cheap introductory price of 699$ for 80 lines of code in the Linux kernel, it will cost you 8,377,500$ by kernel since we have discovered that in fact 1000000 lines of SCO IP were copied into Linux.
Designation
Linux kernel
So you must pay us only 1,118,400,000$, and in my kind almighty I will offer you a discount of 118,400,000$ so you only have to pay ONE BILLION DOLLAR if you pay before tomorrow!
Please send you creditcard number at darl@sco.com
Sincerely yours,
-- Darl Mac Bride
Re:Here is the bill! (Score:2)
Nice wiring! (Score:2, Insightful)
God forbid they use cable gutters
Other than that, kick ass job guys!
-nate
Re:Nice wiring! (Score:2)
Way to go! (Score:2)
Way to go Dr. Dietz!
So, mod me anyway you want, karma to burn.
How many universities have larger clusters? (Score:5, Interesting)
Cheers.
Re:How many universities have larger clusters? (Score:4, Funny)
Well... probably more than one, definitely no more than 500.
University students (Score:5, Insightful)
At the risk of being flamebait: No. Using university students is almost always purely a way of getting cheap labor to do semi-mindless, or completely mindless, stuff the staff doesn't want to do. It's a common myth that students 'learn' by doing grunt work. I should know: I have several grad student friends, and they've thus far spent a large part of their academic careers working in labs doing mind-numbingly boring stuff (according to them).
Imagine if a Bio lab did this. The following would sound pretty absurd: "Help us move our lab, you'll learn about cellular recombination!". No. You'll learn what a bunch of lab equipment looks like, how eccentric the professors are, and how expensive/fragile/heavy the equipment is, and the next morning what sore muscles are like. Let's get a reality check here.
(from the site):Our group develops the systems technology for cluster supercomputing; the more people we can show how to apply these technologies, the better.
Huh? What cluster supercomputing "technology" does assembling a PC and plugging it into ethernet teach you? Did they give a presentation about how clustering technology works, for example? Did they explain to each person, as they put a machine in a particular place and wired it to a particular switch, WHY it was going there etc? Obviously I wasn't there, so perhaps someone from the group can contribute on this point.
Re:University students (Score:5, Informative)
Dietz specializes in networking and all the wiring that you see in the photos is charted out by custom software that he's written just for this purpose.
He works in the realm of optimizing communications among the nodes to avoid network latency and so on. If you read the POVRay benchmarks, you'll notice that the author comments that several clusters' CPUs spend most of their time idle due to network latency. Dietz is researching the best ways to eliminate much of that latency so that the CPUs in the cluster can spend more of their time crunching data rather than just throwing off heat. To my knowledge, he is succeeding at this and better than most other researchers in the field.
As for what his students learned from this, I don't know exactly which students helped him on this. For KLAT2, there were several undergrad volunteers who helped with wiring and assembly, mostly from the campus Linux Users' Group. I know his grad students and research assistants are learning a lot about how clustering and network tech works, and a couple are doing their Ph.D. dissertations in this very subfield of E.E.
Re:University students (Score:2)
Re:University students (Score:2)
Petty 1 of
Re:University students (Score:2)
Re:University students (Score:2)
And yes microbiology students will still have to build their own apparatus for experiments they conduct - I only know this because I took a class in microbiology a while back and I had to build the apparatus for all the required experiments I had to do.
I'm guessing in this case they not only
In other news... (Score:2, Insightful)
-1
why not DSP? (Score:5, Interesting)
This price/performance ratio seems to make them very attractive compared to general purpose CPUs. According to the NASA G5 Study [cox.net], the P4 2.66 GHz is only able to achieve 255 MFLOP/s. And the P4 costs about 4x the price of the 6711 DSP.
It seems that DSPs should be the clear winner in supercomputer applications, what are their disadvantages and why are they not used? Granted there is a lack of mass produced hardware such as motherboards for DSPs, but that alone should not exclude them from the supercomputer realm.
Re:why not DSP? (Score:2)
That said, my Palm Tungsten is a combo of a GP processor and a DSP, as I believe are several Sony variants. Perhaps as I/O on handhelds improves (?) the
Actually... (Score:2)
Further, it would also accelerate the product enormously - Linux on a Chip would be blazingly fast, as it wouldn't take any processing power away from what it was running - thereby also reducing the cost per GFLOP.
Re:why not DSP? (Score:4, Informative)
DSPs are optimised to handle streamed data of a particular maximum size (e.g. 4-element floating-point variables). That is useful for image processing (red, green, blue, alpha) and 3D graphics (XYZW), but if you're modelling something like ocean currents or global weather, every data element is more than likely going to have more than four variables (e.g. temperature, humidity, velocity, pressure, salinity, ground temperature), so you may not get full optimisation.
Plus, you also need a means of getting all these processors to talk to each other. DSPs are nearly always optimised to operate in single pipelines, so they don't need much communication support (e.g. Sony Playstation 2). However, if you're designing a supercomputer system, the major bottleneck is the communication between processors (network topology). Some applications might only need adjacent processors to talk to each other (global weather simulation usually represents the atmosphere as a single large block of air, with sub-blocks assigned to separate processors). Other applications might assign individual processors to different tasks, which complete at different rates (e.g. the Mandelbrot set). A configurable network architecture allows the system to be used for many more different applications.
Re:why not DSP? (Score:2)
Re:why not DSP? (Score:2)
2. DSPs tend to not have a lot of RAM, whilst big modelling apps crave RAM (esp. raytracing).
Re:why not DSP? (Score:2)
True, I realize this. But I am under the impression that a lot of heavy duty number crunching algorithms have a minimum number of branches, and mostly just perform the same operations on multiple sets of data. Think of the FFT and simulations of systems based on differential equation models. This should include weather models and quite possibly nuclear events. These applications seem li
Mckenzie Cluster, faster, cheaper per TFlop (Score:5, Interesting)
Nice machine, but this January, CITA and the astro department at the University of Toronto brought a 256-node dual Xeon system online: "1.2 trillion floating point mathematical operations per second (Tflops) on the standard LINPACK linear algebra benchmark." Total cost: CDN$900K (including tax) (in January prices, that's $600K U.S. or $0.50USD/GFlop.) It's being used for some very cool Astro simulations...
See http://www.cita.utoronto.ca/webpages/mckenzie
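It is worth working the quoted McKenzie figures through with explicit units, since GFLOPS and MFLOPS are easy to mix up. The sketch below uses only the numbers given in the comment (about $600K U.S. and 1.2 TFLOPS on LINPACK); the variable names are illustrative.

```python
cost_usd = 600_000          # approximate U.S. dollar cost quoted in the comment
linpack_tflops = 1.2        # quoted LINPACK result

linpack_gflops = linpack_tflops * 1000
usd_per_gflops = cost_usd / linpack_gflops
usd_per_mflops = usd_per_gflops / 1000

print(f"${usd_per_gflops:.0f} per GFLOPS, ${usd_per_mflops:.2f} per MFLOPS")
```

On these inputs the $0.50 figure is per MFLOPS rather than per GFLOPS, so per GFLOPS the machine is faster than KASY0 in absolute terms but not cheaper.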
Re:Mckenzie Cluster, faster, cheaper per TFlop (Score:3, Informative)
How are these booted? (Score:2)
KASY0 nodes are completely diskless; there isn't even a floppy. (from the FAQ [aggregate.org])
So how are the nodes booted? Are there bioses out there that can netboot?
-c
Re:How are these booted? (Score:2)
Re:How are these booted? (Score:2)
There is Flop and Flop (Score:4, Insightful)
The same machine would yield average results on 64 bits. Difficult to draw attention without headline numbers...
And next week ... (Score:2)
overclocking (Score:2, Insightful)
Way to miss the whole point (Score:2)
Besides, nobody in their right mind would run a parallel program of any importance on a "rigged" setup like that.
Re:Way to miss the whole point (Score:2)
Re:Way to miss the whole point (Score:3, Interesting)
MOSIX is a parallel cluster oper
Pardon me Cowboy (Score:2)
Did you RTFA?
What about $170K (Score:4, Funny)
What a shame. Freeloaders. They would never be able to achieve such performance if not for the fruits of labour of SCO .. eeeh.. lawyers?
$100/GFLOP (Score:2)
university price calculations are bogus (Score:2)
Re:Hmm Math? (Score:2, Informative)
That's like 132 isn't it?
From the FAQ:
KASY0's configuration is:
128 + 4 "cold spare" PC nodes, each containing:
One AMD Athlon XP 2600+ (the 2.075GHz version)
One 512MB PC2700 DDR SDRAM
BioStar M7VIT Pro motherboard
Two Linksys LNE100TX NICs
Codegen 6042L case with 400W power supply
18 BenQ SE0024 24-port Fast Ethernet switches
405 Cat5 Fast Ethernet cables
RedHat Linux 9.0, modified Warewulf 1.11
So it's 128, the other 4 are spares!