Student and Professor Build Budget Supercomputer

Luke writes "This past winter, Calvin College professor Joel Adams and then-senior Tim Brom built Microwulf, a portable supercomputer with a peak performance of 26.25 gigaflops that cost less than $2,500 to construct, making it the most cost-efficient supercomputer Adams knows of. 'It's small enough to check on an airplane or fit next to a desk,' said Brom. Instead of many researchers having to share a single Beowulf cluster supercomputer, each researcher can now have their own."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday August 31, 2007 @03:57AM (#20421847)
    They just linked 4 motherboards together. My cat could do that.
  • by Bob MacSlack ( 623914 ) on Friday August 31, 2007 @04:10AM (#20421905)
    I guess reading the article is asking too much? There are 4 120mm case fans on it.
  • by QuantumG ( 50515 ) <qg@biodome.org> on Friday August 31, 2007 @04:13AM (#20421925) Homepage Journal
    http://www.calvin.edu/~adams/research/microwulf/budget/ [calvin.edu]

              AMD Athlon 64 X2 3800+ AM2 CPU x 4

    It's two clicks from the summary.

    Slack++

  • by Bananatree3 ( 872975 ) on Friday August 31, 2007 @04:48AM (#20422069)
    Motherboard: MSI K9N6PGM-F MicroATX [newegg.com] $62.99 * 4 = $251.96

    CPU: AMD Athlon 64 X2 3800+ AM2 CPU [newegg.com] $67.50 * 4 = $270

    Main Memory: Kingston DDR2-667 1GByte RAM [newegg.com] $48.49 * 8 + $4.99sh = $392.91

    Power Supply: (can't beat price): $76.00

    Network adapter (node to switch): (can't beat their price) $164.00

    Network adapter (switch to node): (can't beat their price) $15

    Switch: Trendware TEG-S80TXE 8-port Gigabit Ethernet Switch [newegg.com] $46.99+$7.04sh = $54.03

    Hard drive: Seagate 7200 250GB SATA hard drive [newegg.com] $69.99

    DVD/CD drive: (can't beat their price): $19

    Cooling: (can't beat their price): $32

    Fan protective grills: (can't beat their price): $10

    KVM: (can't beat their price): $50

    Grand total (incl. 15 in hardware): $1,416.89. $1,000 saved by using Newegg!

  • by dbIII ( 701233 ) on Friday August 31, 2007 @04:53AM (#20422097)
    Mobs like Verari were selling something similar a while ago - not cheap though and I can't see it on their web page now. What is nice now from other places is things like 2 x 8 core machines in 1U (maxtron and probably a few others). The relatively small supermicro boards in that thing would mean you could put a few in a server case - not cheap though.
  • One of the problems with supercomputers is that there aren't really very many of them, because of the size and cost. It means that the tools you use to run your supercomputing applications are similarly unusual. The skills to use and develop on parallel systems are then equally scarce. Access to a supercomputer isn't exactly common.


    Revolutionary? Everything old is new again...

    http://www.mini-itx.com/projects/cluster/ [mini-itx.com]
    http://news.taborcommunications.com/msgget.jsp?mid=494184&xsl=story.xsl [taborcommunications.com] -- 8-way parallel cluster that fits on an airplane for under 3 grand
    http://www-03.ibm.com/systems/bladecenter/ [ibm.com] -- a 7U chassis that holds 14 blades, and is a bit spendy, but not completely unreasonable for some situations
    http://www.linuxjournal.com/article/8177 [linuxjournal.com] -- My personal favorite, this page talks about several small portable miniclusters that have been made over the last six or seven years...

    Yes, 8 cores of Athlon64 is faster than 8 cores of low-power VIA CPU's from several years ago, but the concept isn't revolutionary, and there isn't a lot of headline-worthy engineering that goes into a project like this... I'm sure it's a very handy tool, and I'm not suggesting it shouldn't have been built, or that it was entirely trivial to build, but in the end, it's just four ordinary motherboards and ethernet.
  • by MacroRex ( 548024 ) on Friday August 31, 2007 @05:20AM (#20422243)
    Sorry for replying to myself, but I found an interesting paper [netlib.org] about the subject. Seems that a PS3 should have Rpeak of 14 Gflop/s with double precision floating point operations. Sounds to me that with a proper clustering solution a four-node PS3 cluster would be significantly faster than Microwulf. And it would probably be a smaller, too :)
  • Re:GigaFlops (Score:2, Informative)

    by skulgnome ( 1114401 ) on Friday August 31, 2007 @06:07AM (#20422439)
    Are your numbers for single-precision computation, or double-precision? Because the PS3's Cell only does amazingly quick floating-point on single-precision values. Double precision is six or seven times as slow.
  • Re:Lame. (Score:4, Informative)

    by GreatBunzinni ( 642500 ) on Friday August 31, 2007 @06:24AM (#20422485)

    And I guarantee that four "nodes", aka Linux PCs, are cheaper than $2500.

    Indeed. After I saw the component prices I was left dumbfounded. I mean, AMD Athlon 64 X2 3800+ processors at 165 dollars a pop? A Kingston 1GB DDR-667 stick of RAM at 124 dollars? Are they on drugs? I mean, I've just bought an Athlon 64 X2 4000+ EE for 68 euros (the 3800+ was selling for 59 euros) and each Kingston 1GB DDR-800 stick for 46 euros. Where did all the rest of the money go?

  • Re:Wussywulf? (Score:2, Informative)

    by Draconian ( 70486 ) on Friday August 31, 2007 @06:36AM (#20422527)

    they require special clusterish programming
    So ? On an SMP machine you need special SMP-ish programming. Great fun if your memory bandwidth runs out...

    Some problems run naturally on distributed systems, some on shared-memory systems. It's a matter of choosing the right machine for the task at hand. Programming in MPI isn't that hard, and unless you are network bound (either bandwidth or latency) it scales well. That is the equivalent of an SMP machine not being memory bound (bandwidth, latency, coherency,...)
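The MPI style mentioned above boils down to a simple pattern: each "rank" works on its own slice of the data, then the partial results are combined in a reduce step. Here is a minimal single-process sketch of that pattern in Python; it only illustrates the programming model (real MPI code would use mpi4py or C MPI calls, and the chunking scheme here is just one arbitrary choice):

```python
# Single-process sketch of the MPI scatter / local-compute / reduce pattern.
def parallel_sum(data, n_ranks):
    # "Scatter": split the data into one chunk per rank (round-robin here).
    chunks = [data[r::n_ranks] for r in range(n_ranks)]
    # Local compute: each rank sums its own chunk independently.
    partials = [sum(chunk) for chunk in chunks]
    # "Reduce": combine the partial results on the root rank.
    return sum(partials)

print(parallel_sum(list(range(100)), n_ranks=4))  # 4950
```

The point of the pattern is that the local-compute step needs no communication at all, which is why it scales well until the scatter/reduce traffic starts to dominate.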
  • by JurgenThor ( 675394 ) on Friday August 31, 2007 @07:01AM (#20422641)
    Your first assumption ("So 1 Hz equals 1 FLOP?") is wrong. FLOPS stands for Floating-point Operations Per Second.

    http://en.wikipedia.org/wiki/FLOPS [wikipedia.org]
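The usual back-of-the-envelope peak figure multiplies sockets, cores, clock rate, and floating-point operations issued per cycle. A minimal sketch, where the flops-per-cycle value is an assumption for illustration (the real figure depends on the microarchitecture):

```python
# Theoretical peak GFLOPS = sockets x cores/socket x clock (GHz) x FLOPs per cycle.
# flops_per_cycle below is an assumed, illustrative figure.
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle

# A Microwulf-like config: 4 single-socket boards with dual-core 2.0 GHz CPUs,
# assuming 2 double-precision FLOPs per cycle per core.
print(peak_gflops(4, 2, 2.0, 2))  # 32.0
```

Measured peaks (e.g. Linpack Rmax) always come in below this theoretical ceiling, which is why quoted numbers like 26.25 GFLOPS are lower than the raw multiply suggests.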
  • by locster ( 1140121 ) on Friday August 31, 2007 @07:21AM (#20422743)
    Am I missing something here? The Sisoft Sandra MFLOPS measurement for a top-end Intel Core 2 is 47 GFlops http://www.tomshardware.co.uk/overclocking-intel,review-2395-28.html/ [tomshardware.co.uk]. OK, admittedly this is a synthetic measurement, but it's a ballpark figure, right?

  • by noahisaac ( 956470 ) on Friday August 31, 2007 @07:25AM (#20422755) Homepage

    So 1 Hz equals 1 FlOp? And a 3.2 GHz CPU can do 3.2 gigaflops, right?
    No, one hertz is one cycle of the processor.

    Can they execute multiple FlOps per tick then?
    Yes. A pipelined processor breaks each instruction into several stages and can have multiple instructions in flight at once. Typically, the stages are something like:

    1. fetch (an instruction from memory)
    2. decode the instruction
    3. execute the instruction
    4. access (some memory location)
    5. writeback (some values calculated during this cycle)

    In reality, this cycle is usually more complex and processors are designed to predict certain events in order to pack more into a single processor cycle. On top of this, note that the processors used in this machine are all dual-core processors. This means that instead of the 4 processors listed on the hardware manifest, it's really more like 8 processors (well, not quite).

    And do we care that these will bottleneck at the rather limited bus (even forgetting about the switch).
    No.

    Hey, those computer engineering classes I was forced to take as a part of my CS major have actually proven useful! Oh wait, this is Slashdot.
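The fetch/decode/execute steps above can be sketched as a toy interpreter loop. This is only a conceptual model of the instruction cycle over a made-up three-instruction ISA, not how any real pipeline is implemented:

```python
# Toy fetch-decode-execute loop. Each iteration of the while loop stands in
# for one (multi-stage) instruction cycle.
def run(program):
    pc, acc, memory = 0, 0, {}
    while pc < len(program):
        op, arg = program[pc]          # fetch the next instruction
        if op == "LOAD":               # decode + execute
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "STORE":            # memory access + writeback
            memory[arg] = acc
        pc += 1                        # advance the program counter
    return acc, memory

acc, mem = run([("LOAD", 5), ("ADD", 3), ("STORE", "x")])
print(acc, mem)  # 8 {'x': 8}
```

Real hardware overlaps these stages across instructions (pipelining), which is exactly what breaks the naive "1 Hz = 1 instruction" intuition discussed in this thread.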
  • by Anonymous Coward on Friday August 31, 2007 @07:38AM (#20422823)
    Yes, there is a gigabit NIC. Terrasoft even sells preconfigured clusters of PS3, though you end up paying much more than just the PlayStations when you want a full configuration including a head node.

    http://www.terrasoftsolutions.com/store/purchase.php?submit=hardware&submitimg%5Bhardware%5D%5Bsony%5D=1 [terrasoftsolutions.com]
  • Re:Imagine... (Score:2, Informative)

    by Anonymous Coward on Friday August 31, 2007 @07:57AM (#20422951)
    Hmmm....

    NCSU's Computer Science Dept. has a PS3 cluster topping out at 218 GFLOPS using 8 PS3s. PS3s are now $500 each, so that's quite a bit better in terms of bang for the buck. It's even better than the reduced-price PC from Newegg.

    http://moss.csc.ncsu.edu/~mueller/cluster/ps3/ [ncsu.edu]

    http://moss.csc.ncsu.edu/~mueller/cluster/ps3/coe. html [ncsu.edu]

  • Re:the google way (Score:1, Informative)

    by Anonymous Coward on Friday August 31, 2007 @08:22AM (#20423121)
    google hasn't built a rack like that since the first year they were in business...

    these guys did though, and it looks like the prof and the student just copied the concept: ultra cheap cluster computer [clustercompute.com]
  • by jaweekes ( 938376 ) on Friday August 31, 2007 @08:39AM (#20423253)
    I was always told that it took at least 2 clock cycles for a processor to do one instruction, but that was back in 1991 when I took my electronics degree.

    A processor normally takes 2-3 clock pulses to perform any instruction, as it cannot perform the operation in the same clock cycle that it receives the operation in. If the operation requires a call to a memory location it will take 3 cycles (one to get the info from the memory location) which is why pre-fetch is so important in modern processors.

    A cycle is triggered by the rising edge of the clock pulse. Whatever the computer does must be completed before the start of the next cycle.

    The instruction execution cycle is triggered by the clock cycle, but has several stages
    - Each stage is triggered by successive clock pulses
    - The exact timing depends on the details of a particular machine
    - A complete instruction cycle usually takes several clock cycles to execute

    The instruction cycle is divided into several stages
    - In some machines, some of these stages are performed simultaneously, which speeds things up
    - The stages are common to most architectures

    Sometimes called the Fetch-Decode-Execute Cycle.


    Taken from this pdf [google.com].
  • by Waffle Iron ( 339739 ) on Friday August 31, 2007 @08:47AM (#20423323)

    A processor normally takes 2-3 clock pulses to perform any instruction

    A modern processor may in fact take a dozen or more clock cycles to finish a single instruction. However, by utilizing pipelining, reordering and multiple execution units, a single core may be working on upwards of 50 instructions at once. The resulting throughput can be several instructions per clock on each core.
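The parent's throughput claim follows from simple pipeline arithmetic: with a D-stage pipeline, N instructions retire in roughly D + (N - 1) cycles instead of N * D. A minimal sketch that ignores stalls and hazards (the stage count is illustrative, not that of any particular CPU):

```python
# Cycles to retire n instructions on a depth-d pipeline, ignoring stalls/hazards.
def unpipelined_cycles(n, d):
    return n * d                 # each instruction runs start-to-finish alone

def pipelined_cycles(n, d):
    return d + (n - 1)           # fill the pipeline once, then retire one per cycle

n, depth = 1000, 12
print(unpipelined_cycles(n, depth))  # 12000
print(pipelined_cycles(n, depth))    # 1011 -> throughput approaches 1 instr/cycle
```

Superscalar designs go further by retiring several instructions per cycle, which is how the "several instructions per clock on each core" figure above arises.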

  • Re:Imagine... (Score:3, Informative)

    by mikael ( 484 ) on Friday August 31, 2007 @09:07AM (#20423463)
    That's probably what would happen if a dozen of these systems were made. Instead of a system in each office, they would probably be placed in a lab, if not in a server room somewhere with remote access through a thin client.
  • GPU cluster (Score:2, Informative)

    by ZonkerWilliam ( 953437 ) * on Friday August 31, 2007 @09:33AM (#20423681) Journal
    Although not as cheap as the Microwulf, Nvidia has a desktop supercomputer for sale http://www.nvidia.com/object/tesla_deskside.html [nvidia.com] at 500 gigaflops, to start.
  • Re:Definition? (Score:5, Informative)

    by mikael ( 484 ) on Friday August 31, 2007 @09:49AM (#20423827)
    The basic definition of a supercomputer is a system which has top performance compared to other computer systems (within the top 500 or 100).

    In the past, this could only be achieved by using custom CPU's for pipelining or parallel processing. Processors in the Cray supercomputers had extremely deep vector pipelines, which was good for three-dimensional simulations like CFD or computer animation. Other systems followed the parallel-processing route: the Connection Machine had 2^16 one-bit processors, which was good for encryption/decryption, while others used standard CPU's (Intel 80x86's, DEC Alpha's and M680x0's) connected together through a high-speed bus network.

    The different types of systems could be defined according to how these processed instructions/data.

    SISD - Single Instruction, Single Data - Early home computer
    SIMD - Single Instruction, Multiple Data - Vector processors
    MISD - Multiple Instruction, Single Data - Fault tolerant systems
    MIMD - Multiple Instruction, Multiple Data - Parallel processing CPU's

    Some systems had hardwired interconnect configurations - either a 2D square grid, a 3D square grid or torus network, or even star networks - while others had dynamic routing capability. Transputers only knew about the adjacent processors in the four compass directions (NESW).

    But all of these techniques have been incorporated into mainstream CPU's now - you now have dual-core and quad-core CPU's that can be used by laptops.

    Modern day methods are to make the systems super-scalar. Multi-core CPU's can be arranged side by side onto multi-CPU boards which in turn can be rack mounted into chassis which communicate through high-speed interconnect systems. There is no limit on the number of racks that can be used except space and money.
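The SISD/SIMD distinction in the taxonomy above can be sketched in plain Python: a scalar loop applies one instruction to one data element per step, while a SIMD-style operation conceptually applies one instruction across a whole vector at once. Real SIMD uses hardware vector registers; this only illustrates the programming model:

```python
def sisd_add(a, b):
    # SISD: one instruction operates on one data element per step.
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out

def simd_add(a, b):
    # SIMD, conceptually: one "add" applied across all lanes of the vector.
    return [x + y for x, y in zip(a, b)]

print(sisd_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(simd_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

Both produce the same answer; the difference that matters in hardware is that the SIMD version is a single operation over many lanes rather than many sequential operations.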
  • 1999 called (Score:3, Informative)

    by Sangui5 ( 12317 ) on Friday August 31, 2007 @10:44AM (#20424611)
    They want their slowest Top 500 machine back...

    List of #500 on the TOP500 by year
    Year . .- RPeak . . . | Machine's owner and country | Make & Model
    06/1998 - 15.0 GFLOPS | Southwestern Bell, USA. . . | HPC 6000, Sun
    11/1998 - 20.5 GFLOPS | Koeln Universitaet, Germany | HPC 10000 Sun
    06/1999 - 34.2 GFLOPS | CIEMAT, Spain . . . . . . . | T3E900 Cray
    11/1999 - 38.4 GFLOPS | Bank, United States . . . . | HPC 10000 400 MHz, Sun
    06/2000 - 51.2 GFLOPS | EDS, United States. . . . . | HPC 10000 400 MHz, Sun
    11/2000 - 78.0 GFLOPS | Zurich American, USA. . . . | SP Power3 375MHz, IBM

    Really, calling this a supercomputer is lame. It has only one 250GB disk; it will have utter crap IO performance. Most compute heavy jobs are also disk heavy because you want to checkpoint your intermediate results in case of a crash. Since there is only one disk, one machine must be serving it up to the others (NFS, ISCSI, whatever). It is clustered through gigabit ethernet, which will act as a limit on performance. They even skimped on the connection to the outside world and got a 100MBit card. "Real" clusters use Infiniband or Myrinet, both of which are optimized for high throughput with low latency and low contention. Gigabit is not. Linpack is rather kind to clusters; more finely grained parallel tasks will pay more for the poor linkup.

    Also, with only 4 processors one could also build a 4-way SMP machine which would then not have to deal with any sort of message passing at all. You instead get one shared memory interface. It may be slightly NUMA, but the extra latency cost of hypertransport is amazingly low. Instead, by putting only one dual-core die per motherboard, you have to jump through hoops to move work from one die to another, and pay really bad latency costs. You could also do better with 2 quad-core processors on the same mobo (although you'd have to go Intel for now...). It's easier to program, supports finer grained parallelism, and allows potential savings on other parts.

    I can get 2 quad-core Xeons at 2.4 GHz each and a 2 socket motherboard for $820 at newegg. They spent $980 on 4 dual-core 2GHz processors and 4 single socket motherboards. They also spent $240 on gigabit cards and the switch. So, for $400 less, I can have an SMP machine; one which probably has higher floating point performance as well. Rather than 4 cheap power supplies I can get one nice one (which is probably more efficient too). Further, I don't have to run 3 of my nodes diskless. Really, at this small scale a cluster is not the way to go.

  • by AJWM ( 19027 ) on Friday August 31, 2007 @11:21AM (#20425193) Homepage
    That wasn't true even in 1991, except maybe for Intel processors, which are notoriously wasteful of clock cycles (which is why they have always advertised clock speed rather than instructions per second). A 1 MHz MOS 6502 was just as fast as a 4.77 MHz Intel 8088 (and needed fewer support chips).

    If you throw more transistors at the problem, and/or different architectures, you can complete instructions in a single cycle. (Especially e.g. register-to-register instructions where the answer comes out of the inputs at the speed of propagation delay through the gates.) If you do it right, you can even design clockless CPUs where the completion of the previous instruction triggers the start of the next one without waiting on an external clock.

    This assumes your CPU is not microprogrammed; the instruction words contain the relevant bitmasks for source and destination registers as well as the control code for the ALU. See for example the PDP-11 instruction set (IIRC).

    Of course as others have pointed out, modern processors pipeline multiple instructions at different stages of execution at the same time, for a net throughput of multiple instructions per cycle.
  • Which is fine (Score:4, Informative)

    by Sycraft-fu ( 314770 ) on Friday August 31, 2007 @11:57AM (#20425655)
    But you aren't really a supercomputer at that point, you're a cluster. These days the line is more blurred than in the past, but more or less the difference is interconnect speed. In a real supercomputer, there are very high speed interconnects, so you can run things that heavily rely on one part communicating with another, like particle simulations. That's why the US Department of Energy buys so many, rather than clusters. They do things like weather simulation and simulation of nuclear weapons, where every node has to be able to talk to every other node with essentially no penalty.

    Now if you have a job that doesn't use a lot of inter-node communication, like say 3D rendering, then a cluster is a better answer. Normal hardware with Ethernet interconnects. Works great and is cheap since you can use commodity parts. But don't confuse that cluster with a real super computer, you throw one of those intense inter node problem at it, it'll fall over because the interconnects are too slow.

    Unfortunately these days people really blur the distinction. You'll see systems on the top 500 list that are really questionable. It'll be commodity hardware connected with something like InfiniBand. Ok, great, that is faster (both more bandwidth and less latency) than Ethernet, but it still isn't necessarily up to what you'd get from a real supercomputer.

    However in the case of this deal, no, not a super computer. It's a small cluster and they are just calling it a super computer as marketing, effectively.
  • PS3Wulf (Score:3, Informative)

    by Doc Ruby ( 173196 ) on Friday August 31, 2007 @12:17PM (#20425905) Homepage Journal

    Also in 2003, the University of Illinois at Urbana-Champaign's National Center for Supercomputing Applications built the PS 2 Cluster for about $50,000.

    The PS3 comes out of the box with a Cell uP [wikipedia.org] that gets something like 20 GFLOPS [stanford.edu] on each $500 PS3. It's already networked into clustered supercomputing [wikipedia.org] like this MicroWulf.

    A $500 PS3 has 20 of the 26.5 GFLOPS the $2800 MicroWulf has. MicroWulf runs Ubuntu, which can also run on PS3 [psubuntu.com]. If people can port Linux libraries like Mesa/OpenGL/X to the PS3 SPEs, where most of the power lies, then we'd be looking at $25:GFLOPS, not the $94:GFLOPS on the MicroWulf.

    And while taking a break, you can play Gran Turismo 5, and 40 more games you can afford with the money you save on HW.
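The dollars-per-gigaflop comparisons running through this thread all reduce to one division. A minimal sketch using figures as quoted in the comments (the prices and GFLOPS numbers are as reported here, not independently verified):

```python
# Cost efficiency in dollars per GFLOPS, the metric used throughout this thread.
def dollars_per_gflop(price_usd, gflops):
    return price_usd / gflops

# Microwulf at the summary's roughly $2,500 and 26.25 GFLOPS peak:
print(round(dollars_per_gflop(2500, 26.25), 1))  # 95.2, close to the quoted $94:GFLOPS
# A single PS3 at the $500 / 20 GFLOPS figures quoted in the parent:
print(dollars_per_gflop(500, 20))                # 25.0
```

Whether the PS3 figure is achievable in practice hinges on the double-precision caveat raised earlier in the thread, since the Cell's headline numbers are single-precision.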
