Intel Supercomputing

TACC "Stampede" Supercomputer To Go Live In January

Nerval's Lobster writes "The Texas Advanced Computing Center plans to go live on January 7 with "Stampede," a ten-petaflop supercomputer predicted to be the most powerful Intel supercomputer in the world once it launches. Stampede should also be among the top five supercomputers in the TOP500 list when it goes live, Jay Boisseau, TACC's director, said at the Intel Developer Forum Sept. 11. Stampede was announced a bit more than two years ago. Specs include 272 terabytes of total memory and 14 petabytes of disk storage. TACC said the compute nodes would include "several thousand" Dell Stallion servers, with each server boasting dual 8-core Intel E5-2680 processors and 32 gigabytes of memory. In addition, TACC will include a special pre-release version of the Intel MIC, or "Knights Corner" architecture, which has been formally branded as Xeon Phi. Interestingly, the thousands of Xeon compute nodes should generate just 2 petaflops' worth of performance, with the remaining 8 generated by the Xeon Phi chips, which provide highly parallelized computational power for specialized workloads."

  • by afidel ( 530433 ) on Thursday September 13, 2012 @02:11AM (#41320799)

    I wonder why it has so little memory. You can easily run 64GB per socket at full speed with the E5-2600 (16GB x 4 channels) without spending that much money. Heck, for maybe 10% more you can run 128GB per socket (you need RDIMMs to run two 16GB modules per bank). They're apparently only running one 16GB DIMM per socket (any other configuration would be slower on the E5), which IMHO is crazy, as you're going to have a hard time keeping 8 cores busy with such a small amount.
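The numbers in this comment can be checked with a few lines of arithmetic. This is only a rough sketch in Python, using the figures quoted in the summary and the comment (32 GB per dual-socket node, 8 cores per socket, 16 GB DIMMs, 4 memory channels per socket). The function and constant names are made up for illustration.

```python
def memory_per_socket_gb(dimm_gb, channels, dimms_per_channel=1):
    """Memory per socket for a given DIMM population (illustrative helper)."""
    return dimm_gb * channels * dimms_per_channel


# Figures taken from the story summary and the comment above.
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
NODE_MEMORY_GB = 32

# What the announced configuration works out to.
per_socket_now = NODE_MEMORY_GB / SOCKETS_PER_NODE
per_core_now = per_socket_now / CORES_PER_SOCKET

# The two alternatives the comment proposes.
one_dimm_per_channel = memory_per_socket_gb(16, 4)
two_dimms_per_channel = memory_per_socket_gb(16, 4, dimms_per_channel=2)

print(f"announced: {per_socket_now:.0f} GB/socket, {per_core_now:.0f} GB/core")
print(f"16 GB x 4 channels: {one_dimm_per_channel} GB/socket")
print(f"two 16 GB modules per channel: {two_dimms_per_channel} GB/socket")
```

So the announced nodes come out at 16 GB per socket, or 2 GB per core, against the 64 GB and 128 GB per socket the comment says are achievable.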

  • by loufoque ( 1400831 ) on Thursday September 13, 2012 @06:43AM (#41321773)

    You will be parallelizing, and each thread will only ever be able to use max_mem/N for its own processing.
    When you parallelize, you avoid sharing memory between threads. Your data set is split over the threads and synchronization is minimized. In an SMP/NUMA model, this is done transparently by simply not accessing memory that other threads are working on. In other models, you have to explicitly send each thread the chunk of memory it will be working on (through DMA, the network, an in-memory FIFO or whatever), but it doesn't change anything from a conceptual point of view.
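The decomposition described here can be sketched in Python. This is an illustration of the idea only, not anyone's actual HPC code: the data set is cut into disjoint chunks, each worker touches only its own chunk, and the only synchronization is combining the partial results at the end. The function names are hypothetical, and because CPython's GIL serializes pure-Python threads, this shows the structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(data, n_workers):
    """Split data into n_workers contiguous, non-overlapping chunks."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(data[start:start + size])
        start += size
    return chunks


def partial_sum_of_squares(chunk):
    # Works only on its own chunk; no state is shared with other workers.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    chunks = split_evenly(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # The single synchronization point: combine the per-worker results.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), n_workers=3))
```

Note that the list slices are copies, which is closer to the "explicitly send the chunk" model than to the shared-memory one; conceptually the split is the same either way.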

    If your parallel decomposition is much more efficient with more than 1GB of data per thread, then you cannot possibly run 64 threads set up like this on the MIC platform. There is often a minimum size required for a parallel primitive to be efficient, and if that minimum size is greater than max_mem/N then you have a problem. This is the limiting factor I'm talking about.
    128 MB per thread, however, is IMO large enough.
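The max_mem/N constraint can be written down as a small check. This is a sketch in Python under one loudly stated assumption: the 8 GB of card memory is not given in the thread, it is only what 128 MB x 64 threads implies. The function names are invented for illustration.

```python
def per_thread_budget_mb(total_mem_mb, n_threads):
    """max_mem / N: the memory each thread can use for its own chunk."""
    return total_mem_mb / n_threads


def decomposition_fits(total_mem_mb, n_threads, min_chunk_mb):
    """True if the minimum efficient chunk fits in each thread's budget."""
    return min_chunk_mb <= per_thread_budget_mb(total_mem_mb, n_threads)


# ASSUMPTION: 8 GB of card memory, inferred from 128 MB x 64 threads.
CARD_MEM_MB = 8 * 1024
N_THREADS = 64

budget = per_thread_budget_mb(CARD_MEM_MB, N_THREADS)
needs_1gb = decomposition_fits(CARD_MEM_MB, N_THREADS, min_chunk_mb=1024)
needs_128mb = decomposition_fits(CARD_MEM_MB, N_THREADS, min_chunk_mb=128)

print(f"budget per thread: {budget:.0f} MB")
print(f"1 GB minimum chunk fits: {needs_1gb}")
print(f"128 MB minimum chunk fits: {needs_128mb}")
```

Under that assumption a primitive that needs 1 GB per thread does not fit with 64 threads, while one that is efficient at 128 MB or less does.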

    In fact this is a major advantage of MIC versus GPUs.

    The advantage of MIC lies in ease of programming thanks to compatibility with existing tools and the more flexible programming model.
    Memory on GPUs is global as well, so I have no idea what you're talking about. There is also so-called "shared" memory (CUDA terminology, OpenCL is different) which is per block, but that's just some local scratch memory shared by a group of threads.

    There is nothing nightmarish about the above

    Please stop distorting what I'm saying. What is nightmarish is finding the optimal work distribution and scheduling for a heterogeneous or irregular system.
    Platforms like GPUs are only fit for regular problems. Most HPC applications written using OpenMP or MPI are regular as well. Whether the MIC will be able to enable good scalability of irregular problems remains to be seen, but the first applications will definitely be regular ones.
