 
			
		
		
	
		
		
		
		
		
		
			
				 
			
		
		
	
		
		
		
		
			
				 
			
		
		
	
    
	10-Petaflops Supercomputer Being Built For Open Science Community 55
			
		 	
				An anonymous reader tips news that Dell, Intel, and the Texas Advanced Computing Center will be working together to build "Stampede," a supercomputer project aiming for peak performance of 10 petaflops. The National Science Foundation is providing $27.5 million in initial funding, and it's hoped that Stampede will be "a model for supporting petascale simulation-based science and data-driven science." From the announcement:
"When completed, Stampede will comprise several thousand Dell 'Zeus' servers with each server having dual 8-core processors from the forthcoming Intel Xeon Processor E5 Family (formerly codenamed "Sandy Bridge-EP") and each server with 32 gigabytes of memory. ... [It also incorporates Intel 'Many Integrated Core' co-processors,] designed to process highly parallel workloads and provide the benefits of using the most popular x86 instruction set. This will greatly simplify the task of porting and optimizing applications on Stampede to utilize the performance of both the Intel Xeon processors and Intel MIC co-processors. ... Altogether, Stampede will have a peak performance of 10 petaflops, 272 terabytes of total memory, and 14 petabytes of disk storage."
		 	
		
		
		
		
			
		
	
What is a Dell 'Zeus' server? (Score:4, Informative)
The article mentions that it's using Dell 'Zeus' servers, but the only information I can find about those servers online is that they are being used to build this cluster.
What is a Dell 'Zeus' server?
Impressive if it were built today. (Score:4, Informative)
By 2013, 10 petaflops will be a competent, but not astonishing system. Probably top 10-ish on the top500 list.
The interesting part here will be the MIC parts, from intel, to see if they perform better than the graphics cards everyone is putting into super computers in 2011 and 2012. The thought is that the MIC (Many Integrated Cores) design of knights corner are easier to program. Part of this is because they are x86-based, though you get little performance out of them without using vector extensions. The more likely advantage is that the cores are more similar to CPU cores than what one finds on GPUs. Their ability to deal with branching code, and scalar operations is likely to be better than GPUs, though far worse than contemporary CPU cores. (The MIC cores are derived from the Pentium P54C pipeline)
In the 2013 generation, I don't think the distinction between MIC and GPU solutions will be very large. the MIC will still be a coprocessor attached to a fairly small pool of GDDR5 memory, and connected to the CPU across a fairly high-latency PCIe bus. Thus, it will face most of the same issues GPGPUs face now; I fear that this will only work on codes with huge regions of branchless parallel data, which is not many of them. I think the subsequent generation of MIC processors may be much more interesting. If they can base the MIC core off of atom, then you have a core that might be plausible as a self-hosting processor. Even better, if they can place a large pool of MIC cores on the same die as a couple of proper Xeon cores. If the CPU cores and coprocessor cores could share the memory controllers, or even the last cache level, one could reasonably work on more complex applications. I've seen some slides floating around the HPC world, which hint at intel heading in this direction, but it's hard to tell what will really happen, and when.