Supercomputing IBM

IBM's Eight-Core, 4-GHz Power7 Chip

pacopico writes "The first details on IBM's upcoming Power7 chip have emerged. The Register is reporting that IBM will ship an eight-core chip running at 4.0 GHz. The chip will support four threads per core and fit into some huge systems. For example, the University of Illinois is going to house a 300,000-core machine that can hit 10 petaflops. It'll have 620 TB of memory and support 5 PB/s of memory bandwidth. Optical interconnects, anyone?"


  • by badboy_tw2002 ( 524611 ) on Monday July 14, 2008 @10:19PM (#24190701)

    If your process looks like this:

    int main(void)
    {
        while (something) {
            doSomething();
        }
        return 0;
    }

    It will hit 100% on one core and that's it. It's not multithreaded - one CPU will churn on it forever and the others will sit around waiting for a task from the OS. Two cores, 200,000 cores, the results will be the same. These machines are made for tasks that are broken up into lots of smaller jobs and processed individually (a rough sketch of that approach follows below). It's not magic - more cores won't get a single-threaded process done faster.

    Seriously.
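    For contrast, here's a rough sketch of the "lots of smaller jobs" approach using POSIX threads. The thread count, array size, and do_chunk() function are made up purely for illustration, not anything from the article:

    /* build with: gcc -O2 -pthread split.c */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 8
    #define N 8000000

    static double data[N];
    static double partial[NUM_THREADS];

    /* each thread sums its own slice of the array */
    static void *do_chunk(void *arg)
    {
        long id = (long)arg;
        long lo = id * (N / NUM_THREADS);
        long hi = lo + (N / NUM_THREADS);
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i];
        partial[id] = sum;      /* private slot, no locking needed */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_THREADS];
        double total = 0.0;

        for (long i = 0; i < N; i++)
            data[i] = 1.0;

        for (long t = 0; t < NUM_THREADS; t++)
            pthread_create(&tid[t], NULL, do_chunk, (void *)t);
        for (long t = 0; t < NUM_THREADS; t++)
            pthread_join(tid[t], NULL);

        for (long t = 0; t < NUM_THREADS; t++)
            total += partial[t];
        printf("total = %f\n", total);
        return 0;
    }

    Each slice of the loop is independent, so the work spreads across cores; the single-threaded while loop above has no such seam to cut along.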

  • Re:Core pron (Score:3, Informative)

    by McGiraf ( 196030 ) on Monday July 14, 2008 @10:41PM (#24190925)

    Veni, vini, vomi.

  • by tchuladdiass ( 174342 ) on Monday July 14, 2008 @10:47PM (#24190971) Homepage

    The funny thing is that it teeter-totters back and forth from one core to the other. I wish I knew what made it do that.

    The OS runs the process a few milliseconds at a time, then kicks the process off the CPU so another process can run (if there is one, including OS tasks such as I/O routines). When the OS starts the process up again for a few more milliseconds, it may start it on a different core. That is why both cores will show 50% average utilization.

    Now if you set CPU affinity for that process to be on one core, then it will max that core out at 100% and the other core will be idle. This may result in better performance, because you get better cache utilization if the process stays on the same core.

    On a related topic, this can also be the case if the app is multithreaded -- sometimes it is more efficient to run multiple threads on the same CPU instead of across CPUs, if each thread is accessing the same region of memory. Otherwise, if the threads are on different CPUs or cores, then the threads are constantly invalidating the cache on the other core, causing more (expensive) reads/writes to main memory.
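    To make the affinity point concrete, here's a minimal Linux-specific sketch that pins the calling process to one core with sched_setaffinity(); core 0 is an arbitrary choice, and the taskset command does the same thing from the shell:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);      /* allow only CPU 0 */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        for (;;)                /* busy loop now stays on core 0 */
            ;
    }

    Run it and one core sits at 100% while the other idles, exactly as the parent describes.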

  • by Anonymous Coward on Monday July 14, 2008 @10:52PM (#24191007)

    Actually, POWER6 has an effective two threads per core. If I boot a system with 16 cores and SMT enabled, AIX sees 32 "processors." With SMT disabled, I see 16 "processors."
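    A program can see that doubling for itself; here's a small sketch using sysconf() (a common extension on AIX and Linux, not strict POSIX):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* counts logical processors, i.e. SMT threads, not physical cores */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("online processors: %ld\n", n);   /* 32 on that 16-core box with SMT on */
        return 0;
    }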

  • by Annymouse Cowherd ( 1037080 ) on Monday July 14, 2008 @10:53PM (#24191025) Homepage

    No, each core is running at 4 GHz. That does not add up to 16 GHz of processing power, though, because only multithreaded programs can take advantage of more than one core at once, and they still have to wait if they're sharing data.

  • by vux984 ( 928602 ) on Monday July 14, 2008 @10:54PM (#24191033)

    Aren't a lot of games and apps single-threaded? Hmmm. I figured that dual/quad-core wasn't all it's cracked up to be. So, essentially, if I have a single-threaded app on a quad core, it'll perform at 1/4th the potential speed.

    Yes, although most high-end games and game engines actually are multi-threaded. Few are designed to take advantage of more than 2 cores, though, and none that I know of will use 8 or 300,000...

    So, essentially, if I have a single-threaded app on a quad core, it'll perform at 1/4th the potential speed.

    Not necessarily. If you have 3 women, can you make a baby in 3 months instead of 9? Given that it still takes 9 months and 2 of the women are idle, would you say that those women are performing at 1/3rd of their potential speed? The same sort of logic applies here. If the task is inherently sequential, having more cores (or ladies) won't make it any faster.

    Some things -are- highly parallelizable, like ray-tracing or cutting down all the trees in a forest... and other things are only partly parallelizable, like changing tires (a pit crew can change 4 tires at once, but adding more staff so you could change 5 tires at once doesn't make your team any faster...).

    That doesn't leave me with a warm and fuzzy feeling inside.

    Yes, in general computing applications an 8 GHz CPU would be faster than a 2 GHz quad core. (Even in optimally parallelizable situations the 2 GHz quad core would only just barely surpass the 8 GHz CPU, thanks to lower task-switching overhead.) So the faster single CPU is almost always better. The reason we have 2 GHz quad-core CPUs is that they are much, much more practical to actually make, and a lot of the stuff that takes a long time (rendering 3D, encoding movies, etc.) is actually highly parallelizable, so we do see a benefit. And much of the single-threaded sequential stuff we see is waiting on hard drive performance, network bandwidth, or user input, so the CPU isn't the bottleneck there anyway. (There's a quick Amdahl's-law sketch at the end of this comment if you want rough numbers.)

    The funny thing is that it teeter-totters back and forth from one core to the other. I wish I knew what made it do that.

    If you look at Task Manager, there are, what, some 40+ processes running? The OS rotates them onto and off of the 2 cores based on what they all need in terms of CPU time. So your 'CPU-heavy task' gets pulled off a core to give another task a timeslice, and once it's off, it can be scheduled back onto either core. Ideally it should stay on one core to maximize level-one cache hits, etc., but if it's been off the core long enough for the other processes to cache all new memory, it doesn't really matter which one it gets assigned to, and in any case flipping from one to the other every now and then makes an almost immeasurably small performance difference.

    btw - the 'set processor affinity' feature tells the OS that you really want this process to run on a given CPU/core instead of hopping around. But in most cases it's not something one needs (or gains any benefit from) doing.
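    The women-and-babies intuition is Amdahl's law in disguise: speedup = 1 / ((1 - p) + p/n), where p is the fraction of the work that parallelizes and n is the core count. A tiny back-of-the-envelope sketch (the 90% figure below is an assumed example, not a measurement):

    #include <stdio.h>

    /* Amdahl's law: how much faster n cores make a task whose
       parallelizable fraction is p */
    static double speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        int cores[] = { 1, 2, 4, 8, 300000 };
        int ncores = sizeof cores / sizeof cores[0];
        double p = 0.90;        /* assume 90% of the work parallelizes */

        for (int i = 0; i < ncores; i++)
            printf("%6d cores -> %.2fx\n", cores[i], speedup(p, cores[i]));
        /* the ceiling is 1/(1-p) = 10x, no matter how many cores you add */
        return 0;
    }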

  • by Anonymous Coward on Monday July 14, 2008 @10:56PM (#24191051)

    A geek's wet dream if there ever was one.

    Toshiba actually announced [cnet.co.uk] such a beast (albeit with fewer SPEs, which might be a way to use slightly defective Cell chips to increase yields).
    The only problem is that if this is only used in one machine, no one is going to bother writing applications for it.

  • by Zeussy ( 868062 ) on Monday July 14, 2008 @11:06PM (#24191147) Homepage
    Or, the other thing I like about dual-core and up boxes is that they appear to be more stable. Back on a single-core machine, when a process really wanted to lock up in some mysterious while(1); loop, it could be a real trial of patience to kill the app. On a dual-core machine, no worries - you've still got the other core to save yourself with :)
  • Re:Core pron (Score:4, Informative)

    by Kingrames ( 858416 ) on Monday July 14, 2008 @11:46PM (#24191409)

    I...

    KHaaaaaaaaaaaaaaaaaaaaaaaaaaaNnnnn!!!!!!!!!!!!

  • by tuomoks ( 246421 ) <tuomo@descolada.com> on Tuesday July 15, 2008 @12:27AM (#24191733) Homepage

    Right, except it's not always just I/O. I'm not much of a Windows fan, but Windows (XP at least) can be efficient. It's the bad application. I designed a comm subsystem - queuing, en/decryption, image translation, key management, etc. - tested it on 1-, 2-, 4- and 8-core systems (emulating the application), and could drive all cores to 100% busy with an almost linear increase in throughput. Now add an application on top of that: 1-2 cores, 20%; 2-4 cores, nothing; and at 8 cores, -10% throughput.

    It took three months to fight the application developers (and they still don't get it) - total misuse of thread pools in C#! And they were supposed to be the C#/.NET specialists; I'm just an OS guy (mainly MVS/Unix/Linux). I had a very good team writing the services for that subsystem, but no say in the application design.

    The problem I see is that Windows makes it so much easier to write bad applications. The subsystem itself (excluding auditing) runs under Wine on Linux, and just for fun I tested it there: same results, very nearly the same throughput.

    And please, if you're running on Intel, test your hyper-threading - it's not good for everything!

  • by SQL Error ( 16383 ) on Tuesday July 15, 2008 @12:28AM (#24191739)

    It should be noted that previous POWER architectures had 2 threads per core.

    Correct.

    They also had SMT ( Simultaneous Multi-Threading ) support, which gave them an "effective" 4 threads per core.

    No, they do not "also" have SMT. It is the SMT that gives them 2 threads per core in the first place.

    Power 5 & 6 have 2-way SMT. Power 7 has 4-way SMT.

  • Re:Toasty. (Score:4, Informative)

    by jmorris42 ( 1458 ) * <jmorris&beau,org> on Tuesday July 15, 2008 @01:13AM (#24192051)

    > not to mention, in 2007 that the northwest passage was completely ice free for the first time in recorded history.

    Yea, right. Pull the other one. First time in recorded history huh? Except for 1906, 1944, 1957, 1969, 1977, 1984, 1985, 1988 and 2000 in wooden ships, catamarans, naval vessels, cruise ships, etc.

    Stop believing the propaganda and do some googling before you open yer piehole and end up looking like a retard.

    btw, here is the link I got from Google searching for "northwest passage ice free"
    Classically Liberal: Bad reporting about the Northwest Passage issue [blogspot.com]

  • Not a contradiction (Score:3, Informative)

    by Weaselmancer ( 533834 ) on Tuesday July 15, 2008 @01:25AM (#24192121)

    Nah. If something gets warmer it is caused by Global Warming and the solution is to eliminate Western industrial civilization.
    If something gets colder it is Global Climate Change and the solution is to eliminate Western industrial civilization.

    Not a contradiction, even though it seems like one.

    Study the bifurcation diagram. [wikipedia.org] As you drive the system harder by turning up R (which may be analogous to global warming - i.e. more available heat energy might be described this way) notice how the system follows R, then suddenly begins oscillating between two extremes. Keep on driving R harder and it breaks into chaos.

    The weather, IMHO, has a lot in common with the logistic map equation. Its present behavior depends on its past state, its swings are driven by the energy input to the system, and so on. (There's a tiny sketch below if you want to iterate it yourself.)

    I know it's a gross oversimplification, but so is a mass falling through a uniform gravitational field with no wind resistance and so on. It's still useful to think about.
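    If you want to poke at it, here's a tiny sketch that iterates the logistic map x(n+1) = r*x*(1-x) for a few values of r; the particular r values and iteration counts are arbitrary, chosen only to show the fixed point, the 2-cycle, the 4-cycle, and chaos:

    #include <stdio.h>

    int main(void)
    {
        double rs[] = { 2.8, 3.2, 3.5, 3.9 };   /* fixed point, 2-cycle, 4-cycle, chaos */

        for (int k = 0; k < 4; k++) {
            double r = rs[k], x = 0.5;
            for (int i = 0; i < 1000; i++)      /* let the transient die out */
                x = r * x * (1.0 - x);
            printf("r = %.1f:", r);
            for (int i = 0; i < 8; i++) {       /* print the settled behavior */
                x = r * x * (1.0 - x);
                printf(" %.4f", x);
            }
            printf("\n");
        }
        return 0;
    }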

  • by asc99c ( 938635 ) on Tuesday July 15, 2008 @01:53AM (#24192253)

    A lot of application and games writers are complaining bitterly about the move to multi-core processing, as it does mean you need to change the way the code is written to take advantage of it.

    I write stuff that runs on big UNIX boxes and that has necessarily been multi-process for a long time. It's just a matter of finding things that can be done independently and then explicitly putting them in their own process (see the fork() sketch below).

    Ideally languages and compilers will do this at some point but so far mainstream languages do not. Also when you're doing desktop GUI apps it's often tricky to do a good job of multi-processing, and the GUI toolkits don't yet do much to help.
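    A minimal sketch of that "own process" approach with fork(); do_independent_chunk() is a hypothetical stand-in for whatever the real job is:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void do_independent_chunk(int id)
    {
        printf("chunk %d handled by pid %d\n", id, (int)getpid());
    }

    int main(void)
    {
        const int nchunks = 4;

        for (int i = 0; i < nchunks; i++) {
            pid_t pid = fork();
            if (pid == 0) {                 /* child: do one chunk, then exit */
                do_independent_chunk(i);
                _exit(0);
            } else if (pid < 0) {
                perror("fork");
                return 1;
            }
        }
        for (int i = 0; i < nchunks; i++)   /* parent waits for every child */
            wait(NULL);
        return 0;
    }

    The kernel spreads the children across cores on its own; the hard part, as the parent says, is deciding where the independent seams are.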

  • by Macman408 ( 1308925 ) on Tuesday July 15, 2008 @04:26AM (#24192975)

    I always thought the definition of a "core" was whatever minimal set of hardware is required to run a single thread at "full power". By my logic, any time you run more than one thread on a core, you're doing what SMT does.

    Someone please tell me if I'm wrong (and how).

    A core is a set of registers and function units, among other hardware. Each core is, effectively, a completely separate processor (though it likely shares some things, such as the L2 cache and FSB with other cores). Since processor usually refers to a whole chip (encased in a plastic or ceramic package, and soldered on the motherboard), the term "core" refers to when there is more than one inside a single package. The ultimate goal is usually to have all cores on a single piece of silicon, but often multi-chip modules are used (especially early in production), where a four-core processor might contain two silicon dies, each with 2 cores. This can help increase yield (by reducing the die size), and reduce production cost. After improving the yield of the processor, or changing to a reduced feature size (eg 90 nm to 65 nm), a switch back to a single die is possible, reducing the packaging cost.

    Simultaneous Multithreading (SMT), on the other hand, works on a single processor/core. It is a feature that allows sharing of the processor resources, such as registers and functional units. A PowerPC 970 (which never had SMT support) could issue 4 instructions and 1 branch every clock cycle. Because of that, plus the deep pipelines, up to 216 instructions can be in various stages of completion at any given time. However, on average, a program branches every 4 instructions - this means that the processor would have to correctly predict 54 branches to keep the pipeline full, AND that the instructions would be (mostly) independent of each other. This isn't easy to do. So, what many processors do is split the available resources. One might issue 2 instructions from one thread and 2 instructions from another in each clock cycle, or alternate clock cycles issuing 4 from one, then 4 from the other. This shares most of the CPU's resources, while requiring a fairly minimal amount of extra logic to track the second thread.

    So, cores are extra hardware that can perform more calculations. SMT is taking better advantage of what is already present.

    However, the disadvantage of SMT is that it can slow a single-threaded program down, because now it has to share resources. Some processors actually do away with superscalar (aka issuing multiple instructions at once) and out-of-order execution and bypass logic, and instead rotate through many threads. For example, if the pipeline is 8 stages deep and supports 8 threads in hardware, it can issue one instruction from each thread in each clock cycle. Then it never needs to check if an instruction is dependent on an earlier one, because an instruction is completed before the next one from the same program is issued. Having this many threads can also minimize the cost of a branch misprediction, cache miss, or other long-latency events. Also, removing the bypass, dependency checking, multiple-issue, and instruction reordering logic can give a significant reduction in power and area on the chip. The performance hit by these eliminations can then be made up by adding many more cores than you'd see in a superscalar out-of-order processor like a Pentium or Core architecture. The catch is that a processor like this is only faster if you have enough threads to keep the processor busy. However, if your problem is big enough to need a supercomputer, you're darn well going to spend time writing it to take advantage of as many threads as you can.

  • Re:Toasty. (Score:4, Informative)

    by rxmd ( 205533 ) on Tuesday July 15, 2008 @05:53AM (#24193411) Homepage

    not to mention, in 2007 that the northwest passage was completely ice free for the first time in recorded history.

    Yea, right. Pull the other one. First time in recorded history huh? Except for 1906, 1944, 1957, 1969, 1977, 1984, 1985, 1988 and 2000 in wooden ships, catamarans, naval vessels, cruise ships, etc.

    One should note that in 1906 at least it wasn't exactly ice free, which is why it took Amundsen and his Gjoa three years to pass through (1903-1906). Your list is basically a list of years when some vessel finished sailing through the North-West Passage, but it doesn't really say anything about how much ice they encountered on the way.

    That's also the basic fallacy of the blog you're linking to - it mentions an ice-free North-West Passage, but only for 2000. For the other years it just mentions a couple of vessels, while not really saying anything about ice (except for 1984, where it says that the ice was "in retreat", implying that there still was some ice there).

    So, for a comment like "Stop believing the propaganda and do some googling before you open yer piehole and end up looking like a retard," you are going out on a limb a bit too far for my taste.

  • Re:PPC Linux (Score:1, Informative)

    by Anonymous Coward on Tuesday July 15, 2008 @07:53AM (#24194009)
    The PPE uses a simpler, in-order architecture, so it will be a lot slower than a POWER4-7 running at the same speed. But yeah, plenty fast for most uses.

    Running your graphics on the SPEs means they're not available for other tasks, and IIRC there's a big bottleneck when transferring the graphics into the video memory at least on high resolutions.

    Still, if one really wants to run Linux on PPC something like Terra Soft's PowerStation is a far better option. If you really want to play with Cell, bung in one of those Toshiba 4-SPE Cell cards and go to town.
  • Re:Toasty. (Score:1, Informative)

    by Anonymous Coward on Tuesday July 15, 2008 @10:32AM (#24195983)

    That's why it's called global disruption now:
    http://www.democracynow.org/2008/7/3/global_disruption_more_accurately_describes_climate
