Cell Architecture Explained 570
IdiotOnMyLeft writes "OSNews features an article written by Nicholas Blachford about the new processor developed by IBM and Sony for their PlayStation 3 console. The article goes deep inside the Cell architecture and describes why it is a revolutionary step forward in technology and, so far, the most serious threat to x86. '5 dual core Opterons directly connected via HyperTransport should be able to achieve a similar level of performance in stream processing as a single Cell. The PlayStation 3 is expected to have 4 Cells.'"
Seeing is believing (Score:3, Informative)
It's not like we haven't heard it before. It usually turns out to be halfish-truish for some restricted subset of operations in a theoretical setting - you know, the kind where you discount buses, memory and latencies.
Re:What always confused me (Score:2, Informative)
But it plays all the popular games of today's PCs with little to no lag, whereas you need a very high-end PC to play the same games!
This is mostly because the path to the video hardware is more direct than it is on a PC. There's no AGP bus or other bottleneck in the way of video RAM access, which is probably why an Xbox can perform as well as a current PC rig.
But then an Xbox is only running at 800x600. LOL
Re:What's that? Microsoft isn't supporting it? (Score:1, Informative)
Huh?? What are you smoking? Since when does Apple emulate x86 hardware? Perhaps you are confused by the fact that you can buy Microsoft Office for the Macintosh, or run Internet Explorer. Here's a news flash - they aren't emulating the x86, they are native Mac code.
Most likely you are confusing one of two different things. On OS X, programs can be either Cocoa (the new way) or Carbon (the old way) apps. But that isn't really emulation. What is emulation is that Mac OS X can run OS 9 apps in an emulated environment - nothing to do with x86 here.
Or you can buy Virtual PC, which is now owned by Microsoft. It will allow you to emulate x86 to run Windows (or Linux) and associated software. But note - this isn't Apple, and it isn't something that Mac users "have to do".
Lastly, why on earth would you HAVE to be able to run Windows programs in order for a PlayStation processor to be successful? Last time I checked, PlayStation was still wildly successful - more so than the Xbox, I think.
Is that really that revolutionary? (Score:3, Informative)
Now if they can be made very fast and only a few (2-8) are coupled together... well, as was said, that is what a nice Opteron machine does nowadays anyway.
Compiler technology (Score:5, Informative)
The potential of parallel architectures has never been in doubt since the early days of the Cray monsters - but how to compile code to use all the features efficiently has.
I don't believe we will see the full advantage of these types of architecture exploited without some similar breakthrough in software tools.
Mind you the hardware rocks...
Re:if it sounds too good to be true.. (Score:4, Informative)
I suspect that the main reason there was never an Emotion Engine based cluster product was because the high performance market is tiny, especially compared to the console market, and Sony was already having trouble meeting demand with their exotic chipset when it first came out.
Anyways, I think the guy does go overboard about this new architecture. It probably will be a lot faster than PCs at certain tasks, but you can only fit so many transistors on a chip. The Cell stuff is cool though; it seems to fit a lot better with what most computers spend their time processing, unless you're doing a lot of compiling or database operations.
Comment removed (Score:2, Informative)
Re:What always confused me (Score:3, Informative)
Well no, not exactly. The reason console games don't suffer from lag is that unlike a PC, the hardware specs are not a moving target during development. Developers can optimize textures, audio, algorithms etc with a specific platform in mind. This makes it much easier to create content that you know won't overwhelm the machine.
Compare this with a PC developer. They have to estimate the time it takes to develop the game. Then they have to estimate the average gamer hardware and the cutting edge gaming hardware at the time the game is released. They have to take into consideration at the very least processor speeds, main memory size and speed, graphic card speed and memory size.
If the developers overestimate, the game will be unplayable (when the first System Shock came out, for instance, I remember reviewers writing "You actually need a PENTIUM to play this game, it's insane!"). If they underestimate the hardware or take too long, they will be killed by reviews complaining about "outdated graphics". Oh, and preferably there shouldn't be any problems with any special configuration.
This is extremely difficult to achieve. Half-Life 2, for instance, was praised for managing to scale its graphics so it was playable on low-end yet good-looking on high-end machines. However, some people experienced audio stuttering as levels started; this was even more noticeable in Vampire: Bloodlines, a great game that uses the HL2 engine. I think this had something to do with the hard drive loading textures or level geometry (I noticed it especially when loading the huge LA Downtown level in Vampire - sound stuttered for 10 seconds after the level loaded). People with fast hard drives, especially those who chose or had to choose low-resolution textures, didn't suffer from it as much as those with graphics cards with a lot of memory for high-resolution textures and comparatively slow hard drives.
So, the interaction between the many different hardware configurations on PCs makes it difficult to optimize, and that is what causes the lag - not the lack of an AGP bus or anything. A console developer can test on just one console and be fairly certain the game will run the same on all target machines.
Re:Can this be taken seriously? (Score:2, Informative)
But what struck me most is that you seem to have missed the whole point the author seeks to make. Yes, Moore's law will double the performance of the GPU within 18 months. So? It still does not give them the raw processing power of those Cells. Nor the scalability. (Damned! These things will be in your TV, your DVD player, your stereo - and they all cluster...) If these Cells really become low-cost chips, I seriously doubt x86 will survive.
No longer true (Score:2, Informative)
This begat RISC. A CISC computer had a more complex instruction set, whose decoder ate so many transistors that it was barely left with enough for a couple of general-purpose registers. A RISC computer, on the other hand, went by the mantra "never do in hardware what the compiler can do for you", so it had an over-simplified instruction set, but then it had enough transistors left for more registers.
In a sense, each of the two was too expensive for _someone_. For CISC, registers were too expensive. For RISC, the decoder was too expensive. In truth, both were expensive, and the grand unified theory had to wait until transistors became cheap enough to afford both.
Fast-forward a bit, and registers are _not_ expensive for CISC any more. You mention "what will prevent Intel/AMD to add a technology which could use multiple sets of registers", and the answer is: they already do. Both have huge physical register files they use internally for renaming. (E.g., when you swap EAX and EBX, the data isn't really copied; the physical register currently mapped to EAX and the one currently mapped to EBX simply get remapped to EBX and EAX respectively.)
Either way, they already have the Cell's 128 registers, and some even have 256 registers. You just don't see them from the outside. (Which is a pity, since compilers could really use them.)
Is there something to stop them from exposing more of those registers to the outside world? Nope. The AMD Athlon 64's "64 bit extensions" (now also adopted by Intel) already do just that: they double the number of general-purpose registers visible to the program, from 8 to 16. That's largely what gives an Athlon 64 its speed boost when running 64-bit code: the extra registers.
Is there something to keep them from doubling or quadrupling them again? From a technical point of view, nothing whatsoever.
What's been keeping them so far is software backward compatibility. A Pentium 4 still has to run code written for a 486. So whatever changes they make to the instruction set must leave the old pre-existing instructions unchanged. And there simply aren't encoding bits left to address new registers without changing the whole instruction set.
The migration to 64 bits has been such a good excuse to come up with a completely new instruction set, with more general-purpose registers. But such excuses are few and far between.
As for RISC... it died; it lost the battle. Yes, Apple and IBM still use it as a marketing buzzword, but that's it. There are _no_ RISC CPUs still being produced.
The G5 in Macs is simply a CISC with more registers and a better instruction set, but it's CISC nevertheless. Its internal structure is _not_ RISC, and AltiVec is _not_ RISC. They're in fact contrary to everything RISC stood for.
Ditto for Sun's UltraSparc.
(And everyone arguing that a G5 is RISC, has obviously never programmed a RISC CPU before. You had to take care of every single detail in software, because of the mantra "never do in hardware what the compiler can do for you." Even recovering from a pipeline overflow when an interrupt came, you had to do that in software.)
Hope this helps.
Re:Can this be taken seriously? (Score:1, Informative)
That's not what he's saying, if you take the quote in context. He's saying that a PC using the GPU for general-purpose computation won't beat the Cell at the same computations.
What he's specifically not saying is that the Cell is better than a GPU for doing graphics. The GPU is a special-purpose chip for graphics, so of course it's going to outperform something more generic, on cost if on nothing else.
Re:if it sounds too good to be true.. (Score:1, Informative)
"...should be able to achieve a similar level of performance in stream processing - as a single Cell. "
"in stream processing"
Yes, Sony, Opterons make shitty graphics cards. We already knew that.
Re:Microsoft isn't supporting it? Who Cares? (Score:4, Informative)
There are two operating systems Microsoft have developed called Windows. DOS/Windows, the original one, was based on an x86 clone of CP/M that Microsoft bought. The first version, "Windows 1.0", was released in 1985. The last version, called "Windows Me", was released in 2000, IIRC. This OS was always x86-only, originally ran on archaic CPUs without memory protection and never supported full protected memory, symmetric multiprocessing or other (now) basic OS features.
The second OS developed by Microsoft that's marketed as Windows is Windows NT (now just called "Windows"). It was started in 1988, and never had any relation to DOS/Windows, except insofar as it can (to some extent) emulate it for compatibility reasons (including an x86 emulator on hardware that can't natively execute x86 code). Windows NT was developed on the MIPS platform, not the x86. The original plan had been to use the Intel i860 (an LIW architecture completely different from the x86) as the development platform, but the i860 hardware never met its promise, so MIPS was chosen instead.
The first version of Windows NT was released in 1993, and called "Windows NT 3.1" (3.1 was used for marketing reasons, since that was the latest version of DOS/Windows at the time). Like UNIX, it was mostly written in C, with assembly at the low level to handle hardware dependencies. At its release, Windows NT 3.1 ran on 32-bit MIPS (the development platform) and 32-bit x86 (the first port).
The second version of Windows NT (3.5) was released in 1994, and planned to add 64-bit Alpha (in a semi-crippled, 32-bit mode) and 32-bit PowerPC. However, IBM and Motorola ran into problems with the hardware (in part because of ongoing disagreements with Apple, who wanted to use their own, proprietary platform), so Windows NT 3.5 only added Alpha support. In 1995, after IBM and Motorola had managed to (mostly) sort out their problems (but with Apple declining to follow the IBM/Motorola PReP standard), the PowerPC port of Windows NT was completed, and released as version 3.51. At this point, the OS ran on MIPS, x86, Alpha and PowerPC.
In 1996, the user interface of Windows NT was upgraded to match the user interface of the popular 4.0 release of DOS/Windows (called Windows 95). Windows NT 4.0, which copied the user interface of DOS/Windows 4.0, ran on MIPS, x86, Alpha and PowerPC.
By the late 1990s, as Microsoft continued work on version 5.0 of Windows NT, the market had lost confidence in non-x86 systems for general-purpose PCs (apart from Apple Macs, which didn't follow the PReP standard, so couldn't run OSes ported to it, like AIX and Windows NT). As a result, Microsoft and the vendors of MIPS and PowerPC workstations agreed to cease development and marketing of NT 5.0 for those platforms. Windows NT 5.0 continued to be developed for the x86 and DEC Alpha architectures, into the beta releases.
DEC (which was taken over by Compaq) had continued to have hope for the Alpha as a general-purpose alternative to the x86, but financial difficulties led to the project being abandoned towards the end of the development cycle for Windows NT 5.0 (marketed as "Windows 2000"). As a result, Windows NT 5.0, completed at the end of 1999, was the first version of NT that only ran on one platform (the x86).
A port of Windows NT 5.0 to the 64-bit Intel Itanium, including 64-bit versions of the Windows APIs (unlike the earlier Alpha port), was released in 2001, but only to select customers.
Windows NT 5.1 (marketed as "Windows XP") was also released in 2001, and again only ran on the x86, apart from another 64-bit limited release for Itanium (in 2002, IIRC).
Windows NT 5.2 (marketed as "Windows Se
Compiler technology - OpenMP (Score:3, Informative)
One question which was not addressed fully in the article was how you compile/test programs for this thing. The answer is OpenMP [openmp.org]. OpenMP is a multithreading API which can hide parallelization from the user almost completely. It's embarrassingly easy to use - only one line of code is enough to parallelize a loop. All thread creation/synchronisation remains hidden from the user. It's extremely efficient too - I was never able to achieve the same level of performance doing the multithreading myself.
Re:How could they possibly do this cheap? (Score:3, Informative)
Sony is going with Cell from IBM and an nVidia graphics chipset. So I don't see a huge difference. My guess is that both consoles will have extremely similar performance and this next generation of consoles will be the most boring ever -- lots of multi-platform games that look identical.
Re:4.6 Ghz ? I don't belive it (Score:2, Informative)
Re:Nicholas Blachford is an idiot. Please don't re (Score:2, Informative)
If you (or anyone) can solve this problem well, you'd be famous and wealthy beyond the dreams of avarice (assuming you patent it and license it out).
Re:multicore, stream-processing, vector-oriented B (Score:1, Informative)
Sheeesh... If you would stop for a moment to think about it, you'd notice that this architecture is vastly different from your average PS2 arch. The "caches" are actually one per APU, which means these APUs have a kind of access to "locked-down" cache memory - this is basically what most modern signal-processing CPUs do. Take a look at some DSP code specifically tailored for the PPC440 CPU; it explicitly takes advantage of the ability to lock the CPU's caches. I'm sure the main processing unit WILL have L2 cache, though.
Re:What always confused me (Score:3, Informative)
Anyway, hardware really didn't affect the game like some people pretend it did. The streamlined gameplay was because Harvey Smith and his team wanted it that way (the Xbox has certainly seen plenty of more complicated gameplay systems than Deus Ex 1!), and the vast majority of them would have occurred even if the game was PC only. I disagree with the decision as well, but the devs wanted the gameplay to be simpler and more focused.
Most of the engine limitations were simply because they chose poor technology - the hacked-up UT2k3 engine didn't scale or perform well on the PC, either. Lots of Xbox games feature huge levels with minimal or even non-existent load times (see Riddick, Halo series, Ninja Gaiden...).
(And I would point out that the original DE was pretty spotty when it came to tech as well. Very slow on most systems when it was released, without particularly nice graphics to compensate. I suspect they just don't have the caliber of 3D programmer required...)
And as much as I love DE1, it really didn't leave room for a sequel. Near-future stuff works fine, but once you get close to or even past the Singularity, it is almost impossible to create a realistic or interesting setting (as any human just wouldn't be close to where the real action is). Most of the truly interesting conspiracy theories were already dealt with (seriously, you already used the Templars, the Illuminati, and Majestic 12). Most of the interesting future tech was used (nanotechnology especially, though I do think Invisible War expanded on it in interesting areas). And you couldn't reasonably expand on the cyberpunk theme too much, because the world had already been pulled back from the brink in the original. (The globalization issues that Invisible War brought up were a good attempt, but that is really hard to address in an FPS - there are no real masses, you understand?)
Re:I'll believe it when I see it (Score:3, Informative)
The Sega Saturn used dual processors, and was nearly a clone of top-end Sega arcade systems. Unfortunately, it was terribly hard to program, so only in-house Sega titles were developed to utilize the full potential of the device, such as Virtua Fighter, while other titles were only using half the performance of the system.
It was not, however, cheap.
The Jaguar was cheap and hard to program for (primarily because of bugs in the cheap hardware), but it could hardly be considered a powerful system. It was well ahead of the SNES/Genesis, but it came in late in the game, when they were both well established and the PlayStation/Saturn were just around the corner.