Architectural Difference Between The P4 And G4
homerJAYsimpson writes: "This article is a great reference on the differences in the architecture of the P4 and the G4. What is nice is that it is not a holy war of who is better but an explanation of why Intel made its choices, using the G4 as a point of reference. It has just tons of info on uPs, useful for everyone." Not for the techie novice, but it's a well-written piece if you're reasonably technical and want to understand more about two of the most important chips on the market.
skip to the end of this thread..... (Score:1)
MY ADVICE:
skip to post 100 or thereabouts, cos someone may have read the whole article by then and actually have something interesting to say....
anyways, see you in half an hour or so.
p.s. what's wrong with Slashdot? they think we have an attention span of more than two minutes or something?
Re:clock speed (Score:1)
Re:Conclusions & Questions (Score:1)
Pipeline "bubbles" also do not have to propagate all the way to the end and drop off. The article implies that a bubble is a physical nop of sorts that has to run its course. A more efficient way of doing things is to employ valid bits per stage along with stage holds. That way, if a "bubble" (empty stage) gets trapped behind a multicycle instruction such as a divide, later instructions can actually fill the "bubble".
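That difference can be shown with a toy pipeline model (the stage count, stall length, and instruction stream below are all made-up assumptions, not any real chip's behavior):

```python
BUBBLE = "bubble"

def drain_cycles(valid_bits):
    # Instruction stream entering a 3-stage pipe (decode, execute, retire):
    # a divide, then a bubble (empty issue slot), then a subtract.
    stream = ["div", BUBBLE, "sub"]
    decode, execute = None, None
    exec_left = 0
    retired = 0
    cycle = 0
    while retired < 2:                      # two real instructions to retire
        cycle += 1
        # Execute stage: the divide holds its stage for 3 cycles total.
        if execute == "div" and exec_left > 1:
            exec_left -= 1
            stalled = True
        else:
            if execute not in (None, BUBBLE):
                retired += 1                # a real instruction retires
            execute = None
            stalled = False
        # Decode -> execute, once execute has drained.
        if not stalled:
            execute, decode = decode, None
            if execute == "div":
                exec_left = 3
        # Issue -> decode. A rigid pipe freezes everything upstream during a
        # stall; with per-stage valid bits an empty decode slot can still be
        # filled, and bubbles are squashed so a later instruction takes the slot.
        if valid_bits:
            while stream and stream[0] == BUBBLE:
                stream.pop(0)
        if decode is None and (not stalled or valid_bits) and stream:
            decode = stream.pop(0)
    return cycle

print(drain_cycles(valid_bits=False))   # rigid pipe: the bubble drains to the end
print(drain_cycles(valid_bits=True))    # valid bits: "sub" fills the bubble, one cycle saved
```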
Instructions also do not have to get reassembled back into their original order in order to maintain the SEM (sequential execution model). That is what the commitment unit is for: it ensures that if you issue instructions to the execution units, they can complete. Since instructions don't leapfrog one another in a pipeline, this is ok.
The P4's commit unit can keep track of 126 instructions. This is very interesting and is a very high number but outside of multimedia (as the article states) and some scientific apps such as VLSI simulation, what app can schedule a great deal of branchless code? Very few. I'd love to see a histogram of the percentage of time that the commit units in the P4 and G4e have one instruction, two, three, etc actually committed. Branch density is very critical and will vary from app to app. This is why predication is a nice thing: it removes pressure from the BTB and predictor/correction mechanisms. Interestingly, the Motorola e500 (embedded PPC) now handles predication.
PowerPC does provide software hints for branch prediction via the BO4 field: they don't default to backward == true. So unlike what the article states, this isn't some new revolutionary P4 thing.
Finally, all those pipeline stages in the P4 do take their toll: collectively, across the entire 20-stage pipe you burn a lot of setup and hold time for D flip-flops: 20 x 0.15 ns (being conservative) = 3 ns of extra latency until an instruction retires. For the 7 stages in a G4e, it's a little over 1 ns. More stages == less work per cycle, but it also means that a larger percentage of each cycle is burned on CMOS technology overhead. Think about it this way: future P4s will maybe run at 5 GHz, but guess how much of that 0.2 ns/cycle is actually going to be... useful? Decreasing cycle times are going to yield diminishing returns as time goes on.
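The arithmetic above, written out (the 0.15 ns per-flop figure is the comment's own conservative assumption):

```python
FLOP_OVERHEAD_NS = 0.15   # assumed setup + clock-to-q cost per pipeline flop

def total_flop_overhead_ns(stages):
    # Latency an instruction accumulates purely from stage registers.
    return stages * FLOP_OVERHEAD_NS

def overhead_fraction(clock_ghz):
    # Share of each cycle eaten by flop overhead at a given clock
    # (0.15 ns divided by a 1/f ns cycle).
    return FLOP_OVERHEAD_NS * clock_ghz

print(total_flop_overhead_ns(20))  # P4-style 20-stage pipe: 3.0 ns
print(total_flop_overhead_ns(7))   # G4e-style 7-stage pipe: ~1.05 ns
print(overhead_fraction(5.0))      # at 5 GHz, 75% of every 0.2 ns cycle
```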
On the upside, I suppose for the guys designing the chip, Synopsys runs quite fast because the latch-to-latch distance is small and there are only so many ways to synthesize the logic. =)
Re:x86 instructions are bytecodes of the future (Score:1)
Re:x86 instructions are bytecodes of the future (Score:1)
Re:McDonald's Analogy (Score:1)
Then again, I suppose you could always punch the guy in front of you in the face and run off with his fries. It's the special "Microsoft Bit" in the processor. (Hey, somebody had to make the requisite anti-MS comment!)
The difference... (Score:5)
Re:I found the author to be quite presumptuous (Score:1)
G4 irrelevant (Score:1)
It has no chance of competing against the P4 until Motorola or IBM produces a cheap, decent-performance ATX form factor board.
It doesn't matter how well engineered it is if nobody is going to buy the thing.
Predictor of predictors (Score:3)
One of the big things that the Alpha did that was so cool was the branch predictor, which actually implemented two branch prediction algorithms and then had a predictor that watched them both and picked the one that was recently the most correct. Some of that kind of deep knowledge of branch prediction and how to avoid having your long pipeline kill performance would be information that Intel could sorely use on the pentium 4 core, as well as on the Itanic, I mean Inanium, I mean *Itanium*. There we go.
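A minimal sketch of that tournament scheme (my own toy version: 2-bit saturating counters everywhere and a tiny global-history table; real predictors like the Alpha 21264's index large tables by PC and history):

```python
class TwoBit:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 taken."""
    def __init__(self):
        self.state = 2
    def predict(self):
        return self.state >= 2
    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

class GShare:
    """Tiny global-history predictor: counters indexed by recent outcomes."""
    def __init__(self, bits=2):
        self.hist, self.mask = 0, (1 << bits) - 1
        self.table = [TwoBit() for _ in range(1 << bits)]
    def predict(self):
        return self.table[self.hist].predict()
    def update(self, taken):
        self.table[self.hist].update(taken)
        self.hist = ((self.hist << 1) | int(taken)) & self.mask

class Tournament:
    """Predictor of predictors: a chooser tracks which one was right lately."""
    def __init__(self):
        self.a, self.b, self.chooser = TwoBit(), GShare(), TwoBit()
    def predict(self):
        return (self.a if self.chooser.predict() else self.b).predict()
    def update(self, taken):
        a_ok = self.a.predict() == taken
        b_ok = self.b.predict() == taken
        if a_ok != b_ok:
            self.chooser.update(a_ok)   # drift toward the predictor that won
        self.a.update(taken)
        self.b.update(taken)

# An alternating branch fools the plain 2-bit counter (~50% right), but the
# history predictor learns it, and the chooser quickly starts trusting it.
t = Tournament()
hits = 0
outcomes = [True, False] * 25
for i, taken in enumerate(outcomes):
    if i >= 30:                         # after warmup
        hits += (t.predict() == taken)
    t.update(taken)
print(hits, "of 20 post-warmup predictions correct")
```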
Is anyone else surprised that the G4 core seems so vanilla? The difficulty of making a 4-stage pipeline run at upwards of 733 MHz on a
Re:G4 irrelevant (Score:1)
--
Re:G4e not G4 (Score:1)
Fact:
The article refers to the 7450 - the currently shipping "G4" chip in all Apple systems above 533 MHz.
(Some speculation/rumors follows...) ;).
Probably shipping very soon in at least 933 MHz systems - followed by 1 GHz later this year... finally
The "next generation" or "next rev" to be more precise is split into:
And then of course there's the G5 - rumored to be shipping in proto test stages at the moment - turning into production towards the end of the year for Q1 2002 shipping. Hopefully IBM's hand in the production will make it smoother to market than the G4 was. ;)
mips + ARM x10 more chips than PCs (Score:1)
in terms of volume shipments of 32-bit chips, the P4 and G4 don't even feature on the pie chart
get with it: the new Palm is going to be ARM-powered, just like the Newton
the 68000 is a toy, nothing else
you want power? then use the StrongARM2, aka XScale (ARM5TE)
get with the times, boys and girls
regards
john jones
Clock Speeds (Score:5)
Now I know that the only tests that really matter are the real-world tests, simply because at a user level that's the only real place that I'll notice the difference.
Of course another issue is going to be motherboard differences and how much I/O depends on the processor, but this is another story.
Intel + Alpha IP = AMD Killer ??? (Score:2)
Re:One thing that's always bothered me... (Score:2)
"A 400 MHz G4
Hmm. The slowest G4 ever released was 350 MHz. So a 400 is about the same as the baseline model. Of course it's going to be slower!
But to give you some of my own stats...
My home computer is a G4/400. My work computer is a P4/1.7GHz. Pretty fair comparison in terms of age of chip (although I'd really think the 450 (top of initial line) would compare better to the 1.7).
In RC5, no altivec optimization, the G4 is about half as fast (1.3Mkeys/s to 2.4Mkeys/s). This is with less than a quarter the clock.
With altivec optimization, the puny G4 does 3.5Mkeys/s.
Simple benchmarking, not necessarily too indicative of normal use, but thank you, move along now - nobody's saying that the G4 will always be faster because of more work per clock cycle, but that the speeds don't have to be so phenomenal on them. Mine's a lowly 400. Imagine a 733?
Dan
Re:G4 irrelevant (Score:2)
Apple will soon be the world's largest *nix vendor, thanks to OS X. How do you like them Apples, Tux?
Re:Clock Speeds (Score:3)
- it is not about processors/instruction sets
- it is not about MHz
it is about, e.g., compilers, parallelism, shortest paths, bandwidth, technology and algorithms. You _then_ work on the rest.
Processors are only a means to what you want to accomplish. I've seen DSPs overcome a 4x MHz gap just because they had a good architecture. Deep down, information processing (clocked or not) takes time to go through the logic.
Re:Conclusions & Questions (Score:1)
This is not how a pipeline works. Each instruction (or micro instruction) is executed in stages through a pipeline so that each stage only performs a small part of the overall operation. No modern high performance uP performs an entire op in a single cycle.
There is a very good reason to try to maximize pipelining which rarely gets mentioned here: the less logic depth between pipeline stages, the more calculations per transistor can be performed. The latter is actually a significant metric, as die area and transistor dimensions are the most significant limitations for modern uPs.
Shallower pipelines need to do more per pipeline stage, which means each transistor will waste more time waiting for the signal to propagate through the deeper logic. (It will also waste more glitch power, as the signal through a combinatorial logic unit will glitch for a while before stabilizing.)
This is of course a tradeoff against the cost of a pipeline stall, which the
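The tradeoff this post describes can be sketched with a toy model (all constants below are invented assumptions, not measured values): splitting a fixed logic depth over more stages shortens the cycle, but every flush now throws away more cycles.

```python
LOGIC_NS = 20.0         # assumed total logic depth if done in one stage
FLOP_NS = 0.15          # assumed per-stage register overhead
MISPREDICT_RATE = 0.1   # assumed pipeline flushes per instruction

def time_per_instr(stages):
    cycle = LOGIC_NS / stages + FLOP_NS      # deeper pipe -> faster clock...
    flush = stages * cycle * MISPREDICT_RATE # ...but each flush costs more
    return cycle + flush

# The optimum sits at a moderate depth, not at either extreme.
best = min(range(1, 60), key=time_per_instr)
print(best)
```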
Re:Comparing cycle penalty times is meaningless .. (Score:1)
What the author apparently fails to grasp is that the only thing which matters is wall-clock time. The P4 may have a 20-cycle mispredict penalty, higher than the G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz.
Yah, what do you want from an 'armchair architect'? Not only that, but what all these idiots seem to misunderstand is the cases where the trace cache keeps the whole pipe from flushing. So instead of a full flush it only has to flush from the trace cache down. Now you only need roughly twice the clock rate to match on branch mispredicts. Combine that with an incredibly advanced branch predictor, an impressive cache prefetch system and a processor designed to scale like mad, and you have a processor that will destroy any other 32-bit processor in raw performance for general computing. I love AMD and the 'RISC' archs, but frankly they are all looking a little weak. In another two shrinks Intel will again have extra die space to play around with, to add back a bunch of features/optimizations that they cut out for this version. At that time I expect an increase of 20%+ at the same clock. The P6 matched the newer chips; the P4 will pass them like they are standing still if Intel doesn't screw up too badly.
About the McDonald's analogy (Score:1)
Well, I used to work in fast food (many moons ago) and it is almost always faster to walk in
Re:Intel's acquisition of Compaq Alpha (Score:1)
Besides, there are rumors of Apple buying the desktop PowerPC business from Motorola (here [theregister.co.uk]), so we might see a CPU which focuses more on speed and less on power consumption from Apple, and who knows, maybe all those Alpha wizards might take a job with Apple just like their colleagues did with AMD.
--Ulrich
Re:mips + ARM x10 more chips than PCs (Score:2)
That's kinda like saying that food critics should spend their time reviewing McDonalds burgers.
There are interesting things going on with ARM designs (the Jazelle hardware JVM, Amulet asynchronous designs, etc.), but in general the MIPS/ARM markets are all about taking simple RISC cores and producing them cheap and running at low power.
Nothing wrong with being more interested in the high end of the market.
G.
One thing that's always bothered me... (Score:2)
Of course clock speed alone doesn't a benchmark make. It's just a number. But it STILL COUNTS. A 400 MHz G4[e\+], even with its shallower architecture (accomplishing more work per clock), is *still* going to be a helluva lot slower than an insanely-clocked (4 GHz) P4. And I mean a *lot* slower.
Yeah, you heard me. Amazing, isn't it? Even though the P4 does less per clock, it can actually be FASTER than another chip, if its clock speed is high enough.
Gee, do you think Intel's world-class design team might take that into account? You think that just *maybe* it might be more than a simple marketing gimmick?
Let's take a look at this. The Pentium Pro core (which is what the Pentium Pro, Celeron, Pentium II, and Pentium III were based on) was designed with a lot of clock-speed headroom in mind - and lo and behold, it actually worked. By the time that core is retired, it will be orders of magnitude faster than the original cores.
Can you say the same thing for the G4? No. Oh, sure, it might be two or three times faster than the original when it's retired. But nowhere near the improvement we saw with the PPro.
So, what's my point? Here it is: yeah, you can't go around buying processors based on clockspeed. But please take it into account. It's not like you can say "a 1MHz G4 is faster than a 1GHz P4, because the G4 does more work per clock cycle." Thanks for listening
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
The article is about chip architecture (Score:2)
No, he says quite plainly that the article is about a comparison of the two chip architectures, and not about which one is the fastest. I don't think there's any question right now that the fastest consumer desktop systems on the market are powered by x86 chips.
However, reading the article gives one a good understanding of why a G4e running at the same MHz as a Pentium 4 will beat it every time.
Re:The differences (Score:2)
Ah, but think about how piss-poor spelling will fuel ever-more-powerful search algorithms that can take into account mis-spellings! Goto.com already does a lot of this; try searching for "britteny spiers" or any reasonable variant. Their pay-per-search business model gives them a direct financial incentive to correct such errors.
Heck, if everyone were as careless as Taco, spelling wouldn't even matter, because browsers would have Autocorrect(tm) built in to the rendering engines!
All hale Cmrd Tacko! Hes not a looser!
Re:Ahem... (Score:1)
Re:Comparing cycle penalty times is meaningless .. (Score:2)
Yes it is. It was launched on Monday.
And even with the faster P4 (woah, 6% higher clock), 20 cycles @ 1.8 GHz is more than 7 cycles @ 733 MHz.
The point is, a 20 cycle penalty is not three times more expensive than a 7 cycle penalty, as the article implies.
The article also fails to mention that Willamette has the most advanced BPU in the world, which minimizes the number of mispredicts greatly.
Re:Comparing cycle penalty times is meaningless .. (Score:2)
Go to any major OEM website such as Gateway or Dell. They are all shipping 1.8 GHz Pentium 4 systems today, and are advertising them on their website.
(As an aside: it's really funny how some
Comparing cycle penalty times is meaningless ... (Score:3)
What the author apparently fails to grasp is that the only thing which matters is wall-clock time. The P4 may have a 20-cycle mispredict penalty, higher than the G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz.
This is basically another very pedestrian hate-on-P4 article with very little substance. P4 does have some performance problems (mostly to do with shifts and multiplies) and they're documented in the optimization manual, but this article does nothing to dig any deeper than what a dozen other pedestrian articles have said.
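Writing the wall-clock arithmetic out with the clock speeds as stated above (a quick sanity check, not a full performance model):

```python
def penalty_ns(cycles, clock_ghz):
    # Wall-clock cost of a mispredict: cycles divided by cycles-per-ns.
    return cycles / clock_ghz

p4 = penalty_ns(20, 1.8)    # 20-cycle flush at 1.8 GHz: ~11.1 ns
g4e = penalty_ns(7, 0.6)    # 7-cycle flush at 600 MHz:  ~11.7 ns
print(p4 < g4e)             # the deeper pipe's flush costs less real time here
```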
Also
Intel was definitely paying attention, and as the Willamette team labored away in Santa Clara they kept MHz foremost in their minds.
Willamette was designed entirely in Oregon. Santa Clara had nothing to do with it, and has had nothing to do with IA-32 design since P5 (nearly 10 years ago).
Re:Gay webdesigner => Gay font (Score:1)
I don't know why people insist on having white text on a black background. It's just too hard on my eyes.
PS Don't mod me down on off topic. I care about my karma.
Sorry, you are wrong... :-( (Score:1)
Dear dbarclay10,
Please don't post things as fact unless you are right. You are wrong... sorry. You took a quote from a previously correct post and changed the G4's MHz number, and that is precisely why you are wrong.
What the author apparently fails to grasp is that the only thing which matters is wall-clock time. The P4 may have a 20-cycle mispredict penalty, higher than the G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz
So, modifying the MHz to match will give you the real and true facts, which completely destroy your facts and will help clarify things for anyone who thinks you are telling the truth within your post...
...The P4 has a 20-cycle mispredict penalty, higher than the G4e's penalty of 7. 20 cycles @ 1 GHz is MORE than 7 cycles @ 1 GHz...
It is pretty simple math, my friend. So, if you truly knew what you were talking about, then you wouldn't have made such a long, blatant, incorrect, and uninformed post.
Intel's acquisition of Compaq Alpha (Score:2)
ideas from the EV bus protocol to scaling. My guess is that since processor design has such a long lead time for each CPU, future designs are fairly well hard-coded; Intel couldn't just drop in any Compaq IP 'just like that'.
So now Intel has Alpha technology and ARM technology... imagine the combination of the two! What a hybrid processor that would be.
aaah, for those that dream anyway...
I would be interested to read a comparison with Sun's UltraSPARC III on a technical design level.
Write your Own Operating System [FAQ]!
Re:Clock Speeds (Score:1)
Uh, Moore's 'law' states that the number of transistors per given area doubles every 18 months; it says nothing about "MHz".
Quake III is the only truly fair real world benchmark
Quake 3 is more of a test of the video card than the CPUs; you'd have to use software rendering to give a fair comparison.
Re:mips + ARM x10 more chips than PCs (Score:1)
Do you know that a majority of video game machines in the late '80s through the mid '90s (and maybe even now, not sure...) used the 68000?
E.g.: the Neo Geo used two 68k chips.
Be careful what you say.
Dante.
Re:mips + ARM x10 more chips than PCs (Score:1)
Processors in your 'toys' (Score:1)
>>68000 is a toy
>video game machines
QED. Besides, most of the consoles with 68K were early '90s consoles, not late '80s consoles.
Either way, the CPU doesn't handle most of the game; the TVIA, PPU, GPU, GS, or whatever they call the display circuitry does. But in case you're wondering what CPU(s) your system uses:
Re:Clock Speeds (Score:2)
Remember, the article is about only one "x86" processor: the P4. Clock speed still matters, just not as much with that particular processor. There is a performance penalty to be paid due to this design philosophy. A similarly clocked P3 will eat a P4's lunch because of it. Let's not even get started with what a similarly clocked Athlon does.
By comparison, the G4 and Athlon are very efficient with their clock cycles.
Re:Predictor of predictors (Score:2)
P4 is the drive-through? (Score:1)
Re:P.S. It's spelled "judge", you fuckwad. (-) (Score:1)
I actually meant to spell it "jedge". The Southern-Midwest US pronounces certain words in an odd way, like "acrost" rather than "across" or "jedge" instead of "judge". It seemed to me like it worked when I hit Submit.
You think I'm a fuckwad. The moderators didn't listen to me and wasted their (not there or they're) points instead of using them on more important pieces (not peaces). And what do (dew?) I need karma for (four/fore?)?
woof
ObL(inux): WDR in Germany ran a neat -- albeit somewhat chopped up and overdubbed -- interview with Linus today (Saturday afternoon). Better still, there was little talk about MS, the interviewer and translator seemed to understand the different meanings of "free" and there weren't a load of quoteheads (c.f. Rush Limbaugh's sycophantic "dittoheads") interrupting the discussion with cheers or boos every time Torvalds took a breath.
Re:fsck (Score:1)
The English word comes from Old Norse "fikk" (or "fykk", either way), and it meant then what it does now. It did vector into the language in an interesting and colourful way. Modern Norwegian still has the word.
When everyone does something (save for a few religious people by choice and half the readership here -- not necessarily by choice), there's gonna be a word for it.
The German word for that now is... [drum roll]... "Ficken". Of course, there's "bumsen" (bouncing), "flachliegen" (laying flat) and a couple others, but I make my point about the beauty of English when I tell Germans about "bumping uglies", "knockin' boots" and "the horizontal two-step". I was out West too long.
woof.
The differences (Score:2)
Sorry, Taco, but it's getting worse. Think about how piss-poor spelling completely screws up the ability to search for anything, anywhere. That ought to be reason enough for you and everyone else to at least CONSIDER spell-checking.
We're geeks, and we all hate being judged on what we look like or what weird idiosyncrasies we have, yet many of us have also learned the hard facts of life: people jedge based on what they see. Bad spelling == worthless. If you can't be bothered to check what you write, why the hell should I be bothered to read it?
Aren't you supposed to be a programmer or something? Yeah? Then how the fsck do you get anything to run (besides the debugger) if your syntax is even close to its English counterpart and your variable names never have the same spelling on any two lines?
woof.
Mod: -2 Pedantism, -1 Taco-spell-flames no longer amusing, +2 Interesting, +2 Insightful, etc.
Total: +1, exactly what it would be posting after logging in, so don't waste your mod points here.
Re:Learn to read (and I'll learn not to troll) (Score:1)
Re:One thing that's always bothered me... (Score:1)
How every version of MICROS~1 Windows(TM) comes to exist.
Re:The differences (Score:1)
I was just thinking the very same thing... not because I read your post, but because I've been thinking about a senior software engineer at work. He can't spell to save his life, his penmanship is horrid, and he mispronounces words (even technical ones) quite often.
I'm amazed when I see him type code at his keyboard! Everything seems to come out as intended (i.e., correct). He's a damn good engineer, too.
You know... I have a feeling this happens more frequently than we'd like to think.
Re:Clock Speeds (Score:2)
-------
Re:mips + ARM x10 more chips than PCs (Score:1)
Hmm, you may in fact be related to a triangular alluvial deposit at the mouth of a river. Or the fourth letter of the Greek alphabet. Either way, I really wish you could use at least minimal punctuation.
--
Re:Processors in your 'toys' (Score:1)
Yeah, and it's 485 MHz. The original post by yerricde had too many errors to bother with.
--
Re:G4 irrelevant (Score:1)
Re:[OT] Re:One Innovates the other stays the same (Score:1)
The Amiga was not a Commodore. Sure, it had the company name stuck on it. But it was designed by Amiga, Inc., and marked the end of innovation for the company that brought us the "Personal Electronic Transactor".
Fuzzy
Re:Learn to read (and I'll learn not to troll) (Score:1)
Re:Conclusions & Questions (Score:1)
The only alternative is going to explicit parallelism; simply making a slightly slower but wider superscalar processor (the old Cyrix approach) doesn't get you very far.
The biggest problem with explicit parallelism is programming; multithreading is a very poor method... unfortunately it's the only one most programmers accept.
Re:Intel's acquisition of Compaq Alpha (Score:1)
Different Chips... (Score:4)
The G4 is meant to be usable in embedded systems, while the P4 is meant to be usable as a space heater
=P
----
Re:Clock Speeds (Score:1)
This is false. The key point is that the P4 is designed for clock ramping to maintain Moore's law. The G4's performance ramp has fallen *way* off of Moore's law, and it looks to be an already dead architecture.
A greater clock rate still leads to lower minimum latency between instructions, and this gives the P4 a huge lead. The only disadvantage is the longer pipeline, which makes branching somewhat more costly.
The G4 is *NOT* a very wide architecture compared, say, to an Athlon.
Now I know that the only tests that really matter are the real-world tests, simply because at a user level that's the only real place that I'll notice the difference.
Quake III is the only truly fair real-world benchmark that you can run on both of these machines. I believe the x86s just wipe the floor with iMacs.
Paul Hsieh
Breakfast (Score:1)
Re:Clock Speeds (Score:1)
As I recall, Moore's law is about the SPEED of computers. Not just the MHz. If (as the article says) G4s run at comparable speed to P4s, even though they have lower MHz, HOW has that fallen off Moore's law?
Quake III is the only truly fair real world benchmark that you can run on both of these machines.
Fair, assuming that it is equally optimized on both platforms.
I believe the x86s just wipe the floor floor with iMacs.
Since when has the iMac been made for gaming?
iMac != G4 (Score:1)
iMacs run on G3 processors.
Re:mips + ARM x10 more chips than PCs (Score:1)
Re:I wonder what to think... (Score:1)
Re:G4 irrelevant (Score:1)
[OT] Re:One Innovates the other stays the same (Score:1)
Except that the famous Commodore Amiga series used most of the 68000 series. The A500, A1000 and A2000 used the 68000. The A3000 used the 68030 and 68040 (A3000T), and the A4000 used the 68030 and 68040. Later revisions, once C= went down the drain, were sold with the 68060 as well, a very elegant chip if you ask me.
Learn to read (and I'll learn not to troll) (Score:3)
"2 of the most important chips on the market"
Jeez, why do people have such a bad grasp of the English language? Is it really that hard to understand?
Yes, "two of". As in "not exclusively of". Yes, the Intel Pentium 4 is one of the most important chips out there. And yes, so is the AMD Athlon. But so is the Motorola G4, and so for that matter is the upcoming Intel Itanium.
Now if the description of the article said "the two most important", I could understand your gripe. But it doesn't. And besides, haven't we already seen dozens of similar comparisons between Intel and AMD processor families?
Re:Clock Speeds (Score:1)
Yeah... show me where I can buy a stable running 1.7Ghz P3 system then.
I wonder what to think... (Score:1)
Re:Conclusions & Questions (Score:1)
Conclusions & Questions (Score:2)
Some of the /.ers out there who are more au fait with this stuff than myself may want to correct me on some of the following points.
Basically, the clever folks at the Intel marketing department realised that the only thing the General Public know about processors is G/MHz. Therefore this is their only point of comparison between processors in the fragmented AT market (obviously, the G4 does not suffer from this competition, which reflects in the differences in architecture). Therefore the techies at Intel were given the orders: "make the clock speed as high as possible (and also make the processor fast!)".
Clearly, the architecture of the P4 was thus designed to break up long instructions into many shorter instructions (an over-simplification), each of which can be completed in a shorter single clock cycle. This leads to a 'long pipeline' of many instructions:
Since each stage always lasts exactly one clock cycle, shorter pipeline stages mean shorter clock cycles and higher clock frequencies. The P4, with a whopping 20 stages in its basic pipeline, takes this tactic to the extreme
However, using this longer pipeline leads to problems - especially when the processor doesn't have any instructions to issue - thus causing a "bubble" which has to propagate right down the long pipeline, and also when the "branch prediction" (i.e. the prediction of which way a conditional branch will go) is wrong - again causing a delay as the 'bad' instructions propagate through the processor.
Of course the clever guys at Intel came up with some novel solutions to these problems. These include:
- Using a larger Branch History Table, which records information about the outcomes of branches that have already been executed and so helps in branch prediction.
- The trace cache, which stores already-decoded instructions for the L1 cache; this is particularly useful for blocks of code that are executed thousands and thousands of times. ((This reminds me of MMX, although I think that worked in a different way. Any ideas why MMX isn't used any more?)) ...there's no delay associated with looking it up and hence no pipeline bubble
- A special microcode ROM that holds pre-packaged sequences of uops so that the regular hardware decoder can concentrate on decoding the smaller, faster instructions. This stops these longer instructions from polluting the trace cache.
- Some others that I forgot/understood even less well?
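A minimal sketch of the trace-cache idea from the list above (the instruction names and decode table are invented for illustration, not real x86 decode rules): decode once, cache the resulting uops, and skip the decode stage on later executions of the same hot block.

```python
# Toy "CISC op -> 1..n uops" decode table (entirely made up).
DECODE = {"add": ["uop_add"],
          "inc": ["uop_add_imm"],
          "push": ["uop_sub_sp", "uop_store"]}

trace_cache = {}

def fetch_uops(block_addr, instrs):
    if block_addr in trace_cache:            # hit: no decode, no bubble
        return trace_cache[block_addr]
    uops = [u for i in instrs for u in DECODE[i]]
    trace_cache[block_addr] = uops           # store the decoded trace
    return uops

# First pass decodes and caches; second pass is served from the trace cache.
print(fetch_uops(0x400, ["push", "add"]))
print(0x400 in trace_cache)
print(fetch_uops(0x400, ["push", "add"]))
```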
This all seems to be an interesting case of the public's perversion for clock speed subverting processor architecture (although not necessarily in a bad way).
Would processors be faster "overall" (I'm sorry, that's terribly vague) if there wasn't such a push for faster clock speeds?
--The real Marcus Brody doesn't have a Slashdot ID
Re:Clock speed (Score:1)
Clock speed (Score:2)
So, in the same spirit, I have my offering for CPU design: a simple divider on the clock input. This would only take two transistors, and yet the processor would double in clock speed! The 3 GHz chip is here already. Now, how do I patent the idea?
x86 instructions are bytecodes of the future (Score:5)
If you remember when the Pentium Pro came out, people (including me) dissed it because it was years behind schedule, huge, expensive and hot. Actually, its architecture was just ahead of the process technology curve. With a few tweaks, the same CPU core came to dominate the world as the P-II and the P-III.
Looking at the radical changes in the P4, including storing only uOPs in the instruction cache and reserving (currently useless) pipeline stages for speed-of-light cross chip delays, they are planning ahead for future realities. We can think of the current P4 as being like the Pentium Pro, just a short-lived beta release.
The more interesting question is which approach to driving uOPs will win out: P4, Transmeta or Itanium. P4 and Transmeta convert legacy x86 opcodes to an internal wide architecture on-the-fly (P4 in hardware, Transmeta in software); Itanium makes the compiler generate the wide architecture directly. Note that the original pre-translated instruction format (CISC, RISC, Java bytecodes, whatever) is now largely irrelevant.
My view is that in the abstract, Transmeta has the best approach, followed by P4 and Itanium last. This is because the software approach is the most flexible and can even be upgraded in the field. In theory, it could detect and store the individual performance characteristics of each program on a user's machine. Granted, they currently focus on low-power, but if they retargeted their technology at high speed, it could be interesting.
The P4 approach is hardwired, but at least it can adapt to local code characteristics and translate them to the current internal architecture version.
The Itanium exposes low-level chip details to the compiler, and the decisions are cast in concrete from there on out. It doesn't seem very future-proof to me; if the IA64 architecture changes in the future, today's compiled code will suffer.
Re:Ahem... (Score:1)
Motorola makes most of these processors for embedded applications. Intel makes processors for the embedded market, but they don't get the same publicity.
Re:Ahem... (Score:1)
G4e not G4 (Score:1)
still irrelevant (Score:1)
fsck (Score:1)
f - u - c - k
descends from some German word meaning, "to strike".
g4 and g4e (Score:1)
Actually, the dual 533 MHz machines are G4s. The currently shipping G4e's are the 667 MHz and the 733 MHz models.
There was some talk when the 667 and 733's came out about current compilers not being geared towards really exploiting them.
The 667, I believe, has been discontinued (the dual 533 was such a sweet buy in comparison), and the rumor is the 733 MHz will shortly become the low end, with an 866 and a 933 (potentially) above it.
Re:Predictor of predictors (Score:1)
That has been part of the problem with the existing G4s, and also one of their biggest benefits. I don't know how closely you've followed Mac tech, but god... the things were stuck at 500 MHz for so, so long, it was pathetic. Because they couldn't get the clock speed up, Apple eventually had to add a second processor without raising the price at all (which was nice, but because of OS 9 not many apps could benefit).
Because the pipelines were so shallow, they were fast as hell, but ramping them up was causing all kinds of problems, and the rumor is AltiVec just made it worse. Apparently just getting up to 733 MHz wasn't very easy.
Moto wants cool, shallow-pipelined chips for their embedded market, and while this has been a boon to Apple in some ways (you have G4 portables! imagine a Pentium 4 portable), it is just annoying when you buy their systems and want the speed.
The rumor now is that Apple has taken on a lot of the design for the new G4s and G5s, to better suit their needs and not Motorola's.