Technology

Architectural Difference Between The P4 And G4

homerJAYsimpson writes: "This article is a great reference on the differences in the architecture of the P4 and the G4. What is nice is that it is not a holy war over who is better but an explanation of why Intel made its choices, using the G4 as a point of reference. It has just tons of info on uPs, useful for everyone." Not for the techie novice, but it's a well-written piece if you're reasonably technical and want to understand more about two of the most important chips on the market.
This discussion has been archived. No new comments can be posted.

Architectural Difference Between the P4 and G4

Comments Filter:
  • by Anonymous Coward
    ummm... this article is so blinking long, u jus knows that the first 100 posts are gonna be goatse nonsense.

    MY ADVICE:

    skip to post 100 or thereabouts, cos someone may have read the whole article by then and actually have something interesting to say....

    anyways, see you in half an hour or so.

    p.s whats wrong with slashdot??? they think we have an attention span of more than two minutes or something?
  • by Anonymous Coward
    You're looking at it from a strict numeric point of view by comparing their "real" values of, say 1000MHz vs 1300MHz. The average consumer is pretty much going to be reading 1.0 vs 1.3 and yawn. "Giga" was the magic prefix to reach, but once there, deltas between chip speeds with such a small cycle time don't seem as dramatic from a marketing POV.
  • by Anonymous Coward
    Overall the article was good, but there were some technical mistakes...I've never heard of an L1 cache that stops fetching in the middle of a line once it reaches a branch instruction: this would imply that fetch and decode are done in parallel and that instructions are fetched serially, not a line at a time.

    Pipeline "bubbles" also do not have to propagate all the way to the end and drop off. This indicates that the bubble is a physical nop of sorts that has to run its course. A more efficient way of doing things is to employ the use of valid bits per stage along with stage holds. In that way, if a "bubble" (empty stage) gets trapped behind a multicycle instruction such as a divide, later instructions can actually fill the "bubble".

    Instructions also do not have to get reassembled back into their original order in order to maintain the SEM (sequential execution model). That is what the commitment unit is for: it ensures that if you issue instructions to the execution units, they can complete. Since instructions don't leapfrog one another in a pipeline, this is ok.

    The P4's commit unit can keep track of 126 instructions. This is very interesting and is a very high number but outside of multimedia (as the article states) and some scientific apps such as VLSI simulation, what app can schedule a great deal of branchless code? Very few. I'd love to see a histogram of the percentage of time that the commit units in the P4 and G4e have one instruction, two, three, etc actually committed. Branch density is very critical and will vary from app to app. This is why predication is a nice thing: it removes pressure from the BTB and predictor/correction mechanisms. Interestingly, the Motorola e500 (embedded PPC) now handles predication.

    PowerPC does provide software hints for branch prediction via the BO4 field: they don't default to backward == true. So unlike what the article states, this isn't some new revolutionary P4 thing.

Finally, all those pipeline stages in the P4 do take their toll: collectively, across the entire 20 stage pipe you burn a lot of setup and hold time for D flip-flops: 20 x .15ns (being conservative) = 3ns extra latency until an instruction retires. For 7 stages in a G4e, it's a little over 1ns. More stages == less work per stage, but it also means that a larger share of each cycle is burned on CMOS technology overhead rather than meaningful work. Think about it this way: future P4s will maybe run at 5GHz, but guess how much of that .2ns/cycle is actually going to be... useful? Decreasing cycle times are going to result in diminishing returns as time goes on.
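Checking that arithmetic (the 0.15ns per stage is the poster's conservative guess, not a datasheet number):

```python
# Latch (setup+hold) overhead per trip down the pipe, and the share of a
# hypothetical 5GHz (0.2ns) cycle that would be eaten by the flip-flops.
overhead_ns = 0.15
for name, stages in [("P4", 20), ("G4e", 7)]:
    print(f"{name}: {stages * overhead_ns:.2f} ns of latch overhead")
# P4: 3.00 ns, G4e: 1.05 ns
print(f"share of a 0.2 ns cycle lost to latches: {overhead_ns / 0.2:.0%}")  # 75%
```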

    On the upside, I suppose for the guys designing the chip, Synopsys runs quite fast because the latch-to-latch distance is small and there are only so many ways to synthesize the logic. =)

WRT the Itanium, you're thinking about compilers in the old-fashioned way. There's no reason why the "decisions are cast in concrete" if the compiler can execute at runtime, i.e., a Just-In-Time compiler (JITer). These already exist for Java and some other languages. The program is compiled to some intermediate form (IL) and shipped; when first run, the JITer compiles it optimally for the hardware it finds itself on. A more sophisticated JITer could recompile the IL on the fly based on dynamic knowledge about which branches are taken, etc (in effect profiling and optimizing the code while running). Thus (in theory) a JIT compiler can create a better-optimized program than a static compiler. Yes, there's some overhead while the JITer runs, but that goes away quickly. And it's future-proof too: swap out your old chip for a new one with more functional units, and the next time the JITer runs it takes that into account.
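A minimal sketch of the idea (toy IL and invented helper names, nothing like a real JIT): interpret while counting how hot each function is, then build real code once one crosses a threshold.

```python
# Toy IL: each function is a list of (op, arg) steps over one input.
IL = {"f": [("mul_const", 2), ("add_const", 1)]}   # f(x) = x*2 + 1

def interpret(prog, x):
    for op, arg in prog:
        x = x * arg if op == "mul_const" else x + arg
    return x

def jit_compile(prog):
    # "Emit code" once: fold the IL into a Python expression and eval it.
    body = "x"
    for op, arg in prog:
        body = f"({body} {'*' if op == 'mul_const' else '+'} {arg})"
    return eval(f"lambda x: {body}")

counts, compiled = {}, {}

def run(name, x, threshold=100):
    counts[name] = counts.get(name, 0) + 1
    if counts[name] == threshold:                 # hot enough: compile once
        compiled[name] = jit_compile(IL[name])
    f = compiled.get(name)
    return f(x) if f else interpret(IL[name], x)

print(run("f", 3))  # 7 (interpreted); after 100 calls it runs compiled code
```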
The old AS/400 boxes were the biggest JIT compilers of all, as the "real" instruction set was hidden from programmers in order to maintain system security on a machine that doesn't even have memory protection! That's why there was a 5x speed increase when the AS/400 shifted from CISC to the 64-bit POWER architecture. Now, calling it JIT on the POWER is a bit of a misnomer since all the conversion was done up front, but nobody ever said that future OSs couldn't do "convert once" translation either.
  • by Anonymous Coward
    They also fail to take into account that nobody ever said the order of lines was fixed and you had to stay in the same line like sheep. Some microarchitectures will allow instructions to swap pipes if it's advantageous.

    Then again, I suppose you could always punch the guy in front of you in the face and run off with his fries. It's the special "Microsoft Bit" in the processor. (Hey, somebody had to make the requisite anti-MS comment!)

  • by Anonymous Coward on Saturday July 07, 2001 @03:48AM (#102327)
    is about 70 degrees F.
you get a CPU architecture review from Jon [Hannibal] Stokes and you complain about its quality. I don't understand you. You should read all his architecture docs, including the CISC vs. RISC one, and then claim that his research and comparisons are irrelevant.
  • The specs for the G4 are irrelevant while you can't buy a system board for the thing for under $2,500.

    It has no chance of competing against the P4 until Motorola or IBM produce a cheap decent performance ATX form factor board.

    It doesn't matter how well engineered it is if nobody is going to buy the thing.

  • by nadador ( 3747 ) on Saturday July 07, 2001 @05:27AM (#102330)
    I'm interested to see what the influence of Alpha IP will be on Intel core designs. When I took computer architecture at CMU we spent a couple of lectures on why the Alpha was the best thing since sliced bread, as far as microprocessors go.

    One of the big things that the Alpha did that was so cool was the branch predictor, which actually implemented two branch prediction algorithms and then had a predictor that watched them both and picked the one that was recently the most correct. Some of that kind of deep knowledge of branch prediction and how to avoid having your long pipeline kill performance would be information that Intel could sorely use on the pentium 4 core, as well as on the Itanic, I mean Inanium, I mean *Itanium*. There we go.

Is anyone else surprised that the G4 core seems so vanilla? The difficulty of making a 4 stage pipeline run at upwards of 733 MHz on a .25 or .18 micron process is pretty amazing. I'm impressed. I suppose that the embedded focus at Motorola meant that bells and whistles weren't a high priority, but I wonder what kind of performance improvements the G4e will demonstrate with a longer pipeline and all.


  • Heard of MacOS X?

    --
  • Fact:
The article refers to the 7450 - the currently shipping "G4" chip in all Apple systems above 533MHz.

    (Some speculation/rumors follows...)
Probably shipping very soon in at least 933MHz systems - followed by 1GHz later this year... finally ;).

    The "next generation" or "next rev" to be more precise is split into:

• The 7440 - Lower power, no L3 cache, aimed at Moto's embedded market; may be Apple's lower-end consumer (iMac) target chip of choice depending on what happens with the 7460. Probably too much power/heat for notebooks. Announced - currently? or soon shipping.
    • The 7460 (Apollo?) - next generation of lower power. Slated for next-gen notebooks and consumer machines. 1GHz+ is the design potential.

    And then of course there's the G5 - rumored to be shipping in proto test stages at the moment - turning into production towards the end of the year for Q1 2002 shipping. Hopefully IBM's hand in the production will make it smoother to market than the G4 was. ;)

  • hmm have a mobile phone yep 99.8 % that its power by an ARM chip or plystation2 / N64 that will be a mips chip then so compare that against PCs

    in terms of volume shipments of 32bit chips P4 or G4 dont even feature on the pie chart

    get with it the new palm is going to ARM powered just like the newton

    68000 is a toy nothing else
    you want power then use the StrongARM2 aka Xscale (ARM5TE)

    get with the time boys and girls

    regards

    john jones
  • by Midnight Thunder ( 17205 ) on Saturday July 07, 2001 @04:35AM (#102334) Homepage Journal
    One thing that I got from this article is why we shouldn't be depending too much on clock-speeds for comparison, and thus the fact that PPCs aren't yet available at clock speeds of x86 shouldn't really matter. The wide and shallow approach of the PPC certainly means that less clock cycles are needed than the narrow and deep approach of the x86.

Now I know that the only tests that really matter are the real world tests, simply because at a user level that's the only real place that I'll notice the difference.

    Of course another issue is going to be motherboard differences and how much I/O depends on the processor, but this is another story.
I seem to remember reading that AMD had licensed a whole bunch of tech from DEC (bus architecture etc.), and that their biggest issue right now was predictor logic. It seems that the main reason Intel would buy Alpha IP is to keep someone else (like their prime competitor) from doing it first.
  • Nice superlatives...

    "A 400 MHz G4 ... slower than ... 4GHz P4."

    Hmm. Slowest G4 ever released - 350. So a 400 is about the same as the baseline model. Of course it's going to be slower! :-P

    But to give you some of my own stats...

    My home computer is a G4/400. My work computer is a P4/1.7GHz. Pretty fair comparison in terms of age of chip (although I'd really think the 450 (top of initial line) would compare better to the 1.7).

In RC5, no AltiVec optimization, the G4 is about half as fast (1.3Mkeys/s to 2.4Mkeys/s). This is with less than a quarter the clock.

With AltiVec optimization, the puny G4 does 3.5Mkeys/s.

Simple benchmarking, not necessarily too indicative of normal use, but thank you, move along now - nobody's saying that the G4 will always be faster because of more work per clock cycle, just that the speeds don't have to be so phenomenal on them. Mine's a lowly 400. Imagine a 733?
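Normalizing those figures to work per clock (nothing here but the arithmetic on the numbers above):

```python
# Mkeys/s divided by MHz == keys per clock cycle (the mega's cancel).
for name, mkeys_s, mhz in [("G4/400 scalar",  1.3,  400),
                           ("P4/1.7GHz",      2.4, 1700),
                           ("G4/400 AltiVec", 3.5,  400)]:
    print(f"{name}: {mkeys_s / mhz:.4f} keys/cycle")
# Scalar G4 does ~2.3x the P4's work per clock; with AltiVec it's ~6.2x.
```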

    Dan
    ls: .sig: File not found.
  • He said mass market.

    Apple will soon be the world's largest *nix vendor, thanks to OS X. How do you like them Apples, Tux?
  • by CBravo ( 35450 ) on Saturday July 07, 2001 @12:23PM (#102338)
    I'll drop the cluebomb again:
    -it is not about processors/instructionsets
    -it is not about MHzs

it is about e.g. compilers, parallelism, shortest paths, bandwidth, technology and algorithmz. You _then_ work on the rest.
    Processors are only a means to what you want to accomplish. I've seen DSPs overcome a 4x MHz gap just because they had a good architecture. Deep down, information processing (clocked or not) takes time to go through the logic.
Clearly, the architecture of P4 was thus designed to break up long instructions into many shorter instructions (over-simplification) which can each be completed in a shorter single clock cycle.

    This is not how a pipeline works. Each instruction (or micro instruction) is executed in stages through a pipeline so that each stage only performs a small part of the overall operation. No modern high performance uP performs an entire op in a single cycle.

There is a very good reason to try to maximize pipelining which rarely gets mentioned here: the less logic depth between each pipeline stage, the more calculations per transistor can be performed. The latter is actually a significant metric, as die area and transistor dimensions are the most significant limitations for modern uPs.

Shallower pipelines need to do more per pipeline stage, which means each transistor will waste more time waiting for the signal to propagate through the deeper logic. (It will also waste more glitch power, as the signal through a combinatorial logic unit will glitch for a while before stabilizing.)

This is of course a tradeoff against the cost of a pipeline stall, which the /. crowd has pointed out so well (as long as Intel has the deepest pipeline)
What the author apparently fails to grasp is that the only thing which matters is wall clock time. P4 may have a 20 cycle mispredict penalty, higher than G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz.


Yah, what do you want from an 'armchair architect'? Not only that, but what all these idiots seem to misunderstand is the cases where that trace cache keeps the whole pipe from flushing. So instead of a full flush it only has to flush from the trace cache down. Now you only need roughly twice the clock rate to match branch mispredicts. Combine that with an incredibly advanced branch predictor, an impressive cache prefetch system and a processor designed to scale like mad, and you have a processor that will destroy any other 32-bit processor in raw performance for general computing. I love AMD and the 'RISC' archs, but frankly they are all looking a little weak. In another two shrinks Intel will again have extra die space to play around with, to add a bunch of features/optimizations back in that they cut out for this version. At that time I expect an increase of 20%+ at the same clock. The P6 matched the newer chips; the P4 will pass them like they are standing still if Intel doesn't screw up too badly.

They compare the P4 to a drive-through with one long, fast-moving line and the G4 to walking inside where you have several shorter, slower lines.

Well, I used to work in fast food (many moons ago) and it is almost always faster to walk in.
Well, who knows if those guys from Compaq will be all too happy to have to work for Intel - at least the Inquirer [213.219.40.69] hinted as much.

Besides, there are rumors of Apple buying the desktop PowerPC business from Motorola (here [theregister.co.uk]), so we might see a CPU which focuses more on speed and less on power consumption from Apple, and who knows, maybe all those Alpha wizards might take a job with Apple just like their colleagues did with AMD.

    --Ulrich
  • We should be more interested in Mips/Arm chips because there happen to be more of them in use?

    That's kinda like saying that food critics should spend their time reviewing McDonalds burgers.

There are interesting things going on with Arm designs (Jazelle hardware JVM, Amulet asynchronous designs, etc), but in general the Mips/Arm markets are all about taking simple RISC cores and producing them cheap and running at low power.

    Nothing wrong with being more interested in the high end of the market.

    G.

One thing that has always bothered me is this nasty attitude towards clock speed. This article isn't too bad, but I've seen worse (and many comments here count towards that).

Of course clock speed alone doesn't a benchmark make. It's just a number. But it STILL COUNTS. A 400MHz G4[e\+], even with its shallower architecture (accomplishing more work per clock) is *still* going to be a helluva lot slower than an insanely-clocked (4GHz) P4. And I mean a *lot* slower.

    Yeah, you heard me. Amazing, isn't it? Even though the P4 does less per clock, it can actually be FASTER than another chip, if its clock speed is high enough.

    Gee, do you think Intel's world-class design team might take that into account? You think that just *maybe* it might be more than a simple marketing gimmick?

    Let's take a look at this. The Pentium Pro core(which is what the Pentium Pro, Celeron, Pentium II, and Pentium III were based on) was designed with a lot of clockspeed headroom in mind - and lo and behold, it actually worked. By the time that core is retired, it will be orders of magnitude faster than the original cores.

    Can you say the same thing for the G4? No. Oh, sure, it might be two or three times faster than the original when it's retired. But nowhere near the improvement we saw with the PPro.

    So, what's my point? Here it is: yeah, you can't go around buying processors based on clockspeed. But please take it into account. It's not like you can say "a 1MHz G4 is faster than a 1GHz P4, because the G4 does more work per clock cycle." Thanks for listening :)



    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • This is basically another very pedestrian hate-on-P4 article with very little substance. P4 does have some performance problems (mostly to do with shifts and multiplies) and they're documented in the optimization manual, but this article does nothing to dig any deeper than what a dozen other pedestrian articles have said.

    No, he says quite plainly that the article is about a comparison of the two chip architectures, and not about which one is the fastest. I don't think there's any question right now that the fastest consumer desktop systems on the market are powered by x86 chips.

However, reading the article gives one a good understanding of why a G4e running at the same MHz as a Pentium 4 will beat it every time.

  • > Think about how piss-poor spelling completely screws up the ability to search for anything, anywhere.

    Ah, but think about how piss-poor spelling will fuel ever-more-powerful search algorithms that can take into account mis-spellings! Goto.com already does a lot of this; try searching for "britteny spiers" or any reasonable variant. Their pay-per-search business model gives them a direct financial incentive to correct such errors.

    Heck, if everyone were as careless as Taco, spelling wouldn't even matter, because browsers would have Autocorrect(tm) built in to the rendering engines!
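For the curious, the simplest version of that trick is plain edit distance (a sketch; real engines use fancier phonetic and statistical matching, and still need a dictionary of candidate spellings to compare against):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

print(edit_distance("britteny", "britney"))  # 2 -- close enough to suggest
```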

    All hale Cmrd Tacko! Hes not a looser!
  • The better question is what market? I don't consider either of these part of the Server market yet. Such things should on occasion be clarified.
  • Yeah, but the P4 is not available @ 1.8 GHz yet

    Yes it is. It was launched on Monday.

    And even with the faster P4 (woah, 6% higher clock), 20 cycles @ 1.8 GHz is more than 7 cycles @ 733 MHz.

    The point is, a 20 cycle penalty is not three times more expensive than a 7 cycle penalty, as the article implies.

    The article also fails to mention that Willamette has the most advanced BPU in the world, which minimizes the number of mispredicts greatly.

And the 1 GHz P3 was launched weeks before it was available. So where can I buy the P4 1.8 GHz?

    Go to any major OEM website such as Gateway or Dell. They are all shipping 1.8 GHz Pentium 4 systems today, and are advertising them on their website.

(As an aside: it's really funny how some /.-ers get really confused and don't understand why some processors are not listed on Pricewatch, and then claim they're not available).
  • by VAXman ( 96870 ) on Saturday July 07, 2001 @07:31AM (#102350)
    The P4's long pipeline means that bubbles take a long time to propagate off the CPU, so a single bubble results in a lot of wasted cycles. When the G4e's shorter pipeline has a bubble, it propagates through to the other end quickly so that fewer cycles are wasted. So a single bubble in the P4's 20 stage pipeline wastes at least 20 clock cycles (more if it's a bubble in one of the longer FPU pipelines), whereas a single bubble in the G4e's 7 stage pipeline wastes at least 7 clock cycles. 20 clock cycles is a lot of wasted work, and even if the P4's clock is running twice as fast as the G4e's it still takes a larger performance hit for each pipeline bubble than the shorter machine.

What the author apparently fails to grasp is that the only thing which matters is wall clock time. P4 may have a 20 cycle mispredict penalty, higher than G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz.
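The arithmetic, for anyone who wants to check it (clock speeds as given above):

```python
# Mispredict penalty in wall-clock time: cycles / clock (GHz) = nanoseconds.
for name, cycles, ghz in [("P4 @ 1.8 GHz", 20, 1.8),
                          ("G4e @ 600 MHz", 7, 0.6)]:
    print(f"{name}: {cycles / ghz:.1f} ns per mispredict")
# P4: 11.1 ns, G4e: 11.7 ns -- the deeper pipe is no worse per flush here.
```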

    This is basically another very pedestrian hate-on-P4 article with very little substance. P4 does have some performance problems (mostly to do with shifts and multiplies) and they're documented in the optimization manual, but this article does nothing to dig any deeper than what a dozen other pedestrian articles have said.

    Also ...

    Intel was definitely paying attention, and as the Willamette team labored away in Santa Clara they kept MHz foremost in their minds.

    Willamette was designed entirely in Oregon. Santa Clara had nothing to do with it, and has had nothing to do with IA-32 design since P5 (nearly 10 years ago).
Well, of course you're an idiot for implying that he's gay based on his font. However, I still agree that it's hard to read. It's a bit better in Opera and Mozilla than it is in Netscape. (On Linux of course)

    I don't know why people insist on having white text on a black background. It's just too hard on my eyes.

PS Don't mod me down as offtopic. I care about my karma.

Dear dbarclay10,
    Please don't post things as fact unless you are right. You are wrong... sorry. Taking a quote from a previously correct post and changing the G4's MHz shows precisely why you are wrong.
    What the author apparently fails to grasp is that the only thing which matters is wall clock time. P4 may have a 20 cycle mispredict penalty, higher than G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz
    So, modifying the MHz gives the real and true facts, which completely destroy yours, and should help clarify things for anyone who thinks your post is telling the truth...
    ...P4 has a 20 cycle mispredict penalty, higher than G4e's penalty of 7. 20 cycles @ 1 GHz is MORE than 7 cycles @ 1 GHz...
    It is pretty simple math, my friend. So, if you truly knew what you were talking about, then you wouldn't have made such a long, blatant, incorrect, and uninformed post.
it will be interesting to see if intel implements any of the design ideas + IP from the acquisition of Compaq/DEC's alpha architecture.

ideas from the EV bus protocol to scaling. My guess is, since processor design has such a long lead time for each cpu and future designs are fairly well hard coded, intel couldn't just drop in any compaq IP 'just like that'.

so now intel has alpha technology and arm technology... imagine the combination of the two! what a hybrid processor that would be.

aaah, for those that dream anyway...

i would be interested to read a form of comparison to sun's usIII on a technical design level.

    Write your Own Operating System [FAQ]!

  • The G4's performance ramp has fallen *way* off of Moore's law and looks to be an already dead architecture.

Uh, Moore's 'law' states that the number of transistors per given area doubles every 18 months; it says nothing about "MHz".

    Quake III is the only truly fair real world benchmark

Quake3 is more of a test of the video card than the CPU; you'd have to use software rendering to give a fair comparison.

  • 68000 is a toy nothing else
Do you know that a majority of video game machines from the late 80s through the mid 90s (and maybe even now, not sure...) used the 68000 chipset?
    E.g.: the Neo Geo used two 68k chips.

    Be careful what you say.

    Dante.
  • that is possibly the most incoherent ramble I have ever read in my life
  • >>68000 is a toy
    >video game machines

    QED. Besides, most of the consoles with 68K were early '90s consoles, not late '80s consoles.

Either way, the CPU doesn't handle most of the game; the TVIA, PPU, GPU, GS, or whatever they call the display circuitry does. But in case you're wondering what CPU(s) your system uses:

    • Atari 2600: MOS 6507 (6502 without A13-A15 or interrupts) at
    • NES: MOS 2A03 (6502 with on-chip DMA and sound but without BCD mode) at 1.8 MHz
    • Sega Master System: Zilog Z80 at 3.6 MHz
    • Game Boy: Sharp Z80 (slightly different set of undocumented instructions) at 4 MHz
    • Game Boy Color: Sharp Z80 at 8 MHz
    • Genesis: Motorola 68000 (It's 32-bit!) at 7.8 MHz + full Master System.
    • Super NES: WDC 65c816 (if you call 68000 16-bit, this is 8-bit) with on-chip DMA at 2.8/3.6 MHz + Sony SPC700 (bitch to code for but somewhat similar to 6502) at 2 MHz.
    • Capcom Play System: Motorola 68000
    • Sega Saturn: dual Hitachi SH2 + Hitachi SH1 (CD controller) + Motorola 68000
    • Atari Jaguar used a Motorola 68000 plus two other CPUs.
    • Game Boy Advance: full GBC + ARM at 16.8 MHz
    • Sony PlayStation 1: MIPS at 36 MHz with external fixed-point unit
    • Nintendo 64: MIPS at 93.8 MHz
    • Sega Dreamcast: Hitachi SH4 at 200 MHz plus some odd Saturn components
    • Sony PlayStation 2: Emotion Engine (MIPS plus some vector processing units) at 300 MHz + full PS1
    • Nintendo GAMECUBE: Motorola PowerPC Gekko at 400 MHz
    • Microsoft XBox: Get a PC instead.
  • "The wide and shallow approach of the PPC certainly means that less clock cycles are needed than the narrow and deep approach of the x86."

Remember, the article is about only one "x86" processor: the P4. Clock speed still matters, just not as much with that particular processor. There is a performance penalty to be paid due to this design philosophy. A similarly clocked P3 will eat a P4's lunch because of it. Let's not even get started with what a similarly clocked Athlon does :) I think that Intel's thinking was: "Who cares if it might be less efficient on a per-clock basis, this thing'll ramp up to such a high clock speed that its inefficiencies won't matter anymore."
    By comparison, the G4 and Athlon are very efficient with their clock cycles.
  • In a talk at UT Austin, one of the Pentium 4 architects claimed that it has the best branch predictor of any CPU, but they haven't published the details yet.
  • Well everyone knows... they screw you at the drive-through.
  • See how they bite.

    I actually meant to spell it "jedge". The Southern-Midwest US pronounces certain words in an odd way, like "acrost" rather than "across" or "jedge" instead of "judge". It seemed to me like it worked when I hit Submit.

    You think I'm a fuckwad. The moderators didn't listen to me and wasted their (not there or they're) points instead of using them on more important pieces (not peaces). And what do (dew?) I need karma for (four/fore?)?

    woof

    ObL(inux): WDR in Germany ran a neat -- albeit somewhat chopped up and overdubbed -- interview with Linus today (Saturday afternoon). Better still, there was little talk about MS, the interviewer and translator seemed to understand the different meanings of "free" and there weren't a load of quoteheads (c.f. Rush Limbaugh's sycophantic "dittoheads") interrupting the discussion with cheers or boos every time Torvalds took a breath.

  • "Fuck" comes from the German word "to strike"? Don't think so. That would be "hauen" or "schlagen" -- maybe "treffen" -- or a compound based on one of those.

The English word comes from Old Norse "fikk" (or "fykk", either way) and it meant then what it does now. It did vector into the language in an interesting and colourful way. Modern Norwegian still has the word.

    When everyone does something (save for a few religious people by choice and half the readership here -- not necessarily by choice), there's gonna be a word for it.

    The German word for that now is... [drum roll]... "Ficken". Of course, there's "bumsen" (bouncing), "flachliegen" (laying flat) and a couple others, but I make my point about the beauty of English when I tell Germans about "bumping uglies", "knockin' boots" and "the horizontal two-step". I was out West too long.

    woof.

  • "This article [americanpartisan.com] is a beginnurs refernce to thee impotence off english speling. What is nice is that it is not a holy war of who is better but an explaination of why speling is impourtint It has sum nfo on useige, useful for everyone." Not for the pedantic novice, but its not two badlee written piece if you're reasonably hyumin and want to understand more about sum things abowt you're langwidje.

    Sorry, Taco, but it's getting worse. Think about how piss-poor spelling completely screws up the ability to search for anything, anywhere. That ought to be reason enough for you and everyone else to at least CONSIDER spell-checking.

    We're geeks, and we all hate being judged on what we look like or what weird idiosyncrasies we have, yet many of us have also learned the hard facts of life: people jedge based on what they see. Bad spelling == worthless. If you can't be bothered to check what you write, why the hell should I be bothered to read it?

    Aren't you supposed to be a programmer or something? Yeah? Then how the fsck do you get anything to run (besides the debugger) if your syntax is even close to its English counterpart and your variable names never have the same spelling on any two lines?

    woof.

    Mod: -2 Pedantism, -1 Taco-spell-flames no longer amusing, +2 Interesting, +2 Insightful, etc.
    Total: +1, exactly what it would be posting after logging in, so don't waste your mod points here.

  • No... YOU don't matter because YOU, sir, are a braindead moron... There, I feel better. Yes, I would have said that to your face. So SUE me, and call it flamebait in the first degree.
I only have one problem with your comparison. The RC5 rate that the G4 is able to achieve using AltiVec is sweet, and best bar none clock for clock. (My Duron 950 pulls 3.3MKeys/s, approx the same as the G4-400). But what you fail to take into account with the P4 is that this new architecture requires a decent compiler, and software to be optimised to run properly on it. I doubt that this has been done with the P4. (Nonetheless I still want a Dual G4 to replace my Duron, and a TiBook to replace my iBook).


    How every version of MICROS~1 Windows(TM) comes to exist.
  • Aren't you supposed to be a programmer or something? Yeah? Then how the fsck do you get anything to run (besides the debugger) if your syntax is even close to its English counterpart and your variable names never have the same spelling on any two lines?

    I was just thinking the very same thing... not because I read your post, but because I've been thinking about a senior software engineer at work. He can't spell to save his life, his penmanship is horrid, and he mispronounces words (even technical ones) quite often.

    I'm amazed when I see him type code at his keyboard! Everything seems to come out as intended (i.e., correct). He's a damn good engineer, too.

    You know... I have a feeling this happens more frequently than we'd like to think.

  • Hmm... if you had actually read and understood the post you are replying to, then you should be ashamed of yourself. The poster even said that part of the design philosophy behind the P4 was so that it will reach higher clock speeds, and thus the Intel engineers figured that the benefits in clock speed outweighed the performance hits. Personally, I think that the more elegant chips are nicer, although perhaps not always faster. Sometimes, people can't realize that brute force is not always the way to go...


    -------
  • (a deltic so please dont moan about spelling but the content)

Hmm, you may in fact be related to a triangular alluvial deposit at the mouth of a river. Or the fourth letter of the Greek alphabet. Either way, I really wish you could use at least minimal punctuation.

    --

  • No, this is an IBM part.

Yeah, and it's 485 MHz. The original post by yerricde had too many errors to bother with.

    --

  • A mass-market consumer OS would help, too.
  • Except that the famous Commodore Amiga series..

    The Amiga was not a Commodore. Sure, it had the company name stuck on it. But it was designed by Amiga, Inc., and marked the end of innovation for the company that brought us the "Personal Electronic Transactor".

    Fuzzy

It doesn't matter; it has to be one of the most important chips on the market to be used in a sentence like that in the first place... and in his and my opinion it isn't even that.
No, processors wouldn't be faster overall; the only way to speed up processing of single threads is by clock rate... there is only so much parallelism you can extract on the fly, not a whole lot, and after that clock rate is the only way to speed things up.

The only alternative is going to explicit parallelism; simply making a slightly slower but wider superscalar processor (the old Cyrix approach) doesn't get you very far.

The biggest problem with explicit parallelism is programming; multithreading is a very poor method... unfortunately it's the only one most programmers accept.
  • That would be funny. Intel's chips start selling better because their internals are now more like AMD's. The irony.
  • by zephc ( 225327 ) on Saturday July 07, 2001 @04:01AM (#102375)
    for different uses!

    The G4 is meant to be usable in embedded systems, while the P4 is meant to be usable as a space heater

    =P
    ----
  • One thing that I got from this article is why we shouldn't be depending too much on clock-speeds for comparison, and thus the fact that PPCs aren't yet available at clock speeds of x86 shouldn't really matter. The wide and shallow approach of the PPC certainly means that less clock cycles are needed than the narrow and deep approach of the x86.

This is false. The key point is that the P4 is designed for clock ramping to maintain Moore's law. The G4's performance ramp has fallen *way* off of Moore's law and looks to be an already dead architecture.

Greater clock still leads to lower minimum latency between instructions, and this gives the P4 a huge lead. The only disadvantage is the longer pipeline, which makes branching somewhat more costly.

    The G4 is *NOT* a very wide architecture compared, say, to an Athlon.

Now I know that the only tests that really matter are the real world tests, simply because at a user level that's the only real place that I'll notice the difference.

Quake III is the only truly fair real world benchmark that you can run on both of these machines. I believe the x86s just wipe the floor with iMacs.

    Paul Hsieh

I like my P4... I can cook a fried egg on it or warm my coffee :)
This is false. The key point is that the P4 is designed for clock ramping to maintain Moore's law. The G4's performance ramp has fallen *way* off of Moore's law and looks to be an already dead architecture.

    As I recall, Moore's law is about the SPEED of computers. Not just the MHz. If (as the article says) G4s run at comparable speed to P4s, even though they have lower MHz, HOW has that fallen off Moore's law?

    Quake III is the only truly fair real world benchmark that you can run on both of these machines.

    Fair, assuming that it is equally optimized on both platforms.

I believe the x86s just wipe the floor with iMacs.

    Since when has the iMac been made for gaming?

  • http://www.apple.com/imac/specs.html [apple.com]

    iMacs run on G3 processors.
  • Uhhhhhh... wouldn't these be considered toys?
It's a Faustian bargain, that's what!
> The specs for the G4 are irrelevant while you can't buy a system board for the thing for under $2,500.

Brand new PowerMac G4s start at $1,699. If you want the G4e the article discusses, then I think you have to go up to the 733MHz or Dual 533MHz models. All this will change at Macworld New York in a couple of weeks. The 733MHz model will become the low-end system.
  • "No C= computer did."
    Except that the famous Commodore Amiga series used most of the 68000 series. A500,A1000 and A2000 used the 68000. A3000 used 68030 and 68040 (A3000T), and A4000 used 68030 and 68040. Later revisions once C= went down the drain were sold with 68060 as well, a ver elegan chip if I you ask me.
  • by WIAKywbfatw ( 307557 ) on Saturday July 07, 2001 @04:21AM (#102384) Journal

    "2 of the most important chips on the market"

Jeez, why do people have such a bad grasp of the English language? Is it really that hard to understand?

Yes, "two of". As in "not exclusively of". Yes, the Intel Pentium 4 is one of the most important chips out there. And yes, so is the AMD Athlon. But so is the Motorola G4, and so for that matter is the upcoming Intel Itanium.

    Now if the description of the article said "the two most important", I could understand your gripe. But it doesn't. And besides, haven't we already seen dozens of similar comparisons between Intel and AMD processor families?

  • A similarly clocked P3 will eat a P4's lunch because of it.
Yeah... show me where I can buy a stable running 1.7GHz P3 system then.
  • ...of the fact that as of this writing, there were 142 comments on "Nuclear Booster Rockets," 107 on "The Sliderule as Paleo-Geek Artifact," 114 on "Colorado May Map Drivers' Faces"... and merely 79 on this one. And of those 79, few are worthwhile technical commentaries on the actual article. What's happening here, eh?
Would it not be possible (and here I show my lack of knowledge...) to do something akin to emulating a multiprocessor system, but within a single chip? I guess this is coming back to the explicit parallelism thing, but would it not be possible to have a specialised chipset to deal with it, so that you get round programming problems?
I felt it necessary to post some conclusions to the article, as the author completely failed to do this. Otherwise the article was very well written - if you are fairly new to this stuff, don't be put off by Taco's assertion that this is Not for the techie novice. I know very little about processor architecture, but learnt a lot from this article.

Some of the /.ers out there who are more au fait with this stuff than myself may want to correct me on some of the following points.

Basically, the clever folks at the Intel marketing department realised that the only thing the General Public knows about processors is G/MHz. Therefore this is their only point of comparison between processors in the fragmented AT market (obviously, the G4 does not suffer from this competition, which is reflected in the differences in architecture). Therefore the techies at Intel were given the orders: "make the clock speed as high as possible (and also make the processor fast!)".

Clearly, the architecture of P4 was thus designed to break up long instructions into many shorter instructions (over-simplification) which can each be completed in a shorter single clock cycle. This leads to a 'long pipeline' of many instructions:

    Since each stage always lasts exactly one clock cycle, shorter pipeline stages mean shorter clock cycles and higher clock frequencies. The P4, with a whopping 20 stages in its basic pipeline, takes this tactic to the extreme

However, using this longer pipeline leads to problems - especially when the processor doesn't have any instructions to feed it, causing a "bubble" which has to propagate right down the long pipeline, and also when the "branch prediction" (i.e. the prediction of which way a branch will go) is wrong - again causing a delay as the 'bad' instructions propagate through the processor.

    Of course the clever guys at Intel came up with some novel solutions to this. This includes:

-Using a larger Branch History Table, which records information about the outcomes of branches that have already been executed and helps in branch prediction.

-The trace cache, which stores instructions that have already been fetched and decoded, particularly useful for blocks of code that are executed thousands and thousands of times ((this reminds me of MMX, although I think that worked in a different way. Any ideas why MMX isn't used anymore?????)) ...there's no delay associated with looking it up and hence no pipeline bubble. See the toy sketch after this list.

-A special microcode ROM that holds pre-packaged sequences of uops so that the regular hardware decoder can concentrate on decoding the smaller, faster instructions. This stops these longer instructions from polluting the trace cache.

-Some others that I forgot/understood even less well?????
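A toy of the trace-cache idea above (memoization, basically; not the actual P4 mechanism, which caches traces of uops across branches):

```python
# Cache the decoded form of a hot fetch address so the decode stage is
# skipped entirely on a hit -- no decode work, no decode bubble.
decoded = {}

def fetch(addr, decode):
    if addr not in decoded:
        decoded[addr] = decode(addr)   # slow path: decode x86 into uops
    return decoded[addr]               # hot loop bodies hit here every time
```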

    This all seems to be an interesting case of the public's perversion for clock speed subverting processor architecture (although not necessarily in a bad way).

Would processors be faster "overall" (I'm sorry, that's terribly vague) if there wasn't such a push for faster clock speeds???

    --The real Marcus Brody doesn't have a Slashdot ID

  • Bah. The patent office say it's already been done [slashdot.org].
From what I can understand, Intel has designed the P4 to be able to run at HIGH CLOCK SPEEDS regardless of the actual performance improvement. They astutely see consumers going for an easy metric.

So, in the same spirit, I have my offering for CPU design: a simple divider on the clock input. This would only take two transistors and yet the processor would double in clock speed! The 3GHz chip is here already. Now, how do I patent the idea?

  • by Waffle Iron ( 339739 ) on Saturday July 07, 2001 @07:37AM (#102391)
After reading this article I think that history is repeating itself. I've been scoffing at the P4, but now I think that Intel may be laughing in the end.

If you remember when the Pentium Pro came out, people (including me) dissed it because it was years behind schedule, huge, expensive and hot. Actually, its architecture was just ahead of the process technology curve. With a few tweaks, the same CPU core came to dominate the world in the P-II and the P-III.

    Looking at the radical changes in the P4, including storing only uOPs in the instruction cache and reserving (currently useless) pipeline stages for speed-of-light cross chip delays, they are planning ahead for future realities. We can think of the current P4 as being like the Pentium Pro, just a short-lived beta release.

The more interesting question is which approach to driving uOPs will win out: P4, Transmeta or Itanium. P4 and Transmeta convert legacy x86 opcodes to an internal wide architecture on the fly (P4 in hardware, Transmeta in software); Itanium makes the compiler generate the wide architecture directly. Note that the original pre-translated instruction format (CISC, RISC, Java bytecodes, whatever) is now largely irrelevant.
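To make that concrete, here's an illustrative cracking table (the mnemonics and uop names are invented for the sketch, not real P4 encodings):

```python
# One CISC memory-op becomes a short RISC-like load/op/store sequence.
CRACK = {
    "add [mem], eax": ["ld  tmp, [mem]", "add tmp, eax", "st  [mem], tmp"],
    "inc eax":        ["add eax, 1"],
}
for x86, uops in CRACK.items():
    print(f"{x86:16} -> {'; '.join(uops)}")
```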

    My view is that in the abstract, Transmeta has the best approach, followed by P4 and Itanium last. This is because the software approach is the most flexible and can even be upgraded in the field. In theory, it could detect and store the individual performance characteristics of each program on a user's machine. Granted, they currently focus on low-power, but if they retargeted their technology at high speed, it could be interesting.

    The P4 approach is hardwired, but at least it can adapt to local code characteristics and translate them to the current internal architecture version.

    The Itanium exposes low-level chip details to the compiler, and the decisions are cast in concrete from there on out. It doesn't seem very future-proof to me; if the IA64 architecture changes in the future, today's compiled code will suffer.

As someone else has surely mentioned, the G4 is used in a lot more places than Apple's various computers. While this particular version of the chip may not be, you will find a lot of the G3/G4 series processors inside automobile computers, routers, TiVo boxes, etc.

    Motorola makes most of these processors for embedded applications. Intel makes processors for the embedded market, but they don't get the same publicity.

  • Perhaps I am thinking too broadly about the product line...I know that a good number of PowerPC processors end up in some of the newer routers and switches on the market (Cisco uses various Motorola processors in their boxes, for example).
  • The article is pretty specific to the next-generation G4e and not the G4.
A pipeline can't make you worse off. You can only lose what the pipeline tried to let you gain... nothing more. So a 20 cycle penalty is not worse than a 7 cycle penalty. It's pretty much exactly the same.
  • by revxul ( 463513 )
    "Then how the fuck do you get anything to run"
    f - u - c - k
    descends from some German word meaning, "to strike".
  • then I think you have to go up to the 733MHz or Dual 533MHz models.

Actually, the dual 533MHz models are G4s. The currently shipping G4e's are the 667MHz and the 733MHz.

There was some talk when the 667s and 733s came out about current compilers not being geared towards really exploiting them.

The 667 I believe has been discontinued (the dual 533 was such a sweet buy in comparison) and the rumors are that shortly the 733MHz will become the low end, with there being an 866 and a 933 (potentially).

Is anyone else surprised that the G4 core seems so vanilla? The difficulty of making a 4 stage pipeline run at upwards of 733 MHz on a .25 or .18 micron process is pretty amazing. I'm impressed.

    That has been part of the problem with the existing G4's, and also one of their biggest benefits. I don't know how closely you've followed Mac tech, but god... the things were stuck at 500MHz for so, so long, it was pathetic. Because they couldn't get the clock speed up, Apple eventually had to add a second processor without raising the price at all (which was nice, but because of OS 9 not many apps could benefit).

Because the pipelines were so shallow, they were fast as hell but ramping them up was causing all kinds of problems, and the rumor is AltiVec just made it worse. Apparently just getting up to 733MHz wasn't very easy.

Moto wants cool, shallow-pipelined chips for their embedded market, and while this has been a boon to Apple in some ways (you have G4 portables! imagine a Pentium 4 portable) it is just annoying when you buy their systems and want the speed.

The rumor now is that Apple has taken on a lot of the design for the new G4s and G5s, to better suit their needs and not Motorola's.
