Technology

RISC vs. CISC in the post-RISC era

S. Casey writes "Ars Technica has a very cool article up that takes on the typical RISC versus CISC debate. The author argues that today's microprocessors aren't really RISC or CISC, and covers the historical/technical reasons why the distinction isn't particularly useful anymore. It's pretty convincing (to me)." Essentially, the author argues that it is difficult, if not impossible, to have the usual debate because both camps of processors have evolved features that used to be found only in the other.

Comments Filter:
  • by Anonymous Coward
    Actually, a single VAX instruction with suitably chosen indirect operands that also cross the right page boundaries can generate up to 52 page faults. Then again, if you have to walk the page tables because of a TLB miss, any load/store can cause several page faults, even on a pure RISC machine, depending on how the page tables are implemented. You have to count these to reach that figure on the VAX; otherwise you can reach (only!) 26 with six-operand instructions using indirect addresses. IOW, a single VAX instruction can require as many as 26 TLB entries for the virtual addresses specified in the instruction plus the instruction itself, versus only 2 on load/store architectures (3 if they allow misaligned accesses that cross page boundaries).
  • by Anonymous Coward
    RE: VLIW. I mentioned VLIW as the trend in next-gen architectures near the end of the article, and I felt that the article was lengthy enough without adding a discussion of VLIW to it.

    RE: Superscalar execution. I got a few polite emails from Ars readers concerning Cray's use of superscalar execution in the 80's, so I corrected the article. My apologies to the poster here who thought he had misquoted me. You actually quoted me correctly the first time.

    RE: ISA vs. Implementation. The fact that I glossed over this distinction in my article was a major omission on my part, and it has been the most consistent and valid criticism leveled at the piece thus far. The question of whether RISC is ISA or implementation is really what many consider to be the core of the debate. John Mashey feels that RISC is indeed only about ISA, but his argument for this stems from a different place than mine. Mashey argues that current CPU architects use the term "RISC" to refer strictly to an ISA with certain features, and since people in the trade are using it this way (or so he claims), everyone should be using it this way. So he mainly appeals to current usage among architects. (This is a bit of an oversimplification, as Mashey also appeals to the historical development of certain architectures. I myself concentrated on the historical development of the debate itself.) He also gives a specific feature list for what he feels RISC is--something that I look at and think, "Welcome to the club. Everybody and their uncle has a pet feature list, and you can go round and round about it all day." And in fact, people do go round and round about it all day, even in comp.arch. The very fact that Mashey has a page devoted to the debate is a testament to it being a recurring issue in the newsgroup.

    I don't pretend to know as much about computer architecture as Mashey, or many of the others who debate this topic in comp.arch or elsewhere. However, once again, I felt that a historical approach would provide a context in which people could make their own decisions about RISC and CISC. I don't feel that feature lists and hard-and-fast rules are really beneficial anymore, because they've been done to death. Indeed, I was very hesitant to include even the simple side-by-side feature list that I did. My intention was to give an overall feel for the development of the debate and then to make some claims at the end based on how I interpreted the history. I think that both the Ditzel quote and the fact that someone felt it necessary to make a comp.arch RISC vs. CISC page show that not all architects agree with Mashey about the ISA vs. implementation issue, so I don't feel too presumptuous in disagreeing with him myself.

    RE: An anticipated question. The question I expect to get from this reply is, "How do you reconcile your claim that you're not providing exact rules for defining RISC with the fact that in the end you conclude that today's so-called RISC machines aren't truly RISC?" The answer I'd give is the preponderance of historical evidence. Notice that I didn't dwell on any one post-RISC feature (except /maybe/ OOO) as violating some "defining principle of RISC." I simply presented them all and said how I felt each was a new and different answer to a new and different problem, so it didn't make much sense to describe them with an old term that signified older answers to older problems.

    --Jon "Hannibal" Stokes
    Ars Technica
    http://arstechnica.com/
  • by Anonymous Coward on Thursday October 21, 1999 @05:27AM (#1597120)

    The author of the article rightly notes that a basic design-philosophy difference is where the burden of reducing run time should be placed. The original RISC philosophy was to place the burden on software--whether the programmer, the compiler, or the set of libraries--and this is especially true of (V)LIW processors. CISC (or more rightly "old style") design philosophy sought to place the burden on the hardware. Effectively microcoded CISC is like RISC with a fixed set of library functions in a specialized read-only high-speed cache. (The obvious limitation of this being that the library is fixed in hardware.)

    Separating load/store into a specialized (memory) instruction nearly forces fixed instruction size (the real mark of RISC). This also fits the general idea of modularized (and potentially superscalar) processing--though of course the same could apply to microcode. Fixed instruction size has a consequence for memory access: a CISC (variable-length) instruction fetch either has to load a maximum-instruction-size number of bytes or use multiple fetches for longer instructions. One could reduce the performance loss of this if the first load included the opcode and later loads contained the (optional) register ids or immediates (hard-coded values). Since the register ids might not be needed until a second or third stage of the pipeline, the load delay would be invisible--the next instruction fetch would forward the values to the appropriate place in the pipeline. This would make some instructions very fast with low memory bandwidth, since the number of additional values needed might vary from 0 to 3 ("save state," e.g., might require no arguments, while increment might require one).
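
    A toy sketch of that fetch/decode difference in C--the one-byte opcode that encodes 0 to 3 trailing operand bytes is an invented format for illustration, not any real ISA:

    #include <stdio.h>
    #include <stdint.h>

    /* Fixed-length: one 4-byte fetch per instruction; the bytes never have
       to be inspected just to find the next instruction boundary. */
    static size_t decode_fixed(const uint8_t *mem, size_t len) {
        (void)mem;                        /* contents not needed to find boundaries */
        size_t count = 0;
        for (size_t pc = 0; pc + 4 <= len; pc += 4)
            count++;
        return count;
    }

    /* Variable-length: the opcode must be examined before we even know where
       the next instruction starts, so decode is inherently serial. */
    static size_t decode_variable(const uint8_t *mem, size_t len) {
        size_t count = 0, pc = 0;
        while (pc < len) {
            uint8_t opcode = mem[pc];
            size_t extra = opcode & 0x03; /* invented rule: low 2 bits = 0..3 operand bytes */
            pc += 1 + extra;
            count++;
        }
        return count;
    }

    int main(void) {
        uint8_t code[16] = { 0x01, 0xAA, 0x02, 0xBB, 0xCC, 0x00,
                             0x03, 0x11, 0x22, 0x33, 0x00, 0x00,
                             0x00, 0x00, 0x00, 0x00 };
        printf("fixed-length:    %zu instructions in 16 bytes\n",
               decode_fixed(code, sizeof code));
        printf("variable-length: %zu instructions in 16 bytes\n",
               decode_variable(code, sizeof code));
        return 0;
    }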

    The RISC vs. CISC debate also concentrates on general-purpose processors. In a single-purpose processor (with a stable, well-defined operation set), a direct-execution CISC design would make sense because the "library" of code is so small that hardwiring it (not merely placing it in ROM for a microcode implementation, though that would also be more efficient than a RISC design, it seems) would increase speed. This, of course, assumes that the design costs are low enough to justify making a specialized design for a specialized function. (With improved design tools and efficient low-output manufacturing, this could become more common. Of course, some are looking to processors that dynamically change their wiring to allow changes of specialization without replacing the hardware.) Certain classes of Digital Signal Processors would seem to fit into this category. 2D and 3D video accelerators--which rely on a generalized processor for certain functions--might be another reasonable CISC application (well-defined problem, very limited instruction set required, relatively stable algorithms).

    A true post-RISC processor (for general-purpose computing) would take advantage of hardware scheduling (which allows scheduling based on the current data--something even a compiler cannot do), compiler optimization (which would probably include (V)LIW, probably with some predication information), and programmer knowledge (a high-level language and set of libraries that allow the programmer to share knowledge of the design with the compiler). IA-64 seems to place too little burden on the programmer (perhaps rightly, given how little time goes into writing code well--by open-sourcers who often just want to make a working program, not considering performance, cooperative development, system integration, etc., and by 'commercial' programmers who are often trapped by unrealistic deadlines, since it is faster in the short term to implement a kludge than to design good code or find another's implementation) and an excessive burden on the compiler (this MIGHT be reasonable since Intel can control the compiler--that being a single unit of coding). IBM's Power4 seems to emphasize programmer cleverness (multithreading, multiprocessing, tight libraries) and, to a lesser extent, compiler cleverness.

    What I wonder is: why no system seems to support vector-processor-style non-unit stride in the cache (this could perhaps be implemented by having a vector mode which took, say, one set of a 2-way set-associative cache and used it for non-unit-offset entries--say every N cache lines might be associated with an address, offset, and length--which would make certain vector-processor functions work very well in general-purpose processors); why a sticky bit is not provided for the cache (this would allow code fragments that are known to be reused somewhat frequently, but would otherwise be evicted by other loads, to stay resident); and why memory is not segmented into non-virtual-addressed kernel, non-virtual-addressed common library, and virtual-addressed application space. The last would seem to allow some of the benefits of embedded systems without losing the benefits of virtual addressing where it makes sense. The compiler would use a table of system-call locations and common-library-function locations so that at link time the jumps would be to these absolute locations. Of course, changing the libraries or kernel would require relinking all applications, but this might be relatively quick, and the frequency of having to do so could be reduced by placing libraries like libc (commonly used, very stable implementation) at the beginning of the memory space and using small functions or even empty space to pad larger functions. Placing the kernel memory space onto a processor daughter card would also reduce card-to-board memory traffic. Using separate kernel caching might also reduce application cache misses after a kernel invocation. The memory access style of kernels might also be used to design that specific memory system--the kernel (and the common libraries) are not paged out to disk, so the instruction memory could be designed for very slow write and fast read. (Whether the improved performance would justify the cost of a specialized memory system is questionable.)
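
    A back-of-the-envelope sketch of the problem that non-unit-stride cache support would address--the 64-byte line size and the strides below are just illustrative assumptions:

    #include <stdio.h>

    int main(void) {
        const long line = 64;    /* assumed cache line size in bytes */
        const long n    = 1024;  /* number of elements read */
        const long elem = 8;     /* element size in bytes */

        for (long stride = 1; stride <= 16; stride *= 4) {
            /* distinct cache lines touched when reading n elements spaced
               'stride' elements apart (capped at one line per access) */
            long span  = n * stride * elem;
            long lines = (span + line - 1) / line;
            if (lines > n)
                lines = n;
            printf("stride %2ld: %4ld reads pull in %4ld cache lines\n",
                   stride, n, lines);
        }
        return 0;
    }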

    Paul A. Clayton (not quite AC)
  • by MacJedi ( 173 )

    It's interesting what the article says about the G4. I hadn't thought of it before, but that Velocity Engine (vector unit) must have a whole lot of instructions...

    Has anyone gotten their G4 yet?

  • tumeric is talking about the CDC 6600 (Scoreboarding) and the IBM 360/91 (Tomasulo). I believe Cray was still at CDC for the 6600 (but someone correct me if I'm wrong).

    The rule in computer architecture seems to be that everything was done back in the '60's and IBM holds all the patents. :)

    None of the stuff discussed in the article was particularly revolutionary at the time of RISC I. Bringing it all together on a microprocessor was the missing element.

    --

    Well, throw transistor count in there, and see just how much that 8080 (or 8008? I know you can boot Z80 CP/M on a 486, so...) compatibility costs Intel! It is HUGE. The x86 is a tribute to the skill of Intel's designers and manufacturing, and of course to the money they get from the high-margin volume that they sell.

    It's like making a flathead Harley do 200 mph: sure, you could, given absurd amounts of effort, but I can go buy a new bike that with a small effort will crack 200 mph (heck, the new ones will do it stock under the right conditions).

  • ... because it's got polynomial equation solving built into its instruction set!!

    In grad school, I worked on the Pixel-Planes graphics supercomputer at UNC. Among its unique features were large arrays (256 to a chip) of 1-bit processors, with special hardware for solving second-degree polynomial equations. The polygon edges, texture maps, etc. were all turned into plane equations, and the resulting values solved per-pixel with one such processor per pixel. Pretty darned nifty eight years ago...
  • For the n'th time, ArsTechnica's RISC article is at least halfway clueless. The article ignores the definitive article by John Mashey, available on-line at http://www.inf.tu-dresden.de/~ag7/mashey/RISCvsCISC.html [tu-dresden.de]. Two major points Mashey makes (and which "Hannibal" botches) are
    • RISC or not is about architecture, not implementation
    • RISC is really about having an architecture whose instructions pipeline cleanly, and which responds to the demands of actual workloads.
    If it's possible within an architecture to generate more than one page fault within an instruction, then you're not on a RISC. (The record seems to be a VAX 3-operand memory-to-memory indirect-indexed instruction with memory-based indexes and offsets, which can generate up to 47 (!) page faults.)

    If you take the point of view that a P6 is a RISC core running an x86 interpreter, then still the user-visible architecture is not RISC. It would only be RISC if you let me program the core directly with its native micro-ops. "Hannibal" still doesn't understand this distinction between architecture and implementation.

  • by bhurt ( 1081 ) on Thursday October 21, 1999 @05:54AM (#1597126) Homepage
    First of all, FP doesn't add all that many instructions to an architecture. The Alpha has about six FP instructions-- load FP, store FP, add, subtract, multiply, divide. The PPC (ignoring SIMD) takes this up to about a dozen or so-- two of which have as their sole purpose in life making sin() etc. fast to implement (three instructions, vs. a dozen or so on the Alpha).

    Second, there are two _different_ optimization problems chip designers face, generally at different times. The less common one is the "clean slate design," where the chip designers don't have to support anything and can draw the boundaries wherever they make sense to be drawn. In essence this is what the RISC designers of the eighties did. In the other optimization problem, the chip designers are handed an architecture and a set of existing programs and told "make them go faster".

    Super-scalar out-of-order execution, branch prediction, more functional units, and speculative code execution are all optimizations you can apply to an existing architecture *without changing the (apparent) semantics of that architecture*--i.e. without breaking legacy code. New instructions allow "new" or recompiled code to gain a performance boost without dropping support for old code (SIMD and DSP-like instructions just happen to be all the rage these days). So of course they're applied to both legacy RISC and legacy CISC applications!

    Of course these "patches" are not as effective as fundamentally rearchitecting the CPU. Of course they increase the complexity of the CPU in much greater proportion than they increase performance. This doesn't imply some "ideological impurity", however--this is the fundamental problem of supporting legacy code. This article's thesis boils down to "there are only legacy CPUs out there!" Which, for the moment, is true.

    But let's consider for a moment what a rearchitected CPU for today would look like. What we'd like to do is to continue the trend RISC started- of shoving the complexity off the CPU and onto the compiler. It would be sort of accurate to claim that RISC's central idea was to shove the complexity of the translation to microcode onto the compiler.

    Today's CPU complexity comes primarily from the patches applied to make the legacy code run faster- especially superscalar execution, branch prediction, and speculative execution- all of which require the CPU to deduce information out of sets of instructions. It'd be nice to have the compiler _tell_ the CPU the data ahead of time, so the CPU wouldn't have to spend precious clock time and transistor budget deducing. This, of course, implies a method for explicitly communicating this information in the instruction stream (the only channel of information between the compiler and the CPU)- older instruction sets (of all stripes) forced the CPU to deduce this information because there was no channel in the instruction stream for communicating it.

    If this is beginning to sound like the Itanium, you're right. Whether this is the right way to go, only time will tell (and, on advice of time's lawyers, time has no statement to make at this point).
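
    A minimal sketch of the "compiler tells the CPU" idea--the stop-bit encoding below is invented for illustration (loosely in the spirit of IA-64's stop bits), not any real instruction format:

    #include <stdio.h>
    #include <stdint.h>

    typedef struct {
        uint8_t opcode;
        uint8_t dst, src1, src2;
        uint8_t stop;   /* set by the compiler: 1 = the next instruction must wait */
    } Insn;

    int main(void) {
        /* "compiler output": two independent adds, then a use of both results */
        Insn program[] = {
            { 1, 3, 1, 2, 0 },  /* r3 = r1 + r2  (no stop: the next insn is independent) */
            { 1, 6, 4, 5, 1 },  /* r6 = r4 + r5  (stop: the next insn reads r3 and r6) */
            { 1, 7, 3, 6, 1 },  /* r7 = r3 + r6 */
        };
        size_t n = sizeof program / sizeof program[0];

        /* The "CPU" no longer compares register numbers to find parallelism;
           it simply issues instructions until it sees a stop bit. */
        size_t i = 0, cycle = 0;
        while (i < n) {
            printf("cycle %zu issues:", cycle);
            do {
                printf(" insn%zu", i);
            } while (!program[i++].stop && i < n);
            printf("\n");
            cycle++;
        }
        return 0;
    }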
  • Sounds like a snake with a speech impediment.

    Seriously, it's not really possible to have a genuine "HISC" environment. Either the underlying layer is basic or it isn't. That is what's really important, not what is visible from "outside".

    The RISC architecture follows the idea that you have as -few- instructions as possible, thus making it faster to search for what to do.

    If you were to, say, have a whole load of RISC systems wired together, in such a way that they appear to be a CISC, in terms of the instructions available, it's -still- RISC. You just have (effectively) a Beowulf of RISCs.

    On the other hand, if you have a translation layer, converting CISC instructions into RISC ones, you have a CISC architecture, because all you've done is move the search (which is the crucial difference in philosophy) from one layer to another. It's still present. (The previous case didn't have a search mechanism, because it doesn't need one. You can filter to multiple destinations without needing any kind of search.)

    It's that search that makes the crucial difference between whether something is RISC or CISC. If the search is complex or lengthy, it's CISC. If it's simple -and- quick, it's RISC. There are no other cases.

  • Register to memory, memory to register and register to register are all perfectly valid RISC operations. Indeed, you HAVE to have R->M and M->R if you're going to program in assembly at all!

    (Unless you have Processor-In-Memory architecture, you -HAVE- to load your data into the processor somehow!)

    Multiple addressing modes - provided your instruction search time isn't impacted, these don't change whether a processor is RISC or CISC. The whole point of RISC is reducing the overheads of processing. So, if you don't add any overheads, you're not changing the type of processor.

    Variable length instructions - Depends on how it's implemented. Remember, the key is the overhead. If the maximum size of instruction can be fetched in a single transaction, it's quite irrelevant whether you've fetched 1, 2, 4, 8 or a million bytes. It becomes CISC if you have to do multiple fetches and parse the data.

    Instructions which require multiple clock cycles - you'll find even the early ARM and Transputer chips (the very ESSENCE of RISC design!) had opcodes which took more than one clock cycle. Indeed, it's impossible to do anything much inside a single clock cycle. (Even basic operations, such as adding two numbers, or fetching a word from memory, only just fit into that, and sometimes not even then. The 8086 took 2 clock cycles to do either of these.)

    The problem, I believe, is the changing definition of RISC and CISC, =NOT= what chip manufacturers are doing. I use the classic definitions that I learned when RISC architectures first appeared in Britain, where (IIRC) the idea was pioneered by companies such as ARM and Inmos.


  • I wish all the folk calling the author of this article an idiot would read the damn thing first!

    All their posts seem to go "The author's an idiot and I'll tell you why" and then proceed to explain exactly what the guy said in the article in the first place! Read the whole thing, please, before you post, and then realise that he isn't slagging off your favourite baby.

    dylan_-


    --


    RISCs still take three instructions to load, alter, and store a value in memory; CISC load and store does it in one instruction



    You are forgetting that memory-to-memory instructions are a waste of time and bandwidth, and are totally out of favor now that CPUs are so much faster than RAM.



    The question is not RISC vs. CISC--even Intel has gone RISC (soon VLIW/EPIC)--but using RISC to emulate CISC seems to be quite inefficient if a PPC at 400 MHz can keep up with a 700 MHz chip.

  • by Effugas ( 2378 ) on Thursday October 21, 1999 @06:36AM (#1597131) Homepage
    Before I say anything, I want to commend Hannibal [mailto] on an absolutely excellent article that clarified issues I thought I understood and illuminated much of the technological history behind the technology we each use every day.

    I am completely impressed.

    That being said, I'd like to take a moment and theorize on the direction microprocessor design is likely to go. This is my theory; you're welcome to disagree, and I in fact eagerly await commentary from those far deeper in the industry than I am. Insert Slashdot Self-Correcting Nature here.

    Of all the chasms in the computer world, there are few as vast as the speed differential between general-purpose processors programmed to execute a given task and hard-coded ASICs (Application-Specific Integrated Circuits) designed to meet the functional needs of a given process. (OK, granted, Internet -> Local Network -> Hard Drive -> System Memory -> Processor Cache -> Processor Registers is pretty vast too, but cut me some slack here.)

    Telephony is a joke without ASICs--I haven't found a voice-over-IP solution that operates in software well enough to even be used as a room-to-room intercom over a 100BaseT LAN--but it's actually reasonably lag-free with hardware encoding.

    Similarly, huge banks of boxen rendering frames for movies became significantly less impressive to me when I realized how many banks of Pentium processors it would take to match, say, a single Voodoo 2. While 3D rendering has gotten shots in the arm in recent times on the general-purpose x86 architecture via both MMX and KNI, the order-of-magnitude difference in speed makes CPU rendering of realtime 3D graphics almost useless.

    (Then again, Sumea [sumea.com] is probably the single coolest thing I've done with Java, short of Mindterm [mindbright.se].)

    As I observed in the Amiga newsgroup, shove a couple of custom ASICs in a box and you can run a highly competitive multitasking OS in 512K of RAM, with unmatched graphical support to boot.

    But ASICs have their limitations--while they're fast at what they do, they're extremely inflexible. You can't merely program in a new transparency algorithm, nor implement Depth of Field in an architecture that totally lacks it. The inflexibility of ASICs dooms their long term viability.

    CPUs are flexible but slow; ASICs are inflexible but fast. It's a dichotomy the industry is on the verge of smashing.

    I dub the coming processor design specification (which, as the article correctly noted, is all RISC and CISC really are) XISC, for eXtensible Instruction Set Computing. XISC essentially specifies that the underlying computational structures--be they microcode or raw gate arrays--ought to be dynamically reconfigurable to meet the needs of the process.

    Just as the lack of a quick bilinear filter function (SIMD stuff) on older Intel chips doomed them as far as efficient 3D in relation to customized ASICs, the ability to insert such a command directly into the internal microcode of a processor has a theoretical chance of executing at extremely high speeds for a non-dedicated processor.

    Transmeta, also known as the only reason many people willingly acknowledge the US Patent Office, appears to be spearheading the XISC drive. Their patents refer to technologies that automatically cache microcode translations, that provide backwards-flow in case of a broken emulate, and so on. They've often been "accused" of developing a chip that can emulate any chip--in the XISC context, a chip optimized to execute the instruction set most required by any given process.

    If you accept that order-of-magnitude performance drops are suffered when a processor lacks the appropriate design for a given set of requests, it's quite obvious that intelligent designers seeking a quantum leap in system performance would try to allow processors to acquire whatever designs are necessary to achieve much higher speeds.

    Of course, most of my chip designer friends would be happy to remind me that much of the speed of ASICs comes from their hard coded nature--the literal gates correspond to whatever output is desired, no translation is necessary.

    Of course, here's where FPGAs come in. Field Programmable Gate Arrays are chips whose internal gate structure can be rewritten on command, sometimes many thousands of times per second. They can't be clocked as fast as true ASICs, nor are the yields as high, but one quickly morphing chip can do the job of three or four in a digital camera. With at least one company (someone give me a name!) developing a language for programmatically defining instruction sets for an FPGA processor, the technology for XISC is obviously in development.

    Ah, but all is not fare-thee-well. In fact, while on the topic of 3D chips, the Rendition Verite chipset had a programmable RISC core, and the chip ended up failing because it could not scale in speed like 3DFX's Voodoo could. Developers could write new 3D instructions, but didn't (in general) because it was just too hard. (Yes, Carmack did.)

    That's why there's such a powerful force towards automation in this XISC evo/revolution, such as the FPGA language and Transmeta's automated Microcode translations that stay in memory so as to speed up future similar instruction requests. In an ideal world, a developer merely compiles a chunk of code that profiles as heavy usage directly into CPU microcode, or at least specifies in some way that a given routine ought to be run through the "special ops" part of the system.

    Whether the world will become ideal is an open question. That we will have instruction sets that morph is almost obvious; it's just a matter of when the gap between ASICs and CPUs will finally be bridged.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com

  • One point missing from the article (which I found interesting and informative) is that while memory has become cheap, programs have increased in functionality and complexity. So, more space available, but lots more to fit in there.

    Increased processing power and memory space are no excuse for sloppy programming and lack of optimisation.
  • There are really only three differences between RISC and CISC architectures.

    1. Accumulator model vs separate destination.
    2. Variable length instructions.
    3. Memory addressing complexity.

    Since most "CISC" compliers and chips have all but abandoned the highly complex addressing modes
    (relagated to slow operation), there really isn't much difference between the architectures today
    except for #1 and #2.

    The main advantage of #1 is to allow the compilers better control of register renaming strategies. The main disadvantage of #1 is that the extra operand chews up bits in the instruction word. This indirectly increases the instruction cache bandwidth (a bad thing in today's world). In fact, if you look at the compressed 16-bit RISC instruction sets (MIPS16, THUMB), they went to an 8-register accumulator model (hmm, sounds familiar)...

    However, since today's superscalar processors can execute instructions so fast, copying operands to a temporary accumulator isn't a big deal compared to missing the instruction cache. In today's world, #1 and #2 are really tied closely together. In some sense, the variable-length instruction decode logic can be seen as "cache efficiency" enhancing logic.

    Some may argue that memory addressing modes for the arithmetic functions slow things down. Although that's true in some cases, in the most common case (stack access in the data cache), today's highly pipelined "CISC" implementation is only slightly more complex than reading from a scoreboarded register file or a reorder/retire buffer.

    So although they've mostly converged on each other, the 2-operand, 1-destination model is still useful for the next processing paradigm to come down the track--dataflow processors. You can see some of this now in EPIC (IA-64) and the TI-C600, where operation register dependencies are encoded in the instruction in increasingly simpler ways.

    Several instruction set generations from now, registers will probably go away completely and only the instruction dependencies will be encoded. The internals of most superscalar/out-of-order processors already look like this: the register numbers are just references to data dependencies and really are just placeholders. In this context, separating the operands and the destination still makes good sense.
  • Look: "Today's memory is fast and cheap; anyone who's installed a Microsoft program in recent times knows that many companies no longer consider code bloat an issue." Duh!!

    LINUX stands for: Linux Inux Nux Ux X
  • >Effectively microcoded CISC is like RISC with a fixed set of library functions in a specialized read-only high-speed cache.

    Now that just gave me an idea: what if that cache was also writable? Such a CPU could emulate _any_ architecture. At _full_ speed (anyone?)!


    LINUX stands for: Linux Inux Nux Ux X
  • AltiVec is a SIMD unit connected to the PPC; it is more like a separate processor.
    For more information about AltiVec, read http://www.mackido.com/Hardware/AltiVecVsKNI.html [mackido.com]
  • In the early days of RISC, the research indicated that relatively few instructions did most of the work. In fact, the earliest RISC CPUs had something like 30-40 instructions. Production versions had to do more than just the most common operations, so they had, for example in the first Power chips, 170 instructions. This was in comparison to IBM TCM processors, which had over 370 instructions. In the ensuing years the number of instructions in RISC hardware has gone up, so from that perspective the difference between RISC and CISC is pretty narrow. As you all know, the difference is now in the way each type handles different kinds of operations, e.g. FP vs. integer, while the basic approaches to pipelining, superscalarity, and branch prediction are the same.
  • I'd have to agree that some sort of explicit-parallelism approach such as VLIW is the way to go. Compatibility with future parts at the object code level is difficult, but the gains at a particular process node are worth it. I'd rather spend my transistors on functional units than on hardware scheduling engines, since those are expensive. In contrast, I don't have to ship the compiler with every chip. And even if I did, it's software, which is much, much easier to manipulate.

    I say this as someone who codes for a VLIW during my day job. :-)

    --Joe
    --
  • by Mr Z ( 6791 ) on Thursday October 21, 1999 @04:40AM (#1597140) Homepage Journal

    I think most people have realized that RISC CPUs (in the sense of the early MIPS and SPARC designs) have not been made for a long time. Nowadays, the only real remaining differentiator between a "RISC" machine and a "CISC" machine is whether the instruction set is LOAD/STORE based or has memory operands on various instructions. There are several reasons for this:

    • Perhaps the largest reason: Backwards compatibility. If you look at the MIPS and SPARC architectures, they both brought to the table the concept of "delay slots". For instance, a LOAD instruction on MIPS won't write its result until after the second cycle, since the LOAD is pipelined over multiple cycles. (SPARC's delayed branches are similar.) Change the pipeline depth, and you sign yourself up for a lot of hocus-pocus to continue playing this charade.

    • The relative cost of operations changes pretty radically over time. For example, when RISC debuted, ideas such as including a multiplier on-chip were out of the question, since they cost too many transistors. So instead, programs implemented multiplies with shifts and adds, or in some cases, with lookup-tables. Nowadays, multipliers are relatively cheap when you consider how much the transistor budget has grown.

    • Memory latency has gotten really bad relative to CPU performance. This is one of the largest drivers of out-of-order issue, actually. IBM invented this way back when, before caches were feasible. (It escapes me which machine they did this on, but it was one of the old mainframes.) The idea is that you can cope with latency by allowing instructions to run when their data arrives--whether it's from a slow memory or a slow floating point unit. (Those of you with Hennessy & Patterson on your shelf: go look up the Tomasulo scheduling algorithm.)

    • The cost of communication and control has gone way up as pipelines have gotten deeper and transistors have gotten faster than the wires that connect them. "Complex" instructions which reduce communication, and sophisticated branch-prediction schemes which try to flatten control are attempts to address these issues.

    The point is that these are difficult problems, no matter what type of architecture you have. Even Alpha spends a lot of transistors allowing such things as "hit under miss" in the cache (allowing one stream of instructions to proceed while another is stalled waiting for a cache service).

    Every so often, when the gap between the original "programmer's model of the architecture" and "what we can do easily in silicon" gets too wide, it becomes necessary to move to a new paradigm. With this shift comes a new programmer's model. Moving from "CISC" to "RISC" was one such paradigm shift. One could argue that we're due for another one, and that VLIW/EPIC-like schemes are the new contenders.

    Some approaches, such as traditional VLIW, say "We're not going to worry so much about presenting a traditional scalar model to the programmer. We're going to expose our complexity to the compiler and let it do its best, rather than play tricks behind its back." These work by exposing the functional units of the machine, removing a lot of the control complexity. These are today's "direct execution" architectures. (See TI's C6000 DSP family for a live example of VLIW in the field.)

    Intel's EPIC takes VLIW a step further. It adopts a lot of the explicitness of VLIW, but it retains a lot of the chip complexity that's required to retain compatibility across widely varying family members. It's too early to know how this will turn out, but I'm somewhat concerned that it does not reduce complexity.

    All-in-all, it'll be interesting to see how this turns out. Who knows what type of architectures we'll be programming in 20 years...

    --Joe
    --
  • Obviously you didn't read the article, but...

    What's RISC about an ISA with AltiVec instructions? You know, single "instructions" that take multiple cycles to process because they do multiple things?

    That's the point. Not that "x86 is now RISC" but that traditional RISC is as dead as traditional CISC. If RISC is a two-story house and CISC is one-story, then the current chips are all three stories.
  • Who designed a processor with a new CISC ISA recently? Nobody; everyone is designing RISC ISAs or post-RISC ISAs, i.e. VLIW or EPIC or ...

    The fact that one CISC ISA, the 80x86, has managed to stay alive and well (:-)) means that backwards compatibility is sometimes more interesting than raw performance, and it is an amazing feat by Intel (helped very much by Windows' success).

  • Did you read the article or Hennessy & Patterson?

    The CISC ISAs are very different from the VLIW/EPIC ISAs.
  • John Mashey of MIPS/SGI has written a good description of the differences between RISC and CISC architectures. It is posted to comp.arch on an irregular basis and is available on the web here [tu-dresden.de].
  • It appears that we've managed to seriously impede Ars Technica's ability to service all the requests they're getting at the moment...





    This is my opinion and my opinion only. Incidentally, IANAL.
  • ... because it's got polynomial equation solving built into its instruction set!!

  • The complexity with superscalars is not in the ISA, but in the scheduling. At the most basic level, though, RISC instructions are used because it is (effectively) impossible to schedule CISC instructions for out-of-order execution.

    Where have you been? Pentium II and K6 (and now K7) have been executing CISC instructions out of order for years!

    Anyway, I argue that the complexity of superscalars is in the ISA. Most CISC architectures do not embed scheduling information in the ISA, which makes the decoding phase quite complex. RISC helps the situation dramatically because the ISA defines each instruction to be of fixed length, so fewer pipeline stages are needed to find instruction boundaries. A VLIW ISA takes responsibility for the scheduling away from the processor. EPIC exposes this scheduling information, but the processor still needs to do the actual scheduling.

    I am opposed to VLIW because of

    1) lack of binary compatibility (there are workarounds being done in IBM research)

    2) difficulty filling in all the slots. I don't care how good the compiler is. Most system and application programming languages in use today assume sequential execution. You'll end up with NOPs and unnecessary loop unrolling, which leads to code bloat.

    The bottom line is about price/performance AND the purchasing power of the consumers. If FPGA technology becomes so cheap that I can get 1 billion gates per chip for only a cent, I'll write my routines in Verilog rather than C.

    Hasdi
  • ... is a verifier for the interpreter to the compiler of the vm translator that emits code to the vectorising assembler optimised for the hardware scheduler :-).

    If you look at any high-level abstract language (say Python), it goes through a number of stages, each of them designed to feed into the lower layers. The debates about the various schools can be viewed as an on-going bun-fight between the various groups as to who gets the largest slice of the $$ pie and the most simplified workload. In some ways, the hardware guys have a conceptually easier task; they get to include more of the surrounding chipset. The software language or API developers are forced to explore unknown territory. Witness the fumbled gropings to move beyond OpenGL to higher-level 3D scene representation.

    The rather interesting factor is that the OpenSource scene allows flexibility for the software and hardware to be realigned periodically. The example I'm thinking about is the GGI project [ggi-project.org] and the move towards the Graphics Processing Unit as a self-contained CPU instead of an add-on video board. The next step might even be dedicated I/O/media processors combining FibreChannel, TCP/IP, SCSI, XML/Perl/Java engines, codecs, etc ... As Linus pointed out, controlling the complexity of the kernel requires understanding very clearly the minimal protocol that is needed to communicate between the different functions.

    The biggest problem nowadays is not actually technical (tough but doable) but legal. Witness the jockeying around System on a Chip, where you have to combine multiple IPs along with the core. Hardware vendors have cross-licensing portfolios and reverse-engineer their competitors to copy the ideas anyway. Linux avoids the problem by making everything GNU, and thus designers/engineers can concentrate on the job without fighting with the lawyers, as well as defining prior art (cf. universities rushing to publish the human genome before the commercial mobs fence it off). Given the fast pace of the industry, the market is a stronger judge than any legal protection (why bother protecting something trivial that will be obsolete in a few years?). Perhaps in a few decades, people might look back and consider the millstone the patent system has become.

    The biggest open question IMHO now is how to get multiple internet sites to interoperate. For example, some people might wish to combine customised /. filtering with references to Encyclopaedia Britannica or archived news sites. Much as EBay might squeal about sites "stealing" their auction databases (what they want to do in practice), it is a way of creating large aggregated information complexes.

    CPU tricks and speed races will always make headlines but despite the appeal of multi-gigahertz chips, the information backlane will remain a mess until the telcos/cable/sat get their act together.

    LL
  • I remember learning a bit about this in class.

    The point was made that it used to be necessary to hand-optimise code on CISC chips, but on RISC chips it has been too complex for quite a long time; most modern compilers are better at it than humans, and the optimising by the compiler is done very differently (now that is a technical term - I can't remember the details) on RISC chips compared to CISC.

    I'm always annoyed by some commercial Windows compiler makers who take a long time to add support for the new instruction sets in MMX etc. chips. I would imagine that these hybrid chips would be even worse - having to add stuff to a RISC-targeted compiler might take even longer.

    Of course, I may be way off track here - I slept through most of Uni.

    Corrections are of course welcome. *S*

  • by TheDullBlade ( 28998 ) on Thursday October 21, 1999 @07:20AM (#1597150)
    As anyone can obviously see, the old paradigms have failed. Make way for the future, make way for PISC [forth.org]!

    That's right, the wave of the future is the Pathetic Instruction Set Computer.

    (in all seriousness, if this stuff turns your crank, building a computer from standard TTLs is way cool)
  • >The whole idea with RISC is to make instructions so basic that they can (almost) all be completed in a single processor cycle.

    I don't think that's the whole idea of RISC. The basic RISC philosophy, as elucidated most excellently in H&P, is to maximize instructions per second, and if that means a small sacrifice in work per instruction that's OK. Single-cycle execution is probably the most powerful means available to achieve this goal, but there are many others - superscalarity, deep pipelines, reordering, branch prediction, etc. - that apply regardless of how many cycles instructions take.
  • >>When the first RISC machines came out, superscalar execution hadn't been invented yet.
    >
    >Some processors (Cray for one) had been doing this years before RISC came out.

    There's a big difference between vector operations and true superscalarity. Also, RISC vs. CISC only makes sense from the point where instruction sets were truly complex and there was a choice to be made; many early architectures were RISC of necessity.
  • Both the Ars Technica article and the MSU paper to which it refers are very well done, providing an excellent introduction to material covered in greater depth in Hennesy And Patterson's[1] Computer Architecture: A Quantitative Approach.

    My one quibble is that the AT author gives the basic performance formula in what I consider an "upside down" form - time/program rather than work/second - which I think makes it harder to understand some of the later points.
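
    For reference, the formula in question written both ways (this is just the basic performance equation from H&P):

    time/program = (instructions/program) x (cycles/instruction) x (seconds/cycle)

    performance = 1 / (time/program), i.e. work per second rather than time per unit of work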

    Like Ditzel, I pine for a return to the days when true RISC architectures existed. The principles involved haven't really changed, and I think some later "evolution" has been misguided. See my recent posts in the "Intel Athlon-Killer" article for more details if you're interested.

    [1] I can never remember which letters in which name get doubled, or which one was at Berkeley and which at Stanford. Apologies for any errors on either count.
  • Your reply seems to indicate that you have gotten your Transmeta chip already. Now I'm not bitter anymore; now I feel discriminated against :-)
  • So we've figured out that neither RISC nor CISC was an optimal solution and that we need something new/better. Excuse me for not being impressed by this conclusion: I've been programming for quite a few years now and if there's one thing I've learned, it's that there are no optimal solutions. Every design, every optimization is biased towards some specific situation, and every optimization has a tradeoff which may have far-reaching implications. What may work well in one area might be horrible in another. It's just that in the CPU world architectures have to last for a while, so the general turnaround of fundamental designs is a bit slow, but this is hardly news and should not be surprising to you if you're more than just a casual programmer.


  • ...ALCBSKRISC (pronounced: ALCBSKRISK)
    First of all you can see the benefits by the ease of pronunciation of the name...(I mean, compared to CISC...which is next to IMPOSSIBLE to say 3 times fast)

    Second, it makes sense:
    A Little Complex But Still Kinda Reduced Instruction Set Computing.

    I'm telling you, it'll be the future!
  • Or even better: TTAs, the ultimate RISC. A TTA is a transport-triggered architecture; it specifies data transports with operations as a side effect (instead of the other way around). So instead of

    add r3, r1, r2

    you do

    mv r1 to add_unit
    mv r2 to add_unit
    mv add_unit_result to r3


    True, 3 opcodes instead of one, but how simple!
    And if you put a lot of parallel transport busses in the chip, then we're talking about real parallelism, lookahead features and pipelining!

    It's the logical step after CISC, RISC and VLIW!

    [shamelessplug]Our project [tudelft.nl][/shamelessplug]
  • Obviously you haven't gotten your Transmeta system yet. You just sound bitter.
  • Here is another article about the difference between CISC and RISC and how much of what we call RISC isn't RISC in the traditional sense.
    This is the linky thing [uregina.ca].
    The article talks about a vast amount of other processor-related topics, and includes the whole RISCSy CISCy thing.
  • by marphod ( 41394 ) <galens+slashdot.marphod@net> on Thursday October 21, 1999 @03:53AM (#1597160)
    There is a lot alike in today's RISC and CISC chip designs, true, but there is still a lot that is different.

    The major differences between the two designs include how they interact with memory: RISCs still take three instructions to load, alter, and store a value in memory, while CISC load and store does it in one instruction. There are differences in instruction size: where a RISC design knows that each instruction is 16/32/64 bits long, CISC allows variable-length instructions and fancy footwork/chip design to make read-ahead buffers work well. The most significant example of this is the x86 chip series, which still has 8-bit instructions on the legacy registers, but with prefixes and extensions can have 100+ bit instructions as well. The system needs, effectively, a parser before feeding the instructions to the actual CPU. And while my background on CISC design is lacking, I can only imagine the design acrobatics needed to do a superscalar/pipelined design for instructions that can do so much.
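
    To make the three-vs-one point concrete, a trivial C-level picture (purely illustrative--real compilers and ISAs differ in the details):

    #include <stdio.h>

    int main(void) {
        int counter = 41;
        int *p = &counter;

        /* On an ISA with memory operands ("CISC"-style), this can compile
           to a single read-modify-write add-to-memory instruction. */
        *p += 1;

        /* On a load/store ("RISC"-style) ISA, the same statement is lowered
           to three separate steps: */
        int tmp = *p;    /* load  */
        tmp = tmp + 1;   /* alter */
        *p = tmp;        /* store */

        printf("%d\n", counter);   /* 43 after both updates */
        return 0;
    }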

    While not strictly a RISC/CISC issue, there is also the use of registers. In gross generalities, RISC design is much more apt to use general purpose registers than CISC. There are definitive advantages to each design.

    Yes, this is all 'under the hood' items, but they have a large effect on design; compilers that know of, for instance, the legacy registers from the 8088/8086 and use them primarily have nice small instructions, and can get the most out of the x86 instruction preloading. This has been less and less significant with newer x86's (P IIs, P IIIs), but it is still present.
  • I was surfing around and since I usually hit Blue's - Ars - ABCNEWS - Slashdot in the morning, I saw the link on Blue's and looked forward to reading the article. Then I started waiting for Ars to load. In the meantime I opened another browser and hit ABCNEWS and then Slashdot. I saw the article on here and I thought, "Hey, that's great, they'll get a lot of visitors now". Ars still wasn't loading.

    Then the lightbulb came on... goddammmit, I wanted to read Ars today! I thought their servers could handle a lot, but apparently they went from "Apache on a 200-node Beowulf" to "Win98 running PWS while someone's playing Q3Test" performance. Hopefully the article gets mirrored so that I can get my lovin spoonful of Ars sometime today... at least the forums still work...
  • No.

    With Instruction Set Architectures (ISAs), I believe that orthogonal means an instruction such as add can use any register (in a general-purpose register file) for its arguments, whereas with a non-orthogonal architecture you can't; you end up with instructions where you have to use certain registers in order to execute a certain instruction, which is hell for compilers!

    I forgot to add that RISC was developed because of compilers, and that only the common compiler instructions were implemented, because compilers are too stupid to work out that you could use a certain instruction to implement a certain function. Compilers on CISC tended to only use certain instructions and not others, making those other instructions worthless! Of course, these days you would have hand-coded HALs which are written in assembler and can use the complex instructions (such as 3DNow! and SSE), and these functions are called by programs...

    Ah! The end of the working day is nigh! Time to go and assemble my new semi-computer (gutting ye olde computer for some stuff)

  • RISC does stand for Reduced Instruction Set Chip, but that doesn't mean fewer instructions; it means fewer instruction formats. Think of how many different instruction formats x86 has, with varying instruction lengths, non-orthogonal instructions, etc., compared with the simplified instructions provided by RISC processors, which might have as few as 3 or 4 different instruction formats.

    RISC really should have been SISC, for Simplified Instruction Set Chip, but that clashes with CISC, ho hum....

    Remember, you don't get a RISC chip (Reduced Instruction Set Chip Chip)! :-)

    The article was silly really: the author didn't look beyond the word 'Reduced' in RISC, thought it meant fewer instructions, then saw that most RISC chips have tonnes more instructions than most CISC chips, and arrived at the wrong conclusion. Hell, a simple ARM chip has a theoretical 4 billion instructions (all conditional etc.) but there are far fewer general operations.

    RISC:

    • Register-Register operations (e.g., add)
    • Load-Store operation (Mem to Reg LDR, STR)
    • Simplified Instructions, less Instruction Formats
    • Orthogonal Architecture (i.e., add can use any General Purpose Register)
    • Pipelined Architecture from the beginning
    • and many many more, read comp.arch FAQ for more details

    CISC:

    • Variable Length Instructions (e.g., POLY in VAX)
    • Non-orthogonal, i.e., certain instructions must use specific registers (e.g., the multiply result landing in fixed registers on x86)
    • Many addressing modes... nice when there were no such things as caches and memory was as fast as the processor
    • Good when not many transistors
    • etc

    CISC chips now typically have a RISC core, where instructions such as ADD (contents of memory A), (contents of memory B), (resulting memory location) are broken up into micro-ops: LD A, LD B, ADD A,B,C, ST C
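
    A toy illustration of that cracking step, with an invented encoding (not any real chip's decoder):

    #include <stdio.h>

    typedef struct { const char *op, *a, *b, *c; } Uop;

    int main(void) {
        /* architectural instruction: ADD [A], [B] -> [C] */
        Uop uops[] = {
            { "LD",  "A",  NULL, "t0" },   /* t0 <- mem[A]  */
            { "LD",  "B",  NULL, "t1" },   /* t1 <- mem[B]  */
            { "ADD", "t0", "t1", "t2" },   /* t2 <- t0 + t1 */
            { "ST",  "t2", NULL, "C"  },   /* mem[C] <- t2  */
        };

        /* what the front end hands to the RISC-like core: */
        for (size_t i = 0; i < sizeof uops / sizeof uops[0]; i++)
            printf("%-3s %s%s%s -> %s\n", uops[i].op, uops[i].a,
                   uops[i].b ? ", " : "", uops[i].b ? uops[i].b : "",
                   uops[i].c);
        return 0;
    }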

    Anyway, just my (small) point of view...

  • Some years back, someone pointed out the real difference between CISC and RISC processors: If it was released after the 68000, it was RISC. Otherwise, it was CISC.

    While that is an oversimplification, it looks an awful lot like truth, sometimes. For years, processor manufacturers have touted their new "RISC" processors even though those "reduced" instruction sets might have more instructions than the 8086.

    Of course, the real hallmarks of a RISC processor are the load-store architecture, the large general-purpose register sets, and the uniform instruction size, but even those aren't sufficient to give a significant performance advantage to a computer based upon the RISC architecture. In fact, various benchmarks provide evidence that CISC vs RISC has very little effect on the performance of the computer.

    So, where did the RISC vs CISC distinction come from, and why do RISC processors have a reputation of being faster, all other things being equal, than CISC processors? The answer has to do with what is now the prehistory of microcomputing. Back in the dim dawn of history (early 70's) microprocessors were for embedded systems. They were, therefore, designed to minimize part count and that meant optimizing the program space. The early embedded systems were programmed universally in assembly language and programmers typically used various tricks to use space very efficiently because space was more at a premium than time.

    The complexity of those early embedded systems processors was mostly focused on reducing instruction count and instruction size as much as possible. "Bit mining," while it is still around today, was a way of life for the early microprocessor programmers, and the processor manufacturers built processors to facilitate that effort.

    However, in the middle of the 70's, some people started putting these processors into general-purpose computers, and the microcomputer market became significant. That drew the attention of some processor designers who wanted to add some of the advanced performance-enhancing features, like caching and pipelining, from minicomputers and mainframes, to the micros.

    The only problem was that the largest scale of integration available in the 70's was a few thousand transistors. When you've got 4000 transistors in your whole processor, you're going to need to trim unnecessary functionality away from the whole processor if you want to add an on-processor cache that's of some use to somebody. Hence the desire for a reduced instruction set.

    Originally, the idea was to take those transistors that would otherwise go into complex instructions and put them into performance-enhancing features. The loss of memory efficiency was not a problem because they were intended to be put into fairly large, fairly capable, and fairly expensive computers.

    Of course, now the processor designers have silicon budgets of millions of transistors, and the amount taken up by the instruction set is relatively tiny. That means that, in the 20 or so years since RISC processors were first envisioned, the instruction sets of the so-called RISC processors have gotten far more complex and the CISC processors have gotten essentially all of the performance-enhancing features of the RISC processors such that there is no real difference between them any more. Moore's law has made the distinction obsolete.

  • 1. Yes, I have read Patterson, and yes, he is clear about the goal of reducing execution time for instructions.

    2. I am not arguing CISC vs. RISC. I am saying many of the "design leaps" you refer to are infeasible using traditional CISC methods. It was the push towards reducing execution time per instruction in the 80's that made modern processor designs possible.


  • by anonymous loser ( 58627 ) on Thursday October 21, 1999 @04:00AM (#1597166)
    His main evidence is a quote in which Ditzel says, "Superscalar and out-of-order execution are the biggest problem areas that have impeded performance [leaps]." Obviously the author has absolutely no knowledge of how processors work internally, or he wouldn't say that this is due to the complexity of the ISA (Instruction Set Architecture).

    The complexity with superscalars is not in the ISA, but in the scheduling. At the most basic level, though, RISC instructions are used because it is (effectively) impossible to schedule CISC instructions for out-of-order execution.

    The whole idea with RISC is to make instructions so basic that they can (almost) all be completed in a single processor cycle. In the article, he tries to refute this with a quote from Patterson, but the quote actually refutes the author's point, and the author is too blind to realize it. Twice in the quote Patterson refers to reducing the cycle time for each instruction, but the author says that's not Patterson's point.

    Today's processors take the idea a step further, by trying to execute MORE than one instruction per cycle by providing multiple processing units (the thing that does the actual addition, subtraction, or whatever) which can execute instructions in parallel. However, instructions still need to be scheduled so that they can execute in parallel while preserving dependencies.
    The hardware that accomplishes this scheduling is complex.

    IMHO VLIW is the way to go. With VLIW, you do the scheduling at compile time, and remove a lot of the complexity involved with hardware scheduling. Not only do you gain the possibility of higher parallelism through an increased number of processing units (you can use the silicon previously reserved for the scheduling hardware), but you also can gain a little more, since theoretically a compiler can spend more time looking for dependencies between instructions and come up with a better schedule. (A toy example of this kind of dependency grouping follows this comment.)

    anyway, that's just my 2 cents.
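
    Whether it happens in hardware every cycle or in a VLIW compiler ahead of time, the core job described above is the same: find instructions with no unresolved dependencies and issue them together. Here is a toy greedy grouper in Python; the (dest, src1, src2) instruction format and the two-wide issue width are assumptions invented for the example.

    # Toy sketch of dependency-aware instruction grouping (the job a superscalar
    # scheduler does in hardware and a VLIW compiler does ahead of time).
    # Instructions are (dest, src1, src2) tuples over named registers; the
    # two-instruction "issue width" is an arbitrary assumption for the example.

    def schedule(instrs, width=2):
        groups, pending = [], list(instrs)
        while pending:
            group, produced = [], set()
            for ins in list(pending):
                dest, src1, src2 = ins
                # An instruction may join the current group only if it does not
                # read or overwrite a register written earlier in the same group.
                if {src1, src2, dest} & produced:
                    continue
                group.append(ins)
                produced.add(dest)
                pending.remove(ins)
                if len(group) == width:
                    break
            groups.append(group)
        return groups

    prog = [("r1", "r5", "r6"),   # r1 = r5 op r6
            ("r2", "r1", "r7"),   # depends on r1 -> cannot issue with it
            ("r3", "r8", "r9"),   # independent -> can pair with the first op
            ("r4", "r2", "r3")]   # depends on both earlier results
    for g in schedule(prog):
        print(g)

    A real scheduler also has to respect latencies, limited functional units, and register renaming, but this dependency check is the part whose hardware cost grows quickly with issue width.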
  • Some time ago there was another series of RISC-CISC articles on Ars Technica, well worth the reading:

    RISC vs. CISC [arstechnica.com]
    RISC vs. CISC II: Hellazon fires back [arstechnica.com]
    RISC v. CISC III: The last word? [arstechnica.com]

  • No problem. Frankly, the only reason I came down on you so hard was that I've really grown to like some of the stuff over at ars, and find Hannibal's architecture articles particularly informative, detailed and, most importantly, written well enough for the layman to understand. In fact, they're part of the reason I'm only a mostly-layman these days.

    Oh, and because I find /. gets a bit too civil these days; shaking things up when you're pretty convinced you're right is always fun. ;)

    As for RISC being a true innovation, and not at all obvious at the time: yes, you're absolutely right. It's only in hindsight and with the story smoothed out a bit that we realize that the environment which made "CISC" processors a good design had ceased to exist by the 80's. I don't want to take anything away from the original designers of the RISC philosophy; but still, I very much agree with Hannibal's thesis that that debate has just about nothing to do with the merits of today's CPUs.
  • by ToLu the Happy Furby ( 63586 ) on Thursday October 21, 1999 @11:06AM (#1597169)
    Now let's see. Why is it that the author of this article is so "clueless", as you say?

    1. When the first RISC machines came out, superscalar execution hadn't been invented yet.

      Some processors (Cray for one) had been doing this years before RISC came out.
    Unfortunately, the actual quotation [arstechnica.com] from the article is "When the first RISC machines came out, Seymore Cray was the only one really doing superscalar execution." Whoops. Looks like ya' dropped the ball on that one.

    1. I also think that the ideas behind RISC such as "move the complexity from the chip into the compiler" also apply today and that VLIW is an example of this applied to scheduling
    Well, that's very forward thinking of you. Of course, I guess that makes you clueless as well, because, once again, it's exactly what the author of the article wrote [arstechnica.com]: "VLIW got a bad wrap when it first came out, because compilers weren't up to the task of ferreting out dependencies and ordering instructions in packets for maximum ILP. Now however, it has become feasible, so it's time for a fresh dose of the same medicine that RISC dished out almost 20 years ago: move complexity from hardware to software."

    So this leaves us with your remaining "point":

    1. The fast CISC chips (PII, Athlon) do instruction conversion into RISC...so if the debate is over, its because RISC won -- big time.
    Well that's a brilliant insight you've uncovered there (covered in the article here [arstechnica.com]), but the point of this "clueless" article (which you obviously did not read) was that there no longer is a debate. The term RISC refers to a CPU design philosophy which was invented in 1981. That was 18 years ago. It was intended as a replacement for "CISC", the CPU design philosophy which came about in the early 70's.

    That is to say, the CPUs of today, whether P6 or PA-RISC or G4, and the systems that they are in, bear almost no resemblance whatsoever to either a "CISC" chip or a RISC chip. The only similarity is that the P6's and K7's of the world are compatible with the x86 ISA, which was originally written (back in 1977 IIRC) for a "CISC" chip. Yes, this adds an extra decoding step (to break down instructions into "RISC-like" ops), and yes, theoretically that means increased die-size and complexity, which of course means lower clock speeds. Oh wait--that reminds me: which currently shipping chip has the fastest clock speed?? Oh yeah--the 700MHz Athlon. With a 750 part set to be shipped later this week. And, looks like, a 900 in time for New Year's. One of those ungainly "CISC" chips, huh.

    Hmm...but let's take a look at how all those competing "RISC" chips have used their incredibly simple architectures to keep die size down. Like the PA-RISC, which is, IIRC, about 6 or 7 times the size of a PIII. Or those new G4's, with their impressive yield of 0% at 500MHz. The simplicity of today's modern "RISC" chips in action.

    Now, none of this is to say that the G4 or the PA or whathaveyou isn't a great design. Just that the resemblance of today's CPU's to a true "CISC" or RISC chip is so tangential as to make the categorization meaningless. And as for your "debate"--of course RISC won big time. It came out nearly 10 years after the first chips made with the "CISC" philosophy. As was, IMO, rather compellingly and insightfully explained in the article, "CISC" chips were the best possible solution to the awful state of compilers and memory available at the time. By 1981, the state-of-the-art had advanced to the point where RISC was a better solution. Duh.

    Since then, compilers have gotten better, transistor densities have increased, and RAM prices have plummeted, allowing all the advancements which the author termed "post-RISC". And, looking at the CPU's of tomorrow, we see all sorts of new techniques on the horizon: optimization based on advancements in compiler/software technology, a la IA-64, MAJC, and (apparently) Transmeta; or optimizations based on increasing transistor densities, like some of the new physical parallelization designs that appear to be a couple generations down the road for Alphas and IBM chips.

    But as long as people like you insist on categorizing these chips into meaningless 20-year design philosophies, the tech world will be a more ignorant place.

    What a disappointing comment.
  • So let me get this straight.

    Let's assume that Transmeta (or the MIT group someone else mentioned around here) is indeed working on something like this.

    A processor that has a core that uses an ISA of one-cycle or few-cycle instructions and all sorts of goody good technology to make it run fast. This will do vanilla, FP and SIMD amongst other things.

    Around that core a translator that takes code in various ISAs and translates it and then profiles it so that it makes sense to the core.

    In other words, one chip to rule them all, one chip to find them, one chip to bring them all and in the darkness bind them :-)

    Benefits? This chip can be re-programmed on the fly to do anything: "Normal" CPU tasks (memory access, bit blits, integer ops), FP-intensive maths, 3D rendering, Multimedia encoding/decoding. However, a single CPU that has to keep re-programming itself to do all these things is pretty useless. Reprogramming it will probably take time measured in the thousands of cycles and the scheduling would be a nightmare. But...

    Stick two or four of these babies onto a motherboard and configure them depending on system usage... keep one in "traditional" CPU mode, to do the basic stuff, i.e. run the OS code most of the time. Then, when you run a 3D rendering app or game, you can switch one of these to 3D rendering mode and it will take over the load. Fire up a DVD movie, and another switches to MPEG decoding. And the cherry on the cake.... Fire up a non-native app, and another CPU switches over to the different instruction set. And you get all of this without feeding anything through a bus.

    We're talking about a new approach to computer architecture. Instead of having several high-performance chips of different types inside a computer, with one at the center and the rest residing at the other end of a bus, we get an array of configurable chips that take the role of any specialised processor as the need arises. Which means that if you stick enough of these babies onto a board, you have a system that is good at *everything*. Good at maths, good at 2D graphics, good at 3D graphics, good at multimedia, able to run ANY legacy code, good at anything else they come up with later...

    So let's hear it from someone more knowledgeable: is this feasible? (A rough sketch of the translator-over-a-core idea follows this comment.)
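
    Purely as an illustration of the translator-over-a-simple-core idea above, here is a toy Python sketch in which "guest" instructions from two invented ISAs are expanded into one small set of core operations. Every name here is made up for the example; it claims no relation to whatever Transmeta or anyone else is actually building.

    def core_load(state, dst, addr):
        # Core op: move a word from memory into a core register.
        state["regs"][dst] = state["mem"].get(addr, 0)

    def core_add(state, dst, a, b):
        # Core op: register-to-register add.
        state["regs"][dst] = state["regs"][a] + state["regs"][b]

    def core_store(state, addr, src):
        # Core op: move a core register back to memory.
        state["mem"][addr] = state["regs"][src]

    CORE_OPS = {"load": core_load, "add": core_add, "store": core_store}

    # Per-guest-ISA translation tables: each guest opcode expands into core ops.
    TRANSLATORS = {
        "guest_cisc": {   # a memory-to-memory add becomes load/load/add/store
            "addm": lambda dst, src: [("load", "t0", dst), ("load", "t1", src),
                                      ("add", "t0", "t0", "t1"), ("store", dst, "t0")],
        },
        "guest_risc": {   # already load-store shaped, so translation is nearly 1:1
            "add": lambda d, a, b: [("add", d, a, b)],
        },
    }

    def run(isa, program, state):
        for opcode, *operands in program:
            for op, *args in TRANSLATORS[isa][opcode](*operands):
                CORE_OPS[op](state, *args)

    state = {"regs": {"t0": 0, "t1": 0}, "mem": {0: 2, 4: 3}}
    run("guest_cisc", [("addm", 0, 4)], state)
    print(state["mem"][0])   # 5

    A real chip would of course have to translate at native speed, cache the translations, and preserve every corner case of each guest ISA, which is where the scheduling nightmare mentioned above comes in.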

    RISCs still take three instructions to load, alter, and store a value in memory; a CISC instruction that operates on memory does it in one

    True.

    But keep in mind that RISC is designed to ideally execute one instruction per clock cycle. CISC doesn't care. It may take one instruction for the memory op, but it still takes multiple clock cycles for the op to complete. (A quick cycle tally after this comment illustrates this.)

    Otherwise there wouldn't be so much effort put into math tricks to get around division. One instruction, approximately 6-12 (or more) clock cycles to complete, depending on your system. (If it's gotten faster since I took my computer architecture class, someone gimme a link...) Division is notoriously slow.

    "You want to kiss the sky? Better learn how to kneel." - U2
    "It was like trying to herd cats..." - Robert A. Heinlein
  • I was thinking that perhaps he was a Transmeta insider, giving away info, until he put his name at the end :-)

    Listen to this line of thought. We don't have a video processor for common VGA and 3D functions in the CPU, as it doesn't make sense for a home system where we might want to upgrade the video card. For the same reason, why not split out the module that (on the K6es, at least) translates the CISC instructions into micro-ops & schedules them? Want more MHz? Replace the micro-op executor (CPU). The company found a problem? (F0 0F, Coma, etc) Replace the module that converts the instructions. Perhaps a newer algo that makes the chips faster w/o needing a faster-MHz core? I'd buy it. A chip on the motherboard that is wired to convert CISC to micro-ops, schedule, and send along to a core would be a very easy way to also enable multiple instruction sets. This sounds like what Transmeta was doing (their patents).

    This will be more important as IA-64 and the K8 (?) will probably have divergent 64-bit instructions. Why miss out on either's programs? Buy a Transmeta super-CPU :-)

    ---
    Is the following correct?

    CISC = Complex Instruction Set Computer
    RISC = Reduced Instruction Set Computer

    So, if a computer uses few and simple instructions, as a programmer I'll treat it as RISC. Thus a complex task would require me to think more and write more code...

    If a computer gives a lot of instructions, including instructions that help me simplify my code, then I'll treat it as CISC.

    GhostDancer
    ps: Personally, I don't really care what/how/why it happens inside a processor.

  • Quite true...

    But the point is (was there a point?) that I differentiate a processor type by its instruction set.

    So the idea of looking at RISC and CISC by their cores is, to me, rather off the mark. Thus, I don't really agree (fully) with what the author says.

    And, no, I don't write compilers (I only studied them during my uni days) and I don't think I could write one either.

    GhostDancer
    So... what is a RISC? What is its 'philosophy'? How do you define a RISC?
  • Hmmm... maybe I should go back and read my old text again...

    thanks for the info.

  • I believe Cray was still at CDC for the 6600 (but someone correct me if I'm wrong).

    You are not wrong. The 6600 was Cray's baby, the machine he always dreamed of making. The 6600 came out in 1964 (?) and I believe Cray was still there through the early 70's.

    Of course, if you look at the instruction set for the CDC 6000 series you recognize that Cray anticipated RISC by a number of years. This observation seems to upset some people for some reason.

  • It means less complex instructions. Basically, you break the CISC style instructions down into simpler instructions that each have a very specific task.

    CISC vs. RISC (in this day and age) has nothing to do with die size or complexity. It's about how the instruction set is designed.

    (PS, altivec adds 162 instructions, iirc.)
  • A fascinating article and a concise (relatively speaking :) summation of the uneasiness I have felt toward the CISC/RISC labels for a number of years.

    One thought: the root cause of the popularity of OOO-execution hardware (and one of the reasons I'm surprised the IA-64 doesn't have it) might be that the companies that make chips aren't always the same companies that make compilers. Therefore, since marketers enjoy throwing around measurements of their chip's performance relative to other chips (with the compiler being merely a footnote, if mentioned at all), the chip maker can't afford to leave the ranking of their product up to the skill of some other company's compiler programmers. Hence, they must take out an insurance policy of sorts by including the capability of OOO execution, which is essentially a way of saying that you don't trust compilers to do their job properly.

    paisleylad

  • Your computer course instructor was partially right. True RISC processors do avoid microcode and exclusively use hardwired control logic.

    But modern CISC implementations use hardwired control logic for the most common simple instructions and resort to microcode for the more arcane, multi-cycle CISCy instructions. It is theoretically possible to hardwire all control logic in any CISC implementation, but it would be time consuming and buggy, and would have little payback. The last x86 to rely exclusively on microcode was the 386.
    re: CISC vs RISC has very little effect on the performance of the computer.

    Sorry but you are quite wrong. If you compare RISC and CISC processors built with equivalent technology from the mid-80's to today there is an obvious difference in performance. Here are some "best of breed" matchups (RISC vs CISC):

    1) MIPS R2000 vs Intel 386

    2) HP PA-7150 vs 0.8 um (5V) Pentium

    3) Alpha 21064 vs NVAX (same damn process!)

    4) Alpha 21164A vs Pentium Pro

    5) Alpha 21264A vs Pentium III (Katmai)

    In each and every one of these matchups the CISC part comes out a distant second.

    BTW, I am eagerly looking forward to seeing how the 0.18 um Alpha EV68 stacks up against Coppermine and K7, and how EV7 stacks up against Willamette ;-)
  • Actually I did read the article. The historical background wasn't bad. I even read the university paper linked and that was just a plain waste of time.

    As for your comments, I don't know of a single RISC processor that doesn't take "multiple cycles to process" an instruction. The classic 5 stage RISC scalar pipeline, for example, takes at least five clocks to process an instruction in its entirety. Or perhaps you're groping in the dark for the concept of instruction execution latency. There was no tablet handed down from Mt Olympus by John Cocke and Seymour Cray that says all RISC instructions must have a one cycle latency. Find me a RISC processor that implements FP arithmetic instructions with less than three clocks of latency. If the Altivec instructions take more than a single clock of latency then pipeline them, tell the compiler what their latencies are so it can schedule around them, and be done with it. (A small scheduling sketch after this comment shows what that looks like.)

    You claim to understand it, so please try to explain to this simple engineer why "traditional" RISC is dead. Did someone add a bit count instruction or a reciprocal square root approximate and suddenly the instruction count went over some magic number?* Or maybe it's this out of order execution stuff that's got you "new age" or "third way processor" advocates thinking that RISC and CISC have merged. Whip out a programmer's reference manual for the 386 and the MIPS R2000. Now do the same thing for the Pentium III and the MIPS R14K. A decade and a half later and the two ISA's don't seem to have gotten much closer to me. It must be implementation then, since the block diagrams have much the same names now. I tell you what, if I could get a circuit schematic for the 32 bit adders in the R2000 and the 386, could you examine them and tell me which came from the RISC CPU and which came from the CISC CPU? No? Yet the RISC CPU yielded far higher performance than that CISC CPU even with smaller off-chip cache, older semiconductor technology and one third the transistor count.

    The only people who promulgate the view that RISC and CISC processors have somehow merged into some bizarre hybrid of the two seem to be those under the hypnotic influence of the marketing departments of the vendors of the few remaining CISC processor families. Those who are actually designing RISC processors don't seem to have difficulty in telling them apart.

    *The only rule in the RISC school of processor design that relates to how many instructions a RISC ISA should or should not have is that you don't add another instruction unless your performance increases for your target applications by a worthwhile amount, even after considering the impact of adding the new instruction on the microarchitecture and cycle time of likely implementations.
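
    A small sketch of what "tell the compiler what their latencies are so it can schedule around them" means in practice: a toy in-order list scheduler in Python that won't issue an instruction until its inputs are ready, and fills the wait with independent work. The latencies and the instruction format are assumptions for the example, not any real pipeline's numbers.

    LATENCY = {"fadd": 3, "fmul": 3, "iadd": 1}   # assumed result latencies

    def list_schedule(instrs):
        """Issue at most one instruction per cycle, in order of readiness."""
        ready_at = {}            # register -> cycle its value becomes available
        pending = list(instrs)
        cycle, order = 0, []
        while pending:
            for ins in pending:
                op, dest, src1, src2 = ins
                if max(ready_at.get(src1, 0), ready_at.get(src2, 0)) <= cycle:
                    order.append((cycle, ins))
                    ready_at[dest] = cycle + LATENCY[op]
                    pending.remove(ins)
                    break
            else:
                order.append((cycle, ("stall",)))   # nothing ready: a wasted cycle
            cycle += 1
        return order

    prog = [("fmul", "f1", "f2", "f3"),
            ("fadd", "f4", "f1", "f5"),   # needs f1: not ready for 3 cycles
            ("iadd", "r1", "r2", "r3"),   # independent: can fill the gap
            ("iadd", "r4", "r1", "r5")]
    for cycle, ins in list_schedule(prog):
        print(cycle, ins)

    Here the independent integer adds slide into the cycles that would otherwise be spent waiting for the multiply's result, which is exactly what a compiler armed with the latencies can do statically.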
  • Have you ever heard of cache? If you aren't blowing 1.5 or 2 million transistors on an x86 translator to stick on the front end of an OOO instruction execution engine then maybe you'd have more room for cache. And when CPUs start integrating great masses of DRAM on the die (like the M32RD) or doing serious stuff with SMT it won't be bloated x86 cores that will be called on.
  • flamebait?

    I was getting his attention

    If presenting cold hard quantitative facts is flame bait then I am out of here.
  • by chip guy ( 87962 ) on Thursday October 21, 1999 @04:31AM (#1597186)
    I am sick and tired of people who cannot fathom the difference between an abstract instruction set architecture (ISA) and a chip implementation with functional units, gates, and flops. The terms RISC and CISC refer to ISAs. If you build a CISC processor using many of the same implementation tricks that are commonly used in RISC processors then fine for you. But you still have a CISC MPU. RISC and CISC have always shared many implementation details be it triple ported register files, on-chip cache or a 32 bit adder.

    Look at it this way. Let's say CISC is analogous to a bungalow while RISC is a two-story house. These are architectural differences. Let's say that in the early days of house building, bungalows were always built out of brick with load-bearing walls while two-story houses were wood framed. If some joker comes along and builds a bungalow with wood-frame technology, it doesn't suddenly make his edifice a two-story house, even though it may be a big improvement on earlier bungalows.

    While CISC has generally caught back up to near-RISC integer performance now that MPU complexity has reached about 3 to 5 million transistors, the ISA differences do matter. For example, an out-of-order x86 with a translator front end and register renaming might have 16 or 32 physical GPRs instead of the eight architected GPRs. But the compiler cannot address these physical registers; it only sees eight. This means that an x86 compiler will have to spill values to memory much more often than a RISC compiler, and it will not be able to exploit performance-enhancing techniques such as register assignment to local and global variables and software pipelining nearly as well as a RISC. (A toy spill count after this comment illustrates the effect.)

    There is also the baggage effect with CISC architectures. For example, nearly every x86 instruction modifies flag bits as well as GPRs. This means that the flags become a central dependency choke point that requires a lot of attention to address. CISC ISAs also invariably have multiple instruction sizes. This means that a CISC CPU will typically require an extra pipe stage or two to sort out instruction boundaries regardless of how "RISC-like" the backend looks.

    People who believe the Intel party line of "x86 was CISC but is now RISC" should pay attention to what Intel is doing rather than what it is saying. They are busy spending billions to develop a new RISC-like 64 bit architecture to carry them into the future. It is true AMD will stretch x86 to 64 bits, but they had to change the x86 floating point programming model to a RISC-like flat register file with three-address instructions to even attempt to close the distance on FP code. And AMD's future success in keeping up with IA-64 and SMT superscalar RISC implementations is far from assured.
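
    A toy illustration of the register-pressure point above, in Python: a trivial spill count over a made-up sequence of "number of simultaneously live values" at successive program points. The workload numbers are invented; the point is only that it is the architected register count, not the pool of physical rename registers, that the compiler gets to allocate against.

    # With only eight architected registers a compiler must spill more live
    # values to memory, no matter how many physical rename registers the
    # hardware hides underneath. The "live value" workload below is invented.

    def count_spills(live_value_counts, num_regs):
        """At each program point, live values beyond num_regs must sit in memory."""
        return sum(max(0, live - num_regs) for live in live_value_counts)

    # Hypothetical number of simultaneously-live values at successive points.
    live_values = [4, 9, 12, 17, 11, 6]

    print(count_spills(live_values, 8))    # spills with 8 architected registers
    print(count_spills(live_values, 32))   # spills with 32 architected registers

    Real allocators are graph-coloring affairs with far more subtlety, but the extra loads and stores from spilling are exactly the memory traffic the comment above describes.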
  • I have designed RISC machines. They are ultimately faster at the same clock speeds since the software designer has closer access to the hardware and can optimize his program. The typical CISC instruction is inefficient since most of its complexity can be unused when it executes. Since RISC cores are simpler they can host more parallelism, lookahead features and pipelining.
    Of course it's possible to have a hybrid! Look at the PowerPC chip. In designing that chip, they tried to follow the RISC mentality, having simple instructions and a lot of orthogonality. They also, however, wanted powerful math instructions, and you end up with things like a+b*c as a single instruction... not exactly pure RISC.

    CISC/RISC is more than instructions that 'do a lot'; the complexity of an instruction set also takes these into account:

    CISC characteristics:

    Register to register, register to memory, and memory to register commands.

    Multiple addressing modes for memory, including specialized modes for indexing through arrays

    Variable length instructions where the length often varies according to the addressing mode

    Instructions which require multiple clock cycles to execute.

    It is not hard to imagine a basically RISC machine that allows different memory addressing modes. Gee, what do we have now? A hybrid!

    wow, great article! as for efficient code, perhaps i'm wrong but it seems to have died a bit of a death. for example, i can't really write efficient code in java, since it will run on an intel chip, an alpha chip, a palm pilot (maybe soon), a G4 etc etc. i can try not to use excessive amounts of memory, and write efficient (ie not O(2^n)) algorithms, but i can't push things around into registers or whatever. in a way, i guess it is a CISC language (i have a very abstract level of instructions that are implemented in whatever way underneath, maybe in one instruction on a 'CISC' machine but 3-4 on a 'RISC', for example). but java, jini, upp or whatever (which all do pretty much the same thing) are the future for everything you see around you (i think anyway). does there particularly need to be a distinction now? everyone wants abstraction (windows/xwindows is really just abstraction) and this would seem to include chip architectures. i realise the guys designing the chips have to make a decision, but from everyone else's point of view, does it matter?
  • Basically, you break the CISC style instructions down into simpler instructions that each have a very specific task

    That's what I always thought.

    What I learned from my last computer design course was that RISC implemented all instructions in hardwired logic, while CISC implemented all instructions in microcode.

    Did anyone else notice that the author's hypothetical H Compiler produces buggy code for the ARS-1 architecture? The code for "B = CUBE(A)" calculates A^4 rather than A^3 (in addition to its unintended side effect of modifying register A).
    One of the points implicitly made in the article is that big on-chip caches running at (or close to) full processor speed act a lot like big banks of registers. A Pentium II might look, superficially, like it's got only a handful of general-purpose registers, but your average C compiler treats it like it's got a whole gaggle of them, indexed by %ebp.

    From this point of view, the difference in the way "CISC" and "RISC" chips interact with memory is that, with a CISC chip, memory is implicitly used as a backing store for register contents while, with a RISC chip, the backing store is managed explicitly.

    In practice, then, even this seemingly fundamental difference may not be that big a deal.

  • Before you rant, read the entire article.

    That way you'd understand that the article's purpose is not to debunk RISC, but to point out that the distinction today is almost meaningless in the framework of the old RISC v CISC debate.

    The author's "main evidence" as you call it, is merely laying out the main question of the article, ie is there a difference nowadays? As you read on, there are numerous examples cited of the blurring distinction, eg G4's with the Velocity Engine (162 _extra_ instructions), and the Pentium II, a classic CISC CPU, with its internal microcode (RISC) architecture.

    Cheers,
    Justin.

  • Cool.

    I mean, they're making CPUs so darn complex these days it's hard to appreciate the real work that the processor is doing. Example: I haven't got a clue what the PIII under my hood is doing, and I don't want to, because it would involve so many complex subjects that my mind would either explode or doze off. Now the PISC as described in that article is my kind of chip (or maybe it's not a chip if I try to build it, but same idea of CPU). So what if it runs at 5 MHz? That could be fixed--just speed up the underlying circuits. And at 5 MHz, my brain might actually have a chance at keeping up with it. Skimming the page was enough to give me the idea that "I could build this thing," even though I have very little experience in actually building electronic circuits. I'm not a chip engineer, obviously, or an electrical engineer, and neither is the average computer user. The average computer user, however, might be interested to know that they actually have a shot at figuring out what that magical black (or grey) box in that case is doing.

    Demystify the processor. But don't call it Pathetic. I'd call it Understandable. Unfortunately it's even harder to pronounce UISC than CISC or RISC, so make it Technically-understandable Instruction Set Computer, or TISC (cm) (Cool Mark).

    Kenneth Arnold

    PS - If you are going to have a shot at looking at some neat technology, I thought that the K7 Article [arstechnica.com] also on Ars Technica was pretty cool. It was well written and provided a good metaphor. Still, it barely scratched the surface at telling me what is really going on down there.

    PPS - I could have said more, but my hard drive is making funny noises and I better investigate.

  • I still don't know what to think about the G4. Is it faster than the PIII because it has more (total) instructions? Does it just appear faster, or is it really? The test numbers given at the various review sites don't seem to give a good overall picture. The article confused me on this issue.

    Still, the Velocity Engine, in theory as well as, it seems, in practice, is a very good idea (I think... Is this true?), so why didn't Intel or AMD or Cyrix have it a long time ago?

    I have a whole bunch of opinions on this subject, but any opinions stated here are merely coincidental. Don't feel like you actually have to answer my questions either; I don't want to start another flame war.

    Kenneth

    PS - I'm debating whether or not to post this anonymously because of the flame bait issue. I guess I'll post as myself, trusting other users to be smart enough to keep their mouths shut.



  • Moderate this up. (Yeah, I know, it's only been here for less than 2 hours.)

    His article is cool, and that's a good enough reason to moderate his post up. And that he didn't notice that the default Slashdot style is HTML (so he did \n's instead of br's) is no reason not to; we can still understand it. It addresses issues talked about on the top of the list, and as I write this it's on the bottom.

    The only problem is that when it does get to the top of the list, this post will be irrelevant. I think Slashdot should have a way for posters to delete their own messages (but keep any effect they have had so far on their karma). If anyone agrees with me, you can go e-mail one of the Big Guys (tm) about it. Of course, you couldn't do anything about the Anonymous Coward posts.

    Kenneth Arnold




    It doesn't seem to me like there ever was, or ever will be, any real difference; the complexity of chips must increase over time as new functions are added. Every once in a while, someone will say "These are too complex! Let's design a new chip, but give it a minimal amount of instructions! We'll call it Reduced Instruction Set Computing!". Basically, like most other things, it swings like a pendulum.
    _____________
  • Sorry, I read that far too quick and deserved to get my arse kicked. Nice detailed reply.

    One point that needs to be made lest history be rewritten, though:

    And as for your "debate"--of course RISC won big time. It came out nearly 10 years after the first chips made with the "CISC" philosophy. As was, IMO, rather compellingly and insightfully explained in the article, "CISC" chips were the best possible solution to the awful state of complilers and memory available at the time.

    When RISC came out it wasn't seen as a simple logical progression. I even heard it referred to as a fad that would disappear when transistor counts improved. There were tons of arguments for keeping CISC (it does more work per instruction, RISC processors were only faster because they had more registers, the early Sparcs couldn't do multiplication to save their lives ...) and I remember being called an idiot by an Intel engineer for saying that CISC was not going to last. Digital engineers were also very vocal in damning CISC when the Alpha came out. This was because so many people were defending CISC.

    I also think that the attitude that RISC is just an ISA (as seen in a lot of replies) is a software person's argument. The thing doing the work in PIIs and Athlons is running simple instructions and is RISC, whether it is doing the can-can or emulating x86 as far as the outside world is concerned. That is why some hardware and marketing people think of them as RISC.

    Sorry for the disappointing comment.

  • I was moved to send in a reply that the author of the article was clueless -- thanks for getting there first.

    When the first RISC machines came out, superscalar execution hadn't been invented yet.

    Some processors (Cray for one) had been doing this years before RISC came out. RISC actually makes superscalar execution easier because there is more consistency in the instructions and they don't hit as many functional units in a single instruction.

    The fast CISC chips (PII, Athlon) do instruction conversion into RISC, under the bonnet, for this and many other reasons. So if the debate is over, it's because RISC won -- big time.

    I also think that the ideas behind RISC such as "move the complexity from the chip into the compiler" also apply today, and that VLIW is an example of this applied to scheduling, and the branch hints on the Alpha are another example of this applied to branching.

    There is also promising work done on MISC (Minimal Instruction Set) with simple chips doing lots of instructions per second.

    That was a very disappointing article.

  • It's not. This is the second one that went this way along with a few posts. I guess it's more of a popularity contest than anything. Someone didn't like my .sig or something.


    SL33ZE, MCSD
    em: joedipshit@hotmail.com
    What I wish the author had made more of a point of is that this "debate" between camps has been moot for years. I think it can be argued that RISC in its pure sense refers only to those old early-80's CPUs, and everything since then is part of an evolution from the original principle. People may like the warm feeling of saying that their chip is part of a greater cadre of RISC(=good) parts, but wherever we are now, we've surely evolved to somewhere else from the old terms. The author also seems too willing to let his writing "make a point" and put people in their place, instead of just trying to educate... Maybe he's hung out in too many forums where folks are endlessly bickering. I read Patterson and Hennessy - wonderful book... anyone interested in the subject should pick up a copy. E
    In many instances, for example in the telecom field, FPGAs have already beat general purpose processors. Many ATM (Asynchronous Transfer Mode, get your mind out of the bank) switches use FPGAs in the crossbar switch (for those who don't know, a crossbar is a multi-way high-speed data connection with FIFOs, first-in-first-out buffers). Some printers use them to control and buffer the digital video raster that gets sent to the printhead. When there's a mechanical, non-pausable, time-critical function involved, it's usually more expensive to purchase and program a general purpose processor to meet the real-time needs than it is to purchase and program an FPGA.

    Programming an FPGA to play tic-tac-toe is just silly. Possible, but silly.

    This message buzz-word compliant.
  • MIT has a lab researching pretty much exactly what you are talking about, except in a new hardware-based architecture they call RAW, and it's part of the Oxygen project (you know, make computing as pervasive as oxygen...). There was an article in Scientific American a couple of months ago (August 1999). Here's [scientificamerican.com] the url to the issue.

    This is the actual site at MIT [mit.edu]: www.cag.lcs.mit.edu/raw/ [mit.edu]

    The Oxygen project is part of the Laboratory for Computer Science at MIT, but it's kind of a pain to get direct info off of MIT's site. The best I could find is a list of articles about LCS in the news, which includes the Scientific American articles. www.lcs.mit.edu/news/inthenews/ [mit.edu]

    It seems to be a new project (started in April) with a lot of money (38 million from DARPA), and, I'm sure this will piss some of you off, a new 20 million dollar facility donated by - you guessed it - Bill Gates.

    From these articles, I get the impression that the technology goes a ways beyond FPGA's, because it is actually a whole bunch of processing units tiled together, and they plan to run multiple "programs" at once by varying the paths through the tiles. So, the chip could be a custom video chip and a radio and a cell phone at the same time. The last line in the article kind of sums it up:"...within a couple of decades there will be only three kinds of chips in the world: Raw chips, memory chips and, of course, potato chips." As long as we still have potato chips...

    Where is he making these arguments? Where are you getting this information from?

    All I can see is you making claims that the author didn't make. He tries to move the debate to a useful level. If you want to talk about an architecture whose instructions pipeline cleanly, aren't you talking about performance-oriented concerns? And if so, why are you unwilling to look at the aftereffects of implementation?

    As Ditzel has said, the theory is less useful in practice because you end up with a FINAL PRODUCT that is every bit as unwieldy as CISC.

    Now, you can do what many RISC supporters do and try to continually refine what RISC means, but all you are doing there is playing "essentialist" games and missing the point: there's more in common now between so-called RISC and CISC CPUs than anyone ever imagined. If that's the case, then the distinction is hardly worth what marketers and trumpeters on web sites make it out to be.
