
Larrabee ISA Revealed

David Greene writes "Intel has released information on Larrabee's ISA. Far more than an instruction set for graphics, Larrabee's ISA provides x86 users with a vector architecture reminiscent of the top supercomputers of the late 1990s and early 2000s. '... Intel has also been applying additional transistors in a different way — by adding more cores. This approach has the great advantage that, given software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future. Larrabee takes this approach to its logical conclusion, with lots of power-efficient in-order cores clocked at the power/performance sweet spot. Furthermore, these cores are optimized for running not single-threaded scalar code, but rather multiple threads of streaming vector code, with both the threads and the vector units further extending the benefits of parallelization.' Things are going to get interesting."
This discussion has been archived. No new comments can be posted.

Comments Filter:
  • by symbolset ( 646467 ) on Saturday April 04, 2009 @06:01AM (#27456217) Journal

    Your post can be summarized as: Intel Giveth; Microsoft taketh away. That's been the formula for far too long.

    And that period is almost over.

  • Re:End of an era (Score:3, Informative)

    by Repossessed ( 1117929 ) on Saturday April 04, 2009 @06:30AM (#27456323)

    Intel actually tried to build a different, leaner instruction set, IA64, and the market rejected it.

    Via and AMD don't have much trouble implementing these instruction sets either, or adding their own, so this doesn't much represent a stranglehold move on Intel's part.

    If you really want cheap small processors with no extra instruction sets, Intel does still make Celerons, I dare you to run Vista on one.

  • by GreatBunzinni ( 642500 ) on Saturday April 04, 2009 @06:34AM (#27456339)

    When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse. Then, if we compare it with the static limit of said structure (see limit analysis' lower bound theorem) we can pinpoint the exact resistance to failure of a structure and, from there, optimize it and make it safer. Which is a nice thing to do in terms of safety and cost.

  • Re:End of an era (Score:3, Informative)

    by Hal_Porter ( 817932 ) on Saturday April 04, 2009 @06:49AM (#27456387)

    Actually the key patents on x86 probably run out soon. x64 has always been licensable from AMD. And an AMD or Intel x86/x64 chip has been at the top of the SpecInt benchmark for most of the last few years. Plus Itanium killed off most of the RISC architectures, and x64 looks likely to kill off or niche-ify Itanium.

    Meanwhile NVidia are rumoured to be working on a Larrabee-like chip of their own. Via have a ten-year patent license, by the end of which the architecture will be rather open. And Larrabee shows that a chip with a lot of simple x86 cores is good enough at graphics for most people not to need a powerful GPU. I bet a Larrabee-like CPU would be great in a server too, and it scales trivially by changing the number of cores.

    I'd say x86/x64 will be around for a long time.

  • by joib ( 70841 ) on Saturday April 04, 2009 @07:25AM (#27456489)

    There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used. A clean break from 80x86 is needed. Legacy 80x86 code can run perfectly well in emulation (and need not be slow, using JIT techniques).

    All the legacy junk takes up a pretty small fraction of the area. IIRC on a modern x86 CPU like Core2 or AMD Opteron, it's somewhere around 5%. Most of the core is functional units, register files, and OoO logic. For a simple in-order core like Larrabee the x86 penalty might be somewhat bigger, but OTOH Larrabee has a monster vector unit taking up space as well.

    What I like most about Larrabee is the scatter/gather operations. One major problem in vector architectures is how to load the vectors with data coming from multiple sources. The Larrabee ISA solves this neatly by allowing vectors to be loaded from different sources in hardware and in parallel, making loading and storing vectors a very fast operation.

    Yes, I agree. Scatter/gather is one of the main reasons why vector supercomputers do very well on some applications. E.g. scatter/gather allows sparse matrix operations to be vectorized and lets the CPU keep a massive number of memory operations in flight at the same time, whereas with only the usual scalar memory ops sparse matrix code tends to spend its time waiting on memory latency.

    The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.

    There is the "restrict" keyword in C99 precisely for this reason. It's not in C++, but most compilers support it in one way or another (__restrict, #pragma noalias or whatever). That being said, I'd imagine something like OpenCL would be a more suitable language for programming Larrabee than C, C++ or Fortran. Functional languages are promising for this, as you say, but it remains to be seen whether they manage to break out of their academic ivory towers this time around.

  • by 4181 ( 551316 ) on Saturday April 04, 2009 @07:43AM (#27456547)
    It's probably worth noting that although the actual article uses neither the acronym nor its expansion, ISA in the story title refers to Instruction Set Architecture [wikipedia.org]. (My first thoughts were of ISA [wikipedia.org] cards as well.)
  • Re:End of an era (Score:5, Informative)

    by SpazmodeusG ( 1334705 ) on Saturday April 04, 2009 @07:55AM (#27456593)
    Look, I hate to be anal, but neither Intel nor AMD has been at the top of the SpecInt benchmark for a long time.
    The stock IBM Power6 5.0GHz CPU is the fastest CPU on the SpecInt benchmark on a per-core basis (and before that the leader was the 4.7GHz model of the same CPU).

    http://www.spec.org/cpu2006/results/res2008q2/cpu2006-20080407-04057.html [spec.org]
    Search for: IBM Power 595 (5.0 GHz, 1 core)
    Which is telling considering it's made on a larger process than the fastest x86 (the i7). It really shows there's room for improvement if you ditch the x86 instruction set.
  • Re:End of an era (Score:4, Informative)

    by turgid ( 580780 ) on Saturday April 04, 2009 @07:55AM (#27456595) Journal

    Intel actually tried to build a different, leaner instruction set, IA64, and the market rejected it.

    It wasn't lean at all. It is typical over-complicated Intel junk. Just look at the implementations: Itanic. It's big, hot, expensive, slow...

    If you really want cheap small processors with no extra instruction sets, Intel does still make Celerons, I dare you to run Vista on one.

    The Celerons have all the same instructions as the equivalent "Core" processors; they usually just have less cache.

    This Larrabee thing doesn't sound much different from what AMD (ATi) and nVidia already have. A friend of mine has done some CUDA programming and, from what he says, it sounds just the same. Just like a vector supercomputer from 10 years ago.

  • "Would you believe a GOTO statement and a couple of flags?"

    How about a while loop and a continue statement?

    In C, a continue breaks out of only one nested while or for loop. If you're in a triply nested loop, for example, you can't specify "break break continue" to break out of two nested loops and go to the next iteration of the outer loop. You have to break your loop up into multiple functions and eat a possible performance hit from calling a function in a loop. So if your profiler tells you the occasional goto is faster than a function call in a loop, there's still a place for a well-documented goto.

    C++ code can use exceptions to break out of a loop. But statically linking libsupc++'s exception support bloats your binary by roughly 64 KiB (tested on MinGW for x86 ISA and devkitARM for Thumb ISA). This can be a pain if your executable must load entirely into a tiny RAM dedicated to a core, as seen in the proverbial elevator controller [catb.org], in multiplayer clients on the Game Boy Advance system (which run without a Game Pak present so they must fit into the 256 KiB RAM), or even in the Cell architecture (which gives 256 KiB of local store to each SPE core).

  • by julesh ( 229690 ) on Saturday April 04, 2009 @08:29AM (#27456697)

    If developers are too stupid to code for it, it won't go anywhere. This is sounding a lot like the PS3 architecture in complexity.

    There are several problems with PS3 programming that don't apply to Larrabee:

    * Non-uniform core architectures. Cell processors have two different instruction architectures depending on which core your code is intended to run on. This causes quite a bit of confusion and makes the tools for development a lot more complex.
    * Non-uniform memory access. Most Cell processor cores have local memory, and global memory accesses must be transferred to/from this local memory via DMA. Larrabee cores have direct access to main memory via a shared L2 cache.
    * Memory size constraints. Most Cell processor cores only have direct access to 256K of memory, so programs running on them have to be very tightly coded and don't have much spare space for scratch usage.

    Any application that's reasonably parallelisable is going to be pretty easy to optimize for Larrabee. Most graphics algorithms fit into this category.

  • by Anonymous Coward on Saturday April 04, 2009 @08:40AM (#27456741)

    nVidia G80 is scalar in the sense it's not VLIW (like ATI is), but it still has 32-wide SIMD. (Likely to go to 16 in next generations). 32*16 is actually 512 bits too.

    Doing a truly scalar architecture would have an enormous cost in instruction caches - you'd need to move as many instructions as data around the chip, and that won't be cheap. So SIMD is going to be around for a while.

    Nice try, more research next time.

  • Re:End of an era (Score:4, Informative)

    by SpazmodeusG ( 1334705 ) on Saturday April 04, 2009 @08:54AM (#27456809)
    I said on a per-core basis!
    The Xeon X5570 is a quad core machine!
  • by gnasher719 ( 869701 ) on Saturday April 04, 2009 @09:40AM (#27457073)

    The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Anyone know what is/isn't supported in this line?

    "Hardware support" doesn't mean "fully implemented in hardware".

    What hardware support do you need for transcendental functions?
    1. Bit-fiddling operations to extract exponents from floating point numbers. Check.
    2. Fused multiply-add for fast polynomial evaluation. Check.
    3. Scatter/gather operations to use coefficients of different polynomials depending on the range of the operand. Check.

  • Re:End of an era (Score:3, Informative)

    by godefroi ( 52421 ) on Saturday April 04, 2009 @10:25AM (#27457331)

    Here's a little secret:

    Lots of games (maybe all of them) already include graphics-vendor-specific rendering engines. It's just that nowadays your graphics API isn't your whole game development toolset (glide), so it's easy to include support for both (all) vendors.

  • Re:End of an era (Score:3, Informative)

    by forkazoo ( 138186 ) <<wrosecrans> <at> <gmail.com>> on Saturday April 04, 2009 @11:19AM (#27457689) Homepage

    This is misinformed B.S. Itanium didn't kill anything.

    That was (and is) the triumphant march of Linux/x64 the whole time.

    Itanium killed high-end MIPS years before anybody was talking about x64. You mentioned PA-RISC, and Alpha was dead in practice long before HP ever got around to officially declaring it dead. Itanium killed a lot of good architectures.

  • by mrfaithful ( 1212510 ) on Saturday April 04, 2009 @01:09PM (#27458495)

    If you watch large teams of programmers, the management actually forces the developers to write slow code, claiming that maintainability is more important than any other factor!

    I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.

    Up to a point, then you've got to make a choice. Keep the high level OOP constructs, or flatten it out to make the compiler's job easier.

    THEN you have the next level of optimization: keep the readable code, or do it the "clever" way that nets a 40% boost. And as any experienced coder will tell you, clever code is the antithesis of maintainable.

  • by Pseudonym ( 62607 ) on Saturday April 04, 2009 @10:57PM (#27462349)

    As a structural engineer in training who is starting to cut his teeth writing structural analysis software, these are truly interesting times in the personal computer world.

    There's one drawback to the current crop of CPUs, though. More cores per die means less cache per core. So depending on what you're doing, this could actually degrade performance (all other things being equal) over older SMP machines.
