Larrabee ISA Revealed
David Greene writes "Intel has released information on Larrabee's ISA. Far more than an instruction set for graphics, Larrabee's ISA provides x86 users with a vector architecture reminiscent of the top supercomputers of the late 1990s and early 2000s. '... Intel has also been applying additional transistors in a different way — by adding more cores. This approach has the great advantage that, given software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future. Larrabee takes this approach to its logical conclusion, with lots of power-efficient in-order cores clocked at the power/performance sweet spot. Furthermore, these cores are optimized for running not single-threaded scalar code, but rather multiple threads of streaming vector code, with both the threads and the vector units further extending the benefits of parallelization.' Things are going to get interesting."
An architecture named after a Get Smart character? (Score:2)
"Would you believe a GOTO statement and a couple of flags?"
Re: (Score:2)
Bet they've got some serious CONTROL structures to keep things from getting too KAOTIC.... "Would you believe a GOTO statement and a couple of flags?"
How about a while loop and a continue statement?
Where continue may fail with a nested loop (Score:5, Informative)
"Would you believe a GOTO statement and a couple of flags?"
How about a while loop and a continue statement?
In C, a continue breaks out of only one nested while or for loop. If you're in a triply nested loop, for example, you can't specify "break break continue" to break out of two nested loops and go to the next iteration of the outer loop. You have to break your loop up into multiple functions and eat a possible performance hit from calling a function in a loop. So if your profiler tells you the occasional goto is faster than a function call in a loop, there's still a place for a well-documented goto.
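To make that concrete, here is a minimal sketch (my example, not from the parent post) of the well-documented goto in question, escaping a triply nested search in plain C:

/* Sketch (mine): one goto replaces the "break break continue"
   that C lacks for escaping nested loops. */
static int find_first(int a[4][4][4], int wanted, int *oi, int *oj, int *ok)
{
    int i, j, k;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            for (k = 0; k < 4; k++)
                if (a[i][j][k] == wanted)
                    goto found;
    return 0;                   /* not found */
found:
    *oi = i; *oj = j; *ok = k;  /* indices of the first match */
    return 1;
}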
C++ code can use exceptions to break out of a loop. But statically linking libsupc++'s exception support bloats your binary by roughly 64 KiB (tested on MinGW for x86 ISA and devkitARM for Thumb ISA). This can be a pain if your executable must load entirely into a tiny RAM dedicated to a core, as seen in the proverbial elevator controller [catb.org], in multiplayer clients on the Game Boy Advance system (which run without a Game Pak present, so they must fit into the 256 KiB RAM), or even on the Cell architecture (which gives 256 KiB of local store to each SPE).
Re: (Score:2)
FWIW, I believe setjmp/longjmp are the closest C equivalents to exceptions.
Re: (Score:2)
Since you specifically asked about what I said above, I'll repeat it here. An alternative to breaks and gotos in C, which gets you functionality more like exceptions, is the setjmp.h header's setjmp (equivalent to except) and longjmp (equivalent to raise or throw).
http://en.wikipedia.org/wiki/Longjmp#Exception_handling [wikipedia.org]
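A minimal sketch of that correspondence (my example, not taken from the Wikipedia page):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf on_error;           /* the "catch site" that longjmp returns to */

static void parse_item(int value)
{
    if (value < 0)
        longjmp(on_error, 1);      /* "throw": unwind back to the setjmp */
    printf("item %d ok\n", value);
}

int main(void)
{
    if (setjmp(on_error) == 0) {   /* "try": setjmp returns 0 on the initial call */
        parse_item(3);
        parse_item(-1);            /* triggers the longjmp */
        parse_item(7);             /* never reached */
    } else {
        fprintf(stderr, "caught an error via longjmp\n");   /* "catch" */
    }
    return 0;
}

The usual caveats apply: automatic variables modified between the setjmp and the longjmp should be declared volatile, and unlike C++ exceptions no destructors or cleanup code run on the way out.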
Not when you're working for a living (Score:2)
You don't have to use separate functions in C. C does have GOTOs, thank you very much :)
The subset of C enforced by many employers' coding standards lacks the goto keyword.
I'd be happy if someone could tell me a better way
If you can't get your boss to amend the coding standards to allow use of goto to handle exceptions in C, the better way involves leaving your employer. But that isn't practical in this recession.
Re: (Score:2)
The only note is that you can't jump to the end of a block.
I'm pretty sure an empty statement (which is a statement, after all, and that's what's supposed to come right after a label) should do the trick:
{
    /* ... */
    /* ... */
    goto foo;
foo: ;  /* the empty statement after the label is what lets it sit at the end of the block */
}
np: Autechre - Are Y Are We (WAP72 12")
Had a flashback there. (Score:5, Funny)
Re: (Score:2, Informative)
Re: (Score:3, Funny)
Re: (Score:2)
End of an era (Score:2, Interesting)
Re: (Score:3, Informative)
Intel actually tried to build a different, leaner instruction set: IA-64. The market rejected it.
Via and AMD don't have much trouble implementing these instruction sets either, or adding their own, so this doesn't really represent a stranglehold move on Intel's part.
If you really want cheap, small processors with no extra instruction sets, Intel does still make Celerons. I dare you to run Vista on one.
Re: (Score:3, Informative)
Actually the key patents on x86 probably run out soon. x64 has always been licensable from AMD. And an AMD or Intel x86/x64 chip has been at the top of the SPECint benchmark for most of the last few years. Plus Itanium killed off most of the RISC architectures, and x64 looks likely to kill off or nicheify Itanium.
Meanwhile NVidia are rumoured to be working on a Larrabee-like chip of their own. Via have a ten-year patent license, by which point the architecture is rather open. And Larrabee shows a chip with a
Re: (Score:2, Interesting)
Plus Itanium killed off most of the RISC architectures, and x64 looks likely to kill off or nicheify Itanium.
This is misinformed B.S. Itanium didn't kill anything.
What did the killing was (and is) the triumphant march of Linux/x64.
It is true that Intel and HP made sacrificial lambs of PA-RISC and Alpha on Itanic's altar. Yet Itanic never caught up (and never will) to the levels PA-RISC and Alpha had reached in their day.
I bet a Larrabee-like CPU would be great in a server too, and it scales trivially by changing the number of cores.
Servers are I/O heavy - CPU parallelism is very secondary. I doubt Larrabee would make any dent in the server market. Unless of course OnLive or something similar catches on, or Intel adds something i
Re: (Score:2)
Servers are I/O heavy - CPU parallelism is very secondary
I take it you've never tried to run a large-scale J2EE app.
Re: (Score:2)
With the same success, I can run "for(;;);" in several threads and run all CPUs/cores into the ground.
No matter what the Java folks try to make of it, Java on servers is pretty niche - precisely because of its inefficient use of resources.
Server Java is of course not so niche within the Java market as a whole. But not the other way around.
Re: (Score:3, Informative)
Itanium killed high-end MIPS years before anybody was talking about x64. You mentioned PA-RISC, and Alpha was dead in practice long before HP got around to officially declaring it dead. Itanium killed a lot of good architectures.
Re: (Score:2)
That will depend on the server. Encryption could benefit from a Larrabee-like system, as could things like software RAID. With extra CPU power available, software RAID and advanced file systems like ZFS could replace hardware RAID everywhere.
Re:End of an era (Score:5, Informative)
The stock IBM POWER6 5.0 GHz CPU is the fastest CPU on the SPECint benchmark on a per-core basis (and before that the leader was the 4.7 GHz model of the same CPU).
http://www.spec.org/cpu2006/results/res2008q2/cpu2006-20080407-04057.html [spec.org]
Search for: IBM Power 595 (5.0 GHz, 1 core)
Which is telling considering it's made on a larger process than the fastest x86 (the i7). It really shows there's room for improvement if you ditch the x86 instruction set.
Re:End of an era (Score:4, Informative)
The Xeon X5570 is a quad core machine!
Re: (Score:3, Interesting)
And what about on a per-watt basis? (Honest question here, though I do suspect the i7 is quite a bit more competitive there.)
Re: (Score:2)
Because there weren't any low-power 64-bit PowerPC processors at the time and Apple couldn't wait.
Re: (Score:2)
Oh yeah... I did forget about the problems they had trying to get the G5 into laptop form while Intel had the cool and efficient cores waiting in the wings. Thanks for reminding me. It doesn't make me any sadder to see the power go.
Re:End of an era (Score:5, Insightful)
The only reason Itanium is still hanging around like a bad smell is because companies like HP were dumb enough to dump their own perfectly good RISC CPUs on a flimsy promise from Intel, and now they have no choice.
Re:End of an era (Score:5, Funny)
So that is where the term "EPIC FAIL" comes from...
Re: (Score:2)
It's not that HP is dumb, it's greedy. HP owns something on the order of 50% of the IP that goes into an Itanium. If they can effectively block you from buying anything else, you buy into their patents. Intel is the other major patent holder.
Most of the patents for the Itanium are designed to make it impossible to produce an Itanium clone without violating the patents.
Re: (Score:2)
What I find interesting is that Intel tried this thing before, but it was called the iAPX 432 back in the '80s. It failed miserably back then, but is only somewhat more successful now.
Also, I think it was HP that approached Intel to make Itanium, not the other way around.
Re: (Score:2)
Fixed that for you.
Re:End of an era (Score:4, Informative)
Intel actually tried to build a different leaner, instruction set. IA64, the market rejected it.
It wasn't lean at all. It is typical over-complicated Intel junk. Just look at the implementations: Itanic. It's big, hot, expensive, slow...
If you really want cheap small processors with no extra instruction sets, Intel does still make Celerons, I dare you to run Vista on one.
The Celerons have all the same instructions as the equivalent "Core" processors; they just usually have less cache.
This Larrabee thing doesn't sound much different from what AMD (ATI) and NVIDIA already have. A friend of mine has done some CUDA programming and, from what he says, it sounds just the same. Just like a vector supercomputer from 10 years ago.
Re: (Score:2)
Re: (Score:2, Insightful)
This is a real market, and as it matures the average joe will find that it offers things that they want as well.
The fact is that as long as even a small market exists, that market can expand under its own momentum to fill roles that cannot be anticipated.
I certainly wasn't thinking that there was a market for hardware accelerated graphics 20 years ago, yet I'm sure to make sure that's in the syste
Re: (Score:2)
Re: (Score:3, Informative)
Here's a little secret:
Lots of games (maybe all of them) already include graphics-vendor-specific rendering engines. It's just that nowadays your graphics API isn't your whole game development toolset (glide), so it's easy to include support for both (all) vendors.
Re: (Score:2)
Because of this, chipmakers will probably continue to have at least one product for the high-e
Structural engineering welcomes this. (Score:5, Interesting)
As a structural engineer in training who is starting to cut his teeth writing structural analysis software, these are truly interesting times in the personal computer world. Technologies like CUDA, OpenCL and maybe also Larrabee are making it possible to put on any engineer's desk a system capable of analysing complex structures practically instantaneously. Moreover, it will push the boundaries of this sort of software further, making it possible, for example, to model composite materials such as reinforced concrete through the plastic limit, a task that involves simulating random cracks through a structure in order to find the lowest supported load and that, with today's personal computers, takes hours just to run on a simple simply supported, single-span beam.
So, to put this in perspective, this sort of technology will end up making construction projects cheaper, safer and quicker to finish, all in exchange for a couple hundred dollars of hardware that a while back was intended for playing games. Good times.
Re: (Score:2, Insightful)
As a seasoned structural engineer (and PhD in numerical analysis), I hate to say this, but this is partly wishful thinking. Even an infinitely powerful computer won't remove some of the fundamental mathematical problems in numerical simulations. I will not start a technical discussion here, but just take some time to learn about condition numbers, for instance. Or about the real quality of 3D plasticity models for concrete, and the incredibly difficult task of designing and driving experiments for measuring
Re: (Score:2)
modeling composite materials such as reinforced concrete through the plastic limit
I wonder if that software could also improve animation, by making solid objects which look as if they actually have weight. Too many avatars seem to be hovering just above the ground because you don't see the forces being transmitted through their bodies.
Re: (Score:3, Interesting)
That's a problem with the animator. You don't need complicated software to make good animation--Toy Story should be sufficient evidence of that. You just need talent. Less and less talent these days, actually: if you're playing a game where the avatars are floating, it's because the designers don't give a^H^H^H^H^H^H^H care enough to simulate motion properly.
As an aside, realism is frequently not a goal in animation. You tend to run up against the uncanny valley: all the characters look like zombies. Realis
Re: (Score:2)
Hey, "A Scanner Darkly" was not painful to watch. Not for anyone except you. ^^
I have seen it with many people, and most of them liked it. Some of them did find it a bit slow/boring. But nobody found it to be painful.
So if you always presume you are talking just about your own views, then I apologize. But if not, please stop assuming everybody has your point of view. :)
Thank you.
Re: (Score:2)
Kind of scary if you ask me. It sounds like you are trying to use simulation to reduce the margin of error you build into a structure. While that can be a good thing, it isn't always. It puts a much higher demand on quality control at the building site, which is often outside the control of the engineer.
Kind of reminds me of some really nice homebuilt aircraft in the '80s. They used very low-drag laminar-flow airfoils. They were very fast and worked well. Soon some were falling out of the sky on takeoff. They
Re: (Score:3, Informative)
There's one drawback to the current crop of CPUs, though. More cores per die means less cache per core. So depending on what you're doing, this could actually degrade performance (all other things being equal) over older SMP machines.
Re: (Score:3, Informative)
When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse. Then, if we compare it with the static limit of said structure (see limit analysis' lower bound theorem) we can pinpoint the exact resistance to failure of a structure and, from there, optimize it and make it safer. Which is a nice thing to do in terms of safety and cost.
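A worked example (mine, not the parent's numbers): for a simply supported beam of span L carrying a central point load P, the upper bound theorem with a single plastic hinge at midspan gives a collapse load of

    P_c = 4 M_p / L

where M_p is the plastic moment of the section. The lower bound (static) theorem gives the same value for this simple case, so the two bounds coincide and the collapse load is known exactly - the kind of pinpointing the parent describes.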
Isn't that the "highest supported load"? (Score:2)
When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse.
I think Anonymous Coward was trying to say that the layman's term for this load amount is the "highest supported load" that doesn't cause collapse.
Re: (Score:2)
I see what you mean. Nonetheless, what I meant instead of "lowest supported load" was the lowest supported plastic load, which basically means the load that leads a certain section of the structural element to stop increasing its resistance proportionally to the applied load (i.e., break).
If Intel are smart they will mix Core and Larrabee (Score:4, Insightful)
If Intel are smart they will release a chip containing one core (or 2 cores) from some kind of lower-power Core design and a pile of Larrabee cores on the one die, along with a memory controller and some circuits to produce the actual video output to feed to the LCD controller, DVI/HDMI encoder, TV encoder or whatever. Then do a second chip containing a WiFi chip, audio, SATA and USB (and whatever else one needs in a chipset). It would make the PERFECT 2-chip solution for netbooks if combined with a good OpenGL stack running on the Larrabee cores (which Intel are talking about already).
Such a 2-chip solution would also work for things like media set-top boxes and PVRs (if combined with a Larrabee solution for encoding and decoding MPEG video). PVRs would just need 1 or 2 of whatever is being used in the current crop of digital set-top boxes to decode the video.
As for the comment that people will need to understand how best to program Larrabee to get the most out of it, most of the time they will just be using a stack provided by Intel (e.g. an OpenGL stack or an MPEG decoding stack). Plus, it's highly likely that compilers will start supporting Larrabee (Intel's own compiler, for one, if nothing else).
Re: (Score:2)
Which does raise a question - will Intel keep SSE if it adds in the Larrabee vector unit as yet another legacy feature? I'm guessing it will (sigh).
Re: (Score:2)
Yeah, most x86_64 ABIs use SSE for scalar floating point, so it's too late to remove it. But hey, at least SSE is an improvement over x87.
Re: (Score:3, Insightful)
I don't think we will see this in notebooks for a while. We need to wait and see what the real product looks like (Intel hasn't released any specs), but Google for Larrabee and 300W and you will see the scuttlebutt is that this chip will draw very large amounts of power.
Re: (Score:3, Interesting)
Oddly enough your post ranks quite highly in that search. Drilling through the forums that show up reveals speculation that a 32-core Larrabee design will use a 300W TDP, or roughly 10W per core. There doesn't seem to be any justification for that number, although Larrabee looks like an Atom plus a stonking huge vector array. The Atom only uses 2W; it seems hard to believe that the 16-way vector array would use as much power for each FLOP as the entire Atom power budget to deliver that FLOP. Or perhaps it will, it'
Re: (Score:2)
Awww how sweet. Henk has registered a name troll just for me. Poor guy, that's a lot of issues for such a sweet child to carry around.
"GPU class rendering in software" (Score:2)
The claim that this is the first time you can get "GPU class rendering in software"... with nothing more than a pixel sampler to help is somewhat dubious. Modern GPUs are, after all, a bunch of stream processors with a pixel sampler. So, really, modern GPU graphics is all in software except the sampling.
Oh, hey, and anyone here remember the Voodoo? That was a big (for the time) sampler driven by an x86 CPU. Sound familiar?
Sarcasm aside, I want one. The peak performance is high, and the programming model is we
Re: (Score:2)
The claim that this is the first time you can get "GPU class rendering in software"... with nothing more than a pixel sampler to help is somewhat dubious. Modern GPUs are, after all, a bunch of stream processors with a pixel sampler. So, really, modern GPU graphics is all in software except the sampling.
As I understand it, the key difference that makes the software running on Larrabee more like traditional "software" than NV's or ATI's offerings is that Intel is exposing these stream processors' instruction sets to let compiler writers compete on writing shader compilers.
Re: (Score:2)
ATI has been exposing their instruction set to everyone since R600
In that case, "software" might refer to Larrabee's use of an ISA so closely related to one that has already had plenty of research into optimization. Or it could mean an ISA that isn't so limited to the kind of processing that occurs in the sorts of vertex and pixel shaders that we have seen up until now.
Transcendental functions? (Score:3, Interesting)
The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Anyone know what is/isn't supported in this line?
Re:Transcendental functions? (Score:5, Informative)
The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Anyone know what is/isn't supported in this line?
"Hardware support" doesn't mean "fully implemented in hardware".
What hardware support do you need for transcendental functions?
1. Bit fiddling operations to extract exponents from floating point numbers. Check.
2. Fused multiply-add for fast polynomial evaluation. Check.
3. Scatter/gather operations to use coefficients of different polynomials depending on the range of the operand. Check.
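For illustration only, here is a scalar C sketch (mine, not Larrabee code) that uses exactly those ingredients - exponent bit fiddling plus an FMA-evaluated polynomial - to compute a rough 2^x:

#include <math.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Rough 2^x for moderate x (sketch, mine):
   1. bit fiddling builds 2^n directly in the float exponent field,
   2. fmaf() evaluates a short polynomial for 2^f with f in [0,1),
   3. on a vector ISA, gather would fetch per-range coefficient tables. */
static float rough_exp2(float x)
{
    float n = floorf(x);
    float f = x - n;                        /* fractional part, 0 <= f < 1 */

    uint32_t bits = (uint32_t)((int32_t)n + 127) << 23;   /* exponent field of 2^n */
    float scale;
    memcpy(&scale, &bits, sizeof scale);

    const float ln2 = 0.69314718f;
    float t = f * ln2;                      /* 2^f = e^(f*ln2) */
    float p = fmaf(t, 1.0f / 6.0f, 0.5f);   /* t/6 + 1/2 */
    p = fmaf(t, p, 1.0f);                   /* t^2/6 + t/2 + 1 */
    p = fmaf(t, p, 1.0f);                   /* cubic Taylor polynomial */

    return scale * p;
}

int main(void)
{
    for (float x = -3.0f; x <= 3.0f; x += 1.5f)
        printf("2^%+.1f ~ %f (libm says %f)\n", x, rough_exp2(x), exp2f(x));
    return 0;
}

Accuracy here is only a few decimal digits, and a real implementation would use a minimax polynomial and handle overflow, NaN and huge exponents properly, but the structure is the point: no dedicated transcendental unit required.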
Re: (Score:2)
I would guess that they would be the same transcendental functions supported by the other shader languages (Cg, GLSL and RenderMan): sine, cosine, tan, asin, acos, atan, sinh, cosh and tanh, if not sincos as well. They are also going to need exp, log, exp2, log2, exp10 and log10. All of these will be required for statistical modeling of texture, 3D animation and image processing. Maybe they won't be vectorized, or maybe it will be possible to treat each 16-element vector as a matrix.
Re: (Score:2)
From the C++ prototype guide [intel.com], which is just the ISA made into a terribly complex C++ wrapper, they support these transcendental functions in the ISA:
EXP2_PS - Exponential Base-2 of Float32 Vector
LOG2_PS - Logarithm Base-2 of Float32 Vector
RECIP_PS - Reciprocal of a Float32 Vector
RSQRT_PS - Reciprocal of the Square Root of a Float32 Vector
They also provide library functions that implement everything else you'd want (sin, cos, etc) in software, I assume using Newton-Raphson iteration.
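For what it's worth (general math, not anything from the guide): the rest reduce to those base-2 ops via the usual identities, e.g. exp(x) = exp2(x * log2(e)), exp10(x) = exp2(x * log2(10)) and ln(x) = log2(x) / log2(e), while sin/cos need range reduction plus a short polynomial, and RECIP/RSQRT give starting guesses that a Newton-Raphson step or two can refine to full precision.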
Excuse the Serenity reference... (Score:5, Funny)
isa (Score:2)
wtf does the international school of amsterdam have to do with this?
.... then a miracle occurs .... (Score:2)
Unless you are interested in a pretty small class of problems, the inherent parallelism of most applications continues to be somewhere in the range 2.1 to 2.5 (i.e., you can speed them up by a little over 2x with the addition of more processors). Thus, in most real-world applications, most of those cores, or vector units, or any other "supercomputer" features will go unused.
If anyone here
Re: (Score:2)
Well, virtualization: it's the driving force behind enterprise adoption of multicore technology today. Companies are eating up all the cores they can get. The appetite is so voracious that memory busses are well and truly stressed. Worse, no one really has any serious technological proposal for solving the memory b/w problem as we get to 16 cores or so.
C//
Re: (Score:2)
Re:Missed it by *that* much (Score:5, Insightful)
However, give the average developer more speed, and all that gets produced is more bloat with less speed. If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (Smart code that actually executes quickly is generally too difficult for the dumb-arsed upper-level (management) programmers to understand, and is thus removed. Believe me, I've seen this happen many times!)
What? Is 15GB that much for a base OS install? (Score:4, Informative)
Your post can be summarized as: Intel Giveth; Microsoft taketh away. That's been the formula for far too long.
And that period is almost over.
Re: (Score:2)
Fast code that doesn't work is not all that useful.
Except in search engines.
Re:Missed it by *that* much (Score:5, Funny)
It appears that this could well improve the speed of lots of different operations. A definite boon for graphics-like operations, but also a lot of DSP (audio/maths) stuff can benefit from these enhancements. It would also appear that general code could easily be sped up; however, compiler writers need to get their collective arses into gear for this to happen.
Yeah, and while they are at it, I hope they finally get around to fixing that damn segfault bug. It's been around for YEARS.
It is more important (Score:5, Insightful)
Re:Missed it by *that* much (Score:5, Insightful)
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.
Re: (Score:2, Informative)
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.
Up to a point; then you've got to make a choice: keep the high-level OOP constructs, or flatten them out to make the compiler's job easier.
THEN you have the next level of optimization: keep the readable code, or do it the "clever" way that nets a 40% boost. And as any experienced coder will tell you, clever code is the antithesis of maintainable.
Re: (Score:2)
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (smart code that actually executes quickly is generally too difficult for the dumb-arsed upper (management)
No, we (management) understand your shifty obfuscated code just fine. It's just that you are too stupid to grasp basic economics.
One week of a developer's time, fully allocated, costs the same as a decent app server. So optimizing code for maintenance is far more cost effective than optimizing for performance.
Re: (Score:2)
compiler writers need to get their collective arses into gear for this to happen
There's a limit to how much general purpose C/C++ code can be sped up automatically; C/C++ semantics just don't allow a lot of optimizations.
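A tiny illustration (my example) of the kind of thing that blocks the optimizer:

/* Because dst and src may point into the same array, the compiler must
   assume *src can change on every iteration: it has to re-load it each
   time and cannot safely vectorize the loop without a runtime overlap check. */
void add_bias(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += *src;        /* *src might alias one of dst[0..n-1] */
}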
Re: (Score:3, Interesting)
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I've worked in a couple of companies like that - usually the programmers were limited to working on technology that the management (ex-programmers) were familiar with. Then also, management didn't want the programmers learning "high-demand skills" (i.e. hardware programming) that would boost the chances of their staff leaving to a bett
Duh (Score:5, Interesting)
That's what libraries, toolsets and custom compilers are for. If the problem was just silicon we'd have Larrabee by now. What's holding up the train is the software toolchain and software licensing issues.
Don't worry, though. On launch day the tools will be mature enough to use, and game vendors will have new ray tracing games that look fabulous on nothing but this.
I'm hoping the tools will be open but that's a long bet. If they are, Microsoft is done as the game platform for the serious gamer and Intel will make billions as they take the entire graphics market. Intel will make hundreds of millions regardless and a bird in the hand is worth two in the bush, so they might partner in a way that limits their upside to limit their downside risk. That would be the safe play. We'll see if they still have the appetite for risk that used to be their signature. I'm hoping they still dare enough to reach for the brass ring.
Re: (Score:2)
Tools would mean dick, and it would take a long time for developers to actually adopt such tools to exploit the architecture.
As an example, take Sun's recent highly multi-threaded CPUs, the T1 and T2. Our benchmark team ran our software (essentially message processing) on those CPUs and it is dog slow - unless you disable the CPU multi-threading. Java, on the other hand, has seen a good boost to performance, because Sun's JIT can already optimize code on the fly for such an architecture.
It is a long road for Larrabee bef
Re: (Score:3, Insightful)
Re: (Score:3, Interesting)
Re: (Score:2)
http://www.winosi.onlinehome.de/Ravi.htm [onlinehome.de]
Re: (Score:2)
Huh, that thinking sunk another of Intel's efforts, the Itanium. It was an architecture that required explicitly parallelized code (by the compiler). After sending the first samples to labs and universities and anybody interested in making a compiler for it, they thought everything would be good.
Except the awesome compilers didn't materialize. Itani
Re:Missed it by *that* much (Score:5, Informative)
If developers are too stupid to code for it, it won't go anywhere. This is sounding a lot like the PS3 architecture in complexity.
There are several problems with PS3 programming that don't apply to Larrabee:
* Non-uniform core architectures. Cell processors have two different instruction set architectures depending on which core your code is intended to run on. This causes quite a bit of confusion and makes the development tools a lot more complex.
* Non-uniform memory access. Most cell processor cores have local memory, and global memory accesses must be transferred to/from this local memory via DMA. Larrabee cores have direct access to main memory via a shared L2 cache.
* Memory size constraints. Most Cell processor cores only have direct access to 256K of memory, so programs running on them have to be very tightly coded and don't have much spare space for scratch usage.
Any application that's reasonably parallelisable is going to be pretty easy to optimize for Larrabee. Most graphics algorithms fit into this category.
Re:Isn't it high time for a 80x86 cleanup? (Score:4, Insightful)
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
Intel has a lot of smart people in their compilers group, and they've done stuff like this before in different times in the past. I wouldn't at all be surprised if they released compiler extensions to allow quick loading of data into the processing vectors.
Re:Isn't it high time for a 80x86 cleanup? (Score:5, Informative)
There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used. A clean break from 80x86 is needed. Legacy 80x86 code can run perfectly well in emulation (and need not be slow, using JIT techniques).
All the legacy junk takes up a pretty small fraction of the area. IIRC on a modern x86 CPU like Core2 or AMD Opteron, it's somewhere around 5%. Most of the core is functional units, register files, and OoO logic. For a simple in-order core like Larrabee the x86 penalty might be somewhat bigger, but OTOH Larrabee has a monster vector unit taking up space as well.
What I like most about Larrabee is the scatter/gather operations. One major problem in vectorized architectures is how to load the vectors with data coming from multiple sources. The Larrabee ISA solves this neatly by allowing vectors to be loaded from different sources in hardware and in parallel, thus making loading/storing vectors a very fast operation.
Yes, I agree. Scatter/gather is one of the main reasons why vector supercomputers do very well on some applications. E.g. scatter/gather allows sparse matrix operations to be vectorized, and allows the CPU to keep a massive number of memory operations in flight at the same time, whereas sparse matrix ops tend to spend their time waiting on memory latency when you have just the usual scalar memory ops.
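As a sketch of what that means in practice (my example, scalar C standing in for what a per-lane gather would do), here is the inner loop of a CSR sparse matrix-vector multiply:

/* y = A*x with A stored in compressed sparse row (CSR) form.  The x[col[j]]
   read is an indexed load from scattered addresses; a vector gather lets a
   16-wide unit fetch all of those in one instruction instead of serializing
   on one scalar load (and its latency) per element. */
void spmv_csr(int nrows, const int *rowptr, const int *col,
              const double *val, const double *x, double *y)
{
    for (int r = 0; r < nrows; r++) {
        double sum = 0.0;
        for (int j = rowptr[r]; j < rowptr[r + 1]; j++)
            sum += val[j] * x[col[j]];     /* the gathered access */
        y[r] = sum;
    }
}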
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
There is the "restrict" keyword in C99 precisely for this reason. It's not in C++, but most compilers support it in one way or another (__restrict, #pragma noalias or whatever). That being said, I'd imagine something like OpenCL would be a more suitable language for programming Larrabee than either C, C++ or Fortran. Functional languages are promising for this, as you say, but it remains to be seen if they manage to break out of their academic ivory towers this time around.
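For example (mine, not from the post), the classic case where restrict makes the difference:

/* With restrict the compiler may assume x and y never overlap, so it can
   reorder loads and stores freely and vectorize the loop without emitting
   a runtime overlap check; without it, every store to y[i] could
   potentially modify some x[j], blocking those transformations. */
void saxpy(int n, float a, const float * restrict x, float * restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}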
Re:Isn't it high time for a 80x86 cleanup? (Score:5, Insightful)
The programming languages that will benefit from Larrabee though will not be C/C++.
Awwwww :-(
It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
Oh. You mean like restrict which has been in the C standard for 10 years?
GCC supports it for C++ too. I'd be surprised if ICC and VS didn't as well.
Re: (Score:2)
Seriously, can someone tell me why my post was a troll? The GP was referring to the lack of a feature that C has had for 10 years. The C99 standard came out in 1999 and had the restrict keyword in it. This allows for optimizations on a par with FORTRAN, since it provides the same guarantees.
I know it's fashionable to hate C++ and C over here these days. Perhaps that's the problem.
Re: (Score:2)
Oh. You mean like restrict which has been in the C standard for 10 years?
Sorry, but "restrict" is not sufficient. Fortran has built in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.
Re: (Score:2)
Sorry, but "restrict" is not sufficient.
For the most part, it is. There are some other minor things, but restrict closes the majority of the performance gap between C and Fortran. Oh yes, and C99 has some truly retarded pedantry wrt. complex arithmetic, so you might need to use some compiler option to get around that.
Fortran has built in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.
If you're thinking of FORALL and the other stuff imported from HPF, well, there'
Re: (Score:2)
There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used.
Rarely used instructions don't need to be optimized - then they take very few transistors to implement. Only heavily used instructions need to be optimized.
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages.
I hope you do understand that Fortran was fast because programs written in it were also simple. Modern programs combine lots and lots of math, memory and I/O operations. You can't easily parallelize that. Even now it can already be parallelized perfectly well in C/C++, yet the resulting software is quite complicated to manage and maintain.
It s
Re: (Score:2)
Re: (Score:3, Insightful)
What I wonder is why they haven't attempted to release two versions: an x86 version, and a stripped down RISC version without the x86 decoder.
If you look at what Intel has been doing recently, the RISC code that x86 is translated into has been slowly evolving. For example, sequences of compare + conditional branch become a single micro-op. Instructions manipulating the stack are often combined or not executed at all. So what is the perfect RISC instruction set today isn't the perfect RISC instruction set tomorrow. And Intel's RISC instruction set would likely be quite different from AMD's.
Re: (Score:2)
I have often wondered if a more orthogonal superset of the existing instruction set with a clean instruction encoding would be better. The JIT compiler would be dead simple since it would only have to translate. AMD apparently had that option when they designed x86-64 but decided full compatibility was more important.
Re: (Score:2)
You could put a hypervisor on a lower level for this functionality, but that brings its own can of worms, including figuring out how to pass communication from hardware devices to the operating systems. For example, which OS should set the proper settings on a USB toaster?
Re:WTF. I do not want moar x86. (Score:4, Interesting)
Isn't this exactly what the Gallium3D + LLVM GLSL compiler is giving you? Heck, even with the simple shader ISAs you probably want an optimizing compiler anyway in order to get good GLSL performance, no?
Wouldn't this actually be a good thing; instead of spending all the time developing new drivers for each generation of hw (changing every 6 months, poorly if at all documented), you could just keep on developing the architecture and improve the x86 backend.
Re: (Score:2)
Will LLVM help with this? AFAIR, Gallium3D already supports rendering using Cell PPUs and Larrabee is going to look like them.
PS: thanks for your work on Gallium3D!
Re: (Score:2)
Seriously, most of the Mesa shader assemblers deal with very limited, simple, straightforward shader ISAs. This is icky. We're gonna need a full-on compiler for this
If you don't need the extra complexity of an x86 core, you can ignore it. Compilers for this system will be just as simple as compilers for current nvidia/ati designs.
Re: (Score:2)
Now if you want real fun, try getting good performance out of R600 and up Radeon cards. Nasty VLIW architecture with all sorts of strange and interesting restrictions
Re: (Score:3, Interesting)
This isn't really x86, in my opinion; it's x86 with a separate set of very obviously graphics-oriented instructions bolted on top. Since getting decent performance will require using the new instructions and a new programming model almost exclusively, what's the point of the x86 bit?
The point is that there's stuff those graphics-oriented instructions are really not very good at, like indirect memory referencing and branching logic, both of which x86 excels at handling. Now, that kind of workload isn't comm
Re: (Score:3, Insightful)
Re: (Score:2, Informative)
nVidia G80 is scalar in the sense it's not VLIW (like ATI is), but it still has 32-wide SIMD. (Likely to go to 16 in next generations). 32*16 is actually 512 bits too.
Doing a truly scalar architecture would have an enormous cost in instruction caches - you'd need to move as many instructions as data around the chip, and that won't be cheap. So SIMD is going to be around for a while.
Nice try, more research next time.
Re: (Score:2)
As I recall, ATI's SIMD cores work on 16 pieces of data at once (16 pixels, for example) whereas the internet information suggests that NVidia's work on 32.
Larrabee really isn't that revolutionary, by the way
Re: (Score:2)
Well, I'm pretty sure that Intel envisions DirectX being the driver for consumers, yes. However they are looking at GPGPU as a threat, and want to own that space lest it take off.
C//