AMD's OpenCL Allows GPU Code To Run On X86 CPUs 176
eldavojohn writes "Two blog posts from AMD are causing a stir in the GPU community. AMD has created and released the industry's first OpenCL development platform for x86 CPUs, which allows developers to code against AMD's GPU compute API (normally usable only on their GPUs) and run it on any x86 CPU. Now, as a developer, you can divide the workload between the two as you see fit instead of having to commit to either the GPU or the CPU. Ars has more details."
Nice (Score:5, Interesting)
Re: (Score:2, Funny)
Good on them. Now how about an API that allows me to run GPU code on the GPU? The day I can play 1080p mkvs from a netbook on AMD/ATI hardware is the day I'll quit buying nvidia.
*Head Explodes*
Re:Nice (Score:5, Informative)
Re:Nice (Score:5, Informative)
AMD/ATI only offers GPU-accelerated decoding and presentation through the XvBA API, which is only available to their enterprise and embedded customers. People seem to always forget that fglrx is for enterprise (FireGL) people first.
Wait for the officially supported open-source radeon drivers to get support for GPU-accelerated decoding, or (God forbid!) contribute some code. In particular, if somebody would write a VDPAU frontend for Gallium3D...
Re:Nice (Score:4, Insightful)
I suppose I could have been clearer. I'm talking about gpu decoding of HD video, conspicuously absent on AMD drivers in Linux, fully functional on NVIDIA.
Fixed that for you. Or does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?
Re:Nice (Score:5, Funny)
does installing Linux somehow magically unsolder the video decoding part of AMD's GPUs?
I'm not going to lie to you; I don't know the answer to that question, and I'm not about to make any assumptions.
Re: (Score:2)
What the heck, this is /. so I can nitpick as much as I want.
The OP you referred to said "decoding of HD video ... absent on AMD hardware in Linux" not "from". There's a difference and it's enough to understand his statement correctly (as he meant it).
Re: (Score:2)
> decoding of HD video ... absent from AMD hardware in Linux
eh? Doesn't make any sense to me.
Re: (Score:2)
oh, now it does... when I put the right emphasis on it and fill in the '...' with the right words (something like 'is'), and group 'from' with 'absent' rather than with 'AMD'.
never mind...
Re: (Score:3, Interesting)
Re: (Score:2)
We've come a long way in most respects, I'll give you that. Hardware-accelerated HD playback on Linux is happening too, but I want it now, see?
When it comes to open source, I'm part of the pragmatist camp. Yeah, I totally prefer to use the stuff that's open, but then it has to be usable. AMD's video hardware is way more open than nvidia's if you believe the reports, yet time and again I'm disappointed by its poor real-world performance. As I implied earlier in this discussion, ATI has already won my heart,
Re: (Score:2)
Re: (Score:3, Insightful)
Damn, you beat me to it!
The problem now is the lack of applications that enable end users to benefit from having a powerful GPU. This will be the case until there's a standard API which works across multiple GPU architectures. Having both CUDA and OpenCL is one too many.
Re: (Score:2, Interesting)
That's hilarious. Maybe you should quit buying nvidia hardware, then.
Maybe I should be a little clearer: you should have quit buying nvidia hardware in September of 2008 [phoronix.com], because hardware acceleration for video on Linux has been available since then, with the official AMD/ATI driver.
Re: (Score:2)
XvBA isn't yet usable by end-users on Linux
The API for XvBA isn't published yet and we are not sure whether it will be due to legal issues. We're told by a credible source though that X-Video Bitstream Acceleration wouldn't be much of a challenge to reverse-engineer by the open-source community.
Interesting, but not yet useful (unless you're able to reverse-engineer this type of code, and I'm not). I'm still looking forward to the day when ATI hardware is a viable alternative on Linux.
Re:Nice (Score:4, Interesting)
Look back about a year: since AMD opened up specs & docs, the radeon drivers have become very usable for everyday stuff (maybe not HD video, compiz, or games), but their stability blows any proprietary driver I have ever used (nvidia or fglrx) right out of the water.
For years linux users/developers have been claiming that we don't want drivers, we just want open specs (without NDAs) and "we" would do the hard work. Well, AMD has opened the specs, but it turns out that when I say "we" I mean just the two guys who can be bothered. Fortunately these guys are pretty fucking awesome, so development is coming along smoothly, but it still lags behind what proprietary drivers offer (in terms of performance anyway). Perhaps radeon does not meet your needs, but it is definitely a viable alternative to nvidia for many uses!
Re: (Score:2)
Re: (Score:2)
If the lock has no key, then it cannot be locked.
Problem solved! :D
Isn't there a fundamental problem... (Score:2)
In that memory on the card is faster for the GPU, and system memory is faster for the CPU. Like, I know PCI Express speeds things up, but is it so fast that you don't have to worry about the bottleneck of the system bus?
Re: (Score:2)
The GPU is there, so now let's make it useful as often as possible. And if there is no GPU but there are two CPUs, then with OpenCL we can use the two CPUs instead.
Re: (Score:2, Interesting)
IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.
(Bus) speed isn't an issue as creating a CPU or GPU context requires a specific creation flag, so one would know what the target platform is.
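For illustration, a minimal host-side sketch of that creation flag (not AMD sample code; error handling omitted): the only thing that changes when you retarget from GPU to CPU is the device type passed to clGetDeviceIDs.
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);

        /* Ask for a GPU; fall back to the x86 cores if none is available. */
        err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        if (err != CL_SUCCESS)
            err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        printf("context: %s\n", err == CL_SUCCESS ? "ok" : "failed");
        if (ctx)
            clReleaseContext(ctx);
        return 0;
    }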
Re: (Score:3, Interesting)
IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.
It's still pretty early to say, though Apple provides an API for this with Snow Leopard. I don't know if OpenAL is a bad comparison or not, but as someone who does audio coding, I think OpenAL is the biggest joke of an API yet devised by man. OpenAL has little support because it's an awful and useless set of resources and features.
Re:Isn't there a fundamental problem... (Score:5, Interesting)
My main issue with OpenAL is that it is completely based around the concept of a "listener" interacting with sounds in "space." In other words, it's the OpenGL semantic applied to sound. I looked into it originally because I wanted something more system-independent than Apple's CoreAudio, but really OpenAL is just a videogame language, and it's focused completely around choreographing sounds for interactive emulation of space. OpenAL is hell if you want to apply subjective effects aside from its pre-cooked spatial repertory, or even do something simple like build a mixer with busses.
In my line, film post-production, the users really don't want to control the "direction" and "distance" of a sound; they want to control the pan and reverb send of a sound. The language and the model are simply too high-level for people who are used to setting their own EQ poles and their own pitch-shifts for Doppler. Most of the models OpenAL uses to create distance and direction sensations are pretty subjective, arbitrary, and not really based on current psychoacoustic modelling. It works to an extent, but it doesn't give a sound designer, of a videogame or anything else, the level of control over the environment they generally expect. It certainly doesn't give a videogame sound designer the level of control over presentation that OpenGL gives the modeller or shader developer.
Oh, and OpenAL doesn't support 96 kHz, 24-bit audio, or 5.1 surround.
I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient. Unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms, on everything from calendar software to animated movies, OpenAL is strictly for videogames only.
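For readers who haven't used it, a rough sketch of the model being criticized (hypothetical helper; assumes an AL device/context already exist and the buffer is already filled): everything is a "source" positioned relative to a "listener", and pan and level fall out of the coordinates rather than being set directly.
    #include <AL/al.h>

    void play_positioned(ALuint buffer)
    {
        ALuint src;
        alGenSources(1, &src);
        alSourcei(src, AL_BUFFER, buffer);

        /* No pan knob, no reverb send: you place things in space and the
         * implementation decides how that is rendered. */
        alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);
        alSource3f(src, AL_POSITION, 2.0f, 0.0f, -1.0f);

        alSourcePlay(src);
    }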
Re:Isn't there a fundamental problem... (Score:4, Insightful)
I admit I am not their target audience, and I can see how OpenAL is sufficient for videogame developers, but it really is nothing more than sufficient. Unlike OpenGL, which is universal enough that it can be used in system and productivity software, on computers, phones, and in renderfarms, on everything from calendar software to animated movies, OpenAL is strictly for videogames only.
Um, yeah. I have only used it sparingly, but it has always been my understanding that OpenAL was a library for doing spatial audio, in particular for 3D games. I never got the impression that it was supposed to just be an arbitrary audio api. I never got the impression that it was supposed to be for anyone who wasn't specifically interested in spatial audio.
I mean there are plenty of other cross-platform sound libraries.
Is OpenAL seriously advertising itself as a general-purpose sound library akin to OpenGL these days? Is it suffering from feature/scope creep? Or is this just a case of picking the wrong tool for the job based on an understandable confusion regarding the OpenFoo nomenclature?
Re: (Score:2, Interesting)
As a related aside to this, how long before GPUs include a form of audio processing as well? We want to offload radiosity effects to video cards. GPGPU is one way, although specialized support for this that utilizes the graphics card's inherent knowledge of object positioning might be somewhat preferable.
At the same time it might be beneficial to consider a similar, but slightly more general problem. Radiosity utilizes reflectivity "textures" to calculate final light levels. One could easily imagine applyi
Re: (Score:2)
Did you guys actually talk to any sound designers when you designed this spec? There are so many other things you could have done, but instead you chased the chimera of "OpenGL for sound" or rather "a sound design API for people who hate sound design."
5.1 being implementation-defined is unacceptable. The signal presented on spea
Re: (Score:2)
IMO, the fundamental problem with OpenCL is the same as with OpenAL, which is that Operating System vendors don't provide a standard implementation as is done with OpenGL.
(Bus) speed isn't an issue as creating a CPU or GPU context requires a specific creation flag, so one would know what the target platform is.
http://www.khronos.org/registry/cl/ [khronos.org]
Embrace and extend. So far I'm seeing C/C++ APIs and of course Apple extends their own with ObjC APIs.
What's stopping you from using the C APIs?
The Core Spec is akin to the OpenGL spec. The custom extensions for Intel, Nvidia and AMD will be based upon the design decisions they implement in their GPUs.
However, the CPU specs for Intel and AMD are there to leverage with OpenCL.
What else do you want?
Re: (Score:3, Informative)
So, you store the data the GPU is working on in the card's memory, and the data the CPU is working on in system memory.
Yes, it is relatively slow to move data between the two, but not so much that the one-time latency incurred will eliminate the benefits.
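As a sketch of that "pay the transfer once" pattern (hypothetical helper, assuming an existing OpenCL context and command queue; error handling omitted):
    #include <CL/cl.h>

    void crunch_on_gpu(cl_context ctx, cl_command_queue q,
                       const float *host_in, float *host_out, size_t n)
    {
        cl_int err;
        /* Buffer lives in card memory for the whole computation. */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, &err);

        /* One host-to-device copy up front... */
        clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host_in, 0, NULL, NULL);

        /* ...any number of kernel launches that keep the data on the card go here... */

        /* ...and one device-to-host copy at the end. */
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host_out, 0, NULL, NULL);
        clReleaseMemObject(buf);
    }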
Re: (Score:3, Interesting)
I've found that an algorithm of O(n^3) or less should be run on the CPU. The overhead of moving data to GPU memory is just too high. Gen2 PCIe is faster, but that just means I do #pragma omp parallel for and set the number of threads to 2.
The comparisons of GPU and CPU code are not fair. They talk about highly optimised code for the GPU but totally neglect the CPU code (just -O2 with the gcc compiler and that's it). On an E5430 Xeon, with the Intel compiler and well-written code, anything O(n^3) or less is faster.
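For context, the CPU fallback being described is just a plain OpenMP loop along these lines (a hypothetical example, not the poster's code):
    #include <omp.h>

    void scale(float *out, const float *in, float alpha, int n)
    {
        /* Keep small problems on the CPU and split the loop across two cores. */
        #pragma omp parallel for num_threads(2)
        for (int i = 0; i < n; i++)
            out[i] = alpha * in[i];
    }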
Re: (Score:3, Informative)
Not at all absurd. I realise that the GPU is a compute workhorse. That's not the issue; it is the data transfer rate to and from the card. Transferring 3 GiB takes quite a while. Pulling the results back takes a while also. That's what kills it. The CPU can get the work done in that time.
I'm using the CUDA BLAS routines, examples from the SDK, and those published as 'glorious almighty' codes. Everything the card does is timed, since it all counts toward time to solution.
Re: (Score:2)
I'd argue it is absurd; it's way too simplistic.
Let's say you have an O(n) algorithm and a quad-core CPU that can handle 4 billion instructions per second per core (16 billion IPS total for the CPU), on average, and the algorithm is highly scalable.
Now, let's say the number of instructions per input is 1 million. That means 16 thousand inputs take 1 second.
Now, with a GPU, you might have 128 effective cores, each of which can handle 500 million instructions per second, and each unit requires 2 billion ins
Re: (Score:2)
sorry for the reply to my own post, in the GPU section, I stated the units taking 2 billion instructions instead of 1, it should read 2 million instructions instead of 1.
Re: (Score:2)
I apologize. I have a lot of typos and slips of the finger. If that offends you, you might want to get off the internet and go outside for a while.
Re: (Score:3, Interesting)
Unless of course you have a device (like newer macbooks) with nvidia's mobile chipset, which shares system memory and can therefore take advantage of Zero-copy access [nvidia.com], in which case there is no transfer penalty because there is no transfer. A limited case, but useful for sure.
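Roughly, the zero-copy setup works like this in the CUDA runtime API (a sketch; error handling omitted): the host buffer is pinned and mapped into the GPU's address space, so kernels can read it without an explicit cudaMemcpy.
    #include <cuda_runtime.h>

    float *alloc_zero_copy(size_t n, float **dev_ptr)
    {
        float *host_ptr;
        cudaSetDeviceFlags(cudaDeviceMapHost);              /* before any other CUDA call */
        cudaHostAlloc((void **)&host_ptr, n * sizeof(float), cudaHostAllocMapped);
        cudaHostGetDevicePointer((void **)dev_ptr, host_ptr, 0);
        return host_ptr;   /* *dev_ptr can be passed to a kernel; no copy needed */
    }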
Re:Isn't there a fundamental problem... (Score:5, Interesting)
If your concern is that shipping object code to a card to be processed may end up being so time-consuming that it would not be worth it, then I'd say that most examples of this kind of processing I've seen are doing some specific, highly scalable task (e.g. MD5 hashing, portions of H.264 decode). So clearly you have to do a cost/benefit analysis like you would with any type of parallelization. That said, the cost of shipping code to the card is pretty small, so I would expect any reasonably repetitive task would afford some improvement. You're probably more worried about how well the code can be parallelized than about the transfer cost.
Re: (Score:2)
If your concern is that shipping object code to a card to be processed may end up being so time-consuming that it would not be worth it
Not so much the code as the data. If you have a giant array of stuff to crunch, then yeah, shipping it to the card makes sense. But if you have a lot of tiny chunks of data, then it may not make as much sense to ship it all over to the card. That same problem is really what haunts multicore designs as well - it's like you can build a job scheduler that takes a list of jobs a
Re: (Score:2)
Unless of course we're talking about a bunch of chunks that are not going to be worked on simultaneously, which goes back to my statement about the degree o
Re: (Score:2)
I'm guessing we'll soon get with GPUs what happened with FPUs. Remember FPUs? Maths co-processors? The 80387? A separate chip that handled floating point ops because the CPU didn't have those in its instruction set, eventually merged into the main CPU chip. GPUs: initially on a separate card, but requiring an increasingly faster bus (GPUs have driven the development of high-speed buses), now often on the mainboard (true, not top-of-the-line chips yet, but I suspect that has a lot to do with marketing rather than
The real benefit (Score:5, Insightful)
Re:The real benefit (Score:5, Funny)
Yes, that's the solution. Have your code run on any system, all too willing to be duped by street vendors, and blissfully unaware of the nefarious intentions of the guy waving candy from the back of the BUS.
Oh... you meant running code natively... I see.
Intel counters with CPU+GPU on a chip (Score:5, Interesting)
Ironically, Intel announced that they are going to stop outsourcing the GPUs in Atom processors and include the GPU + CPU in one package, yet nobody knows what happened to the dual-core Atom N270...
Re:Intel counters with CPU+GPU on a chip (Score:4, Insightful)
Microsoft wouldn't allow licensing dual cores on netbooks.
Re: (Score:2)
As far as I can tell, that only applies to Windows XP.
See this article [overclockers.com] (which, admittedly, is talking about a "nettop" box, not a netbook):
Got anything which specifically states that OSes other than XP (which they've been trying to drop support for for some time now) are restricted regarding dual cores?
Re: (Score:3, Interesting)
Re: (Score:2)
Makes sense (Score:4, Interesting)
Things have been slowly moving in this direction already, since game makers have not been using available CPU horsepower very effectively. A little z-buffer magic and there is no reason why the object space couldn't be separated into completely independent processing streams.
-Matt
Re: (Score:2)
How do you handle translucency when you have a Z buffer?
Use both at the same time? (Score:2, Interesting)
I haven't read too much of OpenCL (just a few whitepapers and tutorials), but does anybody know if you can use both the GPU and the CPU at the same time for the same kind of task? For example, for a single "kernel" that I want run 100 times, can I send 4 instances to the quad-core CPU and the rest to the GPU? If so, this would be a big win for AMD.
Re: (Score:2)
I am pretty sure these are details for the implementation of OpenCL, not for client code. It is the very reason why libraries such as OpenGL/CL/AL/etc. exist: so you don't have to worry about implementation details in your code.
From what I know of the spec, you would just create your kernel, feed it data, and execute it, the implementation will worry about sharing the work between the CPU and GPU to get optimal performance.
However, I don't think it would be optimal to have all 4 cores of the CPU running on
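A rough host-side sketch of how such a split could look (assuming the platform exposes both a CPU and a GPU device, which is implementation-dependent; error handling omitted): one context, two command queues, and the host hands each queue its own slice of the work.
    #include <CL/cl.h>

    void make_queues(cl_context *ctx, cl_command_queue *cpu_q, cl_command_queue *gpu_q)
    {
        cl_platform_id platform;
        cl_device_id dev[2];
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev[0], NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &dev[1], NULL);

        *ctx   = clCreateContext(NULL, 2, dev, NULL, NULL, &err);
        *cpu_q = clCreateCommandQueue(*ctx, dev[0], 0, &err);
        *gpu_q = clCreateCommandQueue(*ctx, dev[1], 0, &err);

        /* The host then enqueues the same kernel on both queues, giving each
         * device its own slice of the input (e.g. 4 work-items to the CPU
         * queue and the remaining 96 to the GPU queue). */
    }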
Overhyped (Score:5, Informative)
Having a separate compiler that doesn't integrate cleanly with the rest of your toolchain (i.e. uses a different intermediate representation preventing cross-module optimisations between C code and OpenCL) and doesn't integrate with the driver stack is very boring.
Oh, and the press release appears to be a lie:
AMD is the first to deliver a beta release of an OpenCL software development platform for x86-based CPUs
Somewhat surprising, given that OS X 10.6 betas have included an OpenCL SDK for x86 CPUs for several months prior to the date of the press release. Possibly they meant public beta.
Re: (Score:2)
Compiling OpenCL code as x86 is potentially interesting. There are two ways that make sense. One is as a front-end to your existing compiler toolchain (e.g. GCC or LLVM) so that you can write parts of your code in OpenCL and have them compiled to SSE (or whatever) code and inlined in the calling code on platforms without a programmable GPU. With this approach, you'd include both the OpenCL bytecode (which is JIT-compiled to the GPU's native instruction set by the driver) and the native binary and load the CPU-based version if OpenCL is not available. The other is in the driver stack, where something like Gallium (which has an OpenCL state tracker under development) will fall back to compiling to native CPU code if the GPU can't support the OpenCL program directly.
Having a separate compiler that doesn't integrate cleanly with the rest of your toolchain (i.e. uses a different intermediate representation preventing cross-module optimisations between C code and OpenCL) and doesn't integrate with the driver stack is very boring.
Oh, and the press release appears to be a lie:
AMD is the first to deliver a beta release of an OpenCL software development platform for x86-based CPUs
Somewhat surprising, given that OS X 10.6 betas have included an OpenCL SDK for x86 CPUs for several months prior to the date of the press release. Possibly they meant public beta.
I assume so. OpenCL for ATI cards is heaven-sent, since ATI seems to be getting nowhere with their custom shader language solutions, unlike NVidia, which has made heavy inroads with CUDA on the video codec front.
I am rather sick of having a powerhouse which rivals the best nvidia cards and yet all the codecs use CUDA for video coding acceleration!
Re: (Score:2)
Re: (Score:3, Informative)
Source: http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx [amd.com]
Being able to target both Windows and Linux is something outside Apple's platform scope.
GPUs are dying - the cycle continues (Score:3, Insightful)
Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing. I think that within a few years, we'll start seeing games that don't require a high-end graphics card- they'll just use a couple of the cores on your CPU. It makes sense, and is actually a good thing. Fewer discrete chips is better, as far as power consumption and heat, ease-of-programming and compatibility are concerned.
Re:GPUs are dying - the cycle continues (Score:4, Insightful)
A dedicated graphics processor will be faster than a general purpose processor. Yes, you could use an 8 core CPU for graphics, or you could use a 4 year old VGA. Guess which one is cheaper.
Re:GPUs are dying - the cycle continues (Score:4, Insightful)
Hey, my nVidia 9800GTX+ has over 120 processing cores of one form or another in one package..
Show me an Intel offering or AMD offering in the CPU market with similar numbers of cores in one package.
Re: (Score:2)
Technology fail.
Re: (Score:2)
the "same thing" basically means that if you have a branch in your code and some threads go one way, and some go the other way then those two are run sequentially.
the total time to run
code:
A
if B then X else Y
C
is a+b+x+y+c if all the threads don't take the same branch.
Re:GPUs are dying - the cycle continues (Score:4, Interesting)
For some games that'll be true, but I think it'll be a long time, if ever, before we see a CPU that can compete with a high-end GPU, especially as the bar gets higher and higher - e.g. physics simulation, ray tracing...
Note that a GPU core/thread processor is way simpler than a general-purpose CPU core, so MANY more can fit on a die. Compare an x86 chip with maybe 4 cores to something like an NVidia Tesla (CUDA) card, which starts with 128 thread processors and goes up to 960(!) in a 1U format card! I think there'll always be that factor of 10-100 more cores in a high-end GPU vs. a CPU, and for apps that need that degree of parallelism/power the CPU will not be a substitute.
Re: (Score:2)
Not any time soon (Score:5, Insightful)
I agree that the eventual goal is everything on the CPU. After all, that is the great thing about a computer. You do everything in software, you don't need dedicated devices for each feature, you just need software. However, even as powerful as CPUs are, they are WAY behind what is needed to get the kind of graphics we do out of a GPU. At this point in time, dedicated hardware is still far ahead of what you can do with a CPU. So it is coming, but probably not for 10+ years.
Re: (Score:2)
Re: (Score:2)
Simplicity and size. The fewer components we need, and the smaller they can be, the better. Ultimately, if programmers didn't NEED to split up their code to run on different processors, they wouldn't, because it just makes life harder. Having one chip that handles everything makes that possible, and having an API that brings us closer to a place where that makes intuitive sense is a logical progression toward that end.
Re: (Score:3, Informative)
There's only two ways to do that:
Of course, you're reading
Re: (Score:2, Funny)
And so, the wheel [catb.org] starts another turn.
Funny, cus this is about GPU ascendency. (Score:2)
Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing. I think that within a few years, we'll start seeing games that don't require a high-end graphics card- they'll just use a couple of the cores on your CPU.
LOL. That's funny, because this is about exactly the opposite -- using the very impressive floating point number crunching power of the GPU to do the work that the CPU used to do. OpenCL is essentially an API for being
Who modded this insightful? (Score:2)
If history tells us anything, it's quite the opposite. For years, graphics cards have been getting more and more cores and applications (especially games or anything 3D) have come to rely on them much more than the CPU. I remember playing Half-life 2 with a 5 year old processor and a new graphics card...and it worked pretty well.
The CPU folk, meanwhile, are being pretty useless. CPUs haven't gotten much faster in the past 5 years; they just add more cores. Which is fine from the perspective of a multipr
Re: (Score:2)
Whatever.
My system was an Athlon 2200+ with 2 GB RAM and a GeForce 6800. HL2 ran just fine -- albeit not at max res/max detail. As another anecdote, even upgrading my brother's old box (Athlon64 3000+) from a GF5500 to a GF7200 yielded tremendous performance gains.
Re: (Score:2)
Except that GPU architecture is pretty different from that of a CPU. IANAE(xpert), but from what I understand the GPU is very, very, parallel compared to a CPU thanks to how easily parallelized most graphics problems are. Though CPUs are gaining more cores, I think that the difficulty in parallelizing many problems places a practical limit on the CPU's parallelism.
That's not to say though that a GPU-type parallel core can't be integrated into the CPU package, however. I believe NVIDIA is doing some of th
Re: (Score:2)
Since when has NVidia sold CPU's?
Intel and AMD are doing this, and NVidia is going to be left in the dust. Why do you think they are shifting some of their focus to ultra-high end parallel processing tasks? NVidia is slowly moving away from the desktop market, or at least are building a safety net in case they get pushed out of it. Who knows, maybe they'll team up with VIA to produce a third alternative to the CPU/GPU combo.
Not exactly (Score:2)
For many problems, multi-core CPUs aren't even close to having enough power; that's why there's all the interest in utilizing GPU processing power.
They are different ends of a spectrum: CPU generally=fast serial processing, GPU generally=slow serial, fast parallel. Some problems require fast serial processing, some require fast parallel processing and some are in between. Both are valuable tools and neither will replace t
Re: (Score:2)
"Now that we have CPUs with literally more cores than we know what to do with, it makes sense to use those cores for graphics processing."
This comment is always trotted out by people who have no clue about hardware.
CPUs doing graphics are bandwidth-limited by main memory (not to mention general architecture). Graphics requires insane bandwidth. GPUs have had way more memory bandwidth than modern CPUs for a long time. There is simply no way CPUs will ever catch up to GPUs because the GP
Re: (Score:2)
It would be like going back to the era of early DOS game programming where you just had the framebuffer, a sound function (sound), two keyboard input functions (getch/kbhit), and everyone wrote their own rendering code.
What's the story? (Score:3, Informative)
UniversCL (Score:2, Interesting)
Re: (Score:2)
... You were doomed to fail for multiple reasons. 'nearly done supporting the CPU and the Cell'. ... which CPU? ARM, x86, SPARC, PPC? Are you ignoring all the other implementations that already support OCL on x86?
If this comes as a blow to you, you didn't do any research before you started and I find it really hard to believe you haven't come across the other existing implementations in your research for your own project.
Re: (Score:2)
Firstly, "in theory, practice and theory are the same, in practice they're not." You'll learn from the process of implementing it, and if you provide your code (and a reasonable number of comments), what you've leaned will be available to other people who read your code.
Secondly, with the right licence, e.g. GPL, corporations won't (or shouldn't be able to) steal your code. If they do, you have legal grounds to sue them.
I don't know the architecture of the ATI/Nvidi
Re: (Score:2)
Why do you feel like you have to compete? Unless you went in expecting a profit, which is unlikely given the open nature of the project, you are contributing to the progress of humanity.
Now, given a proper license, your product will likely be used by a few, and maybe even included in the PS3 and other Cell-based systems, spreading your name far and wide. So look at this as an advertising opportunity. If you release your project soon, you could put on your resume that you were among the first OpenCL impl
Unsurprising (Score:2)
AMD obviously has a vested interest in making their scheme an industry standard, so of course they'd want to support Larrabee with their GPGPU stuff. Larrabee has x86 lineage (of some sort, I'm not clear on exactly what or how), so they'd have to have at least some x86 support to be able to use their scheme on Larrabee. It seems to me that if they were going to bake some x86 support in there, they may as well add regular CPUs in as well (if you already wrote 90% of it, why not write the other 10%?).
I don'
Bah. The Amiga did it already. (Score:2)
Exactly the same thing.
I said EXACTLY!
[wanders off, muttering and picking bugs out of beard]
Undermining Larrabee? (Score:2)
(Ars makes a similar point:)
the fact that Larrabee runs x86 will be irrelevant; so Intel had better be able to scale up Larrabee's performance
If AMD is working on an abstraction layer that lets OpenCL run on x86, could the reverse be in the works, having x86 code ported to run on CPU+GPGPU as one combined processing resource? AMD may be trying to make its GPUs more like what Intel is trying to achi
Re:Optimization (Score:5, Funny)
Why would anyone ever want to do something well when they can fail at several things?
Re: (Score:2, Insightful)
Re:Optimization (Score:5, Insightful)
So now programmers can write code that will work on either processor and will be optimized on neither. Brilliant. I'm sure this is somehow a great step forward.
-sigh-
Um, what? How does the existence of a compiler that generates x86 code prevent the existence of an optimizing compiler that generates GPU instructions?
Re: (Score:2)
These types of changes aren't really optimizations the compile
Re: (Score:2)
Those types of change aren't all that radical, even though they're not commonly implemented in compilers at the moment, as far as I know.
You're not describing major algorithm changes, just reorganising data to suit different batching requirements, reorganising loops and so on.
Reorganising loops is decades old already.
Re: (Score:2)
It was already explained above. CPU and GPU are very different at handling things, meaning that top level algorithms used are very different.
Unless of course you can point at a compiler which can rethink and rewrite the program.
Re: (Score:2)
"Unless of course you can point at a compiler which can rethink and rewrite the program."
That's exactly what Lisp was invented for.
Pity we abandoned it in the 1980s and left it half-built.
Re: (Score:2)
Yeah, it's amazing how things that can generate executables on multiple platforms, things like C, are so amazingly slow.
Man, why did we ever stop using assembly?
Re: (Score:2)
For the kind of really high performance stuff OpenCL is targeted to, we didn't. Look at the low level code in GnuMP, for instance.
Re: (Score:2, Interesting)
Re:Optimization (Score:5, Insightful)
Re:Optimization (Score:5, Funny)
The SX is for Sux!
Re: (Score:2)
Re:DIDN'T APPLE COME UP WITH THIS ABOUT A YEAR AGO (Score:2, Informative)
Ok, I'll feed the troll (this time)
Anyway, Apple was one of the companies that first came up with the OpenCL standard. Apple worked with Khronos to make it a full standard. AMD is one of the first to publicly release a full implementation of OpenCL which is why this is big news.
Re: (Score:2)
This idea isn't new. CUDA allows you to execute your GPU code on the CPU. This is just AMD implementing OpenCL, which AFAIK is sufficiently new that no one else has done this yet. I would have expected it to be another couple of months before we really saw NVIDIA and AMD start pushing OpenCL, when they release new hardware. Obviously they're working on it already; it's just a matter of when anyone can do anything with it.
Re: (Score:3, Interesting)
I wouldn't be so sure on nVidia. They appear to think CUDA is a better system, and from what I've heard and seen, they're right. OpenCL appears to be more limited in scope and harder to optimize, partially due to OpenCL being written as a spec for abstract, heterogeneous hardware, while CUDA was written with the 8000+ series nVidia cards in mind. They'll probably eventually implement OpenCL, but I suspect it will take a back seat to CUDA.
OpenCL has advantages in larger systems (e.g. supercomputers built fr
Re: (Score:2)
CUDA's focus on the GPU often means the GPU does more work than an OpenCL program using both GPU and one or two CPU cores.
Do you have evidence for this statement? Code that you can share?
Re: (Score:2)
CUDA is the GLIDE of the GP-GPU movement. In the short term it may be highly attractive due to features, completeness, optimization, and so forth, and you'll see applications using it for this reason. In the long run it's a dead-end. Just like with rendering APIs, the winners will be one or both of the following: The open and cross-platform API, or the one Microsoft is creating.
Re: (Score:2)
So, where can one obtain an open source OpenCL compiler? (Or, to be more precise, an open source compiler which can take OpenCL compliant code and produce object code that will run on my GPU via the driver stack?)
The Gallium3D architecture, which is likely to be the driver architecture for 3D drivers for open source operating systems for the next few years, compiles a bytecode that is a bit lower-level than OpenCL to native GPU code. Gallium has a pluggable architecture that allows different front ends to be plugged in and an OpenCL state tracker (the part that handles API-specific semantics) is under development and should appear in the next version.
There is also a project to write an OpenCL front-end for LLVM.