Transcoding in 1/5 the Time with Help from the GPU
mikemuch writes "ExtremeTech's Jason Cross got a lead about a technology ATI is developing called Avivo Transcode that will use ATI graphics cards to cut down the time it takes to transcode video by a factor of five. It's part of the general-purpose computation on GPU movement. The Avivo Transcode software can only work with ATI's latest 1000-series GPUs, and the company is working on profiles that will allow, for example, transcoding DVDs for Sony's PSP."
Re:This would be great for MythTV.. Linux support? (Score:3, Informative)
Pick up a 5200FX card (for SVIDEO/DVI output) and then use the GPU to do audio and video transcode. I have been thinking about audio (MP3) transcode as a first "trial" application.
"Heftier" GPUs may be used to assist in video transcode -- but it strikes me that the choice of stream programming system is most important (to allow code to move to other GPUs, driver permitting). I think that nVidia also supports developers using the GPU (there are comments and test results generated by nVidia available on the 'web). So far, not much from ATI, so I think nVidia gets the nod...
Ratboy.
Re:I'm rarely impressed... (Score:4, Informative)
Re:GPU advantages over CPU? (Score:5, Informative)
But don't take that out of context. Ask a GPU to compile the linux kernel [which is possible] and an AMD64 will spank it something nasty. *GENERAL* purpose processors are slower at these very dedicated tasks but at the same time capable of doing quite a bit with reasonable performance.
By the same token, a custom circuit can compute AES in 11 cycles [1 if pipelined] at 300MHz, which when you scale to 2.2GHz [for your typical AMD64] amounts to ~80 cycles. AES on the AMD64 takes 260 cycles. But, ask that circuit to compute SHA-1 and it can't. Or ask it to render a dialog box, etc...
Tom
lessons of "array processors" from 1980s (Score:4, Informative)
All these companies died mainly because the commodity computer makers could pump out new generations about three times faster and eventually catch up. And the general-purpose software was always easier to maintain than the special-purpose software. Perhaps graphics card software will buck this trend because it's a much larger market than specialty scientific computing. The NVIDIAs and ATIs can ship new hardware generations as fast as the Intels and AMDs.
Done in Roxio Easy Media Creator 8 (Score:2, Informative)
Re:Will all x1000 cards do this? (Score:2, Informative)
From the article (second page):
"The application only works with X1000 series graphics cards, and it only ever will. That's the only architecture with the necessary features to do GPU-accelerated video transcoding well."
Apple's core image (Score:4, Informative)
http://www.apple.com/macosx/features/coreimage/ [apple.com]
Re:This would be great for MythTV.. Linux support? (Score:3, Informative)
Linux Support (Score:3, Informative)
Re:GPU advantages over CPU? (Score:2, Informative)
Why, yes it would. GPUs fill a different niche. For one thing, about 90% of the transistors on a GPU are used for processing, versus only about 60% on a CPU (the rest is used for caching).
There are also many more transistors in GPUs these days than in CPUs. Graphics processing is inherently parallel and streamed, and that's what a GPU does very well, very fast: grab 8 texture samples simultaneously each clock cycle, then have the next stage linearly blend those floating-point values together in one clock. A CPU would have to work on each of those 8 one at a time.
For parallel, streamed operations, a GPU can speed up a process by 5 or 10 times, like this example. At SIGGRAPH this summer, there was a session on running a ray tracer on a GPU. After 15 minutes of explaining all of the optimizations they performed, the presenters were happy to report that they were only 5x slower than a CPU implementation. Ray tracing is not a very parallel, streamed operation. Rays can bounce 10 times or maybe not at all.
So, let's review: GPUs are significantly faster than a CPU for graphics and some streamable parallelizable processes. CPUs are great for branchable, more random processing.
Re:Already available.. (Score:3, Informative)
FPGA's cheap. Synthesis EXPENSIVE. (Score:3, Informative)
- Tools: FPGA tools are getting better, but still suck compared to modern IDEs and software development. This might be me being jaded (VHDL can get nasty); things like SystemC are in their infancy, so there's a long way to go here.
- Synthesis time: It can take DAYS on a very fast machine to run the synthesis that produces your design for the FPGA. Some designs work out to be impossible to synthesise; you might not find this out until hours (or worse) into the process. Then your whole design might have to change! Ha ha ha.
- Tool expense: The good tools cost a lot of money. The ones that can do good designs on the fly cost on the order of a new Ferrari or worse. Engineers who are familiar with optimizing and implementing these tools and designs cost a lot too, but sadly, don't get to drive too many Ferraris. (Me, anyway!)
- CPUs and GPUs are heavily optimized and VERY VERY VERY fast for most tasks. In many cases it is cheaper to go buy a farm than implement on a FPGA, unless you are trying to do something very specialized. FPGAs are more often used for specialty communications brokering, timing, and interfacing tasks where bus speeds on a micro are too low.
Great idea in principle. Wouldn't hold my breath, however.
Re:funny about memory comments (Score:1, Informative)
Re:What I want to see. (Score:2, Informative)
Re:Already available.. (Score:3, Informative)
You haven't specified which FPGAs you're talking about, but at those prices, you should be getting more like 6 million gates or so (e.g. an XC2V6000 goes for about $4000). Perhaps you're looking at something like a Virtex-4 FX? If so, you should be aware that what you're looking at not only includes FPGA gates, but also includes a PowerPC core (or perhaps 2) as well.
It depends a bit on what you mean by high capacity and high speed. At around $100US, you can get a 1.5 million gate Spartan 3, or a somewhat smaller Virtex (which will generally run a bit faster).
These obviously aren't the biggest or fastest FPGAs available, but for the right kind of job, they'll still blow away a general purpose CPU pretty easily.
As far as ASICs vs. FPGAs goes, it's really not a contest: ASICs are fast, but have specific purposes. FPGAs are slower, but can be programmed. Given the idea originally stated in this thread, ASICs simply don't seem (to me) like contenders at all.
--
The universe is a figment of its own imagination.
Re:Already available.. (Score:3, Informative)
Ummm... Okay, here is a quote from the original post again... by LWATCDR:
And then a quote from tomstdenis:
tom then goes on to talk about PPC versus FPGAs, as if LWATCDR weren't talking about ASICs. Since this conversation now involves so many people, I hope I've quoted clearly enough. Anyhow, the explanation that an FPGA can be faster than a general-purpose CPU was correct, but a complete non sequitur from LWATCDR's point that ASICs are faster than FPGAs.
I do agree that the basic question of general-purpose CPU vs. other is relevant to the article. I couldn't quite bring myself to claim he was off-topic, just a non sequitur. Now, to get to something interesting you said, rather than just picking nits about who said what...
photon37:
I like the basic idea, but I'm not so sure how well dynamically sharing a function between an FPGA and a CPU would work in practice. In theory, it would be like a thread migrating from one normal CPU to another.
Since FPGAs have significant latency when they are reconfigured, I have to think that you wouldn't really want the kernel to be dynamically deciding which app gets FPGA time and which gets CPU time. I think a better interface would be for the programmer to manually write an optimisation for an FPGA. This eliminates the need to have automatically generated matching functions in hardware and software. The programmer can decide that the specific functions X, Y, and Z should be able to run on an FPGA if it is available. Whether the FPGA programming code comes from a C compiler or from hand-coded FPGA-specific stuff doesn't matter. There should be some standard interchange format for the FPGA data. gcc should be able to take some C code and output FPGA intermediate programming data from it.
Then, at run time, an app can do something like:
register_fpga(num_gates_needed, programming_data, function_name);
with a matching "unregister_fpga(function_name)"
These two functions would act sort of like malloc and free for the FPGA, so the kernel could choose to assign any given function to any of the 0 or more FPGAs in the system. Many applications could each allocate chunks of the FPGA(s). The application itself just needs to call the function by a function pointer, so it can use the hardware or software version by swapping the value of the pointer. (Just like we do now with code bases that have optimised versions of functions for various SIMD variants.)
calc_foo = soft_calc_foo;
With each app only registering the functions that it knows will most benefit from optimisation, there will be more room on the FPGA hardware for other apps...
Re:Already available.. (Score:2, Informative)
> There should be some standard interchange format for the FPGA data. gcc should be able to take some C code and output FPGA intermediate programming data from it.
Smile! This stuff has existed for years:
You just have to build a library that
Everything else is already there.
The problem is that FPGA boards are pretty expensive... (The least expensive [altera.com] I found was some 66MHz devboard for $150. The most expensive [altera.com] had 500MHz and a price tag of ~$7000!! [including a ton of golden analog contacts and stuff ;])