
Transcoding in 1/5 the Time with Help from the GPU 221

mikemuch writes "ExtremeTech's Jason Cross got a lead about a technology ATI is developing called Avivo Transcode that will use ATI graphics cards to cut the time it takes to transcode video by a factor of five. It's part of the general-purpose computation on GPU movement. The Avivo Transcode software works only with ATI's latest 1000-series GPUs, and the company is working on profiles that will allow, for example, transcoding DVDs for Sony's PSP."
  • GPU stream programming can be done with Brook http://graphics.stanford.edu/projects/brookgpu/ [stanford.edu]. Brook supports the nVidia series, so that's what you'd purchase.

    Pick up a 5200FX card (for S-Video/DVI output) and then use the GPU to do audio and video transcoding. I have been thinking about audio (MP3) transcoding as a first "trial" application.

    "Heftier" GPUs may be used to assist in video transcoding -- but it strikes me that the choice of stream programming system matters most (to allow code to move to other GPUs, driver permitting). nVidia also supports developers using the GPU (there are comments and test results from nVidia available on the web). So far, not much from ATI, so I think nVidia gets the nod...

    Ratboy.
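
    For a flavor of what Brook code looks like, here's the saxpy stream kernel the BrookGPU project uses as its standard example (a sketch from memory; exact syntax varies a bit between releases). The <> notation declares streams, and the kernel body runs once per stream element, in parallel on the GPU:

    kernel void saxpy(float alpha, float4 x<>, float4 y<>, out float4 result<>) {
        result = alpha * x + y;
    }

    The runtime maps the streams to textures and the kernel to a fragment program, which is what lets the same source target different GPU back ends.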
  • by Dr. Spork ( 142693 ) on Wednesday November 02, 2005 @02:04PM (#13933747)
    You don't get it. ATI is not releasing a new encoder. The test used standard codecs, which do the very same work when assisted by the GPU, only 5X faster.
  • by tomstdenis ( 446163 ) <tomstdenis AT gmail DOT com> on Wednesday November 02, 2005 @02:11PM (#13933838) Homepage
    GPUs are massively parallel DSP engines, which makes them ideally suited for the task. They can do things like "let's multiply 8 different floats in parallel at once," which is useful for transforms like the DCT and iDCT that can take advantage of the parallelism.

    But don't take that out of context. Ask a GPU to compile the linux kernel [which is possible] and an AMD64 will spank it something nasty. *GENERAL* purpose processors are slower at these very dedicated tasks but at the same time capable of doing quite a bit with reasonable performance.

    By the same token, a custom circuit can compute AES in 11 cycles [1 per block if pipelined] at 300MHz, which when you scale to 2.2GHz [for your typical AMD64] amounts to ~80 cycles. AES on the AMD64 takes 260 cycles. But ask that circuit to compute SHA-1 and it can't. Or ask it to render a dialog box, etc...

    Tom
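
    As a toy illustration of the parallelism Tom describes (a hedged sketch in plain C, not anything from ATI's encoder): an 8-point transform written as a matrix-vector product. Each of the eight outputs depends only on the inputs, never on another output, so a parallel processor can compute all of them -- across many rows at once -- simultaneously, while a CPU runs the iterations back to back.

    /* Toy 8-point transform in matrix-vector form. Every out[k] is
     * independent of every other out[k], which is the shape of work a
     * DCT/iDCT exposes and a parallel processor exploits. */
    void transform_row(const float in[8], const float coef[8][8], float out[8])
    {
        for (int k = 0; k < 8; k++) {        /* independent iterations */
            float acc = 0.0f;
            for (int i = 0; i < 8; i++)
                acc += coef[k][i] * in[i];
            out[k] = acc;
        }
    }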
  • by peter303 ( 12292 ) on Wednesday November 02, 2005 @02:25PM (#13933975)
    In the scientific computing world there have been several episodes where someone comes up with an attached processor an order of magnitude faster than a general-purpose CPU and tries to get the market to use it. Each generation improved the programming interface, eventually using some subset of C (now Cg) combined with a preprogrammed routine library.

    All these companies died, mainly because the commodity computer makers could pump out new generations about three times faster and eventually catch up. And the general-purpose software was always easier to maintain than the special-purpose software. Perhaps graphics card software will buck this trend because it's a much larger market than specialty scientific computing. The NVIDIAs and ATIs can ship new hardware generations as fast as the Intels and AMDs.
  • by Anonymous Coward on Wednesday November 02, 2005 @02:25PM (#13933977)
    FYI, this is already done by Roxio in Easy Media Creator 8. They offload a lot of the rendering and transcoding to GPUs that support it; for older cards there is a software fallback. Probably not a speedup by such a large factor, but still a significant boost on newer PCI-E cards.
  • by freakyfreak2 ( 613574 ) <jeff.j-maxx@net> on Wednesday November 02, 2005 @02:26PM (#13933992) Homepage Journal
    The article is very specific about this.
    From the second page:
    "The application only works with X1000 series graphics cards, and it only ever will. That's the only architecture with the necessary features to do GPU-accelerated video transcoding well."
  • Apple's core image (Score:4, Informative)

    by acomj ( 20611 ) on Wednesday November 02, 2005 @02:29PM (#13934022) Homepage
    Some of Apple's APIs (Core Video/Core Image/Core Audio) use the GPU when they detect a supported card; otherwise they just use the CPU, seamlessly and without fuss. So this isn't new.

    http://www.apple.com/macosx/features/coreimage/ [apple.com]

  • by EpsCylonB ( 307640 ) <eps AT epscylonb DOT com> on Wednesday November 02, 2005 @02:34PM (#13934080) Homepage
    When I got my 6600GT, the box it came in said it could do hardware MPEG-2 encoding; obviously this is not the case. I remember reading somewhere that nVidia originally wanted the 6xxx series to be able to do loads of on-board video stuff, but they couldn't get it working in time. It's a real shame.
  • Linux Support (Score:3, Informative)

    by Yerase ( 636636 ) <randall DOT hand AT gmail DOT com> on Wednesday November 02, 2005 @03:03PM (#13934377) Homepage
    There's no reason there couldn't be Linux support. At the IEEE Viz05 conference there was a nice talk from the guys running www.gpgpu.org about cross-platform support, and there are a couple of new languages coming out that act as wrappers for Cg/HLSL/OpenGL on both ATI and nVidia, and Windows and Linux. Check out Sh (http://libsh.sourceforge.net/ [sourceforge.net]) and Brook (http://brook.sourceforge.net/ [brook.sourceforge.net]). Once their algorithm is discovered (yippee for reverse engineering), it won't be long.
  • by DotDotSlasher ( 675502 ) on Wednesday November 02, 2005 @03:20PM (#13934520)
    Wouldn't it just be easier to have multiple CPUs?
    Why, yes it would. But GPUs fill a different niche. For one thing, about 90% of the transistors on a GPU are used for processing, versus about 60% on a CPU (the rest goes to caching).
    There are also many more transistors in GPUs these days than in CPUs. Graphics processing is inherently parallel and streamed, and that's what a GPU does very well, very fast: grab 8 texture samples simultaneously each clock cycle, then have the next stage linearly blend those floating-point values together in one clock. A CPU would have to work on each of those 8 one at a time.
    For parallel, streamed operations, a GPU can speed up a process by 5 or 10 times, as in this example. At SIGGRAPH this summer there was a session on running a ray tracer on a GPU. After 15 minutes of explaining all of the optimizations they performed, they were happy to report that they were only 5x slower than a CPU implementation. Ray tracing is not a very parallel, streamed operation: rays can bounce 10 times or maybe not at all.
    So, let's review: GPUs are significantly faster than a CPU for graphics and some streamable, parallelizable processes. CPUs are great for branchy, more random processing.
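
    A hedged sketch of that blend step in plain C (hypothetical weights; real texture filtering derives them from the sample coordinates): what the GPU's fixed-function hardware finishes in about a clock per stage, the CPU has to run as eight serial multiply-adds.

    /* Weighted blend of 8 texture samples. A GPU fetches and blends these
     * in roughly one clock per pipeline stage; a CPU iterates. */
    float blend8(const float sample[8], const float weight[8])
    {
        float acc = 0.0f;
        for (int i = 0; i < 8; i++)
            acc += weight[i] * sample[i];
        return acc;
    }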
  • by forkazoo ( 138186 ) <wrosecrans@@@gmail...com> on Wednesday November 02, 2005 @03:28PM (#13934592) Homepage
    Ummm... comparing a general-purpose CPU to an FPGA is a bit odd. The grandparent post was talking about ASICs vs. FPGAs. An ASIC can implement exactly the same structure as an FPGA, so it can work just as efficiently, but an ASIC can be made to clock higher than an FPGA. Somebody mod the parent post "non sequitur."
  • by xtal ( 49134 ) on Wednesday November 02, 2005 @03:31PM (#13934629)
    Unfortunately, there are a few problems with this scenario in practice that prevent it from becoming widespread. I worked on optimizations with VHDL destined for FPGAs in a prior life.

    - Tools: FPGA tools are getting better, but still suck compared to modern IDEs and software development. This might be me being jaded (VHDL can get nasty), and things like SystemC are still in their infancy, but there's a long way to go here.

    - Synthesis time: It can take DAYS on a very fast machine to run the synthesis that produces your design for the FPGA. Some designs turn out to be impossible to synthesize; you might not find this out until hours or worse into the process. Then your whole design might have to change! Ha ha ha.

    - Tool expense: The good tools cost a lot of money. The ones that can do good designs on the fly cost on the order of a new Ferrari or worse. Engineers who are familiar with optimizing and implementing these tools and designs cost a lot too, but sadly, don't get to drive too many Ferraris. (Me, anyway!)

    - CPUs and GPUs are heavily optimized and VERY VERY VERY fast for most tasks. In many cases it is cheaper to go buy a farm of them than to implement on an FPGA, unless you are trying to do something very specialized. FPGAs are more often used for specialty communications brokering, timing, and interfacing tasks where bus speeds on a micro are too low.

    Great idea in principle. I wouldn't hold my breath, however.
  • by Anonymous Coward on Wednesday November 02, 2005 @03:33PM (#13934639)
    The memory used on a video card is usually clocked far higher than the DDR400 used in the articles referenced. So, yes, there is reason to brag there.
  • by TooMuchToDo ( 882796 ) on Wednesday November 02, 2005 @06:51PM (#13936415)
    I used to play with this idea 4-5 years ago. A small team was going to look into building FPGA PCI boards that could be used with http://www.distributed.net/ [distributed.net] to help crack DES/RC5/*insert-your-choice-of-encryption-here*.
  • by Jerry Coffin ( 824726 ) on Wednesday November 02, 2005 @06:56PM (#13936468)
    There's another problem with general-purpose FPGAs. (Order-of-magnitude comparison only): Athlon 64 4000 (from Pricewatch): $330. Xilinx 2.4-million-gate design (from Digikey): $2100-$5000.

    You haven't specified which FPGAs you're talking about, but at those prices you should be getting more like 6 million gates or so (e.g., an XC2V6000 goes for about $4000). Perhaps you're looking at something like a Virtex-4 FX? If so, be aware that what you're looking at includes not only FPGA gates but also a PowerPC core (or perhaps two).

    The computing world would look a lot different if there were good $100 high-speed, high-capacity FPGAs. Now, I wouldn't argue with a good ASIC or highspeed DSP implementation for some algorithms...

    It depends a bit on what you mean by high capacity and high speed. At around $100US, you can get a 1.5 million gate Spartan 3, or a somewhat smaller Virtex (which will generally run a bit faster).

    These obviously aren't the biggest or fastest FPGAs available, but for the right kind of job, they'll still blow away a general purpose CPU pretty easily.

    As far as ASICs vs. FPGAs goes, it's really not a contest: ASICs are fast, but have specific purposes. FPGAs are slower, but can be programmed. Given the idea originally stated in this thread, ASICs simply don't seem (to me) like contenders at all.

    --
    The universe is a figment of its own imagination.

  • by forkazoo ( 138186 ) <wrosecrans@@@gmail...com> on Wednesday November 02, 2005 @08:42PM (#13937264) Homepage
    Photon317 writes:
    The original post never mentioned ASICs that I saw.

    Ummm... Okay, here is a quote from the original post again... by LWATCDR:
    I have seen a combo FPGA/PPC chip for embedded applications. The issue I see with this is how long would it be useful? FPGAs are slower than ASICs.

    And then a quote from tomstdenis:
    FPGAs aren't always slower than what you can do in silicon.

    tom then goes on to talk about PPC versus FPGAs, as if LWATCDR weren't talking about ASICs. Since this conversation now involves so many people, I hope I've quoted clearly enough. Anyhow, the explanation that an FPGA can be faster than a general-purpose CPU was correct, but a complete non sequitur from LWATCDR's point that ASICs are faster than FPGAs.

    I do agree that the basic question of general-purpose CPU vs. other is relevant to the article. I couldn't quite bring myself to claim he was off-topic, just non sequitur. Now, to get to something interesting you said, rather than just picking nits about who said what...
    Photon317:
    The idea of sticking one or more FPGAs into a machine via an I/O bus certainly has merit. I think the main issue is that we don't have compiler toolchains, libraries, and kernels ready to take advantage of it in an intelligent way. The biggest problem is that the FPGA computations need to be able to fallback to the general purpose CPU, which has an entirely different instruction set. A method that might be used, for example, would be to wrap things up such that in the application source code you have two functions with identical call signatures and supposedly identical behavior - one is for the cpu, the other is for fpga offloading. Then the runtime linker and the kernel can work magic together and schedule applications any time they need on the FPGAs and dynamically cause an application to fallback to its cpu code as well.

    I like the basic idea, but I'm not so sure how well dynamically sharing a function between an FPGA and a CPU would work in practice. In theory, it would be like a thread migrating from one normal CPU to another.

    Since FPGAs have significant latency when they are reconfigured, I have to think that you wouldn't really want the kernel dynamically deciding which app gets FPGA time and which gets CPU time. I think a better interface would be for the programmer to manually write an optimisation for an FPGA. This eliminates the need to have automatically generated matching functions in hardware and software. The programmer can decide that the specific functions X, Y, and Z should be able to run on an FPGA if it is available. Whether the FPGA programming code comes from a C compiler or from hand-coded FPGA-specific stuff doesn't matter. There should be some standard interchange format for the FPGA data. gcc should be able to take some C code and output FPGA intermediate programming data from it.

    Then, at run time, an app can do something like:
    register_fpga(num_gates_needed, programming_data, function_name);

    with a matching "unregister_fpga(function_name)"

    These two functions would act sort of like malloc and free for the FPGA, so the kernel could choose to assign any given function to any of the zero or more FPGAs in the system. Many applications could each allocate chunks of the FPGA(s). The application itself just needs to call the function through a function pointer, so it can use the hardware or software version by swapping the value of the pointer. (Just like we do now with code bases that have optimised versions of functions for various SIMD variants.)

    calc_foo = soft_calc_foo; // or calc_foo = fpga_calc_foo, or sse_calc_foo, or altivec_calc_foo

    With each app only registering the functions that it knows will most benefit from optimisation, there will be more room on the FPGA hardware for other apps...
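
    A minimal compilable sketch of that dispatch pattern in C (register_fpga()/unregister_fpga() are the hypothetical API sketched above; no real kernel exposes them, so they're stubbed out here):

    #include <stdio.h>

    /* Function-pointer dispatch between a software implementation and a
     * (pretend) FPGA-offloaded one, as described in the post above. */
    typedef double (*calc_foo_fn)(const double *data, int n);

    static double soft_calc_foo(const double *data, int n) /* CPU fallback */
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += data[i] * data[i];
        return sum;
    }

    /* In the proposed scheme this would only be installed after a
     * successful register_fpga() call; here it just forwards. */
    static double fpga_calc_foo(const double *data, int n)
    {
        return soft_calc_foo(data, n);
    }

    static calc_foo_fn calc_foo = soft_calc_foo; /* default: software */

    int main(void)
    {
        double data[4] = { 1.0, 2.0, 3.0, 4.0 };

        /* Hypothetical: fpga_available = (register_fpga(gates, bits, "calc_foo") == 0); */
        int fpga_available = 0; /* assume no FPGA in this machine */
        calc_foo = fpga_available ? fpga_calc_foo : soft_calc_foo;

        printf("calc_foo = %f\n", calc_foo(data, 4));
        return 0;
    }

    The nice property is that the caller never branches: swapping the pointer once at startup (or when the FPGA is reclaimed) is all the "scheduling" the application ever sees.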
  • by Hurricane78 ( 562437 ) <deleted @ s l a s h dot.org> on Thursday November 03, 2005 @03:27AM (#13939139)

    > There should be some standard interchange format for the FPGA data. gcc should be able to take some C code and output FPGA intermediate programming data from it.

    Smile! This stuff has existed for years:

    You just have to build a library that

    • shoves "compiled" logic chunks to the chip
    • uses the FPGA-board's upload functionality as a pluggable driver
    • does the resource management.

    Everything else is already there.

    • You can get an FPGA developer board [altera.com] to develop and test your library.
    • You can use SPARK [ucsd.edu] to compile your C code to VHDL [wikipedia.org].
    • I guess the VHDL can be uploaded directly to the FPGA. If not, maybe gEDA [seul.org] or similar tools for VHDL help...
    • I am a total n00b at hardware design, but I found this in 1-2 hours of investigation and reading via Wikipedia.

    The problem is that FPGA boards are pretty expensive... (The least expensive [altera.com] I found was a 66MHz dev board for $150. The most expensive [altera.com] ran at 500MHz with a price tag of ~$7000!! [including a ton of gold-plated analog contacts and stuff ;])
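
    For what it's worth, the C that C-to-VHDL flows like SPARK digest best is small and statically bounded. Something like this hypothetical kernel (an illustrative sketch, not a tested SPARK input):

    /* Fixed-size FIR tap sum: fixed-length arrays, no dynamic memory, no
     * recursion, a statically bounded loop -- the kind of C a C-to-VHDL
     * tool can unroll into a multiply-accumulate datapath. */
    int fir8(const int x[8], const int h[8])
    {
        int acc = 0;
        for (int i = 0; i < 8; i++)
            acc += x[i] * h[i];
        return acc;
    }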
