
The Wretched State of GPU Transcoding

MrSeb writes "This story began as an investigation into why Cyberlink's Media Espresso software produced video files of wildly varying quality and size depending on which GPU was used for the task. It then expanded into a comparison of several alternate solutions. Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user. The conclusion, after weeks of work and going blind staring at enlarged images, is that the state of 'consumer' GPU transcoding is still a long, long way from prime time use. In short, it's simply not worth using the GPU to accelerate your video transcodes; it's much better to simply use Handbrake, which uses your CPU."
  • by Dahamma ( 304068 ) on Tuesday May 08, 2012 @07:24PM (#39935695)

    Actually, recent GPUs *were* meant to do exactly this type of thing, and have been heavily marketed by Nvidia and ATI for this purpose. Of course there needs to be a CPU as well: the CPU runs the operating system and application code, and offloads very specific, parallelizable work to the GPU. This sort of architecture has been around almost as long as modern CPUs have.

    And Quick Sync is even less of a general-purpose-CPU solution than using a GPU is: it uses dedicated, application-specific hardware on the die to do its encoding.
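
    A minimal sketch of that offload pattern, in CUDA, purely for illustration (the kernel name and the trivial brightness-scaling work are made up, not anything MediaEspresso or Quick Sync actually does): the CPU sets up buffers, ships a frame's luma plane to the GPU, launches the data-parallel kernel, and copies the result back.

      // Hypothetical host/GPU split: the CPU runs the application logic and
      // hands one embarrassingly parallel step to the GPU. Error checking
      // omitted for brevity; build with nvcc.
      #include <cuda_runtime.h>
      #include <cstdio>

      __global__ void scaleLuma(unsigned char* luma, int n, float gain) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;  // one pixel per thread
          if (i < n) {
              float v = luma[i] * gain;
              luma[i] = v > 255.0f ? 255 : (unsigned char)v;
          }
      }

      int main() {
          const int n = 1920 * 1080;                      // one 1080p luma plane
          unsigned char* host = new unsigned char[n];
          for (int i = 0; i < n; ++i) host[i] = (unsigned char)(i % 256);

          unsigned char* dev = nullptr;
          cudaMalloc((void**)&dev, n);
          cudaMemcpy(dev, host, n, cudaMemcpyHostToDevice);   // CPU -> GPU

          scaleLuma<<<(n + 255) / 256, 256>>>(dev, n, 1.1f);  // offloaded work
          cudaMemcpy(host, dev, n, cudaMemcpyDeviceToHost);   // GPU -> CPU

          printf("first pixel after scaling: %d\n", host[0]);
          cudaFree(dev);
          delete[] host;
          return 0;
      }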

  • by billcopc ( 196330 ) <vrillco@yahoo.com> on Tuesday May 08, 2012 @07:57PM (#39935953) Homepage

    Well see, that's the thing. A GPU is better suited to some kinds of massively parallel tasks, like video encoding. After all, you're applying various matrix transforms to an image, with a bunch of funky floating point math to whittle all that transformed data down to its most significant/perceptible bits. GPUs are supposed to be really really good at this sort of thing.

    My hunch is that the problems we're seeing are caused by two big issues:

    1. Lack of standardization across GPU processing technologies: CUDA vs. OpenCL vs. Quick Sync, and a bunch of tag-alongs too. Each one was designed around a particular GPU architecture, so porting programs between them is non-trivial.

    2. Lack of expertise in GPU programming. Let's be fair here: GPUs are a drastically different architecture from any PC or embedded platform we're used to programming. While I could follow the specs and write an MPEG or H.264 encoder in any high-level language in a fairly straightforward manner, I can't even begin to envision how I would convert that linear code into a massively parallel algorithm running on hundreds of dumbed-down shader processors (a rough sketch of what that looks like follows at the end of this comment). It's not at all like a conventional cluster, because shaders have very limited instruction sets and little memory, but extremely fast interconnects. We have a hard enough time making CPU encoders scale to 4 or 8 cores; this requires some serious out-of-the-box thinking to pull off.

    Moving to a GPU virtually requires starting over from scratch. This is a set of constraints that are very foreign to the transcoding world, where the accepted trend was to use ever-increasing amounts of cheaply available CPU and memory, with extensively configurable code paths. The potential is there, but it will take time for the hardware, APIs and developer skills to converge. GPU transcoding should be seen as a novelty for now, just like CPU encoding was 15 years ago when ripping a DVD was extremely error-prone and time-consuming. If you want a quick, low quality transcode, the GPU is your friend. If you're expecting broadcast-quality encodes, you're gonna have to wait a few years for this niche to grow and mature.
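
    To make point 2 above a bit more concrete, here is a rough, hypothetical sketch of the kind of restructuring involved: instead of one loop walking over macroblocks, you assign one CUDA block per 16x16 macroblock and let every multiprocessor chew on a different one at once. This is a toy sum-of-absolute-differences step of the sort motion estimation uses, not code from any real encoder.

      // Hypothetical SAD kernel: one CUDA block per 16x16 macroblock,
      // 16 threads per block (one per pixel row). Launch as, e.g.:
      //   sadPerMacroblock<<<dim3(width / 16, height / 16), 16>>>(cur, ref, width, sadOut);
      #include <cuda_runtime.h>

      __global__ void sadPerMacroblock(const unsigned char* cur,
                                       const unsigned char* ref,
                                       int width, int* sadOut) {
          int mbX = blockIdx.x * 16;        // macroblock origin in the frame
          int mbY = blockIdx.y * 16;
          int row = threadIdx.x;            // this thread handles one row

          int partial = 0;
          for (int col = 0; col < 16; ++col) {
              int idx = (mbY + row) * width + (mbX + col);
              int d = (int)cur[idx] - (int)ref[idx];
              partial += d < 0 ? -d : d;    // absolute difference
          }

          // Combine the 16 per-row sums into one SAD value per macroblock.
          __shared__ int rows[16];
          rows[row] = partial;
          __syncthreads();
          if (row == 0) {
              int sad = 0;
              for (int r = 0; r < 16; ++r) sad += rows[r];
              sadOut[blockIdx.y * gridDim.x + blockIdx.x] = sad;
          }
      }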

  • by Hatta ( 162192 ) on Tuesday May 08, 2012 @08:01PM (#39936005) Journal

    What I don't understand is how this happens. Why would the same calculation get different results on different GPUs? Are they doing the math incorrectly?

  • by fuzzyfuzzyfungus ( 1223518 ) on Tuesday May 08, 2012 @09:07PM (#39936667) Journal
    What strikes me as a bad sign is not so much that GPU transcoding doesn't necessarily produce massive speed improvements, but that the products tested produce overtly broken output in a fair number of not-particularly-esoteric scenarios.

    Expecting super-zippy, magic-optimized speedups on every architecture tested would be expecting serious maturity. Expecting a commercially released, GPU-vendor-recommended encoding package to manage things like "Don't produce an H.264 lossy-compressed file substantially larger than the DVD-rip source file" and "Please don't convert a 24 fps video to 30 fps for no reason on half the architectures tested" seems much more reasonable.

    I can imagine that the subtle horrors of the probably-makes-the-IEEE-cry differences in floating point implementations, or their ilk, might make producing identical encoded outputs across architectures impossible; but these packages appear to be flunking basic sanity checks, even in the parts of the program that are presumably handled on the CPU (when a substantial portion of iPhone 4S handsets are 16GB devices, letting the 'iPhone 4S' preset return a 22GB file while whistling innocently seems like a bad plan...).
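
    For what it's worth, here is a toy CUDA illustration of how "the same calculation" can legitimately come out slightly different on different hardware (this is only one possible source of drift, not an explanation of the broken output above): floating-point addition isn't associative, so if the additions land in a different order, as happens with atomics whose arrival order depends on scheduling, the low bits of the result change.

      // Order-dependent floating point: many threads atomically add into one
      // float. The order of the additions depends on scheduling, so the GPU
      // total can differ from the fixed-order CPU total in the low bits,
      // and can differ between devices or runs.
      #include <cuda_runtime.h>
      #include <cstdio>

      __global__ void accumulate(const float* vals, int n, float* out) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) atomicAdd(out, vals[i]);    // arrival order is not fixed
      }

      int main() {
          const int n = 1 << 20;
          float* host = new float[n];
          for (int i = 0; i < n; ++i) host[i] = 1.0f / (i + 1);  // mixed magnitudes

          float *dVals = nullptr, *dSum = nullptr, gpuSum = 0.0f;
          cudaMalloc((void**)&dVals, n * sizeof(float));
          cudaMalloc((void**)&dSum, sizeof(float));
          cudaMemcpy(dVals, host, n * sizeof(float), cudaMemcpyHostToDevice);
          cudaMemset(dSum, 0, sizeof(float));

          accumulate<<<(n + 255) / 256, 256>>>(dVals, n, dSum);
          cudaMemcpy(&gpuSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);

          float cpuSum = 0.0f;
          for (int i = 0; i < n; ++i) cpuSum += host[i];        // one fixed order

          printf("cpu: %.7f  gpu: %.7f\n", cpuSum, gpuSum);     // often not bit-identical
          cudaFree(dVals); cudaFree(dSum); delete[] host;
          return 0;
      }
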
  • by Skarecrow77 ( 1714214 ) on Tuesday May 08, 2012 @09:14PM (#39936723)

    But, at least in this context, speed is nearly irrelevant, because it fails at the task at hand: producing high-quality video.

    Who cares how fast it completes a task if it's failing? Nobody gives little Jimmy props when he finishes the hour-long test in 5 minutes but scores a 37% on it.

  • by Anonymous Coward on Tuesday May 08, 2012 @09:22PM (#39936801)

    Nice shill paper you got there... Of course a paper from the Throughput Computing Lab and the Intel Architecture Group (both Intel Corp.) will argue that there's not much speedup from a GPU compared to a CPU.

    The big thing to note is that with a GPU you have to do what you did when working with the original SSE (Intel...) instruction set on regular CPUs: FP16 numbers don't have a significant amount of precision, so you must take that into account when programming with that instruction set in mind. It's not as if people haven't been performing calculations with numbers bigger than the bit width of the CPU's instructions before. Modern GPUs are also getting much beefier at double-precision math.

    The 5000-series Radeons (not examined in the paper) have much better DP performance than the GeForce GTX 280 used in the comparison. The Radeon 5970, for example, has 18x the DP GFLOPS of the i7-960, and they both went to market at the same time. For SP data, the 5970 is 46x better than the i7-960.
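
    As a small aside on the FP16 point above, here's a hypothetical CUDA snippet showing just how little precision half-precision numbers carry (a 10-bit mantissa): round-tripping a few floats through __half loses anything the format can't represent. It's only a toy, and it needs a CUDA-capable GPU to run.

      // Round-trip a few floats through 16-bit half precision to show what
      // survives: integers above 2048 start getting rounded, and very small
      // values flush to zero.
      #include <cuda_fp16.h>
      #include <cuda_runtime.h>
      #include <cstdio>

      __global__ void roundTripHalf(const float* in, float* out, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) {
              __half h = __float2half(in[i]);   // squeeze into 16 bits
              out[i] = __half2float(h);         // expand back to see what's left
          }
      }

      int main() {
          float host[4] = {2048.0f, 2049.0f, 0.1f, 1e-8f};
          float back[4];
          float *dIn = nullptr, *dOut = nullptr;
          cudaMalloc((void**)&dIn, sizeof(host));
          cudaMalloc((void**)&dOut, sizeof(host));
          cudaMemcpy(dIn, host, sizeof(host), cudaMemcpyHostToDevice);

          roundTripHalf<<<1, 4>>>(dIn, dOut, 4);
          cudaMemcpy(back, dOut, sizeof(host), cudaMemcpyDeviceToHost);

          for (int i = 0; i < 4; ++i)
              printf("%.9g -> %.9g\n", host[i], back[i]);  // 2049 comes back as 2048
          cudaFree(dIn); cudaFree(dOut);
          return 0;
      }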
