Graphics Software Technology

The Wretched State of GPU Transcoding

MrSeb writes "This story began as an investigation into why Cyberlink's Media Espresso software produced video files of wildly varying quality and size depending on which GPU was used for the task. It then expanded into a comparison of several alternate solutions. Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user. The conclusion, after weeks of work and going blind staring at enlarged images, is that the state of 'consumer' GPU transcoding is still a long, long way from prime time use. In short, it's simply not worth using the GPU to accelerate your video transcodes; it's much better to simply use Handbrake, which uses your CPU."

  • by Anonymous Coward

    I've heard from a lot of sources that the quality of output from various GPU accelerated video encoding schemes almost invariably lacks when compared to an established, known good CPU based video encoding scheme. When the GPU encoders can match quality, will they still be fast? Are they just cheating now? What gives?

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Yes, they are cheating. That is exactly how they are getting it to be so fast.

    • has anyone tried Badaboom?

    • by Hatta ( 162192 ) on Tuesday May 08, 2012 @07:01PM (#39936005) Journal

      What I don't understand is how this happens. Why would the same calculation get different results on different GPUs? Are they doing the math incorrectly?

      • by cheesybagel ( 670288 ) on Tuesday May 08, 2012 @07:11PM (#39936119)
        Hint: Not all GPUs have IEEE FP compliant math. Often they break the standard, or do something else altogether just to improve performance.
        • CPUs used to be like that too.
        • by ville ( 29367 )

          Even with IEEE things aren't that simple: http://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/ [wordpress.com]

          // ville

        • by parlancex ( 1322105 ) on Tuesday May 08, 2012 @09:55PM (#39937371)

          Hint: Not all GPUs have IEEE FP compliant math. Often they break the standard, or do something else altogether just to improve performance.

          I can't speak for ATI, but actually all FP32 math on Nvidia architectures for many generations now has been IEEE compliant, excluding NAN and -inf +inf and exception handling cases, and except for their hardware sin, cos, log implementations, and except when using the fused multiply add instruction (though the last one you could actually get around by using special compiler intrinsics to avoid the fusing).
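
The fused multiply-add point above can be illustrated with a minimal Python sketch, assuming ordinary IEEE 754 doubles. `fractions.Fraction` stands in for the single rounding step of a hardware FMA; the function names and the values `a`, `b`, `c` are made up for illustration and do not come from any GPU or driver.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def mul_add_fused(a, b, c):
    # One rounding: evaluate a*b + c exactly, then round once to a double.
    # This only emulates what a fused multiply-add does; it is not a GPU call.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Illustrative inputs chosen so the exact product is 1 - 2**-60,
# which does not fit in a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)
print(unfused, fused)
```

Both answers are "correct" under their own rounding rules, which is one way two compliant devices can disagree in the last bits.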

        • The math units on every nVidia card made since at least late 2009, both single and double precision, are ieee754 compliant. The only excuse for it being wrong is that someone deliberately used the __fast non-primitive operations (sqrt/log/exp & friends), which compromise the algorithms used to compute transcendental operations. The exact extent of the compromise is detailed in the back of the nVidia CUDA guide.

          I agree it would be pathetic if this were because someone passed -ffast-math or whatever it
      • by TD-Linux ( 1295697 ) on Tuesday May 08, 2012 @07:19PM (#39936187)
        Because behind the scenes your "encoder" program is actually using several different encoders. Generally the encoder has to be custom written specifically for the specialized GPU hardware it is targeting.
        • by pla ( 258480 ) on Tuesday May 08, 2012 @07:56PM (#39936587) Journal
          Because behind the scenes your "encoder" program is actually using several different encoders. Generally the encoder has to be custom written specifically for the specialized GPU hardware it is targeting.

          This has largely ceased to present a problem, thanks to OpenCL.

          GPU code no longer needs to run as custom-written shaders targeting 20 different platforms. One program, written in fairly straightforward C, will run on just about any modern platform. And it will do so at speeds that absolutely dwarf a CPU - The Radeon x9yy cards (for x>=5) easily crush a modern CPU at OpenCL code by a factor of a thousand. The x8yy cards still perform admirably, over three hundred to one. For NVidia, the Tesla series do well, while the GX... Well, ten to fifty times faster doesn't exactly suck...


          The real problem here? Most people have really crappy GPUs. Even compared to the $100 card range, your GPU sucks ass, and hard. And you can't really blame people, because honestly, even modern IGPs will run just about anything fairly well, so why would you pay for more?


          But don't blame the GPUs, or the concept in general. If you target OpenCL and the user has a halfway decent modern GPU, it will give consistent, reliable results, and will blow away your CPU many times over.
          • by Skarecrow77 ( 1714214 ) on Tuesday May 08, 2012 @08:14PM (#39936723)

            but, at least in this context, speed is nearly irrelevant because it fails at the task at hand, producing high quality video.

            who cares how fast it completes a task if it's failing? Nobody gives little jimmy props when he finishes the hour-long test in 5 minutes but scores a 37% on it.

            • by pla ( 258480 ) on Tuesday May 08, 2012 @08:28PM (#39936845) Journal
              who cares how fast it completes a task if it's failing? Nobody gives little jimmy props when he finishes the hour-long test in 5 minutes but scores a 37% on it.

              I agree that presents something of a problem for current implementations; the concept of GPU transcoding doesn't fail, however. Only the fact that those currently pushing it have tried to show at least modest gains for everybody - meaning those with massively inappropriate hardware - has made it such an abysmal failure to date.

              To repeat my earlier post, if you target an OpenCL-capable GPU, you will get consistent results; and if you target a card with a reasonable number of compute units, (58xx/59xx/68xx/69xx/tesla), you'll see performance far beyond what a modern CPU can give.

              Does that make GPU transcoding the best choice for the general public at present? No! But for those with the hardware, the comparison counts as literally laughable.
              • I've got a first generation fermi-based GTX 470. Considering that, at the time, the parallel compute power was the big halo selling point of the new fermi gpu, I was very underwhelmed when I finally found some software that would actually use it. I saw speedups of only about 3x or so above and beyond my core 2 duo (only a 2-core!) e8400, and the quality was abysmal in comparison

                I'm not saying that GPU transcoding -shouldn't- be a better option than cpu transcoding, it completely should be, but current imple

                • Maybe I'm missing something... but if it's taking you 36 to 48 hours to transcode a single video, and assuming you can justify dedicating a system to that purpose for such an extended time, wouldn't the power savings you'd get by purchasing a much faster system be worthwhile? I'm guessing that the power draw for a process which takes so long to complete is substantial, and that you aren't intending to transcode only a couple of videos.

                  One source of comparative CPU benchmarks is here http://www.cpubenchmark [cpubenchmark.net]
                  • Around here running that 24x7 would cost ~ $200. You'd need to run it for several years to pay for the cost of a new system.

                  • Between my wife and I, we have 7 computers. dedicating one to transcoding isn't a big deal. it's also our file server, so it's on 24/7 anyway.

                    I transcode 2-5 videos at a time usually. once every few weeks/months.

            • by Surt ( 22457 )

              Props to Jimmy, he got 37% right in 8.3% of the time, and even more credit since I assume not everyone could get a 100% in an hour, or what would be the point of the test.

              • I think of it with a different analogy. Instead of little Jimmy and the test, I prefer the BK Lounge metaphor:

                Burger King manages to hand you "food" in 11 Seconds, compared to Shari's (or wherever, insert a middle-of-the-road place here) 20 minutes,
                The "food" is consistently inedible at BK, whereas the food at Shari's or wherever won't make you sick after bite 2.

                The GPGPU is shitty at video transcoding, but boy howdy it's fast. And that's completely beside the point.
                What good is a "burger" in 11 seconds if you can

          • by tyrione ( 134248 )
            Thank you. I'm personally getting sick and tired of badly written articles on Parallel Programming discussing CUDA and having to wade through crap before one sharp post discusses OpenCL. OpenCL 1.2 is very robust and we'll be seeing OpenCL 2.0 this August.
      • There are probably two issues here, but the kind of calculations we're talking about here are floating-point calculations. And as every programmer should know floating-point calculations done by different CPUs or GPUs don't give you consistent results: http://developers.slashdot.org/story/10/05/02/1427214/what-every-programmer-should-know-about-floating-point-arithmetic [slashdot.org]

        Also, we're talking about GPUs here. GPUs aren't even designed to give you IEEE standard results. Instead they're designed to give approxima
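
One concrete reason results can differ between devices, even with compliant arithmetic, is that floating-point addition is not associative, so the order of a reduction matters. The sketch below is a minimal illustration in plain Python doubles; the two function names and the input list are hypothetical, and a real GPU reduction would be organized differently.

```python
def sum_sequential(values):
    # Left-to-right accumulation, as a simple single-threaded loop would do.
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    # Tree-shaped reduction, closer to how a parallel device combines partial sums.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


# Illustrative data: the exact sum is 2.0, but neither order recovers it.
values = [1e16, 1.0, -1e16, 1.0]

print(sum_sequential(values), sum_pairwise(values))
```

The same four numbers give two different sums depending only on the grouping, without any hardware being out of spec.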

        • Every GPU from nVidia for 3 full hardware generations (Since compute architecture 1.3 - 2009 at least, possibly earlier) has had IEEE754 compliant fp32 and fp64 math. I imagine ATI has been compliant for as long also.

          It is true that code can be compiled using libraries that deliberately compromise the algorithms for transcendental functions to make them faster, but that's 100% the programmer's fault.
      • by rsmith-mac ( 639075 ) on Tuesday May 08, 2012 @07:54PM (#39936569)

        Because they're not using the same encode paths.

        All 3 hardware encode paths - Intel QuickSync, AMD AVIVO, and NVIDIA's CUDA video encoder - are largely black boxes. Programs such as MediaEspresso are basically passing off a video stream to the device along with some parameters and hoping the encoder doesn't screw things up too badly. Each one is going to be optimized differently, be it for speed, more aggressive deblocking, etc. These are decisions built into the encoder and cannot be completely controlled by the application calling the hardware. And you have further complexities such as the fact that only Intel's QuickSync has a hardware CABAC encoder, while AMD and NV do it in software (and poorly since it doesn't parallelize well).

        Or to put this another way, everyone has their own idea on what the best way is to encode H.264 video and what kind of speed/quality tradeoff is appropriate, and that's why everything looks different.

        • CABAC doesn't scale well in massively threaded environments that is true. However there are ways to avoid the issues involved and this really isn't the issue either. It's not the CABAC so much as the bit stream writing for the most part. CABAC scales fine if you parallelize it across slices. Of course no modern encoders make use of multiple slices per field/frame, so it's more of an issue of whether latency is an issue. You can run parallel CABAC encoders by buffering frames.

          The real problem especially when
      • Because it's not the same calculation. All they must do is succeed in outputting h264 compliant files, there are many h264 compliant files that look something like the original source video. Different methods of encoding produce results closer or further from the original, at higher or lower file sizes.

  • by CajunArson ( 465943 ) on Tuesday May 08, 2012 @06:06PM (#39935535) Journal

    The GPU isn't meant to do everything. If it were, there wouldn't be a CPU. Considering the hatred that was poured on Quicksync here, and that Quicksync still produces better quality Transcodes than GPUs while being substantially faster, I don't think we'll be seeing the end of CPU transcoding anytime soon.

    • correct me if I'm wrong but doesn't quicksync use the integrated gpu of sandy/ivy bridge cpus?
      • by CajunArson ( 465943 ) on Tuesday May 08, 2012 @06:13PM (#39935593) Journal

        The quick sync hardware is part of the IGP block but it is specialized hardware specifically geared towards transcoding. For example, it is not using the main GPU pipeline and shader hardware to do the transcoding.

        • by BLKMGK ( 34057 )

          Yeah now go look at the heaping scorn that was heaped on the Intel rep when he approached the x264 guys way late in development. Had they been smart enough to come forward sooner we might have gotten accelerated instructions the x264 guys would have used - not so now it seems. :-(

          • by rsmith-mac ( 639075 ) on Tuesday May 08, 2012 @07:41PM (#39936437)
            Let's be clear here: the x264 guys will never be happy. QuickSync, AMD's Video Codec Engine, and NVIDIA's NVENC all use fixed function blocks. They trade flexibility for speed; it's how you get a hardware H.264 encoder down to 2 mm². There are no buttons to press or knobs to tweak and there never will be, because most of the stuff the x264 guys want to adjust is permanently laid down in hardware. The kind of flexibility they demand can only be done in software on a CPU.
            • Except even when you compare the fixed function H.264 encoders to x264 at those exact settings, x264 still dominates.

              • That's my point though. Fixed function encoders won't be able to match x264 because of a lack of flexibility. They can't be optimized for specific niches, they need to be generalist in order to be decent at everything since the hardware can't be changed.

                • Also let's be clear (Score:5, Informative)

                  by Sycraft-fu ( 314770 ) on Tuesday May 08, 2012 @10:31PM (#39937565)

                  That while the x264 guys aren't wrong to want to keep working on a software encoder that is tweakable, there is nothing wrong with a fixed function hardware encoder for some tasks. Sometimes, speed is what you want and "good enough" is, well, good enough.

                  Like at work I edit instructional videos for our website (I work at a university) using Vegas. I use its internal H.264 encoders, which can be accelerated using the GPU. They are quite zippy, I can generally get a realtime or better encode, even when there is a decent amount of shit going on in the video that needs to be processed (remember that Vegas isn't for video conversion, I'm doing editing, effects, that kind of thing).

                  Now the result is not up to x264 quality, per bit. I could get better quality by mucking around setting up an avisynth frameserver and having x264 do the encoding using some tweaked settings for high quality. However it would be much slower.

                  Not worth it. I'll just encode a reasonably high bitrate video. It is getting fed to Youtube anyhow, so there's a limit to how good it is going to look. The faster hardware assisted encode speeds are worth it.

                  If I was mastering a Blu-ray? Ya I might do the final encode to go off to fabrication with x264 (actually more likely an expensive commercial solution that can generate mastering compliant bitstreams). Spend the extra time to get it as high quality as possible because of all the other work and because it could actually be noticeable.

                  There is room for both approaches.

            • by BLKMGK ( 34057 )

              Actually they seemed to have some ideas for functions that were bound and could be accelerated. However Intel contacted them having apparently already decided what instructions they were going to accelerate and they weren't useful. Additionally, as I recall, shortly after contacting the development team Intel sort of let on that these guys were somehow on board when in fact they were really only just being contacted and weren't. Things didn't go really well after that and I couldn't find any more contact on

        • For example, it is not using the main GPU pipeline and shader hardware to do the transcoding

          No, but it is using it for post-processing such as deinterlacing, noise reduction, etc. The shader pipeline is still involved whenever you need to decode something, be it for QuickSync or for just playing back a video on a PC.*

          *Consequently this is why Intel can't quite match AMD or NV in video playback quality; they lack the shader performance to do as much resource intensive processing

      • by Dahamma ( 304068 ) on Tuesday May 08, 2012 @06:19PM (#39935647)

        Quick Sync uses dedicated HW on the die. Intel's solution that uses their GPU is called Clear Video.

      • by Bengie ( 1121981 )
        It does and it shows Intel's GPU may not be the fastest in all areas, but they're quite well rounded as they are a few factors faster than $300+ GPUs.
    • by Dahamma ( 304068 ) on Tuesday May 08, 2012 @06:24PM (#39935695)

      Actually, recent GPUs *were* meant to do exactly this type of thing, and have been marketed by Nvidia and ATI heavily for this purpose. Of course there needs to be a CPU as well. The CPU runs the operating system and application code, and offloads very specific, parallelizable work to the GPU. This sort of architecture has existed almost as long as modern CPUs have existed.

      And Quick Sync is even less of a general purpose CPU solution than using a GPU. Quick Sync uses dedicated application specific hardware on the die to do its encoding.

    • by PopeRatzo ( 965947 ) on Tuesday May 08, 2012 @06:34PM (#39935763) Journal

      The GPU isn't meant to do everything.

      But since "Graphics Processing" is part of their name, wouldn't you expect them to at least do that?

      Especially considering the price of high-end GPUs is getting up there compared to high-end CPUs.

      • by Anonymous Coward

        Today's "graphics processing units" are essentially designed to render a large number of triangles on a screen in a highly efficient way. If any other graphics operation is thrown at them, they may simply not be designed to execute it well. Just because it has "graphics" in the name it doesn't mean that it can handle every graphics technology perfectly well.

    • by billcopc ( 196330 ) <vrillco@yahoo.com> on Tuesday May 08, 2012 @06:57PM (#39935953) Homepage

      Well see, that's the thing. A GPU is better suited to some kinds of massively parallel tasks, like video encoding. After all, you're applying various matrix transforms to an image, with a bunch of funky floating point math to whittle all that transformed data down to its most significant/perceptible bits. GPUs are supposed to be really really good at this sort of thing.

      My hunch is that the problems we're seeing are caused by two big issues:

      1. lack of standardization across GPU processing technologies. CUDA vs OpenCL vs Quicksync, and a bunch of tag-alongs too. Each one was designed around a particular GPU architecture, so porting programs between them is non-trivial.

      2. lack of expertise in GPU programming. Let's be fair here: GPUs are a drastically different architecture than any PC or embedded platform we're used to programming. While I could follow specs and write an MPEG or H.264 encoder in any high-level language in a fairly straight-forward manner, I can't even begin to envision how I would convert that linear code into a massively parallel algorithm running on hundreds of dumbed-down shader processors. It's not at all like a conventional cluster, because shaders have very limited instruction sets, little memory but extremely fast interconnects. We have a hard enough time making CPU encoders scale to 4 or 8 cores, this requires some serious out-of-the-box thinking to pull off.

      Moving to a GPU virtually requires starting over from scratch. This is a set of constraints that are very foreign to the transcoding world, where the accepted trend was to use ever-increasing amounts of cheaply available CPU and memory, with extensively configurable code paths. The potential is there, but it will take time for the hardware, APIs and developer skills to converge. GPU transcoding should be seen as a novelty for now, just like CPU encoding was 15 years ago when ripping a DVD was extremely error-prone and time-consuming. If you want a quick, low quality transcode, the GPU is your friend. If you're expecting broadcast-quality encodes, you're gonna have to wait a few years for this niche to grow and mature.

      • by fuzzyfuzzyfungus ( 1223518 ) on Tuesday May 08, 2012 @08:07PM (#39936667) Journal
        What strikes me as a bad sign is not so much that the GPU transcoding doesn't necessarily produce massive speed improvements; but that the products tested produce overtly broken output in a fair number of not-particularly-esoteric scenarios.

        Expecting super-zippy magic-optimized speedups on all the architectures tested would be the mark of expecting serious maturity. Expecting a commercially released, GPU-vendor-recommended, encoding package to manage things like "Don't produce an h264 lossy-compressed file substantially larger than the DVD rip source file" and "Please don't convert a 24FPS video to 30FPS for no reason on half the architectures tested" seems much more reasonable.

        I can imagine that the subtle horrors of the probably-makes-the-IEEE-cry differences in floating point implementations, or their ilk, might make producing identical encoded outputs across architectures impossible; but these packages appear to be flunking basic sanity checks, even in the parts of the program that are presumably handled on the CPU (when a substantial portion of iPhone 4S handsets are 16GB devices, letting the 'iPhone 4S' preset return a 22GB file while whistling innocently seems like a bad plan...)
        • by Dputiger ( 561114 ) on Tuesday May 08, 2012 @08:19PM (#39936767)

          Fuzzy,

          You pretty much nailed my problem with the output. :P That's the reason why Arcsoft, with compatibility problems, ultimately ranked above Cyberlink. Arcsoft doesn't do very good work on the Radeon 7950 and it can't handle CUDA, but it at least gets something right. Quick Sync video is very good.

          Cyberlink got nothing right anywhere. And it's the program most-often recommended to reviewers as a benchmark when we want to review GPU encoding.

      • A GPU is better suited to some kinds of massively parallel tasks, like video encoding. After all, you're applying various matrix transforms to an image, with a bunch of funky floating point math to whittle all that transformed data down to its most significant/perceptible bits. GPUs are supposed to be really really good at this sort of thing.

        And there's your problem.

        An h.264 encoder takes a frame of video and splits it up into 16x16 pixel macroblocks. Each macroblock is heavily dependent on those surrounding it (spatially and temporally). For an intra block, a prediction of the content of the current block is made using the decoded content of the top and left blocks. For inter blocks, a previous frame is used as a reference. The decoder has no idea what the original source file looked like, so any predictions made in the encoding process must b
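
The dependency described above limits how much of a frame can be worked on at once. As a rough sketch, assuming each macroblock depends only on its left and top neighbours (as the comment describes; real H.264 intra prediction can also reference the top-left and top-right blocks, which tightens the schedule further), blocks on the same anti-diagonal can run in parallel. The function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into steps; blocks within a step do not depend on each other."""
    steps = []
    for d in range(mb_cols + mb_rows - 1):
        # Block (x, y) lands in step x + y, one step after its left and top neighbours.
        step = [(x, d - x) for x in range(mb_cols) if 0 <= d - x < mb_rows]
        steps.append(step)
    return steps


# A 1920x1080 frame in 16x16 macroblocks; the height is padded up to 1088.
mb_cols = 1920 // 16
mb_rows = (1080 + 15) // 16
steps = wavefront_schedule(mb_cols, mb_rows)

total_blocks = sum(len(s) for s in steps)
widest_step = max(len(s) for s in steps)
print(len(steps), widest_step, total_blocks)
```

Under these assumptions a 1080p frame has 8160 macroblocks but never more than 68 that can be processed at the same time, which is far fewer than the number of shader units on a large GPU.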

  • by carlhaagen ( 1021273 ) on Tuesday May 08, 2012 @06:12PM (#39935581)
    ...since the results of OpenCL code are static across GPUs rather than being an arbitrary output.
    • by Mia'cova ( 691309 ) on Tuesday May 08, 2012 @07:35PM (#39936355)

      Only the more modern GPUs support it. And of those, there are still different levels of support. Even if it's supported, you would probably get much better perf on an nvidia card by using CUDA for example. So in today's world, you can't just use an OpenCL-powered encoder, it depends on what hardware you have.

  • by wbr1 ( 2538558 ) on Tuesday May 08, 2012 @06:16PM (#39935625)
    There is a screwed up graph on page two where they use the same graphic twice, and the caption describes aspects of the one that is missing. I really wanted to see the comparison too. You would think in an article of that size and scope someone would be responsible for checking layout as well as copy. It is no wonder we are losing to China. Their English may be worse, but their work ethic and attention to detail are possibly better.
  • I think that the real benefit of GPUs for transcoding will be seen once people start making new as-yet unimagined encoding schemes that are designed to do data parallel tasks that wouldn't even be considered on a traditional CPU.
    • I think that the real benefit of GPUs for transcoding will be seen once people start making new as-yet unimagined encoding schemes that are designed to do data parallel tasks that wouldn't even be considered on a traditional CPU.

      Maybe by then, "traditional" CPUs will be different from the ones we have right now.

    • by Hentes ( 2461350 )

      Encoding should be trivial to parallelize, you just cut up a movie to a sequence of n clips and encode them independently.

        Encoding should be trivial to parallelize, you just cut up a movie to a sequence of n clips and encode them independently.

        Because the structure of modern codecs is based on Groups of Pictures (GOP), you'd have to run two passes on the video, with the first pass determining where the keyframes go. Although this is commonly done by people who don't have a good understanding of video encoding, the more efficient way is to just run a single pass using a constant quality (which is not the same as a constant quantizer). Then, on that single pass, you parallelize the operations on each frame. This also results in less disk thrashi
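
For the clip-splitting approach discussed here, the cuts have to fall on GOP boundaries, which is why keyframe positions must be known first. A minimal sketch of that splitting step follows, assuming a sorted list of keyframe indices is already available from an earlier pass; the function name and the sample keyframe positions are invented for illustration.

```python
def split_at_keyframes(keyframes, total_frames, workers):
    """Return (start, end) frame ranges, each beginning on a keyframe."""
    # keyframes: sorted frame indices where a GOP starts; the first must be 0.
    target = total_frames / workers  # ideal chunk length in frames
    ranges = []
    start = keyframes[0]
    for kf in keyframes[1:]:
        # Close the current chunk at the first keyframe past the target length,
        # keeping the last chunk open so it absorbs the remainder.
        if kf - start >= target and len(ranges) < workers - 1:
            ranges.append((start, kf))
            start = kf
    ranges.append((start, total_frames))
    return ranges


# Hypothetical keyframe positions for a 1440-frame clip.
keyframes = [0, 240, 310, 550, 790, 1000, 1240]
chunks = split_at_keyframes(keyframes, 1440, 3)
print(chunks)
```

The chunks come out uneven because they can only end where a keyframe happens to sit, which is part of the cost of this approach compared with parallelizing inside each frame.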

        • What about slice-based parallel processing?
          Correct me if I'm wrong, (I wouldn't be surprised if it turns out I am...) but doesn't x264 have an option to do slice-based parallel processing? As I understand it, if there are 4 running threads, each frame is chopped into four quadrants with a little edge room buffer in each slice, then independently encoded, then glued back together at the other end? That's how I remember that option being described in some forum or other. Not the standard multi-threading, b

          • What about slice-based parallel processing?

            According to the wiki [project357.com], you're still better off using normal multithreading even if you are using slices (as you must if you are encoding for Blu-Ray).

  • by Anonymous Coward on Tuesday May 08, 2012 @06:38PM (#39935781)
    Here's a link [extremetech.com] to the article in 1 page.
  • The real problem is a lack of a common API for encoding regardless of GPU/CPU, which leads to vendor-specific implementations with varying degrees of quality. The most efficient way to pretty much do anything is a dedicated HW block (from both perf and power point of view), so there is no question that there is value in encoding using dedicated hardware, but the software has to catch up.
  • that encoders inexplicably insist on codecs and wrappers that predate the millennium? The problem with transcoding is that it exists at all. Strongarm the holdout encoders into using h264 or mp4v with mp4 wrappers, and transcoding will be like... well, like anything no one does anymore.
    • The problem with transcoding is that it exists at all. Strongarm the holdout encoders into using h264 or mp4v with mp4 wrappers, and transcoding will be like... well, like anything no one does anymore.

      There will always be transcoding, since you can't fit the 20GB H.264 stream from a Blu-Ray on a phone. And, why would you want to? Resize the 1920x1080 to 800x480 or so, and it will look great on every phone.

      For tablets or other devices with more resolution, you still don't need all the bits that most Blu-Ray encodes use. Most are essentially constant bit rate around 25-30Mbps. For movies that are essentially "talking heads" (courtroom dramas like A Few Good Men and Presumed Innocent are the best exampl
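
The bitrate figures in this comment translate directly into file sizes. The sketch below is simple arithmetic in decimal units (1 GB = 1000 MB), ignoring audio and container overhead; the helper names are hypothetical, the 25 Mbps figure comes from the comment above, and the 1 GB per hour target comes from the story summary.

```python
def stream_size_gb(bitrate_mbps, minutes):
    # Mbit/s * seconds = Mbit; / 8 = MB; / 1000 = GB (decimal units).
    return bitrate_mbps * minutes * 60 / 8 / 1000


def bitrate_for_size_mbps(size_gb, minutes):
    # Inverse of the above: average video bitrate needed to hit a target size.
    return size_gb * 1000 * 8 / (minutes * 60)


bluray_size = stream_size_gb(25, 120)         # two-hour movie at 25 Mbps
target_bitrate = bitrate_for_size_mbps(1, 60)  # 1 GB per hour target
print(bluray_size, round(target_bitrate, 2))
```

A two-hour movie at 25 Mbps works out to about 22.5 GB, while the 1 GB per hour target corresponds to roughly 2.2 Mbps, about a tenth of the Blu-ray rate.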

      • There will always be transcoding, since you can't fit the 20GB H.264 stream from a Blu-Ray on a phone.

        You are thinking about this all wrong. You think you own that movie... but you don't, you own a license. That license entitles you to transcode... if you want to go ahead and do work that, chances are, has already been done, and is constantly being done for you by others that create far better quality transcodes. The obtuse talk about how great their hardware is, and how fast they can rip their movies... but the astute keep all their movies backed up on the Internet in every format and resolution imaginable

        • by jedidiah ( 1196 )

          > You are thinking about this all wrong. You think you own that movie... but you don't, you own a license.

          Your attempt to spread that pro-corporate propaganda simply won't work here. We know better.

        • if you want to go ahead and do work that, chances are, has already been done, and is constantly being done for you by others that create far better quality transcodes.

          In general, there are two kinds of rips available on torrents: too large so that quality stays high, and really small to fit on phones or similar devices. It's very hard to find something in-between, where the encode is close to transparent, but as small as possible. Most rips still use two-pass average bitrate mode, which is basically "find the CRF value that gives me this bitrate", which is the wrong way to maintain quality. The right way is to pick a CRF value and let x264 figure out the bitrate n
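
The two approaches contrasted here map onto different x264 command lines. The sketch below only builds the argument lists and does not run anything; the helper names, the file names, the CRF value of 20 and the 4000 kbps bitrate are assumptions for illustration, not recommendations from the comment.

```python
import os


def crf_command(src, dst, crf=20, preset="slower"):
    # Single pass: fix the quality level and let x264 choose the bitrate.
    return ["x264", "--preset", preset, "--crf", str(crf), "-o", dst, src]


def two_pass_commands(src, dst, bitrate_kbps, preset="slower"):
    # Two passes: fix the average bitrate and let x264 distribute it.
    base = ["x264", "--preset", preset, "--bitrate", str(bitrate_kbps)]
    return [
        base + ["--pass", "1", "-o", os.devnull, src],  # analysis pass, output discarded
        base + ["--pass", "2", "-o", dst, src],         # final encode
    ]


single = crf_command("input.y4m", "output.mkv")
passes = two_pass_commands("input.y4m", "output.mkv", 4000)
print(" ".join(single))
for cmd in passes:
    print(" ".join(cmd))
```

Either list could be handed to `subprocess.run` on a machine with x264 installed; the CRF variant needs only one trip through the source.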

  • So basically the article says GPU rendering is bad, but QuickSync is good enough for prime time.

    Duh. QS is made to do a very specific task (encoding/decoding video) and it can do it super fast at decent quality rates. There's always the tradeoff of quality vs. encoding time. With QS, I can rip an entire 50GB Blu-Ray in 12 minutes to a 1080p MKV @ 8000kbps. It takes about 16 hours doing the same task with a normal x264 encoder such as Handbrake even though the quality is a little bit better. Is it wo
    • With QS, I can rip an entire 50GB Blu-Ray in 12 minutes to a 1080p MKV @ 8000kbps. It takes about 16 hours doing the same task with a normal x264 encoder such as Handbrake even though the quality is a little bit better.

      Even using the "slower" preset on x264, 1080p encodes take about 3 times as long as the movie, so no more than 8 hours. This is on a slower CPU than you use (since you have QuickSync), and the encodes end up at about 4Mbps.

      If I used a less-intensive preset, I would get encodes at about the same bitrate as yours, but taking just a little more than the running time of the movie to do it. QuickSync may be even faster, but 3 hours to encode most movies is good enough.

      With enough bitrate, anything looks good.

      In general, this is true, but very poor encoders can st

    • Re: (Score:2, Informative)

      by Dputiger ( 561114 )

      No, the article says that GPU encoding software runs the gamut from outright awful to simply broken and limited. Quick Sync video is great in Arcsoft, terrible in Cyberlink, unsupported in Xilisoft, and looks decent in MediaCoder. Check the GTX 580's output in Xilisoft for plenty of proof that no, you don't need insane bitrates to create decent-looking output.

    • Ok so i have a Sandy Bridge K processor. What else do i need to make QS work?
  • That's why video professionals and tv stations rely on hardware based transcoding, and these solutions tend to be expensive. There should be many systems that encode H264 videos really fast, something like this: http://www.blackmagic-design.com/products/teranex/
     

    • by nabsltd ( 1313397 ) on Tuesday May 08, 2012 @08:29PM (#39936857)

      That's why video professionals and tv stations rely on hardware based transcoding, and these solutions tend to be expensive.

      x264 can encode 1080p in realtime on modern Intel CPUs (Sandy Bridge, etc.) with pretty much as good a quality for the same bitrate as most hardware solutions. For non-HD, x264 just smokes hardware, as it can do better than realtime encodes at very high quality on those same CPUs.

    • by jedidiah ( 1196 )

      Fascinating. The fact that hardware based transcoding is a disaster is why "professionals" use hardware based transcoding?

      That simply makes no sense.

    • by EdZ ( 755139 )
      The hardware based transcoding is not necessarily better (see: the dire state of BBC's terrestrial HD broadcasts compared to the earlier 'test' HD broadcasts, and their stonewalling whenever people call them out on it and explain how to improve it). Hardware transcoders are used
      1) Because they're guaranteed real-time, so you can pipe video through and just factor in a set time delay
      2) Designed to be robust, so you don't need to worry about overheating, or the encoder choking on a certain bit of video

      If
  • I use DVDFab to rip DVDs using my GPU, and it positively flies. Most 2 hr movies take around 10 minutes to convert to H.264. It doesn't support VBR, but outside of that I've never had trouble with it. The resulting video quality is quite good as well (except with files that need deinterlacing, but that's always a problem). I think the person who wrote the articles just didn't try the right programs.

    • Have you tried x264 with --preset veryfast? My experience is that x264 is able to match a GPU encoder's speed while still giving significantly better quality. I'd only bother with a GPU encoder if I had a terrible CPU (netbook, phone?).
  • by TheSync ( 5291 ) on Tuesday May 08, 2012 @11:05PM (#39937729) Journal

    Please see Elemental Technologies [elementalt...logies.com] GPU-accelerated H.264 transcodes.

    • Considering the article ruled out some software, because it was considered too difficult to use, I suspect Elemental Technologies' software would be ruled out due to cost.

  • Surprise, surprise, I have the feeling that most of you haven't actually read the article. The article is not arguing that GPUs are inherently flawed. Also, the article is not an NVIDIA-vs-AMD competition. Rather, the author tests software on each platform. It's the software that is bad, not the GPUs themselves. For instance, the NVIDIA GPU does quite well with Arcsoft and Xilisoft; this wouldn't be possible if GPUs were somehow broken for transcoding. After all, as others have pointed out here, floati

  • After waiting and trying and waiting and trying and waiting and trying... finally conversion to 6GB mkv with full DTS works reliably. I converted my library of 600+ blu rays over the last few weeks.

    Using the GPU I get about 70fps, and I've watched about 15 of the movies without noticing any problems at all.

    I flat out gave up with trying to support my fricking PS3.