Which Open Source Video Apps Use SMP Effectively?

Posted by kdawson on Wednesday July 23, 2008 @04:54PM from the on-the-one-core-on-the-other-core dept.

ydrol writes "After building my new Core 2 Quad Q6600 PC, I was ready to unleash video conversion activity the likes of which I had not seen before. However, I was disappointed to discover that a lot of the conversion tools either don't use SMP at all, or don't balance the workload evenly across processors, or require ugly hacks to use SMP (e.g. invoking distributed encoding options). I get the impression that open source projects are a bit slow on the uptake here? Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)"

ffmpeg (Score:5, Informative)

by bconway ( 63464 ) writes: on Wednesday July 23, 2008 @04:55PM (#24310803) Homepage

Use the -threads switch.
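
A minimal Python sketch of passing that switch from a script. The file names and the helper name `ffmpeg_command` are made up for illustration, and whether a particular encoder actually honours `-threads` depends on the codec and the ffmpeg build.

```python
import os

def ffmpeg_command(src, dst, threads=None):
    """Build an ffmpeg argument list that requests multi-threaded encoding."""
    if threads is None:
        # cpu_count() can return None, so fall back to a single thread.
        threads = os.cpu_count() or 1
    # -threads sits before the output file so that it applies to the encoder.
    return ["ffmpeg", "-i", src, "-threads", str(threads), dst]

print(" ".join(ffmpeg_command("input.avi", "output.mpg", threads=4)))
```

The list can be handed to `subprocess.run` as-is; building it separately just makes it easy to inspect first.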

- Re: (Score:2)
  
  by pak9rabid ( 1011935 ) writes:
  
  Agreed. ffmpeg worked quite nicely for me during my DVD-ripping heyday. Although, it seems that it would rip audio and video in separate threads. While an improvement over the traditional, linear way of doing things, I would still see 1 CPU maxed out (video encoding), while the CPU encoding audio was only at about 1/3 capacity.
- Re:ffmpeg (Score:5, Informative)
  
  by morgan_greywolf ( 835522 ) * writes: on Wednesday July 23, 2008 @04:58PM (#24310873) Homepage Journal
  
  Similarly, mencoder supports threads=# where # is something between 1 and 8.
  
  - Re: (Score:2, Insightful)
    
    by Z00L00K ( 682162 ) writes:
    
    And it may or may not be useful to actually run more than one thread per kernel. It depends on the encoder and application how many threads you shall run, so the best is to test with 1, 2 and 4 threads per kernel.
    - Re: (Score:3, Informative)
      
      by sp332 ( 781207 ) writes:
      
      And it may or may not be useful to actually run more than one thread per kernel. It depends on the encoder and application how many threads you shall run, so the best is to test with 1, 2 and 4 threads per kernel.
      Isn't that per-core, not per-kernel?
      - Re: (Score:2)
        
        by Z00L00K ( 682162 ) writes:
        
        Of course... Not my best day today! Maybe I shall think more of that pillow...
- Re:ffmpeg (Score:5, Insightful)
  
  by Albanach ( 527650 ) writes: on Wednesday July 23, 2008 @05:03PM (#24310949) Homepage
  
  Or just convert 2 videos at once, or 4 for a quad core etc. They did suggest they have lots to convert, and it's a pretty easy way to get all available cores working hard.
  
  - Re: (Score:3, Informative)
    
    by i.r.id10t ( 595143 ) writes:
    
    Yup, with separate disks to work on to remove (mostly) the disk i/o contention, just let each process run happily away.
  - Re: (Score:2)
    
    by init100 ( 915886 ) writes:
    
    That's exactly what I do. I also wrote a scheduler in Python that starts new jobs when the previous ones are completed. It keeps the number of running encoding processes equal to the number of processors/cores.
    To get the optimal scheduling order, it figures out the length of each input file (using midentify from the mplayer/mencoder distribution), and then sorts the jobs so that the longest jobs will be processed first (it assumes that processing time is roughly proportional to input file length (in seconds
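
The scheduling idea described above can be sketched roughly as follows. This is not the poster's script: it only simulates the order in which jobs would be handed out, taking job lengths as given numbers (the real thing would get them from midentify) and assuming, as the post does, that processing time is proportional to input length.

```python
import heapq

def simulate_longest_first(durations, workers):
    """Return (finish_time, assignment) for longest-job-first scheduling.

    Jobs are sorted by descending duration and each one goes to whichever
    worker becomes free first, i.e. a new job starts when an old one ends.
    """
    # Each heap entry is (time at which the worker is free, worker index).
    free_at = [(0.0, w) for w in range(workers)]
    heapq.heapify(free_at)
    assignment = {w: [] for w in range(workers)}
    for d in sorted(durations, reverse=True):
        t, w = heapq.heappop(free_at)
        assignment[w].append(d)
        heapq.heappush(free_at, (t + d, w))
    finish = max(t for t, _ in free_at)
    return finish, assignment

finish, plan = simulate_longest_first([90, 30, 60, 45, 20, 15], workers=2)
print(finish, plan)
```

With these six made-up lengths (260 minutes in total) and two workers, the simulated finish time is 135 against an ideal of 130; starting the long jobs first keeps a long job from being left to run alone at the end.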
    - Re:ffmpeg (Score:5, Interesting)
      
      by Tanktalus ( 794810 ) writes: on Wednesday July 23, 2008 @06:29PM (#24311977) Journal
      
      That sounds like a lot of work... I just used make:
      %.mpg: %.avi
      	tovid -ntsc -dvd -noask -ffmpeg -in "$<" -out "$(basename $@)"
      all: $(subst .avi,.mpg,$(wildcard */*.avi))
      
      Then I just ran "make -j4". All four processors working like mad, with a minimum of effort.
      (You may need to change the wildcard for your own scenario.)
      
      - Re: (Score:3, Insightful)
        
        by TheLink ( 130905 ) writes:
        
        Ah but figuring out "make" might require too much wetware CPU time for most people ;).
        
        "Why is it not working? Oops messed up tabs and spaces", etc.
      - Re:ffmpeg (Score:5, Informative)
        
        by ksheff ( 2406 ) writes: on Wednesday July 23, 2008 @11:21PM (#24314383) Homepage
        
        That's the point. If the xvid encoder is single threaded, then to keep all the cores busy, one must run multiple instances of ffmpeg with each one encoding a different file. For the given Makefile, that is what make will do when the -j switch is used.
        
        Re:ffmpeg (Score:4, Informative)
        
        by QuoteMstr ( 55051 ) writes: <dan.colascione@gmail.com> on Thursday July 24, 2008 @01:21AM (#24315047)
        
        You're still missing the OP's point. Let me spell it out for you:
        Say you have four videos to encode, and four cores.
        1) You can either use one core at a time and encode one video at a time. Let's say that takes time T.
        2) You can encode one video at a time, but use all four cores while doing it. Your total time is T/4.
        3) You can encode four videos at a time, one on each core. Your total time is T/4.
        The OP was advocating strategy #3. It's a fine approach.
        
  - - Re: (Score:3, Insightful)
      
      by Albanach ( 527650 ) writes:
      
      I thought about that but, seriously, transcoding is usually CPU limited. I'd really suspect it'd take a lot of simultaneous encoding to make it I/O bound.
      - Re:ffmpeg (Score:4, Informative)
        
        by networkBoy ( 774728 ) writes: on Wednesday July 23, 2008 @08:57PM (#24313319) Journal
        
        I hit I/O throttling when I do the following:
        * rip 2 dvds (two DVDR Drives)
        * transcoding previous DVD rips to XVID
        * Moving completed rips to server over 1 Gbps Ethernet link.
        At this point I can see CPU load start to drop as PCI bus I/O saturates.
        At no point do I hit disk I/O or memory limits.
        Disks are non-RAID non-striped, but rips are to separate disks (thus DVDA rips to HDA DVDB to HDB) and server upload pulls from whatever disk is not currently transcoding (transcode file on HDA, when done start transcode on HDB and move file from HDA).
        -nB
        
        Re:ffmpeg (Score:4, Informative)
        
        by MadnessASAP ( 1052274 ) writes: <madnessasap@gmail.com> on Wednesday July 23, 2008 @10:07PM (#24313901)
        
        If I may offer a suggestion: I'm not too sure what your setup is, but on mine I have 2 DVD drives, each on a separate IDE bus, and 2 SATA drives (also on separate buses). I rip from the DVD to drive 1 and encode from drive 1 to drive 2. Of course it all depends on a variety of factors, but that arrangement certainly helped.
        
        Re:ffmpeg (Score:5, Informative)
        
        by Nikker ( 749551 ) * writes: on Wednesday July 23, 2008 @10:18PM (#24313973)
        
        Running multiple cores with an IDE interface is going to kill you regardless, because you are only encoding in memory, not really storing much there. Basically you have a cap of about 40MB/s for anything larger than about 40MB.
        
- Re: (Score:2, Interesting)
  
  by fm6 ( 162816 ) writes:
  
  So why is threading off by default? In a CPU-intensive application like this, multithreading always makes sense, even on a single-core system.
  - Re:ffmpeg (Score:5, Informative)
    
    by m0rph3us0 ( 549631 ) writes: on Wednesday July 23, 2008 @05:47PM (#24311515)
    
    No, it doesn't. The only time you want to use multi-threading in a single-CPU environment is when asynchronous methods for IO are unavailable or the code would be too difficult to re-architect to use asynchronous IO. If the application is seriously IO bound, threads can even make the situation worse by causing random IO patterns.
    Ideally, the number of threads a program uses should be no more than the number of processors available. Otherwise, you are wasting time context switching instead of processing.
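
As a rough illustration of that rule of thumb, here is a small Python helper that caps a requested thread count at the number of processors. The name `worker_count` is hypothetical; the result could be passed as `max_workers` to something like `concurrent.futures.ProcessPoolExecutor`.

```python
import os

def worker_count(requested=None):
    """Cap a requested thread count at the number of available processors."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    if requested is None:
        return cores
    # Never go below one worker or above the processor count.
    return max(1, min(requested, cores))

print(worker_count(), worker_count(1))
```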
    
    - Re: (Score:2)
      
      by TeacherOfHeroes ( 892498 ) writes:
      
      Ideally, the number of threads a program uses should be no more than the number of processors available. Otherwise, you are wasting time context switching instead of processing.
      An exception to this kind of rule should really be made for graphical user interfaces. In the case of GUI applications, time wasted in context switching is less important than keeping the UI responsive and the user happy.
      Any kind of heavy lifting (IO-blocking or otherwise) should really be done on a different thread than the one that is responsible for handling the user interface. This allows the user interface to stay responsive, providing the user with feedback (progress bar, time estimate, reassurance th
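
A minimal sketch of that pattern in Python, with the "UI" reduced to a loop that collects progress updates. `heavy_job` is a stand-in for real encoding work, and a real GUI would poll the queue from its event loop (for example with `get_nowait`) rather than block on it.

```python
import queue
import threading

def heavy_job(steps, progress):
    """Stand-in for an encode: reports progress after each step."""
    for i in range(1, steps + 1):
        # ... real work (encoding one chunk) would happen here ...
        progress.put(i / steps)
    progress.put(None)  # sentinel: the job is finished

def run_with_progress(steps):
    """Run heavy_job on a worker thread and collect its progress updates."""
    progress = queue.Queue()
    worker = threading.Thread(target=heavy_job, args=(steps, progress))
    worker.start()
    seen = []
    while True:
        update = progress.get()
        if update is None:
            break
        seen.append(update)  # e.g. redraw a progress bar here
    worker.join()
    return seen

print(run_with_progress(4))
```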
      - Re: (Score:2, Insightful)
        
        by sick_soul ( 794596 ) writes:
        
        Just want to inform you that neither threads nor any other multiprogramming mechanisms are necessary for responsive user interfaces, and that IO multiplexing in particular does not require threads at all.
        You can solve both with threads, but you don't have to. And in most common cases it is much better not to; it seems that threads continue to be one of the most misused and misunderstood programming concepts.
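
For the IO multiplexing part, a small single-threaded sketch using Python's standard `selectors` module. The socket pairs stand in for whatever streams a real program would watch, and the stream names are made up.

```python
import selectors
import socket

def multiplex(messages):
    """Read several streams to completion from one thread."""
    sel = selectors.DefaultSelector()
    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=name)
        writer.sendall(payload)
        writer.close()  # closing signals end-of-stream to the reader
    received = {name: b"" for name in messages}
    open_readers = len(messages)
    while open_readers:
        # select() blocks until at least one registered socket is readable.
        for key, _ in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:  # an empty read means the peer has closed
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_readers -= 1
    sel.close()
    return received

print(multiplex({"audio": b"aaa", "video": b"vvvv"}))
```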
        
        Re: (Score:3, Insightful)
        
        by slimjim8094 ( 941042 ) writes:
        
        Perhaps. But threads are far more versatile - if they're done well.
        So our video app has a sound-processing thread, a video processing thread, and a UI thread. If it's implemented well (don't read or write twice, have a common buffer), it'll run with the same or near performance as a one-threaded program on a one-processor/core system.
        But on a multicore/processor system no extra work is needed to take advantage of the cores. If we have three cores, it'll run automatically across cores for a massive performan
        
        Re: (Score:3, Interesting)
        
        by QuoteMstr ( 55051 ) writes:
        
        Yes, you can use threads well. But with less effort (taking into account synchronization and debugging), you can make the asynchronous tasks independent programs instead of threads. Your video and sound processing threads sound like perfect candidates for being made into independent programs.
        A task being an independent program affords several advantages. For example, it's easier to test an independent program, especially in a test harness. An independent program can be run by itself. And it's very clear wha
    - - Re:ffmpeg (Score:5, Insightful)
        
        by m0rph3us0 ( 549631 ) writes: on Wednesday July 23, 2008 @06:06PM (#24311757)
        
        On a two processor system this would result in multi-threading being off.
        
        Re:ffmpeg (Score:5, Funny)
        
        by maglor_83 ( 856254 ) writes: on Wednesday July 23, 2008 @07:35PM (#24312655)
        
        On a single core system this would result in not being able to run anything!
        
  - Re: (Score:2)
    
    by hedwards ( 940851 ) writes:
    
    Threading is sometimes broken on the OS, or sometimes it varies between revisions.
    FreeBSD for instance has been in the middle of changes to the threading system and there was a bug in the 6.x branch which wasn't in either 7.x or current. Defaulting to off if you're not sure how well threading is going to be handled is probably better than defaulting to something that could be broken.
    Anybody who knows that they need threading and decides to turn it on is likely to know whether or not threading is broken. Or
  - - Re: (Score:3, Informative)
      
      by Anonymous Coward writes:
      
      If thread 1 is doing work while thread 2 is blocked (io, semaphores, etc), then multithreading will be faster.
- Re:ffmpeg (Score:5, Informative)
  
  by ydrol ( 626558 ) writes: on Wednesday July 23, 2008 @06:26PM (#24311931)
  
  Darn, I forgot a minor detail in my question. I was really asking about the various front-end apps (dvd::rip, k9copy, acidrip etc). I got the impression that none of them seem to notice they are running on an SMP platform and pass the necessary switches to the backend by default.
  Some may argue this is a good thing, but for the time being SMP is the way forward for faster processing, as MHz has maxed out in consumer PCs. So when people start buying octo-core CPUs, they don't expect them to run at 1/8th speed by default.
  I was also being a bit lazy. I could have checked up on each app in turn, but I asked /. instead.
  
  - Re: (Score:3, Informative)
    
    by elgaard ( 81259 ) writes:
    
    I have not tried it. But e.g. k9copy uses mencoder.
    So if you just put something like "x264encopts=threads=auto" in your mencoder.conf file it might work also from k9copy.
    k9copy also has a settings menu where you can tune options to mencoder for various codecs.
    - AcidRip patches (Score:3, Informative)
      
      by ydrol ( 626558 ) writes:
      
      Cheers. I also found these Acidrip patches. [ubuntuforums.org] PS In case anyone missed it, I really meant to ask about the front end GUI/script tools rather than the engines. PPS I'm actually using Mandriva.
- - Re:ffmpeg (Score:5, Informative)
    
    by mweather ( 1089505 ) writes: on Wednesday July 23, 2008 @05:22PM (#24311211)
    
    Apple computers ARE PCs. They coined the damn term.
    
    - Re: (Score:2, Interesting)
      
      by civilizedINTENSITY ( 45686 ) writes:
      
      strange that quoting history correctly and in context gets you modded flamebait...
    - Re:ffmpeg (Score:5, Informative)
      
      by VGPowerlord ( 621254 ) writes: on Wednesday July 23, 2008 @07:16PM (#24312467)
      
      True, but in most contexts, "PC" is the shortened form of IBM-compatible PC (which is really outdated), and usually just stands for Windows these days.
      
    - Re:ffmpeg (Score:5, Insightful)
      
      by hedwards ( 940851 ) writes: on Wednesday July 23, 2008 @08:08PM (#24312957)
      
      Apple has spent a lot of time and money convincing everybody that they don't sell PCs, they sell Macs. I'm not sure what the point of arguing with both the general public as well as Apple is.
      At this point, the term PC does not include Apple computers. It's a change to the definition which happens when the vast majority of people decide amongst themselves that the definition should change.
      In terms of the topic at hand, most video apps really should be capable of using multiple cores; tasks of this sort are quite easy to finish in parallel, either by doing every n frames or by subdividing the image into a number of regions which can be completed separately and joined at the end before writing the frame to disk.
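
The region-splitting idea can be sketched in Python like this. The frame is just a list of pixel rows and `row_filter` is a placeholder for real per-row work. Threads are used only to show the structure: in CPython, pure-Python work like this will not actually run faster on several cores.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(height, parts):
    """Split `height` rows into `parts` contiguous (start, stop) bands."""
    base, extra = divmod(height, parts)
    bands, start = [], 0
    for i in range(parts):
        # The first `extra` bands take one leftover row each.
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands

def process_frame(frame, parts, row_filter):
    """Filter each band separately, then join the bands back in order."""
    def work(band):
        start, stop = band
        return [row_filter(row) for row in frame[start:stop]]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(work, split_rows(len(frame), parts)))
    return [row for band in results for row in band]

frame = [[1, 2], [3, 4], [5, 6]]
print(process_frame(frame, 2, lambda row: [p * 2 for p in row]))
```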
      
    - Re:ffmpeg (Score:5, Insightful)
      
      by 3vi1 ( 544505 ) writes: on Wednesday July 23, 2008 @08:30PM (#24313153) Homepage Journal
      
      No - HP did (for their calculators), way before there "was" an Apple.
      Also, I don't even think Apple marketing would agree with you - or they wouldn't have "I'm a Mac... and I'm a PC" adverts.
      
  - Re: (Score:2, Informative)
    
    by SimonTheSoundMan ( 1012395 ) writes:
    
    Yeah, Compressor is pretty damn good. It doesn't just use all your cores, but it can also distribute the workload to other machines on a network. Whole render farms.
    
    Logic Node is somewhat better, however it only does audio. We have two eight-core Mac Pros and three Xserve machines in our studio. The Xserve machines will be binned when the new version of Logic Pro supporting GPU processing of audio is out.
  - - Re: (Score:3, Interesting)
      
      by Fred_A ( 10934 ) writes:
      
      Is creating a copy of my DVD for my Cowon D2 piracy now ?
      Legally it probably is in many places since I'm probably not even allowed to read them on my PC (Linux), but still...
      - Re: (Score:2, Informative)
        
        by sexconker ( 1179573 ) writes:
        
        If you're making another copy of it to play on another device (format shifting or whatever bullshit term they used), yeah, you can probably get sued for it if some asshat wants to target you.
        Illegal? No.
        Wrong? Hell no.
        My point is that encoding apps often exist separately from editing apps (such as FCP). This is due in large part to piracy, especially when talking about free/open encoders and sites like doom9.
        Pirates are not concerned with editing/creating, they're concerned with copying and converting/co
transcode, of course! (Score:5, Informative)

by morgan_greywolf ( 835522 ) * writes: on Wednesday July 23, 2008 @04:55PM (#24310813) Homepage Journal

transcode [transcoding.org] uses separate processes for everything.

x264 (Score:3, Insightful)

by Anonymous Coward writes: on Wednesday July 23, 2008 @04:58PM (#24310869)

x264 uses slices and scales pretty well across multiple cores. I use it on Windows via megui, but you could easily use it in Linux as well. You could use mencoder to pipe out raw video to a fifo and use x264 to do the actual conversion, for instance.

- Beat me to it! (Score:5, Informative)
  
  by BLKMGK ( 34057 ) writes: <{morejunk4me} {at} {hotmail.com}> on Wednesday July 23, 2008 @05:05PM (#24310985) Homepage Journal
  
  x264 via meGUI from Doom9 is what I use to compress HD-DVD and BD movies - also on a quad core. I have some tutorials posted out and about on how I'm doing it. Near as I can tell you cannot dupe the process on Linux due to the crypto - Slysoft's AnyDVD-HD is needed.
  Playback - I use XBMC for Linux. It is also SMP enabled using the ffmpeg CABAC patch. The developers of this project have been VERY aggressive at taking cutting edge improvements to the likes of ffmpeg and incorporating them into the code. Since Linux has no video acceleration of H.264, SMP really helps on high bitrate video!
  
VisualHub... (Score:4, Informative)

by e4g4 ( 533831 ) writes: on Wednesday July 23, 2008 @04:59PM (#24310881)

...makes excellent use of multiple cores. It is however Mac-only. Interestingly, what it does is split a file into chunks and spawns multiple ffmpeg processes to do the conversion. Which is to say, perhaps you can do some (relatively simple) scripting with ffmpeg that will do the job.
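
A rough Python sketch of that kind of scripting: it only builds one ffmpeg command per time slice. How VisualHub actually picks its split points is not stated here, so the equal-length slices, the output names and the helper name `chunk_commands` are all assumptions, and cuts made this way may not land on keyframes.

```python
def chunk_commands(src, duration, chunks):
    """Build one ffmpeg command per equal-length time slice of `src`."""
    length = duration / chunks
    commands = []
    for i in range(chunks):
        start = i * length
        out = f"chunk{i:03d}.mpg"  # hypothetical output naming
        # -ss before -i seeks in the input; -t limits the output duration.
        commands.append(["ffmpeg", "-ss", f"{start:.3f}", "-i", src,
                         "-t", f"{length:.3f}", out])
    return commands

for cmd in chunk_commands("movie.avi", duration=7200, chunks=4):
    print(" ".join(cmd))
```

Each command could then be started with `subprocess.Popen`, one per core, with the resulting pieces joined afterwards.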

- - Re: (Score:2, Informative)
    
    by phuul ( 997836 ) writes:
    
    So is ffmpeg not open source? It uses the LGPL license and from their license FAQ:
    "FFmpeg is licensed under the GNU Lesser General Public License (LGPL). However, FFmpeg incorporates several optional modules that are covered by the GNU General Public License (GPL), notably libpostproc and libswscale. If those parts get used the GPL applies to all of FFmpeg. Read the license texts to learn how this affects programs built on top of FFmpeg or reusing FFmpeg. You may also wish to have a look at the GPL FAQ. "
  - Re:Which part of Open Source didn't you get? (Score:5, Informative)
    
    by pushing-robot ( 1037830 ) writes: on Wednesday July 23, 2008 @05:20PM (#24311193)
    
    OP is asking for open source tools. You cited a commercial one that doesn't provide source.
    VisualHub (the front-end app) may be closed, but ffmpeg is LGPL.
    And the GP was suggesting using ffmpeg, not VisualHub.
    
  - Re: (Score:2)
    
    by mweather ( 1089505 ) writes:
    
    And told him how it uses an open source program in an easily-replicatable way.
x264 and avisynth (Score:3, Informative)

by PhrostyMcByte ( 589271 ) writes: <phrosty@gmail.com> on Wednesday July 23, 2008 @05:01PM (#24310913) Homepage

x264 and avisynth can make pretty decent use of threads. check out meGUI.

- Re: (Score:2)
  
  by figleaf ( 672550 ) writes:
  
  Yeah x264 is great. There is a slight quality degradation (albeit you have to look really hard to visually determine the difference) if you use multiple threads.
  I once used a batch file to encode several gigs of my family vacation MJPEG videos to H.264 using x264 in a single background thread over a period of 10 days.
  With some heavy-duty post processing (for noise removal etc) it encoded about a 1 GB source/day. There was no perf. degradation with my other apps (games, email etc.) on account of the video en
- Re: (Score:2)
  
  by Henriok ( 6762 ) writes:
  
  ffmpegX for OSX uses x264 and it's transcoding like mad on my eight core Mac Pro. A 2h Video_TS film conversion to iPhone-ready double pass h264/MPEG4.. in less than 20 minutes. Using 720-760% CPU, i.e. just the right amount for me, since I use the machine for other tasks as well.
  - - Re: (Score:2)
      
      by Henriok ( 6762 ) writes:
      
      It's only the GUI that's shareware. What I just told everyone was that the open source codec x264 is threaded and performs very well on SMP systems.
- - Re: (Score:2)
    
    by figleaf ( 672550 ) writes:
    
    That's not correct. The admin rights are only needed to update MeGUI. Video encoding works fine without admin permissions.
    You can install MeGUI in a non-standard location like c:\tools\megui and not require admin permissions to update.
- - Re: (Score:2)
    
    by PhrostyMcByte ( 589271 ) writes:
    
    The newer version supports SetMTMode [avisynth.org] which works quite well in many cases.
Load balancing: Why? (Score:5, Insightful)

by DigitAl56K ( 805623 ) * writes: on Wednesday July 23, 2008 @05:09PM (#24311043)

don't balance the workload evenly across processors
Why is balancing the load evenly important, as long as one thread is not bottlenecking the others? Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.
Sure, a nice even load distribution might be an indicator for good design, but it doesn't have to apply in every case. I don't think software should be designed so you can be pleased with the aesthetics of the charts in task manager.

- Re: (Score:2, Insightful)
  
  by Scottie-Z ( 30248 ) writes:
  
  Because, ideally, all four cores should be running at 100% -- the idea is to make maximal use of your available resources, right?
  - Re:Load balancing: Why? (Score:5, Insightful)
    
    by DigitAl56K ( 805623 ) * writes: on Wednesday July 23, 2008 @05:34PM (#24311347)
    
    It's still possible to load all cores 100%.
    A video decoder that I'm working with, for example, currently uses only as many threads as necessary for real-time playback. So for example if one core can do the job only one core is used. If the decoder looks like it might start falling behind more threads are given work to do. Ultimately, if your system is failing to keep up all cores will be fully leveraged.
    However, so long as only some cores are required the others are 100% available to other processes, including their cache (if it's independent). I'm not sure how power management is implemented but perhaps it's even possible for the unused cores to do power saving, leading to longer battery life for laptops/notebooks, etc.
    the idea is to make maximal use of your available resources, right?
    No, the idea is to make the best use of your resources. I'm not trying to say that load balancing is wrong. I'm just saying that processes that don't appear to be balanced are not necessarily poorly designed or operating incorrectly.
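
One way to picture the "only as many threads as necessary" policy is the toy calculation below. It is not how the decoder described above decides; it optimistically assumes decoding work divides evenly across threads, and all names and numbers are invented.

```python
import math

def threads_needed(frame_cost_ms, frame_budget_ms, max_threads):
    """Smallest thread count that fits the decode into the frame budget,
    assuming the work splits evenly across threads."""
    needed = math.ceil(frame_cost_ms / frame_budget_ms)
    # Use at least one thread and never more than the machine offers.
    return max(1, min(needed, max_threads))

# 40 ms per frame corresponds to 25 frames per second.
for cost in (25, 90, 400):
    print(cost, threads_needed(cost, frame_budget_ms=40, max_threads=4))
```

On this model a frame that costs 25 ms needs one thread, 90 ms needs three, and 400 ms uses all four while still falling behind.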
    
Handbrake (Score:5, Informative)

by vfs ( 220730 ) writes: on Wednesday July 23, 2008 @05:18PM (#24311163) Journal

Handbrake [handbrake.fr] has always used both of the cores on my system for transcoding.

- Re:Handbrake (Score:5, Informative)
  
  by catmistake ( 814204 ) writes: on Wednesday July 23, 2008 @05:57PM (#24311647) Journal
  
  that's because Handbrake uses ffmpeg
  
- - Re: (Score:3, Informative)
    
    by crmarvin42 ( 652893 ) writes:
    
    It's good for Video_TS folders in general. In fact, a handful of DVDs can't be ripped directly from the disc using Handbrake and need to be copied to HD via something like MacTheRipper before being transcoded by Handbrake. I don't know what format the guy is trying to transcode from, but most people only need to transcode DVDs.
  - Re: (Score:2)
    
    by MsGeek ( 162936 ) writes:
    
    But the program will NOT transcode from .VOB to .DV. That's all I want to do. I want to point Handbrake to a VOB and have it transcode direct to .DV, particularly Final Cut-friendly .DV. Yeah I'm on a Mac. MacBook Core 2 Duo (Merom) 2GHz. I converted from .VOB to .MKV, then I took the .MKV into VisualHub. The transcode in VisualHub died silently towards the end. Fail.
    There has GOT to be a better way. On Mac. I'm willing to learn command line apps to do this if I can take a .VOB and convert direct to .DV.
F(next) = F(current) + Delta(F(current:next)) (Score:5, Insightful)

by Lumenary7204 ( 706407 ) writes: on Wednesday July 23, 2008 @05:21PM (#24311195)

The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.
For example, suppose I have a table with four columns -- three holding input values (A, B, and C) and one holding an output value (X). If the data in a given row of the table has nothing to do with the data in any other row, multi-threading works efficiently, because none of the threads are waiting for data from any of the other threads. If I want to process multiple rows at once, I simply spawn additional threads.
On the other hand, for data such as MPEG video, the composition of the next frame is equal to the composition of the current frame, plus some delta transformation - the changed pixels.
This introduces a dependency which precludes efficient multi-threaded processing, because each succeeding frame depends on the output of the calculations used to generate the prior frame. Even if more than one core is dedicated to processing the video stream, one core would wind up waiting on another, because the output from the first core would be used as the input to the second.
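
The contrast drawn above can be sketched in Python as follows. The formula X = A + B * C is an arbitrary stand-in for "some per-row computation", and a frame is reduced to a single number so that "previous frame plus delta" is plain addition.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

def compute_rows(rows):
    """Independent rows: each X depends only on its own A, B and C,
    so the rows can be handed to workers in any order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: r[0] + r[1] * r[2], rows))

def rebuild_frames(first_frame, deltas):
    """Dependent frames: frame k needs frame k-1, so this runs in order."""
    return list(accumulate(deltas, lambda frame, d: frame + d,
                           initial=first_frame))

print(compute_rows([(1, 2, 3), (4, 5, 6)]))
print(rebuild_frames(10, [1, -2, 5]))
```

The first function parallelises trivially; the second is a running sum, and each step has to wait for the one before it.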

- Re: (Score:3, Informative)
  
  by Lumenary7204 ( 706407 ) writes:
  
  Note that the above example is about the video component only of a single MPEG audio/video stream.
  There is no reason that an encoder/decoder can't process audio in one thread and video in another, thereby using more than one core (which has already been discussed in other posts relating to this article).
- keyframes (Score:5, Informative)
  
  by Anonymous Coward writes: on Wednesday July 23, 2008 @05:29PM (#24311297)
  
  Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content). These are called keyframes (K) and the delta frames (called P and I frames) are generated against them. Because of this, it is really easy to apply parallel processing to video encoding.
  
  - Re:keyframes (Score:5, Informative)
    
    by DigitAl56K ( 805623 ) * writes: on Wednesday July 23, 2008 @06:14PM (#24311839)
    
    Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content).
    That is not true for MPEG-4 unless you have specifically constrained the I/IDR interval to an extremely short interval, and doing so severely impacts the efficiency of the encoder because I-frames are extremely expensive compared to other types.
    Keyframes are usually inserted when temporal prediction fails for some percentage of blocks, or using some RD evaluation based on the cost of encoding the frame. Therefore unless the encoder has reached the maximum key interval the I frame position requires that motion estimation is performed, and thus you can't know in advance where to start a new GOP.
    In H.264, due to multiple references, you would certainly have issues to contend with, since long references might cross I-frame boundaries, which is why there is the distinction of "IDR" frames; threading at the keyframe level would certainly not be possible.
    Granted, for MPEG1&2 encoders threading at keyframes is a possibility, although still not one I'd personally favor.
    
    - Re: (Score:2)
      
      by elgaard ( 81259 ) writes:
      
      Yes, but you would only need one keyframe per cpu/core.
      E.g. on a dualcore let one core encode the first half and the other core the second half.
    - Re:keyframes (Score:4, Informative)
      
      by TwinkieStix ( 571736 ) writes: on Wednesday July 23, 2008 @07:26PM (#24312551) Homepage
      
      This may be true for sending entire frames to threads, but in mpeg4, frames are broken up into chunks. Motion vectors are created that allow these chunks to move about the image from frame to frame. Other filters are used to remove blockiness, compress the image, do motion detection and macroblock detection, and do various other tasks. MPEG4, especially H.264, can be easily multi-threaded:
      http://ietisy.oxfordjournals.org/cgi/content/abstract/E88-D/7/1623 [oxfordjournals.org]
      http://adsabs.harvard.edu/abs/2004SPIE.5308..384L [harvard.edu]
      http://www.electronicsweekly.com/Articles/2007/05/02/41296/aspex-targets-parallel-processor-at-blu-ray-dvd.htm [electronicsweekly.com]
      When doing a two-pass encode, this is even easier because the keyframes are discovered on the first (faster) pass, so (if encoding already couldn't be threaded) it could by taking advantage of the known keyframe markers in at least the second pass. But, that's not necessary. I use handbrake to create H.264 videos under Linux all the time on my dual core machine, and both processors stay between 80%-90% utilization from start to finish regardless of the number of passes.
      
      - Re: (Score:2)
        
        by DigitAl56K ( 805623 ) * writes:
        
        I agree that MPEG4 can be easily multithreaded, but it is not threaded by entire GOPs, as the GP suggests, in any encoder that I know of for the reasons I gave. Frame-level and slice-level threading are the two common techniques. I do actually work on MPEG-4 codecs.
  - Re: (Score:2)
    
    by statemachine ( 840641 ) writes:
    
    How did you get a +5 Informative when you're wrong?
    First off, which MPEG spec has a K-frame? An I-frame is not a delta frame, it's more like your "keyframe." P and B are the delta frames.
    Secondly, there's very little to parallelize if you're working with open Groups of Pictures (GOP), that is to say every GOP references into the next GOP. If you have closed GOPs, then you can do this a little better by putting the next GOP on another core/CPU.
    But will you gain a significant speedup? The problem is not just
  - Re: (Score:3, Informative)
    
    by srw ( 38421 ) * writes:
    
    Slight correction: in MPEG, the keyframes are called I-Frames. The delta frames are B and P frames. Most MPEG2 encoders that I have used default to a 15 frame GOP.
- Re:F(next) = F(current) + Delta(F(current:next)) (Score:5, Insightful)
  
  by Omega996 ( 106762 ) writes: on Wednesday July 23, 2008 @05:36PM (#24311367)
  
  theoretically, couldn't an encoder scan the data stream for keyframes, chunk the data from keyframe to the next keyframe, and then queue up the keyframe+delta information for multiple cores? That way, each core has something to do that isn't dependent upon the completion of something else.
  i'd think that n-1 cores/threads/whatever to process the chunked data, and the last core/thread/whatever to handle overhead and i/o scheduling would run pretty nicely on a multi-core machine.
  
  - Re: (Score:2, Informative)
    
    by foxyshadis ( 1056614 ) writes:
    
    Several implementations of this exist: For x264, there's x264farm (which includes network encoding as well).
- Re: (Score:2)
  
  by ZachPruckowski ( 918562 ) writes:
  
  MPEG uses keyframes, right? So you'll still have a full frame in there every few frames. When I play back a MP4 I encoded, I wind up with something like a full frame every second or two (with the intermediate frames being the transformations you mentioned). So you can split at those frames. That's not infinitely parallel, but if we split it up by minute-sized segments, we'd have 90-150 segments (based on movie length), which is plenty for any prosumer computer for the foreseeable future, and even plenty
- Re: (Score:2, Informative)
  
  by Zygfryd ( 856098 ) writes:
  
  http://en.wikipedia.org/wiki/Group_of_pictures [wikipedia.org]
  You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
  - Re: (Score:3, Insightful)
    
    by init100 ( 915886 ) writes:
    
    I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
    Is this even needed if you use multi-pass encoding? At least for XviD, IIRC the first pass is used to accumulate statistics used to allocate the proper bit budget to each frame. Then the individual processes should be able to use the statistics file from the first pass to get the bit allocation for their current GOP in the second pass.
  - Re: (Score:3, Insightful)
    
    by benwaggoner ( 513209 ) writes:
    
    You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
    That's a pretty painful constraint for anything other than very flat constant bitrate encoding. You really want to be able to move bits between GOPs to optimize for consistent quality.
- Re:F(next) = F(current) + Delta(F(current:next)) (Score:5, Insightful)
  
  by John Betonschaar ( 178617 ) writes: on Wednesday July 23, 2008 @05:41PM (#24311419)
  
  You could of course split each frame in slices, and process these in parallel. Or skip the video N frames between each core, with N being the number of frames between MPEG keyframes. Or have core 1 do the luma and core 2 and 3 the chroma channels. Or pipeline the whole thing and have core 1 do the DCT, core 2 the dequant etc. and have core 3 reconstruct the output reference frame while core 1 already starts the next frame.
  Plenty of ways to parallelize decoding, and even more for encoding...
  
- Re: (Score:2)
  
  by semiotec ( 948062 ) writes:
  
  The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
  Not quite true. Someone above already explained some of this re VisualHub.
  The video data/frame at 0:00 is very likely completely unrelated to the data/frame at 5:00, thus you can simply chop up the raw file into a number of segments and process them in parallel.
  Some clever stitching is probably required to put the whole thing back together in the end.
  Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.
  Exactly, so you chop up the raw input into segments and they become discrete data sets.
- Re: (Score:2)
  
  by Fry-kun ( 619632 ) writes:
  
  But MPEG has keyframes - you need them for scene changes and error recovery. There's one at least every few seconds. For offline video, the threads can work on different keyframes & their respective deltas.
  For online video, it's harder.. but still can be done. Similar to how two-videocard setups work, you can split the image into pieces and have each CPU work on a particular piece, since there's little relation between . Of course it becomes very hard to scale beyond a certain point... but 2-4 cores/CPU
- Re: (Score:2)
  
  by adisakp ( 705706 ) writes:
  
  The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
  
  You can also break down a frame into discrete rectilinear regions and have each region processed by a separate thread. This block-based approach is a simple (although probably not 100% optimal) way to get parallelism with an operation involving only two buffers (current and output) in any 2D filter / transform from MPEG frame decoding to JPEG decompression to a Photoshop filter.
  
  Fo
- - Re: (Score:2)
    
    by Lumenary7204 ( 706407 ) writes:
    
    > Most video compression techniques including MPEG set a maximum number
    > frames between base frames. A base frames can be decoded without any
    > information about previous or future frames.
    >
    > All the motion vectors or deltas are calculated against the closest
    > previous base frame.
    Yeah, I forgot to include the whole keyframes thing... My bad. I should have said "not always."
    However, the problem with keyframes is that their placement is often artificial; i.e., every 30 frames or so.
    This limi
Max CPU? (Score:2)

by HaeMaker ( 221642 ) writes:

Huh? I am using AGK and my CPU never does anything. It is always waiting for I/O. I must be doing something wrong...
- - Re: (Score:3, Interesting)
    
    by Barny ( 103770 ) writes:
    
    With video conversion faster storage (not low latency) is the big winner, with huge cache being a close second.
    If you want the fastest video encodes with no care to cost, get an 8-way pci-e raid card and 8 laptop sata HDD, small and very very fast in a stripe raid.
What about playing? (Score:2)

by Godji ( 957148 ) writes:

Is there anything out there that can play a high-bitrate obese .mkv Blu-ray backup rip efficiently on 2 or 4 cores?
- - Re: (Score:2)
    
    by Godji ( 957148 ) writes:
    
    Well, I'm using Linux, and as far as I know, the video card does absolutely nothing (well, maybe scaling). What acceleration are you talking about? I have the exact same CPU by the way, and it works fine on most things. But try something _really_ high bitrate - like 5 GB for 30 minutes or something - and see what happens. It stutters, but only slightly, so if the other core was actually doing something to help, it would be fine.
    - Re: (Score:2)
      
      by Barny ( 103770 ) writes:
      
      With Media Player Classic on Windows, you can indeed have your video hardware speed things up. It does this by making a two-triangle Direct3D window and rendering the video stream as a texture; with today's low-end video cards this takes a load off the CPU, which no longer has to do overlays in 2D windows.
      Also (and I know it's not OSS), corecodec does a great job. ffmpeg under Windows is very bad at threading H.264 content, to the point where a fast AMD dual core will struggle with 1080p, but corecodec plays it back smoot
MPEG Algorithm (Score:2)

by c0d3r ( 156687 ) writes:

The core transform in MPEG is the DCT (discrete cosine transform). If this is parallelizable, then MPEG encoding/decoding should be, although there is no way a general-purpose processor can beat an ASIC in silicon.
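On the "if this is parallelizable" part, here is a minimal sketch rather than anything taken from a real codec. It assumes the frame is a plain list of pixel rows whose dimensions are multiples of 8, and the names `dct_8x8`, `split_blocks` and `dct_frame` are made up for illustration. The point is only that each 8x8 block's transform depends on no other block, so the blocks can be handed to a pool of workers in any order.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # block size used by MPEG-style transforms


def dct_8x8(block):
    """Naive 2-D DCT-II of one 8x8 block (a list of 8 rows of 8 numbers)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def split_blocks(frame):
    """Cut a frame into independent 8x8 blocks, in row-major block order."""
    height, width = len(frame), len(frame[0])
    return [[row[bx:bx + N] for row in frame[by:by + N]]
            for by in range(0, height, N)
            for bx in range(0, width, N)]


def dct_frame(frame, workers=4):
    """Transform every block; no block needs data from any other block.

    Threads are used here only to show the independence. In CPython a
    process pool (or native code) would be needed for a real CPU speedup.
    """
    blocks = split_blocks(frame)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dct_8x8, blocks))
```

The transform itself is the easy part to spread across cores. The dependencies discussed elsewhere in this thread (motion estimation against reference frames, rate control, entropy coding) are what make a whole encoder harder to scale.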
Windows? VirtualDub 1.8.x + ffdshow-tryouts (Score:4, Informative)

by tdelaney ( 458893 ) writes: on Wednesday July 23, 2008 @05:44PM (#24311445)

You don't say if you're running on Windows or Linux or something else. If you are running on Windows, the latest versions of VirtualDub have made big improvements to SMT/SMP encoding.
VirtualDub home [virtualdub.org]
VirtualDub 1.8.1 announcement [virtualdub.org]
VirtualDub downloads [sourceforge.net]
Make sure you grab 1.8.3 - 1.8.1 was pretty good, but had a few teething problems. 1.8.2 has a major regression which is fixed in 1.8.3. The comments in the 1.8.1 announcement contain a few important tips for using the new features (some of which I posted BTW).
The two major new features that would be of interest to you are:
1. You can run all VirtualDub processing in one thread, and the codec in another. This works very well in conjunction with a multi-threaded codec - this one change improved my CPU utilisation from approx 75% to 95% on my dual-core machines - with an equivalent increase in encoding performance.
2. VD now has simple support for distributed encoding. You can use a shared queue across either multiple instances of VD on a single machine, or across multiple machines (must use UNC paths for multiple machines). Each instance of VD will pick the next job in the queue when it finishes its current job. Instances can be started in slave mode (in which case they will automatically start processing the queue).
I use 3 machines for encoding (all dual-core). With VD 1.8.x I start VD on two of the machines in slave mode, and one in master mode. I add jobs to the queue on the master instance, and the other two instances immediately pick up the new jobs and start encoding. When I've added all the jobs, I then start the master instance working on the job queue.
To achieve a similar effect on your quad-core, start two instances of VD on the same machine - one slave, the other master.
It's not perfect (if you've only got one job, you won't use your maximum capacity) but it has greatly simplified my transcoding tasks, and reduced the time to transcode large numbers of files.
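For anyone curious what the shared-queue idea looks like in the abstract, here is a rough sketch. This is not VirtualDub's implementation (which shares a job file between instances); it just models "each instance picks the next job when it finishes its current one" with threads. `run_shared_queue` and `fake_encode` are hypothetical names, and `fake_encode` stands in for a real encode.

```python
import queue
import threading


def run_shared_queue(jobs, n_workers, encode):
    """Run jobs on n_workers; each worker pulls the next pending job when free."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this instance goes idle
            output = encode(job)
            with lock:
                results[job] = (name, output)

    threads = [threading.Thread(target=worker, args=(f"instance-{i}",))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_encode(path):
    """Stand-in for a real encode: just derive an output file name."""
    return path.rsplit(".", 1)[0] + ".avi"


if __name__ == "__main__":
    done = run_shared_queue(["a.mpg", "b.mpg", "c.mpg"], 2, fake_encode)
    for source, (instance, target) in sorted(done.items()):
        print(f"{source} -> {target} (by {instance})")
```

With a single job in the queue only one worker ever gets anything to do, which is the "one job won't use your maximum capacity" limitation in miniature.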

- Re: (Score:2)
  
  by trawg ( 308495 ) writes:
  
  Holy shit! Somehow I missed all these VDub releases. Thanks for the notice.
  Out of interest, what sort of stuff are you encoding from/to? Are you aware of any mpeg4/h264 codecs that will work happily in Virtualdub?
avidemux (Score:5, Informative)

by Unit3 ( 10444 ) writes: on Wednesday July 23, 2008 @05:46PM (#24311489) Homepage

I've noticed a lot of talk about commandline options, but not the nice guis that use them. Avidemux is open source, cross-platform, gives you a decent interface, and uses multithreaded libraries like ffmpeg and x264 on the backend to do the encoding, so it generally makes optimal use of your multicore system.

Also consider this. (Score:2, Interesting)

by SignOfZeta ( 907092 ) writes:

If you do a lot of H.264 conversion, look into picking up a hardware encoder. There's the Turbo.264; it's Mac-only, but I'm fairly sure it's a rebranded PC device. Plug into a USB port, and it speeds up H.264 encoding -- even on single-core systems. Imagine that with your quad-core. It's not a free solution, but if you find yourself doing a *lot* of encodes, it may be worth your money.
Not as simple as you would think (Score:5, Insightful)

by sjf ( 3790 ) writes: on Wednesday July 23, 2008 @07:51PM (#24312809)

As other commenters have said, decoding video is not, per se, a trivially parallelized algorithm. Especially for modern codecs with lots of temporal encoding. MJPEG would be easily parallelized, but you'd have to be dealing with fairly ancient sources...MediaComposer 1 for instance.
However, there are different classes of "video app" that are good targets for parallelization. Real world video editing for instance: consider multiple streams of video with overlays, rotations, effects etc. Video and audio decoding can happen in parallel, you can pipeline the effects stages so that each effect is handed off to another core. Modern video editing systems do this with aplomb.
I'm from the commercial end of this so, I can't comment much on open source alternatives. But I will say that a lot of the algorithms in certain products are highly tuned to the particular CPU type.
And they're smart enough to distribute work across only as many cores as actually exist.
Finally. Don't forget that optimization is hard. You have to consider the speed of the hard drive, the cost of sharing data between threads and cpu caches and a bunch of other real constraints. Any half decent cpu of the last five years or so can easily decode most video faster than it can be read and written to disk. So long as this is true, you won't get any benefit from parallelization.
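To put rough numbers on that last point, here is a back-of-the-envelope sketch. It assumes a simple read, process, write pipeline whose throughput is set by its slowest stage, and it assumes the CPU stage scales perfectly across cores, which is optimistic. The function names and all the MB/s figures are invented for illustration.

```python
def pipeline_throughput(disk_mb_s, per_core_mb_s, cores):
    """Throughput in MB/s of a pipeline limited by its slowest stage."""
    return min(disk_mb_s, per_core_mb_s * cores)


def speedup_from_cores(disk_mb_s, per_core_mb_s, cores):
    """How much faster the whole job gets compared with a single core."""
    single = pipeline_throughput(disk_mb_s, per_core_mb_s, 1)
    return pipeline_throughput(disk_mb_s, per_core_mb_s, cores) / single


if __name__ == "__main__":
    # One core already outruns the disk: extra cores buy nothing.
    print(speedup_from_cores(disk_mb_s=60, per_core_mb_s=80, cores=4))  # 1.0
    # CPU-bound job: cores help until the disk becomes the limit.
    print(speedup_from_cores(disk_mb_s=60, per_core_mb_s=10, cores=4))  # 4.0
    print(speedup_from_cores(disk_mb_s=60, per_core_mb_s=10, cores=8))  # 6.0
```

In the first case the job is I/O-bound from the start. In the other two the speedup tracks the core count only until the disk saturates.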

heroinewarrior.com (Score:3, Informative)

by heroine ( 1220 ) writes: on Wednesday July 23, 2008 @08:09PM (#24312969) Homepage

The version of Cinelerra from heroinewarrior.com uses SMP. It's highly dependent on the supporting libraries & who implemented the feature. In the worst case, use renderfarm mode & nodes for each processor. Sometimes the libraries work in SMP mode & sometimes they don't. Sometimes the feature was intended for everyone to use on any number of processors & sometimes it was written for 1 person's cheap single processor.

Hmm (Score:2)

by moosesocks ( 264553 ) writes:

Now I'm a bit curious.
Given that all of the "usual suspects" of encoding apps support SMP on almost every platform, and have done so for quite some time, what was this guy using that didn't support it?
ffmpeg and x264 are just about the only players in town these days.
- Re: (Score:2)
  
  by Bert64 ( 520050 ) writes:
  
  Multi core no....
  But unix apps have been running on multi processor systems for years, and geeks have had access to such systems for years too. I did video encoding in 2000 on a quad cpu alphaserver and a dual cpu sparc, but i just did as someone else suggested and ran multiple encodes simultaneously.
- - Re:Simple... (Score:5, Informative)
    
    by j00r0m4nc3r ( 959816 ) writes: on Wednesday July 23, 2008 @05:21PM (#24311199)
    
    Running multiple instances of the same code concurrently in multiple threads is simple. Even running mutually exclusive parts of the same code concurrently in separate threads is easy. Converting complex serial algorithms to effectively utilize multiple cores is generally not simple. And writing code that can scale and balance across n number of cores/threads is extremely hard. There are all sorts of synchronization issues to deal with, scheduling issues, data transport issues, etc.. and it becomes increasingly hard to debug code the more cores/threads you throw in. I think the stigma is justified.
    
    - Re: (Score:2, Informative)
      
      by sexconker ( 1179573 ) writes:
      
      How the hell is this modded interesting (as opposed to informative)?
      Do people really not know this stuff (thus making it interesting to them)?
      For the gp and the others who still don't get it.
      Multi-threaded programming (getting your shit to run in separate threads) is easy, now.
      Multi-threaded / distributed algorithms (getting your shit to do some coherent, useful shit while scaling well) are not easy at all.
  - Re: (Score:2, Insightful)
    
    by everphilski ( 877346 ) writes:
    
    Amen
    
    If you truly understand the problem domain you are operating in, parallelism becomes readily apparent. Implementing it isn't difficult even on old code, again, if you truly understand where the parallelism exists.
    - Re: (Score:3, Insightful)
      
      by Cyrano de Maniac ( 60961 ) writes:
      
      Exactly. Too many people assume that any given programmer can write any given program. What isn't generally realized (at least by the masses) is that programming really is about acquiring expertise in a particular domain and then solving problems in that domain through the use of computer programs. Generally some of the most effective programs I've seen have been written, on their first pass, by a person with intimate domain knowledge, and mediocre programming/computer knowledge. The program then become
- Re: (Score:2)
  
  by X0563511 ( 793323 ) writes:
  
  It's refreshing to see that, rather than having us all answer questions and think about it, only to THEN find out he doesn't want to do any work.
  - Re: (Score:2)
    
    by ydrol ( 626558 ) writes:
    
    I simply don't have the time to grok video encoding AND efficient SMP algorithms, and do my day job, but I do want to use them.
    And FWIW I have contributed patches in the past to both the avidemux AND nzbget projects, and they have been accepted, but these have addressed more trivial aspects of the software.
- Re: (Score:2)
  
  by ydrol ( 626558 ) writes:
  
  Excellent. Just the type of thing I was looking for. A gui frontend that sensibly passes the thread options to the engine!
- Re: (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  But Mac users have been living with SMP since 2001
  
  Just for reference:
  UNIX System V R4-MP 1993
  Windows NT 1993
  OS/2 2.11 1993
  Linux 2.0 1996
