YouTube's Content Identification Failure Raises Eyebrows

Posted by Zonk
from the making-new-things-is-hard dept.
MSNBC is carrying a story looking at YouTube's failure to follow through with a promised 'content identification system' by the end of the year. The article goes on to discuss the possible impact this failure will have on the site's (so far) good relations with television, music, and movie studios. From the article: "If the delay lasts for more than a week or two into the new year, suggesting more than just a slight technical hitch, 'this is certainly going to be a serious issue', [Mike McGuire, a digital media analyst at Gartner] added. Leading music companies have already made clear they see completion of YouTube's anti-piracy technology as an important step in any closer co-operation. Failure to build adequate systems to protect copyright owners could also add to the risk of legal action against the site."
  • by Salvance (1014001) * on Tuesday January 02, 2007 @10:41AM (#17431364) Homepage Journal
    It's hard to believe that Google hasn't already discussed the delay and any consequences with the movie, television, and music studios. Google had such intensive conversations with them before purchasing YouTube, that it would be silly if they went quiet and just let things slide.
    • Re: (Score:2, Interesting)

      by jackharrer (972403)
      _WE_ don't know if they did or not. These kinds of negotiations usually happen behind closed doors, and at this level that means vault doors.
      Let's wait a while and we'll know. If there's a lawsuit, they haven't. Simple.
  • by Anonymous Coward on Tuesday January 02, 2007 @10:41AM (#17431378)
    Here you go guys, this one's on the house:

    if (content) {
        return "This Youtube content has been identified as: Bad";
    }

    • You forgot the "mmm'kay?".
    • by dimeglio (456244) on Tuesday January 02, 2007 @12:39PM (#17432370)
      The code is not the problem. Maybe the MPAA was asked to provide the MD5SUM of all the material they object to being published. I suppose they haven't completed this. So it's not necessarily YouTube's fault. ;-)

      10 YouTube exec: So what clips exactly do you want us to remove?
      20 MPAA: well all those which we don't want you to publish.
      30 YouTube exec: Ok, which clips exactly do you object to.
      40 MPAA: all those we don't want you to publish.
      50 GOTO 10
      • Re: (Score:2, Insightful)

        by ifrag (984323)
        That's just a joke about the MD5's right? Even the most simple edit to a clip would cause it to change, such as clipping blank frames from the start or end.
        • Dynamic and/or spectral profiling of the audio content over a time domain of a few seconds renders just about any audio content easily identifiable by comparison to a library of the same profiles of a copyright holder's content. It's like, uh, it sounds the same, essentially.

          MD5 is for exact digital authentication and a completely different thing.
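To make the "dynamic profiling" idea concrete, here is a minimal Python sketch. It assumes the audio is already decoded to a list of PCM samples and records one bit per pair of adjacent frames: 1 if the frame got louder, 0 otherwise. The function names and the 20-sample frame size are hypothetical choices for illustration, not what YouTube or any vendor actually uses. Because only the direction of each energy change is kept, playing the clip back at a different volume leaves the profile intact, while an MD5 of the samples would change completely.

```python
import math

def frame_energies(samples, frame_size=20):
    """RMS energy of each consecutive, non-overlapping frame of samples."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies

def energy_profile(samples, frame_size=20):
    """One bit per adjacent frame pair: 1 if the energy rose, else 0."""
    e = frame_energies(samples, frame_size)
    return [1 if later > earlier else 0 for earlier, later in zip(e, e[1:])]

def hamming(p, q):
    """Count the positions where two equal-length profiles differ."""
    return sum(x != y for x, y in zip(p, q))

# Hypothetical "original": a tone whose loudness slowly swells and fades.
original = [math.sin(i / 3) * (1 + math.sin(i / 40)) for i in range(400)]
# The same clip at half volume: every nonzero sample changes, the profile doesn't.
quieter = [0.5 * s for s in original]

print(hamming(energy_profile(original), energy_profile(quieter)))
```

The half-volume copy differs from the original in every nonzero sample, yet the two profiles differ in zero bits. A real audio fingerprint would add frequency bands, overlapping frames and tolerance for noise and time shifts; this only illustrates why a profile can survive edits that break an exact hash.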

      • by xantho (14741)
        The MD5 of their material probably wouldn't help that much, as most of the copyrighted videos are analog TV recordings and videos of kids lip-synching. I mean, there are tons of StepMania videos that are reasonably close to authentic reproductions, except that they're recorded on an awful digital camcorder and have obnoxious keyboard clacking sounds all through them (http://youtube.com/watch?v=KTAYjpunO5E&mode=related&search= [youtube.com]). How's an MD5 gonna help identify that stuff, then?
    • Re: (Score:3, Funny)

      by recursiv (324497)
      I thought of a new algorithm that should be more accurate:

      if (views > LEGIT_VIEWS_THRESHOLD) {
          return "This Youtube content has been identified as: Illegal";
      } else {
          return "This Youtube content has been identified as: Legal";
      }
  • Relax (Score:5, Funny)

    by Timesprout (579035) on Tuesday January 02, 2007 @10:42AM (#17431382)
    It's in Beta.
  • by nizo (81281) * on Tuesday January 02, 2007 @10:42AM (#17431386) Homepage Journal
    Once all that illegal content is gone, it will make it easier to find things like this [youtube.com].
  • by LiquidCoooled (634315) on Tuesday January 02, 2007 @10:48AM (#17431440) Homepage Journal
    Because once they show that they can identify bad content within video files, won't the MPAA/RIAA/* start to bug them about doing the same with normal search results?

    Instead of Perfect 10 having to search and list the illegal boobies on display, google will have to automatically remove them from view :(

    Won't somebody think of the boobies :(
    • Re: (Score:3, Funny)

      There are no boobies until the third page of results and they aren't even good ones. Please doo some research before you send us on a wild goos chase.
    • by Thansal (999464)
      Santa stole my boobies [gucomics.com] (comic from /gu; it is SFW so long as your boss doesn't mind you reading comics. Click back a few days if you honestly want to get the joke.)
    • by fithmo (854772)
      Won't somebody think of the boobies :(

      Someone did. Ta-da: PornoTube [pornotube.com] (in case it's not obvious, this link is SUPER NSFW)

  • Does Microsoft own anything comparable to YouTube?
  • by spike2131 (468840) on Tuesday January 02, 2007 @10:57AM (#17431510) Homepage
    I pity the developers who are making this product. They have been given a complex task and an arbitrarily chosen deadline, probably pulled out of the air by marketing/legal/upper management. Since September they have been on a death march to meet this date, sacrificing family time around the holiday season.

    But you know what? It just ain't ready because it was a fool's errand to begin with. My guess is they are working off of half-assed specs that weren't even ready before Thanksgiving. Maybe in a few more months they can have something good. But media partners getting pissy about it isn't going to help the code mature any faster.
    • Re: (Score:3, Insightful)

      by Herr Ziffer (1042828)
      The technology isn't there yet. Other companies have been working toward the same goal of media fingerprinting for much longer than YouTube has. For a sufficiently long media clip, it can be done. The serious problem, though, is with shorter clips. Thirty seconds just isn't enough material, currently, to get a good match. Add to that the fact that the original clips get resampled, distorted and overdubbed. YouTube may be getting a break from media companies simply "because" it is so easy to make the
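The short-clip problem can be put in rough numbers under a simplifying assumption: each second of media yields a fixed number of fingerprint bits, the bits of two unrelated clips behave like independent coin flips, and a "match" means differing in at most a fixed percentage of bits (the slack needed to survive resampling and overdubs). The sketch below computes the chance of an accidental match between unrelated clips; the 4 bits per second and 35% tolerance are made-up values, not anything YouTube has published.

```python
from math import comb

BITS_PER_SECOND = 4   # hypothetical fingerprint rate
TOLERANCE_PCT = 35    # hypothetical: up to 35% of bits may differ and still "match"

def false_match_probability(n_bits: int, tolerance_pct: int) -> float:
    """P(two independent random n-bit fingerprints differ in <= tolerance_pct% of bits)."""
    max_diff = n_bits * tolerance_pct // 100
    # Count bit patterns within max_diff flips of a fixed fingerprint, out of 2**n patterns.
    return sum(comb(n_bits, k) for k in range(max_diff + 1)) / 2 ** n_bits

results = {
    seconds: false_match_probability(seconds * BITS_PER_SECOND, TOLERANCE_PCT)
    for seconds in (5, 10, 30, 120)
}
for seconds, p in results.items():
    print(f"{seconds:>4} s clip: false-match probability ~ {p:.2e}")
```

Under these assumptions a 5-second clip matches an unrelated one about 13% of the time, and the rate drops sharply as clips get longer, which is the commenter's point: the tolerance needed to survive distortion costs far more on short clips than on long ones.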
    • by jo42 (227475)

      ...arbitrarily chosen deadline, probably pulled out of the air by marketing/legal/upper management.
      should read

      ...arbitrarily chosen deadline, probably pulled out of their asses by marketing/legal/upper management.
  • Is it possible? (Score:5, Interesting)

    by ErGalvao (843384) on Tuesday January 02, 2007 @11:01AM (#17431544) Homepage Journal
    This may sound a little OT (sorry for that), but this story raised an old question for me: is it really possible to build an automated content identification/filtering solution? Personally I've always found these kinds of solutions full of flaws. Take web filtering, for instance: it's pretty common for the filtering software to make a mistake and flag a legitimate site as bad content (a false positive). After all, Google or not, both things follow the same basic principles, right?
    • Re: (Score:3, Insightful)

      by Rob86TA (955953)
      The thing is, the MPAA and etc don't care if there are false positives; they only care that there are no escapes. YouTube could probably deploy a solution that would make the MPAA happy, only to have its own users leave as valid content kept getting blocked by accident.
      • by dwandy (907337)

        The thing is, the MPAA and etc don't care if there are false positives

        I suspect quite the opposite: They want *any* content that they don't get paid for taken down. They don't care if it's their content (and they're not getting paid) or someone else's content (and they're not getting paid). So the maximum false-positive rate is exactly what they want...
        They only exist as long as they are the content owners: the second that content stops getting signed over to them they get relegated to nothingness.

    • Honestly, it really depends on the type of content you're trying to identify and how much time/processing power you have ...

      With text it should be pretty easy, and there are products on the market that will search the web to see if part of a paper was copied from a web-based source, so that professors can ensure that the paper properly cites its sources ...
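The commenter doesn't say how those products work. One common textbook approach for near-duplicate text is to break each document into overlapping word sequences ("shingles") and compare the sets with Jaccard similarity; here is a minimal Python sketch under that assumption, with made-up sample sentences.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of all k-word sequences ("shingles") in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union: 1.0 identical, 0.0 disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Made-up sample texts.
source = "the quick brown fox jumps over the lazy dog"
copied = "As noted, the quick brown fox jumps over the lazy dog"
unrelated = "copyright holders want automated filters on video sites"

print(jaccard(shingles(source), shingles(copied)))
print(jaccard(shingles(source), shingles(unrelated)))
```

The padded copy scores 7/9 (about 0.78) because it contains all seven of the source's 3-word shingles plus two new ones, while the unrelated sentence scores 0.0.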

      Music (I would imagine) would be somewhat easy to determine if a file was a copyrighted song as long as you had the original source; a quick method I ca
    • by twitter (104583)

      ... is it really possible to do an automated content identifier/filter solution?

      To take away your fair use they would have to fingerprint both the audio and video content. That's possible for whole works at a given frame size, rate and audio quality. Already, you can see the problem because there's an almost unlimited choice of those. Couple that problem to every length variation and you have an impossible task for any single work. The database of fingerprints would be infinitely large. You can m

      • by Jerf (17166)
        You seem to be assuming that the only "fingerprint" algorithm that exists is something like MD5.

        While I'm not entirely optimistic about the existence of a fingerprint function that matches what the media companies want (although that is partially the fact that they do not really know what they want to the requisite mathematical precision, or, put another way, what they want is easy money and whatever magical tech is required for that to happen), the problem isn't as hard as you make it out to be, either, ev
        • Re: (Score:3, Interesting)

          by Gulik (179693)
          I don't know -- IANAM (I Am Not A Mathematician), but it sounds like an exceedingly difficult problem. To fingerprint a video, you're going to have to use specific information from it, and I don't know what information will remain constant between different encoding qualities and even encoding applications using the same theoretical quality. I assume you have to fuzz it up (the mathematical equivalent of "this area of the image from time index X to time index X+2 is reddish-orange, and this other area dur
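Gulik's coarse-region intuition is close to what is often called an "average hash" for images. Here is a minimal Python sketch, assuming a single grayscale frame stored as a list of rows whose width and height are multiples of 8; a real video system would also need color, cropping tolerance and sequences of frames over time. The test frames are synthetic and hypothetical.

```python
def average_hash(frame, size=8):
    """64-bit average hash of a grayscale frame (list of rows of 0-255 ints).

    Assumes the frame's width and height are multiples of `size`.
    """
    h, w = len(frame), len(frame[0])
    bh, bw = h // size, w // size
    # Average each (bh x bw) block: a coarse "this region is bright/dark" summary.
    cells = [
        sum(
            frame[y][x]
            for y in range(r * bh, (r + 1) * bh)
            for x in range(c * bw, (c + 1) * bw)
        ) / (bh * bw)
        for r in range(size)
        for c in range(size)
    ]
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the frame's overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming64(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Synthetic 32x32 diagonal gradient, plus three altered versions of it.
frame = [[4 * x + 4 * y for x in range(32)] for y in range(32)]
upscaled = [[frame[y // 2][x // 2] for x in range(64)] for y in range(64)]
brighter = [[v + 5 for v in row] for row in frame]
mirrored = [row[::-1] for row in frame]
```

Doubling the resolution or brightening every pixel leaves the 64-bit hash unchanged, while mirroring the frame flips a number of bits. That is the resolution independence the thread is arguing about, though cells sitting close to the frame's mean can still flip under heavy re-encoding.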
          • Re: (Score:3, Insightful)

            by Jerf (17166)
            I Am Not A Mathematician either but I'm closer than the vast majority of people on Slashdot. (I've studied this stuff in a formal setting and done some limited work in the field of handling wildly multidimensional data.)

            This reply is much more reasonable, and much closer to the truth. One of the missing pieces of your first post is the problem of making attacker-resistant fingerprints. Fingerprinting is actually not so hard when you haven't got people actively trying to hurt the fingerprint and you can acce
            • The easiest thing to do is simply make the fingerprints cover more stuff ("fuzzing" the fingerprint is a pretty good mental model), which definitely increases the false-positive rate on audio.

              I would have thought the easiest thing to do would be to take the Vista approach: all video will be reduced to a 2x2 pixel screen size. Content will be easy to identify that way, because it will all look the same.

        • Re: (Score:2, Interesting)

          by twitter (104583)

          You seem to be assuming that the only "fingerprint" algorithm that exists is something like MD5.

          You have something better? MD5 is the easiest computationally and produces the smallest result to store, using other techniques will increase the size of your database and computational expense. They could do FFT on single frame images, but you would need one for each scene of interest. The result could be made independent of size but not encoding quality. It would also be large and could create hundreds o

          • by jlarocco (851450) on Tuesday January 02, 2007 @09:17PM (#17438276) Homepage
            You have something better? MD5 is the easiest computationally and produces the smallest result to store, using other techniques will increase the size of your database and computational expense.

            There's no way they could use MD5. MD5 hashes are designed to return the same value given the same input, and a totally different value for even a slight modification of the input. Or in other words, md5("ABCD") is nothing at all like md5("ABCE"). Given the nature of audio and video, it would be trivial to bypass an MD5 copyright check. Change a single pixel in a single frame from RGB(255,255,255) to RGB(255,255,254) and nobody would notice, and it'd get through the check.
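The point about MD5 is easy to check with Python's standard hashlib module. The 2x2 all-white "frame" of raw RGB bytes below is a made-up stand-in for real video data.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest: identical only for byte-for-byte identical input."""
    return hashlib.md5(data).hexdigest()

# The commenter's example: one changed character gives an unrelated-looking digest.
print(md5_hex(b"ABCD"))
print(md5_hex(b"ABCE"))

# Hypothetical 2x2 all-white frame as raw RGB bytes (4 pixels x 3 channels).
frame = bytes([255, 255, 255] * 4)
# Change the blue channel of the first pixel: RGB(255,255,255) -> RGB(255,255,254).
tweaked = bytearray(frame)
tweaked[2] = 254
print(md5_hex(frame) == md5_hex(bytes(tweaked)))  # False
```

Changing a single channel value by one is invisible to a viewer but is enough to defeat any blocklist keyed on exact hashes.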

      • by YGingras (605709)

        ...such as Star Wars' liberal use of "Triumph of the Will", "Forbidden Planet" and several WWII films.

        Did you actually watch Triumph of the Will? I think you pulled your accusation from another source, like Wikipedia. The graphical elements common to both films are military formations, and I don't think Riefenstahl had much say in the way the SAs were to line up. Triumph of the Will shows a bunch of guys getting ready for the speech (which include washing and shaving beside their tents and preparing foo

        • by geoff lane (93738)
          And Forbidden Planet was based on a script written by some dude called Shakespeare and he ripped the idea from some Italian play.

      • by kchrist (938224)
        Star Wars is not nearly as homoerotic as Triumph of the Will. I didn't see any strapping young stormtroopers wrestling shirtless, for one, and Darth Vader isn't nearly as effeminate as Hitler.
        • by twitter (104583)

          Star Wars is not nearly as homoerotic as Triumph of the Will.

          How do you get "homoerotic" out of a film made by a woman? The thing is a long nightmare of twisted sentiment, logic and fanaticism, brilliantly captured, with one awful end - 25% of the people you see will die violently and no two bricks will be left standing a few short years after filming. The sexual aspects of boys playing escaped me. To each, their own.

    • Re:Is it possible? (Score:4, Insightful)

      by MindStalker (22827) <mindstalker.gmail@com> on Tuesday January 02, 2007 @12:04PM (#17432068) Journal
      I'm betting they go with a computer/human pair system. If it matches close to 100% to a known video treat it as if it were the known video. If it matches greater than 50% have a human look at it. If it matches less keep it and wait for a user to flag it. Realistically most youtube videos are near carbon copies of other videos on youtube already. This would greatly decrease dups at least.
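The proposed triage is simple to express in code. A minimal Python sketch, assuming some upstream matcher already produces a similarity score between 0.0 and 1.0 against the closest known video; the enum names are hypothetical, and "close to 100%" is pinned to an arbitrary 0.98.

```python
from enum import Enum

class Action(Enum):
    TREAT_AS_KNOWN = "treat as the known video"
    HUMAN_REVIEW = "have a human look at it"
    KEEP_UNTIL_FLAGGED = "keep it and wait for a user flag"

AUTO_THRESHOLD = 0.98    # hypothetical stand-in for "close to 100%"
REVIEW_THRESHOLD = 0.50  # "greater than 50%" from the comment

def triage(match_score: float) -> Action:
    """Route an upload by its best similarity score (0.0 to 1.0) against known videos."""
    if match_score >= AUTO_THRESHOLD:
        return Action.TREAT_AS_KNOWN
    if match_score > REVIEW_THRESHOLD:
        return Action.HUMAN_REVIEW
    return Action.KEEP_UNTIL_FLAGGED

for score in (0.99, 0.75, 0.30):
    print(score, "->", triage(score).value)
```

Where the two thresholds sit decides the trade-off argued about above: lowering the review threshold catches more infringing uploads but sends more legitimate videos to the human queue, which someone has to pay for.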
      • by Ken_g6 (775014)
        Great idea. Exactly what I thought of.

        There's just one problem I see. Where is a site with no revenue stream going to get money to pay the humans?
    • by shish (588640)
      musicbrainz [musicbrainz.org] automatically checks a given chunk of audio against a database of known songs -- it's designed for automated MP3 tagging, but similar tech could be used by youtube
    • It's relatively easy to write a content filter for audio that can detect reasonably close copies (i.e., one that tolerates different MP3 bitrates and encoder variants). Companies like Gracenote and Shazam have services available that can be used to ID music files, but they are at the mercy of having up-to-date signatures to catch the latest music going around.

      Depending on the service, they may have a web-based POST mechanism which returns an XML result, which can accept either raw PCM/WAV or sometimes other fo
    • There are companies like Auditude [auditude.com] that already have working solutions. Google is actually trying to re-invent the wheel on this one.
  • Some of us could certainly think of ways to easily identify video content. I've thought of a way. I don't know if it is feasible, but I'm not going to post the idea here. If anything, I'd patent it to keep it from being used. Why help an industry that is so consumer-unfriendly?
    • Re: (Score:3, Funny)

      by b0s0z0ku (752509)
      If anything, I'd patent it to keep it from being used.

      Better yet, patent it and send all royalties to the EFF. The "industry" can only use it at "their own expense" - in more ways than one :)

      -b.

    • by tooyoung (853621)
      Wow, it sounds like you could get a free PhD at any university in the world. Perhaps you'd like to share your method with us. Principal Component Analysis, Semi-Naive Bayes Classifier, compare pixel values with a histogram ;) (maybe after applying a Hough transform)? Stop press, stop press!!!
    • Some of us could certainly think of ways to easily identify video content. I've thought of a way.
      In other words, I've thought of a way to identify infringing material, but the pseudocode is too long to fit in the margins.
  • Content producers wouldn't license their content to Google/YouTube if they weren't getting something out of it (either money or free advertising). And as we've seen in the past, when it comes to money, the content producers would rather make more of it, even in the face of rampant unauthorized duplication, rather than follow through on their threats to take their toys and go home.

    The content producers who are going to license their content will ultimately do so without any sort of detection scheme, and the
  • It will be interesting to see how long YouTube can stay in the lead for video. It will give other sites like MySpace [slashdot.org], Facebook [slashdot.org], and Grapheety [slashdot.org] some way to evaluate themselves. This web 2.0 thing could be making some changes soon...
  • DMCA (Score:2, Interesting)

    by Xymor (943922)
    Isn't all they need to comply with the DMCA a link for reporting DMCA violations/copyright-protected content, plus removal of the content once verified?
    • by Lehk228 (705449)
      Not a link; a listed person/procedure.

      The best way is to accept it by snail mail, since it's too easy to lose messages to a spam filter if you use email.
  • Enforce That ! (Score:3, Insightful)

    by leftcase (1030652) on Tuesday January 02, 2007 @11:28AM (#17431736)

    Given that the media and entertainment industry has made such a miserable job of enforcing copyright since the emergence of high-speed internet, perhaps their efforts would be better spent figuring out ways to capitalise on the presence of sites such as youtube and myspace.

    If businesses such as Red Hat can make a living from open-source software, surely there's a more refined way for said media businesses to realise capital from their assets without being so 'grabby'!

    • Not only that, but they've done a shit job of copy protection too. Name one form of copy protection that hasn't been hacked shortly after inception, whether through reverse-engineering, number crunching, or a plain ol' black Sharpie. Not only that, but name ones that haven't made life more inconvenient for legal users than for potential infringers.

      The fact is that the ??AA would just love to have youtube create a system that could effectively single out copyrighted works, because it would save them the trouble
    • ...ways to capitalise on the presence of sites such as youtube and myspace

      The thing is, they already are (in theory at least). The RIAA and MPAA need to understand that everything on YouTube that is copyrighted is really just an advertisement for the full copyrighted work. YouTube has placed a length limit in order to prevent the entire two hours of a movie from being posted. So any movie on there is bits and pieces. Same with TV shows. Sure, lots of people split the 30 min. shows into three ten minut

    • The only practical way to do this is to utilize the army of users to police the site. Set up some kind of tagging or flagging system where users can flag a video as "copyrighted music" or "copyrighted video." Perhaps a meta-moderation system on top of the base level could ensure smooth functioning. At the top, a smaller army of actual people will have to make decisions about fair use etc. This site and craigslist employ similar strategies relatively effectively. It'll be the first ever web2.0 social snit
  • Solution (Score:2, Interesting)

    by jlebrech (810586)
    The solution would be to perform some sort of hash check against previously taken-down material. So posting copyrighted material once and having it spotted would stop it from recurring on the system. It just needs to still match submissions with bits cut out, varying watermarks and varying source qualities, using some kind of identification algorithm (similar to fingerprinting).
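An exact hash of a taken-down file is defeated by changing a single byte, so matching "with bits cut out and varying watermarks" needs fingerprints compared with some tolerance. A sketch of that version, assuming each clip has already been reduced to a 64-bit perceptual fingerprint; the stored values and the 6-bit tolerance are hypothetical.

```python
# Hypothetical 64-bit perceptual fingerprints of clips that were already taken down.
TAKEN_DOWN = {0x0F0F0F0F0F0F0F0F, 0x123456789ABCDEF0}
MAX_BIT_DIFF = 6  # hypothetical slack for re-encodes, trims and watermarks

def hamming64(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")

def matches_taken_down(fingerprint: int) -> bool:
    """True if the upload is within MAX_BIT_DIFF bits of any removed clip."""
    return any(hamming64(fingerprint, known) <= MAX_BIT_DIFF for known in TAKEN_DOWN)

print(matches_taken_down(0x0F0F0F0F0F0F0F0F))          # exact re-upload
print(matches_taken_down(0x0F0F0F0F0F0F0F0F ^ 0b111))  # 3 bits changed
print(matches_taken_down(0xFFFFFFFFFFFFFFFF))          # unrelated clip
```

The exact re-upload and the copy differing in 3 bits are both caught; the unrelated fingerprint, 32 bits away from each stored one, is not. A linear scan is fine for a sketch, but a site with millions of removed clips would need an index built for Hamming-distance search.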
    • by Thansal (999464)
      Couldn't you easily destroy a checksum by inserting something random into the file?

      All it should take (I think) is for someone to create a nice little program that detects the file type and inserts a white pixel into a corner of every frame. If that is not enough, just randomly change the colour of the pixel, or insert any other sort of noise that we (as humans) would not perceive but that would destroy a checksum.
      • by clodney (778910)
        Digital watermarking is far more resilient than that. To destroy a Digimarc watermark in a photograph you have to do a surprising amount of manipulation of the image - it is by no means impossible to get rid of, but doing it without significantly altering your perception of the image is difficult (I am sure people have figured it out, but the amount of manipulation required made me wonder how the watermarking is done).

        In some ways, watermarking video should be even easier. You could potentially watermark
        • by Splab (574204)
          There's a big difference between guarding material with watermarks and identifying material from known content. They already have those annoying dots in the movies so they can identify where a TS was made.

          Solving the problem should be possible, but there are a lot of licensing issues to deal with. You need to try to match pictures from the original against every single frame to look for similarities, and then you need to match the sound from those frames, find a good threshold and mark offending videos for manual revi
  • by shotgunefx (239460) on Tuesday January 02, 2007 @11:52AM (#17431932) Journal
    Only one video I ever uploaded was not posted immediately. It was a demonstration of a touchscreen media player I'm working on (Was one of a couple vids I uploaded that night). I was playing copyrighted material in the demo, but no song played very long before moving on and the audio (as it was off camcorder) was horrible.

    About 12 hours later, it cleared. Fairly certain it was flagged and reviewed. If that's the compromise, I think I could deal with that.
    • by b0s0z0ku (752509)
      Fairly certain it was flagged and reviewed. If that's the compromise, I think I could deal with that.

      Why have any censorship at all? Anyway, don't you think that people looking to watch movies for free would rather download off of P2P. I don't see something that uses Flash in a browser window to play video as a large potential infringement problem.

      -b.

      • I'm not advocating censorship, but as a public company they have liability. So if Google has to cover their ass to provide me free hosting, I can't bitch about that too much.

        But if what I experienced was in fact flagged content, it was quite reasonable. I wasn't trying to provide a high-fidelity copy of someone else's music, and I wasn't trying to capitalize on it; the music was simply there to show function. In my mind, that's a fair and reasonable use (what the law would say is, OTOH, another question).

        I'm just saying, if that were the way it wa
    • I want a touchscreen to sit in my living room. When idle, it can cycle through images like one of those digital photo displays. When touched, it switches to Amarok (or equivalent) and lets me select from music on my server upstairs (wireless connection to home network, Samba-shared drive with music on it). The music is streamed downstairs and plays through the room's sound system.

      That doesn't happen to be what you are working on, does it? I was gonna tackle one of these as a project this year, but I'd r
      • Actually it's an automotive media player for my car pc. But something like you are after should be pretty easy.

        Just set it up to run the slideshow as a screensaver and run your music app in the background.
  • Why does the *AA always move to the default assumption that internet distribution of content == EVIL?

    I would argue that Youtube and other services like it are very similar if not the same as a television channel. Instead of trying to police Youtube for copyright infringement, why not collaborate on a similar business model as television? The video service would pay for some type of broadcast fee presumably via advertising revenue just like television. For videos that are more popular the advertising sp

    • Honestly, you're looking for logic where there is none. You're trying to sell a bright new day to the blind, and I really hope they all die off for it. It's getting to the point where people can handle content creation (And certainly distribution) by themselves. Who needs em?
    • by MacWiz (665750)
      From a business point of view, lawsuits are more profitable than licensing.

      If the RIAA licenses content, they have to share the income with the artists. If they sue, they keep the proceeds for themselves.
  • Mike McGuire, a digital media analyst at Gartner, says [T]he technology industry really has to start living up to the media industry's expectations ....

    Maybe it's the other way around, the media industry needs to start living up to the expectations of the technology industry.

    I don't understand why the content providers don't just embed their content with banner advertising overlays and distribute it online themselves. I guess these guys are so stuck in 20th century television mode they just don't get
    • by dimeglio (456244)
      Why not impose a reasonable content tax on all those services (youtube, etc.) that offer "non public domain materials"? A similar system is used elsewhere in the world on things such as blank media, and apparently also in the US on some media: http://www4.law.cornell.edu/uscode/html/uscode17/usc_sec_17_00001004----000-.html [cornell.edu].

      It could be based on a percentage of gross profits but with a preset minimum. Heck, this money could be used to fund the space program and help emerging artists.
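The proposed formula, a percentage of gross profit with a preset minimum, amounts to levy = max(minimum, rate × gross profit). A minimal Python sketch; the commenter gives no figures, so the 2% rate (200 basis points) and $10,000 floor below are purely hypothetical, and amounts are kept in whole dollars to avoid floating-point rounding.

```python
def content_levy(gross_profit: int, rate_bp: int, minimum: int) -> int:
    """Levy in whole dollars: rate_bp basis points of gross profit, never below `minimum`."""
    taxable = max(gross_profit, 0)  # assumption: a loss-making year still owes only the minimum
    return max(minimum, taxable * rate_bp // 10_000)

# Hypothetical figures: 2% (200 basis points) with a $10,000 floor.
for gross in (1_000_000, 100_000):
    print(gross, "->", content_levy(gross, 200, 10_000))
```

At $1,000,000 gross the 2% share ($20,000) applies; at $100,000 the share would be only $2,000, so the $10,000 minimum kicks in instead.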
  • "Failure to build adequate systems to protect copyright owners could also add to the risk of legal action against the site."

    Huh? I assume by copyright "owners" they mean copyright "holders". I don't think there's ever been any claim that any holder has ever been put at risk by YouTube. It's possible that copyrights might be infringed via YouTube, but that hardly amounts to a risk to the holder of that copyright.

    Why should YouTube waste the CPU cycles in a futile attempt to seek out copyrighted material

  • Haven't we already seen filtering attempts like this with the old Napster? That failed miserably because at the end of the day, there is just too much content to be checked and defeating automated id systems is relatively easy. Now it may be that that has changed, but still the sheer volume of content is likely to slow things down noticeably. And as others have already noted, if the system successfully removes illegally uploaded content, then why are people going to bother with YouTube anymore?

    In the end, t
  • When you do this, YouTube will drop like a rock in popularity, depending on how good a job you did.
  • So, YouTube is going to build an automated system that can tell that the movie clip in my video is part of my online review of said movie and is thereby covered under fair use? Or that my video which uses a famous pop song is a parody of that song and also covered by fair use? The only way to build a system to detect copyright infringement is to ignore fair use. I'm sure that makes the MPAA and the RIAA happy, but it really isn't viable.
  • In some ways I agree that the copyright holders should control the content, but at the same time I can see they are missing a golden opportunity to "promote" their content on youtube. To give you an example, I like watching Anime, but as anyone knows, "my favorites are someone else's rubbish". So how do I know if I'll "like" the content on a DVD? Well, when someone mentions a new Anime series I can go look at a review, but as previously stated that's not always a good method, or I can go look on youtube for an episod
  • Seriously, is anyone who is even slightly aware of state-of-the-art pattern recognition algorithms surprised? This is a case of popular culture leading people to believe that pattern recognition software is further along than it is. Every semester, PhD and master's students at literally hundreds of universities try their hands at this (not to mention industry researchers/professors). People have been studying the top techniques used in this field for years, and there is no suggestion yet that this problem i
  • Why should YouTube be responsible for this? I guess YouTube is giving people an outlet to break copyright. However, any company that makes CD-Rs is giving people an outlet to do the same, but in that case the CD manufacturers are not held responsible. Is there a difference because it's hosted online, so they're hosting illegal content?

    Any person should already know it's illegal to post copyrighted material in this way. Even if you didn't know already, I'm also sure YouTube's TOS covers this and you ha
