
Searching Sound 68

Posted by michael on Monday May 05, 2003 @07:45AM from the when-grep-isn't-enough dept.

Technology Review has one of their few stories that's not registration-required describing searching audio files for any specified set of sounds. All sorts of interesting applications become possible if you can turn analog audio into a digitally-useful product without massive human intervention.

This discussion has been archived. No new comments can be posted.

Usage scenario (Score:1, Funny)

by markov_chain ( 202465 ) writes:

% grep -E "nixon.*(fool|idiot|moron)" /usr/mp3/watergate*.mp3
- Re:Usage scenario (Score:3, Interesting)
  
  by MacroRex ( 548024 ) writes:
  
  (paranoia)
  No really, what if they start bugging public places where people talk a lot (bars etc) and run the output through something like this? After acquiring a speech sample from bank/airport/whatever and thus connecting it to a person, it's a breeze to have a global textual log of everything the person says in a public place.
  
  Of course, the article talks only about deconstructing the audio sample into words, but further analysis is a natural extension of the idea.
  (/paranoia)
Where have they been? (Score:1, Funny)

by Chris_Stankowitz ( 612232 ) writes:

"...if you can turn analog audio into a digitally-useful product without massive human intervention."
Aren't CDs just that? If you really want to make a more useful digital product, start scouting for some new talent. American/Pop Idol isn't cutting it. :)~
been there, done that (Score:2, Informative)

by shachart ( 471014 ) writes:

I work for a CT (Computer Telephony) company (see comment on story from half an hour ago). My company does soundex, phonex, and some proprietary stuff too, to convert recorded phone calls into the text of the call, regardless of noise, tone, etc. Useful for your friendly government to spy on you. This is really old news.
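For readers who haven't met it, here is a minimal sketch of classic American Soundex, the phonetic hash named above. It is only an illustration of the general technique: it works on spelled names rather than audio, and it says nothing about how any particular company's proprietary matching works. The function name `soundex` is made up for this example.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    # Consonant groups that share a digit; vowels, Y, H and W have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]                  # the first letter is kept verbatim
    prev = codes.get(letters[0], "")     # digit of the previous coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal digits
        code = codes.get(ch, "")         # vowels and Y map to ""
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev, so a repeat counts again
    return (result + "000")[:4]          # pad or truncate to four characters


print(soundex("Robert"), soundex("Rupert"))  # both print R163
```

Names that sound alike collapse to the same short key, which is why this family of codes is handy for fuzzy lookups in a database of transcribed names.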
- Re:been there, done that (Score:1)
  
  by dentldir ( 15080 ) writes:
  
  Except that Fast-Talk doesn't do speech to text processing at all. It does waveform pattern matching for language patterns without ever passing through a text phase. It actually does a reasonable job of it too.
  
  I played with the SDK a year ago. I suspect it's gotten considerably better since then. In that respect, it is old news.
When you think about it... (Score:5, Interesting)

by Ieshan ( 409693 ) writes: <ieshan AT gmail DOT com> on Monday May 05, 2003 @07:54AM (#5880258) Homepage Journal

When you think about it, though, government and military agencies must have had this for quite some time.

Tapping and bugging really does no good unless you've got someone listening all the time - and that's both expensive and impossible. While I realize that with a tap someone only has to be listening when a phone call is actually made, the outcome is still far more hours of audio than is feasible to search and use.

If we couldn't have searched audio on a wide scale before, then I find it hard to believe we'd ever be catching anyone by specific phone intercepts. Instead, we'd just be using that sort of thing as evidence.

I mean, I realize this is a great technology, I just doubt it's as "new" as it seems...

- Re:When you think about it... (Score:2, Interesting)
  
  by Chris_Stankowitz ( 612232 ) writes:
  
  Tapping and bugging really does no good unless you've got someone listening all the time -
  It was done this way for many many years. It is partly why many investigations took a long time to be fruitful. There are also laws in some states that do not allow a tap to continue once "xyz time" has passed without any useful information.
- Re:When you think about it... (Score:3, Interesting)
  
  by arvindn ( 542080 ) writes:
  
  I'm not sure.
  What's new about this technology is that it does searches without transcription, but instead works at the phoneme level. This doesn't mean that the results are more accurate than if transcription and indexing are used. It's just that the new technique has applications in some cases that can't be handled by the conventional method, like when your model is inadequate, and you would lose information by converting phonemes into lexical form.
  It's not clear how this sort of thing would be useful fo
  - Re:When you think about it... (Score:2)
    
    by Ieshan ( 409693 ) writes:
    
    It's not clear how this sort of thing would be useful for the military.
    
    Well, you could tap thousands of phone lines and search for phonemes that indicate the high level military commanders, etc. After you "found" one, you could immediately jump in and listen on it, or if the communication is laden enough, hit that phone-system. With the recent military advances and precision weapons, this isn't *that* hard to imagine.
    
    I mean, with a powerful enough system, you could filter thousands of hours of data at on
    - Re:When you think about it... (Score:2)
      
      by arvindn ( 542080 ) writes:
      
      Well, you could tap thousands of phone lines and search for phonemes that indicate the high level military commanders, etc.
      Hmm.. phonemes just tell you what the sound is, they contain no information about the voice.
    - Re:When you think about it... (Score:1)
      
      by zdislaw ( 664912 ) writes:
      
      phoneme: "The smallest phonetic unit in a language that is capable of conveying a distinction in meaning, as the m of mat and the b of bat in English." What phoneme do you suggest looking for to indicate high level military commanders?
      - Re:When you think about it... (Score:2)
        
        by Ieshan ( 409693 ) writes:
        
        Their names? Sensitive information about them? Suspected locations?
        
        - Re:When you think about it... (Score:1)
          
          by zdislaw ( 664912 ) writes:
          
          Yeah, I was being kind of a wiseass about searching text for a phoneme. A phoneme carries no meaning, as it is a simple sound. Now on the other hand, a particular string of phonemes...
          Oh well, never mind me.
- Re:When you think about it... (Score:3, Interesting)
  
  by PerryMason ( 535019 ) writes:
  
  The big problem with this sort of technology is that in the past when you wanted to tap someone, you had to have a good reason (good enough to persuade a judge anyway) and you had limits on what you could and could not record/listen to. Now with technology like this and Echelon etc, it becomes possible to monitor every person who makes a phone call or sends an email. In effect you are presumed guilty and have to prove your innocence by not discussing or committing a crime. One of the fundamental tenets of th
- Re:When you think about it... (Score:1)
  
  by Daniel_ ( 151484 ) writes:
  
  It's been out for quite a while. The biggest difference is that the computational linguists that do this kind of thing don't get much attention from places like slashdot.
  
  There are numerous packages/toolkits that can be used to do the same thing (if you're willing to take the time to put the pieces together). One is Praat [praat.org]. It's a (mostly) GPL toolkit for sound and speech analysis.
- Re:When you think about it... (Score:2)
  
  by gl4ss ( 559668 ) writes:
  
  modern way of tapping is that the phone company records all the calls, which works for cellphones too.
  
  you just then go through the records.
  
  if a man spends 20 minutes per day on the phone.. you need 20 mins per day to listen through those calls.. not much of a chore.
- Re:When you think about it... (Score:2, Informative)
  
  by npendleton ( 255215 ) writes:
  
  See or read "Killing Pablo" [amazon.com] and then tell me what you think about catching an individual from an intercepted phone call. The U.S. Government poured top flight resources (NSA and Delta Force) on the problem of helping a Colombian Government military unit find and kill drug king-pin Pablo Escobar. Escobar was killed by this Colombian military unit.
  
  This technology would help immensely on message analysis. Evaluating messages typically is divided into two areas, signal analysis, and message analysis.
  
  Sign
Does that mean... (Score:3, Funny)

by Valdrax ( 32670 ) writes: on Monday May 05, 2003 @08:00AM (#5880281)

...that I can finally find that one song that goes Wagga-chigga wa! Wagga-chigga wa! Wagga-chigga wa-wa! Thoomp! Meedly-meedly-meedly-meedly! Meedly-meedly-meedly-meedly meedly-meedly-meedly-meedly meeeeeeee!!

- Re:Does that mean... (Score:2)
  
  by madmarcel ( 610409 ) writes:
  
  Ehm...maybe...there was/is a research project at (*shameless plug* :) Waikato University's school of CS which might be able to help you out.
  
  The idea is that you hum or whistle a tune into the microphone and the computer will then use some fancy pattern-matching (I think) and spit out the song(s) that match. (From memory, too tired to look it up)
  
  I have not seen this system in action, so I couldn't tell you how good it is. I don't think it works with MP3s though :o
  I think one of my fellow students is working on
  - Oops, here's a link (Score:5, Informative)
    
    by madmarcel ( 610409 ) writes: on Monday May 05, 2003 @08:24AM (#5880465)
    
    If you really want to find out how it works:
    
    Links to PS and PDF files are on this page
    
    http://www.cs.waikato.ac.nz/~nzdl/publications/
    
    (They are not going to like what I am about to do to their server ;)
    
- Re:Does that mean... (Score:2)
  
  by Migrant Programmer ( 19727 ) writes:
  
  And then Strong Mad comes in on his bass and he's like doo doo doo doo doo doo doooo!
  
  And then The Cheat comes in on his keyboard and he's like boop boop boop boop boop boop boop boop boop!
  
  And then I come in with And the dragon comes in the NIIIIIIIGGGGHTTTT!!!
- Re:Does that mean... (Score:1)
  
  by gbpuckett ( 572575 ) writes:
  
  You're not that far off of describing the search engine I'd really like to see, sort of a "Name that Tune" search engine. Instead of inputting a text string, you would be able to put in a musical notation string and have the search engine return a listing of tunes that contain (or begin with, or end with, or repeat) that string. It would be useful for trying to deal (probably unsuccessfully) with those snippets of music that occasionally create an endless loop in your head. It would also help tunesmiths avo
- Re:Does that mean... (Score:1)
  
  by McWilde ( 643703 ) writes:
  
  Here it is. [homestarrunner.com] Albeit through massive human intervention.
- Re:Does that mean... (Score:1)
  
  by absolut_kurant ( 152888 ) writes:
  
  Query by Humming (really)
  
  http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html [cornell.edu]
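For the curious, one common way to approach the hum-a-tune idea from this subthread is to throw away absolute pitch and keep only the melodic contour (up, down, same), then compare contour strings approximately. The sketch below assumes that approach; it is not claimed to be how the linked projects actually work. The song data, the MIDI-style note numbers and the function names are all made up for illustration.

```python
def contour(pitches):
    """Encode a pitch sequence as U (up), D (down), S (same) steps."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance between two strings, using one rolling row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev_diag, row[j] = row[j], min(
                row[j] + 1,               # drop a step from the query
                row[j - 1] + 1,           # insert a step into the query
                prev_diag + (ca != cb),   # match or substitute
            )
    return row[-1]


def rank(query_pitches, songs):
    """Order song names by contour distance to the hummed query."""
    q = contour(query_pitches)
    return sorted(songs, key=lambda name: edit_distance(q, contour(songs[name])))


# Hypothetical database: opening notes as MIDI-style note numbers.
songs = {
    "ode": [64, 64, 65, 67, 67, 65, 64, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
}
# An off-key hum that still goes up and down in the same places as "twinkle".
print(rank([50, 50, 58, 58, 59, 59, 57], songs))  # prints ['twinkle', 'ode']
```

Comparing whole strings keeps the sketch short; a real system would more likely look for the hummed contour anywhere inside a much longer melody, and would have to cope with pitch tracking errors in the first place.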
Voice Control (Score:1)

by 2sleep2type ( 652900 ) writes:

This is a step towards full voice control of systems. I have always felt that computers will not have truly come of age until they are voice controlled. For general business use, all other forms of interaction are a compromise. The future I look forward to is full voice control of systems, probably via a discreet headset so the box next door to you doesn't start typing your letter. It will then be possible to have truly 'affordant' systems.
- Re:Voice Control (Score:1)
  
  by Biogenesis ( 670772 ) writes:
  
  I'd still prefer simply having to wave my hand in the general direction of the thing, zaphod beeblebrox style. It'd make for more 'creative' input styles.
- Re:Voice Control (Score:1)
  
  by s0m3body ( 659892 ) writes:
  
  do you really want your microsoft windows to be talking to you?
- Re:Voice Control (Score:2)
  
  by samael ( 12612 ) writes:
  
  I don't know about you, but I can type darn near as fast as I can talk and I can certainly rearrange text faster with a keyboard/mouse combo than I could describe what I wanted using voice.
  
  Voice as an adjunct to keyboard/mouse would certainly be handy though.
  - Re:Voice Control (Score:2)
    
    by gilgongo ( 57446 ) writes:
    
    I agree - although my father, now retired, spent most of his working life dictating letters to a dictaphone. It's a really impressive skill - he kinda thinks a bit, then dictates a couple of sentences. Stops, thinks again, then does another five or six. Rewinds to review it, maybe changes a bit in the middle, pads out anything that needs it with a bit of silence, then continues.
    
    The fact is that we're used to keyboards and word processors, clipboards, etc. so we can't see another way of doing it. We also tend to
- Re:Voice Control (Score:1)
  
  by Kazoo the Clown ( 644526 ) writes:
  
  While I figure I may want voice control someday when my eyeballs or fingers don't work so good anymore, I've always figured that's a less-than-optimum workaround needed because the more efficient means of interconnection are impaired. Now if you are a hunt-and-peck typist and didn't find the move to mouse input akin to amputating 9 fingers, perhaps you'll prefer voice I/O. However, you'll be way behind those who have no problem with adapting to technology, just like those were who insisted that automobile
Google for sounds? (Score:2, Interesting)

by Shiranui ( 643648 ) writes:

It would be cool if we're able to actually 'search' for any soundbytes. Even with altered speed / tone.

Listening to all those techno remixes, I always have a hard time trying to find out where those cute background soundbytes came from...only to find out it was a heavily distorted Mozart or a mixed up vocal of JFK.
- Re:Google for sounds? (Score:1)
  
  by DJ FirBee ( 611681 ) writes:
  
  Wouldn't it be cool to have even more intellectual property laws so that it is even harder to make music in the first place? Wouldn't it be cool if I could be sued for using something as inconsequential as so and so's snare drum hit in a composition?
Oh, NO! (Score:1, Insightful)

by Anonymous Coward writes:

Let's hope this doesn't make pointy head bosses think they can store customer information as a blob of speech data...
Dialoggle... no... Earggle... (Score:1, Funny)

by rinkjustice ( 24156 ) writes:

It would be nice if there was a search engine exclusively for that - instead of typing "linus torvald linux .au", you would navigate to a subdirectory called 'pronunciations', pick the audio format and voila...

If you can search websites and images, jobs and news articles, sound bytes would be the logical next step.
One good implementation (Score:2, Informative)

by emcron ( 455054 ) * writes:

A company called Fast-Talk Communications [fast-talk.com] has a set of tools that they resell for 3rd-party apps for things like searching interviews for specific words that were said. I have actually seen this feature used in some newsroom software made by Dalet Digital Media [dalet.com] and it was amazing to see in action. Very fast and accurate.

The research for the fast-talk technology was done at Georgia Tech's Interactive Media Technology Center [gatech.edu] (IMTC). They've got a page about the corporate spin-off [gatech.edu] of the technology.
Index or no index? (Score:4, Insightful)

by Psychic Burrito ( 611532 ) writes: on Monday May 05, 2003 @09:59AM (#5881242)

There's quite a contradiction in this text. First they tell us that FastTalk doesn't use an "index":

The key to expediting the process was eliminating the need for transcription or indexing or both.

Then on the second page, they say that some sort of pre-processing is needed:

(...) the Fast-Talk approach processes the speech in such a way that you can later go back and search it very efficiently (...)

So I see no revolution here... it's just about indexing the phonemes of an audio stream and then searching these, right?

- Re:Index or no index? (Score:1)
  
  by Polarweasel ( 33867 ) writes:
  
  Stop thinking like a geek!
  
  By "indexing", they mean creating a list of every place a particular phoneme occurs. Think of an index like you find in the back of a book...
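To make the back-of-the-book picture concrete, here is a rough sketch of a phoneme position index and an exact lookup of a phoneme sequence against it. Everything in it is an assumption for illustration: the ARPAbet-style symbols, the tiny hand-written stream and the function names are invented, and a real product working from noisy acoustic output would presumably need approximate rather than exact matching.

```python
from collections import defaultdict


def build_index(phonemes):
    """Map each phoneme to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, ph in enumerate(phonemes):
        index[ph].append(pos)
    return index


def find_sequence(index, phonemes, query):
    """Return the start positions where the query sequence occurs exactly."""
    if not query:
        return []
    hits = []
    # Only positions of the first query phoneme can start a match.
    for start in index.get(query[0], []):
        if phonemes[start:start + len(query)] == query:
            hits.append(start)
    return hits


# Hypothetical phoneme stream, roughly "the house is a house".
stream = ["DH", "AH", "HH", "AW", "S", "IH", "Z", "AH", "HH", "AW", "S"]
index = build_index(stream)
print(find_sequence(index, stream, ["HH", "AW", "S"]))  # prints [2, 8]
```

The pre-processing pass builds the position lists once; each later search then only visits the places where the first phoneme of the query occurs instead of rescanning the whole recording.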
I said "moos" (Score:1)

by sin(theta) ( 609000 ) writes:

a system trained by a speaker from Canada would transcribe the sound "hoos" into the word "house."
But I don't say "hoos".
RIAA funding forthcoming? (Score:3, Interesting)

by fluffhead ( 32589 ) writes: <eric.sherrill@at ... m ['sor' in gap]> on Monday May 05, 2003 @12:32PM (#5882751) Homepage Journal

I wonder if the RIAA will throw money at this type of technology, to help catch "pirates" who might otherwise escape by subtly transmogrifying their shared MP3s. Or maybe it already has?

- Re:RIAA funding forthcoming? (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  Let's see:
  - RIAA mention in title CHECKED
  - pirates word between quotes CHECKED
  - use of the word shared CHECKED
  
  All you need for a couple of karma points! +3 ??
Animals? (Score:1)

by kaoshin ( 110328 ) writes:

This seems like it may be a good tool to use for learning how animals communicate with each other. Just an idea.
Google for ... (Score:1)

by Dossy ( 130026 ) writes:

Google
Web Images Sounds Groups Directory News

__porn_blowjob_"money_shot"__ [I'm Feeling Lucky]

I can see it now ...

-- Dossy
Soundtrack (Score:2)

by mbbac ( 568880 ) writes:

Apple's Final Cut Pro's Soundtrack feature does this. It automatically categorizes imported samples depending on the instrument in them. Then, you can search on that.
Music 'fingerprints' - Polyphonic HMI (Score:1)

by scrimshander ( 671183 ) writes:

Has anyone heard about this company Polyphonic HMI (www.polyphonichmi.com) that claims to be able to create a digital 'fingerprint' of music (beat, melody, etc.) and identify potential 'hits'? Anybody know how they're claiming to do it?
findsounds.com (Score:2)

by Qender ( 318699 ) writes:

I know it's just indexed by name, but this can be useful.

http://findsounds.com/
