Technology

Online Speech Indexing 87

Thomas Edwards from The Sync (where we host Geeks in Space) sent us an interesting site: "Speechbot" is a Compaq Research project that is indexing online radio shows. Apparently it found terms like 'Red Hat' and 'Yahoo' in past episodes of GiS. Interesting technology. Imagine when it lets me ask my TV to find me every show that mentions Sarah Michelle Gellar.

  • mmmmmmmmmmmmmmm, FBI Betty, oooooooooooh.
  • TV input cards that let you watch TV on your computer will download all of the "Closed Captioning" text and then let you search it. There is no reason why you couldn't set up something that watches all channels for closed captioning text and indexes all of it.

    Especially with these new hard-drive VCRs that let you pause a TV show while you go "take a #2". This device could also watch particular channels and buffer them for an hour while it searches their closed captioning for specific search criteria. When it sees the text you are looking for in the closed captioning text, it will send the whole program to tape. It would be pretty nifty, I think, if I could have my computer watch the Science Channel for anything to do with quantum computing and, when it finds something, output the whole thing to tape. It's like having your computer watch TV for you and only get the good stuff. :-)

  • Perhaps it should be reading the web pages that link to a clip and finding some context from that. Especially when the computer doesn't know who's talking, context becomes everything. Obviously gathering context from surrounding audio isn't surefire.

    --Ben
  • Word! Since you guys use it (RealNetworks stuff), I figure you would be complaining if you didn't like it. I didn't know that they developed RealServer on Linux. And I never even thought about Windows Media Player, yikes!

    Sometimes you actually learn something on /., huh.

    BTW the sync rocks for hosting geeks in space, the most freaky and funny talk show I've ever listened to. With this Andover business, the show needs to lay down some cash to get a call-in section, so all of the geeks across the nation (err, umm globe, sorry) can call in and be nerdy with the kings.

    -Rich

  • Even at 3x realtime, I could probably live with working with it based on what I saw on the pages. I'd have to record everything to wav's first, then have one of my machines on the network handle processing it later.

    This sounds like it's going to turn out to be quite the kick ass product!

    I'll ask the same question the other guy did - Open Source, or proprietary? And in the future, can the SQL server be replaced with something else like MySQL, etc.? Heck, for that matter, got something that gives us even more information?

    Hell, need a beta tester to do any stress testing on this puppy? I'm yer man!

  • They probably need more powerful searches than that. Their patent for searching and sorting text [ibm.com] describes a system for searching and sorting "speech-based text, optical-character-read text, stop-word-filtered text, stutter-phrase-filtered text, and lexical-collocation-filtered text," according to claim #2. [ibm.com]
  • by xyzzy ( 10685 ) on Wednesday December 08, 1999 @10:58AM (#1475403) Homepage
    Ohoh, I've stirred up a firestorm here :-)

    Re: SQL -- it can be any SQL server, really. However, I will add that we are somewhat in bed with Microsoft on the visualization end, simply because IE5 does XML quite well (note to Mozilla people: get with the program).

    Re: Open source. Unfortunately, not up to me. Much of the technology is "open source" in the sense that papers have been published about it (not what you were looking for, I know), but we've already licensed some of the core technology to another company, and being a phone company (GTE) we consider the speech rec somewhat of a competitive advantage (wipe those Echelon thoughts out of your mind! We use it for call center and directory assistance automation! Sheesh :-)

    As I posted probably about 6 months ago in a thread about speech recognition, there are some significant issues with open-sourcing beyond the recognizer code. The learning processes behind the recognition are based on a considerable amount of data for which licensing is an issue, such as CNN broadcasts. In fact, we use over 100 hours of broadcast news audio to train the system, and several million words of text for the language model. This comes to us through the Linguistic Data Consortium at the University of Pennsylvania (http://www.ldc.upenn.edu). This is an academic group set up to maintain these common train-and-test databases for researchers, and there's a fairly sizeable fee to join. They handle the intellectual property issues with the training data.

    And, unfortunately, without the training data, it's kind of hard to use the system. At least, if you want to use it on something it's not already trained on (in our case: North American broadcast news).
  • I wouldn't call it a firestorm - no flameage involved coming from me. Just lots o' questions. I've been waiting for something like this to come along that can handle transcribing a conversation. I've got more than one application I'd love to use it for - most of them pretty fluff, mind you, and none of them professional.

    Of course, in the end, this thing is going to be outside of my price range (if available at all to 'consumer level' people) based on what it's for.

    I can keep dreaming I suppose :-)

    As for Open Source - it's a valid question to ask about any product any more, but, doesn't mean that it HAS to be Open Source to be useable.
  • Did anyone try searching for "Binks"? I got 2 results, "charge are binks" and "charge or binks".
  • For another project that does something like this (I think) see:
    http://parlevink.cs.utwente.nl/Projects/olive.html [utwente.nl]
  • Just did a search for "line next" (exact phrase), and got quite a bit of "and on the line next we have...", a la talk show introductions.

    Oh, I was only searching "Geeks in space," not the whole thing..

    Oh, and speaking about the engine... I thought that it might store the transcripts in some sort of phonetic format, and then match your search phrase to it (i.e., "your not hyped" and "journal typed" both match the stored, transcribed phonetics). So I did some searches to match the same sections extracted, and they all seem to match. Conclusion: the transcripts appear to be stored as text, not phonetic symbols.

    That's quite a good idea actually. Store it as phonetic symbols, translate the search query into symbols on the fly, then search based on symbol..

    Better patent it quick, or put it in the public domain for all to use. :-)

    ---
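
A rough sketch of the "store phonetic symbols, translate the query on the fly" idea from the comment above. SpeechBot itself appears to store plain text, so nothing here reflects its internals. The consonant classes are loosely modelled on Soundex, and the names `phonetic_key`, `build_index`, and `search` are hypothetical. A real system would use proper phoneme strings, which is what it would take to match across word boundaries as in "your not hyped" / "journal typed"; this per-word key only catches spelling-level variants.

```python
import re
from collections import defaultdict

# Crude consonant classes (an assumption for illustration, not a real phoneme set).
_CLASSES = {}
for letters, code in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                      ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        _CLASSES[letter] = code


def phonetic_key(word):
    """Reduce a word to consonant-class digits, dropping vowels and repeated codes."""
    key = []
    for ch in word.lower():
        code = _CLASSES.get(ch)
        if code and (not key or key[-1] != code):
            key.append(code)
    return "".join(key)


def build_index(transcripts):
    """transcripts: {show name: transcript text} -> {key: [(show, word offset), ...]}"""
    index = defaultdict(list)
    for show, text in transcripts.items():
        for offset, word in enumerate(re.findall(r"[a-z']+", text.lower())):
            key = phonetic_key(word)
            if key:  # words with no coded consonants carry no usable key
                index[key].append((show, offset))
    return index


def search(index, word):
    """Translate the query into the same key space, then look it up."""
    key = phonetic_key(word)
    return list(index.get(key, [])) if key else []
```

With this key, a query for "Gellar" would find a transcript that says "geller", and "Linux" would find "linnucks", but it would still miss "line next".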
  • by GoRK ( 10018 ) on Wednesday December 08, 1999 @08:17AM (#1475408) Homepage Journal
    What is particularly interesting to note is that the quality of these Internet Radio shows is generally fairly poor. The voice recognition and dictation software that I have toyed with before has always suggested using better microphones and higher sampling rates to achieve decent results. Some even claimed that low quality audio results in a severe accuracy penalty.

    It is very remarkable that this thing can index these low quality streams with the accuracy that it does! I hope that searchable media (other than text) continues to get better like this. Companies like Virage and Compaq definitely deserve our support. I hope that standard interfaces appear soon.

    ~GoRK
  • Indexing broadcast TV should be easier...
    1. Capture video feed
    2. Decode closed captioning text
    3. Make text index

    Anyone know of this technique being used today?

    The Closed-Captioning FAQ [robson.org] seems to think that using speech recognition to generate text from broadcast audio "isn't there yet [robson.org]" technology-wise.
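
The three steps above could be sketched roughly like this, assuming the capture card has already decoded the captions into (program, seconds, text) tuples. That tuple layout and the function names are made up for illustration; step 3 is a plain inverted index.

```python
import re
from collections import defaultdict


def build_inverted_index(captions):
    """captions: iterable of (program, seconds, text). Returns word -> set of (program, seconds)."""
    index = defaultdict(set)
    for program, seconds, text in captions:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add((program, seconds))
    return index


def lookup(index, word):
    """Return every (program, seconds) where the word was captioned, in sorted order."""
    return sorted(index.get(word.lower(), ()))
```

Each hit carries a time offset, so a front end could jump straight to the moment the word was captioned.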

  • by Otto ( 17870 ) on Wednesday December 08, 1999 @08:19AM (#1475410) Homepage Journal
    It found no instances of the word Linux, which I found humorous.

    However, a little brain usage, search for "line" and get this:

    ... there an a to think you're doing is making good news slash my next monday's announcement makes it you can use less leonard still want to which it's tilman of the of the open sores movement I have not part of the open sewers that but why in part of the priests out their foundations giving his last line next to the flashlight next with you a while and we can end of the various duckling and in the he's serving snacks that promptly opening the top of that there is god who will bomb and the crowd is bernie this is definitely the most exciting play a thing would have to have one of I mean you for a column about how ...

    The words "end of the various duckling" and around there are in fact "Eric Raymond" in the clip, which I thought utterly hysterical. You can tell because they say "it's Eric Raymond, and he's serving snacks," which partway comes out correct.

    Linux seems to have come out as "line next" a lot, and "line of" in some clips I've found..

    Obviously, the technology is not quite there yet. :-)

    ---
  • by Cylix ( 55374 )
    Obviously you're going to see more of this. The web is growing vastly more popular with every passing day, especially with everyone and their uncle attempting to package the web in a nice little gift-wrapped box for sale.
    Although I should probably be more worried about the ramifications of being able to search through countless fields of speech... I am actually more concerned with a different aspect of this new means of indexing.
    My concern is this: with each new means of indexing speech and text becoming readily available, could this result in web sites being aggressively taxed by individuals/machines that are barely even using these services? Granted, these are just my simple uninformed worries with little merit. Although it would be interesting to see the results if a slew of these technologies developed and became devastatingly popular (the types that would work from your home computer and index your favorite speech/web sites). :)
  • Yes, you can have it done that way. Unluckily, if there's anything odd going on with the cable line (which is most of the time) you get some really strange output from the CC information. My card and software does it, and from time to time it skips parts of words, inserts 'odd' characters, etc., etc., etc. So, you could set it up with the trigger text, but the likelihood of getting it to work right all the time is a bit low. (Just based on what I've seen - I'm sure it's different in other areas of the country.)
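
One way to make a trigger tolerate the dropped letters and odd characters described above is approximate matching instead of an exact substring test. This is only a sketch using Python's standard `difflib`. The function name and the 0.8 threshold are assumptions, and a caption that loses a whole word or gains a stray space inside a word would still slip past it.

```python
from difflib import SequenceMatcher


def fuzzy_contains(caption, phrase, threshold=0.8):
    """True if some run of caption words is roughly similar to the trigger phrase."""
    words = caption.lower().split()
    target = phrase.lower()
    width = len(target.split())
    # Slide a window the same number of words wide as the phrase across the caption.
    for start in range(max(1, len(words) - width + 1)):
        window = " ".join(words[start:start + width])
        if SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False
```

A lower threshold catches more garbled captions at the cost of more false triggers, so it would need tuning against whatever the local cable line actually produces.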
  • Just think about it, add a regexp to this... you can get Bill Gates saying things like: "Microsoft Windows dominates the market due to our huge innovation." and apply a quick couple of regexps (excuse me if they are long code, not really trying):

    $gatespeak =~ s/Microsoft/Micro\$oft/g;
    $gatespeak =~ s/Windows/Winbloze/g;
    $gatespeak =~ s/dominates the market/controls your lives/g;
    $gatespeak =~ s/innovation/stealing and strongarm tactics/g;
    To get this: "Micro$oft Winbloze controls your lives due to our huge stealing and strongarm tactics". Wow, you can actually get Billy boy to speak the TRUTH!
  • After a bit of prodding, I got whatever version of Windows RealPlayer I have installed (4.x or G-something) to run under wine - I expect more recent ones ought to as well. It wasn't too hard, just "wine rvplayer.exe", but you've gotta save the link on the web page to a .rm file, then open that from within the player.

    Not perfect, but certainly better than nothing!

  • Imagine when it lets me ask my TV to find me every show that mentions Sarah Michelle Gellar.
    Well, I guess that would be a good way of determining what NOT to watch. ;)
    --
  • who CARES about the NSA...I just got a giggle
    out of searching the Art Bell Show for the word
    UFO *grin*

    (for those of you who are not clued in to Art
    Bell, he is famous for conspiracy theories)

    :)
  • I'm pretty sure the NSA has recently been granted a bunch of patents covering this kind of thing. Just wait till Compaq gets cease-and-desist letters from Janet Reno... You thought going up against RIAA on intellectual property was hard... Try the US Government.
  • by anl ( 9070 ) on Wednesday December 08, 1999 @08:47AM (#1475418) Homepage
    The press release [compaq.com] has a little more information. We use workstations running NT to spider the sites; processing is done on a farm of Linux servers, and the UI runs on AlphaServer DS20 machines.
  • This is the introductions section of the show : "I get the deductions on robb commander cockrell mullah eyeing the toast and or mayan jeffery"

    I realize that there's a disclaimer saying that the transcripts wouldn't be exact. But I was expecting one or two words off.

    Actually, after looking through a couple more of these transcripts, it seems to have a problem with the nature of Geeks in Space. That is, when the voice changes it takes the program a little while to catch up. When there are long stretches of only one person talking it seems to do better. ("the most interesting thing that pops up today is that microsoft is set up a box and they have basically challenged the internet to crack it...")

    Interestingly enough not a single episode of GiS seems to include the word "Taco".
  • So does that mean that the last half hour of a show will be filled with "sex, lies, conspiracy, money, $$$, free, topless" etc. just to up its index like web pages do nowadays?
  • Try doing a search for "fuck." This is hilarious.

    "... such chaos these days with possibly of that david thinks not to use nuclear weapons but it could use a biological or chemical try to get the fuck you mentioned before newberg thousand level..."

    or better yet...

    " ... and they knew that and we haven't done just like the chevy malibu fuck are still under oath whistled carefully crafted dig two hundred thousand miles of course..."

    ...it gets more bizarre:

    "... your occurred several koppel with the mask of a focus of of fuck 'em berry's school principals the jaw and the m. our washington studio with a few arctic of the papal trip to do that to us from...

    ...then... listen to the clips and pretend they're actually saying it.



    I need a life.


    _________
  • My ATI all-in-wonder does this. GATOS does not though....
  • Interestingly their contact email addresses are @dec.com (ie. Digital Equipment Corp) .. so it's probably research that has been carried over from Digital. I wonder if they're using Digital Unix. Strange that they don't mention anything about the algorithms, software, hardware, people.
  • Our team is listed at http://speechbot.research.compaq.com/cgi-bin/query?help=about#team [compaq.com], and, in the press release [compaq.com], it explains that we use NT workstations for the content acquisition, a Linux farm to process the data, and Tru64 (formerly Digital Unix) machines to serve the site.

    HTH,
    Andrew Leonard
    Webmaster
    SpeechBot
  • Haha. You people really know ur shit. I love the amount of buffy & SMG references here :P
    Buffy Rox! :D
  • by Anonymous Coward
    "Note: Indexed text does not match audio exactly." And you wonder what kind of technology the NSA has listening to us all right now?
  • The cryptography community usually believes they are a couple years behind the NSA, given that the NSA reads all their papers, but doesn't publish its own work.

    I had been skeptical of Echelon being able to do word recognition on phone conversations, but I expect that the NSA is ahead of private industry in this area too, so Echelon looks plausible.

    --Kevin
  • Based on everyone's assumptions around here, this would peg the NSA as having that capability since 1990 or so (just to pick a round number)... And it only came to light this year.

    oh, and first post too... maybe
  • However, it can do "exact phrase" searches which is almost as good.

    For example, searching on Black 47 returns 2,000+ hits when using the default search, but 0 when using an exact phrase....

    Linux OTOH returns only two matches... sigh. Actually I wonder how much that has to do with the confusion over the pronunciation(sp?), considering I've never met two people who say it with the same exact phonetics....

    Oh well,
    RobK
  • Apparently, the engine can't do "near" type searches. If you search for more than one thing, it looks through the whole transcript for the words.

    So, you might get a result back that isn't quite what you are looking for.

    Jordan
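
For what it's worth, a "near" search is not hard to sketch once the index keeps word positions rather than just a bag of words per transcript. The following is an illustration only: `positions`, `near`, and `max_gap` are hypothetical names, and nothing here is taken from SpeechBot.

```python
import re
from collections import defaultdict


def positions(text):
    """Map each word to the list of word offsets where it occurs in the transcript."""
    index = defaultdict(list)
    for offset, word in enumerate(re.findall(r"[a-z0-9']+", text.lower())):
        index[word].append(offset)
    return index


def near(text, first, second, max_gap=5):
    """True if `first` and `second` occur within max_gap words of each other."""
    index = positions(text)
    return any(
        abs(a - b) <= max_gap
        for a in index.get(first.lower(), [])
        for b in index.get(second.lower(), [])
    )
```

With `max_gap=1` this behaves much like an unordered exact-phrase match for two words, and larger gaps loosen it toward the whole-transcript behaviour described above.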
  • I searched 'Red Hat' ....it found geeks in space, but look at the excerpt:

    "... using the next that you don't want the newest licks but the sex the years only wicks does that to this incident here from licks founder of the the store called was robert young of red hat michael..."

    looks like the T2S needs some work =)

    #----------------------------
    $mrp=~s/mrp/elite god/g;
  • by Col. Klink (retired) ( 11632 ) on Wednesday December 08, 1999 @07:32AM (#1475433)
    Dragon Systems [dragonsys.com] (makers of Naturally Speaking continuous VR) announced a similar product at Comdex. They call it audiomining [dragonsys.com].
  • sorry, I'm an idiot

    #----------------------------
    $mrp=~s/mrp/elite god/g;
  • do you think it could be adapted so I could filter out stuff, like....

    1)Every mention of Celine Dion out of my radio and TV.

    2)Every mention of Bill Gates out of Slashdot posts.

    Just wonderin'

    But as someone who has spent hours upon hours researching old radio shows for a college assignment, it sounds like a real good idea.

  • Not only for the SMG thing, but also imagine the possibilities when applied to C-SPAN. Now you don't have to listen to hours of mind-numbing, coma-inducing boredom or hope and pray that the media will deign to bring a certain issue out of the Washington black hole in order to find out about your favorite target of litigation. Like, say, the one you're reading.
  • a hundred twenty gates at a note let me give you every moment in your life right we've got we've got that word the key guess like thirty five to defend the trees now we don't like pancakes free

    Methinks a lexical generator produces better speech than the triumvirate of ./
  • Gives you some idea of what the NSA and Echelon are capable of... Interesting and incredibly useful if used properly, terribly frightening if used for those "black projects" our wonderful TLAs are so fond of running.

    Hey, at least it'll make finding quotes and sound bites easier! The politicians will probably outlaw it, (for civilian use) of course...

    Imagine what it could do if run against all the archived tape that CNN, or NBC, etc... have. Ah, the possibilities!

    I wonder what Katz will have to say about this?

  • by xyzzy ( 10685 ) on Wednesday December 08, 1999 @07:46AM (#1475441) Homepage
    Might as well use this as a chance to plug my project:

    http://www.gte.com/AboutGTE/gto/bbnt/speech/research/extraction/roughn_ready/index.html

    ...which not only tells you what words were said, but who said them, and what topics were being talked about...
  • by Croaker ( 10633 ) on Wednesday December 08, 1999 @07:46AM (#1475443)

    Did a search for "Mars Probe" in the Science Friday show, and got this snippet:

    .. of deep space walk which show the first I am to arrive and interplanetary space another mars or murder ritual rides the september twenty third of mars lander which lands on...

    Err... yeah. That would explain a great many things about space probes. Actually, I'm sure the textified show would be a lot more interesting than the real show. And then, we could shove it through Babelfish for added enjoyment...

    I recently installed the ViaVoice beta for Linux, and found its recognition not quite ready for prime time... at least for my needs. I'd be surprised if radio shows, which often have people on fairly crummy phone connections, would be an ideal candidate for automated indexing.

  • by / ( 33804 ) on Wednesday December 08, 1999 @08:49AM (#1475444)
    "I want to die" turns up 6
    "Grits" turns up 12
    "Sex with animals" turns up 5.
    "Your mother" turns up 200.

    My conclusion: "Your mother is still almost ten times as important as suicide, sex with animals, and grits combined."

    Remember that, always.
  • Check out the results of "Show me more". The transcript reads like bad poetry:

    settling rata at and legs
    to the team network concept
    fighting ends
    the single monolithic entity
    of those are all things
    that the challenger are undermined
    neither side I think
    what I look at the to the the marcie katz
    the respective committees

    (I didn't change anything but adding line breaks.)

    Good for a chuckle. :)
  • Forgot to add my signature information to that post, so you have some idea where I'm coming from:

    Andrew Leonard
    Webmaster
    SpeechBot
  • Is f**k rich? or fuck for that matter?
  • The vampire Slayer.

    Mind you, I didn't know this until I did a web search for her.
  • Rant all you want about RealNetworks [real.com], but at least they have a Linux version of their player, more than I can say for the next leading competitor in low bitrate streaming media (i.e. Windows Media Technology). Linux is also the base OS for the development of the RealServer.

    I do understand Real's need to "encourage" people to purchase the RealPlayerPlus, because they need to make money to keep up the excellent low-bitrate R&D they've been doing. Unlike Microsoft, they can't just chalk it up to selling more NT Server software.
  • by quadong ( 52475 ) on Wednesday December 08, 1999 @09:20AM (#1475452) Homepage
    Someone should take these pseudo-transcripts and run them thru babelfish. Think of the gibberish level we could achieve!
  • The problem with ViaVoice, according to IBM's web pages, is that the Linux Beta does not have the training software the Windows version has. This is very aggravating because without any training, I'm getting only 30-50% recognition when I speak slowly and clearly. When trained, this should be much higher. Has any third party written software to train ViaVoice under Linux?
  • Speech recognition is really neat and stands to greatly improve indexing and organization of non-text media. It looks like this is a pretty cool application of it, too.

    That being said, let me say that something like this scares the crap out of me. This sort of technology is exactly what the FBI had in mind when it began to pressure telecommunications companies to make their phone lines more tappable. Now I don't remember the exact figure, but they wanted something like 1% of the phones in any metropolitan area tappable at once. 1% of the phones in New York City is something on the order of 50,000 phones. Tell me how you're going to keep track of all of that without a computer monitoring 50,000 conversations and looking for key words. You can't.

    Monitor 1% of the population's conversations for some suspect keywords like 'bomb', 'assassinate', 'cocaine' or perhaps 'open source' and you've got one scary computer-assisted big brother watching over everything. If you don't hear anything juicy, shift to another 1%. I suppose people have had the technology to do realtime speech recognition and filtering for some time now, but the idea of maintaining searchable archives of phone conversations (enter Speechbot) is a genuinely spooky privacy violation.

    Now, any technology is only as good or as evil as the people who use it. I will be cautiously interested to watch what Speechbot evolves into.
  • > Better patent it quick, or put it in the public domain for all to use. :-)

    I wonder if, years from now when AT&T or some such august body patents it, I can claim a Slashdot post as "prior art".

    Hurm....

    --
    Evan

  • ...which would be...?
  • At the very latest, I'd say. The late 1980's sound more likely, but (as you say), that's a convenient round number.

    This would also be about the time UK politicians were banned from Menwith Hill, an NSA listening post in the UK, which would have been a likely candidate for early deployment of such technology.

  • The FAQ [compaq.com] is incredibly vague and the About [compaq.com] page doesn't say much either in terms of the actual technology used. It says that they index 20 shows and index daily. Does anyone know what the time to actually do an index is and what kind of processing power these guys are using?

    On an un-related note, the about page says that Compaq has a research lab in Australia... sweet.
    -dr

  • If I had a T2S that could reproduce that garbled mess as understandable speech, I could make a pile of cash turning Marketing Speak into English.

    Might make IBM manuals more friendly too!
  • by Goner ( 5704 )

    \rant{But where is RealAudio at? The company itself is worse (in its smaller domain) than Microsoft. I mean, their version numbering (5, G2, 7) is absurd, their website pushes you to download the plus version of their player (i.e. the one you have to pay for), and their monopoly on video (and most of the sound) on the web lets them get away with it. I believe they have an OK product, but their marketing schemes are stuck in mid-nineties "pay-for-this-better-version-now-even-though-a-better-free-one-will-be-out-in-three-months."

    Things like pointcast have died due to this type of scheme, but real is still staying strong. The linux (unix) install scenario, and html documentation is absurd as well, and to be honest the reason for this rant. We'll just have to wait until some sort of disruptive technology forces real to compete, instead of stagnate.}

    As far as the implications of this technology, echelon, etc. I just can't wait until I can do boolean searches through my old phone calls. Not like they're listening anyway...

    -Rich

  • by kaniff ( 63108 ) on Wednesday December 08, 1999 @07:58AM (#1475462) Homepage
    When the guys introduce themselves, the translator has a fun time with their names and nick names.

    Rob "CmdrTaco" Malda -- rob commander topple mall
    Jeff "Hemos" Bates -- jeff in both states
    Nate Ostendorf -- the husband or the smoke

    I also searched for linux and I'll bet that it can't find any instances, because it doesn't translate it right, with all the different pronunciation possibilities.

    It's a cool idea, but has a ways to go. Go Compaq.
    yay.
  • The transcript reads like bad poetry:

    The site's FAQ [compaq.com] admits to that (in not so many words)...

    Warning: The "transcript" that is output by the speech recognition software (and shown in small extracts on the Results and Details pages) rarely matches what was spoken exactly, and often does not read very well. Because different people speak at different rates and with different degrees of clarity, speech recognition software does not correctly interpret every word. However, research has shown that meaningful words are recognized with a high degree of accuracy, and that even when a word is missed, it will most likely be recognized when it is spoken somewhere else in the program.

    And in all fairness, they are not claiming to be a "transcript service" per se, though I can certainly see a lot of transcript writers losing their jobs in the future as the technology advances.
    -dr
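
To put a number on "rarely matches what was spoken exactly", the usual yardstick is word error rate: the word-level edit distance between what was said and what was recognized, divided by the number of words actually spoken. A minimal sketch follows (the function name is mine, and this has nothing to do with how Compaq measures it):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions) over reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Note that the rate can exceed 1.0 when the recognizer emits more wrong words than were spoken, as when a single "Linux" becomes "line next".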

  • Hrm. I can't help thinking of the so-called "stealth fighter", which was apparently fully operational in the late 70s/early 80s (erm, I haven't checked the exact dates on that).

    Anyway, certain gov't TLAs always seem to be about 10 years ahead of what they're telling the rest of us. I wouldn't be surprised if the NSA [nsa.gov] has had open source drew barrymore since the E.T. days.

    __________


  • I'm gonna patent that.

    That'll make me rich as f**k!
  • Keep in mind the NSA does not need to produce transcripts, just scan for keywords. When a keyword is found, the last couple of minutes and the rest of the conversation would be recorded, logged, and shipped off to the agents on duty at the time.

    Without the need to have thousands of words in the recognition engine it would definitely be much faster, and possibly much more accurate than current state of the art systems.
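
The "keep the last couple of minutes, then record the rest" scheme described above is essentially a ring buffer with a trigger. Here is a toy sketch over a stream of already-recognized words rather than audio. The name `capture_on_keyword` and the `lookback` parameter are hypothetical, and a real spotter would buffer audio frames, not text.

```python
from collections import deque


def capture_on_keyword(stream, keywords, lookback=5):
    """Return the `lookback` items before the first keyword hit plus everything after, or None."""
    recent = deque(maxlen=lookback)  # rolling window; the oldest items fall off automatically
    captured = None
    for word in stream:
        if captured is not None:
            captured.append(word)             # already triggered: keep everything
        elif word.lower() in keywords:
            captured = list(recent) + [word]  # trigger: flush the rolling window
        else:
            recent.append(word)
    return captured
```

Memory stays bounded by `lookback` until a keyword actually shows up, which is the point of not transcribing everything.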

  • If the government or the cops or some agency decided they wanted to act as "Big Brother", then we might not ever find out. The F-117 stealth military aircraft was only a legend for 15 years, the best-kept secret in the military history of the USA. (Or was it? There could be others that we have never heard of, of course...) The point is that the government and military always seem to be one step ahead. Why should there not be someone monitoring us normal people and the things we do? If they have got one of these things organized in your neighborhood, you might never even hear of it. What would then happen if a guy was unlucky enough to tell a friend a joke about Saddam Hussein and a revolution? You never know...
  • Yes, there are quite a few places that do that (CMU has a system called "InforMedia" that does), but there are a few problems:

    a) Closed captioning is not an exact representation of what was said. Quite often it is a paraphrase (but speech recognition is errorful, of course);

    b) There are many many many useful sources of audio that don't have closed captioning. A meeting, for instance. Or a foreign broadcast (closed-captioning is much more prevalent in the US than in other nations).
  • Unfortunately, one of the downsides is that we don't have a whizzy on-line demo for people to play with. I suspect that will change in a year, but until then you'll have to make do with the screen shots.

    HOWEVER, I do want to add that the system does run on standard, ordinary PC hardware. The indexer currently runs in 3x realtime (so a half-hour wavefile takes 1.5 hours to index) on a P3-500 running RedHat 6.0 with 512mb of memory. It deposits its product in an SQL-Server database on an equivalent PC running NT (no jeers, please). So the analysis and querying/browsing are decoupled.

    We plan on having this down to realtime this year, both through algorithmic improvements and some additional hardware.
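
The decoupling described above (the indexer writes into a database, the browser only ever reads from it) can be illustrated with a small sketch. SQLite stands in for SQL Server here, and the `hits` table, its columns, and both function names are invented for the example; the real schema is not described anywhere in this thread.

```python
import sqlite3


def store_transcript(conn, show, timed_words):
    """Indexer side: write one row per recognized word as (show, offset in seconds, word)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (show TEXT, offset_s REAL, word TEXT)"
    )
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?)",
        [(show, offset, word.lower()) for offset, word in timed_words],
    )
    conn.commit()


def find_word(conn, word):
    """Browser side: a plain query, with no recognizer involved."""
    rows = conn.execute(
        "SELECT show, offset_s FROM hits WHERE word = ? ORDER BY show, offset_s",
        (word.lower(),),
    )
    return rows.fetchall()
```

Because the two sides share only the table, the slow 3x-realtime indexing can run on one box while queries stay fast on another.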
  • What would be interesting would be to link up to a machine translation (such as babelfish), and then finally text to speech.
  • This technology can be tuned for specific words/pronunciations/dialects/languages/etc. I used IBM's VoiceType that came with OS/2. It actually performed fairly well with training, wasn't worth a damn before that though (I have a rather hard Southern accent and a deep voice.)

    The Echelon folks wouldn't use this to transcribe phone conversations. They would use it to filter for interesting conversations and then appoint an agent to listen to the real thing. Save a lot of manpower that way.
  • .

    Just did a search for "line next" (exact phrase), and got quite a bit of "and on the line next we have...", a la talk show introductions.

    The only computer related ones that I found without searching too hard were two instances of "iMacs" being translated to "line next".

    Incidentally, as we all searched for our favorite phrases, I got a pretty good accuracy on finding "Rocky Horror", but got *many* spurious returns on "Tim Curry". It seems to like to put the phrase "Tim Curry" into people's speech, which might mean that it's using a list of names of famous people within its dictionary. It's a good idea, and makes sense; I wish they had more info on the guts.

    Oh, and speaking about the engine... I thought that it might store the transcripts in some sort of phonetic format, and then match your search phrase to it (i.e., "your not hyped" and "journal typed" both match the stored, transcribed phonetics). So I did some searches to match the same sections extracted, and they all seem to match. Conclusion: the transcripts appear to be stored as text, not phonetic symbols.

    --
    Evan

  • Any chance of GPLising and releasing it ? Or is this going to be locked down proprietary forever ?
  • Ok, so I live in a box. Who's Sarah Michelle Gellar?
  • by Anonymous Coward
    Cmdr Taco said "What if I could find every instance of someone saying Sarah Michelle Gellar on TV?" You can. There are TV cards for the PC that can scan all of the closed captioning data on all of the channels at once for a word or phrase. I remember reading about this about three years ago. I think you can switch to that channel as soon as it is said. Anyone know more? -by WGS
  • My results varied wildly depending on the input. For instance, I searched "Geeks in Space" for the word "slashdot" and came up with no hits.
    Are these algorithms open source by any chance? I looked into voice algorithms at one point, and it made me run back for my Knuth book. I never found any real source for it. It is a fascinating subject and I am sure the open source community could do a fantastic implementation of it. :)
    I'll stop rambling now.
  • "Linux Underground" (from GiS 3.1) came across as "limits underground", so try searching on "limits".

    ------
  • Wow - I looked at it, and well, it looks damned impressive to me! Of course - there's nothing to really see but the screen shots, and of course information about what it can do.

    I'd love to see demos of this stuff - I could have this stuff filter the TV for news for me :-) Or even better - I'd never have to take down notes after GMing a gaming session - I'd just let this thing transcribe the session on the fly!

    Of course, I'm figuring the software that does the speech to text and indexing probably doesn't run on plain ol' PC hardware with Windows 98 or NT loaded
