
Online Speech Indexing 87
Thomas Edwards from The Sync (where we host Geeks in Space) sent us an interesting site:
"Speechbot" is a Compaq Research project that is indexing online radio shows. Apparently it found terms like 'Red Hat' and 'Yahoo' in past episodes of GiS. Interesting technology. Imagine when it lets me ask my TV to find me every show that mentions Sarah Michelle Gellar.
Or Gillian Anderson (Score:1)
TV is already text searchable. (Score:1)
Especially with these new hard-drive VCRs that let you pause a TV show while you go "take a #2". This device could also watch particular channels and buffer them for an hour while it searches its closed captioning for specific search criteria. When it sees the text you are looking for in the closed captioning, it will send the whole program to tape. It would be pretty nifty, I think, if I could have my computer watch the Science Channel for anything to do with quantum computing and, when it finds something, output the whole thing to tape. It's like having your computer watch TV for you and only get the good stuff. :-)
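Here is a rough sketch of the buffer-and-match idea described above, in Python. Everything in it is an assumption made for illustration: the function name `watch_captions`, the one-caption-entry-per-second feed, and the plain substring match. A real recorder would need an actual caption decoder and something to write video out.

```python
from collections import deque

def watch_captions(caption_lines, keywords, buffer_size=3600):
    """Return the buffered caption lines once any keyword appears, else None.

    caption_lines: iterable of caption text (assumed: one entry per second).
    buffer_size: entries to keep, roughly an hour at one entry per second.
    """
    wanted = [k.lower() for k in keywords]
    buffer = deque(maxlen=buffer_size)  # oldest entries fall off automatically
    for line in caption_lines:
        buffer.append(line)
        text = line.lower()
        if any(k in text for k in wanted):
            # A real device would start sending the program to tape here.
            return list(buffer)
    return None

# Hypothetical caption feed, made up for this example.
captions = [
    "Tonight on the show",
    "a look at quantum computing",
    "and what it means for you",
]
hit = watch_captions(captions, ["quantum computing"])
print(hit)
```

The bounded `deque` is what gives you the "buffer them for an hour" behaviour: whatever was said before the keyword is still there when the match fires.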
Re:Text to Speech problems? (Score:1)
--Ben
Re:Nice... (Score:1)
Word! Since you guys use it (RealNetworks stuff), I figure you would be complaining if you didn't like it. I didn't know that they developed RealServer on Linux. And I never even thought about Windows Media Player, yikes!
Sometimes you actually learn something on /., huh.
BTW, The Sync rocks for hosting Geeks in Space, the most freaky and funny talk show I've ever listened to. With this Andover business, the show needs to lay down some cash to get a call-in section, so all of the geeks across the nation (err, umm, globe, sorry) can call in and be nerdy with the kings.
-Rich
Re:Impressive...? (Score:1)
This sounds like it's going to turn out to be quite the kick ass product!
I'll ask the same question the other guy did - Open Source, or proprietary? And in the future, can the SQL server be replaced with something else like MySQL, etc.? Heck, for that matter, got something that gives us even more information?
Hell, need a beta tester to do any stress testing on this puppy? I'm yer man!
Re:Echelon, anyone? (Score:1)
Re:Impressive...? (Score:3)
Re: SQL -- it can be any SQL server, really. However, I will add that we are somewhat in bed with Microsoft on the visualization end, simply because IE5 does XML quite well (note to Mozilla people: get with the program).
Re: Open source. Unfortunately, not up to me. Much of the technology is "open source" in the sense that papers have been published about it (not what you were looking for, I know), but we've already licensed some of the core technology to another company, and being a phone company (GTE) we consider the speech rec somewhat of a competitive advantage (wipe those Echelon thoughts out of your mind! We use it for call center and directory assistance automation! Sheesh).
As I posted probably about 6 months ago in a thread about speech recognition, there are some significant issues with open-sourcing beyond the recognizer code. The learning processes behind the recognition are based on a considerable amount of data for which licensing is an issue, such as CNN broadcasts. In fact, we use over 100 hours of broadcast news audio to train the system, and several million words of text for the language model. This comes to us through the Linguistic Data Consortium at the University of Pennsylvania (http://www.ldc.upenn.edu). This is an academic group set up to maintain these common train-and-test databases for researchers, and there's a fairly sizeable fee to join. They handle the intellectual property issues with the training data.
And, unfortunately, without the training data, it's kind of hard to use the system. At least, if you want to use it on something it's not already trained on (in our case: North American broadcast news).
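For anyone wondering why the text corpus matters so much, here is a toy sketch of the simplest kind of language model, a bigram counter, in Python. This is only an illustration and not the system described above: the function names and the tiny made-up corpus are assumptions, and a real recognizer trains on millions of words with far more careful statistics.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text, standing in for the licensed news transcripts.
corpus = "red hat linux is here and red hat is hiring and red tape is not"
model = train_bigrams(corpus)
print(most_likely_next(model, "red"))    # seen words get a prediction
print(most_likely_next(model, "yahoo"))  # unseen words get nothing
```

The second call is the point: a word the model never saw in training gives it nothing to go on, which is roughly why a system trained on broadcast news struggles outside that domain.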
Re:Impressive...? (Score:1)
Of course, in the end, this thing is going to be outside of my price range (if available at all to 'consumer level' people) based on what it's for.
I can keep dreaming I suppose
As for Open Source - it's a valid question to ask about any product any more, but, doesn't mean that it HAS to be Open Source to be useable.
Humorous Transcriptions (Score:1)
More of this (at dutch university) (Score:1)
http://parlevink.cs.utwente.nl/Projects/olive.html [utwente.nl]
Re:Linux = iMacs? (Score:1)
Oh, I was only searching "Geeks in space," not the whole thing..
Oh, and speaking about the engine... I thought that it might store the transcripts in some sort of phonetic format, and then match your search phrase to it (i.e., "your not hyped" and "journal typed" both match the stored, transcribed phonetics). So I did some searches to match the same sections extracted, and they all seem to match. Conclusion: the transcripts appear to be stored as text, not phonetic symbols.
That's quite a good idea actually. Store it as phonetic symbols, translate the search query into symbols on the fly, then search based on symbol..
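A minimal sketch of that idea in Python, for the curious. The consonant-class table below is a crude Soundex-style stand-in that I am assuming purely for illustration; a real system would use proper phoneme symbols from the recognizer, and nothing here reflects how SpeechBot actually works.

```python
# Crude consonant classes, loosely modelled on Soundex (an assumption,
# not a real phoneme set). Vowels, h, w, y and punctuation are dropped.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def phonetic_key(text):
    """Translate text into a string of sound-class symbols."""
    key = []
    for ch in text.lower():
        digit = CODES.get(ch)
        if digit is None:
            continue  # skip vowels, spaces, punctuation
        if key and key[-1] == digit:
            continue  # collapse repeated sounds
        key.append(digit)
    return "".join(key)

def phonetic_search(query, transcripts):
    """Return transcripts whose symbol string contains the query's symbols."""
    q = phonetic_key(query)
    return [t for t in transcripts if q and q in phonetic_key(t)]

# Made-up transcript snippets for the example.
transcripts = [
    "and on the line next we have a caller",
    "the weather today is sunny",
]
print(phonetic_key("linux"))
print(phonetic_search("linux", transcripts))
```

A plain text search for "linux" finds nothing in either snippet, while the symbol search picks up the "line next" one. The flip side is that a key this coarse will also match plenty of things you did not mean, so expect spurious hits.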
Better patent it quick, or put it in the public domain for all to use.
---
The Remarkable Media Search (Score:3)
It is very remarkable that this thing can index these low quality streams with the accuracy that it does! I hope that searchable media (other than text) continues to get better like this. Companies like Virage and Compaq definitely deserve our support. I hope that standard interfaces appear soon.
~GoRK
Indexing TV via closed captioning (Score:1)
Anyone know of this technique being used today?
The Closed-Captioning FAQ [robson.org] seems to think that using speech recognition to generate text from broadcast audio "isn't there yet [robson.org]" technology-wise.
No linux? ESR is a duck? (Score:3)
However, a little brain usage, search for "line" and get this:
The words "end of the various duckling" and around there are in fact "Eric Raymond" in the clip, which I thought utterly hysterical. You can tell because they say "it's Eric Raymond, and he's serving snacks," which partway comes out correct.
Linux seems to have come out as "line next" a lot, and "line of" in some clips I've found..
Obviously, the technology is not quite there yet.
---
q (Score:1)
Although I should probably be more worried about the ramifications of being able to search through countless fields of speech... I am actually more concerned with a different aspect of this new means of indexing.
My concern is this: with each new means of indexing speech and text becoming readily available, could this result in web sites being aggressively taxed by individuals/machines that are barely even using these services? Granted, these are just my simple uninformed worries with little merit. Although it would be interesting to see the results if a slew of these technologies developed and became devastatingly popular (the types that would work from your home computer and index your favorite speech/web sites).
Re:Speech processing (Score:1)
The possiblities (Score:2)
$gatespeak =~ s/Microsoft/Micro\$oft/g;
$gatespeak =~ s/Windows/Winbloze/g;
$gatespeak =~ s/dominates the market/controls your lives/g;
$gatespeak =~ s/innovation/stealing and strongarm tactics/g;
To get this: "Micro$oft Winbloze controls your lives due to our huge stealing and strongarm tactics". Wow, you can actually get Billy boy to speak the TRUTH!
Re:Nice... (Score:1)
After a bit of prodding, I got whatever version of Windows RealPlayer I have installed (4.x or G-something) to run under wine - I expect more recent ones ought to as well. It wasn't too hard, just "wine rvplayer.exe", but you've gotta save the link on the web page to a .rm file, then open that from within the player.
Not perfect, but certainly better than nothing!
About that SMG comment (Score:1)
--
who cares about the NSA... (Score:1)
out of searching the Art Bell Show for the word UFO *grin*
(for those of you who are not clued in to Art Bell, he is famous for conspiracy theories)
:)
The NSA is gonna sue their ass (Score:1)
Re:Processing power and time? (Score:3)
Ha. For a great laugh check out the transcripts. (Score:1)
I realise that there's a disclaimer saying that the transcripts wouldn't be exact. But I was expecting one or two words off.
Actually, after looking through a couple more of these transcripts, it seems to have a problem with the nature of Geeks in Space. That is, when the voice changes it takes the program a little while to catch up. When there are long stretches of only one person talking it seems to do better. ("the most interesting thing that pops up today is that microsoft is set up a box and they have basically challenged the internet to crack it...")
Interestingly enough not a single episode of GiS seems to include the word "Taco".
filler (Score:1)
"fuck" search... This is more fun than babelfish! (Score:1)
"... such chaos these days with possibly of that david thinks not to use nuclear weapons but it could use a biological or chemical try to get the fuck you mentioned before newberg thousand level..."
or better yet...
"
...it gets more bizarre:
"... your occurred several koppel with the mask of a focus of of fuck 'em berry's school principals the jaw and the m. our washington studio with a few arctic of the papal trip to do that to us from...
...then... listen to the clips and pretend they're actually saying it.
I need a life.
_________
Wasn't that a Nirvana song? (Score:1)
Re:Speech processing (Score:1)
Re:Some interesting bytes. (Score:1)
Re:Some interesting bytes. (Score:1)
HTH,
Andrew Leonard
Webmaster
SpeechBot
SMG baby! (Score:1)
Buffy Rox!
Speech recognition worries (Score:2)
Echelon, anyone? (Score:2)
I had been skeptical of Echelon being able to do word recognition on phone conversations, but I expect that the NSA is ahead of private industry in this area too, so Echelon looks plausible.
--Kevin
Echelon!!! (Score:2)
oh, and first post too... maybe
Re:searches the whole transcript (Score:2)
For example, searching on Black 47 returns 2,000+ hits when using the default search, but 0 when using an exact phrase....
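One plausible explanation for numbers like that, sketched in Python: if the default search matches a transcript containing any of the query words while the exact-phrase search needs the words side by side, the gap is easy to reproduce. This is my guess at the behaviour, not something the site documents; the function names and sample snippets are made up.

```python
def any_word_hits(query, transcripts):
    """Transcripts containing at least one of the query words."""
    words = query.lower().split()
    return [t for t in transcripts
            if any(w in t.lower().split() for w in words)]

def exact_phrase_hits(query, transcripts):
    """Transcripts containing the query words adjacent and in order."""
    phrase = query.lower().split()
    n = len(phrase)
    hits = []
    for t in transcripts:
        tokens = t.lower().split()
        if any(tokens[i:i + n] == phrase
               for i in range(len(tokens) - n + 1)):
            hits.append(t)
    return hits

# Made-up snippets; note the recognizer may spell the number out.
transcripts = [
    "the black box was found",
    "about 47 people called in",
    "a band called black forty seven",
]
print(len(any_word_hits("Black 47", transcripts)))
print(exact_phrase_hits("Black 47", transcripts))
```

Every snippet matches on "black" or "47" alone, but none contains the two together, so the loose search looks impressive and the exact one comes back empty.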
Linux OTOH returns only two matches... sigh. Actually I wonder how much that has to do with the confusion over the pronunciation(sp?), considering I've never met two people who say it with the same exact phonetics....
Oh well,
RobK
searches the whole transcript (Score:1)
So, you might get a result back that isn't quite what you are looking for.
Jordan
Re: (Score:2)
Dragon Systems (Score:3)
Re: (Score:2)
If the technology falls into the wrong hands (Score:1)
1)Every mention of Celine Dion out of my radio and TV.
2)Every mention of Bill Gates out of Slashdot posts.
Just wonderin'
But as someone who has spent hours upon hours researching old radio shows for a college assignment, it sounds like a real good idea.
Yummy (Score:2)
speechbot transcript (Score:2)
Methinks a lexical generator produces better speech than the triumvirate of
The speech is out there... (Score:1)
Hey, at least it'll make finding quotes and sound bites easier! The politicians will probably outlaw it, (for civilian use) of course...
Imagine what it could do if run against all the archived tape that CNN, or NBC, etc... have. Ah, the possibilities!
I wonder what Katz will have to say about this?
Another audio indexing system (Score:3)
http://www.gte.com/AboutGTE/gto/bbnt/speech/res
...which not only tells you what words were said, but who said them, and what topics were being talked about...
Hmm... Mars murder ritual rides? (Score:3)
Did a search for "Mars Probe" in the Science Friday show, and got this snippet:
Err... yeah. That would explain a great many things about space probes. Actually, I'm sure the textified show would be a lot more interesting than the real show. And then, we could shove it through Babelfish for added enjoyment...
I recently installed the ViaVoice beta for Linux, and found its recognition not quite ready for prime time... at least for my needs. I'd be surprised if radio shows, which often have people on fairly crummy phone connections, would be an ideal candidate for automated indexing.
check out this logic (Score:3)
"Grits" turns up 12
"Sex with animals" turns up 5.
"Your mother" turns up 200.
My conclusion: "Your mother is still almost ten times as important as suicide, sex with animals, and grits combined."
Remember that, always.
Reads like bad poetry. (Score:2)
settling rata at and legs
to the team network concept
fighting ends
the single monolithic entity
of those are all things
that the challenger are undermined
neither side I think
what I look at the to the the marcie katz
the respective committees
(I didn't change anything but adding line breaks.)
Good for a chuckle.
Re:Processing power and time? (Score:1)
Andrew Leonard
Webmaster
SpeechBot
Re:filtering out celine dion (Score:1)
Buffy (Score:2)
Mind you, I didn't know this until I did a web search for her.
Re:Nice... (Score:1)
I do understand Real's need to "encourage" people to purchase the RealPlayerPlus, because they need to make money to keep up the excellent low-bitrate R&D they've been doing. Unlike Microsoft, they can't just chalk it up to selling more NT Server software.
babelfish (Score:3)
ViaVoice (Score:1)
spooky (Score:2)
That being said, let me say that something like this scares the crap out of me. This sort of technology is exactly what the FBI had in mind when it began to pressure telecommunications companies to make their phone lines more tappable. Now I don't remember the exact figure, but they wanted something like 1% of the phones in any metropolitan area tappable at once. 1% of the phones in New York City is something on the order of 50,000 phones. Tell me how you're going to keep track of all of that without a computer monitoring 50,000 conversations and looking for key words. You can't.
Monitor 1% of the population's conversations for some suspect keywords like 'bomb', 'assassinate', 'cocaine' or perhaps 'open source' and you've got one scary computer-assisted big brother watching over everything. If you don't hear anything juicy, shift to another 1%. I suppose people have had the technology to do realtime speech recognition and filtering for some time now, but the idea of maintaining searchable archives of phone conversations (enter Speechbot) is a genuinely spooky privacy violation.
Now, any technology is only as good or as evil as the people who use it. I will be cautiously interested to watch what Speechbot evolves into.
Re:Linux = iMacs? (Score:1)
I wonder if, years from now when AT&T or some such august body patents it, I can claim a Slashdot post as "prior art".
Hurm....
--
Evan
Re:problem (Score:1)
Re:Echelon!!! (Score:2)
This would also be about the time UK politicians were banned from Menwith Hill, an NSA listening post in the UK, which would have been a likely candidate for early deployment of such technology.
Processing power and time? (Score:2)
On an un-related note, the about page says that Compaq has a research lab in Australia... sweet.
-dr
Re:SPEECH TO TEXT rather! (Score:1)
Might make IBM manuals more friendly too!
Nice... (Score:2)
\rant{But where is RealAudio at? The company itself is worse (in its smaller domain) than Microsoft. I mean, their version numbering (5, G2, 7) is absurd, their website pushes you to download the plus version of their player (i.e. the one you have to pay for), and their monopoly on video (and most of the sound) on the web lets them get away with it. I believe they have an OK product, but their marketing schemes are stuck in mid-nineties "pay-for-this-better-version-now-even-though-a-better-free-one-will-be-out-in-three-months."
Things like pointcast have died due to this type of scheme, but real is still staying strong. The linux (unix) install scenario, and html documentation is absurd as well, and to be honest the reason for this rant. We'll just have to wait until some sort of disruptive technology forces real to compete, instead of stagnate.}
As far as the implications of this technology, echelon, etc. I just can't wait until I can do boolean searches through my old phone calls. Not like they're listening anyway...
-Rich
Some interesting bytes. (Score:3)
Rob "CmdrTaco" Malda -- rob commander topple mall
Jeff "Hemos" Bates -- jeff in both states
Nate Ostendorf -- the husband or the smoke
I also searched for linux and I'll bet that it can't find any instances, because it doesn't translate it right, what with all the different pronunciation possibilities.
It's a cool idea, but has a ways to go. Go Compaq.
yay.
Re:Reads like bad poetry. (Score:2)
The site's FAQ [compaq.com] admits to that (in not so many words)...
And in all fairness, they are not claiming to be a "transcript service" per se, though I can certainly see a lot of transcript writers losing their jobs in the future as the technology advances.
-dr
Re:The speech is out there... (Score:1)
Hrm. I can't help thinking of the so-called "stealth fighter", which was apparently fully operational in the late 70s/early 80s (erm, I haven't checked the exact dates on that).
Anyway, certain gov't TLAs always seem to be about 10 years ahead of what they're telling the rest of us. I wouldn't be surprised if the NSA [nsa.gov] has had open source drew barrymore since the E.T. days.
__________
filtering out celine dion (Score:2)
I'm gonna patent that.
That'll make me rich as f**k!
Re:Echelon, anyone? (Score:1)
Without the need to have thousands of words in the recognition engine it would definitely be much faster, and possibly much more accurate than current state of the art systems.
What`s worse, we might never find out!! (Score:1)
Re:Indexing TV via closed captioning (Score:1)
a) Closed captioning is not an exact representation of what was said. Quite often it is a paraphrase (but speech recognition is error-prone, of course);
b) There are many many many useful sources of audio that don't have closed captioning. A meeting, for instance. Or a foreign broadcast (closed-captioning is much more prevalent in the US than in other nations).
Re:Impressive...? (Score:1)
HOWEVER, I do want to add that the system does run on standard, ordinary PC hardware. The indexer currently runs in 3x realtime (so a half-hour wavefile takes 1.5 hours to index) on a P3-500 running RedHat 6.0 with 512mb of memory. It deposits its product in an SQL-Server database on an equivalent PC running NT (no jeers, please). So the analysis and querying/browsing are decoupled.
We plan on having this down to realtime this year, both through algorithmic improvements and some additional hardware.
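To put the 3x-realtime figure in perspective, here is the arithmetic as a tiny Python sketch. The function names are mine, and the "machines for a live feed" estimate assumes the work splits evenly across identical boxes, which is a simplification.

```python
import math

def indexing_hours(audio_hours, realtime_factor):
    """Hours of processing needed for a given amount of audio."""
    return audio_hours * realtime_factor

def machines_for_live_feed(realtime_factor):
    """Identical machines needed to keep up with one continuous feed,
    assuming the audio can be split evenly between them."""
    return math.ceil(realtime_factor)

print(indexing_hours(0.5, 3))     # the half-hour wavefile from above
print(machines_for_live_feed(3))  # keeping up with one live channel today
print(machines_for_live_feed(1))  # once the indexer reaches realtime
```

So at 3x realtime a single box falls steadily behind a round-the-clock feed, and getting down to realtime is what lets one machine follow one channel.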
Re:Reads like bad poetry. (Score:2)
Remember (Score:1)
The Echelon folks wouldn't use this to transcribe phone conversations. They would use it to filter for interesting conversations and then appoint an agent to listen to the real thing. Save a lot of manpower that way.
Linux = iMacs? (Score:1)
Just did a search for "line next" (exact phrase), and got quite a bit of "and on the line next we have...", a la talk show introductions.
The only computer related ones that I found without searching too hard were two instances of "iMacs" being translated to "line next".
Incidentally, as we all searched for our favorite phrases, I got a pretty good accuracy on finding "Rocky Horror", but got *many* spurious returns on "Tim Curry". It seems to like to put the phrase "Tim Curry" into people's speech, which might mean that it's using a list of names of famous people within its dictionary. It's a good idea, and makes sense; I wish they had more info on the guts.
Oh, and speaking about the engine... I thought that it might store the transcripts in some sort of phonetic format, and then match your search phrase to it (i.e., "your not hyped" and "journal typed" both match the stored, transcribed phonetics). So I did some searches to match the same sections extracted, and they all seem to match. Conclusion: the transcripts appear to be stored as text, not phonetic symbols.
--
Evan
Re:Impressive...? (Score:1)
re: Online Speech Indexing (Score:1)
Speech processing (Score:1)
Hmm needs a little work (Score:1)
Are these algorithms open source by any chance? I looked into voice algorithms at one point, and it made me run back for my Knuth book. I never found any real source for it. It is a fascinating subject and I am sure the open source community could do a fantastic implementation of it.
I'll stop rambling now.
Re:Some interesting bytes. (Score:1)
------
Impressive...? (Score:1)
I'd love to see demos of this stuff - I could have this stuff filter the TV for news for me
Of course, I'm figuring the software that does the speech to text and indexing probably doesn't run on plain ol' PC hardware with Windows 98 or NT loaded