Google Open Source

Google Open-Sources Live Transcribe's Speech Engine (venturebeat.com) 14

On Friday Google open-sourced "the speech engine that powers its Android speech recognition transcription tool Live Transcribe," reports VentureBeat: The company hopes doing so will let any developer deliver captions for long-form conversations. The source code is available now on GitHub.

Google released Live Transcribe in February. The tool uses machine learning algorithms to turn audio into real-time captions. Unlike Android's upcoming Live Caption feature, Live Transcribe is a full-screen experience, uses your smartphone's microphone (or an external microphone), and relies on the Google Cloud Speech API. Live Transcribe can caption real-time spoken words in over 70 languages and dialects. You can also type back into it — Live Transcribe is really a communication tool. The other main difference: Live Transcribe is available on 1.8 billion Android devices. (When Live Caption arrives later this year, it will only work on select Android Q devices.)

This discussion has been archived. No new comments can be posted.

  • by rlwinm ( 6158720 ) on Sunday August 18, 2019 @11:40AM (#59099408)
    I did a quick clone of this repo. It doesn't look like the engine at all. This looks like the code that hurls off audio to Google's servers. Nothing really interesting here.
    • by Chozabu ( 974192 )
      Can we use it for free? Do we need an API key? Is usage limited?

      If it is free to use - great news. But I get the impression this is not the case from a glance at the readme.
    • That lie was clear from the summary when they said it relied on an API. That always would mean that it is just a client, not the "engine."

    • Reading the (linked, not Slashdot) article, I've come to the same conclusion. It appears to be only an open source client app; the heavy lifting is still done in Google's cloud.

    • >came here to say this
      a callback to their api != release the code
      this is almost like a free ad for google
      besides, you also gift them the data you send. this is horrible, not open source at all. change this headline for journalism integrity's sake /thread.

  • by nickovs ( 115935 ) on Sunday August 18, 2019 @11:58AM (#59099454)

    They have released the source code to the library that talks to their servers, as long as you have an API key. The engine that does all the hard work is still closed source, still requires Google's permission to use, still requires you to be online and, of course, still allows Google to collect all your data.

    This isn't about Google magnanimously releasing their code to the community so that others can build on their science and improve the state of the art. This is about Google making it possible for people building things other than Android applications to buy into Google's services.

    • The last slashdot editor who made it past "hello world" was the Taco himself.

    • "This repository contains the Android client libraries for communicating with Google's Cloud Speech API that are used in Live Transcribe.[...] The libraries provided are nearly identical to those running in the production application Live Transcribe. They have been extensively field tested and unit tested. However, the tests themselves are not open sourced at this time."

      Agreed, it's a scam.
  • This doesn't render speech. We had programs that could render speech from text back in DOS (Sound Blaster's Dr. Sbaitso). No server needed. Back in Windows 3.1 you could buy Daily Plan-It for $20, which let you use ten voice commands to launch programs, move and close windows, and do speech-to-text, on a 386 with 4 megs of RAM and no network connection.

    Programmers have gotten too dependent on remote servers for shit that used to be done locally. Heck, I didn't need a network connection to have my 4K Radio Shack turn the lights on an

    • Can't agree with BarbaraHudson more - back in the late nineties I had Dragon Naturally Speaking running really well on a (if memory serves) single core Pentium 450MHz with a whopping 32MB of RAM, something like a 10GB disk and NT4. The PCs of that day are now vastly outclassed by even "budget" phones; there seems no good reason why this can't work well locally - except, of course, that means all that lovely audio isn't heading to Google, Amazon, Microsoft or other data-slurping outfits.
    • And this right here is why we ditched mainframes. The sad reality is everyone forgets.

      I myself keep my hands on my PC at all costs. I do not want to send all this info off to Google to be mined. I'm irritated enough that I can't remove Google Assistant, which keeps telling me where it thinks I want to eat.

  • by Martin S. ( 98249 ) on Sunday August 18, 2019 @06:12PM (#59100236) Journal

    The quality of the voice recognition was remarkable, far better than anything I tested. The app lacked any feature to save the text, making it unsuitable for my needs. It is intended for use by the hard of hearing in noisy environments, and I wanted it to automate note taking.

    • by Andyvan ( 824761 )
      I agree. My dad is nearly deaf, and this has completely transformed how we communicate with him. The transcription is not always right, and I wish there was a quick way to clear the screen, but it's an amazing app. It even lets me make the font really big (he's 91 and doesn't see so well either) so that it's easier for him to see.
  • Here at Google we like open source.
    That is why we have open-sourced a specially crafted version of curl so that you can offer us your data for our proprietary stuff...
