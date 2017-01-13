

 


Open Source Codec Encodes Voice Into Only 700 Bits Per Second (rowetel.com) 51

Posted by BeauHD
Longtime Slashdot reader Bruce Perens writes: David Rowe VK5DGR has been working on ultra-low-bandwidth digital voice codecs for years, and his latest quest has been to come up with a digital codec that would compete well with single-sideband modulation used by ham contesters to score the longest-distance communications using HF radio. A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second. Connected to an already-existing Open Source digital modem, it might beat SSB. Obviously there are other uses for recording voice at ultra-low-bandwidth. Many smartphones could record your voice for your entire life using their existing storage. A single IP packet could carry 15 seconds of speech. Ultra-low-bandwidth codecs don't help conventional VoIP, though. The payload size for low-latency voice is only a few bytes, and the packet overhead will be at least 10 times that size.
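The packet figures in the summary can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the 700 bit/s rate comes from the summary, while the 1500-byte MTU, the IPv4/UDP/RTP header sizes, and the 40 ms of speech per low-latency packet are assumptions chosen for illustration, not details from the story.

```python
# Rough check of the packet claims in the summary.
BITRATE_BPS = 700                    # from the summary
MTU_BYTES = 1500                     # assumption: typical Ethernet MTU
IPV4_UDP_HEADER = 20 + 8             # assumption: IPv4 + UDP headers, no options
IPV4_UDP_RTP_HEADER = 20 + 8 + 12    # assumption: conventional VoIP carried over RTP
FRAME_MS = 40                        # assumption: speech carried per low-latency packet

bytes_per_second = BITRATE_BPS / 8   # 700 bits/s expressed in bytes/s


def seconds_per_packet(mtu=MTU_BYTES, header=IPV4_UDP_HEADER):
    """Seconds of speech that fit in one full-size packet."""
    return (mtu - header) / bytes_per_second


def overhead_ratio(frame_ms=FRAME_MS, header=IPV4_UDP_RTP_HEADER):
    """Header bytes divided by payload bytes for one low-latency packet."""
    payload = bytes_per_second * frame_ms / 1000
    return header / payload


print(f"{bytes_per_second} bytes/s")
print(f"{seconds_per_packet():.1f} s of speech per full-size packet")
print(f"headers are {overhead_ratio():.1f}x the payload at {FRAME_MS} ms per packet")
```

Under these assumptions a full-size packet holds a little more than the 15 seconds quoted, and the headers of a low-latency packet outweigh its payload by more than a factor of 10, which is consistent with the summary's point about conventional VoIP.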

  • Can this be used for two-way comms? What's the total delay: conversion from analog to the bitstream, transit across the net, and conversion back to voice?

    • It m___ cer___ly c_n!

      T__s is just th_ thing Telco_ and oth_r _____ prov___rs need to _ed__e usag_ and all__ more users __ lim_ted bandw__th circ__ts.

      He__. C_n y__ call m_ bac_ on my house__one?

      • Actually, our modems degrade gracefully. The least-protected bits go wrong with low bit-error rates, and the more protected bits survive. It takes a high bit error rate to kill it. So bit errors result in the speech being "off" but not dropping out.

  • Specific to English? (Score:3)

    by MichaelSmith ( 789609 ) on Friday January 13, 2017 @05:51PM (#53663917) Homepage Journal

    I wonder how it performs on tonal languages like Cantonese.

  • This issy awe so nudes (Score:3)

    by JoeyRox ( 2711699 ) on Friday January 13, 2017 @05:51PM (#53663923)
    I've been way thing for a new cold deck for joyce recordings.

  • 70 years * 365 days (roughly) * 24 * 60 * 60 * 88 bytes/sec / 1024 / 1024 / 1024 = 181GB

    Is my math off, or are they assuming such people will only have a 15-year life span?

    • Re: (Score:2)

      by darkain ( 749283 )

      There are 256GiB MicroSD cards on the market right now. So yes, this is entirely possible.

      • Re: (Score:1)

        by Cramer ( 69040 )

        Only if that SD card were used EXCLUSIVELY for recording your voice, and it's ACTUALLY 256GB of usable space (capacity is always a lie, filesystems take up space too, etc.), and it doesn't fail over the decades, AND you don't live more than ~98 years, sure.

    • Nobody is assuming a 15-year life span.

      The question is, why do you assume that people talk nonstop 24 hours per day?

    • I got the same as you: 2.59 GB/year.
      Still damn impressive, as 250 GB M.2 SSDs would hold roughly a century of voice.

      Now, assuming that you are not talking continuously (say you talk 1/3 of the day, 8 hours of continuous talking, which is a lot), you're at 60 GB per 70 years, and that *is* valid for a high(ish)-end smartphone.
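The storage estimates in this sub-thread can be reproduced with a short script. This is a minimal sketch using the thread's own rounded figure of 88 bytes/s, a 365-day year, and binary gigabytes (division by 1024 three times, as in the parent comment); the function name `lifetime_gib` and the `talk_fraction` parameter are made up for illustration.

```python
# Lifetime voice storage at the thread's rounded rate.
BYTES_PER_SECOND = 88                  # rounded figure used in the thread
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 365-day year, as in the parent comment
GIB = 1024 ** 3                        # the thread divides by 1024 three times


def lifetime_gib(years, talk_fraction=1.0):
    """GiB needed to store `years` of speech, recording only `talk_fraction` of the time."""
    return years * SECONDS_PER_YEAR * BYTES_PER_SECOND * talk_fraction / GIB


nonstop = lifetime_gib(70)            # recording 24 hours a day
one_third = lifetime_gib(70, 1 / 3)   # recording 8 hours a day

print(f"70 years, nonstop:   {nonstop:.0f} GiB")
print(f"70 years, 8 h a day: {one_third:.0f} GiB")
```

Recording nonstop for 70 years comes to about 181 GiB, and recording a third of the time comes to about 60 GiB, matching the numbers quoted above.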

      • MicroSD capacity should increase faster than the rate data is added to the device.

      • I've been programming all day, and haven't said many words at all. There are people who talk for their entire work day, but they generally spend half their time listening and more processing something, so they may actually do 4 hours of speech or less in the work day. Most people don't really speak for more than a few hours per day.
    • You don't record the pauses. You do sleep, you know :-)
  • 15s/IP packet - this should lower operational cost for our government.

  • How does it sound? (Score:4, Interesting)

    by jandrese ( 485 ) <kensama@vt.edu> on Friday January 13, 2017 @05:56PM (#53663963) Homepage Journal
    That's starting to approach feeding the sentence into a speech-to-text system at one end and then sending the text over the air to be fed back into a text-to-speech converter.

    • Good point. I suppose the lower limit would be doing that while compressing the text stream via a pre-shared library and assuming an optimum (no ECC required) communication channel?
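To put a rough number on the speech-to-text comparison, the sketch below estimates the bit rate of plain, uncompressed text at a conversational pace. Every input here is an assumption for illustration (150 words per minute, 6 characters per word including the space, 8 bits per character); none of these figures come from the thread, and compression against a pre-shared dictionary would push the text rate lower still.

```python
# Bit rate of spoken words sent as plain text, under stated assumptions.
CODEC_BPS = 700          # from the story
WORDS_PER_MINUTE = 150   # assumption: conversational speaking rate
CHARS_PER_WORD = 6       # assumption: average word length including the space
BITS_PER_CHAR = 8        # assumption: uncompressed 8-bit characters


def text_bitrate_bps(wpm=WORDS_PER_MINUTE, chars=CHARS_PER_WORD, bits=BITS_PER_CHAR):
    """Bits per second needed to send speech as uncompressed text."""
    return wpm * chars * bits / 60


text_bps = text_bitrate_bps()
ratio = CODEC_BPS / text_bps

print(f"text: {text_bps:.0f} bit/s, codec: {CODEC_BPS} bit/s, ratio: {ratio:.1f}x")
```

With these inputs the codec uses several times the bandwidth of raw text, so it is within an order of magnitude of the text-only approach while still carrying the speaker's actual voice.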

  • A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second.

    It's 87.5 bytes/s and it's that odd 1/2 byte that keeps it from being too fuzzy sounding for hi-fi.

  • They're skirting the bottom edge of comprehensibility; the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.

    • Though that's often true of amateur radio generally.

    • Re: (Score:2)

      by msauve ( 701917 )
      "You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary."

      You're describing all of the tech support calls I've had to make in the past few years.

    • Re: (Score:2)

      by tlhIngan ( 30335 )

      They're skirting the bottom edge of comprehensibility; the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.

      In other words, it's being efficient.

      The brain has a very powerful voice and audio decoder. (In fact, the brain's wetware is so powerful to compensate for relatively poor sensors - but coupled with the power of the brain, they become much more powerful detection devi

  • " A single IP packet could carry 15 seconds of speec"

    great

  • A stream of sounds is difficult to parse. Converting it via various codecs won't change that or make it more useful. Converting the analog wave sounds into meaningful digital data (in the form of words as text, musical notation, specific fart parameters, a database of whale or bird calls, etc) is more helpful and efficient. Meaning can be extracted and/or analyzed. As someone else suggests, those can be converted back to a semblance of the original sequential stream of sounds (but why?).

    If you are communica

  • Do we finally have a 2400b mode? Would love to do digital, but with existing FM transceivers. Due to HOA I can't (and yes, I have tried) do HF reliably.

