Facebook Social Networks

Fifty Percent of Facebook Messenger's Total Voice Traffic Comes from Cambodia. Here's Why (restofworld.org) 29

In 2018, the team at Facebook had a puzzle on their hands. Cambodian users accounted for nearly 50% of all global traffic for Messenger's voice function, but no one at the company knew why, according to documents released by whistleblower Frances Haugen. From a report: One employee suggested running a survey, according to internal documents viewed by Rest of World. Did it have to do with low literacy levels? they wondered. In 2020, a Facebook study attempted to ask users in countries with high audio use, but was only able to find a single Cambodian respondent, the same documents showed. The mystery, it seemed, stayed unsolved. The answer, surprisingly, has less to do with Facebook, and more to do with the complexity of the Khmer language, and the way users adapt for a technology that was never designed with them in mind. In Cambodia, everyone from tuk-tuk drivers to Prime Minister Hun Sen prefers to send voice notes instead of messages. Facebook's study revealed that it wasn't just Cambodians who favor voice messages -- though nowhere else was it more popular. In the study, which included 30 users from the Dominican Republic, Senegal, Benin, Ivory Coast, and that single Cambodian, 87% of respondents said that they used voice tools to send notes in a different language from the one set on their apps. This was true on WhatsApp -- the most popular platform among the survey respondents -- along with Messenger and Telegram.

One of the most common reasons? Typing was just too hard. In Cambodia's case, there has never been an easy way to type in Khmer. While Khmer Unicode was standardized fairly early, between 2006 and 2008, the keyboard itself lagged behind. The developers of the first Khmer computer keyboard had to accommodate the language's 74 characters, the most of any script in the world. It was a daunting task. Javier Sola, a Spanish-born, Phnom Penh-based computer scientist, was part of the team working on the initial KhmerOS project in 2005. "There are many, many more symbols in Khmer than in [the] Latin script," Sola, now executive director of Cambodian NGO the Open Institute, told Rest of World. On a Latin keyboard, a user could see all of the alphabet at once, making typing intuitive. But in Khmer, each key hosted two different characters, which required flipping repeatedly between two keyboard layers. Not only that, but limited fonts meant that some messages failed to appear if the recipient's computer lacked the same font as the sender's. Still, users made it work. Facebook became popular in Cambodia around 2009, just at the same time as cheap smartphones and internet access, which meant that its usage exploded. Today, it's still the country's most popular overall platform. But on a small smartphone screen, that same typing system became nearly impossible.

This discussion has been archived. No new comments can be posted.

  • Wow! Just think of what the Cambodian version of Slashdot must be like.

  • by pesho ( 843750 ) on Wednesday November 17, 2021 @12:25PM (#61996275)

    Sok said these types of workarounds make it more difficult for engineers working on machine learning to train AI in the language.

    Seems like users exchanging voice messages and typing using a combination of phonetic transliteration and abbreviated/misspelled Khmer script makes it hard for Facebook to sell its product (users) the way it does in other parts of the world.

    • The Cambodians seem to be smart enough not to let their culture be trashed this way.

    • Non-standardized spelling affects lots of things and was a nuisance in the west long before computers or AI. But sure, that's bound to include whatever is the kneejerk hot topic of the moment, like bashing facebook's targeted advertising.
  • French was widely used in Cambodia before the Khmer Rouge.

    • In North Africa, most people use French letters instead of Arabic ones for messaging. I don't know about others, but personally I find that the 2 extra Arabic letters compared to the Latin alphabet make the whole keyboard a little smaller (since every letter has to shave off a couple of pixels to make space for the extra ones), making the overall typing experience more prone to errors.

      Studying Korean opened my eyes to how efficient typing can be, though. switching from the mobile Arabic input to the Hangeul one

    • That was a criterion for sending someone to the killing fields. Or glasses.

  • They could have investigated how a Thai keyboard works :P
    After all, the scripts and writing systems are very similar.

  • I tend to believe the observations, but the "statistical" population seems to be tiny. Are there any significant numbers?

    • by ranelen ( 2386 )

      Anecdotally I can tell you it's fairly true. Most Khmer speakers I know tend to use a mix of voice messages, Khmer with the standard English keyboard, English itself and Khmer. Even when using something like Google Translate, people will usually try to use the voice transcription before using the Khmer keyboards.

  • by v1 ( 525388 ) on Wednesday November 17, 2021 @01:08PM (#61996413) Homepage Journal

    they look fun to use!

    chinese keyboard [9gag.com]

    • by lucasnate1 ( 4682951 ) on Wednesday November 17, 2021 @01:25PM (#61996457) Homepage

      Most Chinese type simply by writing in an English-like script (pinyin) and having auto conversion. I.e. you write "ni bu dong putonghua", and there's a T9-like algorithm that replaces it with Chinese characters that sound the same.
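A rough sketch of the conversion described above, as a toy only. The `PINYIN_TABLE` mapping and the function names are hypothetical and invented for illustration; real input methods rank many candidates per syllable using context and usage frequency, whereas this sketch just takes the first candidate listed.

```python
# Toy pinyin-to-hanzi lookup. PINYIN_TABLE is a tiny hypothetical table,
# not data from any real input method.
PINYIN_TABLE = {
    "ni": ["你", "尼"],
    "bu": ["不", "部"],
    "dong": ["懂", "东"],
    "putonghua": ["普通话"],
}


def candidates(syllable):
    """Return the candidate characters for one typed unit.

    Unknown input is passed through unchanged.
    """
    return PINYIN_TABLE.get(syllable.lower(), [syllable])


def convert(pinyin_text):
    """Convert space-separated pinyin by picking the first candidate each time."""
    return "".join(candidates(unit)[0] for unit in pinyin_text.split())


print(convert("ni bu dong putonghua"))
```

In practice the user picks from the candidate list when the first guess is wrong, which is where the T9 comparison comes from.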

      • by v1 ( 525388 )

        yep the Japanese formal language (Kanji) can be just as bad, but also has a simpler phonetic equivalent (Kana) that only has 46 or so characters. Kanji on the other hand requires about 10,000 for basic comprehension, and goes over 40,000 in total.

        I wonder if that 70-some characters they're looking at are phonemes in the same way Kana is?

        • by alantus ( 882150 )
          Kanji is not the formal language, and kana is not the phonetic equivalent. They are different script systems, both of which are used intermixed in sentences, regardless of formality. Also, the list of "regular use" kanjis needed for basic comprehension consists of 3,246 characters.
    • chinese keyboard [9gag.com]

      I lived in China for years and the only place I saw a keyboard like that was in a museum.

      Chinese people use a normal qwerty keyboard and enter hanzi phonetically using pinyin.

      Chinese is slightly slower than English to type but faster to read. Chinese also saves paper. For the same font size, a Chinese document is typically about half as thick as the same document in English.

  • by Dan East ( 318230 ) on Wednesday November 17, 2021 @01:32PM (#61996487) Journal

    Those speaking Germanic languages are very lucky that our languages just happened to end up with a number of traits that are very conducive to use in digital devices.
    Things like:
    Small character sets
    Spaces delineating words
    Static letters not modified by other letters

    The cognitive skills company I work with decided to open some centers in Thailand, and so we needed to add support for the Thai language. Up to that point, our main headaches had been supporting right-to-left languages. Thai, well, that was another story entirely. Apparently Cambodian is very similar to Thai, so I can see why they have such difficulty.

    In Thai, you have consonants and vowels like many other languages. However, they are not written as discrete glyphs simply written left to right. You write a consonant, then each vowel that follows gets stacked on top of the consonant, up to a few high, until you end the word or have another consonant. So you have base letters with other letters stacked on top of them. However, there are cases where vowels are written as their own discrete glyph, like when they are a word by themselves or start a word.

    However, the thing that really exacerbates that to another level is that Thai does not use spaces between the words.

    So if you try to use Thai in a naïve editor / renderer, it will eventually have to word-wrap long lines of text. However, since there are no spaces, the renderer will have to wrap after the last letter that fit on the line. Not only is that jarring in any language, having words broken in two, but it's flat-out wrong in Thai, because then you may have a vowel that was separated from its consonant, and thus the vowel is not drawn above its base consonant as an accent type character.

    My solution to that ended up having an entire dictionary of all Thai words, and I used a greedy algorithm to match words by maximum length. It then inserts Unicode non-printing spaces between the words, so our word-wrap routines work correctly.
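A minimal sketch of that kind of greedy longest-match segmentation, not the poster's actual code. The function names, the tiny two-word dictionary in the usage line, and the single-character fallback for unknown text are all assumptions made for illustration; U+200B (zero-width space) is used as the non-printing break.

```python
ZWSP = "\u200b"  # Unicode zero-width space: invisible, but a legal line-break point


def segment(text, dictionary):
    """Split text into words by always taking the longest dictionary match.

    Assumption: when nothing in the dictionary matches, one character is
    emitted on its own so the scan always advances. A real implementation
    would need something smarter so a vowel mark is never split from its base.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


def insert_breaks(text, dictionary):
    """Join the segmented words with zero-width spaces for the word-wrapper."""
    return ZWSP.join(segment(text, dictionary))


# Usage with a hypothetical two-word dictionary.
thai_words = {"สวัสดี", "ครับ"}
print(segment("สวัสดี" + "ครับ", thai_words))
```

Greedy matching is simple and fast, but it can pick a long word where two shorter ones were meant, which is one reason the dictionary needs to be fairly complete.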

    Now let's not even get into things like low-level drawing, where you center or fit text to a rectangle. You can't use the typical things like average letter metrics to make quick estimates to decide how to wrap lines to maximize font sizes, since the row heights vary depending on the words that are in it.

    Also, the Thai government literally commissioned and created their own font to support the language, since it was being neglected in general by tech globally, and there were so many issues with whatever fonts were out there. So we also use their official font in our output. I'm not sure if Cambodian can make use of the Thai font, but if not, that may also be a big part of their problem.

    • Our luck lies in the fact that most of our devices were first designed to be used by English speakers.

      In some alternative universe where computer technology was developed primarily in Cambodia, an alternate version of me is struggling mightily to enter this comment into a Cambodian phone with a Khmer-oriented GUI design that had English support tacked on as an afterthought :)

      • by theCoder ( 23772 )

        Or did the features of English make it easier for those first designers to create something that would be fairly easy for other English speakers to interact with? Kind of the same way that all the features of Earth that keep humans alive (liquid water, magnetic field, largish moon to help intercept asteroids, etc) also allowed humans to develop in the first place.

        Also, in many ways English (and other Germanic languages) have evolved in concert with technology. Before Gutenberg, there were many variations

    • by r2kordmaa ( 1163933 ) on Wednesday November 17, 2021 @02:25PM (#61996631)
      In this the main feature of German is that Gutenberg spoke German. No better way to fix all your printing issues than to invent printing to begin with. Cyrillic has all the same features you listed, space delineation, small character set, static letters, still it sucks for daily life. Much the same for Greek. Really, any script other than Latin has become just plain masochism since the dawn of the internet.
      • Cyrillic has all the same features you listed, space delineation, small character set, static letters, still it sucks for daily life

        I'm not sure what you are saying here. I have found no problem with Cyrillic Russian on a typical keyboard.

    • You write a consonant, then each vowel that follows gets stacked on top of the consonant, up to a few high, until you end the word or have another consonant.
      That is incorrect. The vowels are put around the consonant: left (e), right (a), top (i) or bottom (u).
      If you have a decent OS, you should not have any problems supporting Thai.

      My solution to that ended up having an entire dictionary of all Thai words, and I used a greedy algorithm to match words by maximum length. It then inserts Unicode non-printing s

    • While this was an issue in the 1980s, now a phonetic input system can handle pretty much any language. That is because all languages have fundamentally similar consonant-vowel structures. So once one can represent those and then use a predictive algorithm to map them to commonly used representations, it's done. Trying to use the native script for input is often a fool's errand unless the breakdown to a simplified phonetic system has already happened historically in that language (e.g. the separation of co
  • This strikes me as the sort of thing that would be common knowledge among ordinary Cambodians.

    Cambodia has over 15 million people. Did Facebook really have no employees who were either Cambodian or had friends/family members in Cambodia where they could ask "hey, what's going on with all this audio messaging"?
