New Unicode Bug Discovered For Common Japanese Character "No"

Posted by timothy on Saturday July 18, 2015 @07:32AM from the perfectly-nice-in-garbage-out dept.

AmiMoJo writes: Some users have noticed that the Japanese character "no", which is extremely common in the Japanese language (forming parts of many words, or meaning something similar to the English word "of" on its own). The Unicode standard has apparently marked the character as sometimes being used in mathematical formulae, causing it to be rendered in a different font to the surrounding text in certain applications. Similar but more widespread issues have plagued Unicode for decades due to the decision to unify dissimilar characters in Chinese, Japanese and Korean.

This discussion has been archived. No new comments can be posted.

No? (Score:2)

by hankwang ( 413283 ) writes:

I tried to RTFA, but it was in Japanese! I thought Japanese didn't have a word for "no":
Japanese also lacks words for yes and no. [wikipedia.org] The words "hai" and "iie" are mistaken by English speakers for equivalents to yes and no, but they actually signify agreement or disagreement with the proposition put by the question: "That's right." or "That's not right."
- - Re: (Score:2)
    
    by AmiMoJo ( 196126 ) writes:
    
    Correct. Unfortunately Slashdot does not allow me to enter Japanese text, hence the confusion.
    This is what happens when I type that character in Japanese: ã®
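The garbled output above is consistent with the character's UTF-8 bytes being read back in a single-byte Western encoding. A minimal Python sketch of that effect, assuming Latin-1 is the encoding involved (the thread does not say which one Slashdot actually applied):

```python
# Hiragana "no" is U+306E; its UTF-8 encoding is three bytes.
no = "\u306e"
utf8_bytes = no.encode("utf-8")
print(utf8_bytes.hex(" "))  # e3 81 ae

# Assumption: the bytes are misread as Latin-1, one character per byte.
garbled = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in garbled])

# Undoing the misreading recovers the original character.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == no)
```

The middle byte, 0x81, maps to an invisible control character in Latin-1, which would explain why only two visible symbols appear in the comment.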
  - Re: (Score:2)
    
    by Applehu Akbar ( 2968043 ) writes:
    
    Most Japanese characters in the two phonetic alphabets stand for a consonant tied to a vowel. The no phonetic in grammar indicates possession. In the phrase Katoh no boshi (Kato's cap) all of the other characters would be the Chinese-derived kanji.
What bug? (Score:5, Informative)

by Ark42 ( 522144 ) writes: <slashdot@morpheA ... inus threevowels> on Saturday July 18, 2015 @07:53AM (#50134621) Homepage

The character in question is Hiragana "No", codepoint U+306E. As far as I can tell, this has existed since Unicode 1.1 and there are no differences in the Unicode metadata when compared to any other Hiragana glyph. It is marked as IsAlphabetic=True, Category=Other Letter, and NumericType=None for example. So are all the other common Hiragana glyphs. If there is a bug, it's clearly with some specific application, and not Unicode or Unicode metadata. Compare http://www.fileformat.info/inf... [fileformat.info] with any other Hiragana glyph, like http://www.fileformat.info/inf... [fileformat.info] (Hiragana "Ha").
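The metadata comparison described above can be spot-checked with Python's standard `unicodedata` module. This is only a sketch: the helper name `describe` is made up for illustration, and the module reflects whichever Unicode version the Python build ships with, so it is a quick check and not an audit of the standard.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, numeric value or None) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.numeric(ch, None),  # None means no numeric value is assigned
    )

no = describe("\u306e")  # Hiragana "no"
ha = describe("\u306f")  # Hiragana "ha"
print(no)
print(ha)
```

Both characters report the category `Lo` (Other Letter) and no numeric value, which matches the claim that "no" carries no special metadata.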

- Re: (Score:3, Interesting)
  
  by AmiMoJo ( 196126 ) writes:
  
  The bug is that the Japanese, Chinese, Korean and mathematical versions of this character all share a common code point. There is no reliable way for an application to select the right character and render it properly.
  You can't mix C/J/K and mathematics in Unicode, which is a new bug beyond just the failure to support mixing C/J/K.
  - Re: (Score:2)
    
    by Florian Weimer ( 88405 ) writes:
    
    “-” looks very different in text and formulas, too. I don't get why people assume that you can get nice rendering without additional markup.
    - Re: (Score:2)
      
      by Ark42 ( 522144 ) writes:
      
      Except while that is called "Hyphen-Minus" and can be used for two things, Unicode does try to solve that problem by having:
      00AD Soft Hyphen
      2010 Hyphen
      2011 Non-Breaking Hyphen
      2012 Figure Dash
      2013 En Dash
      2014 Em Dash
      2015 Horizontal Bar
      2212 Minus Sign
      2796 Heavy Minus Sign
      There is no "Mathematical Hiragana No" glyph defined by Unicode, and as such, it should never be rendered in a different font just because somebody *might* use it in a formula. The application is wrong, and there is no bug in Unicode.
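The list above, and the claim that no mathematical variant of a Hiragana letter exists, can both be checked against the character names in Python's `unicodedata` module. A rough sketch; the brute-force name search is just one way to look, and the result depends on the Unicode version bundled with the interpreter:

```python
import unicodedata

# The code points from the list above, plus U+002D (Hyphen-Minus) itself.
code_points = [0x002D, 0x00AD, 0x2010, 0x2011, 0x2012,
               0x2013, 0x2014, 0x2015, 0x2212, 0x2796]
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}
for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")

# Scan every code point for a name mentioning both MATHEMATICAL and HIRAGANA.
math_hiragana = []
for cp in range(0x110000):
    name = unicodedata.name(chr(cp), "")  # "" for unnamed or unassigned code points
    if "MATHEMATICAL" in name and "HIRAGANA" in name:
        math_hiragana.append(cp)
print(math_hiragana)
```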
  - Re: (Score:2)
    
    by Ark42 ( 522144 ) writes:
    
    I'm aware of the problems with the han unification and certain Kanji being displayed "wrong" because the Chinese equivalent is drawn significantly different from the Japanese Kanji, but this doesn't seem to be anything close to that kind of problem. I'm also aware of the Unicode block U+1D400 "Mathematical Alphanumeric Symbols" which is what should be used for formulas. Any application that is rendering one particular character in the Hiragana block in a different font than the rest of the Hiragana block, i
  - Re: (Score:2)
    
    by amake ( 673443 ) writes:
    
    There are no Chinese or Korean versions of this Japan-specific character. This is the first time I've ever heard of a "mathematical use" of this character, and I suspect the vast majority of users would be surprised at this as well.
    - Re: (Score:2)
      
      by AmiMoJo ( 196126 ) writes:
      
      It's been imported to China: http://portal.nifty.com/koneta... [nifty.com]
      - Re: (Score:2)
        
        by butlerm ( 3112 ) writes:
        
        How can you tell that any of those pictures are from China? They all look like they are from Japan, a country that makes extremely heavy use of Chinese characters (much more than Korea for example), to me.
        
        Re: (Score:2)
        
        by AmiMoJo ( 196126 ) writes:
        
        All the text is in Chinese. The blog post itself is in Japanese, and it says that the pictures are of China.
  - Re: (Score:2)
    
    by NostalgiaForInfinity ( 4001831 ) writes:
    
    The bug is that the Japanese, Chinese, Korean and mathematical versions of this character all share a common code point. There is no reliable way for an application to select the right character and render it properly.
    What you probably mean is that an application can't select the right glyph based on the Unicode string. That is correct, but nothing specific to CJK. Without markup or metadata, Unicode often won't render as expected by readers even in Western languages. Unicode used to have its own system fo
- Re: (Score:2)
  
  by Megane ( 129182 ) writes:
  
  My guess is that it can be used in certain numerical contexts, sort of like "No." ("number") in English. It can mean a quantity as in "n no x" (ippiki no neko), and maybe some other contexts. So something, probably an application, was coded to think of it as used in numerical contexts. The specific instance is about LaTeX, which is one of those ancient apps like emacs that is so old it had to create everything from scratch, so it's possibly specific to LaTeX or some port thereof.
Nitpick (Score:5, Informative)

by msobkow ( 48369 ) writes: on Saturday July 18, 2015 @08:21AM (#50134695) Homepage Journal

This is not a "Unicode bug". It is a rendering bug exhibited by some applications.

- Re: (Score:3)
  
  by AmiMoJo ( 196126 ) writes:
  
  How is an application supposed to know if a random character is Japanese, Chinese, Korean or mathematical? It would need some kind of strong AI to interpret and understand the text. It's a Unicode bug, merged characters are impossible to render correctly all the time because apps are forced to guess which font to use.
  - Re: (Score:2)
    
    by msobkow ( 48369 ) writes:
    
    Ask the people who wrote the software that doesn't exhibit the bug. Obviously it can be done.
    - Re: (Score:3)
      
      by AmiMoJo ( 196126 ) writes:
      
      Software that doesn't have this bug only avoids it by not supporting mathematical symbols. So far there is no known software that avoids the CJK confusion problem either.
      Most software doesn't even try. How many programmers are even aware of the issue? No Unicode library is immune. It's a problem with the standard that can only be fixed by starting fresh with about 150,000 new CJK characters, and then updating all fonts and libraries to handle translation and equivalence.
    - Re: (Score:2)
      
      by thegarbz ( 1787294 ) writes:
      
      In other news a new bug is shown to exhibit a behaviour where some mathematical programs substitute a Japanese character into the formula.
      The problem is it can't be done. Not without intelligent user / designer input (such as signifying that the unicode to be displayed is Japanese and not a maths formula). If an application is correct in determining one context it will be incorrect in determining the other.
    - - Language markup (Score:2)
        
        by tepples ( 727027 ) writes:
        
        Please name said software.
        Any HTML renderer ought to be able to tell an element with lang="zh-Hans" (Chinese using simplified characters) from one with lang="ja" (Japanese).
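As a rough illustration of the point about `lang` attributes, here is how a renderer might choose a font once it has a language tag. The table, the default font and the function name `pick_font` are all hypothetical; no real browser is claimed to work this way.

```python
# Hypothetical mapping from (lowercased) language tag prefixes to font families.
FONT_BY_LANG = {
    "ja": "Noto Sans CJK JP",
    "zh-hans": "Noto Sans CJK SC",
    "zh-hant": "Noto Sans CJK TC",
    "ko": "Noto Sans CJK KR",
}
DEFAULT_FONT = "Noto Sans"  # assumed fallback when no entry matches

def pick_font(lang_tag):
    """Pick a font by the longest matching prefix of a tag such as 'zh-Hans-CN'."""
    parts = lang_tag.lower().split("-")
    while parts:
        font = FONT_BY_LANG.get("-".join(parts))
        if font:
            return font
        parts.pop()  # drop the last subtag and retry with a shorter prefix
    return DEFAULT_FONT

print(pick_font("ja"))
print(pick_font("zh-Hans-CN"))
```

The same code point can then be drawn with a Japanese or a Chinese font depending only on the markup around it, which is the behavior the comment describes.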
  - Re:Nitpick (Score:4, Informative)
    
    by Kjella ( 173770 ) writes: on Saturday July 18, 2015 @10:18AM (#50135097) Homepage
    
    How is an application supposed to know if a random character is Japanese, Chinese, Korean or mathematical? It would need some kind of strong AI to interpret and understand the text. It's a Unicode bug, merged characters are impossible to render correctly all the time because apps are forced to guess which font to use.
    Except font encoding has never been part of the character encoding, you might want your English text in Arial, your French in Times New Roman and the formula in Courier, but Unicode doesn't encode that. You might argue that this is not a bug, that it's simply out of scope and should be solved by a higher level encoding like <font="some japanese font">konnichiwa</font><font="some chinese font">ni hao</font> and not plaintext Unicode. That's what the Unicode consortium says [unicode.org] and if you express it as simply a style issue, it actually sounds plausible.
    On the other hand, you might argue that there's no reasonable way to map a "unihan" character to a glyph except as a band-aid since the CJK styles are distinctly different and so any comprehensive font should have three variations, it shouldn't take three fonts to make a mixed CJK document look correct just one. That this information belongs on the lowest level and should be passed along as you copy-paste CJK snippets or pass them around in whatever interface or protocol you have, otherwise everything will need a document structure and not just a string.
    I don't think they should "unmerge" and duplicate all the han characters, that'd be silly. What they should do is add CJK indicators - say HANC, HANJ, HANK like for bi-directional text, only simpler with no nesting just one indicator applying until superseded by another. Like (HANJ) konnichiwa (HANC) ni hao and the former will render as a Japanese han, the latter as a Chinese. If it doesn't have any indicator, well take a guess. Am I missing something blindingly obvious or would this trivially solve the problem?
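To make the indicator idea concrete, here is a small Python sketch of how such markers could be consumed. The marker code points are private-use characters picked purely for illustration; nothing like HANC/HANJ/HANK is defined by Unicode, and `split_runs` is a made-up name.

```python
# Hypothetical markers: private-use code points standing in for the proposed
# HANC / HANJ / HANK indicators.
MARKERS = {"\ue000": "zh", "\ue001": "ja", "\ue002": "ko"}

def split_runs(text, default=None):
    """Split text into (language, run) pairs; a marker applies until superseded."""
    runs = []
    lang = default
    current = []
    for ch in text:
        if ch in MARKERS:
            if current:
                runs.append((lang, "".join(current)))
                current = []
            lang = MARKERS[ch]  # no nesting: the newest marker simply wins
        else:
            current.append(ch)
    if current:
        runs.append((lang, "".join(current)))
    return runs

sample = "\ue001konnichiwa \ue000ni hao"
print(split_runs(sample))  # [('ja', 'konnichiwa '), ('zh', 'ni hao')]
```

Text without any marker comes back with the default language, which corresponds to the "take a guess" case in the proposal.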
    
    - Re: (Score:3)
      
      by AmiMoJo ( 196126 ) writes:
      
      I agree, font encoding should not be part of the character encoding. Unicode even screws that up though, because there are things like text direction marks in it. Anyway, the problem is that often you have text without metadata. A file name, audio file metadata, a plain text database entry etc. You have to pick a font to render it, and the choice depends on the language because thanks to Unicode it's impossible to have a universal all-language font.
      You could have meta characters as you suggest, but that isn
      - Re: (Score:2)
        
        by Kjella ( 173770 ) writes:
        
        You could have meta characters as you suggest, but that isn't what Unicode is supposed to be for. It's a character encoding scheme, not a metadata encoding scheme.
        Actually I was thinking of it more like a "sticky" composite character, like you can have a + circle = å you'd have unihan + HAN(C|J|K) = "right" glyph while:
        a) Extending existing single-language CJK documents with just one character
        b) Preserving backwards compatibility with all current CJK systems
        c) Avoiding any complex CJK conversion functions
        d) Creating a simple way to override with "show as C/J/K"
        It would require adding a bit of intelligence to copy-paste for preservation, like:
        (HANC)abcde -> c
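The "a + circle = å" composition mentioned in this comment already exists in Unicode as a base letter followed by a combining mark. A minimal Python check using the standard `unicodedata` module:

```python
import unicodedata

decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e5")            # True
```

The two strings display the same way but differ in code point count, which is the kind of bookkeeping any new "sticky" indicator would also add.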
        
        Re: (Score:2)
        
        by AmiMoJo ( 196126 ) writes:
        
        That wouldn't really improve things IMHO, because you would still be reliant on the application knowing how to handle the character. In practice what would you do, add it to the start of file names? Then on all current software your filename would start with a little box representing an unknown character. The whole concept of composite characters is ridiculous as well, they should all get their own code points and let the font system handle saving some memory by re-using parts of glyphs. Otherwise your simp
        
        Re: (Score:2)
        
        by Kjella ( 173770 ) writes:
        
        That wouldn't really improve things IMHO, because you would still be reliant on the application knowing how to handle the character. In practice what would you do, add it to the start of file names? Then on all current software your filename would start with a little box representing an unknown character.
        Yes, until the software got updated to treat it as a non-printing character but it wouldn't make everything unreadable, there's bad and there's much much worse.
        The whole concept of composite characters is ridiculous as well, they should all get their own code points and let the font system handle saving some memory by re-using parts of glyphs. Otherwise your simple character count suddenly requires a massive look-up table of composite characters.
        It already does for a huge number of reasons. Oh and if you thought giving every character a code point would mean a 1:1 mapping to glyphs that's still wrong, many characters map to alternate glyphs depending on the context. For example Arabic and Latin cursive characters substitute different glyphs to connect glyphs together depending on whether the
  - Re: (Score:2)
    
    by Megol ( 3135005 ) writes:
    
    The problem is outside the problem domain Unicode attempts to solve so it isn't strange it doesn't solve it. For some other problems Unicode tries to solve the result is a mess (example: bidirectional text) so that is probably a good thing.
  - Re: (Score:2)
    
    by GuB-42 ( 2483988 ) writes:
    
    First of all, the hiragana "no" is always Japanese, not Chinese, not Korean. The CJK unification is only about han characters (in Japanese, that's kanji).
    As for maths, there are usually markers to indicate we are in an equation, which makes sense because Unicode is not powerful enough for this: fractions, integrals, matrices, etc. cannot be rendered with just code points. So in this case Unicode provides the characters (roman and Greek letters, numbers, mathematical symbols, the hiragana "no", etc.) and
  - - Re: (Score:2)
      
      by Megane ( 129182 ) writes:
      
      This is not a unified character, it is Japanese-only. Some program (apparently LaTeX) is using the wrong font because it thinks it is part of a mathematical equation, even to the point of showing the wrong font for the character in a font character viewer window.
Just a rendering problem? (Score:2)

by Applehu Akbar ( 2968043 ) writes:

The character in the Unicode table looks like a mashup of the hiragana (grammar-forming) version of the character, and the katakana (used as we do italics) form.
They're trying to unify *similar* characters (Score:5, Informative)

by ciaran2014 ( 3815793 ) writes: on Saturday July 18, 2015 @10:25AM (#50135123) Homepage

A lot of people complain about the idea of unification without understanding it. I can't judge if unicode's unification is great or awful. The English-speaking media constantly says it's awful, but it's usually clear the authors don't know what unification is, who's driving it, or how unicode's work compares to what existed beforehand, so they can only be ignored. (They're sometimes trying to spin up some clickbait about ignorant westerners imposing blah blah blah on Asia, which just shows they know nothing about the topic.)
The issue:
There's a certain number of symbols which have been copied from one East Asian language to another. They're the same symbol, so unicode has one slot for that symbol. Then there's a second category where the symbol has been copied, but one group draws it a little different (the Japanese might like to put a little flick at the end of one line, or the Chinese draw the line a little slantier). And a third category where one group has developed a simplified symbol, which means again the traditional and the simplified symbols are the same thing but drawn differently. The two symbols are equivalent, the new one is just a new suggestion for how to draw it.
Unification is about having one slot for the symbols in categories two and three and leaving it to the font to decide how to display it.
(Unicode uses more precise terms, but I'm calling them "symbols" and "slots" for simplicity.)
A disadvantage to this approach is that there can't be a font which would display a symbol both the way a Japanese would draw it and the way a Chinese would draw it. Fonts have to choose one style to draw each unified symbol.
An advantage of this approach is that new languages and dialects can be supported without needing another 100,000 slots per language or dialect (we do all know there are more than three East Asian languages, don't we?), and it's much easier for fonts to add support for all the East Asian languages because once they've done Chinese, Japanese is automatically almost finished.
Here are some example symbols:
https://en.wikipedia.org/wiki/... [wikipedia.org]
unicode.org's FAQ also has clarifications:
If the character shapes are different in different parts of East Asia, why were the characters unified?
http://www.unicode.org/faq/han... [unicode.org]
Isn't it true that some Japanese can't write their own names in Unicode?
http://www.unicode.org/faq/han... [unicode.org]
(All that said, it's been years since I looked into this so there's a chance I've gotten some detail wrong, but I'm confident it's a good summary of the issue.)

- Re: (Score:2)
  
  by AmiMoJo ( 196126 ) writes:
  
  An advantage of this approach is that new languages and dialects can be supported without needing another 100,000 slots per language or dialect (we do all know there are more than three East Asian languages, don't we?), and it's much easier for fonts to add support for all the East Asian languages because once they've done Chinese, Japanese is automatically almost finished.
  The first one isn't really an advantage, since there is no shortage of code points. There are massive disadvantages though.
  From a software point of view it would be good to have universal fonts that can render any Unicode character correctly for anyone in the world. The Unicode consortium has tried to support this by splitting some of the more distinct symbols into separate code points for each language, but it's far from complete and every new version adds many more. The FAQ is a joke - when people point o
  - Re: (Score:2)
    
    by ciaran2014 ( 3815793 ) writes:
    
    Thanks for this reply!
    Can you give me an example of a Japanese name that can't be written in unicode? I keep hearing English speakers mention this problem but I've never seen exactly what the problem is.
  - Re: (Score:2)
    
    by ciaran2014 ( 3815793 ) writes:
    
    > it would be good to have universal fonts that can render
    > any Unicode character correctly for anyone in the world
    But a line has to be drawn between substance and style. There are two (main) ways to draw the number 4. One has a slanty line and is closed at the top, the other is made of straight lines and is open at the top. Or the number 7. For English speakers it's two lines, but for French speakers there's also a horizontal bar across the middle. Should unicode have two 4's and two 7's, or should
- - Re: (Score:2)
    
    by ciaran2014 ( 3815793 ) writes:
    
    > you are summarizing A issue, not THE issue the author was making up.
    Yes, my post only relates to the last line of the summary.
In other news (Score:2)

by rabbin ( 2700077 ) writes:

Some slashdot editors have failed to notice that incomplete sentences, which are less and less common in the first sentence of slashdot summaries.
Some users have noticed that the Japanese character "no", which is extremely common in the Japanese language
Can anyone illustrate? (Score:4, Insightful)

by BlueMonk ( 101716 ) writes: <BlueMonkMN@gmail.com> on Saturday July 18, 2015 @10:50AM (#50135233) Homepage

I have been reading the comments for 20 minutes because I don't understand Japanese, but I still don't understand the problem. There's a Japanese character called no, it looks very much like a lowercase English/Latin "e" rotated clockwise about 80 degrees and then flipped over the vertical axis. Is this being mixed up with something else or rendered wrongly? Can anybody provide examples of what it's getting mixed up with or how or where it's being rendered improperly?

- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  Here's a picture [twitter.com]. Notice that the character at the end is rendered in a different font than the rest of the characters. It's not a critical bug, the text is still legible, just an annoying cosmetic bug.
- Re: (Score:2)
  
  by Actually, I do RTFA ( 1058596 ) writes:
  
  I can give an example, if you don't mind me running to Greek. Imagine some program renders mathematical symbols differently from text. Imagine that someone writes out, using Unicode, the formula for the area of a circle. No problem, right? The pi is clearly a math symbol. But imagine the same thing if you were reading Greek. And beyond that, imagine if, in all the Greek you read, pi was treated as though it were being used in a mathematical sense.
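For the pi analogy, Unicode does offer a way to tell the two uses apart: the ordinary Greek letter and the styled letters in the Mathematical Alphanumeric Symbols block are separate code points. A small Python sketch using the standard `unicodedata` module (the character is looked up by name so that no code point has to be taken on trust):

```python
import unicodedata

greek_pi = "\u03c0"  # the ordinary letter used in Greek text
math_pi = unicodedata.lookup("MATHEMATICAL ITALIC SMALL PI")

print(unicodedata.name(greek_pi))
print(f"U+{ord(math_pi):05X}")

# Compatibility normalization folds the styled symbol back to the plain letter.
folded = unicodedata.normalize("NFKC", math_pi)
print(folded == greek_pi)
```

Hiragana "no" has no such styled counterpart, so a renderer has nothing in the character itself to mark a mathematical use.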
  - Re: Can anyone illustrate? (Score:2)
    
    by BlueMonk ( 101716 ) writes:
    
    What I still don't understand is, if there's only one code point for this character, where are the multiple renderings coming from? Multiple fonts? Is the source of the problem that Japanese fonts are providing a bad glyph/rendering for this character that doesn't match the style of the rest of the font, or is it that they are unable to provide both glyphs because there's only one code point? Would there still be a problem if they just changed their glyph to the other style; could this just be considered a
- Re: (Score:2)
  
  by AmiMoJo ( 196126 ) writes:
  
  It's rendered in a way that a Japanese person could read it, but looks ugly because software can't tell if it is Japanese, Chinese or mathematical. It's rather jarring in the middle of a sentence and makes the output unsuitable for publishing without manual editing.
  This is due to Unicode assigning the same code to the Japanese, Chinese and mathematical versions. It would be like they tried to merge the Latin "o" and Cyrillic "o". Imagine if every "o" character you wrote was rendered in a different font to all
- - Re: (Score:2)
    
    by BlueMonk ( 101716 ) writes:
    
    So, pardon my apparent inexperience with Unicode, fonts and glyphs, but this looks like an application or framework issue wherein someone decided that we should switch fonts in the middle of a string if there's another font that contains a glyph for the character we're after in some circumstances. Is that what's happening? Why shouldn't all text drawing operations be restricted to the currently active font, and make it the responsibility of the application developer and user to pick a font that contains all
Timothy can't write in English. (Score:2)

by Gibgezr ( 2025238 ) writes:

Some users have noticed that the Japanese character "no", which is extremely common in the Japanese language (forming parts of many words, or meaning something similar to the English word "of" on its own).
That isn't even a sentence in English. It is extremely grating to read crap like this, and it does not convey much about the story.
- Re:Is it the same as in Chinese? (Score:5, Funny)
  
  by Chris Mattern ( 191822 ) writes: on Saturday July 18, 2015 @07:56AM (#50134627)
  
  Like Ã¦ or ÃÃÂ¼Y
  If so, seems many Chinese websites will have problems too, because it's used so often in Chinese.
  As you have just discovered, Slashdot cleverly avoids all Unicode bugs by not supporting Unicode at all.
  
  - Re: (Score:2)
    
    by ChunderDownunder ( 709234 ) writes:
    
    meanwhile the folks at soylent implemented it ages ago.
    With all the effort wasted on 'beta', I wonder how much of the open source slashcode remains.
    - Re: (Score:2)
      
      by KiloByte ( 825081 ) writes:
      
      Actually, slashcode does support Unicode, all that needs to be done for /. to get Unicode is reconfiguring the database (and converting old comments, I guess).
      - Re: (Score:2)
        
        by Carewolf ( 581105 ) writes:
        
        Actually, slashcode does support Unicode, all that needs to be done for /. to get Unicode is reconfiguring the database (and converting old comments, I guess).
        No, it already works. It was active for a while some 10 years ago, but was removed because it was hard to sanitize. You could easily write your own comment score by reversing direction at the right time.
        Still they could reactivate it if they just found a reasonable way of sanitizing features they don't want.
        
        Re: (Score:2)
        
        by jones_supa ( 887896 ) writes:
        
        No, it already works. It was active for a while some 10 years ago, but was removed because it was hard to sanitize. You could easily write your own comment score by reversing direction at the right time.
        Still they could reactivate it if they just found a reasonable way of sanitizing features they don't want.
        Dude, all other websites support Unicode. Sanitizing it properly cannot be rocket science.
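As a sketch of what sanitizing the direction-reversal trick could look like, the Python snippet below drops the explicit bidirectional control characters by their Unicode bidi class. This is an illustration only, not how Slashdot or slashcode handles input; the function name is made up, and a production filter would need to consider more than this one class of characters.

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls.
EXPLICIT_BIDI = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def strip_bidi_controls(text):
    """Remove characters that explicitly change text direction."""
    return "".join(
        ch for ch in text
        if unicodedata.bidirectional(ch) not in EXPLICIT_BIDI
    )

# U+202E (RIGHT-TO-LEFT OVERRIDE) makes the following text display reversed.
comment = "nice post \u202e)5:erocS("
clean = strip_bidi_controls(comment)
print(repr(clean))
```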
      - Re: (Score:2)
        
        by interval1066 ( 668936 ) writes:
        
        If only everyone just used UTF8 encoding. Unfortunately, Microsoft insisted on using UTF16 and now here we are...
        
        Re: (Score:2)
        
        by KingMotley ( 944240 ) writes:
        
        Considering that Windows NT was around *before* UTF-8, it would have been rather difficult to implement it. What you really meant to say was, unfortunately, standards committees are often too slow to implement things like UTF-8 in a timely manner.
  - Re: (Score:2)
    
    by Megane ( 129182 ) writes:
    
    Slashdot does support Unicode (assuming your browser can be convinced to post in the right encoding). It just happens to have most of the code points (basically everything above U+00FF) blacklisted.
- Re: (Score:3)
  
  by AmiMoJo ( 196126 ) writes:
  
  It's a Unicode bug. Unicode tries to merge different characters into a single code point, because long ago they had the same origin. This particular character exists in Japanese, Chinese, Korean and mathematics, so can be rendered four different ways, but they all share one code point.
  Applications have to guess what font to use. Being a mathematical program, this one defaults to the system language (Japanese) but has logic to detect this "no" character and render it in a different font. It isn't clever enou
  - Re: (Score:2)
    
    by Carewolf ( 581105 ) writes:
    
    It's a Unicode bug. Unicode tries to merge different characters into a single code point, because long ago they had the same origin. This particular character exists in Japanese, Chinese, Korean and mathematics, so can be rendered four different ways, but they all share one code point.
    Applications have to guess what font to use. Being a mathematical program, this one defaults to the system language (Japanese) but has logic to detect this "no" character and render it in a different font. It isn't clever enough to notice that the rest of the sentence is Japanese, but it shouldn't have to be.
    The funny thing is that the same has never been done with Latin letters and symbols, because that would be a mess. I really don't understand why they couldn't see it would be the same in Asian languages.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Typewriter character sets without 1 and 0 (Score:2)
        
        by billstewart ( 78916 ) writes:
        
        I'm pretty sure my mom's manual typewriter when I was a kid didn't have 1, less sure about whether it had 0. But it did have the proper French and Spanish accent marks (left, right, circumflex, N~, cedilla, most of which my PC keyboard doesn't have), and you composed them with letters by using the backspace.
        And yes, she could do two-column left-and-right-justified newsletters on it - she'd type a draft, count the letters, type the final. But she happily switched to using a Macintosh to type them, and let
- - - Re: (Score:2)
      
      by JustOK ( 667959 ) writes:
      
      Que?
      - Re: (Score:2)
        
        by JustOK ( 667959 ) writes:
        
        nuq ghe''or vIghel SoH?
    - Re: Why not just use English, and only English? (Score:2, Insightful)
      
      by Anonymous Coward writes:
      
      There are more native Chinese speakers than English speakers. How about you learn Chinese and shut the fuck up?
      - Re: Why not just use English, and only English? (Score:4, Insightful)
        
        by John Allsup ( 987 ) writes: on Saturday July 18, 2015 @08:22AM (#50134697) Homepage Journal
        
        Just write Chinese in pinyin and speak it normally. (The number of Chinese speakers does not matter; the issue is with how it is written down.) When it comes to ideograph-based languages, we would have been better off designing an entirely separate text system rather than trying to shoehorn it into a font-character paradigm derived from the needs of writing and printing Latin scripts. Indeed having a writing system designed around the needs of calligraphy would be a useful thing, but like with ideograph-based writing systems it is a long way from the use case we normally see with alphabet-based writing systems.
        
        
        Re: (Score:3, Insightful)
        
        by amake ( 673443 ) writes:
        
        we would have been better off
        No, you might have been better off. Chinese speakers would not. They would like to use their written language, as it exists today, on computers just like everyone else.
        
        MOD PARENT UP, PLEASE! (Score:2)
        
        by billstewart ( 78916 ) writes:
        
        Even using vector fonts doesn't fix the problem that Unicode wasn't a great solution for managing the diversity of characters in many Asian languages.
        
        Mandarin dependency and homophone confusion (Score:5, Interesting)
        
        by tepples ( 727027 ) writes: <tepples@[ ]il.com ['gma' in gap]> on Saturday July 18, 2015 @09:48AM (#50134983) Homepage Journal
        
        Just write Chinese in pinyin and speak it normally. (The number of Chinese speakers does not matter; the issue is with how it is written down.)
        "Chinese" is not a single spoken language. A passage written in one Chinese language, such as Mandarin, is often readable in another Chinese language, such as Cantonese, so long as they're written with Han characters. It's as if French could be read as Italian or Spanish with the same characters. In addition, different words that sound the same in a given Chinese language due to historic sound changes usually have different Han characters. They may end up sounding different in a different Chinese language whose different historic sound changes produced different homophone sets. Pinyin, on the other hand, depends on Mandarin and confuses homophones.
        
        Re: (Score:2)
        
        by interval1066 ( 668936 ) writes:
        
        Something I've always been curious about, though: my understanding is that a Japanese speaker can understand written Chinese, to a certain extent. Is that not correct? I know that the reverse isn't really possible due to the Japanese use of Kana. But if the text is written using Han glyphs, can Cantonese, Mandarin, Hunan, Kan, Taiwan, etc., and Japanese speakers sort-of understand each other's written stuff, or is that just nonsense?
        
        Re:Mandarin dependency and homophone confusion (Score:5, Informative)
        
        by Fire_Wraith ( 1460385 ) writes: on Saturday July 18, 2015 @12:03PM (#50135535)
        
        To a degree, yes, because the symbols themselves are the same. Note however that some of the original Chinese characters have been altered in use (simplified) by the PRC in the 50s and 60s, but those are only used in mainland China (and I think Singapore maybe?), but not Taiwan or Japan. Aside from that though, the characters for something like 'University' would still be a combination of the character for 'large' and the character for 'school'. It might be pronounced totally differently, but could be read and understood by all. Fun fact: The proper reading of the characters for the country of "Japan" in Japanese is actually "Nihon" or "Nippon." However, in certain Chinese dialects, the characters that comprise it are pronounced more like "Zep-pen" or "Japan." What's also fascinating to consider is that Korean is the same way, but that in modern usage you hardly ever see the Chinese characters (Hanja) used, even though I think they're still taught in some schools. Almost everything I saw when I was in Korea was in Hangul, the Korean native alphabetic script.
        
        Re: (Score:2)
        
        by aix tom ( 902140 ) writes:
        
        Another interesting little tidbit I stumbled upon while learning Japanese: "Peking" is written with the characters "North-Capital", "Nanking" is written with the characters "South-Capital", while "Tokyo" is written with the characters "East-Capital".
        
        Re: Mandarin dependency and homophone confusion (Score:2)
        
        by jrumney ( 197329 ) writes:
        
        If you write it with long vowels spelt out, Toukyou is clearly different than Kyouto, though the Kyou in both cases is the same, Kyoto being the former capital. On the subject of Chinese being able to understand written Japanese, it is only partially the case, as Chinese characters are not always used for their meaning in Japanese. Sometimes they were used for their (Middle Chinese) sound.
        
        Re: (Score:2)
        
        by Fire_Wraith ( 1460385 ) writes:
        
        No, it's a different character for 'To'.
        
        Different characters can have the same phonetic representation in Japanese, which is one of the tricky parts of the language. English has homonyms too, though they're usually easier to differentiate based on context. Kanji puns from this are definitely a big deal in Japanese humor, as you might expect.
        
        Also, fun fact: Tokyo was called Edo until 1868, when the Tokugawa era ended and the city became the capital.
        
        Kanbun: Reordering Chinese to Japanese (Score:3)
        
        by tepples ( 727027 ) writes:
        
        Japanese and Chinese syntax differ too much for parallels as close as those of Mandarin and Cantonese. Japanese puts the verb at the end (SOV) and marks noun case with postpositions (wa, ga, o, e). Chinese, on the other hand, puts the verb in the middle (SVO), more like English. (Other orders are possible: Welsh and Arabic put the verb at the beginning, or VSO, and Kashmiri and Dutch split the verb into a part that's second and a part at the end, or V2.)
        Chinese also uses serial verb construction [wikipedia.org], where verb
        
        Re: Kanbun: Reordering Chinese to Japanese (Score:3)
        
        by _merlin ( 160982 ) writes:
        
        That sentence doesn't require multiple verb clauses in Japanese. You can use destination, origin and means particles "ni", "kara" and "de": Watashi wa Shanghai kara Beijing ni hikouki de ikimasu. Since it's a single verb clause you can reorder it however you want for emphasis as long as the verb comes last - the way I have it there emphasises the subject. If you want to emphasise means of travel and use implicit speaker-as-subject, you can say: Beijing ni Shanghai kara hikouki de ikimasu. It's all easy as l
        
        Re: Kanbun: Reordering Chinese to Japanese (Score:2)
        
        by _merlin ( 160982 ) writes:
        
        Gah, posting at 4:20AM is a bad idea. I emphasised destination in the second example. To emphasise means of transport: Hikouki de Beijing ni Shanghai kara ikimasu. Just put the aspect you want to emphasise (and its associated particle) first. The only part that absolutely must be in a certain place in the sentence is the verb, which comes last.
        
        Re: (Score:2)
        
        by billstewart ( 78916 ) writes:
        
        And apparently Korean's even weirder. (I'm going by my childhood memories of my mom describing her job translating Korean during the early 50s. Unfortunately, I don't think she still has her books on basic Chinese characters these days, though I could just as easily find them in a bookstore around here.)
        Some parts of Silicon Valley have a lot of Korean restaurants. I don't think I've seen any Chinese characters on their signs or menus, just alphabetic Korean.
        
        Re: (Score:2)
        
        by Fire_Wraith ( 1460385 ) writes:
        
        Korean sentence structure and grammar is pretty similar to Japanese. I had very little trouble picking up Korean after learning Japanese, because all the concepts were the same (topic/subject/object markers, use of counters, etc), it was just different words. A lot of the Sino-Korean words were also very similar to their Sino-Japanese counterparts, too. It's not surprising, since they're both from the same linguistic family and root, and both share a ton of Chinese influence.
        
        If anything, the biggest tro
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        But if the text is written using Han glyphs, can Cantonese, Mandarin, Hunan, Kan, Taiwan, etc., and Japanese speakers sort-of understand each other's written stuff, or is that just nonsense?
        I went to China once with a professor of ancient Korean. He couldn't speak any Chinese, but he learned enough Chinese characters from studying Korean that he could write well enough to communicate with a taxi driver. They had to write to each other, they couldn't speak.
        
        Essentially, there was an old style of Chinese that everyone wrote in (but probably no one ever spoke, including Chinese). Over time, Japan, Korea, Hong Kong and eventually all of China modified the writing system to match the speaking sys
        
        Re: (Score:2)
        
        by AmiMoJo ( 196126 ) writes:
        
        It would have been absolutely fine if they had just stuck to one codepoint per character and not tried to merge them.
        
        Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        "symbols" occupy less space
        Not if you have to make the font bigger to keep the strokes from touching each other. By that point, you could have used a smaller font on the Latin.
        
        Re: (Score:3)
        
        by interval1066 ( 668936 ) writes:
        
        I'll buy that, but even native Sinolanguage speakers have told me the learning curve for an alphabet is much shallower. Like, MUCH shallower. And since most modern technical terms have Greek and Latin roots, sometimes it's simpler for them to just use the Latin words; otherwise they have to convert the terms to native sounds using bizarre and difficult-to-use conversion systems. I do agree, however, that it would have been nice to use a system similar to Kanji right from the beginning had we had one.
        
        Re: (Score:3)
        
        by interval1066 ( 668936 ) writes:
        
        English is the official technical language for flight. ALL international pilots, military and civ, MUST know enough English to pass flight school and to fly international commercial flights. It's also the official language of sea navigation, but to a lesser extent; I don't think you need to be as proficient. And English, with a number of loan words from Greek and Latin, is used in international engineering. But yeah, English is spoken by the majority of technical people around the world as a common informatio
        
        Re: (Score:2)
        
        by dunkelfalke ( 91624 ) writes:
        
        ICAO general rules and regulations
        4.4.1c - ICAO languages are English, Spanish, French, Arabic, Russian, and Chinese.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2, Informative)
        
        by interval1066 ( 668936 ) writes:
        
        I think Chinese is the only language we need, it's already the most spoken language in the world.
        Only in head count, not by region. If the world was populated only by the Chinese, which seems to be their goal, then yes, Chinese is the most spoken language in the world. However, if you break that fact down by dialect, your statement is really weak. Mao's goal to have the entire PRC speak Mandarin really failed.
        It's a democratic language that will draw from other languages where necessary and useful.
        Not really. Mao tried to force all Chinese to speak Mandarin, and he failed miserably. Kinda the exact opposite of "Democratic". But of course that's not the fault of the language per se...
        It's a language that has proven it can adapt to changing circumstances.
        Chinese
        
        Re: (Score:2)
        
        by KingMotley ( 944240 ) writes:
        
        Not even by head count. 1.5 billion people can speak English, compared with the 1.0 billion who can speak "Chinese".
        
        Bad English is the world's most common language (Score:2)
        
        by billstewart ( 78916 ) writes:
        
        I was once at a conference in Germany, most of which was given in English because it was an international crowd. One of the German speakers started off by saying that he used to start by apologizing for his bad English, but the host (who was Turkish) told him not to worry; Bad English is the most widely spoken language in the world. (Which is fine; English is flexible enough about most things that if you don't need to be subtle, Bad English will usually do.)
        German's the only non-English language that I'm
        
        Re: (Score:2)
        
        by Fire_Wraith ( 1460385 ) writes:
        
        Keep in mind that India, which is nearly as populous as China, is a predominantly English speaking country. If sheer number of speakers is a key, the future will probably turn out to be something like Firefly, with a mishmash of Chinese and English.
        
        Re: (Score:2)
        
        by AmiMoJo ( 196126 ) writes:
        
        Chinese may be, but if Japanese is an example, and Japanese is adapted from Chinese by Han explorers to Japan in the Iron Age; it's not very adaptable at all. The Japanese have developed THREE different writing systems to cope with some shortcomings of the language (only two tenses, underdeveloped pronoun system, etc). That may be a shortcoming of Japanese, but Japanese is just a symptom of a language root that isn't very forgiving. I will say however that a language that can be nuanced such that 9 different meanings from changing the tone of one word may be more flexible than I give it credit for,
        That's not right. The exact origins of the Japanese language are lost to pre-history, only guessed at. It was the writing system that was brought over from China. Then katakana and hiragana were added to support the parts of the Japanese language that can't be written adequately in the Chinese system. They were simply added to support the way the language was already spoken, not to make up for any limitations.
      - Re: (Score:2)
        
        by KingMotley ( 944240 ) writes:
        
        I think Chinese is the only language we need, it's already the most spoken language in the world.
        That is false. English is the most spoken language in the world. Chinese is the most popular primary language.
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
    - Re: (Score:2)
      
      by ArcadeMan ( 2766669 ) writes:
      
      We shouldn't strive to eliminate other languages, of course. They do have their value, but more as historic curiosities for linguists and historians rather than something to use on a daily basis.
      Quoi? [youtube.com]
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
    - Re: (Score:2)
      
      by ledow ( 319597 ) writes:
      
      Fuck it doesn't even support ASCII, let alone Unicode.
      Try doing an English pound sign:
      Â£
      Nope.
      - Re: (Score:2)
        
        by alexhs ( 877055 ) writes:
        
        HTML entity pound sign (£): £
        Literal pound sign, as on my keyboard: £
        It's OK for me in preview mode.
        Maybe it's your browser's encoding that's broken ? I have it set as UTF-8. Your rendering (£ (*)) seems to indicate you sent the byte sequence for UTF-8. But I suspect that your browser set the character encoding as ISO-8859-1 in its headers.
        While I'm at it: "" <- This was supposed to be the "no" hiragana. Disallowed characters are stripped, rather than being "converted" to moji
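The mix-up described above (UTF-8 bytes being read back as ISO-8859-1) can be reproduced in a few lines. This is only a minimal Python sketch of the general effect, not Slashdot's actual code path, and the variable names are illustrative.

```python
# Sketch of the mojibake discussed above: text is encoded as UTF-8,
# then wrongly decoded as ISO-8859-1 (Latin-1).

pound = "\u00a3"                          # POUND SIGN
utf8_bytes = pound.encode("utf-8")        # two bytes: 0xC2 0xA3
misread = utf8_bytes.decode("iso-8859-1") # each byte becomes one character

no_kana = "\u306e"                        # HIRAGANA LETTER NO
no_bytes = no_kana.encode("utf-8")        # three bytes: 0xE3 0x81 0xAE
no_misread = no_bytes.decode("iso-8859-1")

print(utf8_bytes, repr(misread))
print(no_bytes, repr(no_misread))

# Reversing the wrong decode recovers the original text, as long as
# nothing stripped or altered the characters in between.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == pound)
```

The middle byte of the hiragana sequence lands on a control character in ISO-8859-1, which is why only two visible characters tend to show up in the mangled form.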
        
        Re: (Score:2)
        
        by epine ( 68316 ) writes:
        
        But I suspect that your browser set the character encoding as ISO-8859-1 in its headers.
        Drawing an inference from the not-fact that the top of the batting order in every Wikipedia FAQ does not include how to set your user agent to send the right encoding header, I'd suggest that Slashdot's long-disabled Unicode support fell far short of the mark in the first place. (2005 just called. It wants to dissolve its de facto clue-stick monopoly.)
        I authored a CJK word processor that ran under MS-DOS in the 1980s a
      - Re: (Score:2)
        
        by ledow ( 319597 ) writes:
        
        Then neither are basically all of the accented characters:
        ÃÃ©ÃÃÃ
        ÃÃÃÃ"Ãs
        Quarter, half, most of the currency symbols, etc.
        Extended ASCII is pretty bog-standard. But my point really? I press the pound-sign (or the other characters) on my keyboard, and Slashdot can't render them. Facebook can. The Register can. Every forum in the world can. But not Slashdot.
        
        Re: (Score:2)
        
        by butlerm ( 3112 ) writes:
        
        ASCII is the American Standard Code for Information Interchange, a 7-bit encoding system. The most common strictly 8-bit encoding is ISO-8859-1, slightly expanded by Microsoft as Windows-1252, also known as Win-ASCII.
        Of course these days, everyone in their right mind should generally be using UTF-8 for transfer and storage. UCS-2 and UTF-16, though widely used internally, are basically a mistake for that kind of thing.
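To make the differences between those encodings concrete, here is a small Python sketch that tries to encode a few characters in each one. The helper name `try_encode` and the choice of sample characters are assumptions for illustration; the codec names are the ones Python's standard library uses.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Sample characters: plain ASCII, pound sign, euro sign, hiragana "no".
samples = {"A": "A", "pound": "\u00a3", "euro": "\u20ac", "no": "\u306e"}
encodings = ["ascii", "iso-8859-1", "cp1252", "utf-8", "utf-16-le"]

for name, ch in samples.items():
    for enc in encodings:
        # None means the character does not exist in that encoding.
        print(name, enc, try_encode(ch, enc))
```

The pound sign fails only in ASCII, the euro sign exists in Windows-1252 but not in ISO-8859-1, and the hiragana character needs one of the Unicode encodings.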
- Re: Or speak English, it's 7bit clean (Score:2)
  
  by John Allsup ( 987 ) writes:
  
  As I pointed out elsewhere here, Chinese can be written with a Latin alphabet and a few accents. Likewise languages such as Sanskrit. Just as there is a difference between English handwriting and what can be represented in ASCII, we face a related issue with ideograph-based writing systems. We would be better off writing Chinese webpages in pinyin, and developing a separate system for calligraphy and ideographs.
  - Re: (Score:2, Interesting)
    
    by Anonymous Coward writes:
    
    As I pointed out elsewhere here, Chinese can be written with a Latin alphabet and a few accents. Likewise languages such as Sanskrit. Just as there is a difference between English handwriting and what can be represented in ASCII, we face a related issue with ideograph-based writing systems. We would be better off writing Chinese webpages in pinyin, and developing a separate system for calligraphy and ideographs.
    Except that there are so many homonyms in pinyin that a strong sense of the context is needed to read it. The logograms are much harder to write but reading is quite a bit easier, which is why they are still in use. That's not the same as English handwriting vs printing, where the differences are only in rendering and there is a 1:1 correspondence between a handwritten and a printed character.
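The homonym problem can be sketched in Python by grouping a handful of characters by their Mandarin reading, with and without tone marks. The character list is a tiny hand-picked sample rather than a dictionary, and `strip_tones` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict
import unicodedata

# (character, Mandarin pinyin with tone mark): a small hand-picked sample.
READINGS = [
    ("十", "shí"), ("石", "shí"), ("食", "shí"),
    ("是", "shì"), ("事", "shì"), ("市", "shì"),
    ("史", "shǐ"),
    ("詩", "shī"), ("獅", "shī"),
]

def strip_tones(pinyin):
    """Drop tone marks by decomposing and removing combining characters."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

by_toned = defaultdict(list)      # reading with tone mark -> characters
by_toneless = defaultdict(list)   # reading without tone mark -> characters
for char, pinyin in READINGS:
    by_toned[unicodedata.normalize("NFC", pinyin)].append(char)
    by_toneless[strip_tones(pinyin)].append(char)

for reading, chars in by_toned.items():
    print(reading, "".join(chars))
for reading, chars in by_toneless.items():
    print(reading, "".join(chars))
```

Even with tone marks kept, several distinct characters share each reading in this sample; once the tones are dropped, all nine collapse onto the single spelling "shi".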
    - Re: (Score:2)
      
      by ciaran2014 ( 3815793 ) writes:
      
      Example: the story of the man who tried to eat ten lions:
      Shí shì sh shì sh shì, shì sh, shì shí shí sh. Shì shí shí shì shì shì sh. Shí shí, shì shí sh shì shì. Shì shí, shì Sh shì shì shì. Shì shì shì shí sh, shì shì shì, sh shì shí sh shì shì. Shì shí shì shí sh sh, shì shí shì. Shí s
  - Re: (Score:2)
    
    by amake ( 673443 ) writes:
    
    No one, absolutely no one who is actually proficient in any of these languages, would find your proposal acceptable. The only people who advocate such things are, deservedly, dismissed as cranks.
    So instead, how about we fix the problems with the current, largely acceptable system we have now?
- English? 7bit clean?? Bwahahah! (Score:2)
  
  by billstewart ( 78916 ) writes:
  
  Yes, I know you were trolling, but in your mythical 7-bit-clean English, even if you're not using English letters like ð or , or ligatures like æ , or distinguishing between short and long S's (you know, the s you used to think were f's), how do you put diaeresis marks over words like cooperate, or distinguish between m-dash and n-dash and hyphen, or get the left- and right-side quotation marks without using some Microsoft or Apple ``smart quote'' breakage, much less deal with accent marks in wo
- Re: (Score:2)
  
  by AmiMoJo ( 196126 ) writes:
  
  English is fine for factual information like air traffic control or shipping, but it would never work for Japanese society. There are too many important things you can't adequately express in English that are essential to Japanese people. Same with Chinese.
  - Re: (Score:2)
    
    by dunkelfalke ( 91624 ) writes:
    
    Actually, English isn't very good for factual information either. It has too many homonyms, a very inconsistent spelling, too ambiguous sentences even with the very strict word order English has to use, no single language authority and too many standard variations.
    Other Germanic languages are much more precise, as are Slavic languages. Due to the more complicated grammar and being synthetic instead of analytic, the meaning of a sentence is clear even if words in the sentence are shifted around, the spelling
- - Re: (Score:2)
    
    by Xtifr ( 1323 ) writes:
    
    Why do you think language gets overhauled in Orwell's 1984?
    Because Orwell was a little too enamored of the so-called "Sapir-Whorf hypothesis" [wikipedia.org]? I hate to break it to you, but, despite its many obvious parallels to the real world, 1984 was ultimately a work of fiction.
    While it's undeniable that language has some influence on culture and thought, the idea that it can be as influential as proposed by some early SF writers (e.g. Orwell, Jack Vance's The Languages of Pao, or Samuel Delany's Babel-17) is mostly discredited.
