New Unicode Bug Discovered For Common Japanese Character "No" 196
AmiMoJo writes: Some users have noticed a rendering problem with the Japanese character "no", which is extremely common in the Japanese language (forming parts of many words, or meaning something similar to the English word "of" on its own). The Unicode standard has apparently marked the character as sometimes being used in mathematical formulae, causing it to be rendered in a different font from the surrounding text in certain applications. Similar but more widespread issues have plagued Unicode for decades due to the decision to unify dissimilar characters in Chinese, Japanese and Korean.
No? (Score:2)
Re: (Score:2)
Correct. Unfortunately Slashdot does not allow me to enter Japanese text, hence the confusion.
This is what happens when I type that character in Japanese: ã®
Re: (Score:2)
Most Japanese characters in the two phonetic alphabets stand for a consonant tied to a vowel. The no phonetic in grammar indicates possession. In the phrase Katoh no boshi (Kato's cap) all of the other characters would be the Chinese-derived kanji.
What bug? (Score:5, Informative)
The character in question is Hiragana "No", codepoint U+306E. As far as I can tell, this has existed since Unicode 1.1 and there are no differences in the Unicode metadata when compared to any other Hiragana glyph. It is marked as IsAlphabetic=True, Category=Other Letter, and NumericType=None for example. So are all the other common Hiragana glyphs. If there is a bug, it's clearly with some specific application, and not Unicode or Unicode metadata. Compare http://www.fileformat.info/inf... [fileformat.info] with any other Hiragana glyph, like http://www.fileformat.info/inf... [fileformat.info] (Hiragana "Ha").
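The claim above is easy to check with Python's standard unicodedata module, which exposes the Unicode character database directly:

```python
import unicodedata

# U+306E HIRAGANA LETTER NO, the character in question, compared with
# U+306F HIRAGANA LETTER HA: their Unicode properties are identical.
for ch in ("\u306e", "\u306f"):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.category(ch),       # 'Lo' = Letter, other
          unicodedata.numeric(ch, None))  # None = no numeric value
```

Nothing in the database singles out U+306E as mathematical, which supports the "application bug" reading.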
Re: (Score:3, Interesting)
The bug is that the Japanese, Chinese, Korean and mathematical versions of this character all share a common code point. There is no reliable way for an application to select the right character and render it properly.
You can't mix CJK text and mathematics in Unicode, which is a new bug on top of the long-standing failure to support mixing Chinese, Japanese and Korean.
Re: (Score:2)
“-” looks very different in text and formulas, too. I don't get why people assume that you can get nice rendering without additional markup.
Re: (Score:2)
Except while that is called "Hyphen-Minus" and can be used for two things, Unicode does try to solve that problem by having:
00AD Soft Hyphen
2010 Hyphen
2011 Non-Breaking Hyphen
2012 Figure Dash
2013 En Dash
2014 Em Dash
2015 Horizontal Bar
2212 Minus Sign
2796 Heavy Minus Sign
There is no "Mathematical Hiragana No" glyph defined by Unicode, and as such, it should never be rendered in a different font just because somebody *might* use it in a formula. The application is wrong, and there is no bug in Unicode.
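The code points listed above can be confirmed against the Unicode character database; this short sketch prints the official name for each one (U+002D HYPHEN-MINUS added for comparison):

```python
import unicodedata

# The hyphen/dash family: one ambiguous legacy character (U+002D)
# plus the disambiguated code points Unicode added later.
for cp in (0x002D, 0x00AD, 0x2010, 0x2011, 0x2012,
           0x2013, 0x2014, 0x2015, 0x2212, 0x2796):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```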
Re: (Score:2)
I'm aware of the problems with the han unification and certain Kanji being displayed "wrong" because the Chinese equivalent is drawn significantly differently from the Japanese Kanji, but this doesn't seem to be anything close to that kind of problem. I'm also aware of the Unicode block U+1D400 "Mathematical Alphanumeric Symbols", which is what should be used for formulas. Any application that is rendering one particular character in the Hiragana block in a different font than the rest of the Hiragana block, i
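The Mathematical Alphanumeric Symbols block mentioned above really does give math its own code points, distinct from ordinary Latin letters; a quick check:

```python
import unicodedata

# Unicode separates mathematical letter styles from plain Latin letters.
# There is no equivalent "mathematical hiragana" range.
for ch in ("A", "\U0001D400", "\U0001D434"):
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)}")
```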
Re: (Score:2)
There are no Chinese or Korean versions of this Japan-specific character. This is the first time I've ever heard of a "mathematical use" of this character, and I suspect the vast majority of users would be surprised at this as well.
Re: (Score:2)
It's been imported to China: http://portal.nifty.com/koneta... [nifty.com]
Re: (Score:2)
How can you tell that any of those pictures are from China? To me they all look like they are from Japan, a country that makes extremely heavy use of Chinese characters (much more than Korea, for example).
Re: (Score:2)
All the text is in Chinese. The blog post itself is in Japanese, and it says that the pictures are of China.
Re: (Score:2)
What you probably mean is that an application can't select the right glyph based on the Unicode string. That is correct, but nothing specific to CJK. Without markup or metadata, Unicode often won't render as expected by readers even in Western languages. Unicode used to have its own system fo
Re: (Score:2)
Nitpick (Score:5, Informative)
This is not a "Unicode bug". It is a rendering bug exhibited by some applications.
Re: (Score:3)
How is an application supposed to know if a random character is Japanese, Chinese, Korean or mathematical? It would need some kind of strong AI to interpret and understand the text. It's a Unicode bug: merged characters are impossible to render correctly all the time because apps are forced to guess which font to use.
Re: (Score:2)
Ask the people who wrote the software that doesn't exhibit the bug. Obviously it can be done.
Re: (Score:3)
Software that doesn't have this bug only avoids it by not supporting mathematical symbols. So far there is no known software that avoids the CJK confusion problem either.
Most software doesn't even try. How many programmers are even aware of the issue? No Unicode library is immune. It's a problem with the standard that can only be fixed by starting fresh with about 150,000 new CJK characters, and then updating all fonts and libraries to handle translation and equivalence.
Re: (Score:2)
In other news a new bug is shown to exhibit a behaviour where some mathematical programs substitute a Japanese character into the formula.
The problem is it can't be done. Not without intelligent user / designer input (such as signifying that the unicode to be displayed is Japanese and not a maths formula). If an application is correct in determining one context it will be incorrect in determining the other.
Language markup (Score:2)
Please name said software.
Any HTML renderer ought to be able to tell an element with lang="zh-Hans" (Chinese using simplified characters) from one with lang="ja" (Japanese).
Re:Nitpick (Score:4, Informative)
How is an application supposed to know if a random character is Japanese, Chinese, Korean or mathematical? It would need some kind of strong AI to interpret and understand the text. It's a Unicode bug: merged characters are impossible to render correctly all the time because apps are forced to guess which font to use.
Except font encoding has never been part of the character encoding. You might want your English text in Arial, your French in Times New Roman and the formula in Courier, but Unicode doesn't encode that. You might argue that this is not a bug, that it's simply out of scope and should be solved by a higher-level encoding like <font="some japanese font">konnichiwa</font><font="some chinese font">ni hao</font> and not plaintext Unicode. That's what the Unicode consortium says [unicode.org], and if you express it as simply a style issue, it actually sounds plausible.
On the other hand, you might argue that there's no reasonable way to map a "unihan" character to a glyph except as a band-aid, since the CJK styles are distinctly different, and so any comprehensive font should have three variations; it shouldn't take three fonts to make a mixed CJK document look correct, just one. On this view the information belongs on the lowest level and should be passed along as you copy-paste CJK snippets or pass them around in whatever interface or protocol you have; otherwise everything will need a document structure and not just a string.
I don't think they should "unmerge" and duplicate all the han characters; that'd be silly. What they should do is add CJK indicators, say HANC, HANJ and HANK, like for bi-directional text, only simpler: no nesting, just one indicator applying until superseded by another. Like (HANJ) konnichiwa (HANC) ni hao, where the former will render as Japanese han and the latter as Chinese. If it doesn't have any indicator, well, take a guess. Am I missing something blindingly obvious, or would this trivially solve the problem?
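The proposal above can be sketched in a few lines. To be clear, HANC/HANJ/HANK do not exist in Unicode; private-use code points stand in for them here, and the font names are placeholders:

```python
# Toy model of the proposed "sticky" script indicators (hypothetical:
# these code points and font names are invented for illustration).
HANC, HANJ, HANK = "\ue000", "\ue001", "\ue002"

FONT_FOR = {HANC: "chinese-font", HANJ: "japanese-font", HANK: "korean-font"}

def render_runs(text, default_font="japanese-font"):
    """Group characters into (font, text) runs; each indicator applies
    until superseded by another, with no nesting."""
    runs, current, buf = [], default_font, []
    for ch in text:
        if ch in FONT_FOR:
            if buf:
                runs.append((current, "".join(buf)))
                buf = []
            current = FONT_FOR[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((current, "".join(buf)))
    return runs

print(render_runs(HANJ + "今日は" + HANC + "你好"))
# [('japanese-font', '今日は'), ('chinese-font', '你好')]
```

Text with no indicator simply falls back to the default font, matching the "take a guess" behaviour described.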
Re: (Score:3)
I agree, font encoding should not be part of the character encoding. Unicode even screws that up though, because there are things like text direction marks in it. Anyway, the problem is that often you have text without metadata. A file name, audio file metadata, a plain text database entry etc. You have to pick a font to render it, and the choice depends on the language because thanks to Unicode it's impossible to have a universal all-language font.
You could have meta characters as you suggest, but that isn
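The "pick a font for untagged text" problem described above usually comes down to a heuristic like this sketch (block ranges are from Unicode; the heuristic is illustrative, not exhaustive):

```python
def guess_script(text):
    """Crude guess for an untagged string: kana implies Japanese and
    hangul implies Korean, but han characters alone stay ambiguous,
    which is exactly the problem being discussed."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:    # hiragana + katakana
            return "ja"
        if 0xAC00 <= cp <= 0xD7A3:    # hangul syllables
            return "ko"
        if 0x4E00 <= cp <= 0x9FFF:    # unified han: zh? ja? ko?
            has_han = True
    return "han-ambiguous" if has_han else "unknown"

print(guess_script("日本語のファイル"))  # 'ja' -- the kana give it away
print(guess_script("東京"))              # 'han-ambiguous'
```

A kanji-only file name defeats the heuristic entirely, so the renderer must fall back to a locale default.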
Re: (Score:2)
You could have meta characters as you suggest, but that isn't what Unicode is supposed to be for. It's a character encoding scheme, not a metadata encoding scheme.
Actually I was thinking of it more like a "sticky" composite character: just as a + combining ring = å, you'd have unihan + HAN(C|J|K) = the "right" glyph, while:
a) Extending existing single-language CJK documents with just one character
b) Preserving backwards compatibility with all current CJK systems
c) Avoiding any complex CJK conversion functions
d) Creating a simple way to override with "show as C/J/K"
It would require adding a bit of intelligence to copy-paste for preservation, like:
(HANC)abcde -> c
Re: (Score:2)
That wouldn't really improve things IMHO, because you would still be reliant on the application knowing how to handle the character. In practice what would you do, add it to the start of file names? Then on all current software your filename would start with a little box representing an unknown character. The whole concept of composite characters is ridiculous as well, they should all get their own code points and let the font system handle saving some memory by re-using parts of glyphs. Otherwise your simp
Re: (Score:2)
That wouldn't really improve things IMHO, because you would still be reliant on the application knowing how to handle the character. In practice what would you do, add it to the start of file names? Then on all current software your filename would start with a little box representing an unknown character.
Yes, until the software got updated to treat it as a non-printing character, but it wouldn't make everything unreadable; there's bad and there's much, much worse.
The whole concept of composite characters is ridiculous as well, they should all get their own code points and let the font system handle saving some memory by re-using parts of glyphs. Otherwise your simple character count suddenly requires a massive look-up table of composite characters.
It already does for a huge number of reasons. Oh and if you thought giving every character a code point would mean a 1:1 mapping to glyphs that's still wrong, many characters map to alternate glyphs depending on the context. For example Arabic and Latin cursive characters substitute different glyphs to connect glyphs together depending on whether the
Re: (Score:2)
The problem is outside the problem domain Unicode attempts to solve, so it isn't strange that it doesn't solve it. For some of the other problems Unicode tries to solve, the result is a mess (example: bidirectional text), so that is probably a good thing.
Re: (Score:2)
First of all, the hiragana "no" is always Japanese, not Chinese, not Korean. The CJK unification is only about han characters (in Japanese, that's kanji).
As for maths, there are usually markers to indicate we are in an equation, which makes sense because Unicode is not powerful enough for this: fractions, integrals, matrices, etc. cannot be rendered with just code points. So in this case Unicode provides the characters (Roman and Greek letters, numbers, mathematical symbols, the hiragana "no", etc.) and
Re: (Score:2)
JUst a rendering problem? (Score:2)
The character in the Unicode table looks like a mashup of the hiragana (grammar-forming) version of the character, and the katakana (used as we do italics) form.
They're trying to unify *similar* characters (Score:5, Informative)
A lot of people complain about the idea of unification without understanding it. I can't judge if unicode's unification is great or awful. The English-speaking media constantly says it's awful, but it's usually clear the authors don't know what unification is, who's driving it, or how unicode's work compares to what existed beforehand, so they can only be ignored. (They're sometimes trying to spin up some clickbait about ignorant westerners imposing blah blah blah on Asia, which just shows they know nothing about the topic.)
The issue:
There's a certain number of symbols which have been copied from one East Asian language to another. They're the same symbol, so unicode has one slot for that symbol. Then there's a second category where the symbol has been copied, but one group draws it a little differently (the Japanese might like to put a little flick at the end of one line, or the Chinese might draw a line a little slantier). And a third category where one group has developed a simplified symbol, which means again the traditional and the simplified symbols are the same thing but drawn differently. The two symbols are equivalent; the new one is just a new suggestion for how to draw it.
Unification is about having one slot for the symbols in categories two and three and leaving it to the font to decide how to display it.
(Unicode uses more precise terms, but I'm calling them "symbols" and "slots" for simplicity.)
A disadvantage to this approach is that there can't be a font which would display a symbol both the way a Japanese would draw it and the way a Chinese would draw it. Fonts have to choose one style to draw each unified symbol.
An advantage of this approach is that new languages and dialects can be supported without needing another 100,000 slots per language or dialect (we do all know there are more than three East Asian languages, don't we?), and it's much easier for fonts to add support for all the East Asian languages, because once they've done Chinese, Japanese is automatically almost finished.
Here are some example symbols:
https://en.wikipedia.org/wiki/... [wikipedia.org]
unicode.org's FAQ also has clarifications:
If the character shapes are different in different parts of East Asia, why were the characters unified?
http://www.unicode.org/faq/han... [unicode.org]
Isn't it true that some Japanese can't write their own names in Unicode?
http://www.unicode.org/faq/han... [unicode.org]
(All that said, it's been years since I looked into this so there's a chance I've gotten some detail wrong, but I'm confident it's a good summary of the issue.)
Re: (Score:2)
An advantage of this approach is that new languages and dialects can be supported without needing another 100,000 slots per language or dialect (we do all know there are more than three East Asian languages, don't we?), and it's much easier for fonts to add support for all the East Asian languages, because once they've done Chinese, Japanese is automatically almost finished.
The first one isn't really an advantage, since there is no shortage of code points. There are massive disadvantages though.
From a software point of view it would be good to have universal fonts that can render any Unicode character correctly for anyone in the world. The Unicode consortium has tried to support this by splitting some of the more distinct symbols into separate code points for each language, but it's far from complete and every new version adds many more. The FAQ is a joke - when people point o
Re: (Score:2)
Thanks for this reply!
Can you give me an example of a Japanese name that can't be written in unicode? I keep hearing English speakers mention this problem but I've never seen exactly what the problem is.
Re: (Score:2)
> it would be good to have universal fonts that can render
> any Unicode character correctly for anyone in the world
But a line has to be drawn between substance and style. There are two (main) ways to draw the number 4. One has a slanty line and is closed at the top, the other is made of straight lines and is open at the top. Or the number 7. For English speakers it's two lines, but for French speakers there's also a horizontal bar across the middle. Should unicode have two 4's and two 7's, or should
Re: (Score:2)
> you are summarizing A issue, not THE issue the author was making up.
Yes, my post only relates to the last line of the summary.
In other news (Score:2)
Some users have noticed that the Japanese character "no", which is extremely common in the Japanese language
Can anyone illustrate? (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
I can give an example, if you don't mind me turning to Greek. Imagine some program renders mathematical symbols differently from text. Imagine that someone writes out, using Unicode, the formula for the area of a circle. No problem, right? The pi is clearly a math symbol. But imagine the same thing if you were reading Greek. And beyond that, imagine if all the Greek you read rendered pi as though it were being used in a mathematical sense.
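The Greek analogy is apt because, for Greek, Unicode actually does provide separate mathematical code points (in the Mathematical Alphanumeric Symbols block), which is precisely what is missing for hiragana "no":

```python
import unicodedata

# Pi in ordinary Greek prose vs. pi as a math symbol: two code points,
# so a renderer never has to guess from context.
text_pi = "\u03c0"       # pi in running Greek text
math_pi = "\U0001d70b"   # pi in a formula
print(unicodedata.name(text_pi))   # GREEK SMALL LETTER PI
print(unicodedata.name(math_pi))   # MATHEMATICAL ITALIC SMALL PI
```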
Re: Can anyone illustrate? (Score:2)
Re: (Score:2)
It's rendered in a way that a Japanese person could read it, but looks ugly because software can't tell if it is Japanese, Chinese or mathematical. It's rather jarring in the middle of sentence and makes the output unsuitable for publishing without manual editing.
This is due to Unicode assigning the same code to the Japanese, Chinese and mathematical versions. It would be like they tried to merge the Latin "o" and Cyrillic "o". Imagine if every "o" character you wrote was rendered in a different font to all
Re: (Score:2)
Timothy can't write in English. (Score:2)
Some users have noticed that the Japanese character "no", which is extremely common in the Japanese language (forming parts of many words, or meaning something similar to the English word "of" on its own).
That isn't even a sentence in English. It is extremely grating to read crap like this, and it does not convey much about the story.
Re:Is it the same as in Chinese? (Score:5, Funny)
As you have just discovered, Slashdot cleverly avoids all Unicode bugs by not supporting Unicode at all.
Re: (Score:2)
meanwhile the folks at soylent implemented it ages ago.
With all the effort wasted on 'beta', I wonder how much of the open source slashcode remains.
Re: (Score:2)
Actually, slashcode does support Unicode, all that needs to be done for /. to get Unicode is reconfiguring the database (and converting old comments, I guess).
Re: (Score:2)
Actually, slashcode does support Unicode, all that needs to be done for /. to get Unicode is reconfiguring the database (and converting old comments, I guess).
No, it already works. It was active for a while some 10 years ago, but was removed because it was hard to sanitize. You could easily write your own comment score by reversing the text direction at the right time.
Still they could reactivate it if they just found a reasonable way of sanitizing features they don't want.
Re: (Score:2)
No, it already works. It was active for a while some 10 years ago, but was removed because it was hard to sanitize. You could easily write your own comment score by reversing the text direction at the right time.
Still they could reactivate it if they just found a reasonable way of sanitizing features they don't want.
Dude, all other websites support Unicode. Sanitizing it properly cannot be rocket science.
Re: (Score:2)
Re: (Score:2)
Considering that Windows NT was around *before* UTF-8, it would have been rather difficult to implement it. What you really meant to say was, unfortunately, standards committees are often too slow to implement things like UTF-8 in a timely manner.
Re: (Score:2)
Re: (Score:3)
It's a Unicode bug. Unicode tries to merge different characters into a single code point, because long ago they had the same origin. This particular character exists in Japanese, Chinese, Korean and mathematics, so can be rendered four different ways, but they all share one code point.
Applications have to guess what font to use. Being a mathematical program, this one defaults to the system language (Japanese) but has logic to detect this "no" character and render it in a different font. It isn't clever enou
Re: (Score:2)
It's a Unicode bug. Unicode tries to merge different characters into a single code point, because long ago they had the same origin. This particular character exists in Japanese, Chinese, Korean and mathematics, so can be rendered four different ways, but they all share one code point.
Applications have to guess what font to use. Being a mathematical program, this one defaults to the system language (Japanese) but has logic to detect this "no" character and render it in a different font. It isn't clever enough to notice that the rest of the sentence is Japanese, but it shouldn't have to be.
The funny thing is that the same has never been done with Latin letters and symbols, because that would be a mess. I really don't understand why they couldn't see it would be the same in Asian languages.
Re: (Score:2)
Because the Europeans overrode the Asians when creating the unicode "standard". They wanted to save code space, despite not being short on it (maybe some idiots thought it could be done in 16 bits, but no one on the committee was that naive).
In English, why are 1 and l not the same code point, despite having the same look in so many fonts? Many typewriters did not even have separate 1 and 0 keys (tell that to kids these days and they won't believe you). It sounds idiotic to us to give them the same ASCI
Typewriter character sets without 1 and 0 (Score:2)
I'm pretty sure my mom's manual typewriter when I was a kid didn't have 1, less sure about whether it had 0. But it did have the proper French and Spanish accent marks (left, right, circumflex, N~, cedilla, most of which my PC keyboard doesn't have), and you composed them with letters by using the backspace.
And yes, she could do two-column left-and-right-justified newsletters on it - she'd type a draft, count the letters, type the final. But she happily switched to using a Macintosh to type them, and let
Re: (Score:2)
Re: (Score:2)
Re: Why not just use English, and only English? (Score:2, Insightful)
There are more native Chinese speakers than English speakers. How about you learn Chinese and shut the fuck up?
Re: Why not just use English, and only English? (Score:4, Insightful)
Re: (Score:3, Insightful)
we would have been better off
No, you might have been better off. Chinese speakers would not. They would like to use their written language, as it exists today, on computers just like everyone else.
MOD PARENT UP, PLEASE! (Score:2)
Even using vector fonts doesn't fix the problem that Unicode wasn't a great solution for managing the diversity of characters in many Asian languages.
Mandarin dependency and homophone confusion (Score:5, Interesting)
Just write chinese in pinyin and speak it normally. (the number of Chinese speakers does not matter, the issue is with how it is written down.)
"Chinese" is not a single spoken language. A passage written in one Chinese language, such as Mandarin, is often readable in another Chinese language, such as Cantonese, so long as they're written with Han characters. It's as if French could be read as Italian or Spanish with the same characters. In addition, different words that sound the same in a given Chinese language due to historic sound changes usually have different Han characters. They may end up sounding different in a different Chinese language whose different historic sound changes produced different homophone sets. Pinyin, on the other hand, depends on Mandarin and confuses homophones.
Re: (Score:2)
Re:Mandarin dependency and homophone confusion (Score:5, Informative)
Re: (Score:2)
Another interesting little tidbit I stumbled upon while learning Japanese: "Peking" is written with the characters North-Capital, "Nanking" with the characters South-Capital, and "Tokyo" with the characters East-Capital.
Re: Mandarin dependency and homophone confusion (Score:2)
Re: (Score:2)
Different characters can have the same phonetic representation in Japanese, which is one of the tricky parts of the language. English has homonyms too, though they're usually easier to differentiate based on context. Kanji puns from this are definitely a big deal in Japanese humor, as you might expect.
Also, fun fact: before the Meiji era, when it was renamed Tokyo and became the capital, the city was called Edo.
Kanbun: Reordering Chinese to Japanese (Score:3)
Japanese and Chinese syntax differ too much for parallels as close as those of Mandarin and Cantonese. Japanese puts the verb at the end (SOV) and marks noun case with postpositions (wa, ga, o, e). Chinese, on the other hand, puts the verb in the middle (SVO), more like English. (Other orders are possible: Welsh and Arabic put the verb at the beginning, or VSO, and Kashmiri and Dutch split the verb into a part that's second and a part at the end, or V2.)
Chinese also uses serial verb construction [wikipedia.org], where verb
Re: Kanbun: Reordering Chinese to Japanese (Score:3)
That sentence doesn't require multiple verb clauses in Japanese. You can use destination, origin and means particles "ni", "kara" and "de": Watashi wa Shanghai kara Beijing ni hikouki de ikimasu. Since it's a single verb clause you can reorder it however you want for emphasis as long as the verb comes last - the way I have it there emphasises the subject. If you want to emphasise means of travel and use implicit speaker-as-subject, you can say: Beijing ni Shanghai kara hikouki de ikimasu. It's all easy as l
Re: Kanbun: Reordering Chinese to Japanese (Score:2)
Gah, posting at 4:20AM is a bad idea. I emphasised destination in the second example. To emphasise means of transport: Hikouki de Beijing ni Shanghai kara ikimasu. Just put the aspect you want to emphasise (and its associated particle) first. The only part that absolutely must be in a certain place in the sentence is the verb, which comes last.
Re: (Score:2)
And apparently Korean's even weirder. (I'm going by my childhood memories of my mom describing her job translating Korean during the early 50s. Unfortunately, I don't think she still has her books on basic Chinese characters these days, though I could just as easily find them in a bookstore around here.)
Some parts of Silicon Valley have a lot of Korean restaurants. I don't think I've seen any Chinese characters on their signs or menus, just alphabetic Korean.
Re: (Score:2)
If anything, the biggest tro
Re: (Score:2)
But if the text is written using Han glyphs, can Cantonese, Mandarin, Hunanese, Gan, Taiwanese, etc., and Japanese speakers sort of understand each other's written material, or is that just nonsense?
I went to China once with a professor of ancient Korean. He couldn't speak any Chinese, but he learned enough Chinese characters from studying Korean that he could write well enough to communicate with a taxi driver. They had to write to each other, they couldn't speak.
Essentially, there was an old style of Chinese that everyone wrote in (but probably no one ever spoke, including Chinese). Over time, Japan, Korea, Hong Kong and eventually all of China modified the writing system to match the speaking sys
Re: (Score:2)
It would have been absolutely fine if they had just stuck to one codepoint per character and not tried to merge them.
Re: (Score:2)
"symbols" occupy less space
Not if you have to make the font bigger to keep the strokes from touching each other. By that point, you could have used a smaller font on the Latin.
Re: (Score:3)
Re: (Score:3)
Re: (Score:2)
ICAO general rules and regulations
4.4.1c - ICAO languages are English, Spanish, French, Arabic, Russian, and Chinese.
Re: (Score:2)
Re: (Score:2, Informative)
I think Chinese is the only language we need, it's already the most spoken language in the world.
Only in head count, not by region. If the world was populated only by the Chinese, which seems to be their goal, then yes, Chinese is the most spoken language in the world. However, if you break that fact down by dialect, your statement is really weak. Mao's goal to have the entire PRC speak Mandarin really failed.
It's a democratic language that will draw from other languages where necessary and useful.
Not really. Mao tried to force all Chinese to speak Mandarin, and he failed miserably. Kinda the exact opposite of "Democratic". But of course that's not the fault of the language per se...
It's a language that has proven it can adapt to changing circumstances.
Chinese
Re: (Score:2)
Not even by head count. 1.5 billion people can speak English, compared to the 1.0 billion who can speak "Chinese".
Bad English is the world's most common language (Score:2)
I was once at a conference in Germany, most of which was given in English because it was an international crowd. One of the German speakers started off by saying that he used to start by apologizing for his bad English, but the host (who was Turkish) told him not to worry; Bad English is the most widely spoken language in the world. (Which is fine; English is flexible enough about most things that if you don't need to be subtle, Bad English will usually do.)
German's the only non-English language that I'm
Re: (Score:2)
Re: (Score:2)
Chinese may be, but if Japanese is an example, and Japanese is adapted from Chinese by Han explorers to Japan in the Iron Age, it's not very adaptable at all. The Japanese have developed THREE different writing systems to cope with some shortcomings of the language (only two tenses, underdeveloped pronoun system, etc). That may be a shortcoming of Japanese, but Japanese is just a symptom of a language root that isn't very forgiving. I will say however that a language that can be nuanced such that 9 different meanings come from changing the tone of one word may be more flexible than I give it credit for,
That's not right. The exact origins of the Japanese language are lost to pre-history, only guessed at. It was the writing system that was brought over from China. Then katakana and hiragana were added to support the parts of the Japanese language that can't be written adequately in the Chinese system. They were simply added to support the way the language was already spoken, not to make up for any limitations.
Re: (Score:2)
I think Chinese is the only language we need, it's already the most spoken language in the world.
That is false. English is the most spoken language in the world. Chinese is the most popular primary language.
Re: (Score:2)
Re: (Score:2)
Quoi? [youtube.com]
Re: (Score:2)
Re: (Score:2)
Fuck, it doesn't even support ASCII, let alone Unicode.
Try doing an English pound sign:
£
Nope.
Re: (Score:2)
HTML entity pound sign (£): £
Literal pound sign, as on my keyboard: £
It's OK for me in preview mode.
Maybe it's your browser's encoding that's broken ? I have it set as UTF-8. Your rendering (£ (*)) seems to indicate you sent the byte sequence for UTF-8. But I suspect that your browser set the character encoding as ISO-8859-1 in its headers.
While I'm at it: "" <- This was supposed to be the "no" hiragana. Disallowed characters are stripped, rather than being "converted" to moji
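The garbled renderings in this subthread (£ for £, ã® for the "no" hiragana) are classic mojibake: UTF-8 bytes mis-decoded as ISO-8859-1. This can be reproduced directly:

```python
# UTF-8 bytes decoded as Latin-1 produce exactly the garbage seen above.
pound = "\u00a3"                                # £
print(pound.encode("utf-8").decode("latin-1"))  # £

no = "\u306e"                                   # hiragana "no"
print(no.encode("utf-8").decode("latin-1"))     # ã® (plus an invisible 0x81)
```

So the poster's hunch is right: a UTF-8 byte sequence arrived somewhere that interpreted it as ISO-8859-1.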
Re: (Score:2)
Drawing an inference from the not-fact that the top of the batting order in every Wikipedia FAQ does not include how to set your user agent to send the right encoding header, I'd suggest that Slashdot's long-disabled Unicode support fell far short of the mark in the first place. (2005 just called. It wants to dissolve its de facto clue-stick monopoly.)
I authored a CJK word processor that ran under MS-DOS in the 1980s a
Re: (Score:2)
Then neither are basically all of the accented characters:
ÃéÃÃÃ
ÃÃÃÃ"Ãs
Quarter, half, most of the currency symbols, etc.
Extended ASCII is pretty bog-standard. But my point stands, really: I press the pound sign (or the other characters) on my keyboard, and Slashdot can't render them. Facebook can. The Register can. Every forum in the world can. But not Slashdot.
Re: (Score:2)
ASCII is the American Standard Code for Information Interchange, a 7-bit encoding. The most common strictly 8-bit encoding is ISO-8859-1, slightly expanded by Microsoft as Windows-1252 (often loosely called "ANSI").
Of course these days, everyone in their right mind should generally be using UTF-8 for transfer and storage. UCS-2 and UTF-16, though widely used internally, are basically a mistake for that kind of thing.
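The difference between the encodings named above is easy to see by encoding the same short string both ways:

```python
# UTF-8 keeps ASCII at one byte per character; UTF-16 spends at least
# two bytes on every code unit.
s = "no\u306e"                    # 'no' plus the hiragana "no"
print(s.encode("utf-8"))          # b'no\xe3\x81\xae' -- 5 bytes
print(s.encode("utf-16-le"))      # 6 bytes, 2 per character
```

For mostly-ASCII data like markup and source code, UTF-8 is both smaller and backwards compatible, which is why it won for transfer and storage.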
Re: Or speak English, it's 7bit clean (Score:2)
Re: (Score:2, Interesting)
As I pointed out elsewhere here, Chinese can be written with a Latin alphabet and a few accents. Likewise languages such as Sanskrit. Just as there is a difference between English handwriting and what can be represented in ASCII, we face a related issue with ideograph-based writing systems. We would be better off writing Chinese webpages in pinyin, and developing a separate system for calligraphy and ideographs.
Except that there are so many homonyms in pinyin that a strong sense of the context is needed to read it. The logograms are much harder to write but reading is quite a bit easier, which is why they are still in use. That's not the same as English handwriting vs printing, where the differences are only in rendering and there is a 1:1 correspondence between a handwritten and a printed character.
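To illustrate the homophone problem, here are a few common characters that all romanize as "shì" in pinyin (glosses are rough one-word translations):

```python
# Distinct characters with distinct meanings, all pronounced "shì" in
# Mandarin -- toneless pinyin collapses them all to plain "shi".
shi4 = {
    "\u662f": "to be",           # 是
    "\u4e8b": "matter, affair",  # 事
    "\u5e02": "market, city",    # 市
    "\u5f0f": "style, formula",  # 式
    "\u58eb": "scholar",         # 士
}
for ch, gloss in shi4.items():
    print(ch, "shì", gloss)
```

The logograms disambiguate at a glance what pinyin leaves to context, which is the reader's point.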
Re: (Score:2)
Example: the story of the man who tried to eat ten lions:
Shí shì shī shì Shī Shì, shì shī, shì shí shí shī. Shì shí shí shì shì shì shī. Shí shí, shì shí shī shì shì. Shì shí, shì Shī Shì shì shì. Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shì shì. Shì shí shì shí shī shī, shì shí shì. Shí s
Re: (Score:2)
No one, absolutely no one who is actually proficient in any of these languages, would find your proposal acceptable. The only people who advocate such things are, deservedly, dismissed as cranks.
So instead, how about we fix the problems with the current, largely acceptable system we have now?
English? 7bit clean?? Bwahahah! (Score:2)
Yes, I know you were trolling, but in your mythical 7-bit-clean English, even if you're not using English letters like ð or þ, or ligatures like æ, or distinguishing between short and long s's (you know, the s's you used to think were f's), how do you put diaeresis marks over words like cooperate, or distinguish between em-dash and en-dash and hyphen, or get the left- and right-side quotation marks without using some Microsoft or Apple ``smart quote'' breakage, much less deal with accent marks in wo
Re: (Score:2)
English is fine for factual information like air traffic control or shipping, but it would never work for Japanese society. There are too many important things you can't adequately express in English that are essential to Japanese people. Same with Chinese.
Re: (Score:2)
Actually, English isn't very good for factual information either. It has too many homonyms, very inconsistent spelling, too many ambiguous sentences even with the very strict word order English has to use, no single language authority, and too many standard variations.
Other Germanic languages are much more precise, as are Slavic languages. Due to the more complicated grammar and being synthetic instead of analytic, the meaning of a sentence is clear even if words in the sentence are shifted around, the spelling
Re: (Score:2)
Why do you think language gets overhauled in Orwell's 1984?
Because Orwell was a little too enamored of the so-called "Sapir-Whorf hypothesis" [wikipedia.org]? I hate to break it to you, but, despite its many obvious parallels to the real world, 1984 was ultimately a work of fiction.
While it's undeniable that language has some influence on culture and thought, the idea that it can be as influential as proposed by some early SF writers (e.g. Orwell, Jack Vance's The Languages of Pao, or Samuel Delany's Babel-17) is mostly discredited.