Google Releases An Open Source Font That Supports 800 Languages (googleblog.com) 175
An anonymous Slashdot reader quotes Hot Hardware:
It's been working on the project over the past five years in collaboration with Monotype in hopes of eradicating so-called "tofu" -- the blank boxes you see when a PC or website can't display a particular text -- from the web. Noto, or No more tofu, is Google's answer, and it's available now to download...
"We are thrilled to have played such an important role in what has become one of the most significant type projects of all time," said Scott Landers, president and CEO of Monotype... Monotype played the biggest role, though Google also collaborated with Adobe and had a network of volunteer reviewers. As far as Monotype is concerned, Noto is one of the most expansive typography projects ever undertaken.
There's 110,000 characters, and Google says the project "required design and technical testing in hundreds of languages."
Keeping up with the emojis (Score:2)
Re: (Score:1, Funny)
Isn't a lot of this due to all the new stuff that Unicode keeps adding? I still have a Bitstream Cyberbit font somewhere from... was it back in the late '90s? This is the same thing all over again, just up to date.
The whole rest of the world just needs to learn fuckin' English!
Signed,
Provincial Americans Everywhere (by "everywhere" I mean the USA -- clearly there is no "where" else to be! So what if Canadians and other foreigners understand our politics better than we do. That just shows our awesomeness!)
Re: Keeping up with the emojis (Score:2, Funny)
I just need the Klingon word for mocking condescension to belittle you with.
Re: (Score:2)
Re: (Score:2)
You misspelled p'takh!
Re: Keeping up with the emojis (Score:5, Funny)
toDSaH
Wow, Klingons have a word for everything. They're like Space Germans.
Re: (Score:2)
Why not tell the rest of the world (including you) to learn Chinese, or Spanish, or Bangla? That's easy, right?
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
I'd say that's more of a feature than a bug.
Re: (Score:2)
Je vois ce que vous avez fait là.
Re: (Score:2)
There is no real french equivalent of "I see what you did there". (a.k.a. : saying in one expression you a witty remark) ... but it does not feel natural and they do not require more than 7 bits per character.
Perhaps one could say "joli" (nice), or "bien dit" (well said) maybe with a smiley next to it
Most french-speaking kids would probably end up using the english expression without a clue of how to write it nor its exact meaning.
Re: (Score:2)
> it's pretty much the only language you can be sure to find somebody that speaks it no matter where you are
this is why i'm learning spanish.
Re:Keeping up with the emojis (Score:5, Informative)
Bitstream Cyberbit was closed source, and had a license incompatible with GPL. Noto is free and open source. The source files for the fonts, and the build tools, are all open.
Noto is an ongoing open source project that will continue to track the Unicode standard, while Cyberbit implemented Unicode 1.0.1 and then just stopped.
Noto has Sans and Serif variants in a range of weights and styles, unlike Cyberbit, which had only a single style and weight (serif).
So that's more than just "the same thing all over again".
hells teeth (Score:4, Interesting)
honestly
where are the mathematical fonts and symbols for science ?
STIX goes some way but why is this not in noto ?
why would you send a mathematical explanation into the stars when we can't express those notations on machines we use every day ?
thanks
John Jones
Google management is becoming more and more messy. (Score:2)
I notice that Noto Serif is a well-designed font. There is an italic and a bold, but no semi-bold. The Google Noto font download web page [google.com] is a mess. How is NotoSansMandaic-unhinted different from NotoSans? When I look at the font in Windows font preview, I see no difference.
I see many examples of Google management becoming more and more messy.
Re:Keeping up with the emojis (Score:4, Interesting)
Hate to say it but I consider the conversion of all emojis to tofu a feature, not a bug. The tofu neatly summarises the vacuousness of the original abomination... I mean, message.
Re: (Score:2)
So make your own branch of Noto called NoEmo, in which all emoji are rendered the same (possibly blank, possibly some generic 'this is an emoji' symbol.) It is open, so there is nothing to stop you.
Re: (Score:2, Insightful)
Re:Keeping up with the emojis (Score:4, Informative)
There are still multiple font files for different languages, because you can't have a unified "all language" font with Unicode. It's impossible to support Chinese, Japanese and Korean in the same font, for example.
Android's font rendering is excellent, has been for years. It also helps that many Android phones, even mid range ones from a few years back, have 1080p or better displays that start to rival print for DPI (400-500 PPI on the screen, 3x that horizontally with sub-pixel rendering, vs. 600 DPI for prints).
Google just want consistency everywhere and the ability to ship one font that covers all possible languages. You still need hacks because of the Unicode flaw mentioned above, but it's a big step nonetheless. AFAIK the only other open source font that tries to do this is GNU Unifont, but it's more functional than pretty.
Re: (Score:2)
Re: (Score:2)
Also... do we really want to load that large of a font when most people only use a fraction of the data?
The problem with this argument is that people only use a fraction of the data right up until the point where they don't, and then everything breaks. I don't speak a word of Japanese but I have Japanese fonts on my computer. Why? Because at some point something important was embedded in a PDF which had some Japanese in it and it refused to render. Up until that point I would have agreed with you, but really the ability to see things how they are supposed to be trumps having a broken view that could construe
Re: (Score:2)
Isn't a lot of this due to all the new stuff that Unicode keeps adding? I still have a Bitstream Cyberbit font somewhere from... was it back in the late '90s? This is the same thing all over again, just up to date.
Did Noto need to support emojis?
"Now available to download" link (Score:5, Informative)
https://www.google.com/get/not... [google.com] You're welcome
Came across this a few days ago when I borked my Slackware upgrade. Everything went fine except GUI login; X kept crashing because I deleted the fonts it was trying to use. One of the google search results was Noto.
All fonts = 472.6 MB.
Re: (Score:1)
Forgot to mention - this still doesn't solve the tofu problem since you need to have the font installed to not see tofu. In which case Google Web Fonts [google.com] is still the way to go. You just pick a font which supports your content/language. Or one of the Noto fonts.
Re:"Now available to download" link (Score:5, Informative)
1. On the emoji fonts [google.com] there's "Raised Hand With Part Between Middle And Ring Fingers" - WhyTF is that not called "live long and prosper"? Some emoji are described by how they look while others are described by what they mean. A bit inconsistent but I guess that's more of a Unicode consortium issue [unicode.org].
2. Some of the hand emoji like "White Left Pointing Backhand Index" are all called "white..." even though they've clearly done the race/skin tone colour spectrum ala whatsapp [indiatimes.com].
2b. The colours are a second Unicode code point (emoji modifier sequence [unicode.org]) appended to the emoji, ranging from U+1F3FB (white/pale) to U+1F3FF (black/dark). (Btw, that's counter intuitive to programmers since RGB colour codes [google.com] have "#00" being dark and "#FF" being light.) P.S. I haven't decided if the skin colour aspect of emoji is racist or not. There may be some people who found the default yellow emoji racist.
Answer to #2 [unicode.org]:
Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color.
and
General-purpose emoji for people and body parts should also not be given overly specific images: the general recommendation is to be as neutral as possible regarding race, ethnicity, and gender. Thus for the character U+1F477 CONSTRUCTION WORKER, the recommendation is to use a neutral graphic like (with an orange skin tone) instead of an overly specific image like (with a light skin tone). This includes the emoji modifier base characters listed in Sample Emoji Modifier Bases. The emoji modifiers allow for variations in skin tone to be expressed.
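The emoji modifier sequences from point 2b can be sketched in a few lines of Python. This is only an illustration: the helper name `with_skin_tone` is made up for the example, and whether the result renders as a single tinted glyph depends entirely on the font and renderer in use.

```python
import unicodedata

# The five skin tone modifiers occupy U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

# U+1F596 is the "Raised Hand With Part Between Middle And Ring Fingers".
RAISED_HAND = chr(0x1F596)


def with_skin_tone(base: str, modifier: str) -> str:
    """Hypothetical helper: a modifier sequence is the base emoji
    immediately followed by one modifier code point."""
    return base + modifier


if __name__ == "__main__":
    for modifier in SKIN_TONE_MODIFIERS:
        sequence = with_skin_tone(RAISED_HAND, modifier)
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in sequence)
        print(code_points, unicodedata.name(modifier))
```

Note that the sequence is still two code points in the text, even when it is drawn as one glyph.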
Re:"Now available to download" link (Score:4, Insightful)
And helps Google track users one more way. Please be a good hacker and serve fonts from your own domain. Thank you.
Re: "Now available to download" link (Score:5, Insightful)
It's not always laziness (or tracking, from Google's perspective). Google sets a long cache value for most of these resources. If 10 different sites all host them individually, then someone visiting the site will have to download the fonts 10 times. Alternatively, if they all point to Google then they'll download once and cache the copy locally for the other 9 sites.
There was a proposal a couple of years ago to embed a cryptographic hash of the resource in the link. This would allow you to specify a download location, but if you've already downloaded the file from another source then you could still use it (it would also make caches more efficient, because you could set an infinite timeout and make clients redownload by having a different hash in the link - clients would keep their copy potentially forever, until you updated the version). I don't know of any browsers that implemented it though.
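The hash-in-the-link idea can be sketched as a cache keyed by content hash rather than by URL. This is a toy model under stated assumptions: `HashAddressedCache`, `integrity_token` and the `download` callback are invented for the example, and the `sha384-<base64>` token format is borrowed from Subresource Integrity, which uses such hashes to verify downloads rather than to share caches between sites.

```python
import base64
import hashlib


def integrity_token(content: bytes) -> str:
    """Return a 'sha384-<base64 digest>' token for the given bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


class HashAddressedCache:
    """Toy cache: entries are looked up by content hash, not by URL."""

    def __init__(self):
        self._store = {}

    def fetch(self, url, token, download):
        # Any earlier download with the same hash satisfies the request,
        # regardless of which site it originally came from.
        if token in self._store:
            return self._store[token]
        content = download(url)
        if integrity_token(content) != token:
            raise ValueError("downloaded content does not match the expected hash")
        self._store[token] = content
        return content
```

With this model, two sites that self-host the same font file would cost the visitor one download, and a new version of the file simply gets a new hash.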
Don't favor minor cache savings over tracking. (Score:2)
Storage is cheap and plentiful these days; the caching argument doesn't convince me and minor improvements strike me as possibly nice conveniences but nothing significant. I'd rather promote not centralizing the web and not encouraging doing work with known trackers including Google.
Re: (Score:2)
Storage is cheap and plentiful these days;
Tell that to my mobile phone contract struggling under the weight of yet another multi-megabyte website that does not need to be.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
By far the vast majority of that download size is taken up by the fonts for the 1000s of characters in Japanese, Korean, Simplified Chinese and Traditional Chinese.
All the other fonts only total about 10 MB or so.
Re:"Now available to download" link (Score:5, Informative)
Way back when Unicode decided to unify all the CJK glyphs they made several screwups in unifying characters that were not actually the same in each of the languages. Aside from the character looking wrong in Chinese or Japanese (whichever language you don't have installed as default) they may sort differently in different languages so collation is wrong too. More information (note that you'll need a full CJK font and a browser supporting language selection to see the differences). [wikipedia.org]
Noto's solution was to create a font with every possible glyph, then for systems which can't support identifying the correct glyph based on language, they made versions of the fonts where the default characters are the Japanese versions or the Chinese versions or so on, then for embedded stuff they made versions of the fonts with just one language's characters. Noto's explanation of their CJK fonts [google.com]. In other words, you only need one of the 110MB font files.
Re: (Score:2)
Aside from the character looking wrong in Chinese or Japanese (whichever language you don't have installed as default) they may sort differently in different languages so collation is wrong too.
Collation shouldn't be broken. Collation is always locale-specific. German, English, and French all have different collation orders, even though they're using the same character set (how you sort capitals vs lower case vs accented variants is different in each). The only reason that this would break collation would be if, for example, Japanese sorts Chinese characters differently from the equivalent Kanji (does it? I have no idea).
Re: (Score:2, Informative)
German and Swedish might be a better example.
They both have ö and ä, but German orders ö like o and ä like a, while Swedish puts them after z.
And those very much ARE the same characters.
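The German/Swedish difference can be sketched with two hand-written sort keys over the same strings. This is a deliberately simplified illustration, not a real collation implementation: the fold table and the alphabet string below are assumptions that only cover lowercase basics, and real software would normally lean on the platform's locale support or a library such as ICU.

```python
# Simplified German dictionary-style ordering: umlauts sort like their base letter.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

# Simplified Swedish ordering: å, ä and ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def german_key(word: str):
    lowered = word.lower()
    # The second element only breaks ties between e.g. "ol" and "öl".
    return (lowered.translate(GERMAN_FOLD), lowered)


def swedish_key(word: str):
    # Unknown characters sort after everything in the alphabet.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


if __name__ == "__main__":
    words = ["zebra", "öl", "ost", "äpple", "apa"]
    print(sorted(words, key=german_key))   # ['apa', 'äpple', 'öl', 'ost', 'zebra']
    print(sorted(words, key=swedish_key))  # ['apa', 'ost', 'zebra', 'äpple', 'öl']
```

The code points are identical in both runs; only the key function, standing in for the locale, changes the order.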
Re: (Score:2)
Real programmers avoid using MySQL.
The "Han Unification" hack does have its problems (often exaggerated, but still there), but I wouldn't say that that's the real problem: I think you're right about needing metadata for every string, and the real question in my mind is why isn't that part of Unicode itself? There used to be a way to embed locale hints in the text, but that was deprecated with Unicode 5. WTF? What exactly were they thinking?
There's another issue I don't get at all, w
Re: (Score:2)
Oh, but of course, we're expected to rewrite every application where every box a user could type an international name or address or text has a separate drop down to select a language. That's totally less exasperating.
No you're not. If it's a desktop application, you get the locale from the local user's settings. If it's a web application, you get it from the Accept-Language HTTP request header field. And then you just use that. Since POSIX 2008, even libc has contained thread-safe interfaces for locale-aware sorting. If you're using a database that doesn't support locale-aware collation, then I suggest that you find one that doesn't suck: PostgreSQL has had support for it for well over a decade and can use either l
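For the web case, reading the locale preference out of Accept-Language might look like the sketch below. The function name `parse_accept_language` is hypothetical, and the parser is intentionally minimal: it understands comma-separated language tags with an optional `q=` weight and ignores the other edge cases in the HTTP specification.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags ordered by preference (highest q first).

    Tags without an explicit q value default to 1.0; tags with q=0
    (meaning "not acceptable") are dropped.
    """
    preferences = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as unacceptable
        if quality > 0:
            # Position keeps the original order among equal weights.
            preferences.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(preferences)]


if __name__ == "__main__":
    print(parse_accept_language("sv, de;q=0.8, en;q=0.5"))
```

The first tag in the result would then be used to pick the collation locale, with some default when the header is missing.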
Re: (Score:2)
It's for situations where you allow user input, and don't want to limit them to entering text in a single language. Or if you want to display filenames, or the contents of e-mails, or whatever.
Re: (Score:2)
Imagine you were writing software for an airline that operates in East Asia. Naturally you have customers from Japan, China and Korea, and naturally they expect their names to be rendered correctly on your web site and on printed material like tickets and boarding passes. They expect to be able to book online. Note that HTML doesn't allow mixing Japanese and Chinese in the same page, the most you can do is Unicode and the browser is guaranteed to render some characters incorrectly for your international cus
Re: (Score:2)
Note that HTML doesn't allow mixing Japanese and Chinese in the same page
Note that the font is half a gigabyte and any web page that attempts to send it off to your browser, because a character might look slightly different otherwise, should be removed from the internet.
Re: (Score:2)
All modern operating systems come with Japanese and Chinese fonts. The issue is that each HTML page can only specify one character encoding. If it says "Unicode" it can also specify a language to give the computer a hint as to which font to use, but again only one.
If you look at pages like Chinese language lessons for Japanese readers they often use images or Flash to render the Chinese text correctly, because the browser can't do it. More recently it became possible to hack it with CSS and font stacks, but
Re:"Now available to download" link (Score:4, Interesting)
Yeah, but it's like "90% of people use 10% of features" - everyone uses a different 10%, so 100% of features are used. Similarly, everyone needs a different combination of languages, so if you're going to use one family of fonts, you want to have massive coverage.
Re: (Score:2)
The CJK Unified Ideographs block has 20950 assigned code points [wikipedia.org] most of which are significantly more complicated than Latin script. Add to that katakana, hiragana, hangul, radicals, and so on and there are a lot of characters, making the font significantly larger than fonts for Latin-1.
Re: (Score:2)
All fonts = 472.6 MB.
That's for all of them. Individual fonts are reasonably and typically sized. Bear in mind, having so many more glyphs for so many languages does require them to be bigger.
Noto Sans: 657 KB (4 styles, 581 languages)
Noto Serif: 838 KB (4 styles, 581 languages)
Noto Mono: 69.5 KB (1 style, 209 languages) # this should have had 581 langs
Re: (Score:2)
Re: (Score:2)
Fonts generally don't have support for color. It's just lines, fills, and ligature instructions. There are just a LOT of languages out there.
Re: (Score:2)
Re: (Score:2)
That doesn't negate anything I've said. In fact, what you linked to says that support is rare or even difficult to get working. And that's even on Linux, so that's likely some sort of non-standard extension.
But you can see on the download page [google.com] that Noto color emoji is only 2.8MB.
Re: (Score:2)
Is that a combination of pyrotechnic + phone?
I think Samsung have a patent on that.
This should have been put together by Unicode (Score:5, Insightful)
The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.
Why did it take a separate private company to do this?
Re: (Score:3)
The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.
Why did it take a separate private company to do this?
Probably because building a consortium to even define the characters is hard enough and expensive. Getting buy-in from everyone in the consortium to develop high quality glyphs for orphan languages would have reduced overall support. I agree they should have, but I don't think most companies are as generous as Google.
Re: (Score:2)
Unicode doesn't consider renderings, that's why. A lot of characters can be rendered in multiple ways, but there is only one code point for all of them and it's up to the font designer which one they want to use. It's actually a huge problem in Chinese, Japanese and Korean, as well as other languages.
It's time Unicode was deprecated and we moved on to something better. There is the TRON system that fixes or avoids most of the problems with Unicode, for example. Wouldn't be much of a change for applications
Re: (Score:2)
It's time Unicode was deprecated and we moved on to something better.
So we can be even further behind the curve here?
Re: (Score:3)
Then along came emojis and the entire clusterfuck that led to.
Re: (Score:2)
When you choose to allocate your time to an open source project, you are choosing to allocate your "private" capital to that project.
That's true but there are also people whose public time is allocated to an open source project.
No programmers' typeface (Score:5, Insightful)
They have a monospaced typeface, but it's not useable for programming - doesn't even have a significant distinction between zero and O, let alone any other programmer-friendly features.
Since I presume they're going to want people at Google to use Noto as standard, it seems sensible to me that they create a programmers' version.
Re: (Score:2, Insightful)
I don't see why distinguishing between the zero digit and the letter O is more important for programmers than for anyone else. Sure, programmers might make mistakes when writing code and want to fix them; but that's true for other people writing text that might contain digits and letters, too.
If anything, distinguishing between the characters is less important for programmers than other people because programmers will already notice the problem when their code won't compile. I think it is very probable not
Re:No programmers' typeface (Score:4, Insightful)
...because programmers will already notice the problem when their code won't compile.
Substitutions of the letter 'O' for the number zero in numeric literals, function names, variable names, and other similar constructs will usually generate syntax errors, yes. (This makes me want to create a library called "Input0utput", just for headaches.)
However, the compiler probably won't notice if you make the substitution within a string or character literal (if the user types "Outbound", but the software is expecting "0utbound", this might be a hard problem to debug). I've only done this once or twice, but it was infuriating. It's one of those few times when commenting out the line and retyping it verbatim will actually fix the problem.
The fact that the keys are adjacent on QWERTY keyboards doesn't help anything.
...but that's true for other people writing text that might contain digits and letters, too.
I misunderstood this at first. I was picturing something like, "Mr. Orville's appointment is at 1O:OO.", where the substitution is harmless, so I didn't understand. In something like a model number, "MSO001" might be the first (001) release of a Mixed Signal Oscilloscope (MSO). Writing it as "MSOOO1" definitely obfuscates the meaning behind the model number. Of course, "MSO-001" would probably be best, but it's preferable to match the label on the hardware itself. So yes, I see your point.
But no, I'm firmly of the belief that the average programmer has a greater need (than the average typist) for easily distinguishable characters.
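The string-literal case ("Outbound" vs "0utbound") is easier to debug when the code points are printed instead of the glyphs. A minimal sketch, with `describe` and `first_difference` as made-up helper names:

```python
import unicodedata


def describe(ch: str) -> str:
    """Render one character as 'U+XXXX NAME' so lookalikes become obvious."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def first_difference(expected: str, actual: str):
    """Return (index, expected char, actual char) for the first mismatch,
    or None when the strings are identical."""
    for index, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return index, describe(a), describe(b)
    if len(expected) != len(actual):
        return min(len(expected), len(actual)), "<length differs>", "<length differs>"
    return None


if __name__ == "__main__":
    print(first_difference("Outbound", "0utbound"))
    # (0, 'U+004F LATIN CAPITAL LETTER O', 'U+0030 DIGIT ZERO')
```

This does not help with choosing a font, but it sidesteps the font entirely when two strings that look equal refuse to compare equal.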
Re: (Score:2)
I see this mistake a lot with my girlfriend's handwritten text entries. She writes in Chinese and occasionally inserts Arabic numerals (0123456789). The zero is often interpreted either as a capital O or as a Chinese character that seems to have been adopted from Japanese that is just a perfect circle, used as a substitute for censored characters. It's similar to how newspapers write "sh*t" in English (maybe it's a British thing).
She knows my Chinese is crap so sometimes writes '9' in Chinese and then selec
Re: (Score:2)
I'm pretty sure AmiMoJo meant arabic numerals [wikipedia.org].
Re:No programmers' typeface (Score:4, Insightful)
Where I find the problem is in randomly generated passwords. I have a large spreadsheet of VPN passwords for users at work where I had to change the password column to an OCR font just to make sure I was giving out the correct code.
The original C64 had this issue which was worse on the SX64 with its 5" screen. I went as far as to design a custom font and burn it into the font EPROM.
Re: (Score:2)
Where I find the problem is in randomly generated passwords.
Yes. KeePassX's "exclude lookalike characters" option when generating is really useful. It doesn't drop that many bits, and for most situations I can just make the PW a bit longer if it's a concern.
Trying to type a "random" generated password with lookalikes is an exercise in futility.
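The "doesn't drop that many bits" point can be checked with a small sketch. The lookalike set below is an assumption chosen for illustration, not KeePassX's actual list, and the function names are invented for the example.

```python
import math
import secrets
import string

# Assumed lookalike set for illustration; a real tool may exclude other characters.
LOOKALIKES = set("0Oo1lI")

FULL_ALPHABET = string.ascii_letters + string.digits  # 62 characters
SAFE_ALPHABET = "".join(c for c in FULL_ALPHABET if c not in LOOKALIKES)  # 56 characters


def generate_password(length: int, alphabet: str = SAFE_ALPHABET) -> str:
    """Pick each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def entropy_bits(length: int, alphabet: str) -> float:
    """Entropy of a uniformly random password over the given alphabet."""
    return length * math.log2(len(alphabet))


if __name__ == "__main__":
    print(generate_password(16))
    print(f"16 chars, full alphabet: {entropy_bits(16, FULL_ALPHABET):.1f} bits")
    print(f"16 chars, safe alphabet: {entropy_bits(16, SAFE_ALPHABET):.1f} bits")
    print(f"17 chars, safe alphabet: {entropy_bits(17, SAFE_ALPHABET):.1f} bits")
```

Under these assumptions the cost is roughly 0.15 bits per character, so one extra character more than makes up for the excluded lookalikes.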
Re: (Score:2)
I once transcribed a program from a magazine into my first computer... as a hex dump.
The magazine chose a font where 0, 8, and B were practically identical. That's ~20% of the hexadecimal digit space that's confusing.
I guess I was a glutton for punishment, because I did get the program to run.
Re: (Score:2)
Saying oh for zero is common in (British) English.
Dialing code for London:
020 - Oh two oh.
Start of a telephone number:
700 - seven double oh.
International dialing code for the US:
00 1 - oh oh one. (Don't know why we don't say double oh but I've never heard it said that way.)
Bus number:
205 - two oh five
In normal spoken or written English you can usually determine whether it's a zero or a letter-o from the context and where you can't it rarely matters.
Re: (Score:2)
00 1 - oh oh one. (Don't know why we don't say double oh but I've never heard it said that way.)
You mean in the same way that nobody says "double oh seven?"
Re: (Score:2)
The international dialing code for Kazakhstan from the UK would be 00 7 (I've just looked it up). I've never heard anyone quote a Kazakhstan telephone number to call from the UK but I would expect them to say oh oh seven, not double oh seven. Apart from anything else, if you did try to tell someone a Kazakhstan telephone number and started double oh seven I'd expect them to not hear the rest of the number while they were laughing.
Re: (Score:2)
Since I presume they're going to want people at Google to use Noto as standard, it seems sensible to me that they create a programmers' version.
What kind of madness makes you presume Google wants all its employees to use this font?
Tell, I'm genuinely curious. Do you also believe they do all their programming on phones running Android? Or do you suppose they might be allowed to use, I don't know, laptops or normal desktops?
Re: (Score:2)
This font is intended as the fallback font. When the currently selected font doesn't have a glyph for the desired codepoint, your font engine will provide a substitute. It will start with similar styles (e.g. sans serif, monospace) and if that fails it will fall back to a generic font that has large coverage. That's the point of this font. If you're using it for most of the glyphs you're rendering, then you're doing it wrong.
If you want a good font for programming, Adobe released Source Code Pro [github.com] a coup
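The fallback behaviour described above can be sketched as a walk down a chain of fonts, taking the first one that covers each character. Everything here is hypothetical: the font names and coverage ranges are invented, and a real text engine reads coverage from each font file and also handles shaping, which this toy ignores.

```python
# Hypothetical coverage tables (code point ranges), invented for the example.
FONT_COVERAGE = {
    "CodingFont": range(0x20, 0x7F),        # basic ASCII only
    "SimilarSans": range(0x20, 0x250),      # Latin with extensions
    "BigFallback": range(0x20, 0x30000),    # very wide coverage
}


def pick_font(char: str, chain):
    """Return the first font in the chain that covers the character."""
    for name in chain:
        if ord(char) in FONT_COVERAGE[name]:
            return name
    return None  # nothing covers it: this is where tofu gets drawn


def plan_runs(text: str, chain):
    """Group consecutive characters that end up in the same font."""
    runs = []
    for ch in text:
        font = pick_font(ch, chain)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


if __name__ == "__main__":
    chain = ["CodingFont", "SimilarSans", "BigFallback"]
    for font, run in plan_runs("x = 'caf\u00e9 \u4e2d\u6587'", chain):
        print(font, repr(run))
```

In this model the wide-coverage font only ever supplies the characters the preferred fonts are missing, which is the role the comment assigns to Noto.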
Re: (Score:2)
Except Source Code Pro only contains English glyphs, so it's useless for e.g. debugging exotic-language XML files. I keep switching between Source Code Pro and Arial Unicode MS, which has pretty good language support.
Re: (Score:2)
Slashdot solved the tofu problem long ago (Score:1)
Horrible Mono Font (Score:2)
That lowercase 'm' is a horror show. Simply awful.
It's also no good as a coding font (lack of distinction between various problematic glyphs) but that's probably not its audience.
Re: (Score:2)
Yeah, it's a bit naff, and obviously not their main focus. Luckily, there are tons of awesome monospaced fonts out there, and coding rarely needs full Unicode coverage.
Re: (Score:2)
I don't think it's intended for use as a general-purpose font at all. Just for filling in gaps if the font you're reading in is missing a glyph for a particular codepoint. As an English reader/writer, it's unlikely you'll be seeing an 'm' substituted in.
Anywhere you would see a square box now for missing characters, this font would render in. Will be really useful for viewing Wikipedia (where I see this the most).
Repairing the Unicode Consortium Clusterfuck (Score:5, Interesting)
Then without consulting Asian language speakers they decided to combine all the Asian language characters - including those that were physically different. The result was like some elitist looking at the Greek and Roman alphabets and deciding 'a' is a lot like alpha, 'b' a lot like beta, so why not combine the two of them into a single alphabet, then tell you your name isn't Sam, it's "S". (Slashdot probably won't display this but you get the idea.) This affected East, Central and South-East Asian languages.
This created the absurd situation where some people couldn't even write their names or enter them into databases, prompting the famous "I Can Text You A Pile of Poo, But I Can't Write My Name" https://modelviewculture.com/p... [modelviewculture.com]
When it was pointed out, did the Unicode Consortium admit they fucked up and fix it? Nope. They dug in their heels and insisted each country produce their own font which would display each Unicode character differently to suit their own language. Given the original goals of Unicode this was an amazing backflip. https://en.wikipedia.org/wiki/... [wikipedia.org] https://books.google.com/books... [google.com] https://plus.google.com/+LizHa... [google.com] There are other problems too: The encoding the consortium expected makes Asian text use more space than the codepage standards it was supposed to replace. This was stupid since ASCII was already super efficient for English language, so what was the point?
If you only write English language software and ASCII is good enough you won't notice any of this, but if you have to write international software it's a nightmare. Yes, you might think adding Unicode support allows your app to run in any language, but it doesn't work like that because of this clusterfuck. You still have to provide different fonts for different countries, and you often have to provide support for old codepages (the various BIG5 variants) for fallback, which Unicode was supposed to replace. It also makes translation very hard.
But Unicode fixed it eventually? Nope. The Unicode consortium continued to ignore it to this very day and instead started churning out stupid emoji: a steaming pile of poo, a taco, and farcical 'equality' emoticons. https://www.theguardian.com/te... [theguardian.com] https://www.theguardian.com/ar... [theguardian.com]
I hope this new font gives us one font which can display all languages and fuck the Unicode Consortium
Re: (Score:2)
Mod Parent +1 Informative !
I've run into my own problems with Unicode's shortsightedness.
Two common glyphs are:
* mouse pointer (See fa-mouse-pointer [fontawesome.io])
* cardinal 4 direction arrows (such as used on Windows, Move) (See fa-arrows [fontawesome.io])
Yet they are nowhere to be found in Unicode.
You're definitely right - the Unicode Consortium is more interested in fluff crap like emoji than practical stuff.
If the Unicode Consortium didn't have their heads up their asses we wouldn't even need fonts like Font Awesome [fontawesome.io]
The funny thing
Re: (Score:2)
Arrows are definitely [wikipedia.org] present [wikipedia.org] in Unicode.
Re: (Score:2)
And the cardinal 4 direction arrows [wikimedia.org] are _where_ again in Unicode ??
We're not talking about general arrows, we are talking about a specific arrow. If you don't want to look like a fool, learn to read before replying, please.
Re: (Score:3)
I've been using the Noto font(s) for a while, they're installed by default in Linux Mint (probably Ubuntu and others, too), so I assume this is an incremental release, where they've finally achieved some semblance of full(ish) coverage.
While I have a couple of minor issues with the fonts' design (the lowercase 'm' and 0/O distinction in Noto Mono are atrocious), the font is quite nice on the whole. And while I will never personally use all of the myriads of different scripts included, I whole-heartedly appla
Re:Repairing the Unicode Consortium Clusterfuck (Score:5, Informative)
It's even worse than that. On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks. Since a lot of the fixed CJK characters are outside this plane, software that uses wchar_t usually doesn't support them. Some of this is baked into hardware, for example Unicode uses UTF16,
I'm seriously thinking about writing an open source library to support TRON encoding. The lack of a good alternative seems to be what is preventing Unicode from being deprecated in favour of something better.
Re: (Score:2)
On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks.
True UTF-16 supports non-BMP code points just fine, and is not a "hack". In fact, it's actually slightly easier to do so in UTF-16 than with UTF-8 (the only other common Unicode encoding).
The real problem is that there is no single concept in Unicode that maps to the "character" of the old, simple ASCII standard with which most programmers are familiar. Depending on the task at hand, the correct substitute under Unicode may be code units, code points, or graphemes. Ignorant and/or lazy programmers who make
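The difference between code units and code points can be made concrete with a character outside the BMP. The sketch below uses U+20BB7, a CJK ideograph from Extension B, and computes its UTF-16 surrogate pair by hand; the helper names are invented for the example. Counting graphemes properly needs a segmentation library, so the last lines only hint at it with a combining accent.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of the text's UTF-16 encoding."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point: int) -> tuple:
    """Compute the UTF-16 surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    ideograph = chr(0x20BB7)  # one code point outside the BMP
    print(len(ideograph))                                  # 1 code point
    print([hex(u) for u in utf16_code_units(ideograph)])   # ['0xd842', '0xdfb7']
    print(len(ideograph.encode("utf-8")))                  # 4 UTF-8 bytes

    accented = "e\u0301"  # 'e' plus COMBINING ACUTE ACCENT
    print(len(accented))  # 2 code points, but one user-perceived character
```

Code that indexes or truncates by 16-bit units can split the pair in half, which is the kind of bug the comment is pointing at.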
Re: (Score:2)
A lot of developers just throw in Unicode support and assume their software supports all languages. We need something better that actually does that, rather than Unicode.
Re: (Score:2)
I get the feeling that you don't understand how numerous, complex, arbitrary, diverse, ambiguous, etc. natural languages are. That phrase, "all languages", doesn't even have a knowable, well-defined meaning, either in theory or in practice.
It would certainly be possible to improve upon Unicode, if you're willing to sacrifice backwards compatibility. However, it will never achieve your stated goal of guaranteeing support for "all languages" just by "throwing in" a new text processing library.
Projects that re
Re: (Score:2)
That was my point. Developers who aren't experts in languages just assume that if they tick the Unicode box in the compiler options their software supports everything, but in reality that's far from the case.
Re: (Score:2)
Note that this new font doesn't fix the 'Han unification' problem. It just provides 3 versions of the font, one for C, one for J and one for K. This sidesteps the clusterfuck (and forces you to select a different font for each language), but does not fix it.
Re: (Score:2)
This is a substitution/fallback font - and shouldn't be used for design or UI except where the chosen font is missing a character. If your native language is Chinese, you won't be using this font to view any glyphs that are already included in your Chinese font.
That's great .. there's nothing more annoying (Score:2)
Re: (Score:2)
I'm sure a lot of East Asian people share your annoyance
Re: (Score:2)
Accept headers schmaccept schmeaders. (Score:2)
I can see why this is important to Google, since they seem to like showing me ads in the wrong language.
Re: (Score:2)
Feature, not a bug. Be quiet.
"Released"? When? (Score:2)
http://www.google.com/get/noto/updates [google.com]
Last entry: "September 29, 2015"
Yeah... so it's the same thing I downloaded and installed last year.
I'm so glad Slashdot is catching up...
This is dumb. (Score:2)
If I don't have the character set installed that the page is written in, then it's probably because I can't read that language anyway.
And I damn sure don't want to load every character set that exists on the web into my browser. It would run like balls.
If you go to a page that has characters that your browser doesn't understand and you need to get to the information, use Google translate.
tofu is faster (Score:2)
Don't show me glyphs that I am not trained to read. I'd really rather see square boxes in situations where foreign text was displayed with the wrong font. Wrong font being the font that I'm using.
Re: (Score:2)
Re: (Score:2)
why is there not a single "Noto serif" font that combines them all? Or how else is one supposed to configure the browser now to give access to all those symbols?
A single font for all of them, as has been mentioned above, is possible but would be over 400MB, which is a problem for some of us.
Browsers will search other available fonts for a code point that's not in the current font, so you can install a collection of subset fonts that includes all the characters you are likely to need.
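The fallback search described above can be sketched roughly like this in Python. This is a toy model, not how any particular browser implements it: the font names and coverage ranges in FALLBACK_CHAIN are hypothetical, and real fonts declare coverage through their character maps rather than a hand-written table.

```python
from typing import Optional

# Hypothetical installed fonts, in preference order, each with the
# code point ranges it is assumed to cover (illustrative values only).
FALLBACK_CHAIN = [
    ("ExampleLatin", [(0x0020, 0x024F)]),
    ("ExampleCyrillic", [(0x0400, 0x04FF)]),
    ("ExampleCJK", [(0x4E00, 0x9FFF)]),
]


def pick_font(char: str, chain=FALLBACK_CHAIN) -> Optional[str]:
    """Return the first font in the chain covering this character, else None."""
    cp = ord(char)
    for name, ranges in chain:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None  # nothing covers it: this is where "tofu" would be drawn


def assign_fonts(text: str) -> list[tuple[str, Optional[str]]]:
    """Pair every character of the text with the font chosen for it."""
    return [(ch, pick_font(ch)) for ch in text]


for ch, font in assign_fonts("A\u0416\u4e2d\U0001F600"):
    print(f"U+{ord(ch):04X} -> {font or 'tofu'}")
```

With a set of subset fonts installed, each run of text simply lands on whichever font in the chain covers it, so no single 400MB file is needed.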
Re: (Score:2)
That's not what Unicode is for.
If you want Serif or Sans Serif, those are entirely different typefaces.
If you want monospaced or not, again those are entirely different typefaces.
All Unicode does - especially when you combine it with TrueType semantics or want a font that works everywhere - is provide characters for everything you might need.
Re: (Score:2)
I for one welcome our sharks with lasers on their heads, eating hot grits, suitcase cracking overlords! From Soviet Russia, in the name of longcat.
Whatever happened to Natalie Portman...?