Google Open Source

Google Releases An Open Source Font That Supports 800 Languages (googleblog.com) 175

An anonymous Slashdot reader quotes Hot Hardware: It's been working on the project over the past five years in collaboration with Monotype in hopes of eradicating so-called "tofu" -- the blank boxes you see when a PC or website can't display a particular text -- from the web. Noto, or No more tofu, is Google's answer, and it's available now to download...

"We are thrilled to have played such an important role in what has become one of the most significant type projects of all time," said Scott Landers, president and CEO of Monotype... Monotype played the biggest role, though Google also collaborated with Adobe and had a network of volunteer reviewers. As far as Monotype is concerned, Noto is one of the most expansive typography projects ever undertaken.

There's 110,000 characters, and Google says the project "required design and technical testing in hundreds of languages."
This discussion has been archived. No new comments can be posted.

  • Isn't a lot of this due to all the new stuff that Unicode keeps adding? I still have a Bitstream Cyberbit font somewhere from... was it back in the late '90s? This is the same thing all over again, just up to date.
    • Re: (Score:1, Funny)

      by Anonymous Coward

      Isn't a lot of this due to all the new stuff that Unicode keeps adding? I still have a Bitstream Cyberbit font somewhere from... was it back in the late '90s? This is the same thing all over again, just up to date.

      The whole rest of the world just needs to learn fuckin' English!

      Signed,

      Provincial Americans Everywhere (by "everywhere" I mean the USA -- clearly there is no "where" else to be! So what if Canadians and other foreigners understand our politics better than we do. That just shows our awesomeness!)

    • by dmoen ( 88623 ) on Sunday October 09, 2016 @09:33PM (#53044673) Homepage

      Bitstream Cyberbit was closed source, and had a license incompatible with GPL. Noto is free and open source. The source files for the fonts, and the build tools, are all open.

      Noto is an ongoing open source project that will continue to track the Unicode standard, while Cyberbit implemented Unicode 1.0.1 and then just stopped.

      Noto has Sans and Serif variants in a range of weights and styles, unlike Cyberbit, which had only a single style and weight (serif).

      So that's more than just "the same thing all over again".

    • by Anonymous Coward on Sunday October 09, 2016 @10:14PM (#53044795)

      Hate to say it but I consider the conversion of all emojis to tofu a feature, not a bug. The tofu neatly summarises the vacuousness of the original abomination... I mean, message.

      • So make your own branch of Noto called NoEmo, in which all emoji are rendered the same (possibly blank, possibly some generic 'this is an emoji' symbol.) It is open, so there is nothing to stop you.

    • Re: (Score:2, Insightful)

      by DraconPern ( 521756 )
      I think it's more that this is all the glyphs in one font, whereas before, you had Chinese, Arabic etc. all in separate fonts. The other half of the problem Google had was that they didn't have good font rendering in Android, e.g. how you actually render the font. Microsoft, Apple, and Adobe had it figured out a long time ago and all that knowledge is part of the OS. So Google is basically just playing catch up and open sourcing the data part. Also... do we really want to load that large of
      • by AmiMoJo ( 196126 ) on Monday October 10, 2016 @05:58AM (#53046231) Homepage Journal

        There are still multiple font files for different languages, because you can't have a unified "all language" font with Unicode. It's impossible to support Chinese, Japanese and Korean in the same font, for example.

        Android's font rendering is excellent, has been for years. It also helps that many Android phones, even mid range ones from a few years back, have 1080p or better displays that start to rival print for DPI (400-500 PPI on the screen, 3x that horizontally with sub-pixel rendering, vs. 600 DPI for prints).

        Google just want consistency everywhere and the ability to ship one font that covers all possible languages. You still need hacks because of the Unicode flaw mentioned above, but it's a big step nonetheless. AFAIK the only other open source font that tries to do this is GNU Unifont, but it's more functional than pretty.

      • Also... do we really want to load that large of a font when most people only use a fraction of the data?

        The problem with this argument is that people only use a fraction of the data right up until the point where they don't, and then everything breaks. I don't speak a word of Japanese but I have Japanese fonts on my computer. Why? Because at some point something important was embedded in a PDF which had some Japanese in it and it refused to render. Up until that point I would have agreed with you, but really the ability to see things how they are supposed to be trumps having a broken view that could construe

    • Isn't a lot of this due to all the new stuff that Unicode keeps adding? I still have a Bitstream Cyberbit font somewhere from... was it back in the late '90s? This is the same thing all over again, just up to date.

      Did Noto need to support emojis?

  • by aneroid ( 856995 ) <gmail> on Sunday October 09, 2016 @08:59PM (#53044603) Homepage Journal

    https://www.google.com/get/not... [google.com] You're welcome

    Came across this a few days ago when I borked my Slackware upgrade. Everything went fine except GUI login; X kept crashing because I deleted the fonts it was trying to use. One of the google search results was Noto.

    All fonts = 472.6 MB.

    • by aneroid ( 856995 )

      Forgot to mention - this still doesn't solve the tofu problem since you need to have the font installed to not see tofu. In which case Google Web Fonts [google.com] is still the way to go. You just pick a font which supports your content/language. Or one of the Noto fonts.

      • by aneroid ( 856995 ) <gmail> on Sunday October 09, 2016 @09:42PM (#53044691) Homepage Journal

        1. In the emoji font [google.com] there's "Raised Hand With Part Between Middle And Ring Fingers" - WhyTF is that not called "live long and prosper"? Some emoji are described by how they look while others are described by what they mean. A bit inconsistent but I guess that's more of a Unicode consortium issue [unicode.org].

        2. Some of the hand emoji like "White Left Pointing Backhand Index" are all called "white..." even though they've clearly done the race/skin tone colour spectrum a la WhatsApp [indiatimes.com].

        2b. The colours are a second Unicode code point (emoji modifier sequence [unicode.org]) on the emoji, ranging from U+1F3FB (white/pale) to U+1F3FF (black/dark). (Btw, that's counterintuitive to programmers since RGB colour codes [google.com] have "#00" being dark and "#FF" being light.) P.S. I haven't decided if the skin colour aspect of emoji is racist or not. There may be some people who found the default yellow emoji racist.

        Answer to #2 [unicode.org]:
         

        Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color.

        and

        General-purpose emoji for people and body parts should also not be given overly specific images: the general recommendation is to be as neutral as possible regarding race, ethnicity, and gender. Thus for the character U+1F477 CONSTRUCTION WORKER, the recommendation is to use a neutral graphic like (with an orange skin tone) instead of an overly specific image like (with a light skin tone). This includes the emoji modifier base characters listed in Sample Emoji Modifier Bases. The emoji modifiers allow for variations in skin tone to be expressed.
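For anyone who wants to poke at the modifier mechanism described in 2b, here is a minimal Python sketch. It assumes only the code points mentioned above (U+1F477 as the base, U+1F3FB through U+1F3FF as the modifiers); the helper name `with_skin_tone` is made up for illustration and is not part of any library.

```python
import unicodedata

# Emoji modifier base from the quoted text: U+1F477 CONSTRUCTION WORKER
BASE = "\U0001F477"

# The five skin tone modifiers, U+1F3FB through U+1F3FF
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]


def with_skin_tone(base, modifier):
    """Build an emoji modifier sequence: the base character followed by one modifier."""
    return base + modifier


for m in MODIFIERS:
    seq = with_skin_tone(BASE, m)
    # Each sequence is two code points; a font with emoji support draws it as one glyph.
    print("U+%04X %s -> %d code points" % (ord(m), unicodedata.name(m), len(seq)))
```

Whether the pair shows up as a single tinted glyph or as two separate symbols depends entirely on the font and renderer, which is the same fallback problem the rest of this thread is about.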

      • by ptaff ( 165113 ) on Sunday October 09, 2016 @10:32PM (#53044845) Homepage

        Google Web Fonts is still the way to go.

        And helps Google track users one more way. Please be a good hacker and serve fonts from your own domain. Thank you.

      • In HTML5 you can serve fonts, so it's just a matter of including Noto on sites where tofu might be a problem.
    • Big. But with some luck, they will be integrated into Chrome, at least the main ones, regular / bold / italics. The size would go down 75+%.
  • The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.

    Why did it take a separate private company to do this?

    • The Unicode consortium should have published glyphs like these as part of the effort of defining the standard.

      Why did it take a separate private company to do this?

      Probably because building a consortium to even define the characters is hard enough and expensive. Getting buy-in from everyone in the consortium to develop high quality glyphs for orphan languages would have reduced overall support. I agree they should have, but I don't think most companies are as generous as Google.

    • by AmiMoJo ( 196126 )

      Unicode doesn't consider renderings, that's why. A lot of characters can be rendered in multiple ways, but there is only one code point for all of them and it's up to the font designer which one they want to use. It's actually a huge problem in Chinese, Japanese and Korean, as well as other languages.

      It's time Unicode was deprecated and we moved on to something better. There is the TRON system that fixes or avoids most of the problems with Unicode, for example. Wouldn't be much of a change for applications

      • It's time Unicode was deprecated and we moved on to something better.

        So we can be even further behind the curve here?

    • The entire point of unicode is that the glyphs are separate from the codepoints. The codepoints (defined by the unicode spec) convey semantics, not presentation. There are lots of different (valid) ways of representing each codepoint (if there weren't, then you wouldn't need fonts at all).

      Then along came emojis and the entire clusterfuck that led to.

  • by tdelaney ( 458893 ) on Sunday October 09, 2016 @10:51PM (#53044895)

    They have a monospaced typeface, but it's not useable for programming - doesn't even have a significant distinction between zero and O, let alone any other programmer-friendly features.

    Since I presume they're going to want people at Google to use Noto as standard, it seems sensible to me that they create a programmers' version.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      I don't see why distinguishing between the zero digit and the letter O is more important for programmers than for anyone else. Sure, programmers might make mistakes when writing code and want to fix them; but that's true for other people writing text that might contain digits and letters, too.

      If anything, distinguishing between the characters is less important for programmers than other people because programmers will already notice the problem when their code won't compile. I think it is very probable not

      • by Hypoon ( 1095383 ) on Monday October 10, 2016 @12:43AM (#53045291)

        ...because programmers will already notice the problem when their code won't compile.

        Substitutions of the letter 'O' for the number zero in numeric literals, function names, variable names, and other similar constructs will usually generate syntax errors, yes. (This makes me want to create a library called "Input0utput", just for headaches.)

        However, the compiler probably won't notice if you make the substitution within a string or character literal (if the user types "Outbound", but the software is expecting "0utbound", this might be a hard problem to debug). I've only done this once or twice, but it was infuriating. It's one of those few times when commenting out the line and retyping it verbatim will actually fix the problem.

        The fact that the keys are adjacent on QWERTY keyboards doesn't help anything.
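The string-literal case above is easy to reproduce. This is a small Python sketch, not anything from a real codebase: the two strings are the ones from the comment, and `first_difference` is a hypothetical helper that names the characters so the lookalike becomes visible regardless of the font.

```python
import unicodedata

expected = "0utbound"  # starts with DIGIT ZERO
typed = "Outbound"     # starts with LATIN CAPITAL LETTER O


def first_difference(a, b):
    """Return (index, name in a, name in b) for the first differing character, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, unicodedata.name(x), unicodedata.name(y)
    return None


# No syntax error here: both are valid string literals, they just never match.
print(expected == typed)
print(first_difference(expected, typed))
```

Printing character names (or `repr` of the code points) is a font-independent way to debug this kind of mismatch when the editor font cannot be changed.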

        ...but that's true for other people writing text that might contain digits and letters, too.

        I misunderstood this at first. I was picturing something like, "Mr. Orville's appointment is at 1O:OO.", where the substitution is harmless, so I didn't understand. In something like a model number, "MSO001" might be the first (001) release of a Mixed Signal Oscilloscope (MSO). Writing it as "MSOOO1" definitely obfuscates the meaning behind the model number. Of course, "MSO-001" would probably be best, but it's preferable to match the label on the hardware itself. So yes, I see your point.

        But no, I'm firmly of the belief that the average programmer has a greater need (than the average typist) for easily distinguishable characters.

        • by AmiMoJo ( 196126 )

          I see this mistake a lot with my girlfriend's handwritten text entries. She writes in Chinese and occasionally inserts Arabic numerals (0123456789). The zero is often interpreted either as a capital O or as a Chinese character that seems to have been adopted from Japanese that is just a perfect circle, used as a substitute for censored characters. It's similar to how newspapers write "sh*t" in English (maybe it's a British thing).

          She knows my Chinese is crap so sometimes writes '9' in Chinese and then selec

      • by Nethead ( 1563 ) <joe@nethead.com> on Monday October 10, 2016 @02:01AM (#53045517) Homepage Journal

        Where I find the problem is in randomly generated passwords. I have a large spreadsheet of VPN passwords for users at work, and I had to change the password column to an OCR font just to make sure I was giving out the correct code.

        The original C64 had this issue which was worse on the SX64 with its 5" screen. I went as far as to design a custom font and burn it into the font EPROM.

        • Where I find the problem is in randomly generated passwords.

          Yes. KeePassX's "exclude lookalike characters" option when generating is really useful. It doesn't drop that many bits, and for most situations I can just make the PW a bit longer if it's a concern.

          Trying to type a "random" generated password with lookalikes is an exercise in futility.
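To put a rough number on "doesn't drop that many bits", here is a Python sketch. The lookalike set `0O1lI` is an assumption chosen for illustration, not KeePassX's actual exclusion list, and `generate` and `entropy_bits` are hypothetical helpers.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters
LOOKALIKES = set("0O1lI")                        # assumed set, for illustration only
SAFE_ALPHABET = "".join(c for c in ALPHABET if c not in LOOKALIKES)  # 57 characters


def generate(length, alphabet=SAFE_ALPHABET):
    """Pick each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def entropy_bits(length, alphabet_size):
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


full = entropy_bits(16, len(ALPHABET))
safe = entropy_bits(16, len(SAFE_ALPHABET))
print("16 chars, full alphabet: %.1f bits" % full)
print("16 chars, no lookalikes: %.1f bits" % safe)
print("17 chars, no lookalikes: %.1f bits" % entropy_bits(17, len(SAFE_ALPHABET)))
print(generate(16))
```

Under these assumptions, dropping five characters costs a little under two bits on a 16-character password, and one extra character more than makes up for it.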

      • by Ken D ( 100098 )

        I once transcribed a program from a magazine into my first computer... as a hex dump.
        The magazine chose a font where 0, 8, and B were practically identical. That's ~20% of the hexadecimal digit space that's confusing.

        I guess I was a glutton for punishment, because I did get the program to run.

    • Since I presume they're going to want people at Google to use Noto as standard, it seems sensible to me that they create a programmers' version.

      What kind of madness makes you presume Google wants all its employees to use this font?

      Tell, I'm genuinely curious. Do you also believe they do all their programming on phones running Android? Or do you suppose they might be allowed to use, I don't know, laptops or normal desktops?

    • This font is intended as the fallback font. When the currently selected font doesn't have a glyph for the desired codepoint, your font engine will provide a substitute. It will start with similar styles (e.g. sans serif, monospace) and if that fails it will fall back to a generic font that has large coverage. That's the point of this font. If you're using it for most of the glyphs you're rendering, then you're doing it wrong.
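The fallback order described above can be sketched in a few lines of Python. Everything here is hypothetical: the font names and coverage ranges are invented stand-ins, and real font engines consult each font's character map and match on style as well.

```python
# Hypothetical coverage tables: font name -> range of supported code points.
FONT_COVERAGE = {
    "PreferredSans": range(0x20, 0x7F),     # Basic Latin only
    "SimilarSans": range(0x20, 0x250),      # Latin plus extensions
    "BroadFallback": range(0x20, 0x30000),  # stand-in for a wide-coverage font
}

# Selected font first, then a similar style, then the broad-coverage font last.
CHAIN = ["PreferredSans", "SimilarSans", "BroadFallback"]


def pick_font(ch, chain=CHAIN):
    """Return the first font in the chain that covers ch, or None if none does."""
    cp = ord(ch)
    for name in chain:
        if cp in FONT_COVERAGE[name]:
            return name
    return None  # nothing covers it: this is where tofu would be drawn


for ch in ["a", "\u00e9", "\u4e2d", "\U000F0000"]:
    print("U+%04X -> %s" % (ord(ch), pick_font(ch)))
```

The broad font sits at the end of the chain on purpose: it only gets used for the characters nothing else can draw.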

      If you want a good font for programming, Adobe released Source Code Pro [github.com] a coup

      • Except Source Code Pro only contains English glyphs, so it's useless for e.g. debugging exotic-language XML files. I keep switching between Source Code Pro and Arial Unicode MS, which has pretty good language support.

  • That lowercase 'm' is a horror show. Simply awful.

    It's also no good as a coding font (lack of distinction between various problematic glyphs) but that's probably not its audience.

    • Yeah, it's a bit naff, and obviously not their main focus. Luckily, there are tons of awesome monospaced fonts out there, and coding rarely needs full Unicode coverage.

    • I don't think it's intended for use as a general-purpose font at all. Just for filling in gaps if the font you're reading in is missing a glyph for a particular codepoint. As an English reader/writer, it's unlikely you'll be seeing an 'm' substituted in.

      Anywhere you would see a square box now for missing characters, this font would render in. Will be really useful for viewing Wikipedia (where I see this the most).

  • by Anonymous Coward on Monday October 10, 2016 @12:32AM (#53045251)
    Thank you Google! This is badly needed because the Unicode Consortium screwed up Asian language support badly. The problem started when a bunch of Silicon Valley WASPS got together and formed the Unicode Consortium. Their experts were a joke. They had a foreign language expert who by his own admission couldn't speak the language he was supposedly expert in.

    Then without consulting Asian language speakers they decided to combine all the Asian language characters - including those that were physically different. The result was like some elitist looking at the Greek and Roman alphabets and deciding 'a' is a lot like alpha, 'b' a lot like beta, so why not combine the two of them into a single alphabet, then tell you your name isn't Sam, it's "S". (Slashdot probably won't display this but you get the idea.) This affected eastern and central and south east asian languages.

    This created the absurd situation where some people couldn't even write their names or enter them into databases, prompting the famous "I Can Text You A Pile of Poo, But I Can't Write My Name" https://modelviewculture.com/p... [modelviewculture.com]

    When it was pointed out did the Unicode Consortium admit they fucked up and fix it? Nope. They dug in their heels and insisted each country produce their own font which would display each Unicode character differently to suit their own language. Given the original goals of Unicode this was an amazing backflip. https://en.wikipedia.org/wiki/... [wikipedia.org] https://books.google.com/books... [google.com] https://plus.google.com/+LizHa... [google.com] There are other problems too: The encoding the consortium expected makes asian codepages use more space than the standards they were supposed to replace. This was stupid since ASCII was already super efficient for English language, so what was the point?

    If you only write English language software and ASCII is good enough you won't notice any of this, but if you have to write international software it's a nightmare. Yes, you might think adding Unicode support allows your app to run in any language, but it doesn't work like that because of this clusterfuck. You still have to provide different fonts for different countries, and you often have to provide support for old codepages (the various BIG5 variants) for fallback, which Unicode was supposed to replace. It also makes translation very hard.

    But Unicode fixed it eventually? Nope. The Unicode consortium continued to ignore it to this very day and instead started churning out stupid emoji: a steaming pile of poo, a taco, and farcical 'equality' emoticons. https://www.theguardian.com/te... [theguardian.com] https://www.theguardian.com/ar... [theguardian.com]

    I hope this new font gives us one font which can display all languages and fuck the Unicode Consortium
    • Mod Parent +1 Informative !

      I've been running into my own problems with Unicode's shortsightedness.

      Two common glyphs are:

      * mouse pointer (See fa-mouse-pointer [fontawesome.io])
      * cardinal 4 direction arrows (such as used on Windows, Move) (See fa-arrows [fontawesome.io])

      Yet they are nowhere to be found in Unicode.

      You're definitely right - the Unicode Consortium is more interested in fluff crap like emoji than practical stuff.

      If the Unicode Consortium didn't have their heads up their asses we wouldn't even need fonts like Font Awesome [fontawesome.io]

      The funny thing

    • I've been using the Noto font(s) for a while, they're installed by default in Linux Mint (probably Ubuntu and others, too), so I assume this is an incremental release, where they've finally achieved some semblance of full(ish) coverage.

      While I have a couple of minor issues with the font's design (the lowercase 'm' and 0/O distinction in Noto Mono are atrocious), the font is quite nice on the whole. And while I will never personally use all of the myriads of different scripts included, I whole-heartedly appla

    • by AmiMoJo ( 196126 ) on Monday October 10, 2016 @04:44AM (#53046015) Homepage Journal

      It's even worse than that. On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks. Since a lot of the fixed CJK characters are outside this plane, software that uses wchar_t usually doesn't support them. Some of this is baked into hardware, for example Unicode uses UTF16,

      I'm seriously thinking about writing an open source library to support TRON encoding. The lack of a good alternative seems to be what is preventing Unicode from being deprecated in favour of something better.

      • On many systems, e.g. Windows, wchar_t is defined as 16 bits, meaning it can only ever support the Unicode Basic Multilingual Plane without hacks.

        True UTF-16 supports non-BMP code points just fine, and is not a "hack". In fact, it's actually slightly easier to do so in UTF-16 than with UTF-8 (the only other common Unicode encoding).

        The real problem is that there is no single concept in Unicode that maps to the "character" of the old, simple ASCII standard with which most programmers are familiar. Depending on the task at hand, the correct substitute under Unicode may be code units, code points, or graphemes. Ignorant and/or lazy programmers who make
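A quick Python sketch of the code unit / code point / grapheme distinction, using U+20BB7 (a CJK Extension B ideograph outside the BMP) as the example. The helper names are made up for illustration; the surrogate arithmetic is the standard UTF-16 rule.

```python
def utf16_code_units(s):
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def utf8_code_units(s):
    """Number of 8-bit code units (bytes) needed to encode s as UTF-8."""
    return len(s.encode("utf-8"))


def to_surrogates(cp):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


s = "\U00020BB7"  # one code point outside the Basic Multilingual Plane
print(len(s), utf16_code_units(s), utf8_code_units(s))
print(["%04X" % u for u in to_surrogates(ord(s))])

# One user-perceived character ("e" with an acute accent) made of two code points.
combined = "e\u0301"
print(len(combined), utf16_code_units(combined))
```

Python's `len` counts code points, not graphemes, so the combining sequence reports 2. Grapheme segmentation is not in the standard library and needs a third-party package.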

        • by AmiMoJo ( 196126 )

          A lot of developers just throw in Unicode support and assume their software supports all languages. We need something better that actually does that, rather than Unicode.

          • I get the feeling that you don't understand how numerous, complex, arbitrary, diverse, ambiguous, etc. natural languages are. That phrase, "all languages", doesn't even have a knowable, well-defined meaning, either in theory or in practice.

            It would certainly be possible to improve upon Unicode, if you're willing to sacrifice backwards compatibility. However, it will never achieve your stated goal of guaranteeing support for "all languages" just by "throwing in" a new text processing library.

            Projects that re

            • by AmiMoJo ( 196126 )

              That was my point. Developers who aren't experts in languages just assume that if they tick the Unicode box in the compiler options their software supports everything, but in reality that's far from the case.

    • Note that this new font doesn't fix the 'Han unification' problem. It just provides 3 versions of the font, one for C, one for J and one for K. This sidesteps the clusterfuck (and forces you to select a different font for each language), but does not fix it.

      • This is a substitution/fallback font - and shouldn't be used for design or UI except where the chosen font is missing a character. If your native language is Chinese, you won't be using this font to view any glyphs that are already included in your Chinese font.

  • That's great .. there's nothing more annoying than having little rectangles on a web page instead of the proper glyph that you wouldn't understand anyway!
  • I can see why this is important to Google, since they seem to like showing me ads in the wrong language.

  • http://www.google.com/get/noto/updates [google.com]

    Last entry: "September 29, 2015"

    Yeah... so it's the same thing I downloaded and installed last year.

    I'm so glad Slashdot is catching up...

  • So what are they using besides "tofu", nothing? Blank spaces?
    If I don't have the character set installed that the page is written in, then it's probably because I can't read that language anyway.
    And I damn sure don't want to load every character set that exists on the web into my browser. It would run like balls.
    If you go to a page that has characters that your browser doesn't understand and you need to get to the information, use Google translate.
  • Don't show me glyphs that I am not trained to read. I'd really rather see square boxes in situations where foreign text was displayed with the wrong font. Wrong font being the font that I'm using.
