Unicode 6.1 Released

Posted by Unknown Lamer on Wednesday February 01, 2012 @12:19PM from the tent-emoji-style dept.

An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. The latest version includes 732 new characters, including seven brand new scripts. It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates to the line-breaking algorithm to more accurately reflect Japanese and Hebrew texts, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."



27cb appearing in HTML in 5.4.3.2.1... (Score:3)

by vlm ( 69642 ) writes: on Wednesday February 01, 2012 @12:26PM (#38892187)

Take a good look at glyph 27cb aka \diagup part of the Misc Math Symbols. People are gonna try embedding that in html now. Can't wait.

- - - Re: (Score:2)
      
      by vlm ( 69642 ) writes:
      
      That's a good one, but I'm also worried about html parsers needing to understand half a dozen variants of the "closing slash"
      - Re: (Score:3)
        
        by BetterThanCaesar ( 625636 ) writes:
        
        Parsers of XML, HTML and SGML need and may only support U+002F SOLIDUS as "closing slash". If that weren't the case, we'd already have problems with people writing <htm1> and <B0DY>.
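
A minimal Python sketch of the point above: a tag check that accepts only U+002F SOLIDUS as the closing slash, so look-alikes such as U+27CB are treated as ordinary text. The names `is_closing_slash` and `is_end_tag` and the short look-alike table are illustrative assumptions, not taken from any real parser.

```python
# Only U+002F SOLIDUS counts as the "closing slash" in markup.
SOLIDUS = "\u002f"

# A few slash look-alikes (illustrative, not exhaustive).
LOOKALIKES = {
    "\u2044": "FRACTION SLASH",
    "\u2215": "DIVISION SLASH",
    "\u27cb": "MATHEMATICAL RISING DIAGONAL",  # added in Unicode 6.1
    "\uff0f": "FULLWIDTH SOLIDUS",
}


def is_closing_slash(ch):
    """True only for the one character markup parsers recognize."""
    return ch == SOLIDUS


def is_end_tag(token):
    """Very rough end-tag check: '<', SOLIDUS, a name, '>'."""
    return (
        len(token) > 3
        and token[0] == "<"
        and is_closing_slash(token[1])
        and token[-1] == ">"
    )


print(is_end_tag("</b>"))        # real end tag
print(is_end_tag("<\u27cbb>"))   # look-alike slash, plain text to a parser
```

Under this sketch a pasted U+27CB never closes anything, which is the same reason <htm1> never opens an html element.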
Favourite unicode character (Score:3, Interesting)

by Cocodude ( 693069 ) writes: on Wednesday February 01, 2012 @12:26PM (#38892189) Homepage

has got to be the Love Hotel [fileformat.info].
Does anyone know why this is even there?

- Re: (Score:3)
  
  by vlm ( 69642 ) writes:
  
  As if http://www.fileformat.info/info/unicode/char/1f4be/index.htm [fileformat.info] makes sense to anyone under age 30. I demand the addition of a punchcard glyph...
  - Re: (Score:2)
    
    by tepples ( 727027 ) writes:
    
    What better icon is there for the action of committing an edited document to storage?
    - Re: (Score:3)
      
      by am 2k ( 217885 ) writes:
      
      The "don't bother me with those implementation details"-icon?
      - Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?
        
        Re: (Score:2)
        
        by am 2k ( 217885 ) writes:
        
        What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?
        The location where the data is stored (RAM vs. harddrive). There are some effects that play against each other here:
        
        For editing, the data has to be in RAM (at least the part that's edited at the moment).
        When the data is in RAM, but not on the disk, the state is lost after a crash or sudden power loss. This is undesirable.
        Copying from RAM to harddrive (aka "saving") takes time.
        As computers get better, the latter effect becomes negligible. This means that when this is done automatically in the background (w
        
        Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        ...Copying from RAM to harddrive (aka "saving") takes time. As computers get better, the latter effect becomes negligible.
        Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In add
        
        Re: (Score:2)
        
        by am 2k ( 217885 ) writes:
        
        Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.
        Yes, you shouldn't save after every single keypress, but a timer for saving every minute or so (if there are any changes) should suffice. Committing for others to see is a different thing, that's something a user can be expected to understand.
        Ultimately, for revert/versions there should be a timeline slider like there was in Google Wave, where you can go back to your document's state of any point in the past.
        btw, affordable SSDs are already large enough for everyday use. My notebook has a 256GB SSD in it, a
        
        Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        Committing for others to see is a different thing, that's something a user can be expected to understand.
        Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?
        btw, affordable SSDs are already large enough for everyday use.
        Not when "everyday use" includes storing a large collection of purchased music and purchased movies.
        I didn't have to sell my car for [a 256 GB SSD].
        But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop. Google Product Search shows 256 GB SSD in the $300-$400 range. Until the ultrabook market matures, auto
        
        Re: (Score:3)
        
        by DragonWriter ( 970822 ) writes:
        
        Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?
        The generic flowchart datastore symbol with an inbound arrow (retrieving something previously committed would use the same symbol with an outbound arrow.)
        For products with less technical audiences, a stone tablet with an etching instrument, since committing results in the data being "carved in stone".
        
        Re: (Score:2)
        
        by pjt33 ( 739471 ) writes:
        
        But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop.
        You could remove "the stock hard drive that comes bundled with" from that sentence and it would still be true.
        
        Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        The generic flowchart datastore symbol with an inbound arrow
        Thank you. I had forgotten about the flowchart symbols because nowadays none of them appear to see popular use except an oval for module entry and exit, a box for a step, and a diamond for a decision.
        
        Re: (Score:2)
        
        by jbengt ( 874751 ) writes:
        
        If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.
        
        Re: (Score:2)
        
        by am 2k ( 217885 ) writes:
        
        If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.
        Saved to an internal directory, and will be opened as an untitled document the next time you open the application.
        
        Re: (Score:2)
        
        by tlhIngan ( 30335 ) writes:
        
        What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?
        Why should the user be bothered with it? There aren't many real-life instances where a user creates and it isn't "autosaved".
        It's one of the things that OS X Lion is doing - it's asking "why do we still do this?". Lion-aware apps automatically autosave in the background, and have a time-machine like feature that lets them view their document as it existed in the past. If they
        
        Disclosure, drive space, and spinning up (Score:3)
        
        by tepples ( 727027 ) writes:
        
        If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.
        For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?
        And that's because people forget to save - why not have the OS do it for them?
        I agree, but
        
        Re: (Score:2)
        
        by am 2k ( 217885 ) writes:
        
        Yes, however I don't think that many users know what an internal hard drive looks like... So using this as an icon for saving is not a solution either. USB sticks and external drives vary too wildly in their looks to be recognized at that size.
    - Re: (Score:2)
      
      by Hognoxious ( 631665 ) writes:
      
      What better icon is there for the action of committing an edited document to storage?
      One with the word "Save" on it.
      - "Save" with no icon in a toolbar full of icons (Score:2)
        
        by tepples ( 727027 ) writes:
        
        In a toolbar full of icons, the word "Save" or its localization without an icon will probably look out of place. Is this out-of-placeness somehow superior to the use of a floppy disk icon?
        
        Re: (Score:2)
        
        by Hognoxious ( 631665 ) writes:
        
        Yes, because none of the [working] machines here has a floppy drive and nobody under the age of twenty has ever even seen one except in a museum, you smug wanker.
        
        U+0057 U+2693 = wanker (Score:2)
        
        by tepples ( 727027 ) writes:
        
        We already have a two-character icon for wanker: Latin capital letter W (U+0057), followed by Anchor (U+2693).
  - Re: (Score:2)
    
    by JDG1980 ( 2438906 ) writes:
    
    Oh, come on. Everyone who uses computers even casually knows that the floppy-disk icon means "Save." That it no longer reflects the underlying hardware is irrelevant.
  - Re: (Score:2)
    
    by GreatBunzinni ( 642500 ) writes:
    
    Here, a punch card glyph. Not quite what I expected but still...
    http://www.fileformat.info/info/unicode/char/5361/index.htm [fileformat.info]
    There is also a card index glyph; would that do?
    http://www.fileformat.info/info/unicode/char/1f4c7/index.htm [fileformat.info]
    There might not be a punchcard glyph, but there is a minidisk one:
    http://www.fileformat.info/info/unicode/char/1f4bd/index.htm [fileformat.info]
    and an optical disk one:
    http://www.fileformat.info/info/unicode/char/1f4bf/index.htm [fileformat.info]
    and a DVD one:
    http://www.fileformat.info/info/unicode/char/1f4c0/index.htm [fileformat.info]
    I cannot
    - Re: (Score:3)
      
      by snowgirl ( 978879 ) writes:
      
      They have 14 planes of ~65,536 characters... even after including massive syllabaries, and the unified CJK ideographs, they still had really only used the first plane. Now they're presented with only using about 7% of the space available, and so they started chucking just about every pictograph that they could possibly come up with into it...
      I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictogr
      - Re: (Score:2)
        
        by Actually, I do RTFA ( 1058596 ) writes:
        
        They have 14 planes of ~65,536 characters
        I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.
        
        Re: (Score:2)
        
        by unixisc ( 2429386 ) writes:
        
        If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?
        
        Re: (Score:2)
        
        by snowgirl ( 978879 ) writes:
        
        If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?
        UTF-16 is NOT a naive 16-bit encoding, and has a set of surrogate pairs that allow one to construct codepoints of up to 20-bits in a UTF-16 stream. Subtract out the 16-bits per plane, and you're left with 4-bits, which is 16.
        I misquoted 14 in my post, the Unicode standard only defines 14 planes, and 2 private use areas.
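
The surrogate arithmetic discussed here can be sketched in Python. A surrogate pair carries 20 bits, but the decoded value is offset by 0x10000, so the supplementary range sits on top of the first plane rather than overlapping it: 16 supplementary planes plus the Basic Multilingual Plane, ending at U+10FFFF. The function names below are hypothetical helpers written for this illustration.

```python
def to_surrogates(cp):
    """Split a supplementary code point into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    """Rebuild the code point from a (high, low) surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Largest pair: both halves at the top of their ranges.
MAX_CODE_POINT = from_surrogates(0xDBFF, 0xDFFF)
PLANES = (MAX_CODE_POINT + 1) // 0x10000

high, low = to_surrogates(0x1F4A9)        # PILE OF POO
print(hex(high), hex(low), hex(MAX_CODE_POINT), PLANES)
```

The pair computed by hand matches what Python's own `utf-16-be` codec emits for the same character.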
        
        Re: (Score:2)
        
        by Actually, I do RTFA ( 1058596 ) writes:
        
        If it is a 16 bit standard, how can it be unlimited?
        
        If it were a 16-bit standard, it couldn't be unlimited. But it's not. In two ways. First, Unicode is simply a number->meaning table, and doesn't specify actual in memory format. There are a lot of competing standards for that. Second, UTF-16 has 1.1 M values. UTF-32 has 4B. UTF-8 has a 2B or a 1.1M limit depending on the version.
        
        Re: (Score:2)
        
        by snowgirl ( 978879 ) writes:
        
        They have 14 planes of ~65,536 characters
        I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.
        The limit is mostly purely arbitrary as newer encodings allow for much more expanded coding sequences. However, due to the way UTF-16 encodes values above U+FFFF it is limited to expressing at most a 20-bit codepoint, meaning that the Unicode standard is basically limited practically to 16 pages of 65536 values. So, short of breaking changes to the UTF-16 standards you're basically SOL.
      - Re: (Score:2)
        
        by amorsen ( 7485 ) writes:
        
        I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs
        If they are or were in use in real programs, it sucks to not have them in the standard. Unicode started out as a quite political project (e.g. Han Unification) but it has become much more pragmatic over time.
        We need the emoji and the other junk in the standard so that we are able to use Unicode as a credible archiving format.
    - Re: (Score:2)
      
      by X0563511 ( 793323 ) writes:
      
      The first one you link is a Chinese symbol. Looks totally valid to me.
      Remember, Chinese has symbols for entire words or ideas, it is not "alphabetical" like most other popular languages.
      - Re: (Score:2)
        
        by GreatBunzinni ( 642500 ) writes:
        
        Yes, it is. I don't question that character. The others, on the other hand, are a bit silly though.
        
        Re: (Score:2)
        
        by X0563511 ( 793323 ) writes:
        
        Agreed. Myself, I think it would be better to just reserve the space for future use, giving us plenty of expansion room without having to increase the word size (utf8 to utf16 to utf32) - instead of just filling the section up with nonsense.
- And where's Tengwar? (Score:2)
  
  by Xtifr ( 1323 ) writes:
  
  They've got symbols for a love hotel, a horse [fileformat.info], and a steaming pile of poo [fileformat.info], along with emoticons, and they still haven't accepted the Tengwar [evertype.com] draft that's been around since '93? Where are these people's priorities!?
- Re: (Score:2)
  
  by Xest ( 935314 ) writes:
  
  I had no idea but was intrigued to find out myself, and stumbled upon this, which presumably explains it:
  http://www.developerfusion.com/news/91207/unicode-6-out-with-2000-new-characters-but-what-support-does-it-have/ [developerfusion.com]
  I knew the Japanese would be involved somewhere!
- Re: (Score:2)
  
  by neonsignal ( 890658 ) writes:
  
  The "love hotel" symbol is part of the Emoji set. These are a semi-standardized set of emoticons that had widespread use in Japan. It was Google that proposed their inclusion in Unicode. http://sites.google.com/site/unicodesymbols/Home/emoji-symbols [google.com]
Why Slashdot won't adopt it (Score:5, Informative)

by tepples ( 727027 ) writes: <tepples@nosPAM.gmail.com> on Wednesday February 01, 2012 @12:27PM (#38892197) Homepage Journal

Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

- Re:Why Slashdot won't adopt it (Score:5, Insightful)
  
  by BetterThanCaesar ( 625636 ) writes: on Wednesday February 01, 2012 @12:43PM (#38892387)
  
  Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
  I'd love to be able to write IPA when discussing pronunciation, or actually write out words in other languages, ohm character for discussing electronics, pound and yen signs for currency ... Hey, even a bigger whitelist than what we have now would be great!
  
  - Checking for the release of a new version (Score:2)
    
    by tepples ( 727027 ) writes:
    
    Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
    🙋 If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.
    I'd love to be able to write IPA when discussing pronunciation
    It'd be nice but not necessary: X-SAMPA.
    or actually write out words in other languages
    I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.
    pound and yen signs for currency
    £ is Alt+0163 on a Windows machine, and ¥ is Alt+0165. They're probably Ctrl+Shift+U A 3 Enter and Ctrl+Shift+U A
    - Re:Checking for the release of a new version (Score:5, Funny)
      
      by Canazza ( 1428553 ) writes: on Wednesday February 01, 2012 @01:16PM (#38892833)
      
      £ is Shift+3, what are you on about?
      
      - Re: (Score:2)
        
        by hackertourist ( 2202674 ) writes:
        
        Only on UK keyboard layouts.
    - Re: (Score:2)
      
      by pjt33 ( 739471 ) writes:
      
      I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.
      So at least give us Latin-1. There are English words which use accents in high registers.
      - Re: (Score:2)
        
        by tepples ( 727027 ) writes:
        
        What high registers are you talking about that use accents other than what one can already get by typing Alt+0233 to get é and similar?
  - Re: (Score:2)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.
    Um, so do it and submit a patch against Slashcode?
  - - Re: (Score:2)
      
      by unixisc ( 2429386 ) writes:
      
      Do raw HTML 4 symbols show up, if one explicitly typed them? Such as &#923 or &Lambda? That would help quite a bit - one is not likely to usually see morons use them to make pornographic ASCII or Unicode art.
- Re: (Score:2)
  
  by Kjella ( 173770 ) writes:
  
  Just admit that it's because it's old and random, there's a few HTML entities working but there's no reason why &aelig; = æ should work and &mu; = μ shouldn't - like in micrograms, or uTorrent. It's a geeky site, but it's made for writing English prose with some half-hearted Latin1 support, no math or science.
  - Re: (Score:2)
    
    by X0563511 ( 793323 ) writes:
    
    Here's the reason: æ = 0xE6 (or 0xC6 for capital) in extended ASCII, where Mu is not present in extended ASCII. It appears slashdot dumps anything outside of that range.
    Let's try an experiment:
    0xAB and 0xBB:
    0xA7 and 0xB6:
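
The range check described above can be tried offline in Python, assuming "extended ASCII" here means ISO-8859-1 (Latin-1); that reading is an assumption, and Windows-1252 would differ in the 0x80 to 0x9F range. The helper name `fits_latin1` is made up for this sketch. One detail worth noticing: the Greek letter mu (U+03BC) is outside Latin-1, but the separate MICRO SIGN (U+00B5) is inside it.

```python
def fits_latin1(text):
    """True if every character has a single-byte ISO-8859-1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# The bytes from the experiment, decoded as Latin-1.
experiment = bytes([0xAB, 0xBB, 0xA7, 0xB6]).decode("latin-1")
print(experiment)

print(fits_latin1("\u00e6"))  # ae ligature, byte 0xE6
print(fits_latin1("\u03bc"))  # GREEK SMALL LETTER MU
print(fits_latin1("\u00b5"))  # MICRO SIGN, byte 0xB5
```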
    - Re: (Score:2)
      
      by X0563511 ( 793323 ) writes:
      
      False! Only a subset is allowed, but anything outside of it most definitely seems to fail.
- Re: (Score:2)
  
  by Fastolfe ( 1470 ) writes:
  
  There are technical solutions to these problems, such as tracking language/BIDI overrides when embedding strings provided by users (and reversing the effect afterward). You could also do it the "easy" way and just filter out characters based on their Unicode property (e.g. disallow all 'other' characters, which would include these formatting characters).
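
Both approaches mentioned above fit in a few lines of Python. This is a sketch under stated assumptions, not Slashcode's behaviour: `close_bidi` and `strip_other` are hypothetical names, only the embedding and override controls (U+202A to U+202E) are tracked, and "other" is taken to mean every general category starting with C.

```python
import unicodedata

EMBED_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def close_bidi(text):
    """Balance a user string so its overrides cannot leak into the page."""
    depth = 0
    out = []
    for ch in text:
        if ch in EMBED_OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue          # stray pop with nothing to close: drop it
            depth -= 1
        out.append(ch)
    out.append(PDF * depth)       # pop whatever the user left open
    return "".join(out)


def strip_other(text, keep="\n\t"):
    """The 'easy' way: drop every character in a C* general category."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )


comment = "abc\u202edef"
print(repr(close_bidi(comment)))
print(repr(strip_other(comment)))
```

The first function keeps the user's intended direction changes but contains them; the second simply removes them, along with control, private-use and unassigned characters.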
- Re: (Score:3)
  
  by Hentes ( 2461350 ) writes:
  
  For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.
  If ASCII can be used for trolling just the same then there is little point in not implementing Unicode. The point of moderation is to prevent these issues.
  For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.
  That's because of a buggy/insecure implementation. It doesn't mean it can't be done right.
- - The next version of the standard (Score:2)
    
    by tepples ( 727027 ) writes:
    
    Trolls gonna troll; that's what moderation is for.
    At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.
    So filter those character ranges.
    Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.
    - Re:The next version of the standard (Score:5, Funny)
      
      by StuartHankins ( 1020819 ) writes: on Wednesday February 01, 2012 @01:25PM (#38892987)
      
      ...filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis...
      Yeah, the way they are going they might actually *have* these characters in the set now...
      
    - Re: (Score:2)
      
      by afabbro ( 33948 ) writes:
      
      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.
      That would lead to the Slashdot "editors" having to maintain their code, and we can't have that.
    - Re: (Score:2)
      
      by Dahan ( 130247 ) writes:
      
      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.
      It's not difficult to update a simple file/DB entry/whatever to add more characters to the blacklist. Include a little util to parse the UnicodeData file and automatically blacklist all control characters. But even if you wanted to go with a whitelist instead of a blacklist, there's no reason for the whitelist to be as small as it currently is. And then there's what I assume is a Slashcode bug where non-ASCII characters that are in the whitelist don't come through properly. I've seen numerous posts where a
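
The "little util" idea can be sketched in Python. The sample lines below follow the semicolon-separated layout of UnicodeData.txt (code point in field 0, general category in field 2), but they are a hand-copied excerpt; fetching the real file for each new Unicode version is left out, and the function name `blocked_code_points` is hypothetical. Range entries written as First/Last pairs are not expanded here, which is an acknowledged gap in the sketch.

```python
SAMPLE_UNICODE_DATA = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
202C;POP DIRECTIONAL FORMATTING;Cf;0;PDF;;;;;N;;;;;
202E;RIGHT-TO-LEFT OVERRIDE;Cf;0;RLO;;;;;N;;;;;
"""


def blocked_code_points(lines, categories=("Cc", "Cf")):
    """Collect code points whose general category is control or format."""
    blocked = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)   # field 0: hex code point
        category = fields[2]              # field 2: general category
        if category in categories:
            blocked.add(code_point)
    return blocked


blacklist = blocked_code_points(SAMPLE_UNICODE_DATA.splitlines())
print(sorted(hex(cp) for cp in blacklist))
```

Rerunning a script like this against each new release of the data file would regenerate the blacklist without hand-editing it.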
    - Re: (Score:2)
      
      by Hognoxious ( 631665 ) writes:
      
      At one point, ASCII art spammers were filling pages with sexually explicit ASCII art, such as Goatse, male masturbation, and birds perched on a penis, so fast that moderators could not keep up.
      They can do that with or without unicode, so how does blocking unicode help?
      Blacklisting doesn't work because the next version of the standard, such as Unicode 6.1, may introduce more undesirable character ranges.
      How often do new versions come out? We aren't talking about Firefox here.
- - Hundreds of iframes (Score:2)
    
    by tepples ( 727027 ) writes:
    
    Unicode has different *pages*. You can filter by page.
    
    New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.
    What's stopping us from just creating a Greasemonkey script that translates back and forth from HTML with square brackets and allows the full HTML set
    Slashdot's lameness filter would probably confuse those square brackets with ASCII art, and even if not, the comment would likely draw negative moderations from moderators who haven't installed the Greasemonkey script.
    by putting every message in its own e.g. IFRAME
    There was a time when hundreds of <i
    - Re: (Score:2)
      
      by Jesus_666 ( 702802 ) writes:
      
      Why not use a reasonable whitelist? It's unlikely that a new version of Unicode would turn a printable character into a bidi control character and printable JIS characters are not automatically evil, especially not if the lameness filter treats them as non-letters.
      
      As for "people could spam ASCII art": People could also flood Slashdot with bizarre textual porn copypasta. The key part of "posting ASCII art faster than the mods can cope" is "faster than the mods can cope", not "ASCII art".
      
      It is fairly weir
    - Re: (Score:2)
      
      by JDG1980 ( 2438906 ) writes:
      
      New versions of Unicode introduce new pages. If you're blocking a page for some reason, the next version of Unicode might introduce another page that extends the functionality of the old page, reintroducing the behavior that led you to block the old page.
      So use a whitelist instead of a blacklist for pages.
    - Re: (Score:2)
      
      by amorsen ( 7485 ) writes:
      
      Until April 2014, when IE 6 passes out of extended support, one can't assume that all supported browsers support CSS max-width.
      Who the fuck cares whether Slashdot renders on IE6?!
      Although to be fair, it does seem like that is the only browser that Slashdot does care about. All the others probably spend more time supporting Slashdot than Slashdot spends supporting them.
- - Re: (Score:2)
    
    by X0563511 ( 793323 ) writes:
    
    Looks like extended-ASCII, not necessarily UTF/UCS. For example, 0xE9: é
emoticons? (Score:4, Insightful)

by pz ( 113803 ) writes: on Wednesday February 01, 2012 @12:50PM (#38892481) Journal

Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

- Re:emoticons? (Score:4, Informative)
  
  by snowgirl ( 978879 ) writes: on Wednesday February 01, 2012 @01:06PM (#38892701) Journal
  
  And little horseys, too?
  U+1F40E ... no, seriously...
  
  - Re: (Score:2)
    
    by GreatBunzinni ( 642500 ) writes:
    
    The U+1f4af character is a bit harder to explain than little horses, because it relies on a 4-octet code character to express something which can be easily expressed by using 3 1-octet characters.
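
The octet counts are easy to check in Python, assuming UTF-8 is the encoding in question (the comment does not say which one). The helper name `utf8_len` is made up for this sketch.

```python
def utf8_len(text):
    """Number of octets the string occupies in UTF-8."""
    return len(text.encode("utf-8"))


hundred_points = "\U0001F4AF"   # HUNDRED POINTS SYMBOL
horse = "\U0001F40E"            # HORSE

print(utf8_len(hundred_points))  # one character outside the BMP
print(utf8_len("100"))           # three ASCII digits
print(utf8_len(horse))
```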
- Smile emoticon at CP437 code 0x01 (Score:2)
  
  by tepples ( 727027 ) writes:
  
  Seriously, emoticons? Who ever thought it a good idea to include those in a standard?
  
  Unicode had to be able to round-trip (losslessly encode and decode) all old popular encodings. This includes the encoding now called "code page 437", introduced with the first IBM PC, which includes a smile emoticon at code value 0x01. It also includes the encodings associated with the widely distributed system fonts Zapf Dingbats and Wingdings.
- Re: (Score:2)
  
  by gutnor ( 872759 ) writes:
  
  Unicode encodes old characters of dead languages that only a few professors will ever use; that makes a lot less sense than emoticons, characters that are actually used daily by lots of people.
- Re:emoticons? (Score:4, Funny)
  
  by Hentes ( 2461350 ) writes: on Wednesday February 01, 2012 @04:07PM (#38895211)
  
  The next thing will be teenagers building bigger emoticons out of emoticon characters. Then they will have to be included in the standard as well, and so on...
  
- - Tetris, Chess, Baseball, and gang symbols (Score:5, Informative)
    
    by tepples ( 727027 ) writes: <tepples@nosPAM.gmail.com> on Wednesday February 01, 2012 @01:30PM (#38893073) Homepage Journal
    
    all the Tetris pieces
    The polyominoes up to five squares can be composed from U+2580 (upper half block), U+2584 (lower half block), and U+2588 (full block) characters. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
    glyphs of game pieces of all well known games
    A lot of well-known pre-1923 tabletop games' game pieces already exist in Unicode. Chess is U+2654 through U+265F, and Checkers is U+26C0 through U+26C3. A lot of game pieces are simple enough in form that the Geometric Shapes (U+25A0 through U+25FF) represent them just fine. For example, Othello is U+25CB and U+25CF, as is Connect Four. Even the enemy in Fast Eddie for Atari 2600 is in Miscellaneous Technical (U+237E) as is home plate in Baseball (U+2302).
    heck, instead of just the suit symbols why not 52 glyphs for a standard deck of cards
    Those can already be composed from a Basic Latin letter or number and a suit symbol. Unicode tends not to introduce precomposed ligatures except when needed for round-tripping with pre-Unicode encodings.
    throw the Major Arcana tarot cards in there too
    I don't know about Tarot, but all twelve signs of the zodiac are in Miscellaneous Symbols, even the "69" looking sign of Cancer (U+264B).
    gang symbols
    The symbol of "Folk Nation" gangs is similar to that of Judaism: a Star of David (U+2721). The symbol of "People Nation" gangs is similar to that of Islam: a 5-point star and crescent (U+262A).
    
  - - Re: (Score:2)
      
      by unixisc ( 2429386 ) writes:
      
      How about Mahjong? ;)
It needed to be flexible, so it's a VM now. (Score:2, Offtopic)

by VortexCortex ( 1117377 ) writes:

"It needed to be flexible, so it's a VM now."
I fear this is the next step. The right to left and line wrapping BS is complicated enough that I'd welcome a specialized VM with loadable bytecode & glyph data. Yes, from a security standpoint this could create a wider attack surface. However, I'd argue it would be less attack surface considering that the VM for my unlimited precision scientific & programming calculator is smaller than my UTF-8 text display implementation.
I'd also argue that it woul
I don't know... (Score:2)

by frank_adrian314159 ( 469671 ) writes:

I'm sure we could have found some way to get along without "Mathematical Rising Diagonal" and "Kissing Face".
- Re:Stick to ASCII (Score:5, Funny)
  
  by cc1984_ ( 1096355 ) writes: on Wednesday February 01, 2012 @12:29PM (#38892227)
  
  Yeah but can you write a pile of poo in ASCII?
  http://www.fileformat.info/info/unicode/char/1f4a9/index.htm [fileformat.info]
  
  - Re:Stick to ASCII (Score:4, Funny)
    
    by metamatic ( 202216 ) writes: on Wednesday February 01, 2012 @03:31PM (#38894727) Homepage Journal
    
    This is Slashdot, I'm sure you can find any number of examples of people who've written a pile of poo in ASCII.
    
  - Re: (Score:3)
    
    by Xtifr ( 1323 ) writes:
    
    Yeah but can you write a pile of poo in ASCII?
    As far as I know, Windows was originally written in ASCII... :)
  - Re: (Score:2)
    
    by rishistar ( 662278 ) writes:
    
    Something wrong with the Java code for this, though: Character.getNumericValue() is documented as returning -1 for this character, when quite clearly it should be the number 2.
- Re: (Score:2)
  
  by unixisc ( 2429386 ) writes:
  
  Yeah, it's fantastic that Cyrillic or Katakana or Devanagari scripts can be so beautifully supported in ASCII. Speaking of which, does HTML5 have a complete character list for Unicode, or is it still restricted to ASCII?
  - Re: (Score:3)
    
    by petermgreen ( 876956 ) writes:
    
    I'm pretty sure that in HTML5, as in HTML4, the document is considered to be made up of Unicode characters, and other charsets are considered encodings of Unicode. Of course the HTML5 spec doesn't list all Unicode characters explicitly; that would be insane.
    - Re: (Score:2)
      
      by unixisc ( 2429386 ) writes:
      
      But that defeats the purpose of Unicode, doesn't it? I'm not expecting that HTML5 support, for instance, Wingdings, but if someone, for whatever reason, in an English document needs to type a foreign character outside ASCII, such as a word in Cyrillic, or Mandarin, and can't, what's the good of making the spec Unicode, as opposed to ASCII compliant? I'd just want all the characters in all languages to be supported, but things like card symbols, or emoticons are okay not to support.
      - Re: (Score:2)
        
        by neonsignal ( 890658 ) writes:
        
        The character entities in HTML are only there to get around legacy encodings. And since you can specify numeric Unicode entities, the whole Unicode set is accessible; there is no need for explicit names for everything.
        If you aren't constrained to legacy encodings, then the obvious approach is just to set the encoding to something sensible, for example UTF-8. There are several ways to do this in HTML. http://www.w3.org/TR/html5-diff/#character-encoding [w3.org]
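        To make the numeric-entity point concrete, here is a small Python sketch using only the standard library. The function names to_ascii_html() and from_html() are invented for this example; the "xmlcharrefreplace" error handler and html.unescape() are real standard-library features.

```python
import html


def to_ascii_html(text: str) -> str:
    """Squeeze text into pure ASCII, turning every other character into a
    decimal numeric character reference such as &#1103;.

    Note: this does not escape '&' or '<'; it only handles non-ASCII."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_html(text: str) -> str:
    """Resolve numeric (and named) character references back to characters."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = "English with \u044f and \u4e2d\u6587"
    encoded = to_ascii_html(sample)
    print(encoded)
    print(from_html(encoded) == sample)
```

        So even a document stored in a legacy 7-bit encoding can still refer to any Unicode character, which is why named entities for everything aren't needed.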
      - Re: (Score:2)
        
        by petermgreen ( 876956 ) writes:
        
        Specifying the "document character set" as Unicode means that even if the charset you are writing your document in doesn't support the character you want, you can still enter it as a numeric (or named, if one is defined) entity. Whether it will be displayed is mostly a matter of whether appropriate fonts are installed, but generally I'd expect someone who writes Chinese to have Chinese fonts installed.
        Generally it's the GUI system's job to handle input and output of text not an individual application. Is it re
- I blame Star Trek & LotR. (Score:2)
  
  by Hognoxious ( 631665 ) writes:
  
  Well said, that man. If you feel the desire to "write" with stick figures and squiggles use a bastarding graphic, for fuck's sake.
  Eklinóringëon my arse.
- - Re:Stick to ASCII (Score:4, Informative)
    
    by Pieroxy ( 222434 ) writes: on Wednesday February 01, 2012 @02:44PM (#38894115) Homepage
    
    ASCII is just 128 characters.
    
    - Re: (Score:2)
      
      by jrumney ( 197329 ) writes:
      
      95
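      Both numbers hold up, depending on what you count. A minimal Python check, assuming "printable" means what str.isprintable() reports (which counts the space but not the control characters or DEL):

```python
# All 128 ASCII code positions, 0x00 through 0x7F.
ascii_chars = [chr(i) for i in range(128)]

# Keep only the ones Python considers printable (space through tilde).
printable = [c for c in ascii_chars if c.isprintable()]

if __name__ == "__main__":
    print(len(ascii_chars), "code positions")
    print(len(printable), "printable characters")
    print("".join(printable))
```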
- - - Re: (Score:3)
      
      by icebraining ( 1313345 ) writes:
      
      They're only "easy" if you have your system configured for ISO-8859-1. Those of us who use UTF-8 get this result: Ã Ã©.
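      The garbage shown above looks like what you get when UTF-8 bytes are decoded as ISO-8859-1. A small Python sketch of that mismatch and its reversal follows; mojibake() and repair() are hypothetical helper names, and which side of this thread was actually misconfigured is not something the code can tell you.

```python
def mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo mojibake() by reversing the two steps."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "\u00e0 \u00e9"          # a-grave, space, e-acute
    broken = mojibake(original)
    print(broken)                        # each accented letter becomes two characters
    print(repair(broken) == original)
```

      Each accented letter is two bytes in UTF-8, so it turns into two Latin-1 characters; the second byte of a-grave happens to be a no-break space, which is why it seems to vanish.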
      - Re: (Score:2)
        
        by marcosdumay ( 620877 ) writes:
        
        Hey, so it is /.'s web server that doesn't do encoding right? I always thought it was the CGI code.
        WTF are they using to serve those pages?
- Re: (Score:2)
  
  by BSAtHome ( 455370 ) writes:
  
  The correct sequence for business, politics and everything is, as of now:
  U+1F648 U+1F649 U+1F64A
  Gotta love the effort that went into providing the proper symbols.
- - Re: (Score:3, Insightful)
    
    by piripiri ( 1476949 ) writes:
    
    Yes, lolcats are a standard now.
    - Re:Zomg (Score:5, Funny)
      
      by fuzzyfuzzyfungus ( 1223518 ) writes: on Wednesday February 01, 2012 @01:09PM (#38892737) Journal
      
      I believe you mean to say that lolcats are in ur standardz, occupyin ur code-points; but not necessarily prescribing ur particular choice of glyph...
      
- Re: (Score:3)
  
  by DragonWriter ( 970822 ) writes:
  
  Standardise the world on English. It'll be easier. It's already the second-most-spoken language, and Chinese is a real nightmare of character encoding in itself. Then we can go back to good old ASCII.
  ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)
  - Re: (Score:3)
    
    by snowgirl ( 978879 ) writes:
    
    ASCII leaves off a lot of English punctuation, and accents that are, in fact, used in English (sure, in words of foreign origin, but they are still used.)
    Some that aren't foreign as well. "Coöperate" is an archaic spelling. Basically, any prefix that ends in "o" that is attached to a word that starts with an "o" can archaically be spelled with a diaeresis, in the French/Dutch method of "this vowel should be pronounced separately, and not as part of a diphthong".
  - Re: (Score:2)
    
    by SuricouRaven ( 1897204 ) writes:
    
    Drop the accents, people will know what you mean... and in a long enough period of time, only historians will care.
- - Re: (Score:3)
    
    by snowgirl ( 978879 ) writes:
    
    English also has the second-worst spelling system on the planet (only outdone by Japanese).
    ??? WTF are _YOU_ on about? English does not have the worst spelling system on the planet, and Japanese certainly doesn't qualify as the worst. "But they have three different scripts: two syllabaries, and an ideographic set" but...
    Look, perhaps I better just demonstrate to you what a real bad spelling system looks like; go look at Irish [wikipedia.org].
    - Re: (Score:2)
      
      by shutdown -p now ( 807394 ) writes:
      
      ??? WTF are _YOU_ on about?
      
      Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?
      - Re: (Score:2)
        
        by pjt33 ( 739471 ) writes:
        
        It's a loan-word from Greek. It follows the basic English rules for borrowing Greek words.
        
        Re: (Score:2)
        
        by shutdown -p now ( 807394 ) writes:
        
        The rules for regular English words are no better, to be honest. It's like someone was trying to come up with the most perverted way to make a letter represent something as different as possible from what it does in most European languages (and Latin, where it originates). The only language that's possibly worse in that regard is French, but at least they are consistent in the way they mutilate their phonemes (and most of it is just dropping them altogether), whereas in English you have to guess which of th
      - Re: (Score:2)
        
        by snowgirl ( 978879 ) writes:
        
        ??? WTF are _YOU_ on about?
        Can you concisely explain why the English word "psyche" is pronounced the way it is to a non-native speaker of the language?
        The word, originally from Greek and pronounced /psyxe/, was transliterated and taken into English. English phonology does not allow a word to start with /ps/, so the rules change that to /s/. English phonology does not allow /x/, so the rules change that to /k/. English phonology does not allow a word to end with /e/, so the rules change that to either /ej/ or /i/, but more commonly /i/ (e.g. Japanese "sake" is typically pronounced /saki/). All that is left is the
      - Re: (Score:2)
        
        by shutdown -p now ( 807394 ) writes:
        
        Can you explain why any word in French is pronounced the way it is? It seems like they have different rules for what letters to pronounce for every word.
        You know, "better than French" is not a great achievement. Indeed, one of the reasons why English is in such a sorry shape is because it absorbed an unhealthy dose of French poison as part of its history.
        Anyway, the rule of thumb in French seems to be, if you don't know how to pronounce any given letter, just skip it altogether - >50% chance of you getting it right in that case. ~
        Anyway the reason you pronounce psyche like that is because it sounds better than psitsh.
        Technically, it should be /psixe/, which sounds reasonable to me.
- Re:Obligatory XKCD (Score:5, Insightful)
  
  by marcosdumay ( 620877 ) writes: <marcosdumay@nOspam.gmail.com> on Wednesday February 01, 2012 @08:55PM (#38898371) Homepage Journal
  
  You know that this is the exact situation that Unicode AVOIDED, don't you?
  Now we have one standard with three different representations. Those replaced literally thousands of standards. Yep, sometimes doing that new standard works.
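  Assuming the three representations meant here are UTF-8, UTF-16 and UTF-32, a short Python sketch shows the same code point in each. The encodings() helper is a made-up name; the big-endian codec variants are used only so that no byte-order mark is added.

```python
def encodings(ch: str) -> dict[str, bytes]:
    """Encode one string in the three Unicode encoding forms (big-endian, no BOM)."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}
    return {name: ch.encode(codec) for name, codec in codecs.items()}


if __name__ == "__main__":
    for ch in ["A", "\u20ac", "\U0001F4A9"]:
        print(f"U+{ord(ch):04X}")
        for name, raw in encodings(ch).items():
            print(f"  {name}: {raw.hex(' ')} ({len(raw)} bytes)")
```

  Different byte sequences, one character set: every form decodes back to the same code point.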
  
