
 




Unicode 6.1 Released

An anonymous reader writes "The latest version of the Unicode standard (v. 6.1.0) was officially released January 31. The latest version includes 732 new characters, including seven brand new scripts. It also adds support for distinguishing emoji-style and text-style symbols and emoticons with variation selectors, updates to the line-breaking algorithm to more accurately reflect Japanese and Hebrew texts, and updates other algorithms and technical notes to reflect new characters and newly documented text behaviors."
  • by vlm ( 69642 ) on Wednesday February 01, 2012 @11:26AM (#38892187)

    Take a good look at glyph 27cb aka \diagup part of the Misc Math Symbols. People are gonna try embedding that in html now. Can't wait.

  • by Cocodude ( 693069 ) on Wednesday February 01, 2012 @11:26AM (#38892189) Homepage

    has got to be the Love Hotel [fileformat.info].

    Does anyone know why this is even there?

    • by vlm ( 69642 )

      As if http://www.fileformat.info/info/unicode/char/1f4be/index.htm [fileformat.info] makes sense to anyone under age 30. I demand the addition of a punchcard glyph...

      • by tepples ( 727027 )
        What better icon is there for the action of committing an edited document to storage?
        • by am 2k ( 217885 )

          The "don't bother me with those implementation details"-icon?

          • by tepples ( 727027 )
            What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?
            • by am 2k ( 217885 )

              What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

              The location where the data is stored (RAM vs. hard drive). There are some effects that play against each other here:

              • For editing, the data has to be in RAM (at least the part that's edited at the moment).
              • When the data is in RAM, but not on the disk, the state is lost after a crash or sudden power loss. This is undesirable.
              • Copying from RAM to harddrive (aka "saving") takes time.

              As computers get better, the latter effect becomes negligible. This means that when this is done automatically in the background (w

              • by tepples ( 727027 )

                ...Copying from RAM to harddrive (aka "saving") takes time. As computers get better, the latter effect becomes negligible.

                Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In add

                • by am 2k ( 217885 )

                  Continuous autosave is possible with current technology, but it requires wasting battery power on spinning a hard drive's platter at all times while the user continues to edit the document. I agree that it's an implementation issue, but the underlying technical reason for the implementation issue is still present in 2012 technology. I don't see the distinction between fast temporary storage and large nonvolatile storage "becom[ing] negligible" until large SSDs and cellular data become a lot cheaper. In addition, one ordinarily doesn't want to create a new numbered revision of the document in a revision control system after each keypress; there has to be some way to mark one's changes as suitable for being viewed by other editors of the document, not unlike the SQL keyword COMMIT.

                  Yes, you shouldn't save after every single keypress, but a timer for saving every minute or so (if there are any changes) should suffice. Committing for others to see is a different thing, that's something a user can be expected to understand.

                  Ultimately, for revert/versions there should be a timeline slider like there was in Google Wave, where you can go back to your document's state of any point in the past.

                  btw, affordable SSDs are already large enough for everyday use. My notebook has a 256GB SSD in it, a
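The "save every minute or so, but only if there are changes" idea above could be sketched roughly like this. It is a minimal sketch, not how any real editor does it: the `Autosaver` class, the `save_fn` callback and the 60-second default are all hypothetical names and values chosen for illustration.

```python
import threading


class Autosaver:
    """Hypothetical autosave helper: writes to storage only when the text changed."""

    def __init__(self, save_fn, interval_seconds=60.0):
        self.save_fn = save_fn            # assumed callback that persists the text
        self.interval = interval_seconds  # "every minute or so"
        self._text = ""
        self._dirty = False
        self._lock = threading.Lock()
        self._timer = None

    def edit(self, new_text):
        """Record an edit in memory; nothing touches the disk yet."""
        with self._lock:
            if new_text != self._text:
                self._text = new_text
                self._dirty = True

    def tick(self):
        """Save if there are unsaved changes; return True if a save happened."""
        with self._lock:
            if not self._dirty:
                return False
            text = self._text
            self._dirty = False
        self.save_fn(text)
        return True

    def start(self):
        """Call tick() repeatedly from a background timer."""
        def run():
            self.tick()
            self.start()
        self._timer = threading.Timer(self.interval, run)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        """Cancel the pending timer, if any."""
        if self._timer is not None:
            self._timer.cancel()
```

Keeping `tick()` separate from the timer means an idle document never wakes the disk, which is the battery concern raised earlier in the thread.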

                  • by tepples ( 727027 )

                    Committing for others to see is a different thing, that's something a user can be expected to understand.

                    Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

                    btw, affordable SSDs are already large enough for everyday use.

                    Not when "everyday use" includes storing a large collection of purchased music and purchased movies.

                    I didn't have to sell my car for [a 256 GB SSD].

                    But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop. Google Product Search shows 256 GB SSD in the $300-$400 range. Until the ultrabook market matures, auto

                    • Back to my original question: If not a floppy disk, what icon should be used for this action of committing an edited document to the part of the file system viewable by other users and applications?

                      The generic flowchart datastore symbol with an inbound arrow (retrieving something previously committed would use the same symbol with an outbound arrow.)

                      For products with less technical audiences, a stone tablet with an etching instrument, since committing results in the data being "carved in stone".

                    • by pjt33 ( 739471 )

                      But you did have to pay more than one would for the stock hard drive that comes bundled with a low-end laptop.

                      You could remove "the stock hard drive that comes bundled with" from that sentence and it would still be true.

                    • by tepples ( 727027 )

                      The generic flowchart datastore symbol with an inbound arrow

                      Thank you. I had forgotten about the flowchart symbols because nowadays none of them see popular use except an oval for module entry and exit, a box for a step, and a diamond for a decision.

              • by jbengt ( 874751 )
                If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.
                • by am 2k ( 217885 )

                  If the user never saved it, then where is it when the user needs it later? Auto-saved, OK, but where and under what name? There still needs to be a save option, and an icon, even if outdated, is useful for that.

                  Saved to an internal directory, and will be opened as an untitled document the next time you open the application.

            • by tlhIngan ( 30335 )

              What exactly did you mean by this statement? What are you calling an implementation detail with which the user shouldn't be bothered?

              Why should the user be bothered with it? There aren't many real-life instances where a user creates and it isn't "autosaved".

              It's one of the things that OS X Lion is doing - it's asking "why do we still do this?". Lion-aware apps automatically autosave in the background, and have a time-machine like feature that lets them view their document as it existed in the past. If they

              • If they wrote a brilliant paragraph a day ago, then deleted it in the morning, they can view the document as it existed yesterday, copy the paragraph back out, and be done with it.

                For one thing, an application that saves (and sends) a document's undo history along with the document can disclose things that the document's author did not want to disclose. I seem to vaguely remember scandals with Word's AutoRecover being used to recover redacted parts of a document. For another, how much of the limited space on the drive should be dedicated to saving a document's undo history since creation, especially when the document is a large layered picture or multitrack audio project?

                And that's because people forget to save - why not have the OS do it for them?

                I agree, but

        • What better icon is there for the action of committing an edited document to storage?

          One with the word "Save" on it.

      • Oh, come on. Everyone who uses computers even casually knows that the floppy-disk icon means "Save." That it no longer reflects the underlying hardware is irrelevant.
      • Here, a punch card glyph. Not quite what I expected but still...
        http://www.fileformat.info/info/unicode/char/5361/index.htm [fileformat.info]

        There is also a card index glyph; would that do?
        http://www.fileformat.info/info/unicode/char/1f4c7/index.htm [fileformat.info]

        There might not be a punchcard glyph, but there is a minidisk one:
        http://www.fileformat.info/info/unicode/char/1f4bd/index.htm [fileformat.info]

        and an optical disk one:
        http://www.fileformat.info/info/unicode/char/1f4bf/index.htm [fileformat.info]

        and a DVD one:
        http://www.fileformat.info/info/unicode/char/1f4c0/index.htm [fileformat.info]

        I cannot

        • They have 14 planes of ~65,536 characters... even after including massive syllabaries, and the unified CJK ideographs, they still had really only used the first plane. Now they're presented with only using about 7% of the space available, and so they started chucking just about every pictograph that they could possibly come up with into it...

          I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictogr

          • They have 14 planes of ~65,536 characters

            I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.

            • If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?
              • If it is a 16 bit standard, how can it be unlimited? It can support at the most 2^16, or 65,536 characters. Where does it get planes from?

                UTF-16 is NOT a naive 16-bit encoding, and has a set of surrogate pairs that allow one to construct codepoints of up to 20 bits in a UTF-16 stream. Subtract out the 16 bits per plane, and you're left with 4 bits, which is 16.

                I misquoted 14 in my post, the Unicode standard only defines 14 planes, and 2 private use areas.
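For anyone who wants to see the surrogate-pair mechanism spelled out, here is a small sketch. The function names `to_surrogate_pair` and `from_surrogate_pair` are made up for this example; the constants 0xD800, 0xDC00 and 0x10000 are the ones UTF-16 uses.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000      # what remains is a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F4BE is the floppy disk linked earlier in the thread.
high, low = to_surrogate_pair(0x1F4BE)
print(hex(high), hex(low))
```

Note that the 20-bit value is an offset from 0x10000, so the planes reached through surrogates sit on top of the first plane rather than replacing it.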

              • If it is a 16 bit standard, how can it be unlimited?

                If it were a 16-bit standard, it couldn't be unlimited. But it's not. In two ways. First, Unicode is simply a number->meaning table, and doesn't specify actual in memory format. There are a lot of competing standards for that. Second, UTF-16 has 1.1 M values. UTF-32 has 4B. UTF-8 has a 2B or a 1.1M limit depending on the version.

            • They have 14 planes of ~65,536 characters

              I thought unicode was unlimited? The coding methods might each have a limit, but the standard is unlimited.

              The limit is mostly arbitrary, as newer encodings allow for much more expanded coding sequences. However, due to the way UTF-16 encodes values above U+FFFF it is limited to expressing at most a 20-bit codepoint, meaning that the Unicode standard is basically limited practically to 16 pages of 65536 values. So, short of breaking changes to the UTF-16 standards you're basically SOL.
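The arithmetic behind that limit can be checked in a few lines. This is only a back-of-the-envelope sketch with variable names invented here. Worked through, the 20 surrogate bits cover 16 planes on top of the first plane, so the total comes out at 17 planes and a highest code point of U+10FFFF.

```python
import sys

PLANE_SIZE = 0x10000      # 65,536 code points per plane
SURROGATE_BITS = 20       # 10 bits from each half of a surrogate pair

supplementary = 2 ** SURROGATE_BITS               # reachable only via surrogate pairs
supplementary_planes = supplementary // PLANE_SIZE
total_planes = 1 + supplementary_planes           # plus the first (basic) plane
highest_code_point = PLANE_SIZE + supplementary - 1

print(supplementary_planes, total_planes, hex(highest_code_point))
print(highest_code_point == sys.maxunicode)       # Python's own upper bound
```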

          • by amorsen ( 7485 )

            I'm sorry, but while I'm down for having every script that is actually used, and every script that has been decoded, I don't see why we should have all of these pictographs

            If they are or were in use in real programs, it sucks to not have them in the standard. Unicode started out as a quite political project (e.g. Han Unification) but it has become much more pragmatic over time.

            We need the emoji and the other junk in the standard so that we are able to use Unicode as a credible archiving format.

        • The first one you link is a Chinese symbol. Looks totally valid to me.

          Remember, Chinese has symbols for entire words or ideas, it is not "alphabetical" like most other popular languages.

          • Yes, it is. I don't question that character. The others, on the other hand, are a bit silly though.

            • Agreed. Myself, I think it would be better to just reserve the space for future use, giving us plenty of expansion room without having to increase the word size (utf8 to utf16 to utf32) - instead of just filling the section up with nonsense.

    • They've got symbols for a love hotel, a horse [fileformat.info], and a steaming pile of poo [fileformat.info], along with emoticons, and they still haven't accepted the Tengwar [evertype.com] draft that's been around since '93? Where are these people's priorities!?

    • by Xest ( 935314 )

      I had no idea but was intrigued to find out myself, and stumbled upon this, which presumably explains it:

      http://www.developerfusion.com/news/91207/unicode-6-out-with-2000-new-characters-but-what-support-does-it-have/ [developerfusion.com]

      I knew the Japanese would be involved somewhere!

    • The "love hotel" symbol is part of the Emoji set. These are a semi-standardized set of emoticons that had widespread use in Japan. It was Google that proposed their inclusion in Unicode. http://sites.google.com/site/unicodesymbols/Home/emoji-symbols [google.com]

  • by tepples ( 727027 ) <.tepples. .at. .gmail.com.> on Wednesday February 01, 2012 @11:27AM (#38892197) Homepage Journal
    Before anyone chimes in complaining that Slashdot doesn't even support an old version of Unicode, this is for several reasons. For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text. For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.
    • by BetterThanCaesar ( 625636 ) on Wednesday February 01, 2012 @11:43AM (#38892387)

      Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

      I'd love to be able to write IPA when discussing pronunciation, or actually write out words in other languages, ohm character for discussing electronics, pound and yen signs for currency ... Hey, even a bigger whitelist than what we have now would be great!
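A rough sketch of what "popping bidi characters" might look like, assuming the filter only needs to handle the embedding and override controls (LRE, RLE, LRO, RLO) and their closing PDF. The function name `close_bidi` is hypothetical and this is not taken from Slashcode.

```python
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
PDF = "\u202C"                                      # POP DIRECTIONAL FORMATTING


def close_bidi(text):
    """Append a PDF for every embedding or override the comment leaves open."""
    depth = 0
    out = []
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue  # stray pop with nothing to close: drop it
            depth -= 1
        out.append(ch)
    return "".join(out) + PDF * depth
```

Dropping stray pops matters as much as adding missing ones, since an unmatched PDF could otherwise close a directional run opened by the surrounding page.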

      • Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

        &#x1F64B; If I were writing such a parser, I don't know how I'd get it to automatically check for the release of a new version of the standard and determine which code points are new bidi characters to be popped.

        I'd love to be able to write IPA when discussing pronunciation

        It'd be nice but not necessary: X-SAMPA.

        or actually write out words in other languages

        I guess the rationale is that most moderators would not be able to read foreign words without transliteration into Latin characters.

        pound and yen signs for currency

        £ is Alt+0163 on a Windows machine, and ¥ is Alt+0165. They're probably Ctrl+Shift+U A 3 Enter and Ctrl+Shift+U A

      • Raise your hand if you couldn't code a parser that detects those characters and takes appropriate action, such as popping bidi characters.

        Um, so do it and submit a patch against Slashcode?

    • by Kjella ( 173770 )

      Just admit that it's because it's old and random, there's a few HTML entities working but there's no reason why &aelig; = æ should work and &mu; = shouldn't - like in micrograms, or uTorrent. It's a geeky site, but it's made for writing English prose with some half-hearted Latin1 support, no math or science.

      • Here's the reason: æ = 0xE6 (or 0xC6 for capital) in extended ASCII, where Mu is not present in extended ASCII. It appears slashdot dumps anything outside of that range.

        Let's try an experiment:
        0xAB and 0xBB:
        0xA7 and 0xB6:
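The range check described above is easy to reproduce, assuming "extended ASCII" here means ISO-8859-1 (Latin-1), which is a guess about what the site accepts. The helper name `fits_latin1` is made up for this sketch.

```python
def fits_latin1(ch):
    """Return True if the character has a single-byte ISO-8859-1 encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


ae = "\u00E6"      # LATIN SMALL LETTER AE, what &aelig; refers to
mu = "\u03BC"      # GREEK SMALL LETTER MU, what &mu; refers to
micro = "\u00B5"   # MICRO SIGN, a separate character that looks similar

print(ae.encode("latin-1"))
print(fits_latin1(mu))
print(fits_latin1(micro))
```

The micro sign does sit inside the Latin-1 range, so `&micro;` may be worth trying for micrograms where `&mu;` gets dropped.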

    • by Fastolfe ( 1470 )

      There are technical solutions to these problems, such as tracking language/BIDI overrides when embedding strings provided by users (and reversing the effect afterward). You could also do it the "easy" way and just filter out characters based on their Unicode property (e.g. disallow all 'other' characters, which would include these formatting characters).
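The "easy" way can be sketched with Python's standard `unicodedata` module, whose general categories starting with "C" (Cc, Cf, Cs, Co, Cn) make up the 'other' group. The function name `strip_other` and the choice to keep newline and tab are assumptions for this example.

```python
import unicodedata


def strip_other(text, keep="\n\t"):
    """Drop characters whose general category starts with 'C' (Other)."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )


print(strip_other("pound \u00A3, yen \u00A5, \u202Edesrever"))
```

This is blunt: the format category also covers characters such as the zero-width joiner that some scripts need, and unassigned code points are dropped until the interpreter's Unicode tables catch up with a new release.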

    • by Hentes ( 2461350 )

      For one thing, there was once a fad of posting pornographic ASCII art on Slashdot, so it appears Slashdot disallows any character that would be more useful for glyph art than for English text.

      If ASCII can be used for trolling just the same, then there is little point in not implementing Unicode. The point of moderation is to prevent these issues.

      For another, there was once a fad of using bidirectionality override control characters for turning text backwards, which would break the layout and allow spoofing a comment's moderation score.

      That's because of a buggy/insecure implementation. It doesn't mean it can't be done right.

  • emoticons? (Score:4, Insightful)

    by pz ( 113803 ) on Wednesday February 01, 2012 @11:50AM (#38892481) Journal

    Seriously, emoticons? Who ever thought it a good idea to include those in a standard? Should we have an encoding for hearts as dots over lower case i as well? And little horseys, too? And y with a big tail that wraps around to the front of the word?

    • Re:emoticons? (Score:4, Informative)

      by snowgirl ( 978879 ) on Wednesday February 01, 2012 @12:06PM (#38892701) Journal

      And little horseys, too?

      U+1F40E ... no, seriously...

      • The U+1F4AF character is a bit harder to explain than little horses, because it relies on a 4-octet code character to express something which can be easily expressed by using 3 1-octet characters.
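The octet counts are quick to confirm, assuming UTF-8 is the encoding being compared. The variable names below are just for illustration.

```python
hundred_points = "\U0001F4AF"   # HUNDRED POINTS SYMBOL
digits = "100"

print(len(hundred_points.encode("utf-8")))   # octets for the single symbol
print(len(digits.encode("utf-8")))           # octets for the three ASCII digits
```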

    • Seriously, emoticons? Who ever thought it a good idea to include those in a standard?

      Unicode had to be able to round-trip (losslessly encode and decode) all old popular encodings. This includes encoding now called "code page 437", introduced with the first IBM PC, which includes a smile emoticon at code value 0x01. It also includes the encodings associated with the widely distributed system fonts Zapf Dingbats and Wingdings.

    • by gutnor ( 872759 )
      Unicode encodes old characters from dead languages that only a few professors will ever use; that makes a lot less sense than emoticons, characters that are actually used daily by lots of people.
    • by Hentes ( 2461350 ) on Wednesday February 01, 2012 @03:07PM (#38895211)

      The next thing will be teenagers building bigger emoticons out of emoticon characters. Then they will have to be included in the standard as well, and so on...

  • "It needed to be flexible, so it's a VM now."

    I fear this is the next step. The right to left and line wrapping BS is complicated enough that I'd welcome a specialized VM with loadable bytecode & glyph data. Yes, from a security standpoint this could create a wider attack surface. However, I'd argue it would be less attack surface considering that the VM for my unlimited precision scientific & programming calculator is smaller than my UTF-8 text display implementation.

    I'd also argue that it woul

  • I'm sure we could have found some way to get along without "Mathematical Rising Diagonal" and "Kissing Face".
