The Need For A Tagging Standard

Posted by Hemos on Mon Jan 15, 2007 09:13 AM
from the tagging-joy dept.
John Carmichael writes "Tags are everywhere now. Not just blogs, but famous news sites, corporate press bulletins, forums, and even Slashdot. That's why it's such a shame that they're rendered almost entirely useless by the lack of a tagging standard that would let various sites and tag aggregators like Technorati and Del.icio.us compare and relate tags to one another. Depending on where you go and who you ask, tags are implemented differently, and even defined in their own unique way. Even more importantly, tags were meant to be universal and compatible: a medium of sharing and conveying info across the blogosphere, the very embodiment of a semantic web. Unfortunately, they're not. Far from it, tags create more discord and confusion than they do minimize it. I have to say, it would be nice to just learn one way of tagging content and using it everywhere."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Don't agree (Score:5, Insightful)

    by pubjames (468013) on Monday January 15 2007, @09:17AM (#17612984)
    Isn't the power of tags that you can tag stuff however you want? To me a standard for tagging would be a negative thing.

    I don't think the problem is a standard for tagging; the problem is having a standard for sharing tags between applications. But that's another problem and it doesn't need to be solved to implement tagging itself.
    • Some ideas for tag standards:

      <yes>
      <no>
      <maybe>
      <haha>
      <evil>
      <spam>
      <cow boyneal>
      <firstpost>
    • Re:Don't agree (Score:5, Interesting)

      by Anonymous Coward on Monday January 15 2007, @09:40AM (#17613236)
      Er, guys?

      Tags are keywords.

      There's a keyword line up in the header that isn't being used for much these days.

      If you want to tag your document in a machine-readable way, put the tags in the keyword field. Problem solved.
      • MOD PARENT UP (Score:4, Insightful)

        by metamatic (202216) on Monday January 15 2007, @11:17AM (#17614576) Homepage Journal
        We've got a standard for keywords in HTML documents. There's no problem there.

        The only issue is what to do when there are multiple sub-documents on a single page, like if Slashdot allowed individual replies to be tagged.
    • Re:Don't agree (Score:5, Insightful)

      by lousehr (584682) on Monday January 15 2007, @09:42AM (#17613276)
      Your analysis of "the problem" is exactly the point of TFA. The stated concern is not that the content of the tags has no standard, but that the format of the tags has no standard. If a single tag contains multiple words, should the words be separated by spaces or underscores, or should we use StudlyCaps?
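
      As a rough sketch of what a format standard would have to reconcile, the snippet below folds the three conventions mentioned above (spaces, underscores, StudlyCaps) plus hyphens into one comparable form. The function names `split_tag` and `canonical` are made up for illustration, and the choice of lowercase words joined by single spaces is an assumption, not something the article proposes.

```python
import re

def split_tag(raw):
    """Split one tag into lowercase words, whatever separator convention it used."""
    # Treat runs of underscores and hyphens as a single space.
    spaced = re.sub(r"[_\-]+", " ", raw.strip())
    # Break StudlyCaps boundaries: "SemanticWeb" -> "Semantic Web".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return [word.lower() for word in spaced.split()]

def canonical(raw):
    """Join the words with single spaces to get one comparable form (assumed convention)."""
    return " ".join(split_tag(raw))

variants = ["semantic web", "semantic_web", "SemanticWeb", "semantic-web"]
print({canonical(v) for v in variants})
```

      Lowercasing makes the variants comparable, but it also throws away whatever the original capitalization was meant to convey, so a real scheme would probably want to keep the raw tag alongside the canonical one.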
      • Re: (Score:3, Interesting)

        Case is also an issue. Some sites only allow lowercase tags while others don't care about case.

        This is similar to the problem blogging sites have with cross site scripting. Try to tell a blogger you won't take HTML or bbcode posts (depending on generation of the blogger). Regardless of what you do, there's going to be sites that don't follow the rules and there will also be ways to screw it up for everybody.

        There isn't a standard for many things on the internet which causes validation to be near impossib
      • Re:Don't agree (Score:5, Insightful)

        by radtea (464814) on Monday January 15 2007, @11:28AM (#17614744)
        The stated concern is not that the content of the tags has no standard, but that the format of the tags has no standard.

        The medium, as Marshall McLuhan said, is the message. As soon as you standardize the format of the tags you will restrict the kind of information people can convey with them. That may be an acceptable limitation to you, but not to others, and they will find workarounds that effectively break the standard.

        For example, if tags were standardized on underscores to separate words you would have to forbid spaces and caps to enforce that standard. And then we would have no way of distinguishing between Polish and polish, which would be bad if you were looking for things to do with Eastern European culture or furniture care products. People would then start doing things like expressing capitalization by some other syntactical hack which would be inconsistently applied and a greater mess would ensue.

        Alternatively, tags could be represented as more complex markup:
        <tag>
        <word order="1">really</word>
        <word order="2">stupid</word>
        </tag>

        But because words and concepts have no general one-to-one correspondence (many words do not convey a unique concept or a concept at all, and many concepts cannot be conveyed in one word) this would be inadequate, and in any case even if the content model of the "word" tag forbade spaces, caps and underscores, people would still create tags that looked like:

        <tag><word>reallystupid</word></tag>

        The basic idea of "semantic markup" is wrong. From the summary:

        the very embodiment of a semantic web. Unfortunately, they're not. Far from it, tags create more discord and confusion than they do minimize it. I have to say, it would be nice to just learn one way of tagging content and using it everywhere.

        Actually, tags as they stand are the very embodiment of the semantic web. The only function of the semantic web is to create confusion and discord, because confusion and discord is the essence of the human epistemological condition. And while the call for "one way of doing X" has a nice religious ring to it, history shows that attempts to standardize things relating to human thought are very much misguided.
  • Automatic tagging (Score:5, Insightful)

    by drcoppersmith (1048722) on Monday January 15 2007, @09:19AM (#17613002) Homepage Journal
    I'm inclined to disagree that 'tags' are the answer here. I wrote my master's thesis on a method for automatically generating semantic webs from plaintext. It's a huge problem with about a dozen different stages, but I had backing in all of my research from the psycholinguistics and computer-science fields.

    Herein lies the rub: You're never going to get everyone to agree on a set of appropriate tags. Even if you do, you'll never have them uniformly applied (well I find that humorous but you have it tagged as inappropriate).

    There are other solutions here, such as automatic semantic generation. Hey, I never said it was an easy solution, but it's one that I'm certain can be accomplished. Flame away ;-)
    • Re:Automatic tagging (Score:5, Informative)

      by mangu (126918) on Monday January 15 2007, @09:28AM (#17613106)
      I wrote my master's thesis on a method for automatically generating semantic webs from plaintext.


      In the end, this could be said to be one of the central problems in AI. Basically, this is dimensionality reduction. People have been trying to do this manually for a long time. The Encyclopaedia Britannica's Propaedia is an example of a tentative semantic web for all human knowledge, but it's so inefficient that it's of very little use to a human, not to mention to automatic mechanisms.


      You're never going to get everyone to agree on a set of appropriate tags ... There are other solutions here, such as automatic semantic generation


      I believe it could be done if it were an automatically generated tag set. If it could be proven mathematically optimal in a certain context, it would be hard for anyone to disagree.

      • Re:Automatic tagging (Score:4, Interesting)

        by drcoppersmith (1048722) on Monday January 15 2007, @09:58AM (#17613476) Homepage Journal
        There are a lot of instances of manual tagging, and I agree with you that they're just too cumbersome (as does almost an entire field of psycholinguists [if you think you can get all of them to agree on anything you're sorely mistaken. They'll disagree just because they can]).

        The automatically generated tags are exactly what I was talking about. I didn't get terribly explicit with my ideas, but you seem to be going in the same direction I was. Getting the software to both tag incoming documents and categorize the semantic webs generated by each is the key to some 'universal' tagging system. This way we have maximally efficient tags along with a standardized definition for each and (perhaps most importantly) an automatic way of tagging all the documents to be processed. No room for the "13 year old cheerleader tags" as someone so eloquently put before.

        We still have the problem of naming the 'generic' tag categories generated by the software... The solution for that one is a lot hazier, though important. I don't think anyone will go looking for 'category 12233242' to find 'academic humor'.
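
        To make "automatically generated tags" a little more concrete, here is a deliberately simple stand-in: plain TF-IDF term scoring. It is not the semantic-web generation method described in this sub-thread, just a familiar baseline that picks words frequent in one document and rare in the others. The function names and the sample documents are invented for the sketch.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())

def suggest_tags(documents, top_n=1):
    """Suggest each document's top_n highest-scoring TF-IDF terms as tags."""
    token_lists = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    total_docs = len(documents)
    suggestions = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores = {
            term: (n / len(tokens)) * math.log(total_docs / doc_freq[term])
            for term, n in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        suggestions.append(ranked[:top_n])
    return suggestions

docs = [
    "the kernel patch fixes the kernel scheduler",
    "the camera lens and the camera sensor",
    "the recipe uses the oven",
]
print(suggest_tags(docs))
```

        Words that occur in every document (here "the") score zero and never become tags. It does nothing about the naming problem raised above: the output is still just whichever words happened to be in the text.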
    • You're never going to get everyone to agree on a set of appropriate tags.

      Then how come everyone on here has agreed on a handful of standard tags:

      itsatrap
      fud
      haha
      stupid

      ?????

      transporter_ii

    • Re:Automatic tagging (Score:4, Interesting)

      by remmelt (837671) on Monday January 15 2007, @09:36AM (#17613184) Homepage
      Tags are probably very community based, so they would only make sense within that community. (!itsatrap wouldn't work so well on iloveponies.co.ae). That said, why make tags which are meaningless to other communities or have vastly different meanings to other people available as a sorting or searching option? Sure, you could make some pretty mean stats proving any point you'd like (bad grammar in tags up 14.8% from last year! tag "yes" used in 87% of all blogs, world population feeling positive!) but I don't see the point.

      Also, anyone trying to make a serious argument containing the word "blogosphere" should really try and get out more. Come on people, it's not world hunger we're solving here. Viz: http://coolestshop.com/headline-blog.html [coolestshop.com]
    • by maxume (22995) on Monday January 15 2007, @09:38AM (#17613216)
      The article is mostly talking about standardizing the envelope, not the message, which is to say, how do you share/create a two word tag, and how do you specify exactly what is supposed to be described by that tag, and how do you share that in a useful way.

      The fact that someone thinks something is funny and someone else thinks it is inappropriate is useful information to gather. If you get 5000 funny and 5 inappropriate, you have a lot more information than if you have nothing at all, and even if you get 10 and 10 you still have more information, which is probably a good thing.
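
      The counting argument above can be sketched in a few lines. This assumes the tags applied to one item are available as a flat list; `tag_summary` is a hypothetical helper, and the numbers are the ones from the comment.

```python
from collections import Counter

def tag_summary(votes):
    """Count how often each tag was applied to one item and report its share."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}

lopsided = ["funny"] * 5000 + ["inappropriate"] * 5
split = ["funny"] * 10 + ["inappropriate"] * 10

for name, votes in (("lopsided", lopsided), ("split", split)):
    for tag, (n, share) in tag_summary(votes).items():
        print(f"{name}: {tag} x{n} ({share:.1%})")
```

      Both cases say something: the first that one reading dominates, the second that the audience is genuinely divided.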
  • by T-Ranger (10520) <jeffw&chebucto,ns,ca> on Monday January 15 2007, @09:19AM (#17613004) Homepage
    Is not to tag everything like 13 year old cheerleaders.
  • One Key Point (Score:5, Insightful)

    by Azarael (896715) on Monday January 15 2007, @09:20AM (#17613016) Homepage
    How do you standardize something that has not been widely implemented before? It's great to say that it would be a good idea to have one standard practice for tagging, but which one? There's no reason to make a huge fuss about this until at least one clear contender for standardization emerges (which will probably happen on its own).
    • by Qzukk (229616) on Monday January 15 2007, @09:30AM (#17613130) Journal
      Obviously we have to find the lowest common denominator between all the different tagging systems.

      I propose that we standardize the following tags:
      thissux
      omgthisrox
      That should cover 100% of the content in a manner that everyone can relate to.
    • Re:One Key Point (Score:4, Insightful)

      by T-Ranger (10520) <jeffw&chebucto,ns,ca> on Monday January 15 2007, @09:43AM (#17613282) Homepage
      Well, not quite. Reading the blog post, the problem lies in two areas: technology and linguistics.

      For technology, as an example, how do you quote things? How do you separate tokens? Do you use StudlyCaps and spaces? "Quoted words", and commas? If the latter, what about nested quotes?

      Bullshit question. The question is solved. Use XML. (Yeah, well, it is the web). We don't need Yet Another CSV "standard". Tags may be presented as lists, in spans, or WTF ever. But if you are talking about storage and transmission, then store the tokens separately, and transmit them in an unambiguous format; in 2007, on the web, the solutions are implementation-specific and XML, respectively.

      For linguistics, that's harder. Nouns or verbs? Talk to a librarian, I'm sure there are volumes of information on the right way. But I don't care, as I'm still disgusted that the technology problem even exists.

      Right now it seems there is little discussion on the problem. Right now, if implementations are trying to reinvent data encoding schemes either the implementations are totally brain dead (and need a kick in the ass from an outside force), or are completely oblivious to the problems they are encoding into their core features (and thus still need a kick in the ass). This is so bad, it's worse than wrong. You have to try to get to the point of being wrong.

      Of course, I don't care because tags are stupid. OTOH, perhaps I would care if they at least were implemented in a potentially useful way.
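
      For what it's worth, the "store the tokens separately, transmit them as XML" suggestion above takes very little code. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names `tags` and `tag` are invented here, not taken from any existing format.

```python
import xml.etree.ElementTree as ET

def tags_to_xml(tags):
    """Serialize a list of tags, one element per tag, so no separator is needed."""
    root = ET.Element("tags")  # element names are made up for this sketch
    for tag in tags:
        ET.SubElement(root, "tag").text = tag
    return ET.tostring(root, encoding="unicode")

def tags_from_xml(document):
    """Recover the original list of tags from the serialized form."""
    return [element.text or "" for element in ET.fromstring(document).iter("tag")]

# Spaces, commas, quotes and markup characters all survive the round trip.
original = ["semantic web", 'he said "hi", twice', "Polish", "R&D <notes>"]
wire = tags_to_xml(original)
print(wire)
assert tags_from_xml(wire) == original
```

      Because each tag is its own element, the spaces-versus-underscores and nested-quotes questions never come up at the transport level; they only remain as display choices.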
    • Re: (Score:3, Insightful)

      I can't believe I'm reading this -- it's a sad day for information science (I'm a librarian) when many otherwise knowledgeable, tech-savvy people are blinded by Web-2.0-speak. Let me reiterate another poster's comment:

      Tags are keywords. More specifically, they are subject keywords.

      If you can wrap your head around this idea, then you might realize what the author is talking about is a list of 'standardized' subject headings. You may know this by its common street name: a thesaurus [wikipedia.org] (although some peopl
  • Er, I mean notatrap!

    One big problem is that people can just make them up, then you get the "griefers" who put bogus joke tags all over the place.

    (remember, the opposite of "itsatrap" is "!itsatrap", not "notatrap"!)

    • Um, no. I don't care what the FAQ says; "!itsatrap" is hard to distinguish from "itsatrap." Maybe it works in monospaced code, but not so well in proportional font.

      People who insist on sticking to the fucking rules are the number one problem facing today's society, methinks.
  • by zappepcs (820751) on Monday January 15 2007, @09:21AM (#17613028) Journal
    How to share and categorize information is an ages old problem. One man's trash is another man's treasure, likewise, one man's bread is another man's dietary problem.

    I'm not sure, but haven't we already figured out that tagging would require more tags than the actual information being tagged to accomplish what the original poster was asking for?
  • Why not use an XML standard? If sites used a or similar, then people could put whatever they wanted inside. It would be simple for automated tools and users to find the tags and search against them. One of the most useful things that are similar to tags is the alt field in img. It allows people to search for photos online. Of course, this is open to abuse like anything else, and weren't search engines based on meta tags in the header at one point until people took advantage of them? Still, if tags ar
      • Re: (Score:3, Insightful)

        Words have spaces between them. A tag may have multiple words and be an independent thought. Store it as English demands, with spaces. The Space, NoSpace question is only relevant if you are using an encoding scheme that is broken. Verb/Noun is a different question, but space/nospace, quotes, and BS like that is quickly solved with existing technology.
  • A tagging standard isn't needed. Tags are just keywords that describe something. They're *WORDS* for Christ's sake. Just screen scrape them if you have to. Put them in a database. Read them aloud with a British accent, if you'd like. But if you can't parse plain old words, then I don't think that any kind of "standard" is going to help you.

    In the article, this guy is saying that some tags have spaces in them, and some don't, so that makes it hard. How about "where lcase(tags) like '%vista%'? How har
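
    The query fragment above is easy to try out with an in-memory SQLite database. SQLite spells the function `lower()` rather than `lcase()`, and the `items` table and its rows are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("Upgrade guide", "Windows Vista, upgrade"),
        ("Driver notes", "windows_vista drivers"),
        ("Travel diary", "BuenaVista, holiday"),
        ("Kernel news", "linux, kernel"),
    ],
)

def search(term):
    """Return titles whose raw tag string contains the term, ignoring case."""
    rows = conn.execute(
        "SELECT title FROM items WHERE lower(tags) LIKE ? ORDER BY title",
        (f"%{term.lower()}%",),
    )
    return [title for (title,) in rows]

print(search("vista"))
```

    The substring match finds both the spaced and the underscored spelling without any standard at all, though in this sample it also pulls in the unrelated "BuenaVista" row, so it trades precision for simplicity.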
  • by east coast (590680) on Monday January 15 2007, @09:22AM (#17613050)
    I don't feel that tags have enough significance behind them to merit a standard. I'd be more concerned with truth in journalism first, for my part.
  • Hopeless (Score:5, Insightful)

    by bigmouth_strikes (224629) on Monday January 15 2007, @09:23AM (#17613066) Journal
    Trying to standardize tags in the context of standardizing what they are, is hopeless. It'll be like the Unicode standard; too complex to use in its entirety.

    But to standardize the format of tags and to standardize how to exchange tags between systems, is a great idea.
  • by setirw (854029) on Monday January 15 2007, @09:24AM (#17613072) Homepage
    Which is why I tagged this article with "njkewjdkewd."
    • by KincaidKMF (998395) on Monday January 15 2007, @11:20AM (#17614620)
      How random... I was looking through tagged articles for more information about the "New Journal for Keeping Every Word a Just and Defined Kooky Emphasis While Describing" and popped over here. And all this time I thought tags were working.
  • by elzahir (442873) on Monday January 15 2007, @09:25AM (#17613078) Homepage Journal
    He said "blogosphere." Instantly, I don't care.

    Only thing worse would be something like, I dunno, "tags should be a Web 2.0 standard" or somesuch.

    Excuse me, but "proactive" and "paradigm"? Aren't these just buzzwords that dumb people use to sound important?
  • Hyphens. (Score:3, Funny)

    by caluml (551744) <{slashdot} {at} {spamgoeshere.calum.org}> on Monday January 15 2007, @09:29AM (#17613118) Homepage
    I must say that the Slashdot way of tagging irks me. I think tags should have hyphens between words, much like they do in their "from the the-slow-down dept". Makes it more readable.
    Any-tagging-stuff-I-have-to-write-will-use-hyphens as who knows what analbum is?
    • Re:Hyphens. (Score:5, Insightful)

      by Chryana (708485) on Monday January 15 2007, @11:52AM (#17615132)
      I would add to this that slashdot tags tend to be not very useful.

      Most of the time, the tags have little to do with the actual article (e.g. yes, no, maybe, fud, notfud, flamebait). I thought the purpose of tags was to be able to find an article easily later on when it has been archived, and the usefulness of the tags I just mentioned for this purpose is dubious at best. I do not pretend to have a solution to this problem, but I think the situation would be improved if the editors or maybe the /.ers who wrote the top rated comment were the only people allowed to set the tags.
  • tagging (Score:5, Funny)

    by AcidLacedPenguiN (835552) on Monday January 15 2007, @09:34AM (#17613176)
    and here I thought the standard for tagging was for the first person to agree or disagree with the headline, then the next has to immediately disagree with the first person. 5 minutes down the line if no one has added another tag, the third must disagree with BOTH the first and the second poster. Finally, a serious slashdotter will show up to add a relevant tag, followed by the oh so frequent itsatrap and slownewsday tags.
  • Tags are human assigned labels for something that we don't have better meta-data for, or where we don't want to be bothered with formalism. If you want something formal, go use a proper taxonomy/ontology and put bucketloads of OWL or RDF-schema data on your site to define relationships, or use a format with well defined semantics to add information. No one is stopping you, and there are cases where formally defining relationships is worthwhile, such as when you want software agents to be able to infer stuff about the data. But that's not what tagging is used for. Tagging is used for ad-hoc manual classification in situations where it is good enough
  • XSLT for Tags? (Score:3, Interesting)

    by null etc. (524767) on Monday January 15 2007, @09:36AM (#17613194)
    Similar to how XML uses XSLT to transform XML documents from one application to another, it wouldn't be a half-bad idea to have a Tag Transformation Language. Organizations with a lot of market share can define their own tag standards, and then people can optionally specify the transformation between their own local ontologies and the established tag standards. This has the advantage of being participation-driven.
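
    In its simplest form, such a transformation could be nothing more than a published lookup table from a site's local tags to a shared vocabulary. The sketch below assumes exactly that; the mapping, the shared tag names and the `translate` helper are all hypothetical, and a real Tag Transformation Language would presumably need rules rather than a flat table.

```python
def translate(local_tags, mapping):
    """Map local tags onto a shared vocabulary; unmapped tags pass through unchanged."""
    translated = []
    for tag in local_tags:
        for target in mapping.get(tag, [tag]):
            if target not in translated:  # drop duplicates, keep first-seen order
                translated.append(target)
    return translated

# Hypothetical mapping published by one site for its own vocabulary.
local_to_shared = {
    "itsatrap": ["skeptical"],
    "haha": ["humor"],
    "lol": ["humor"],
    "win_vista": ["windows", "vista"],
}

print(translate(["haha", "lol", "win_vista", "kernel"], local_to_shared))
```

    Letting unmapped tags pass through keeps the scheme optional: a site that publishes no mapping still interoperates, just less precisely.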
  • Too many chefs, etc. (Score:4, Informative)

    by Pope (17780) on Monday January 15 2007, @09:40AM (#17613244) Homepage
    Tagging, like anything else designed to be helpful, simply won't work if *anything* is allowed. For every person who tags something "correctly" in an effort to do good, how many people will deliberately mis-tag something to produce misleading results?

    Better to get rid of tagging altogether and go back to text searching! :)
  • Argh (Score:3, Funny)

    by eMbry00s (952989) on Monday January 15 2007, @10:10AM (#17613636)
    I think by "blogosphere", you really mean "internet".
  • there is a standard (Score:5, Interesting)

    by Yonder Way (603108) on Monday January 15 2007, @10:17AM (#17613718) Homepage

    There is a standard but nobody uses it these days. Even the search engines disavow it anymore.

    <META name="keywords" content="foo, bar, baz"/>
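
    Reading that element back out is straightforward with Python's standard `html.parser`. This is a minimal sketch: `KeywordParser` and `extract_keywords` are names made up here, and splitting on commas is the usual convention rather than anything a parser enforces.

```python
from html.parser import HTMLParser

class KeywordParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

def extract_keywords(html):
    """Return the list of keywords declared in the given HTML text."""
    parser = KeywordParser()
    parser.feed(html)
    return parser.keywords

page = '<html><head><META name="keywords" content="foo, bar, baz"/></head></html>'
print(extract_keywords(page))
```

    Because the values are comma-separated, multi-word keywords with spaces need no special encoding.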
  • Better hurry... (Score:4, Insightful)

    by supabeast! (84658) on Monday January 15 2007, @10:22AM (#17613792)
    If someone gets started on a tagging standard right now, it might see a little use before the whole silly idea goes out of style next year.
  • by AdamHaun (43173) on Monday January 15 2007, @11:00AM (#17614312)
    The only tags I like are my own. The real use of other people's tags is to show how they organize information, not to help me find something. The problems the article brings up are only the beginning -- the natural tendency of a global tagging system is for the number of tags applied to an object to increase without bound. If I'm doing a master's thesis on, say, web design, I might tag any number of sites "thesis". Is that useful to anyone else? Probably not. But it will interfere with someone who's searching for sites about writing theses.

  • I think for me the moment I realized that the idea of tags needed a little bit of work was the day I saw them on Amazon.com. I was viewing a product there, and it had been tagged "Presents for Jim".
  • by BovineSpirit (247170) on Monday January 15 2007, @01:20PM (#17616452) Homepage
    The rel-tag microformat [microformats.org] is an attempt to standardise tagging. It relies on other microformats to define what it is you are tagging. There isn't a 'photo' microformat at the moment, so you can't do a web-wide search for photos tagged 'fireworks' for example. If you're interested in the semantic web it's worth checking out microformats. You can download a plugin [mozilla.org] for firefox that reads microformats. Go and have a look at Flickr with it, or any other site that implements microformats. If people have tagged something with a 'geo' tag giving long. and lat. then it will bring up a Google Map showing the location. If they've included a 'hCard' around their contact details you can add it to your address book.
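
    In rel-tag, a tag is an ordinary link marked `rel="tag"`, and the tag text is the last path segment of the link's URL. A minimal extraction sketch using only the Python standard library follows; the example.com URLs are placeholders, and ignoring a trailing slash and treating `+` as a space are simplifying assumptions of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag == "a" and "tag" in rel_values and href:
            # The tag is the last segment of the URL path.
            path = urlparse(href).path.rstrip("/")
            last_segment = path.rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote(last_segment.replace("+", " ")))

def extract_rel_tags(html):
    """Return the tags declared by rel="tag" links in the given HTML text."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags

page = """
<a rel="tag" href="http://example.com/tags/fireworks">fireworks</a>
<a rel="tag" href="http://example.com/tags/new+year/">New Year</a>
<a href="http://example.com/about">about</a>
"""
print(extract_rel_tags(page))
```

    The link text is ignored on purpose: under rel-tag it is the URL, not the visible label, that carries the tag.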
    • Re: (Score:3, Insightful)

      Yeah, I agree. I personally feel tags are hyped way beyond their actual worth. I couldn't care less about 'conveying info across the blogosphere', but I'm genuinely interested in organising my own information more neatly (e.g. my bookmarks).
      Look at gmail, frinstance. Labels replace folders, and a mail can have more than one label. More importantly, they're predefined, and the interface doesn't really allow you to be prolific with your tagging.
      Compare this with the crappy way del.icio.us allows you to put a bi