
Schema.org — Google, Microsoft and Yahoo! Agree On Markup Vocabulary 192

Posted by Soulskill
from the executive-decision dept.
aabelro writes "Google, Microsoft and Yahoo! have decided to propose a common markup vocabulary, Schema.org, based on the Microdata format, simplifying the job of webmasters who want to give meaning to their web pages' content." Manu Sporny, chair of the W3C group that created RDFa, added his (personal) dissenting opinion about Schema, calling it a 'false choice,' and saying, "The entire Web community should decide which features should be supported – not just Microsoft or Google or Yahoo."

  • by SwedishChef (69313) <craig@@@networkessentials...net> on Monday June 06, 2011 @08:34PM (#36357836) Homepage Journal

    Microsoft will break this one, too.

    • Re:Not to worry... (Score:5, Insightful)

      by Anonymous Coward on Monday June 06, 2011 @09:00PM (#36358046)

      The proposal is itself breaking html. This time, Google and Yahoo are in with the "extending". The vague promise of better search positions will drive web developers to completely muck up their html output. There is no reason not to re-use the Dublin Core [wikipedia.org].

      • by Homburg (213427)

        So they're breaking HTML by following the HTML5 specification [w3.org]?

        • Microdata is not part of the HTML5 specification. Right at the top of your linked document it says:

          Status: Controversial Working Draft. ISSUE-76 (Microdata/RDFa) blocks progress to Last Call

          and then if you click on the issue link, you see:

          There will be a forthcoming HTML5+RDFa proposal that may either be published along-side the Microdata specification or in place of the Microdata specification. RDFa is an alternate technology that is currently published as a Recommendation via the W3C. An additional alternative that is being proposed is the removal of Microdata and RDFa from the HTML5 specification and the placement of each section into a separate specification that is implemented on top of the HTML5 standard.

          In addition, the charter for the HTML WG mentions:

          "The HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents. Whether this occurs through the extensibility mechanism of XML, whether it is also allowed in the classic HTML serialization, and whether it uses the DTD and Schema modularization techniques, is for the HTML WG to determine."

          The current microdata section precludes this Charter requirement.

      • HTML5 itself started off with WhatWG documenting the "breaking" browser specific extensions to HTML by various browsers... XmlHttpRequest is based on a non-standard MS active-x control. All of HTML itself is a series of non-compliant extensions later ratified... at least this time there are three disparate third parties behind it. If google, ms and yahoo are for it, it's probably not a bad thing... besides they've already been using this metadata for a while. though I think meta-* attributes to counterpa
      • To be fair, whatwg's "HTML is a living standard" bullshit was the first successful attempt at breaking HTML, and is in essence the exact same thing that these companies are proposing. So, it appears that everyone is trying to break up HTML to their own benefit.

  • by John Hasler (414242) on Monday June 06, 2011 @08:36PM (#36357856) Homepage

    Right. You've got to include Facebook.

  • by jader3rd (2222716) on Monday June 06, 2011 @08:51PM (#36357976)
    One of the reasons why Google was able to tromp AltaVista was that AltaVista's search was based completely on the MetaData tag of the HTML page, and Google ignored the MetaData tag. The reason why? Website administrators were putting false information into the MetaData tag in hopes of generating more web crawler search hits. Google decided to go off of what was actually being presented on the page, and we all found that to be more useful.
    • by Ruke (857276)
      I guess it remains to be seen whether content sites will actually implement this, or whether it'll just be another tool in the black-hat SEO bag. I can see how this may be useful on, say, Wikipedia, which is content-dense and could be rapidly renovated; however, I kind of get the feeling that Wikipedia doesn't really need the help getting to the top of Google's results list.
      • I don't know about this specific format, but for example e-commerce companies have been annotating their pages with semantic tags. Best Buy, for example, has annotated a huge amount of data with the Good Relations ontology.

        And I don't really see how this could be abused, except for the boost that Google gives to any semantically tagged pages - but that effect should wear off as most sites implement them too.

      • I guess it remains to be seen whether content sites will actually implement this, or whether it'll just be another tool in the black-hat SEO bag. I can see how this may be useful on, say, Wikipedia, which is content-dense and could be rapidly renovated; however, I kind of get the feeling that Wikipedia doesn't really need the help getting to the top of Google's results list.

        The problem with the concept of a semantic lies at the feet of human nature.

        The people who would benefit the most are not those th
      • It's not meant for hidden data but to mark a name as a name. People have been using microdata (or microformats) for a while. It hasn't really taken off because it needed to be standardised, as it is now. If there were any benefit to abusing the format, it would already have been exploited. I don't see any real benefit in relation to SEO other than things like indexing addresses and names more easily, so searching for a person may be easier.
    • "More is better, except for hidden text" - I think this is the key difference between this and meta tags - the emphasis is on adding markup to text/content you provide to the user, in a way that makes it more quantifiable to search engines. Meta tags weren't visible to the end user, and didn't particularly concern specific content, but rather pages as a whole. I mean, that isn't to say that this system won't be scammed, but it does at least have a different focus of providing context for extant data, not additional data from which to help create a context.
      • It used to be very common for pages to have a few KB of point size 1, white-on-white text at the end, containing popular search terms. This didn't affect Google, because Google's algorithm placed more emphasis on the pages that linked to you than on the content of your page. The problem with this, is that it's also subject to gaming (lots of sites that do nothing but link to pages with popular search terms). Google had an early advantage because everyone was exploiting flaws in their competitors' algorit
        • Sure, but if that became a problem then I'd imagine that a) It would show up to the user whenever Google, Bing or Yahoo presented a "Rich search result" from a page trying to game the system, and b) The search engines could look at the size of the text by analyzing the markup, and demote pages with a whole bunch of these tags surrounding really tiny text - bonus marks for detecting low contrast or unreadable text by analyzing the markup.

          There is also no suggestion that this markup will be the primary factor

          • Sure, but if that became a problem then I'd imagine that a) It would show up to the user whenever Google, Bing or Yahoo presented a "Rich search result" from a page trying to game the system,

            Not if it isn't near the top of the page.

            and b) The search engines could look at the size of the text by analyzing the markup, and demote pages with a whole bunch of these tags surrounding really tiny text - bonus marks for detecting low contrast or unreadable text by analyzing the markup.

            Well, that was how it was done in the mid-'90s. These days, you'd put the text in a SEO div that you would then remove via JavaScript. For extra points, you could remove it in response to a mouse move event or similar, so that even if the search engine's crawler used a JavaScript engine it would still see the fake text.

            There is also no suggestion that this markup will be the primary factor in ranking search results.

            The problem with that, is that metadata is only useful if things make use of it. If search engines are going to use it, that makes it worth a lot o

    • This isn't meant to replace the page's content, just to annotate it (point out the semantic structure). So that the page consumer can understand that "6/10" means a rating or that "John Smith" is a person's name.

    • That's exactly what I was thinking. It just makes more sense that search results should be based off of what is actually on the page, not what the developer wants you to think is on the page. Another problem I have is things like this (taken from the documentation on schema.org):

      <time itemprop="startDate" datetime="2011-05-08T19:30">May 8, 7:30pm</time>

      Is that really necessary? Is it that hard to parse that string into a valid timestamp? The only reason I can think of would be if someone want

      • In Phoenix ... <time datetime="2011-05-18T02:00:00Z">Next Friday at 7PM</time>

        Might be a better example... also, it allows for easier client-side reformatting in JS.
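
For anyone wondering what a consumer actually does with that attribute, here is a minimal sketch in Python, standard library only. It ignores the visible text for dating purposes and reads the machine-readable `datetime` attribute instead. The names `TimeExtractor` and `extract_times` are made up for this illustration and are not part of any schema.org tooling, and real crawlers are certainly more involved than this.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeExtractor(HTMLParser):
    """Collects (parsed datetime, visible text) pairs from <time datetime="..."> tags."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._pending = None  # datetime attribute of the <time> tag we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            # <time> tags without a datetime attribute are skipped (value stays None).
            self._pending = dict(attrs).get("datetime")
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "time" and self._pending is not None:
            parsed = datetime.fromisoformat(self._pending)
            self.events.append((parsed, "".join(self._text)))
            self._pending = None


def extract_times(html):
    """Hypothetical helper: return every (datetime, label) pair found in the markup."""
    parser = TimeExtractor()
    parser.feed(html)
    parser.close()
    return parser.events


snippet = '<time itemprop="startDate" datetime="2011-05-08T19:30">May 8, 7:30pm</time>'
events = extract_times(snippet)
print(events)
```

The label can say "May 8, 7:30pm" or "Next Friday at 7PM" and the consumer never has to guess. One caveat: older Python versions of `datetime.fromisoformat` reject a trailing `Z`, so a value like the Phoenix example would need extra handling there.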
      • by Eivind (15695)

        Yes it is, generally speaking, hard to parse strings into date and time.

        What date or time does "Tomorrow", "Next friday", "7/5/09", "6 o 'clock", or "directly after lunch" refer to ? Keep in mind that the document may not be current - and make sure to take into account different time-zones and different conventions for dates. (in particular, some odd countries like to print dates as M/D/Y where the least-significant part is in the *middle*)

        May 8, 7:30pm is better than average - but you're still left with th

        • and most of the world outside the US uses 31/12/1990, which is fine until you get to 1/2/1992

          • by Eivind (15695)

            Yeah. Though arguably even 31/12/1990 is suboptimal, it -does- have the advantage of not having the least-significant-part in the *middle*, but it still, for example, sorts wrong regardless of if you sort numerically or alphabetically.

            Generally, the most significant part should be -first-

            1990-05-30 is superior for this reason, it sorts correctly both numerically and alphabetically, and follows the general convention we have of having the most significant part come FIRST.

            URLs suffer from the same problem, wi

            • Let's not dredge up the bang path wars.

              And I'm not holding my breath for the world to switch to year first dates. US will go metric first.
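
The sorting point above is easy to check. A small Python sketch (the three sample dates are arbitrary, picked only for illustration): format the same dates year-first and day-first, sort the resulting strings as plain text, and compare against true chronological order.

```python
from datetime import date

# Arbitrary sample dates, deliberately listed out of order.
dates = [date(1992, 2, 1), date(1990, 12, 31), date(1991, 5, 30)]


def sorted_as_text(fmt):
    """Format each date with fmt, then sort the strings lexicographically."""
    return sorted(d.strftime(fmt) for d in dates)


iso_sorted = sorted_as_text("%Y-%m-%d")
dmy_sorted = sorted_as_text("%d/%m/%Y")
chronological = [d.strftime("%Y-%m-%d") for d in sorted(dates)]

print(iso_sorted == chronological)  # year-first text order matches date order
print(dmy_sorted)                   # day-first text order is driven by the day field
```

With these samples the day-first strings come out in reverse chronological order, because the least significant field is compared first.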


      • <time itemprop="startDate" datetime="2011-05-08T19:30">May 8, 7:30pm</time>
        <time itemprop="startDate" datetime="2011-05-08T19:30">Mai 8, 19:30</time>
        <time itemprop="startDate" datetime="2011-05-08T19:30">Mei 8, 19:30</time>

        <time itemprop="startDate" datetime="2011-07-08T19:30">July 8, 7:30pm</time>
        <time itemprop="startDate" datetime="2011-07-08T19:30">Jul 8, 7:30pm</time>
        <time itemprop="startDate" datetime="2011-07-08T19:30">Juillet 8, 19:30

      • by PybusJ (30549)

        Really. Do you have a reliable parser for common date formats that works in all languages and scripts? If so I'd be keen for a reference.

        Or is this a comment by someone with the typical ASCII-is-all-we-need view of the world?

      • by Chelloveck (14643)

        Yes, it is that hard to parse a timestamp. People are inconsistent. You might see "7:00" (unadorned), "7:00a", "7:00 AM", "7AM", "07:00", and "7 o'clock" all in the same document. And dates are nearly impossible. When is 1/2/11? January 2nd or February 1st? 2011 or 1911?

        Of course, I don't think this proposal is going to make things any better. If the display data was generated directly from the metadata, maybe. But as soon as someone touches it by hand you're going to see the two get out of sync. I

        • by SEE (7681)

          When is 1/2/11? January 2nd or February 1st? 2011 or 1911?

          Silly. It's clearly February 11th, the year 1. Or 1901. Or 2001.
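
Joking aside, the ambiguity is real, and a parser cannot resolve it without being told the convention. A quick Python sketch, assuming only the two most common readings (the dictionary keys are just labels for this illustration):

```python
from datetime import datetime

ambiguous = "1/2/11"

# Two plausible conventions for the same string.
conventions = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
}

readings = {
    name: datetime.strptime(ambiguous, fmt).date()
    for name, fmt in conventions.items()
}

for name, value in readings.items():
    print(name, "->", value.isoformat())
```

Both calls succeed and return different dates, so nothing flags the problem. The century is also a guess: Python's `%y` maps 00-68 to 2000-2068 and 69-99 to 1969-1999, which is why 1911 never even comes up.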

    • Website administrators were putting false information into the MetaData tag in hopes of generating more web crawler search hits. Google decided to go off of what was actually being presented on the page, and we all found that to be more useful.

      Yes, we found that to be more useful - until website administrators learned to put false information into what is actually being presented on the page.

    • Microdata format isn't meant to be meta data in the sense of html metadata tags. It's for marking names as a first name and a surname. It allows you to pull an address for example out of a site straight into your email client. I'm sure someone may try to abuse it for better search rankings but I don't think it would work that well.
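
To make "marking a name as a name" concrete, here is a rough Python sketch of how a consumer might pull `itemprop` values out of annotated markup. The class name `ItempropExtractor` and the sample HTML are invented for this illustration. It only handles a flat item with properly closed tags; nested `itemscope` items and unclosed void elements such as a bare `<br>` would need more care, and this is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collects itemprop -> visible text for a flat (non-nested) microdata item."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._stack = []  # one entry per open element: its itemprop name, or None

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost open element that carries an itemprop.
        for name in reversed(self._stack):
            if name is not None:
                self.properties[name] = self.properties.get(name, "") + data
                break


# Invented sample markup using the schema.org Person type.
sample = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">John Smith</span>, '
    '<span itemprop="jobTitle">Editor</span>'
    '</div>'
)

extractor = ItempropExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.properties)
```

The text the user sees and the text the machine reads are the same characters, which is the difference from the old meta tags: the annotation labels visible content rather than adding hidden content.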
  • Dammit (Score:5, Interesting)

    by Sloppy (14984) on Monday June 06, 2011 @09:16PM (#36358184) Homepage Journal

    I am a whore and have to do whatever the big guys say, because I want their traffic. Ok, so I admit it.

    But dammit, did it have to be microdata? I already mark up with microformat classes and RDFa (both the sort of standardized namespaces and Google's) and Google was handling it pretty well, and every once in a while it looked like Yahoo grokked it too. Microdata was the ugly stepchild third choice, the least well-supported one, with the fewest number of parsers out there in the wild. So I left that one out, because nobody cared. Now it's going to be The One?

    I have better things to do than add Yet Another fucking attribute to my generated HTML which is already bloated with otherwise unnecessary classes and properties and typeofs. Now I'm going to have itemscope and itemtype attributes too, huh? Just how many characters long can we make each element become, just so that everything can make sense of it? Fuck you guys. No seriously, fuck you. Yes, I'm going to do it anyway, but even so, fuck you.

    • I was really expecting RDFa to win the competition; it already had a decent user base and it's much more flexible and useful.

    • That there is what you call a "compromise candidate" - the one everyone objects to the least. Surprising though that Google, Microsoft & Yahoo got together on something like this outside the context of an industry group like W3C.

      As to your better things to do, go ahead and do them - I assume you're a working web developer, so this really can be viewed as a revenue-generating opportunity. Think of it as a chance to tack an extra "SEO structuring" charge on top. If you're not doing them, I know I will!

    • Do you really have better things to do?
      • by Sloppy (14984)

        Ouch! Damn, that's cold.

      • by omfgnosis (963606)
        Even masturbating furiously over the standard we wish was adopted is a better thing to do than implement the standard we wish wasn't adopted.
    • by msclrhd (1211086)

      I'm ignoring everything except RDFa on my site. I took the decision of dropping the HTML5 markup for HTML+RDFa and getting the pages validating properly (still using CSS3, though).

      It would be great if Google had support for DOAP (Definition of a Project) for open source projects and read that through RDFa.

    • Good comments - thanks. You sound like someone who could help me: what's the difference between microformats and microdata? I thought the two were synonymous until this recent announcement and now lots of people are talking about them as if they are different. I've been googling but the conversation still seems pretty confused. What I think I make out is that microdata is a specific implementation of using microformats, designed to handle many use-cases that RDFa handles for the web?

      Any help would be apprec

  • I say 'be careful with Microsoft' because if my memory serves me well, Microsoft had some agreement with now defunct SUN Microsystems over Java and its use...that was until SUN realized that Microsoft had a hidden agenda [internetnews.com].

    Nothing will prevent Microsoft from attempting to pull off what I will call a 'SUN moment.'

  • FTTOS: [schema.org]

    Terms of service

    This is a contract between you and each of the sponsors of Schema.org: Google, Inc., Yahoo, Inc., and Microsoft Corporation (referred to collectively in this agreement as the "Sponsors", "we" or "us"). By using the Schema.org website (the "Website") you agree to be bound by the following terms and conditions (the "Terms of Service").

    Changes in Website and Terms and Conditions; Change in Schema

    We may modify or terminate the Website, for any reason, and without notice. We also reserve the right to modify these Terms of Service from time to time without notice, and you expressly agree to be bound by such modifications when posted on the Website.

    This legalese basically says: By using the schema.org website, (esp. their schemas) you agree to whatever we want forever. THE END.

    Even Facebook's horrid TOS agreement is better for you than this, at least you can terminate Facebook's agreement.

    I for one rebel against our Gigantic Corporate Lawyer-wielding privacy-and-competition-hating overlords. If I can't get past the TOS page, I'll just stick to RDFa. Just added "0.0.0.0 schema.org" to my hosts file just in case I get link-baited into agre

    • Re:It's a Trap! (Score:5, Interesting)

      by Raenex (947668) on Monday June 06, 2011 @10:37PM (#36358736)

      You're right, it is a trap, but it gets worse:

      The short summary: The "Sponsors" (read: cartel) may have patents on this crap. You can, for now, use the crap royalty free for markup only if you follow the standard. Non-cartel search engines are not granted such rights. In addition, future versions may not be royalty free. Your existing markup is safe, but any new versions or pages won't be.

      The actual fine print:

      In addition, if the Sponsors have patent claims that are necessarily infringed by including markup of structured data in a webpage, where the markup is based on and strictly complies with the Schema, they grant an option to receive a license under reasonable and non-discriminatory terms without royalty, solely for the purpose of including markup of structured data in a webpage, where the markup is based on and strictly complies with the Schema. [..] Notwithstanding the foregoing, the Sponsors agree that no change that we make to these Terms of Service will terminate or modify the license granted under paragraph 1 above with respect to any use or implementation of the Schema occurring prior to the date that the change is published.

  • Can someone explain to me why there is a need for a separate metadata vocabulary?
    Wasn't this the issue that XML, XSD, XSLT and XSLT-FO were supposed to address? Document verbiage aside, don't these families adequately cover the issue of structure and semantics?
    If the issue is to teach the browser/search engine, the document semantics -- can't they (MS,Yahoo,Google) actually parse XML for common dictionary words and build semantics themselves? Why make humans do all the tedious annotations? They can probably p
    • by Homburg (213427)

      XML, XSD, XSLT and XSLT-FO

      Which of those have anything to do with semantics?

      • XML, XSD, XSLT and XSLT-FO

        Which of those have anything to do with semantics?

        True, more to do with structure than semantics, but usually semantics can be derived from structure, if the structure is meaningful.

        For example: <Thing> <Place> <Volcano></Volcano></Place></Thing> . . .


        A meaningful structure in XML can itself lead to semantics. XSLT, XSL-FO can then just transform it to whatever flavor.

        What I am trying to understand here is that -- why do we have to micro-annotate everything? Can't search engines/browsers do

  • I hate to say this but when you have Bing, Google, and Yahoo saying that if I clean the dishes, use microdata, I can get laid in a search engine sense, I'm sorry the dishes will be cleaned. I don't know if this is live but let's be honest, they have us by our proverbial search engine balls. The days of free and fair elections, I mean fair SEO are as dead as Rep. Weiner's political career. Do you realize that he never had sex with any of those women. I'm not sure about you but if I'm going to ruin my car
    • My wife said I couldn't use the c work.
      • This is why I shouldn't post at 12:43 am. My wife said I couldn't use the c word. Although I think it is fair to say the c word does require a good deal of work.
  • Semantic Web....

    I hope the companies would just put their efforts in creating a semantic web, instead of trying to hack-patch html by adding random meta-data for the purpose of search. Seriously.. focus people!

    Focus!

  • Is if Amazon, Walmart and eBay decided to come up with a common tagging system for shopping searches.

    This does not change how web pages are displayed, only how they are tagged for search engines.
    Having the top 3 search engines in the world come to a common agreement is not a bad thing.
    These three search engines have represented 95%+ of the search engine market for the last 5 years.

    When 95% of the market decides on a common standard that is THE standard regardless of any hand waving
    by the "de jure" standards b

    • Hear hear. I don't have to like it but I'm going to have to live with it. And if FB hops onto this bandwagon, it's totally finito.

  • You are not the "entire web community". You seem to have not realised this last time, when everyone implemented the WHATWG's HTML standard instead of your XHTML 2.0 pet project. Please get relevant or get bent.

  • HTML and CSS are used as a presentation markup language. Adding "meaning" to those is approaching it backward. First, mark up the document / data, using XML or RDF (for this argument the preference doesn't necessarily matter). Then, use XSLT to provide a transform into the presentation language of your choice. I guess we could argue that XHTML _is_ XML at its core, though, so adding attributes to add "meaning" might be doing what I said, anyway. As far as 'why are the big companies agreeing to this,
  • by mrthoughtful (466814) on Tuesday June 07, 2011 @01:40PM (#36364678) Journal

    Manu Sporny, [...] (said), "The entire Web community should decide which features should be supported – not just Microsoft or Google or Yahoo."

    So just who is the entire Web community? It certainly isn't W3C, who effectively bar individuals and SMEs with their $8000 annual membership fees.
    The corporations are only interested in establishing or brokering leverage.
    The IETF isn't the easiest means of establishing support for a feature, and not many of us have read all 6000 odd RFCs anyhow.

    So, basically, who cares what schema org says, or Manu Sporny for that matter?
    Since when has anyone been able to make a change to the status quo?
