Google Launches Google Sitemaps
Posted by Zonk on Fri Jun 03, 2005 09:55 AM
from the please-stop-innovating dept.
Ninwa writes "Google has launched Google Sitemaps. It seems to be a service that allows webmasters to define how often their sites' content is going to change, to give Google a better idea of what to index. It uses some basic XML as the method of submitting a sitemap. More information on the protocol is available in an FAQ. What's most interesting is that Google is licensing the idea under the Attribution/Share Alike Creative Commons license. According to the Google Blog, this is being done '...so that other search engines can do a better job as well. Eventually we hope this will be supported natively in webservers (e.g. Apache, Lotus Notes, IIS).' They even offer an open source client in Python."
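For anyone who hasn't opened the FAQ yet: a sitemap file carries four fields per URL (location, change frequency, last modified, priority, as one commenter lists further down), so it can be produced with nothing but Python's standard library. This is a minimal sketch, not Google's Python client. The helper name build_sitemap is made up, and while the element names are as I understand the protocol, the namespace URI below is an assumption worth checking against the FAQ.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the 0.84 version of the protocol; verify against the FAQ.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

def build_sitemap(entries):
    """Serialize entries into a <urlset> sitemap document (hypothetical helper).

    Each entry is a dict with a required "loc" and optional "lastmod",
    "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    # ElementTree escapes characters such as "&" inside URLs automatically.
    return ET.tostring(urlset, encoding="unicode")
```

Typical use would be to prefix the result with an XML declaration, save it as sitemap.xml and submit it as the FAQ describes.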
great interview (Score:5, Informative)
http://blog.searchenginewatch.com/blog/050602-195
More unabashed Google loving... (Score:5, Funny)
I guess the rest of the world has a long way to go to catch up...
Search Engine (Score:3, Funny)
They had white pages
And hits by the score
All the people's queries
Waiting by the door
Ooooh, what a search engine it was
Ooooh, what a search engine it was
Many geeks and hackers
They made up its core
Everybody's dearest
A daily stop for more
Ooooh, what a search engine it was
Ooooh, what a search engine it was
It went to the market
Of the engines it was king
Of his honor and his glory
Slashdot would sing
Ooooh, what a search engine it was
Ooooh, what a search engine it was
A burst had found it
Cool idea (Score:5, Interesting)
For example, a website we launched a couple of months ago is primarily images. We played nice - all of the images have legitimate alt tags, and we tried to let the site degrade properly in older browsers (although you really wouldn't get much in those instances).
But the biggest problem we had was getting the site spidered by Google. It was spidered, and it did appear in the index, but it was listed far below sites that linked to it. I don't believe Google likes sites that are primarily images. We populated meta tags with descriptions, but they weren't included; we even tried using hidden text - legitimate hidden text that would serve as the site's description without breaking the design - but you know how Google feels about that sort of thing. We had to walk a fine line. This'll be nicer.
Re:Cool idea (Score:2, Interesting)
Re:Cool idea (Score:4, Informative)
Quite right, a new site can be listed in the Google index pretty quickly -- it only took a few days for my latest site to be found by the Googlebot -- but it takes a while before any PageRank gets assigned to its pages, especially if there are no inbound links to the site. No PageRank, no top listing...
Eric
Currently at #1 for adsense tips [google.com]
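To see why "no inbound links, no PageRank" holds, look at the simplified, textbook form of PageRank: power iteration with a damping factor of 0.85, as in the original published description. This is not whatever Google runs in production, and the function name and graph format are illustrative. In this form, a page nobody links to never gets more than the baseline (1 - d)/N, however good its content is.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With {"a": ["b"], "b": ["a"], "new": ["a"]}, the page "new" links out but nothing links in, so it stays at 0.15 / 3 while "a" and "b" absorb the rest.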
Re:Cool idea (Score:5, Informative)
Re:Cool idea (Score:4, Informative)
It's quite common to be high up for matching terms for about a week, then disappear for three months or so. This seems to be normal behaviour for new sites; it is nicknamed the Google sandbox [google.com], and it seems to have been confirmed by the patent application recently made public.
The sandbox is just an artificial lowering, so if you're a match for a rare term you can still be found quite easily.
Re:Cool idea (Score:3, Informative)
This is how the system works. Google can index your site very quickly (within a couple of days), if you have an incoming link or submit to their crawler. If your site is well keyword optimized for a fairly rare keyword, it is entirely plausible that it would come up number one fairly quickly.
What takes a long time is for Google to update its PageRank index. This is where your site will sit in the Go
fuckedgoogle.com anyone? (Score:2, Interesting)
http://www.fuckedgoogle.com/ [fuckedgoogle.com]
Sitemaps abuse? (Score:3, Insightful)
Re:Sitemaps abuse? (Score:2)
Re:Sitemaps abuse? (Score:3, Interesting)
Re:Sitemaps abuse? (Score:2)
I've not seen anything to suggest sitemaps will improve your ranking, just get you indexed more often.
If you claim pages update every day but they don't, it will be pretty easy for the spider to tell. So if, after say a month, the supposed daily updates never happened, it could stop the frequent scans, since they aren't really needed.
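The check described here could look like a per-URL tracker that hashes each fetched copy and backs off once a month's worth of fetches shows no change. This is a hypothetical policy for illustration, not how Googlebot schedules crawls. The class name, the interval table, the 30-fetch threshold and the 4x back-off are all assumed.

```python
import hashlib

# Hypothetical mapping from a claimed <changefreq> to a revisit interval in seconds.
CLAIMED_INTERVAL = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 7 * 86400,
    "monthly": 30 * 86400,
}

class ChangeTracker:
    """Track whether a URL actually changes as often as its sitemap claims."""

    def __init__(self, claimed_freq, min_fetches=30, backoff=4):
        self.interval = CLAIMED_INTERVAL[claimed_freq]
        self.min_fetches = min_fetches  # roughly a month of daily fetches
        self.backoff = backoff
        self.last_digest = None
        self.fetches = 0
        self.changes = 0

    def record_fetch(self, content: bytes):
        """Hash the fetched body and count it as a change if it differs."""
        digest = hashlib.sha1(content).hexdigest()
        if self.last_digest is not None and digest != self.last_digest:
            self.changes += 1
        self.last_digest = digest
        self.fetches += 1

    def next_interval(self):
        """Seconds to wait before the next fetch."""
        if self.fetches >= self.min_fetches and self.changes == 0:
            # The claimed updates never happened: stop scanning so often.
            return self.interval * self.backoff
        return self.interval
```

A raw content hash treats any byte change (a timestamp, a rotating ad) as an update, so a real crawler would probably need a more forgiving comparison.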
Re:Sitemaps abuse? (Score:2, Interesting)
Anyway, a brief look at the proposed format shows very little scope for abuse: you can specify location, change frequency, last modified date and priority, and that's it. The priority is specified as applying only to URLs from the same site, so what you can do with it is fairly limited. Overall, it looks like it was written as a set of additional hints to spiders crawling the site.
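A spider that treats the file as hints rather than commands would read only those fields and discard anything out of range. Here is a hedged sketch. parse_sitemap is a made-up name, and the allowed changefreq words (always, hourly, daily, weekly, monthly, yearly, never), the 0.0 to 1.0 priority range and the 0.5 default are as I understand the protocol and worth checking against the FAQ.

```python
import math
import xml.etree.ElementTree as ET

ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def parse_sitemap(xml_text):
    """Return one dict per <url>, keeping only the four hint fields."""
    root = ET.fromstring(xml_text)
    # Reuse whatever namespace the document declares, e.g. "{...}urlset".
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    entries = []
    for url in root.findall(ns + "url"):
        loc = (url.findtext(ns + "loc") or "").strip()
        if not loc:
            continue  # <loc> is the one required field; skip entries without it
        freq = url.findtext(ns + "changefreq")
        if freq is not None:
            freq = freq.strip().lower()
        if freq not in ALLOWED_CHANGEFREQ:
            freq = None  # unknown or missing value: ignore the hint
        try:
            priority = float(url.findtext(ns + "priority", "0.5"))
        except ValueError:
            priority = 0.5
        if not math.isfinite(priority):
            priority = 0.5
        priority = min(max(priority, 0.0), 1.0)  # clamp to the allowed range
        entries.append({"loc": loc, "lastmod": url.findtext(ns + "lastmod"),
                        "changefreq": freq, "priority": priority})
    return entries
```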
Re:Sitemaps abuse? (Score:4, Informative)
First, the priority is a relative priority, so if you set every page to 1.0 (defined as the highest priority) it'll mean nothing.
Second, if you lie about update frequency or the date of the last update, they'll figure it out pretty quickly.
These aren't commands, they're hints.
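Because priority only ranks a site's URLs against each other, one plausible use is ordering a single site's crawl queue. Here is a sketch under that assumption. crawl_order is a made-up name, and treating a missing priority as 0.5 is my reading of the protocol. With every page at 1.0, the result is simply the order the pages were listed in, which is the "it'll mean nothing" case.

```python
def crawl_order(entries):
    """Order one site's URLs by declared priority, highest first.

    Python's sort is stable (also with reverse=True), so pages with equal
    priority keep the order in which they were listed.
    """
    ranked = sorted(entries, key=lambda e: e.get("priority", 0.5), reverse=True)
    return [e["loc"] for e in ranked]
```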
While they are at it, maybe new meta tags? (Score:2)
Re:While they are at it, maybe new meta tags? (Score:2)
I think the "coverage" tag would probably be what you're looking for.
Reinventing the wheel? (Score:2)
Reminds me of blog pings - what's wrong with using the Referer header? Do some checking, then fetch the referring page and check for linkage?
Has the world gone XML crazy?
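The Referer idea can be sketched with the standard library: given the referring page's URL and its HTML, check whether any link on it resolves to your page. This is only a sketch. page_links_to is a made-up name, and fetching (for example with urllib.request.urlopen) is left out so the check itself stays testable. Re-fetching matters because the Referer header is supplied by the client and can be forged.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class _LinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the referring page's URL.
                self.links.add(urljoin(self.base_url, href))

def page_links_to(referrer_url, referrer_html, target_url):
    """True if the referring page really contains a link to target_url."""
    collector = _LinkCollector(referrer_url)
    collector.feed(referrer_html)
    collector.close()
    return target_url in collector.links
```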
Has the world gone XML crazy? (Score:2)
Just think of this sort of thing as inter-linking web services sitting on top of the HTTP protocol.
Justin.
Google is IT's Willy Wonka (Score:5, Funny)
Re:Google is IT's Willy Wonka (Score:3, Informative)
I can neither confirm nor deny the existence of any secret video game testing rooms.
-B
Creative Commons Meme (Score:2, Informative)
http://www.realmeme.com/miner/preinflection.php?s
How is this a win-win? Here's how.... (Score:2)
LiveJournal.com has had a number of problems with Google, and often just plain bans them outright from spidering the site. Part of the problem is that all registered users have their journals at journalname.livejournal.com as well as livejournal.com/users/journalname. This means indexing the journals of registered users doubles the load on their server farm!
With something like this, LiveJournal would be able to define exactly how often the indexing process occurs,
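A sitemap could help with the duplicate-URL problem if it listed each journal under a single canonical address. Here is a sketch assuming the two URL forms described above. canonical_journal_url and dedupe_journal_urls are made-up helpers, and choosing the subdomain form as canonical is an arbitrary choice. A sitemap only tells the crawler which URLs you would like fetched, so on its own it would not stop the spider from following links to the other form.

```python
from urllib.parse import urlsplit

MAIN_HOSTS = {"livejournal.com", "www.livejournal.com"}

def canonical_journal_url(url):
    """Map livejournal.com/users/NAME/... onto NAME.livejournal.com/...

    Query strings and fragments are dropped in this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    path = parts.path
    if host in MAIN_HOSTS and path.startswith("/users/"):
        name, _, rest = path[len("/users/"):].partition("/")
        if name:
            return f"http://{name.lower()}.livejournal.com/{rest}"
    if host.endswith(".livejournal.com") and host not in MAIN_HOSTS:
        return f"http://{host}{path or '/'}"
    return url  # anything else is left untouched

def dedupe_journal_urls(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    unique = {}
    for url in urls:
        unique.setdefault(canonical_journal_url(url), None)
    return list(unique)
```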
Re:How is this a win-win? Here's how.... (Score:2)
Re:How is this a win-win? Here's how.... (Score:2)
robots.txt (Score:2)
Google Evil Index (Score:5, Funny)
Lotus Notes? (Score:2)
(The submitter probably meant Lotus Domino, which is still a bad webserver, but not nearly as bad as Notes would be.)
Re:Lotus Notes? (Score:2)
great idea (Score:2)
Or maybe another hidden use... (Score:4, Insightful)
Marketplace of Ideas (Score:2)
And I'm willing to license my idea, "better search engines with better user interfaces", to Google, for a modest sum.
what's the basis of the license? (Score:2)
In any case, patented or not, the CC license that this falls under seems acceptable for an open standard, because it is transferable and because its requirements are minimal. Contrast this with the Microsoft Office XML license, which is royalty-free (for now...) but non-transferable.
Darn it (Score:2)
Will wait until I get my new server.
What does Creative Commons mean here? (Score:2)
More proof that Google isn't Netscape (Score:2)
The thing that seems so cool about this is that it opens up the search service to the rest of us, helping us make our content easier to find when it is updated. One thing that I have come to really respect about Google is that they don't rely on the government to beat Microsoft back down the way Netscape did. Google has managed to make a product that 47% of US Internet users want to use, even though MSN is the default in IE. Remember Netscape 4? There's a reason that bloated POS failed, any
502 Server Error! (Score:4, Funny)
Why not just use rss/atom? (Score:3, Insightful)
Re:Why not just use rss/atom? (Score:3, Interesting)
insight into unlinked directories (Score:3, Informative)
essentially using "find" and "grep" alone, but this tool is much better, faster and easier to configure. Cool.
Note that this tool will allow Google to reach files that would never be found by spidering a site, because the files are not linked. If you include something like
<directory path="/var/www/html" url="http://www.example.com/" />
in your config.xml and run "sitemap_gen.py" on it, you will give the world access to a large amount of material (like test versions of your website or source code you did not want to make accessible). We might see a lot more material that had been 'hidden'.
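The risk described here follows from how a directory-based generator works: it enumerates files on disk, not links. The sketch below is illustrative only. files_to_urls and its exclude_dirs parameter are made up, and this is not sitemap_gen.py's code or config syntax. It shows that an unlinked test copy gets listed unless you explicitly leave it out.

```python
import os
from urllib.parse import quote

def files_to_urls(root_dir, base_url, exclude_dirs=()):
    """Map every file under root_dir to a URL under base_url, linked or not."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude_dirs)
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            # Use "/" separators and percent-encode spaces and similar characters.
            urls.append(base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/")))
    return urls
```

Nothing in the walk asks whether any page links to a file, which is exactly why the exclusion has to be explicit.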
Re:IIS? (Score:2)
Remember that MS doesn't have a monopoly on web servers, so they can't be dicks about it like they can with everything else.
Re:Off Topic, Yeah, But I Am So-o-o-o Googled Out (Score:3, Insightful)
Well, maybe if Google stopped doing stuff for a while?
Lots of Slashdotters seem interested in what Google does, either because it tends to be neat, or so they can worry about privacy and the info Google potentially has access to.
Re:Still in Beta (Score:2)
Just about all of Google seems to be in beta. While it is nice to get the stuff early, "beta" is a pretty meaningless term as far as Google stuff is concerned.
Re:Still in Beta (Score:2)
Re:How does this benefit me? (Score:5, Insightful)
It benefits you because:
Also, you wouldn't necessarily have to maintain more than one sitemap. You could use XSLT to create the sitemap.html file for your site from the XML file you create for Google. In fact, wouldn't it be nice for Web authoring tools to do this automatically for you?
Eric
Make Easy Money with Google: The Blog [makeeasymo...google.com] (powered by blojsom [sf.net])
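Python's standard library has no XSLT processor, so the sketch below stands in for the XSLT step by turning the XML sitemap into a plain sitemap.html list of links. sitemap_to_html is a made-up helper. With an XSLT processor (for instance the third-party lxml package), a stylesheet would do the same job.

```python
import html
import xml.etree.ElementTree as ET

def sitemap_to_html(xml_text, title="Site map"):
    """Render every <loc> in a sitemap as a link in a simple HTML page."""
    root = ET.fromstring(xml_text)
    # Reuse whatever namespace the sitemap declares, e.g. "{...}urlset".
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    items = []
    for loc in root.iter(ns + "loc"):
        url = (loc.text or "").strip()
        if url:
            safe = html.escape(url, quote=True)  # "&" becomes "&amp;" again
            items.append(f'  <li><a href="{safe}">{safe}</a></li>')
    heading = html.escape(title)
    return (
        f"<html><head><title>{heading}</title></head><body>\n"
        f"<h1>{heading}</h1>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body></html>\n"
    )
```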
Re:How does this benefit me? (Score:3, Interesting)
Google immediately knows that the site exists, immediately knows how many pages there are, how often they are supposed to change, AND what priority I place on them, so out of my 150 pages, the 10 I want spidered first are labeled as higher priority.
This makes total sense to me.
Re:SHUT UP SHUT UP SHUT UP (Score:2)
Re:Next thing you know... (Score:4, Informative)
As long as everyone can freely and voluntarily use these specs without having to pay anything, how is this a bad thing?
Re:Next thing you know... (Score:5, Insightful)
Nice. Google proposes a way to help web site administrators have a bit more control over how their site is perceived by a search engine, releases this proposal under an open source license, and at least a few people on slashdot accuse them of (*pinky to corner of mouth*) taking over the internet.
Most of Google's recent actions have been good things -- sponsoring open source developers for the summer, proposing ways for site administrators to provide additional info about their sites, and implementing a "nofollow" option to keep spammers from boosting their page ranking. However, if they constantly get criticized and second-guessed for doing good things, what incentive do they have to continue? If you give a charity $20 and they criticize you for not giving them $30, are you ever going to give anything to that charity again?
Let's give Google the benefit of the doubt. Just like a person, they'll probably make some mistakes, but like a person I'll give them the benefit of the doubt until they prove me wrong. Some corporations do actually do good things and still manage to be successful, and in those cases they should be supported, not attacked.
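The "nofollow" option mentioned in this comment is the rel="nofollow" link attribute. The sketch below shows how a crawler might separate such links from ordinary ones while extracting them. FollowableLinks is a made-up name, and it does not model how any real engine weights the two groups.

```python
from html.parser import HTMLParser

class FollowableLinks(HTMLParser):
    """Split <a href> links into followed and rel="nofollow" groups."""

    def __init__(self):
        super().__init__()
        self.follow, self.nofollow = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return  # anchors without href (e.g. <a name=...>) are not links
        # rel may hold several space-separated tokens, in any letter case.
        rel = (attrs.get("rel") or "").lower().split()
        (self.nofollow if "nofollow" in rel else self.follow).append(href)
```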