Googlebot and Document.Write
With JavaScript/AJAX being used to place dynamic content in pages, I was wondering how Google indexes content that is written into a page with the JavaScript "document.write" method. I created a page with six unique words in it: two were in the plain HTML; two were written by a script embedded in the page itself; and two were written by a script sourced from a different server. The page appeared in the Google index late last night, and I just wrote up the results.
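For anyone who wants to picture the setup, here is a rough sketch of what such a test page could look like (the nonsense words and the external script URL are placeholders, not the ones actually used in the test):

    <html>
      <body>
        <!-- Two unique words directly in the plain HTML -->
        <p>blorfingale quexitron</p>

        <!-- Two words written by a script embedded in the page itself -->
        <script type="text/javascript">
          document.write('<p>dranvolic snupperish</p>');
        </script>

        <!-- Two more words written by a script loaded from a different server;
             words.js would contain something like:
             document.write('<p>crellobat vintrazzo</p>'); -->
        <script type="text/javascript" src="http://other-server.example.com/words.js"></script>
      </body>
    </html>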
How did this make the front page? (Score:2, Insightful)
Google request external JavaScript file? (Score:4, Insightful)
Re:Nonsense words? (Score:1, Insightful)
Seriously, he shouldn't have posted these words until he was done with the test.
Doesn't work; Good (kind of) (Score:5, Insightful)
The model for websites is supposed to work something like this: HTML provides the content and structure, CSS handles the presentation, and JavaScript adds optional behavior on top.
In other words, your web page should work for any browser that supports HTML. It should work regardless of whether CSS and/or Javascript is enabled.
So why would Google's crawler look at the Javascript? Javascript is supposed to enhance content, not add it.
Now, that's not to say many people don't (incorrectly) use JavaScript to add content to their pages. But maybe when they find out search engines aren't indexing that content, they'll change their practices.
The only problem I can see is with scam sites, where they might put content in the HTML, then remove/add to it with Javascript so the crawler sees something different than the end-user does. I think they already do this with CSS, either by hiding sections or by making the text the same color as the background. Does anyone know how Google deals with CSS that does this?
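For illustration only, the sort of trick being described might look like this (a made-up example, not from any real site):

    <!-- What the visitor actually sees -->
    <p>Welcome to our store.</p>

    <!-- Keyword stuffing that is present in the HTML a crawler fetches, but invisible to users -->
    <div style="display: none;">cheap widgets best widgets buy widgets now</div>
    <p style="color: #ffffff; background-color: #ffffff;">more hidden keywords</p>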
Re:Doesn't work; Good (kind of) (Score:3, Insightful)
Define "work". A web page without formatting is going to be useless to anyone who isn't a part-time web developer. To them, it's just going to be one big, messy looking freak out... akin to a television show whose cable descrambler broke. Sure all the "information" is there, somewhere, but in such a horrible format that a human being can't use it.
Web pages are dynamic these days. Saying that the only acceptable model is statically defined strict XHTML mixed with an additional layer of tableless CSS is foolish zealotry. With so much happening dynamically based on end-user-created pages, along with the somewhat annoying use of Flash, PowerPoint, or PDF for important information, you really can't create a comprehensive index without being a little flexible.
Saying that Google shouldn't take scripting into account when scanning pages is like saying it shouldn't index the PDFs that are online. Sure, they may not conform to what you believe are "good web coding standards," but the reality is that they're out there.
google.com/?q=slashdotting+in+google+dollars (Score:5, Insightful)
I think the actual experiment here is: how many Google ad dollars does a Slashdotting generate?
I look forward to the follow-up piece which details the financial results.
Re:How does document.write mess up your DOM tree? (Score:5, Insightful)
Re:Doesn't work; Good (kind of) (Score:3, Insightful)
Notice I keep putting the X in (X)HTML in parentheses. That's because I'm not convinced strict XHTML is the only viable method (though I'm not convinced it isn't -- I'm on the fence).
Re:If they weren't, then they're trying (Score:5, Insightful)
And if pages are designed using AJAX and dynamic rendering just for the sake of using AJAX and dynamic rendering... well, they deserve what they get.
I would make normal links, then use JS on top (Score:4, Insightful)
It's a nice improvement. Less bandwidth used, and a quicker interface.
Unfortunately, it's not often done right. The way I would do it is to first make the menu work like it normally would: make each menu item a link to a new page. Then you apply JavaScript to the menu item. (FYI, this is how I do pop-up windows, too.)
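Something along these lines -- just a sketch, where the id, URL, and pop-up size are made up:

    <!-- A normal link that works fine without JavaScript -->
    <a id="news-link" href="/news.html">News</a>

    <script type="text/javascript">
      // Enhance the link: open it in a pop-up when JavaScript is available.
      // Without JavaScript, the plain href still works (and crawlers can follow it).
      document.getElementById('news-link').onclick = function () {
        window.open(this.href, 'news', 'width=600,height=400');
        return false; // cancel the normal navigation
      };
    </script>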
Putting it behind a login screen doesn't solve all the problems. You're right that it won't be searchable anyway, but people with older browsers or screen readers won't be able to access it.
I think Gmail actually offers two versions: one for older browsers that uses little (or no?) JavaScript, and another that almost everyone else (including me) uses and loves. But I'm not sure how easy it would be to maintain two versions of the same code like that. I also don't think it's nice for the end user to have to choose "I want the simple version," though it may encourage them to upgrade to a newer browser, I guess.
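One generic way to offer that kind of fallback (just a sketch; I have no idea how Gmail actually does it) is a noscript link on the main page pointing at the plain-HTML version:

    <noscript>
      <!-- Browsers with JavaScript disabled get a link to the basic HTML version -->
      <p>JavaScript is off. <a href="/basic/">Use the basic HTML version</a>.</p>
    </noscript>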
(Of course this is all "ideally speaking", I realize there are deadlines to meet and I violate some of my own guidelines sometimes. I still think they're good practices, though.)
Re:google.com/?q=slashdotting+in+google+dollars (Score:5, Insightful)
Re:Doesn't work; Good (kind of) (Score:3, Insightful)
The model for websites is supposed to work something like this:
If only. Turn off JavaScript and try these sites:
Google holds back the web! (Score:2, Insightful)
Luckily blind people don't drive! (Score:1, Insightful)
Those selling professional web services should be liable under the ADA and similar laws; that's how we fix the web.
Re:google.com/?q=slashdotting+in+google+dollars (Score:3, Insightful)
It used to be that the web as a whole avoided this crap. Now, it's so easy to make stupid amounts of money from stupid content that a huge percentage of what gets submitted exists only for the money -- it's like socially acceptable spam. Digg is by far the worst confluence of this kind of crap, but the problem is web-wide, and damn near impossible to avoid.
Re:How does document.write mess up your DOM tree? (Score:3, Insightful)
Based on all the segfaults, blue screens of death, X Window crashes, Firefox crashes, code-insertion bugs, et cetera that I've seen, I'd say that no, in general programmers don't know what they're doing, and they certainly shouldn't be trusted not to fuck it up. The less raw access to any resource -- be it memory or the document stream -- they are given, the better.