Fixing Broken Links With the Internet Archive

Posted by Soulskill
from the maintain-URIs-or-T.B-L.-will-beat-you-up dept.
eggboard writes "The Internet Archive has copies of Web pages corresponding to 378 billion URLs. It's working on several efforts, some of them quite recent, to help deter or assist with link rot, when links go bad. Through an API for developers, WordPress integration, a Chrome plug-in, and a JavaScript lookup, the Archive hopes to help people find at least the most recent copy of a missing or deleted page. More ambitiously, they instantly cache any link added to Wikipedia, and want to become integrated into browsers as a fallback rather than showing a 404 page."
This discussion has been archived. No new comments can be posted.

  • Please no? (Score:3, Insightful)

    by DMiax (915735) on Friday January 24, 2014 @04:01PM (#46060923)

    ...want to become integrated into browsers as a fallback rather than showing a 404 page

    Fuck no. If a page does not exist it does not exist.

  • by mrchaotica (681592) * on Friday January 24, 2014 @04:05PM (#46060997)

    To everyone who might think of subverting the HTTP standard to "helpfully" show me an alternative to a page that does not exist: fuck you.

    I don't give a shit whether you're doing it because you want to advertise to me or because you want to altruistically show me what I'm looking for even if it doesn't exist anymore. Either way, you are still lying to me and breaking everything that relies on accurate error reporting. So quit it!

  • by Sarten-X (1102295) on Friday January 24, 2014 @04:10PM (#46061065) Homepage

    Supply HTTP code 404, and provide the content of the old page, preferably with a large banner saying "we couldn't find it, but here's what we had before".

    I believe that meets all applicable standards. Automated systems should recognize the 404 code, and human systems (which won't likely see the underlying code) will see the banner.
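A minimal sketch of the approach described above, assuming the archived HTML is already in hand as a string. The function name `not_found_with_archive`, the banner markup, and the sample page are all hypothetical; the only point is that the status stays 404 while the body carries the old content.

```python
from http import HTTPStatus

# Hypothetical banner markup; the wording follows the comment above.
BANNER = (
    '<div style="background:#fc0;padding:1em;font-weight:bold">'
    "We couldn't find it, but here's what we had before."
    "</div>"
)


def not_found_with_archive(archived_html: str):
    """Build a 404 response whose body is the archived page plus a banner.

    Returns (status, headers, body). Automated clients see the 404;
    people see the banner followed by the old content.
    """
    body = (BANNER + archived_html).encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return HTTPStatus.NOT_FOUND, headers, body


# Example with a made-up archived page.
status, headers, body = not_found_with_archive("<p>Old product page</p>")
```

A real server would write `status`, `headers`, and `body` to the socket in whatever framework it uses; that part is left out here.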

  • by barlevg (2111272) on Friday January 24, 2014 @04:18PM (#46061175)

    While I honestly think this is an awesome idea, I wonder, if this takes off, whether anyone who currently pays for web hosting of a static site will decide, "fuck it--it's backed up on Internet Archive. Might as well save the $N a month I pay to maintain the website and lease the domain name."
  • by Minwee (522556) <dcr@neverwhen.org> on Friday January 24, 2014 @04:43PM (#46061483) Homepage

    Sorry but that violates the standard as well. It must return a 404 or you break testing.

    RFC 2616 mandates a 4xx error code followed by an optional human readable reason phrase. While the reason phrase is usually "Not Found" for a 404 error, there's nothing keeping it from being augmented by "...but a copy of a previous version is over there."

    If your testing relies on anything beyond the numeric error code, then it's probably already broken.
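To illustrate the point about relying only on the numeric code, here is a small sketch that builds a status line with an augmented reason phrase and then parses it the way a robust client or test would. The helper names `status_line` and `parse_status_code` are hypothetical.

```python
def status_line(code: int, reason: str, version: str = "HTTP/1.1") -> str:
    """Format a status line: HTTP-Version SP Status-Code SP Reason-Phrase."""
    return f"{version} {code} {reason}"


def parse_status_code(line: str) -> int:
    """Extract only the numeric status code and ignore the reason phrase."""
    _version, code, *_reason = line.split(" ", 2)
    return int(code)


plain = status_line(404, "Not Found")
augmented = status_line(
    404, "Not Found, but a copy of a previous version is over there"
)

# Both lines carry the same code, so a check on the code alone is unaffected.
same_code = parse_status_code(plain) == parse_status_code(augmented)
```

A test that compares the whole status line as a string would fail on the augmented phrase, which is the fragility the comment above warns about.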

  • by pavon (30274) on Friday January 24, 2014 @07:21PM (#46063037)

    None of those examples should result in a broken link if you are maintaining your website correctly. And this feature is only "fixing" broken links, that is, links that once existed and are now 404'ed.

    If you want to discontinue a product, then replace those pages with one that explains that the product is discontinued, and provides links to similar current products, as well as the support page for the discontinued product. If users are clicking on links in reviews or forum posts about your old product and receive 404s, or redirection to a completely unrelated and unhelpful page on your site, they will be frustrated with or without this feature.

    In the second case, just redirect the entire demo website URL tree to a current list of examples.

    In the third case, you shouldn't do that without redirecting the old url to the new one. Seriously, are you trying to make your content hard to find?

    Again, redirect to the new menu.

    In no case is sending a user a 404 useful or beneficial, nor is it the most appropriate thing to do according to the HTTP standard. If you really want to be pedantic then send a 301 or 303 to perform the redirect, otherwise use URL rewriting, or just change the contents of the existing URL, whichever is easiest. The user should only see a 404 if they clicked an invalid link that was never a real URL for your website. Otherwise, you have failed your users, and it's no one's fault but your own if they choose to use a service that tries to make up for your shortcomings.
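A rough sketch of the redirect policy argued for above, assuming a site keeps a table of retired URLs. Every path in the tables is made up for illustration, and `resolve` is a hypothetical helper, not part of any framework.

```python
from http import HTTPStatus

# Hypothetical mappings; none of these paths come from the discussion.
EXACT_REDIRECTS = {
    "/products/widget-1000": "/products/discontinued/widget-1000",
    "/menu-2013.html": "/menu",
}
# Redirect a whole retired URL tree to one current page.
PREFIX_REDIRECTS = {
    "/demo/": "/examples",
}


def resolve(path, live_paths):
    """Return (status, location) for a request path.

    Live pages get 200, URLs that used to exist get a 301 to their
    replacement, and only URLs that were never valid get a 404.
    """
    if path in live_paths:
        return HTTPStatus.OK, None
    if path in EXACT_REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, EXACT_REDIRECTS[path]
    for prefix, target in PREFIX_REDIRECTS.items():
        if path.startswith(prefix):
            return HTTPStatus.MOVED_PERMANENTLY, target
    return HTTPStatus.NOT_FOUND, None


live = {"/menu", "/examples", "/products/discontinued/widget-1000"}
```

With this table, `resolve("/demo/2012/gallery", live)` yields a 301 to `/examples`, while a path that was never published falls through to 404.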
