
404-No-More Project Seeks To Rid the Web of '404 Not Found' Pages

Posted by Unknown Lamer
from the rm-is-not-forever dept.
First time accepted submitter blottsie (3618811) writes "A new project proposes to do away with dead 404 errors by implementing a new HTML attribute that will help access prior versions of hyperlinked content. With any luck, that means that you'll never have to run into a dead link again. ... The new feature would come in the form of introducing the mset attribute to the <a> element, which would allow users of the code to specify multiple dates and copies of content as an external resource." The mset attribute would specify a "reference candidate": either a temporal reference (to ease finding the cited version on, e.g., the Wayback Machine) or the URL of a static copy of the linked document.


  • by K. S. Kyosuke (729550) on Monday April 21, 2014 @07:10PM (#46810653)

    Also, when it comes to handling simple 404s, there could be a browser extension that would redirect you to archive.org. People would be able to use that on existing content. It's what I'm already doing manually, only this would be faster.

    By the way, I always thought that URIs were supposed to handle precisely this - that they were supposed to be unique, universally accessible identifiers for contents and resources - identifiers that, once assigned, wouldn't need to be changed to access the same contents or resources in the future. Oh, hell. Now we have to add extra layers on top of that?
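
The archive.org fallback suggested in this comment could be sketched roughly as follows. This is only an illustration in Python, not a browser extension: the helper names `wayback_url` and `resolve` are hypothetical, and the sketch assumes the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` address form, which serves the capture closest to the given timestamp. No network request is made.

```python
from datetime import datetime, timezone

# Assumed address prefix for Wayback Machine captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: datetime) -> str:
    """Build an archive URL asking for the capture closest to `when`.

    `when` should be timezone-aware; it is converted to UTC and
    formatted as YYYYMMDDhhmmss.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


def resolve(url: str, status: int, when: datetime) -> str:
    """Hypothetical helper: keep the original URL unless the server said 404."""
    return wayback_url(url, when) if status == 404 else url
```

Passing the date the link was written, rather than today's date, is what would approximate the "version cited" behaviour described in the summary.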

  • by tepples (727027) <tepples@nOSpAM.gmail.com> on Monday April 21, 2014 @07:24PM (#46810723) Homepage Journal

    I always thought that URIs were supposed to handle precisely this - that they were supposed to be unique, universally accessible identifiers for contents and resources - identifiers that, once assigned, wouldn't need to be changed to access the same contents or resources in the future.

    That's the intent: cool URIs don't change [w3.org]. But in the real world, URIs disappear for political reasons. One is the change in organizational affiliation of an author. This happens fairly often to documents hosted "for free" on something like Tripod/Geocities, a home ISP's included web space, or a university's web space. Another is the sale of exclusive rights in a work, invention, or name to a third party. A third is the discovery of a third party's exclusive rights in a work, invention, or name that make it no longer possible to continue to offer a work at a given URI.

  • Re:Um, 301 and 302 (Score:4, Interesting)

    by Cloud K (125581) on Monday April 21, 2014 @07:46PM (#46810877)

    Yes indeed. I took control of a site in 2007 and haven't knowingly broken a link since. Various restructures just led to more redirect entries in .htaccess, and if you somehow have an old 2007 link it should take you to the relevant page on today's site. It just needs disciplined webmasters.

    (I'm not the most creative of people and our marketing girls are not exactly the most constructive in dealing with other departments (such as making suggestions for improvement or even opening their mouths and telling me they don't like it in the first place), so they've decided to simply outsource it from under me. The new developers will no doubt break my lovely 7 year chain. But hey ho, that's life.)
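
The redirect discipline described in this comment amounts to keeping a permanent map from every retired path to its current home. The following is a minimal Python sketch of that idea, not actual .htaccess syntax; the function name `resolve_redirects`, the `max_hops` limit, and the example paths are all assumptions made for illustration.

```python
def resolve_redirects(path: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow permanent redirects until a path with no entry is reached.

    Raises ValueError on a redirect loop or when the chain has not
    ended after `max_hops` lookups.
    """
    seen = {path}
    for _ in range(max_hops):
        target = redirects.get(path)
        if target is None:
            return path  # no entry: this is the current location
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        seen.add(target)
        path = target
    raise ValueError("too many redirects")


# Each restructure only adds entries; old entries are never removed.
redirects = {
    "/2007/news.html": "/2010/news/",
    "/2010/news/": "/news/",
}
current = resolve_redirects("/2007/news.html", redirects)
```

Because entries are only ever added, a link from any earlier site layout still lands on today's page, at the cost of one lookup per restructure in between.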

  • As a user of both BitTorrent and Git and a creator of many "toy" operating systems which have such BT+Git features built in, I would like to inform you that I live in the future that you will someday share, and unfortunately you are wrong. From my vantage I can see that link rot was not ever, and is not now, acceptable. The architects of the Internet knew what they were doing, but the architects of the web were simply not up to the task of leveraging the Internet to its fullest. They were not fools, but they just didn't know then what we know now: Data silos are for dummies. Deduplication of resources is possible if we use info hashes to reference resources instead of URLs. Any number of directories AKA tag trees AKA human readable "hierarchical aliases" can be used for organization, but the data should always be stored and fetched by its unique content ID hash. This even solves hard drive journaling problems, and allows cached content to be pulled from any peer in the DHT having the resource. Such info hash links allow all your devices to always be synchronized. I can look back and see the early pressure pushing towards what the web will one day become -- Just look at ETags! Silly humans, you were so close...

    Old resources shouldn't even need to be deleted if a distributed approach is taken. There is no reason to delete things; is there not already a sense that the web never forgets? With decentralized web storage everyone gets free co-location, essentially, and there are no more huge traffic bottlenecks on the way to information silos. Many online games have built-in downloader clients that already rely on decentralization. The latest cute cat video your neighbor notified you of will be pulled in from your neighbor's copy, or if they're offline, then the other peer that they got it from or shared it with, and so on up the DHT cache hierarchy all the way to the source if need be, thus greatly reducing ISP peering traffic. Combining an HMAC with the info hash of a resource allows secured pages to link to unsecured resources without worrying about their content being tampered with: Security that's cache friendly.
    <img infohash="SHA-512:B64;2igK...42e==" hmac="SHA-512:SeSsiOn-ToKen, B64;X0o84...aP=="> <!-- Look ma, no mixed content warnings! -->
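
One possible reading of the infohash-plus-HMAC scheme above, sketched in Python. The comment does not say what the HMAC is computed over, so this sketch assumes it covers the infohash string and is keyed with a per-session secret; the function names `sign_reference` and `verify_resource` are hypothetical.

```python
import base64
import hashlib
import hmac


def infohash(data: bytes) -> str:
    """Base64 SHA-512 digest of the resource bytes."""
    return base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def sign_reference(session_key: bytes, ref: str) -> str:
    """Secure page side: bind the infohash to this session (assumed scheme)."""
    tag = hmac.new(session_key, ref.encode("ascii"), hashlib.sha512).digest()
    return base64.b64encode(tag).decode("ascii")


def verify_resource(session_key: bytes, ref: str, tag: str, fetched: bytes) -> bool:
    """Client side: accept bytes from any cache or peer only if both checks pass."""
    # 1. The fetched bytes must hash to the referenced infohash.
    if not hmac.compare_digest(infohash(fetched), ref):
        return False
    # 2. The reference itself must carry a valid tag for this session.
    return hmac.compare_digest(sign_reference(session_key, ref), tag)


key = b"session-token"
data = b"image bytes"
ref = infohash(data)
tag = sign_reference(key, ref)
```

Under these assumptions the resource can travel over an unsecured channel or come from an untrusted peer, since tampering with either the bytes or the reference makes `verify_resource` return `False`.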

    Instead of a file containing data, consider the names merely human readable pointers into a distributed data repository. For dynamism and updates to work, simply update the named link's source data infohash. This way multiple sites can be using the same data with different names (no hot linking exists), and they can point to different points in a resource's timeline. For better deduplication and to facilitate chat / status features, some payloads can contain the infohash of the resource they are a delta against. This way, changes to a large document or other resource can be delta compressed - instead of downloading the whole asset again, users just get a diff and use their cached copy. Periodic "squashing" or "rebasing" of the resource can keep a change set from becoming too lengthy.
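
The delta-against-an-infohash idea in this paragraph can be illustrated with a small Python sketch. The payload layout (a dict with `base` and `ops` keys) and the names `make_delta` and `apply_delta` are assumptions for illustration only; the diffing itself uses the standard library's `difflib.SequenceMatcher`, not whatever a real system would choose.

```python
import hashlib
from difflib import SequenceMatcher


def infohash(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()


def make_delta(base: bytes, new: bytes) -> dict:
    """Describe `new` as copies from `base` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse bytes from the cached base
        elif j2 > j1:                         # "replace" or "insert"
            ops.append(("data", new[j1:j2]))  # ship only the changed bytes
        # "delete" needs no op: the bytes are simply not copied
    return {"base": infohash(base), "ops": ops}


def apply_delta(cache: dict, delta: dict) -> bytes:
    """Rebuild the new version from a cached base and a delta payload."""
    base = cache[delta["base"]]               # KeyError: base copy not cached
    out = bytearray()
    for op in delta["ops"]:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog. " * 20
new = old.replace(b"lazy", b"sleepy", 1)
delta = make_delta(old, new)
rebuilt = apply_delta({infohash(old): old}, delta)
```

A client that lacks the base copy would fall back to fetching the full resource by its own infohash, and "squashing" would amount to publishing `rebuilt` as a fresh full payload.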

    Unlike Git and other distributed version controls, each individual asset can belong to multiple disparate histories. Optional per-site directories can have a time component. They can be more than a snapshot of a set of info-hashes mapped to names in a tree: Each name can have multiple info-hashes corresponding to a list of changes in time. Reverting a resource is simply adding a previous hashID to the top of the name's hash list. This way a user can rewind in time, and folks can create and share different views into the Distributed Hash Table File-system. Including a directory resource with a hypertext document can allow users to view the page with the newest assets they have available while newer assets are downloaded. Hypertext documents could then use the file system itself to provide multiple directory views, tagged for different device resolutions, paper vs eink vs screen, light vs dark, etc. CSS provides something similar, but why limit th
