When the Internet Archive Forgets (gizmodo.com) 71
A reminder that the Internet Archive's Wayback Machine, which many people assume keeps a permanent trail and origin of web-content, has little feasible choice but to comply with DMCA takedown notices. As a result, a portion of the archive of things people submit to the website continues to quietly fade away. Gizmodo: Over the last few years, there has been a change in how the Wayback Machine is viewed, one inspired by the general political mood. What had long been a useful tool when you came across broken links online is now, more than ever before, seen as an arbiter of the truth and a bulwark against erasing history. That archive sites are trusted to show the digital trail and origin of content is not just a must-use tool for journalists, but effective for just about anyone trying to track down vanishing web pages. With that in mind, that the Internet Archive doesn't really fight takedown requests becomes a problem. That's not the only recourse: When a site admin elects to block the Wayback crawler using a robots.txt file, the crawling doesn't just stop. Instead, the Wayback Machine's entire history of a given site is removed from public view.
In other words, if you deal in a certain bottom-dwelling brand of controversial content and want to avoid accountability, there are at least two different, standardized ways of erasing it from the most reliable third-party web archive on the public internet. For the Internet Archive, like with quickly complying with takedown notices challenging their seemingly fair use archive copies of old websites, the robots.txt strategy, in practice, does little more than mitigating their risk while going against the spirit of the protocol. And if someone were to sue over non-compliance with a DMCA takedown request, even with a ready-made, valid defense in the Archive's pocket, copyright litigation is still incredibly expensive. It doesn't matter that the use is not really a violation by any metric. If a rightsholder makes the effort, you still have to defend the lawsuit.
Move to Canada (Score:5, Interesting)
They should move to Canada as we have an exemption for archives which would allow the content to remain.
Re: (Score:2)
A distributed web archive system, like BitTorrent, where different systems archive different webpages and can be searched in the same way.
Re: (Score:2)
They have back bacon, poutine, Molson's, The Odds, and toques. What's not to like, other than Avril Lavigne?
Re: (Score:3)
I like all kinds of people of good will. Assholes who won't grow the fuck up, not so much.
In which group do you think you belong?
Hellooo NAFTA! Hello digital dark ages! (Score:2, Interesting)
You will find out what those agreements were made for.
This happens all over the world.
Scientists already call it the "digital dark ages".
Fun fact:
There was a time when Germany didn't have any such laws, but the UK did.
The UK's creative scene suffocated, while Germany's creative scene flourished so much that this is where it got its title "the land of poets and thinkers" from.
I recommend you look it up, and don't just believe an AC on the Internet.
And all, because some cokeheads didn't want to work for th
Re: (Score:1)
They should move to Canada as we have an exemption for archives which would allow the content to remain.
Wouldn't work.
The USA has an exemption for archives and libraries, and the Internet Archive is legally registered as one. It is also named explicitly in an exemption to the DMCA when it comes to software.
https://archive.org/post/82097/internet-archive-helps-secure-exemption-to-the-digital-millennium-copyright-act [archive.org]
To sue the Internet Archive for violating a law, where the law explicitly names the Internet Archive as exempt, takes some serious balls.
I can't see how it would be that costly to simp
Re: (Score:2)
Our litigation system works slightly differently and is more balanced. Courts will refuse to hear obviously frivolous/baseless claims far more readily and costs are awarded to defendants far more often. The chance that you'll have to pay both sides of a lawsuit's costs is a big deterrent.
Victory will be achieved.... (Score:3)
... when the Wayback Machine itself has been dropped into a memory hole.
Library of Congress (Score:5, Interesting)
Get a charter from the Library of Congress, which can essentially bypass DMCA restrictions by fiat. The LoC usually seems pretty progressive about these things.
Re: (Score:3)
Or move those archives out of the US where the DMCA does not apply.
This is why we had world wide mirrors back in the day, especially for crypto software.
Anyways. Remember to Donate (Score:4, Informative)
DOOM ? : https://archive.org/details/do... [archive.org]
Apple II : https://archive.org/details/ap... [archive.org]
Arcade: https://archive.org/details/in... [archive.org]
DOS Games: https://archive.org/details/so... [archive.org]
Like political office holders? (Score:5, Insightful)
if you deal in a certain bottom-dwelling brand of controversial content
I like how this insinuates that it's the "dark web" trying not to be blocked, when it's political leaders, actors and other public personae (people very much out front and wanting to be seen) that go out of their way to delete their internet history when it contradicts whatever they're pushing today, so they can say "this has always been who I am!"
Archiving and, to a greater point, JOURNALISM (not "reporting" but actually chronicling and journaling the day's notable events in an objective manner) is an indispensable requirement for any person to become educated on a topic and to make an informed decision.
Eventually these things become history and are lost to current thought until somebody digs through the archives to rediscover the truth. Except now we can make it go away with a keypress and, poof, we've always been at war with Eurasia.
holding news media accountable (Score:4, Insightful)
Eventually these things become history and are lost to current thought until somebody digs through the archives to rediscover the truth. Except now we can make it go away with a keypress and, poof, we've always been at war with Eurasia.
There is a more immediate need, since the ability to stealth edit a story after publishing it is too great a temptation to resist. There have been too many examples of 'reputable' news sources getting caught red handed doing this. [archive.org]
Anyway, an archive source that is subject to the hideously malformed DMCA is hardly an archive source at all.
Re: (Score:1)
Eventually these things become history and are lost to current thought until somebody digs through the archives to rediscover the truth. Except now we can make it go away with a keypress and, poof, we've always been at war with Eurasia.
Hmm, true. However, there is also the need to forget some things. If you used Twitter to post a stupid edgy joke back when you were a teen, do you really want that to sabotage your career? Who makes the distinction between what is personal and what is public history? Who decides the limits?
Not an easy thing to answer, especially when people are currently weaponizing it.
boxing helena problem (Score:1)
The fact of removal can still be shown (Score:4, Interesting)
So, someone requests that you remove a page, and you decide to comply by replacing it with something like "Content removed on such-and-such date at the request of so-and-so."
Requesting removal of evidence suddenly becomes less effective: an explicit record of removal may appear even more sinister than whatever was there before...
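A minimal sketch of that idea, for illustration only: the `Tombstone` record, the in-memory `archive` dictionary, and the example URL, requester and date are all hypothetical, and nothing here reflects how the Wayback Machine actually stores or hides snapshots. The point it shows is that a takedown overwrites the snapshot with a visible removal notice instead of deleting the entry, so "removed" stays distinguishable from "never captured".

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Union


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical public record left behind when a snapshot is taken down."""
    removed_on: date
    requested_by: str

    def render(self) -> str:
        return (f"Content removed on {self.removed_on.isoformat()} "
                f"at the request of {self.requested_by}.")


# Hypothetical in-memory archive: URL -> snapshot text or a tombstone.
Archive = Dict[str, Union[str, Tombstone]]


def take_down(archive: Archive, url: str, requested_by: str, when: date) -> None:
    """Replace the snapshot with a tombstone instead of deleting the key."""
    if url not in archive:
        raise KeyError(url)
    archive[url] = Tombstone(removed_on=when, requested_by=requested_by)


def view(archive: Archive, url: str) -> str:
    """Return the snapshot, the removal notice, or a 'never captured' message."""
    entry = archive.get(url)
    if entry is None:
        return "No snapshot was ever captured for this URL."
    if isinstance(entry, Tombstone):
        return entry.render()
    return entry


# Made-up example data.
archive: Archive = {"http://example.com/post": "original page text"}
take_down(archive, "http://example.com/post", "Example Corp", date(2018, 11, 28))
print(view(archive, "http://example.com/post"))
print(view(archive, "http://example.com/never-seen"))
```

A real service could pair such a notice with HTTP status 451 (Unavailable For Legal Reasons), which exists for this kind of legally motivated removal.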
Spirit? (Score:1)
This is stupid. If there is a 'spirit' to the protocol, formalize it and enforce it.
Otherwise all you're saying is that it's as useless as do-not-track/do-not-call and that people are free to ignore it as they choose.
The internet has shown us time and time again, if you are relying on people voluntarily following the 'spirit' of something, it's 100% guaranteed they won't.
My take is
Comment removed (Score:3)
Re: (Score:2)
What is it that separates some of us, who believe that a proper, immutable archive is more important to our species than copyright restriction, from those who feel otherwise?
Is it just money? Is that all it is? Or is it something deeper?
There are many in the copyright/content industry who consider copyright to be akin to physical property, and that their property rights trump anyone's desire to do things with their property.
Spirit of the protocol (Score:3, Insightful)
>"the robots.txt strategy, in practice, does little more than mitigating their risk while going against the spirit of the protocol."
Spirit of the protocol? I kinda disagree with that. If a site admin puts up a robots.txt file, then they are clearly signaling they do not want the specified parts of the site crawled/archived/copied. It isn't just a convenient hint to the crawler about what is a waste of time/load/bandwidth, but also a choice the admin made saying "these things should not be crawled", for whatever reason the admin wants.
To me, the only controversial part would be this: does having a robots.txt excluding something NOW mean that it should exclude things that had been "OK" in the past (because there was no exclusion back then)? Personally, I tend to go with the interpretation that it means "now or in the past" (perhaps they changed their mind or forgot to put up a robots.txt initially). But that is certainly murky.
I think it is very hostile, and very much against the "spirit of the protocol" to ignore a robots.txt file. I could see where it might even have legal ramifications later (similar to a "no photography" sign in a store).
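For readers who have not looked at the mechanics, here is a rough sketch of how a well-behaved crawler reads such a file, using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up, and `ia_archiver` is used as the archive crawler's user-agent token on the assumption that this is the name it has historically answered to. Note that the file only speaks to whether a URL may be fetched now; whether it should also hide past snapshots is a policy decision on the archive's side, which is exactly the murky part.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: shut out the archive crawler entirely,
# and keep every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ia_archiver", "https://example.com/index.html"))
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/index.html"))
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/private/x"))
```

Nothing in the protocol enforces any of this: the parser only reports what the admin asked for, and honoring the answer is up to the crawler.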
Re:Spirit of the protocol (Score:5, Informative)
The big issue came about in that some domains lapsed, years later someone else registered said domains and put up a robots.txt, and as such the entire history from the previous owners was inadvertently deleted.
Re: (Score:3)
>"The big issue came about in that some domains lapsed, years later someone else registered said domains, put up robots.txt, and as such the entire history from the previous owners were inadvertently deleted."
OK, well I can certainly see where that would be an issue. A big one at that, since it is hard or impossible for any crawler to tell if it is even the same site, since the domain/path might be the same. Of course, one might also argue that if they sold the domain/site/path to someone else, they so
Some things need erasing (Score:2)
Some things do indeed need erasing though. I've yet to publish the paper because disclosure was only earlier this month. I found an entire ISP leaking publicly the physical addresses of virtually every single customer on their network (essentially EVERYONE was doxxed at once). This information is mirrored in the Wayback Machine (along with a few other archives). Part of the reason my paper isn't published yet is due to the ISP currently working with these archives to remove the sensitive information the ISP s
Free speech damage (Score:3)
If the DMCA stymies free speech in practice, then it could be considered a violation of the 1st Amendment. Form a coalition to sue all the way up to the Supreme Court.
A similar situation has arisen for the recent "Online Sex Trafficking Act of 2017", which is so vague that it makes hosting any kind of online romantic discussion or message group too risky. One could end up in jail because they don't police content tightly enough.
(Craigslist removed the "personals" discussion group because of that Act. Ads for shady services now spill over into other discussion groups, often ruining them. Craig may end up in jail anyhow for not scrubbing hard enough.)
Both laws have "excessive side-effects" on legitimate free speech.
Re: (Score:1)
Most people are turning against free speech now.
Why not host it in another country? (Score:2)
Many archives! (Score:1)
British Library UK Web Archive (Score:3)
UK websites are covered by the British Library
https://www.bl.uk/collection-g... [www.bl.uk]
http://data.webarchive.org.uk/... [webarchive.org.uk]