All of Gopherspace Available For Download 200
An anonymous reader writes "Cory Doctorow tells us that '[i]n 2007, John Goerzen scraped every gopher site he could find (gopher was a menu-driven text-only precursor to the Web; I got my first online gig programming gopher sites). He saved 780,000 documents, totalling 40GB. Today, most of this is offline, so he's making the entire archive available as a .torrent file; the compressed data is only 15GB. Wanna host the entire history of a medium? Here's your chance!' Get yourself a piece of pre-Internet history (torrent)." Update: 04/30 00:16 GMT by T: As several readers have pointed out below, our anonymous friend probably meant to say "pre-Web," rather than "pre-Internet."
Oh gopher from su.se (Score:1, Informative)
Porn, lots of porn. Also, not understanding why emacs wouldn't run on a mac.
Re:Far cry from "all of gopherspace" (Score:3, Informative)
"Do you have any facts or figures underpinning your statements ?"
That would indeed be interesting, but GP makes a reasonable assumption, akin to "There were more horse carriages out and about before the automobile." No?
Re:Shame on Slashdot (Score:5, Informative)
Beat me to it. The summary should read "Get yourself a piece of pre-World Wide Web history," since gopher came AFTER the birth of the internet (1981) but before the widespread usage of the web (circa 1993).
Re:Wrong (Score:3, Informative)
To a lot of people, WWW=Internet. Us old greybeards who remember when the Internet was telnet, FTP, e-mail and Usenet know better.
Re:The Ultimate Lesson in Open Source and Standard (Score:3, Informative)
That's more the fault of the clients than the protocol. There's no reason you can't serve hypertext documents over gopher, and no reason a gopher client couldn't display graphics.
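To see why the protocol itself isn't the limitation: a gopher menu is just lines of tab-separated fields, and nothing in the format restricts what the items point at. A minimal sketch of parsing one menu line (the hostname and selector here are invented for illustration):

```python
# A gopher menu line is: one-character item type + display string,
# then selector, host, and port, separated by tabs (per RFC 1436).
# Item type "0" is a text file, "1" a submenu, "9" a binary file,
# and "h" was later used by convention for HTML documents.

def parse_menu_line(line: str) -> dict:
    """Split one gopher menu line into its fields."""
    type_and_display, selector, host, port = line.rstrip("\r\n").split("\t")
    return {
        "type": type_and_display[0],      # item type character
        "display": type_and_display[1:],  # text shown to the user
        "selector": selector,             # string the client sends back
        "host": host,
        "port": int(port),
    }

# A hypothetical menu entry pointing at an HTML document:
item = parse_menu_line("hAn HTML page\t/page.html\texample.org\t70\r\n")
```

A client that understood the "h" item type could render the document as hypertext; one that didn't would just show it as text, which is a client limitation, not a protocol one.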
Re:Wrong (Score:4, Informative)
So, yes, Usenet preceded the Internet in the sense that it did not rely on IP, though both generally evolved around the same time.
But there was a rather vibrant pre-WWW internet where the protocols of choice were smtp (mail) and ftp (file transfer), with gopher and archie as the ways to find and browse repositories of stuff. News could be carried via nntp (net news transfer protocol).
What some may not know was that sendmail could work over transiently connected points as well, rather like usenet. Anyone still remember bang path notation? One would address mail using the sequence of hosts required to get it from one's own to the destination, using names understood by each successive host in the sequence. One of the reasons sendmail configuration files were so horrendous was to permit relaying between networks using different host naming conventions.
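A bang path was literally just a chain of host names joined with exclamation points: each host stripped its neighbor's name and forwarded the rest. A toy illustration (the host names below are invented, in the style of the era):

```python
def make_bang_path(hosts, user):
    """Build a UUCP-style bang path: each listed host relays to the next."""
    return "!".join(list(hosts) + [user])

def next_hop(address):
    """A relaying host peels off the first name and forwards the remainder."""
    hop, _, rest = address.partition("!")
    return hop, rest

# Address mail via three relay hosts to user "alice":
addr = make_bang_path(["seismo", "mcvax", "ukc"], "alice")
# The first relay sees "seismo" as itself and forwards "mcvax!ukc!alice" onward.
hop, remainder = next_hop(addr)
```

The sender had to know (or guess) a working sequence of hosts, which is exactly why relaying between differently-named networks made sendmail configuration so hairy.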
Re:Pre-internet history? (Score:1, Informative)
They teach us the difference and why it no longer matters;P
Tell that to people using non-WWW email clients, pushing SOAP data, sshing into their servers, using Skype, video chat, P2P software, etc.
While the WWW is becoming ubiquitous, with Google and Bing as major hubs, there's a lot of stuff (including everything going via UDP) happening on the Internet that has little or nothing to do with WWW (or even http[s] for the most part).
Re:Compression routine (Score:1, Informative)
Gopher can contain binary files as well. If the archive is truly complete, then it contains more than just text.
I recall finding a ROM site on gopher about 2 years ago, so if this archive is complete you'll get a complete set of Atari 2600 and Coleco ROMs free with your torrent download. (I think it had a few NES too, but it was mostly the pre-NES consoles)
Re:Far cry from "all of gopherspace" (Score:5, Informative)
Yes.
In 1997 we had a 100GB disk array holding the research data from our lab, all of which was available via gopher (and ftp, and the web). We moved to a 200GB array shortly after, and then a 400GB after that. And then 3TB, around 2008.
Sometime around 2007 or 2008 the SunOS system that ran the gopher server died permanently and was replaced by a virtual linux server without gopher. Even without that server, I found not long ago that I was still creating .cap files -- which were gopher, as I recall, but maybe archie.
Quantitatively: I currently have online more than 15GB of data for just 1997, all of which was gophered at the time. In 1998, another 18GB.
So, I would say, had the gopher scraping been done in 1997 instead of 2007, the result would have been a lot more data. In fact, a few months earlier in 2007 and it might have BEEN a lot bigger.
Re:Wrong (Score:3, Informative)
Ahhh the good old days.
You post a question on rec.arts.tv like, "When does the new season of TNG start?", wait for the midnight syncing between your local BBS and the rest of the nation, and then come back the next morning to learn the answer. If you're lucky. Sometimes you had to wait two days for a reply.
Re:Gopher (Score:5, Informative)
No, just another ten years of November.
I believe you mean September. [wikipedia.org]
Gopher lives! (Score:5, Informative)
What do you mean, "was"? Gopher still works fine. There are dozens of servers out there. See quux.org [quux.org] or just install your Linux distribution's gopher package and fire it up.
Re:Shame on Slashdot (Score:3, Informative)
I've been around a while, and I can't think of any time a Slashdot editor fact-checked, spell-checked, or proofread a submission. Look at it: they put the entire thing into a quote. That way they can just say they're quoting the submitter and that's what he said.
They might add the "UserXXX writes," part themselves, but a couple characters of perl could probably do that part just as well.
Re:Shame on Slashdot (Score:3, Informative)
This definition is probably looser than most, but here's a quick and dirty view:
The Web is a huge collection of interlinked documents addressable by URLs and served with HTTP. The Internet is the world-wide TCP/IP network over which the Web and many other services operate.
Re:The Ultimate Lesson in Open Source and Standard (Score:5, Informative)
The original pre-RFC HTTP specification states that a response is an HTML message [w3.org].
Re:Shame on Slashdot (Score:3, Informative)
Um, the generally accepted start of the Internet is the activity surrounding the start of ARPANET in the late 1960s. The ARPA name still lives on as part of reverse DNS entries. Some people say it started in 1967, some say 1969; either way, it was much earlier than 1981, and there are a lot more protocols that are part of what we call "the Internet" than just TCP/IP, although of course not all of them are routed globally. Check your /etc/protocols file sometime; the first line says Internet (IP) protocols.
Re:Shame on Slashdot (Score:3, Informative)
History of the Internet from 1957 to present:
http://vimeo.com/2696386?pg=embed&sec=2696386 [vimeo.com]
Quite educational, even if you think you know all about it.
Re:Shame on Slashdot (Score:3, Informative)
Having just watched it again, it may not fully answer your question. With what you learned from the video in mind, the OSI model [wikipedia.org] describes the layers the video talked about. There are seven layers altogether, with the lowermost layer being the physical hardware everything runs on, followed by the network connecting the hardware, then how data is passed over the network, and so on until you get to the application layer. You've heard of TCP/IP? That's TCP (layer 4) running on top of an IP (layer 3) network. ICMP is a different layer 3 protocol, the one things like 'ping' (ICMP echo) and 'traceroute' run over. You've heard of UDP? That's another layer 4 protocol, different from TCP.
What runs at the application layer are things you're already familiar with: SMTP (email), telnet, FTP, DNS, NTP (network time protocol), and so on, including HTTP. HTTP is effectively the web -- it's what a world wide web browser ("web browser", or now just "browser" for short) uses as its primary protocol, and why you see URLs starting with http:. So HTTP, or "the web", is an application that runs on top of everything below it. You still need the physical hardware, the network connecting the hardware, the various transmission protocols and so on to deliver the data used by the web. Similarly, SMTP, or commonly just "email", is an application that runs on top of everything below it.
Think of the acronyms if that will help you understand it better. SMTP is Simple Mail Transfer Protocol, a protocol for transferring simple mail. HTTP is HyperText Transfer Protocol, a protocol for transferring hypertext. FTP is File Transfer Protocol, a protocol for transferring files. NNTP is Network News Transfer Protocol, a protocol for transferring network news, what you've likely heard of as simply Usenet or "newsgroups". You get the idea.
That's the simplistic view of things. In reality, HTTP has been extended to transfer more than just hypertext. Through the use of MIME types (image/gif, image/jpeg, text/html, text/xml, application/octet-stream, and so on) you can transfer arbitrary things that browsers and other applications can understand.
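To make the layering concrete, here is a sketch of what an application-layer HTTP exchange looks like as plain text (the host name and body below are invented for illustration; everything in it still travels as bytes over TCP and IP underneath):

```python
# What a browser sends (application layer); TCP/IP below this carries the bytes.
request = (
    "GET /index.html HTTP/1.0\r\n"
    "Host: example.org\r\n"
    "\r\n"
)

# What a server might send back; the Content-Type header carries the MIME type
# that tells the client how to interpret the body.
response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>hello</body></html>"
)

def content_type(raw: str) -> str:
    """Pull the MIME type out of a raw HTTP response."""
    headers, _, _ = raw.partition("\r\n\r\n")  # headers end at the blank line
    for line in headers.split("\r\n")[1:]:     # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return value.strip()
    return "application/octet-stream"          # conventional default
```

Whether the body is text/html, image/gif, or anything else, the transfer mechanics are identical: only the MIME label, and what the client does with the bytes, changes.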
Hopefully that makes a bit more sense.
Re:The Ultimate Lesson in Open Source and Standard (Score:3, Informative)
To be somewhat more accurate, it's not "now" called hypertext: it was called hypertext before gopher even existed. Gopher was first released in 1991, while Ted Nelson coined "hypertext" in 1965, and there were dozens of implementations before the WWW (the most popular outside academia was probably Apple's HyperCard, released in 1987).
Re:Shame on Slashdot (Score:3, Informative)
Actually, the Internet is the world-wide IP network. TCP is just one of many protocols that are used to transmit information across it.