Technology

Open Content Network (P2P meets Open Source) 128

Orasis writes "The creators of Swarmcast have announced a new peer-to-peer content delivery network called the Open Content Network. The OCN will allow users to download open source and public domain content from multiple peers and mirrors in parallel. The system is designed to augment the existing mirrors with bandwidth from the p2p network and should eliminate the "Slashdot Effect" for popular open source content."
This discussion has been archived. No new comments can be posted.

  • OK, but... (Score:2, Informative)

    by scrm ( 185355 )
    The Open Content site [open-content.net] just announces a list of intentions. Anyone can put this kind of info up. It looks to me like nothing has been achieved yet, making this not really news.
    • Uninformed (Score:3, Informative)

      by BlueboyX ( 322884 )
      Maybe you didn't notice that these guys are the makers of Swarmcast. Or maybe you posted before figuring out what that meant.

      Swarmcast is a (working!) program for parallel p2p file downloading. In other words, the technology IS implemented. They are basically just making a modified program to work with a somewhat different set of files. No biggie.
  • by skroz ( 7870 ) on Tuesday May 21, 2002 @07:08AM (#3557317) Homepage
    A secure system for validation and verification of downloads will obviously need to be implemented. Imagine all of the fun things someone could do if they, say, inserted a rogue module into the linux kernel code. Or the latest release of samba, gtk, glibc, Mozilla, ssh, openssl... the list goes on and on.
    • by popeyethesailor ( 325796 ) on Tuesday May 21, 2002 @07:43AM (#3557479)
      It is part of the specs [open-content.net]


      1.2 Untrusted Caches

      It is currently unsafe to download web objects from an untrusted cache or mirror because they can modify/corrupt the content at will. This becomes particularly problematic when trying to create public cooperative caching systems. This isn't a problem for private CDNs, like Akamai, where all of their servers are under Akamai's control and are assumed to be secure. But for a public CDN, the goal is to allow user-agents to retrieve content from completely untrusted hosts but be assured that they are receiving the content intact. The CAW solves this problem by using content addressing that includes integrity checking information.

      • It's probably worth pointing out that the solution to this problem is really orthogonal to the use of content-based addressing. Also, while signatures etc. can be used to verify the integrity and provenance of the delivered data, there's a whole separate problem of ensuring that it's current or consistent.


    • The example protocol messages that they have given include a SHA1 integrity check. SHA is the Secure Hash Algorithm, a cryptographic standard. A cryptographic hash is a map h from the set of all bit strings to bit strings of a fixed length k (160 bits for SHA-1). The map h has the additional property that it is intractable to produce two distinct strings (or files, or whatever is being hashed) x and y such that h(x) = h(y).
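To make the integrity check above concrete, here is a minimal sketch in Python. It is not taken from the OCN spec: the function names and the sample bytes are made up for illustration, and it assumes the expected digest was obtained from a source you already trust.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hex characters (160 bits)."""
    return hashlib.sha1(data).hexdigest()

def verify_download(data: bytes, expected_sha1_hex: str) -> bool:
    """True only if the received bytes hash to the advertised digest."""
    return sha1_hex(data) == expected_sha1_hex.lower()

# Stand-in for a real file; the digest would normally come from a trusted site.
original = b"linux-kernel-tarball-bytes"
advertised = sha1_hex(original)

# A copy modified by an untrusted mirror no longer matches the digest.
tampered = original + b"rogue module"

print(verify_download(original, advertised))
print(verify_download(tampered, advertised))
```

The point is only that the downloader never has to trust the mirror, just the short digest.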
    • read the subject. not much else to say...
  • Security question (Score:2, Interesting)

    by YanceyAI ( 192279 )
    "individuals will be able to contribute to the open source movement by donating their spare bandwidth and disk space to the network. "


    Perhaps this is a silly question, but I worried about it with Napster and subsequent file sharing software, too. Is it possible to contribute and be secure?

  • by new death barbie ( 240326 ) on Tuesday May 21, 2002 @07:09AM (#3557323)
    Maybe it's just me, but this makes me a little nervous that when the "Open Content Network" gets too popular and dragged down in litigation, the "Open Source" folks are going to find themselves tarred with the same brush; guilty by association. Not what's needed at this juncture.
  • Openft (Score:2, Informative)

    by z-man ( 103297 )
    What about the openft protocol? They've been working on that for a while at gift.sourceforge.net [sf.net]. They originally used the fasttrack protocol (KaZaA), but after KaZaA changed their specs, they decided to create their own protocol.
    • There are plenty of other open p2p products.

      Freenet [freenetproject.org]: scalable, not vaporware, very much beta.

      Alpine [cubicmetercrystal.com]: based on trust.

      Gnunet [purdue.edu]: sounds very open, based on electronic money. Also search for gnet.

      chord [mit.edu]: very efficient at finding files.

      distrinet [sourceforge.net]: at this stage, vaporware (there is code....). But if you look at the description it beats any p2p software!

      But in the end the network with the most data (gnutella/kazaa) will be used. Note that users will switch networks very quickly. Look what happened to napster.
  • Sounds good. Maybe they can team up with the kind of artists who are currently promoting their music on MP3.com

    (I just bought a couple of CDs from a band who make all their music available for free download, so it must work!)

    Would it make much difference for software apps? They're mostly mirrored anyway, and Mozilla/OpenOffice-style distribution doesn't seem to be suffering many bandwidth problems.
    • (I just bought a couple of CDs from a band who make all their music available for free download, so it must work!)
      I would love it if this were completely true, but this merely means this band got 1 sale -- from 1 person. To clarify: 1 sale from 1 person. I do not think having one sale constitutes this to be working.
      • Re: Open Content (Score:2, Interesting)

        by peddrenth ( 575761 )
        "I would love it if this were completely true, but this merely means this band got 1 sale -- from 1 person"

        I like your logic. So obviously the record store is failing when I go in and buy my one CD. From one person. My one CD from one person. To clarify, only one CD from only one person. The record shop is obviously failing.

        Yeah right.

        No, it means that MP3.com is succeeding, that Aura are succeeding, and that Faithless are shafted (who made the CDs I wanted, but which I'm not gonna buy with the current state of the record industry, and their political representatives)

        Here's a hint: when you have lots of people buying one thing each, you make lots of money. Find a maths book. Revise the chapter on multiplication.
        • Clearly, you have misinterpreted what I meant. I understand a sale is a sale no matter how small or how big. I'm sure the band is thrilled for one person to buy one album. But nevertheless, mp3.com's most popular bands don't sell anywhere near what would be acceptable for most bands to make a living. Yes, it's a step in the right direction: they get 100% of the money returned from CD sales minus the cost of the studio, CD production, etc. It is just an overstatement to say
          I just bought a couple of CDs from a band who make all their music available for free download, so it must work!
          Then again -- it depends what you mean by working.
  • by ausoleil ( 322752 ) on Tuesday May 21, 2002 @07:11AM (#3557334) Homepage
    ...longbeards can remember the "good ole days" where the free flow of ideas and not making money were what made the pre-commodity internet a very worthwhile place to be. Everyone was expected to contribute their resources for the benefit of all, and none of it was (apparently) designed to help some smartass b-school dropout come up with enough cash to buy a 4,000 square foot "bungalow" in Palo Alto.

    Count me in.
    • When the Internet was available only to career college students and others feeding at the public trough. The internet was paid for by public tax money and corporate subsidies, but unavailable to most people.

      The good old days really weren't so good. It's kind of funny, though, listening to some so-called old-timers constantly whinging about the commercialization of the 'net. Do they really think the huge advance in capabilities would have come about without the economic incentive? Sure, the early days were inventive. They invented the bricks and mortar of the Internet. But the commerce guys have driven the construction of cathedrals, roads, libraries and schools with those bricks.

      • This thread is off-topic, so it's AC time.
        Generally, the reality is in-between these two posts.
        I just want to chip in on the record with the note that the longbeard crowd didn't go in and start busting heads, throwing fits and crying about conspiracies when the Bill Gates clones closed in on the net like it was a patented Microsoft trademark. Sure, people talked shit and got mad and perhaps some people might have even been a little paranoid about conspiracies, but there was no violence. It will be nice to see similar restraint from the blue shirt khaki pants crowd when the wheel completes a revolution and breaks the back of the Evil Empire.
        Signed,
        Some dude whose pay depends on Microsoft and will still be glad to see them complete their fall

        P.S. they're already off 70% from 12/99, check the charts yourself. The end is near. Ding dong the witch is dead.
      • But the commerce guys have driven the construction of cathedrals, roads, libraries and schools...

        I prefer to think that they've built the strip malls, porn shops, and phone survey firms that we all so enjoy.

        --Jeremy
        • Perhaps. There is always an ugly side to commerce. As a community matures, these sorts of things change, move away, become someone else's problem. This is still a very young community, with parameters no one truly understands. It's much more complicated than any other society.
      • When the Internet was available only to career college students and others feeding at the
        public trough. The internet was paid for by public tax money and corporate subsidies, but
        unavailable to most people.


        Yeah! students will be the first with their backs to the wall, when the revolution comes!

        Jealous much? No need to have a hissy fit, just because you weren't there for the first wave of the 'net.
        I am not an old timer, but having been on the Internet since 1992, I can easily see
        how polluted it has become.
  • "...download open source and public domain content from multiple peers and mirrors in parallel."

    These useful features have already been implemented in KaZaA [kazaa.com].
    • You know how Download Accelerator lets you get files faster by downloading different parts of the same file via multiple simultaneous connections. That is what this is about, except the Swarmcast guys have each connection going to a different person rather than tons of connections going to the same server.
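As a rough illustration of the "different parts from different people" idea, the sketch below only plans which byte range to request from which peer. The peer names are hypothetical, and nothing here reflects how Swarmcast actually schedules its transfers.

```python
def split_ranges(file_size: int, peers: list[str]) -> list[tuple[str, int, int]]:
    """Assign each peer one contiguous, inclusive byte range of the file."""
    if file_size <= 0 or not peers:
        return []
    n = min(len(peers), file_size)          # never hand out empty ranges
    base, extra = divmod(file_size, n)      # spread the remainder over the first peers
    plan, start = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        plan.append((peers[i], start, start + length - 1))
        start += length
    return plan

# Each tuple maps naturally onto an HTTP-style "Range: bytes=first-last" request.
for peer, first, last in split_ranges(1000, ["peer-a", "peer-b", "peer-c"]):
    print(f"{peer}: bytes={first}-{last}")
```

A real client would also re-assign a range when a peer stalls or disappears.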

    • Not really, KaZaA and similar generally just download the file from different sources, but start the download at different places.

      Swarmcast uses FEC coding of the file so it's more efficient when multiple people download the same file. (The people downloading can share the file as well.) It's /not/ as trivial as downloading a file with different offsets.
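For anyone wondering what FEC buys you, here is a toy erasure code in Python: k data blocks plus a single XOR parity block, so any k of the k+1 blocks are enough to rebuild the file. This is the simplest possible example and an assumption on my part; it is not Swarmcast's actual coding scheme, which would need to tolerate many more losses.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Return the k data blocks followed by one parity block (index k)."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(received: dict[int, bytes], k: int) -> list[bytes]:
    """Rebuild the k data blocks from any k of the k+1 encoded blocks.

    `received` maps block index (0..k, where k is the parity block) to bytes.
    """
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("a single parity block can repair only one loss")
    blocks = dict(received)
    if missing:
        # XOR of every other block (data and parity) yields the lost one.
        blocks[missing[0]] = xor_blocks(list(received.values()))
    return [blocks[i] for i in range(k)]

data = [b"AAAA", b"BBBB", b"CCCC"]
coded = encode(data)
got = {0: coded[0], 2: coded[2], 3: coded[3]}   # block 1 never arrived
print(recover(got, k=3) == data)
```

The useful property is that a downloader does not care which blocks it gets, only how many, which is why peers can usefully share partial downloads.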
  • by inkfox ( 580440 ) on Tuesday May 21, 2002 @07:14AM (#3557343) Homepage
    While this is a great concept, it scares me a bit.

    I'm fully expecting that if we ever reach a point where a substantial percentage of users' traffic becomes outbound traffic, the cablemodem and DSL providers are going to start to rethink the current pricing and service packages.

    How long before we find ourselves NATted away, able to originate connections only? A few cablemodem providers have already done this to reduce the traffic from file sharing and to knock out code red and other such silliness. And each time a major ISP does this, it leaves a slightly smaller number of other ISPs providing the outbound service, causing the traffic on the holdout systems to rise.

    At some point it's going to snowball, and most of us are going to find ourselves NATted away, with only those paying premium prices for real IP addresses getting the privilege of having their uplink monopolized by strangers.

    • Welcome to capitalism.

      There's your niche, go earn top dollar.
    • >At some point it's going to snowball, and most of us are going to find ourselves NATted away, with only those paying premium prices for real IP addresses getting the privilege of having their uplink monopolized by strangers.

      Close, but no cigar.

      There's no reason why a NATed box can't "upload" a file. The client simply needs to send a push-type request to the NATed server, rather than trying to pull the file from it.

      This, of course, requires the client to have a real, non-NATed IP address. And this means the client will have access to more software in exchange for a higher fee per month.

      Sure, you could be nice and use some of that bandwidth for sharing with the people who chose not to pay for the higher service level, but you would still benefit largely from it.
      • True, however sharing from a NAT connection to another NAT is problematic since there is no way for one of the parties to address the other (as is required for a peer to peer connection). If only one of the parties is NATed, it can still initiate connections with a regular connection (this is how Kazaa and gnutella clients allow uploading for clients behind a firewall). However nobody can initiate a connection with the NATed box because it doesn't have an address. Thus if the majority of users is behind NAT, that would effectively kill p2p networks because it would be hard to establish a connection between the majority of nodes in the network.

        However, broadband providers have an interest in p2p since it is a major reason for their clients to have broadband in the first place. A cheap modem connection will handle mail and instant messaging pretty effectively. Only when you start downloading mp3/movies/... you need the bandwidth they offer. Healthy p2p networks create a demand for broadband.

        My hope is that as ipv4 addresses get scarcer, adoption of ipv6 will finally happen. This would largely remove the need for NAT.
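The pull/push reasoning in this sub-thread boils down to a small decision table. The sketch below just encodes it; the function name and the returned strings are made up, and it assumes a NAT that blocks all unsolicited inbound connections.

```python
def plan_connection(uploader_behind_nat: bool, downloader_behind_nat: bool) -> str:
    """Decide who must open the TCP connection so a file can flow uploader -> downloader."""
    if not uploader_behind_nat:
        # The uploader is reachable, so the downloader simply connects and pulls.
        return "downloader connects to uploader (pull)"
    if not downloader_behind_nat:
        # The uploader cannot be reached, but it can connect out and push the file.
        return "uploader connects out to downloader (push)"
    # Neither side can accept a connection from the other.
    return "no direct connection; needs a relay or NAT traversal"

for uploader_nat in (False, True):
    for downloader_nat in (False, True):
        print(uploader_nat, downloader_nat, "->", plan_connection(uploader_nat, downloader_nat))
```

Only the last case is the one that hurts, and it gets more common as more peers end up behind NAT.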
    • NAT alone is not an effective method of preventing people from using p2p programs. All it does is prevent incoming TCP connections, so as long as someone in the network (well, some reasonable minority of peers) can get incoming connections to bootstrap people into the network, everyone can still communicate despite the inability to get new incoming connections.

      Good NAT bypassing is annoying to program (in the extreme case, it requires implementing something like TCP over UDP) but it's not a huge technical hurdle. The main reason it's not commonly done is that too few people have hostile NATs for it to be worth the effort.

      --
      Benjamin Coates
    • If my service provider went to NATs, I would scream at them to upgrade their networks to IPv6 so that they had no excuses to NAT me when IPv6 is commonplace.
      • If my service provider went to NATs, I would scream at them to upgrade their networks to IPv6 so that they had no excuses to NAT me when IPv6 is commonplace.
        The purpose of the NAT would be to prevent inbound traffic, not to conserve address space.

        Still, to your point -- do any versions of Windows ship with IPv6 enabled by default? I think that's the true test of how ready business is for it.

        • i think Windows XP supports IPv6 by default, and windows 2000 supports it.

          NAT also means that you need less global address space, which makes things a bit cheaper to run.

        • I keep on hearing this, but the truth is the net was rolling back when you had to obtain a 3rd party TCP/IP stack to make Win 3.1 work.

          If the net went IP6, Microsoft would either release a patch for WinME,2k,XP, and I bet some 3rd party would offer a d/l to support 95/98 (maybe even 3.1!)
      • Sounds nice, but to cite another point: What happens when they get somebody who knows how to do packet filtering? Drop all inbound SYN packets and...
  • "The system is designed to augment the existing mirrors with bandwidth from the p2p network and should eliminate the "Slashdot Effect" for popular open source content."

    Hopefully you can configure the # of connections, otherwise we will finally be able to get slashdotted from the comfort of our own homes :-)
  • by Rogerborg ( 306625 ) on Tuesday May 21, 2002 @07:17AM (#3557356) Homepage

    The problem being that people are bastards.

    • "The Open Content Network will work with the Creative Commons [creativecommons.org] to use their machine-readable licenses to automatically identify open source and public domain content to be distributed through the OCN"

    Why is this a problem? Well, what's to stop an ignorant or malicious individual wrapping up some content with a CC-compliant license and injecting it into OCN?

    I'm thinking of:

    • Advertising porn with embedded html links that pops up adverts (gnutella is rotten with this stuff).
    • Virii.
    • Other people's copyrighted content.

    Why would anyone do this last one? Pure malice, to open OCN up to DMCA attack, simply because people (as I said) are bastards, and can't be trusted to behave in a rational civilised fashion. OCN will be a trusted network, and that leaves it open to abuse. I really hope that an actual trustable human will vet everything injected into it.

    • One major problem i can see is that you'll have a few "unauthorised" mp3's, and the decss.exe file, and it'll be sued to hell and back by the MPAA and the RIAA.

      People cannot be trusted. Maybe some sort of signup, registered usage is needed. Though given the caution of most OS people, that won't happen.
    • Yes, if people use it like they should. The idea is great, but will this have the same effects as Kazaa b**** ?
    • Content should be signed, and certificates provided by the network.

      Signing something says, I have given my permission to.... This places responsibility on someone for any copyright violations.

      The network operators can kinda identify who they issued the certificate to.

      Digitally signing provides a checksum.
      • The network operators can kinda identify who they issued the certificate to.

        Which means the network operators will have to make deals with notary public offices in every major metropolitan area in all 180-odd independent countries in order to be able to certify that people are who they say they are. This can become expensive, and the total cost of maintaining a certificate may rise up to $200 per cert per year, making this situation no better than the SSL cert situation.

        • Except that you don't actually have to do that. At the top are a few folks who are trusted by the root: kernel, kde, xfree86, gnome, etc. Each entity gets a certificate that it uses only for its "products". Now if other people, like the developers of gnometris, want to get their stuff out on the network, then they ask for a cert from the gnome folks (they don't have to, but it would make more sense to ask them than to ask the kde folks). Since the gnome folks know and trust the gnometris folks, they give them a cert, but it isn't a 1st level cert, it's a 2nd level cert. When you try to download gnometris from the OCN, then before you approve the download, you take a look at the line of authority to see if you trust everyone on that list.
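The "line of authority" check described above can be sketched as a walk up an issuer table. Everything here is hypothetical (the table, the names, the "root" label), and the sketch deliberately leaves out the cryptographic signature checks a real certificate chain would need.

```python
# Hypothetical issuer table: each subject maps to whoever vouched for it.
ISSUED_BY = {
    "gnometris": "gnome",
    "gnome": "root",
    "kde": "root",
}

def line_of_authority(subject: str, issued_by: dict[str, str], root: str = "root") -> list[str]:
    """Return the chain from subject up to the root, e.g. gnometris -> gnome -> root."""
    chain = [subject]
    while chain[-1] != root:
        issuer = issued_by.get(chain[-1])
        if issuer is None or issuer in chain:
            raise ValueError(f"broken or circular chain at {chain[-1]!r}")
        chain.append(issuer)
    return chain

def is_trusted(subject: str, issued_by: dict[str, str], trusted: set[str]) -> bool:
    """Approve the download only if the user trusts every name in the chain."""
    try:
        return all(name in trusted for name in line_of_authority(subject, issued_by))
    except ValueError:
        return False

print(line_of_authority("gnometris", ISSUED_BY))
print(is_trusted("gnometris", ISSUED_BY, {"root", "gnome", "gnometris"}))
print(is_trusted("gnometris", ISSUED_BY, {"root", "kde"}))
```

The design choice being illustrated is that trust is delegated, so nobody at the root has to verify every small project personally.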
          • Our initial approach is actually much simpler. We will simply certify a number of domains that are trusted to host mostly open source content, such as kernel.org, debian.org, etc, and use a network of metadata proxies that extract the secure hash information for file verification. Ideally each of the sites will run their own meta-data proxy so that the secure hash information is trustworthy.
        • Stick the cert on a floppy and post it to them (charge a nominal amount for the P&P).

          At least you can say in your defence, they were at this address, then you find them yourself.
    • all content sent up must be signed by the uploader...soooooo.....if an MP3 leaks out onto the p2p service, you will get the feds knocking on your door if someone takes notice.

      I think that if a law is passed, it should not be a technocratic law, but one that mandates content signing for all material uploaded over p2p....that would be easier to enforce while at the same time not infringing on the folks who follow the laws.
    • Actually the OCN should be safe from attack. The reason is that from the get-go it is intended to distribute content that is totally legal to distribute. Now certainly some may abuse it, but the problem that Napster ran into was that it so clearly built its business model on making a profit from contributing to piracy.

      The courts are unlikely to shut down a network like this that makes a good faith effort to be legitimate. Most other P2P services establish themselves as trading points for all manner of illegal content. They try to cover this up to look good to the courts but there's no doubt that Kazaa, etc, wouldn't be this popular were it not for piracy.
  • Recently some Mame devs (www.mame.net) have been working on some Cojag drivers(cojag is an atari arcade system that uses harddrives - Area51 is a cojag game).

    Someone made compressed harddrive images that mame will eventually require. Despite compression, two of the images were half a gig and one was a gigabyte. The guy who was distributing these files used swarmcast to prevent getting swamped.

    It worked pretty well in that tons of people were able to download those huge files without killing servers. However, swarmcast is new enough that swarmcast itself had some server problems. The server had to use an older version of swarmcast to be stable. That pretty much fixed the problem. It used to be that just hosting ~40meg neogeo roms was nearly impossible; now it is possible to host half-gig files.
  • Fools, little do they realize the powers they are dealing with!
  • such a good move? (Score:4, Insightful)

    by tps12 ( 105590 ) on Tuesday May 21, 2002 @07:23AM (#3557384) Homepage Journal
    Okay, I'm sure I'm with the rest of the slashdot community when I say that my first reaction was "wow, awesome, score another 3 points for Open Source and freedom."

    But I've reconsidered. Before you mod me down, please read what I have to say.

    Basically, we are talking about P2P filesharing here. Now remember, other P2P services, like Napster, Gnutella, and IRC, were all originally based on good, sound, legal, moral ideals. But in the course of time, they each became corrupt with those who would use the infrastructure for illegal filesharing and copyright infringement.

    Now, I don't want to throw the baby out with the proverbial bathwater. And I don't want to get rid of a useful tool because of a potential for abuse, since by that logic we would not have silverware, cars, or handguns. But we in the Open Source community need to ask ourselves, is now the time when we want to risk associating Linux, *BSD, and Open Source with illegal activities? Don't we have enough anti-hacker rhetoric to fight against?

    We need to pick our battles. This isn't one of them.
    • Excellent point.

      The threat to filesharing as a technology comes from the rights-holders and from the legal system. Precedent is being set all the time which threatens ISPs with liability for illegal activities on their network that they are "made aware of", DMCA notice and takedown letter or not.

      It's much scarier in the UK and in Canada. Canada just passed new legislation which will make ISPs liable for distribution of child pornography on their network. SOCAN Copyright Tariff 22 just went through the Court of Appeals and makes ISPs liable for infringing material stored on their "cache servers".

      P2P technology may by-pass these, but it is only a matter of time before some powerful organization convinces some judges that ISPs should be held liable for allowing P2P on their network. Blocking of ports, account terminations, and worse are all coming if the rights-holders have their way... and technology such as swarmcast which acts to distribute free software will likely get lumped in with the Napster/Kazaa/Foo P2P technologies.

      • it is only a matter of time before some powerful organization convinces some judges that ISPs should be held liable for allowing P2P on their network... technology such as swarmcast which acts to distribute free software will likely get lumped in with the Napster/Kazaa/Foo P2P technologies.

        Surely an important weapon against knee-jerk blanket bans would be for there to be a well-known, respectable, law-abiding P2P network?

        • Sure, further litigation. What a nice waste of money.. that's the problem with dumb laws, it costs a lot to counter them. Common sense is free, but law != common sense.
    • Now remember, other P2P services, like Napster, Gnutella, and IRC, were all originally based on good, sound, legal, moral ideals. But in the course of time, they each became corrupt with those who would use the infrastructure for illegal filesharing and copyright infringement.

      Nonsense. The original purpose of both Napster and Gnutella was to enable the sharing of copyrighted music. That's about all they've ever been used for (well, Gnutella and Gnutella-like networks have since branched out into other forms of mostly copyrighted content)

      This, OTOH, appears to be primarily designed to let people pool bandwidth, which is both legal and useful, and since bandwidth costs are a big problem for the distribution of free content, it's entirely a good thing.

      --
      Benjamin Coates
    • Of the three P2P networks you mention (well, really two, since IRC is not P2P), Gnutella and especially Napster were basically created with the transfer of illegal content in mind. There are certain steps you can take to control what type of content is published to a P2P network. The easiest way to do this is to have an activity log & authentication, but OSS folks would never go for that. A good example of a 100% legal public P2P network is the Furthur Network [furthurnet.com], which has managed to stay legal by limiting what people can share to specific bands that allow taping. This system could possibly do the same thing by only allowing certain files to be shared and implementing checksums, so that when you're downloading a deb package you're not getting goatse.mpg.
    • Yes and that internet thing is full of porn and paedos, we shouldn't be using that...

      What if Sun had had the same worries about JAXP?

      This is a redundant argument made many times before, all technology can be used for good and bad, how this got modded up so far I have no idea.

      I would bet that in two years, the majority of popular downloads will be delivered with P2P.
  • by ipmcc ( 466386 ) on Tuesday May 21, 2002 @07:47AM (#3557492) Homepage Journal
    If the goal here is really to eliminate the "Slashdot Effect" a much more effective solution would be to set up a network of load-balanced caching proxies on geographically distributed fat pipes.

    Some will argue that this is in essence what a P2P network is, but why not do it right, using technology we already have that everyone can use(squid.)

    Other users' comments regarding the cumulative effects of NAT on P2P networks are incredibly apropos.

    But realistically, there's nothing I love more than when the story submitter posts a link to a Google cached version of the content he's posting. We're an aggressive bunch and that calls for aggressive measures :)
    • If the goal here is really to eliminate the "Slashdot Effect" a much more effective solution would be to set up a network of load-balanced caching proxies on geographically distributed fat pipes.

      Who pays for all that equipment and bandwidth? The idea here is not to solve problems by throwing resources at a problem, but rather to solve them by using existing resources as effectively as possible. The technology involved can be applied to any resource base. The technology-intensive approach using almost-zero-cost resources might well make significant headway against the Slashdot Effect, even if you still think your capital-intensive approach based on older technology is even better.

      Another factor you seem to've overlooked is that software like CAW or BitTorrent are distributed for reasons beyond scalability. For example, consider the inherent attack-resistance characteristics of a highly distributed P2P network, vs. your centrally-administered servers. There are other goals as well, such as avoiding legal culpability or financial dependence on corporate benefactors to provide the systems and bandwidth. Whether you agree or disagree with those goals, the fact remains that many people believe in them. Networks like you describe are old hat, dozens have been deployed already, and yet a lot of people still want something different. You've proposed a solution to a different problem than the one Onion Networks et al seek to solve. There's a term for that; we call it missing the point.

    • If the goal here is really to eliminate the "Slashdot Effect" a much more effective solution would be to set up a network of load-balanced caching proxies on geographically distributed fat pipes.

      I started a project to help distribute the load. So far I've written code to pull out all the links from each Slashdot story. What's left is to cache those links, then transform the Slashdot main page HTML so that it points to the cached versions.

      If anyone's interested in taking this further, you can find the Perl code here [thingone.info] .

      Then just set up some machines (say, 3 to start just to test it), and cache the main page to all three machines, generating different HTML pages for each machine. When a new request comes in, round-robin it to the 3 machines. (Better algorithms can come later.)

      I agree with what you say about geographically distributing them, though, which would need to be handled by an entity with money (i.e., VA?).
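The round-robin step described above is small enough to sketch. The host names are placeholders and the class is hypothetical; it only shows the rotation, not the caching or the HTML rewriting.

```python
from itertools import cycle

class RoundRobin:
    """Hand out cache hosts in a fixed rotation, one per incoming request."""

    def __init__(self, hosts: list[str]) -> None:
        if not hosts:
            raise ValueError("need at least one cache host")
        self._rotation = cycle(hosts)

    def pick(self) -> str:
        return next(self._rotation)

# Three test machines, as suggested in the comment (names are made up).
balancer = RoundRobin(["cache1.example", "cache2.example", "cache3.example"])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Smarter policies (least-loaded, nearest mirror) could replace `pick` later without changing the callers.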

  • Where's the code (Score:2, Insightful)

    by nervlord1 ( 529523 )
    In those now famous words, where's the code?

    I'm sorry, I wish I could say I'm excited. It's certainly a VERY good idea, and one in desperate need of realising, but until I see the code, it's just more hype.

    It would certainly be a great way for non-coders to contribute, though. So many times my Linux friends say "oh, I'd love to contribute to open source but I can't code"; this would definitely be one way, and one which requires very little effort too.
  • The main problem I encounter with p2p networks is that the same file with 2 different names will be considered as 2 different files.

    This is why the file sharing system only works well with audio and video files and not software files.

    Hence, such an open content network should include an advanced file recognition system with some sort of checksum or whatever : a blend of p2p and mirrors
    • From the website:

      Tree Hash EXchange format (THEX) - Coming 05/2002. This document defines a serialization and interchange format for Merkle Hash Trees. These hash trees allow very efficient, fine-grained integrity checking of content in a distributed network.

      It seems that files will be referenced by their hash and thus ensure that data has not been corrupted, and also in this manner will eliminate the "renaming files changes contents" thing that many P2P networks seem to believe in.

      Of course, Freenet [freenetproject.org] does this and more -- and already works -- so why not use it? Integrity checking, intelligent caching, and high anonymity to boot.

  • by Anonymous Coward
    Content-Addressable Web
    Content Distribution Networks (CDNs), such as Akamai, have shown that significant improvements can be made in throughput, latency, and scalability when content is distributed throughout the network and delivered from the edge. Likewise, peer-to-peer systems such as Napster and Gnutella have shown that normal desktop PCs can serve up enormous amounts of content with zero administration. And more recently, systems like Swarmcast have been introduced that combine the CDN and peer-to-peer concepts to gain the benefits of both. The goal of the Content-Addressable Web is to enable these advanced content location and distribution services with standard web servers, caches, and browsers.

    The main benefits of the Content-Addressable Web are:

    Throughput - Browsers will be able to download content from multiple sources in parallel

    Bandwidth Savings - Browsers will automatically discover and select the closest mirror for a piece of content.
    Fault Tolerance - Even if a site goes down in the middle of a download, browsers will automatically locate another mirror and continue downloading.
    Scalability - Any number of machines may be added to the network, creating a CDN ad hoc, with very little administration.
    Security - Browsers will be able to safely download content from untrusted mirrors without risk of corruption or viruses.
    The full paper describing the "HTTP Extensions for a Content-Addressable Web" is available here.

    The goal of the Content-Addressable Web (CAW) is to enable the creation of advanced content location and distribution services over HTTP. The use of content addressing allows advanced caching techniques to be employed, and sets the foundation for creating ad hoc Content Distribution Networks (CDNs). This document specifies HTTP extensions that bridge the current location-based Web with the Content-Addressable Web.

    1. Introduction
    Content Distribution Networks (CDNs), such as Akamai, have shown that significant improvements can be made in throughput, latency, and scalability when content is distributed throughout the network and delivered from the edge. Likewise, peer-to-peer systems such as Napster and Gnutella have shown that normal desktop PCs can serve up enormous amounts of content with zero administration. And more recently, systems like Swarmcast have been introduced that combine the CDN and peer-to-peer concepts to gain the benefits of both. The goal of the Content-Addressable Web is to enable these advanced content location and distribution services with standard web servers, caches, and browsers.

    There are a number of short-comings of current web architecture that the Content-Addressable Web aims to overcome. These include discovering optimal replicas, downloading from untrusted caches, and distributing content across the Transient Web.

    1.1 Optimal Replicas
    There are currently no mechanisms within HTTP that allow a user-agent to discover an optimal replica for a piece of content. This problem is due to the fact that HTTP caching practice assumes a hierarchical caching structure where each user has a single parent cache. Thus while one can discover an object's source URI from a cached copy, there is no mechanism to discover a list of replica locations from the source. This problem is evidenced by the fact that users must manually select the closest mirrors when downloading from Tucows, FilePlanet, or the various Linux distributions. The CAW solves this problem by providing distributed URI resolvers that user-agents can query to find an optimal replica.

    1.2 Untrusted Caches
    It is currently unsafe to download web objects from an untrusted cache or mirror because they can modify/corrupt the content at will. This becomes particularly problematic when trying to create public cooperative caching systems. This isn't a problem for private CDNs, like Akamai, where all of their servers are under Akamai's control and are assumed to be secure. But for a public CDN, the goal is to allow user-agents to retrieve content from completely untrusted hosts but be assured that they are receiving the content intact. The CAW solves this problem by using content addressing that includes integrity checking information.

    1.3 Transient Web
    The Transient Web is a relatively new phenomenon that is growing in size and importance. It is embodied by peer-to-peer systems such as Gnutella, and is characterized by unreliable hosts with rapidly changing locations and content. These characteristics make location-based addresses within the Transient Web quite brittle. Even if traditional HTTP caching was widely leveraged within the Transient Web, the situation wouldn't be helped much. This is because a single piece of content will often be available under many different URIs, which creates disjoint and inefficient caching hierarchies.

    This multiplicity of URIs occurs for a couple of reasons:

    The original source for a piece of content will often cease to exist or the source's URI will change.
    Multiple independent sources often introduce the same content into the network.
    Most applications and file manipulation tools will tend to "forget" the source URI of a piece of content.

    This URI multiplicity can also occur in the normal web, although it is RECOMMENDED that caching semantics be used when an authoritative source is known. The CAW solves the above problems by providing content-specific URIs that are location-independent and can be independently generated by any host. Additionally, various URI resolution services work in coordination to resolve issues associated with having multiple URIs for a web object.

    2. Scope
    The HTTP extensions for CAW are intended to be used in the above scenarios where HTTP is currently lacking. This technology is focused on mostly static content that can benefit from advanced content distribution services. The extensions are intended to be hidden under the hood of web servers, caches, and browsers and should change nothing as far as end users are concerned. So even though a new URN scheme is introduced, there are very few situations where a human will ever interact with those URNs.

    One of the more interesting applications of the Content-Addressable Web is the creation of ad hoc Content Distribution Networks. In such networks, receivers can crawl across the network, searching for optimal replicas, and then downloading content from multiple replicas in parallel. After a host has downloaded the content, it then advertises itself as a replica, automatically becoming a part of the CDN.

    3. Content Addressing
    This specification introduces a URI scheme with many interesting capabilities for solving the problems discussed earlier. A particularly useful class of URI schemes are "Self-Verifiable URIs". These are URIs with which the URI itself can be used to verify that the content has been received intact. We also want URIs that are content-specific and can be independently generated by any host with the content. Finally, to show the intent that these addresses are location-independent, a URN scheme will be used.

    Cryptographic hashes of the content provide the capabilities that we are looking for. For example we can take the SHA-1 hash of a piece of content and then encode it using Base32 to provide the following URN.

    urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

    Implementations MUST support SHA-1 URNs at minimum.([footnote] A future version of this document will also specify a URN format for performing streaming and random-access verification using Merkle Hash Trees.)

    Receivers MUST verify self-verifiable URIs if any part of the content is retrieved from a potentially untrusted source.
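
As a minimal sketch of the scheme described in this section, the following Python builds a `urn:sha1:` identifier from the Base32-encoded SHA-1 digest and checks received content against it. The function names `sha1_urn` and `verify` are illustrative and not taken from the specification.

```python
import base64
import hashlib


def sha1_urn(content):
    """Return the urn:sha1 identifier for a byte string."""
    digest = hashlib.sha1(content).digest()  # 20 bytes
    # 20 bytes encode to exactly 32 Base32 characters, so no "=" padding appears.
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def verify(content, urn):
    """True if the content hashes to the given self-verifiable URN."""
    return sha1_urn(content) == urn.strip()
```

Because the identifier depends only on the bytes, any host holding the content can generate the same URN independently, and a receiver can call `verify` on whatever an untrusted mirror returns.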

    4. HTTP Extensions
    In order to provide a bridge between the location-based Web and the Content-Addressable Web, a few HTTP extensions must be introduced. The nature of these extensions is that they need not be widely deployed in order to be useful. They are specifically designed to allow for proxying for hosts that are not CAW-aware.

    The following HTTP extensions are based off of the conventions defined in RFC 2169. It is RECOMMENDED that implementers of this specification also implement RFC 2169.
    The HTTP headers defined in this specification are all response headers. No additional request headers are specified by this document.
    It is RECOMMENDED that implementers of this specification use an HTTP/1.1 implementation compliant with RFC 2616.

    4.1 X-Content-URN
    The X-Content-URN entity-header field provides one or more URNs that uniquely identify the entity-body. The URN is based on the content of the entity-body and any content-coding that has been applied, but not including any transfer-encoding applied to the message-body. For example:

    X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB

    4.2 X-URI-RES
    The X-URI-RES header is based off of conventions defined in RFC 2169 and provides a number of flexible URI resolution services. These headers provide various ways of locating other content replicas, including additional sources for a multiple-source download. One can also build an application that crawls across the resolution services searching for an optimal replica. Many other uses can be imagined beyond those given in this specification. The general form of the header is as follows:

    X-URI-RES: <service uri>; <service type>[; <target uri>]

    The service URI specifies the URI of the resolution service. It is not necessary for the service URI to conform to the "/uri-res/<service>?<uri>" convention specified in RFC 2169.
    The service type identifies what type of resolution is being performed and how to interpret the results from the service URI. The types are those defined in RFC 2169 and include "N2L", "N2Ls", "N2R", "N2Rs", "N2C", "N2Cs", "N2Ns", "L2Ns", "L2Ls", and "L2C".
    The target URI is the URI upon which the resolution service will be performed. The target URI can be any URI and is specifically not limited to the URI specified by the X-Content-URN header. If there is only a single X-Content-URN value, the target URI can be left off to imply that the X-Content-URN value is to be resolved.
    It is RECOMMENDED that receivers assume that the URI resolver services are potentially untrusted and should verify all content retrieved using a resolver's services.
    It is believed that N2R, N2L, and N2Ls will be the most useful services for the Content-Addressable Web, so we will cover examples of those explicitly.
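
A receiver might parse the header along these lines. This is a sketch under the assumption that the service URI itself contains no ";" character; `parse_uri_res` and the `Resolver` tuple are hypothetical names, and the fallback to the X-Content-URN value follows the rule stated above for an omitted target URI.

```python
from collections import namedtuple

Resolver = namedtuple("Resolver", "service_uri service_type target_uri")

# Service types listed in the text above (from RFC 2169).
SERVICE_TYPES = {"N2L", "N2Ls", "N2R", "N2Rs", "N2C",
                 "N2Cs", "N2Ns", "L2Ns", "L2Ls", "L2C"}


def parse_uri_res(value, content_urn=None):
    """Split an X-URI-RES header value into its two or three fields.

    Assumes fields are separated by ";" and that the service URI
    contains no ";" of its own.
    """
    parts = [p.strip() for p in value.split(";")]
    if len(parts) not in (2, 3) or not parts[0]:
        raise ValueError("malformed X-URI-RES value: %r" % value)
    service_uri, service_type = parts[0], parts[1]
    if service_type not in SERVICE_TYPES:
        raise ValueError("unknown service type: %r" % service_type)
    # With no explicit target, the single X-Content-URN value is implied.
    target = parts[2] if len(parts) == 3 and parts[2] else content_urn
    return Resolver(service_uri, service_type, target)
```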

    4.3 N2R
    The N2R URIs directly specify mirrors for the content addressed by the URN and can be useful for multi-source downloads. For example:

    X-URI-RES: http://urnresolver.com/uri-res/N2R?urn:sha1:; N2R

    or

    X-URI-RES: http://untrustedmirror.com/pub/file.zip; N2R

    The key difference between these headers and something like the Location header is that the URIs specified by this header should be assumed to be untrusted.

    4.4 N2L and N2Ls
    These headers are used when other hosts provide URLs where the content is mirrored. This is most useful in ad hoc CDNs where mirrors may maintain lists of other mirrors. Browsers can simply crawl across the networks, recursively dereferencing N2L(s). For example:

    X-URI-RES: http://urnresolver.com/uri-res/N2L?urn:sha1:; N2L

    and

    X-URI-RES: http://untrustedmirror.com/pub/file-mirrors.list; N2Ls; urn:sha1

    For the N2Ls service, it is RECOMMENDED that the result conform to the text/uri-list media type specified in RFC 2169.

    4.5 Proxies and Redirectors
    It is useful to allow CAW-aware proxies that provide content-addressing information without modifying the original web server. This allows CAW-aware user-agents to take advantage of the headers, while simply redirecting user-agents that don't understand the Content-Addressable Web. It would be inappropriate to return an X-Content-URN header during a redirect, because HTTP 3xx responses often still include a message-body that explains that a redirect is taking place. Instead it is preferred to return a result of the text/uri-list media type that includes one or more URNs that would normally reside in the X-Content-URN header.

    4.6 Example Application
    The above HTTP extensions are deceptively simple and it may not be readily apparent how powerful they are. We will discuss an example application that will take advantage of a few of the features provided by the extensions.

    In this example we will look at how the CAW could help at linuxiso.org, where ISO CD-ROM images of the various Linux distributions are kept. The first step will be to issue a GET request for the content:

    GET /pub/Redhat-7.1-i386-disc1.iso HTTP/1.1
    Host: www.linuxiso.org

    The abbreviated response:
    HTTP/1.1 200 OK
    Content-Type: Application/octet-stream
    Content-Length: 662072345
    X-Content-URN: urn:sha1:RMUVHIRSGUU3VU7FJWRAKW3YWG2S2RFB
    X-URI-RES: http://www.linuxmirrors.com/pub/Redhat-7.1i386-disc1.iso; N2R
    X-URI-RES: http://123.24.24.21:8080/uri-res/N2R?urn:sha1:; N2R
    X-URI-RES: http://123.24.24.21:8080/uri-res/N2Ls?urn:sha1:; N2Ls

    With this response, a CAW aware browser can immediately begin downloading the content from www.linuxiso.org, linuxmirrors.com, and 123.24.24.21 all in parallel. At the same time the browser can be dereferencing the N2Ls service at 123.24.24.21 to discover more mirrors for the content.

    The existence of the 123...21 host is meant to represent a member of an ad hoc CDN, perhaps the personal computer of a linux advocate that just downloaded the ISO and wants to share their bandwidth with others. By dereferencing the N2Ls, even more ad hoc nodes could be discovered.
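
The parallel download in this example can be simulated offline as follows. The sketch hands byte ranges to the sources in rotation and checks the reassembled content against the URN before accepting it. Each "source" is just a Python callable standing in for an HTTP Range request to one mirror; that stand-in, the chunk size, and the function names are assumptions for illustration, not part of the specification.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha1_urn(content):
    digest = hashlib.sha1(content).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def plan_ranges(length, n_sources, chunk_size):
    """Return (source_index, start, end) jobs; end is exclusive."""
    return [(n % n_sources, start, min(start + chunk_size, length))
            for n, start in enumerate(range(0, length, chunk_size))]


def fetch_all(length, sources, chunk_size, expected_urn):
    """Fetch every range in parallel, then verify the whole against the URN.

    `sources` is a list of callables fetch(start, end) -> bytes.
    """
    plan = plan_ranges(length, len(sources), chunk_size)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map returns results in plan order, so a plain join reassembles.
        pieces = list(pool.map(lambda job: sources[job[0]](job[1], job[2]), plan))
    content = b"".join(pieces)
    if sha1_urn(content) != expected_urn:
        raise ValueError("content does not match " + expected_urn)
    return content
```

With a whole-file SHA-1 a single bad mirror spoils the entire download, which is presumably the motivation for the hash-tree URNs the footnote promises for a later version.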

    4.7 Replica Advertisement
    The URI-RES framework provides a significant amount of flexibility in how replica advertisement and discovery can be implemented. One example implementation will be provided in a future specification.

    4.8 Acknowledgements
    Gordon Mohr (gojomo@bitzi.com), Tony Kimball (alk@pobox.com), Mark Baker (distobj@acm.org)
  • This is one of those things you hear about that is so simplistic, so necessary, and staring you right in the face, that you stop and say:

    "Now, why in the hell didn't I think of that?"
  • I mentioned something like that yesterday [slashdot.org]
    • Because most people who download from P2P don't bother to check GPG signatures (or checksums, if you prefer). If people start using P2P networks to download executable code (either binary or source code), it's going to make Outlook Express look like a securely designed e-mail client. All existing P2P networks are designed to deliver content, where security is not a large concern. This proposes a system to deliver code, which requires a tie-in to a master server that verifies the authenticity of files on behalf of the clients.

  • Well, for one example, the new RedHat 7.3 .iso files have MD5 sums embedded in them. From the boot prompt, type "linux mediacheck" and it will prompt for a disk to be validated.

    A feature to take detached/attached MD5 sums, GPG signatures or the like could be pretty easily added in.

    You're right, it is needed.
  • Okay, how about:

    split -b 65m filename.iso filename.iso.

    breaking the 650+ MB ISO into about ten 65 MB chunks with the suffixes .aa-.aj.

    Share them on Gnutella, KaZaA and any other P2P services.

    Once downloaded, cat all the files together into one and check the MD5 sum (also downloaded, or embedded like RedHat 7.3 does).
    • You forgot the step: looking through all of your files to make sure you've got every extension, finding .ah when you figure out you're missing it, doing the MD5 sum, realizing that you're missing 1 meg of .ad, getting .ad again, etc. etc. Monkeywork like this is what computers are *for*
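
The monkeywork could be handed to a short script. The sketch below assumes the chunks came from `split` with its default two-letter suffixes (`aa`, `ab`, ...) and that the expected MD5 sum is already known from a trusted place; `find_parts` and `reassemble` are illustrative names.

```python
import hashlib
import os
import string

# Default two-letter suffixes produced by split: aa, ab, ..., zz.
SUFFIXES = [a + b for a in string.ascii_lowercase for b in string.ascii_lowercase]


def find_parts(prefix):
    """Return (present, missing) chunk paths, checking for gaps before the last chunk."""
    present = [s for s in SUFFIXES if os.path.exists(prefix + s)]
    if not present:
        raise FileNotFoundError("no chunks found for " + prefix)
    upto = SUFFIXES[:SUFFIXES.index(present[-1]) + 1]
    missing = [s for s in upto if s not in present]
    return [prefix + s for s in present], [prefix + s for s in missing]


def reassemble(prefix, out_path, expected_md5):
    """Concatenate the chunks in order and verify the MD5 of the result."""
    parts, missing = find_parts(prefix)
    if missing:
        raise FileNotFoundError("missing chunks: " + ", ".join(missing))
    md5 = hashlib.md5()
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as chunk:
                for block in iter(lambda: chunk.read(1 << 20), b""):
                    out.write(block)
                    md5.update(block)
    if md5.hexdigest() != expected_md5.lower():
        raise ValueError("MD5 mismatch: got " + md5.hexdigest())
    return out_path
```

For the example above the call would be `reassemble("filename.iso.", "filename.iso", md5_from_site)`. A gap in the middle is reported by name; a missing final chunk or a truncated one only shows up as an MD5 mismatch, and the script cannot say which chunk is at fault, which is the case per-chunk hashes would cover.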
  • Read the line above. Now, somebody be ambitious and make a package for everything freenet needs in one RPM / deb package and put up a system to make it known. Now we play...
    • I continue to think that Freenet is "launched vaporware". Yeah, you can download it, but it won't work (for some definition of "work" including downloading actual files over a meg).

      -glenn
    • Throughput - Browsers will be able to download content from multiple sources in parallel

      Bandwidth Savings - Browsers will automatically discover and select the closest mirror for a piece of content.
      Fault Tolerance - Even if a site goes down in the middle of a download, browsers will automatically locate another mirror and continue downloading.
      Scalability - Any number of machines may be added to the network, creating a CDN ad hoc, with very little administration.
      Security - Browsers will be able to safely download content from untrusted mirrors without risk of corruption or viruses.

      Right on target. Freenet [freenetproject.org] accomplishes these goals, and actually works right now. Freenet is essentially an anonymous, distributed caching system into which anyone can insert data and retrieve it later. It supports both locating information by content hashes or by a human-readable redirect, as well as lots of really cool features like anonymous websites ("freesites"). So... what are you waiting for? Install Freenet today!

      </plug>

      • Freenet has intentional features that an open content network would see as bugs. It has too much freedom-without-accountability.

        Freenet users don't have control over what content they mirror. So while you download file x and want to share it and reduce the load on other servers, the system in its inscrutable wisdom actually has you sharing file y, just because it's something that a lot of other people want.

        If the network is used in a manner that either The Man with the guns or your own conscience doesn't like, and it is decided that you must stop contributing to the abuse, the only thing you can do is completely drop your Freenet node. For example, you can't selectively mirror GPLed stuff and not mirror (pick your apocalyptic horseman) warez, kiddie porn, mafia bookie records, terrorist communications, Microsoft ads, etc.

        I'm not saying that's bad (well, actually, yes I do think that's bad, but that's beside the point), but it's not quite what this other project is for.

        • Nodes don't control what they mirror because that would defeat the log(n) lexical routing system and the intelligent caching system. If you're in an area of the network that is constantly requesting a single key (i.e. the MySlashDottedHomepage freesite), yeah, your node will mirror it over, say, MyBoringUnpopularHomepage. That makes sense, does it not?

          And you are correct -- the only way to be sure you're not helping out the child porn sickos is to shut down your node. But then again, that would prevent the people in China from learning about things their government denied access to or to help distribute the latest kernel tarball. Besides, if you could control (or even knew) what you were mirroring, other people could figure it out, and you could get into legal headaches. As is, there's no way to prove that data is actually on your node without possibly helping that data spread.

          Correct, the open content network and Freenet do have different philosophies -- however, they share similar technical goals and Freenet (unlike the other project) actually does something.
  • by the_2nd_coming ( 444906 ) on Tuesday May 21, 2002 @09:20AM (#3558088) Homepage
    This is what we need to make the open music movement happen. People will make music, license it as being free to trade, and then folks will do more and more of it. Who knows, maybe this can become the "good example" needed to show the courts that P2P file sharing can be done without infringing the rights of others, and even lead to some mainstream artists releasing some music on the system to advertise.
  • Using P2P networks for this kind of caching is something that is long overdue and if these guys can pull it off in a major way then I'm all for it.

    I'm just not sure that I buy the description of the "Open Content Network":

    "the OCN will allow users to download open source and public domain content from multiple peers and mirrors in parallel."

    I presume that it could just as easily be used for copyrighted material and is in no sense different from Napster etc. in its restrictions and potential (read: probable) use.

    Sounds a bit like a PR thing: our network is for Open Source material, if people use it for other things - well that's none of our business.

    Personally, I agree that they should have the right to focus and brand themselves however they want. I also agree that they shouldn't be held liable for the type of files users actually submit (unless they're either actively screening them or branding themselves as the "Illegal Warez Network" or something). I'm just not sure that this approach will help to limit their liability (although I sure hope it does). Or did I miss something and they are proposing some method of ensuring the content meets some guidelines, thus avoiding any of the Napsteresque controversy?

  • by HappyCamper ( 172343 ) on Tuesday May 21, 2002 @09:34AM (#3558193)
    I'd like to draw your attention to the Globe Distribution Network (GDN), like OCN, a content distribution network for freely redistributable software. Its design specifically addresses the problem of deviants abusing the network to distribute other people's copyrighted works and illicit content. In particular, it requires all content published to be digitally traceable to the publisher. If, after publication, someone finds that this content is not free software the content will be removed and its publisher blocked from the network.

    The GDN furthermore offers a scalable solution to the problem of finding the nearest replica (i.e., a scalable URI resolver service in OCN terms), and facilities for dynamically replicating content in areas with many downloaders.

    Publications on the GDN, the underlying Globe middleware, and its initial implementation (BSD license) can be found on http://www.cs.vu.nl/globe [cs.vu.nl]. The best description of the anti-abuse measures of GDN is found in the paper titled ``A Law-Abiding Peer-to-Peer Network for Free-Software Distribution'' published at the IEEE NCA'01 Conference.

  • They stole my name! http://theoretic.com/?action=history&id=Mike/OCN-FAQ [theoretic.com]

    I was thinking about this only 3 or 4 days ago.....

    hehe. Ah well, I'm glad somebody else is doing it really, I have more than enough on my plate right now. Perhaps they should check out the Creative Commons? [creativecommons.org]

  • This is similar to an idea I have had for Debian: point-to-point apt. Clients only need to download the Packages files, with their md5sums, from the server; then whenever I apt-get a package it's in my local cache,
    and machines near me can get these files faster.
    It's a local Debian mirror, except it's done by apt, and I still get my package list from the Debian servers so I know the packages have the correct md5sums and stuff.
    However, I suspect someone is already doing this :)
    'Piracy' is killed off, as the only packages that are mirrored are the ones listed at the Debian servers (someone could create a package containing the files and host it on their own server, but then they are the point of access). The servers still hold the originals, but it means that ordinary users can contribute to Debian (a fantastic dist :)) just by running apt-get and having a net connection.
  • There was an application called mojonation, which became mnet that does something along these lines I think?

    When you "publish" something to the mnet it splits the file into X parts and puts those parts on Y servers. Your download is swarmed from these servers. The file stays on the net as long as there is at least one server per block. Those servers also check which blocks are more popular and "purchase" those blocks from other servers to make the file more easily accessible.

    Or so I'm told anyway.

    I think the idea is quite nice.

    You can read more about it here [sourceforge.net].

    wbr

    .haeger
  • I read some comments and I realized that what you need is eDonkey file sharing (here [edonkey2000.com]).

    Files are identified by their md5 checksums (no filename confusion). It is free, fast, reliable, and secure. Files can be uploaded while being downloaded, which ensures that a rare file that is wanted by many people will be distributed as quickly as possible. It supports multiple servers. This file sharing network is primarily used for sharing movies and CD images (appz, games, ...), so it is ideal for large Linux distros. Compare downloading a movie from KaZaA with downloading it from eDonkey (speed 1 to 50).

    Check out feature list: here [edonkey2000.com]
  • Check out my paper (to appear tomorrow at http://arxiv.org/list/cs/new [arxiv.org] - cs.NI/0205058). From the abstract:
    "We propose a centralized algorithm of data distribution in the unicast p2p network. Good examples of such networks are meshes of WWW and FTP mirrors. Simulation of data propagation for different network topologies is performed and it is shown that the proposed method performs up to 200% better than common approaches".
  • BitTorrent (Score:3, Informative)

    by bramcohen ( 567675 ) on Tuesday May 21, 2002 @12:51PM (#3559747)
    BitTorrent [bitconjurer.org] enables downloaders to send pieces to each other when they have an incomplete file, making almost unlimited scaling possible. Simple multi-source downloading can be good for performance, but still is limited by the server's upload capacity.

    We've had several large deployments of files which are a couple hundred megabytes and up, getting sustained downloads of a couple hundred downloaders at once, serving off a dsl line, and it's worked well.

    By the way, BitTorrent, Swarmcast, and OCN all check secure hashes under the hood, so data integrity isn't an issue.

  • I'll probably get modded down for this again, but why not just use Usenet?

    Set up an alt.binaries.geektoys and post all the Videogame Demos, Distros, Open Source Software, Movie Trailers and the like that we're all interested in. You can use RARs, PARs, SFVs, etc to make sure the file is downloaded properly. Then the only issue is making sure the checksum matches up with what you got off the original web site.

    ISPs already carry Usenet, so the infrastructure is set up, and this is definitely a useful, non-infringing use of Usenet.

    "What was I downloading? Why, the latest version of Mandrake!" Sounds good to me...
  • I've been hoping someone would do this for quite a long time now. No more playing with slow mirror ftp sites when I want to get my Linux upgrades.
