UDP + Math = Fast File Transfers

Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.
  • Kazaa does that (Score:2, Informative)

    by DOsinga ( 134115 ) <douwe.webfeedback@noSPaM.gmail.com> on Thursday December 13, 2001 @09:55AM (#2698422) Homepage Journal
    User would go to download a game demo or something, receive pieces from several different places, and knit them together?

    The file sharing networks based on FastTrack technology do that. You download a movie or game from different users at the same time. Kazaa stitches it back together.
  • getright (Score:4, Informative)

    by awptic ( 211411 ) <`infinite' `at' `complex.com'> on Thursday December 13, 2001 @09:55AM (#2698425)
    The program michael is referring to is GetRight, which can connect to several known mirrors of a file and download separate fragments from each.
  • Other Applications (Score:2, Informative)

    by Keeper ofthe Keys ( 48750 ) on Thursday December 13, 2001 @09:56AM (#2698428) Homepage

    GetRight uses multiple sites to download from and then pieces them back together.

    http://www.getright.com [getright.com]

  • Flow Control (Score:3, Informative)

    by Detritus ( 11846 ) on Thursday December 13, 2001 @10:00AM (#2698448) Homepage
    You still need some form of flow control or rate limiting, otherwise a large percentage of the UDP packets are going to get dropped. Plus, you have the problem of UDP streams stealing bandwidth from TCP streams on a limited bandwidth link.
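    For illustration, sender-side rate limiting could be as simple as a token bucket. A minimal sketch; the rate, burst size, and the commented-out socket call are assumptions, not anything from the article:

    import time

    class TokenBucket:
        """Token-bucket pacing for UDP sends, so we don't flood the link."""
        def __init__(self, rate_bytes_per_sec, burst_bytes):
            self.rate = rate_bytes_per_sec
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def wait_for(self, nbytes):
            """Block until nbytes of credit is available, then spend it."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)

    # Pace 1400-byte datagrams at ~1 Mbit/s so competing TCP flows survive.
    bucket = TokenBucket(rate_bytes_per_sec=125_000, burst_bytes=10 * 1400)
    for packet in [b"x" * 1400] * 50:
        bucket.wait_for(len(packet))
        # sock.sendto(packet, addr)  # a real sender would transmit here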
  • by Chirs ( 87576 ) on Thursday December 13, 2001 @10:05AM (#2698467)
    Guys, this is nothing like Kazaa. Kazaa will let you download from several sources simultaneously, but only because it just requests different parts of the file from each source. At that point there are still send/ack type protocols in use.

    This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples. By changing to a different domain, it is possible to send metadata from which the original information can be recreated.

    I have no idea exactly how they would do it, but it doesn't sound completely impossible.

    However, it's nothing like Kazaa or GetRight.
  • Swarmcast (Score:4, Informative)

    by Webmonger ( 24302 ) on Thursday December 13, 2001 @10:08AM (#2698480) Homepage
    I believe Michael's talking about OpenCola's Swarmcast [slashdot.org].
  • by Omnifarious ( 11933 ) <eric-slash@nOSpam.omnifarious.org> on Thursday December 13, 2001 @10:10AM (#2698485) Homepage Journal

    The company's name is OpenCola [opencola.com] and the name of the product was SwarmCast. The guy who did SwarmCast, Justin Chapewske, is now at a company he started named Onion Networks [onionnetworks.com]. OpenCola appears to have completely abandoned its original Open Source approach to their software.

    Apparently, Justin has taken the GPL portions of Swarmcast and is improving them at Onion Networks.

  • Not a new concept (Score:2, Informative)

    by KarmaBlackballed ( 222917 ) on Thursday December 13, 2001 @10:12AM (#2698495) Homepage Journal
    The quirk is that none of the data is ever transmitted; the receiving end creates its own copy of a file based on a complete set of mathematical equations.

    This is called compression. Everybody is doing it and it has been done before.

    When you download a ZIP file, you are not downloading the content. You are downloading a mathematically transformed version of it. You then translate it back. Modems have been compressing and decompressing on the fly since the late 1980s.

    Maybe they have a better compression scheme? (Fractal based?) That would be news. Everything else is a distraction.
  • Oops, make that Justin Chapweske. That's what I get for typing out an odd name from memory. :-)

  • my take (Score:5, Informative)

    by bpowell423 ( 208542 ) on Thursday December 13, 2001 @10:17AM (#2698516)
    There have been lots of comments along the lines of, "this is just a novel compression/transmission scheme". In a way, that looks to be true, but here's my take.

    Judging from this:

    The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.

    It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. Apparently there are multiple solutions to the equations that will arrive at the same answer, so when the receiving end has received enough symbols to make it work, it says, "stop sending already!" Apparently they're getting their speed because A) things don't have to go in any order (that's how the 'net is supposed to work, right?) and B) Alice and Bob don't have to keep up this conversation: Alice: Hey, Bob, can you send me X? Bob: Okay, are you ready? A: Yes, go ahead. B: Okay, here it comes. A: I'm waiting. B: Here's the first packet. A: What? That packet didn't make it over. B: Okay, here it is again. A: Okay, I got that packet. B: Good. A: Okay, I'm ready for the second packet. B: Okay, here's the second packet.

    Okay, I had too much fun with the Alice and Bob conversation there. Anyway, it looks like their scheme is compressing things in the form of their equations, and then just sending them in a burst until the receiver is happy.

    Sounds like it might work, but it'll generate a ton of network traffic, I'd bet!

  • Re:Compression? (Score:4, Informative)

    by s20451 ( 410424 ) on Thursday December 13, 2001 @10:20AM (#2698531) Journal

    In essence, isn't this the same as file compression? The amount of information is the same (for those who remember what a bit is).

    It is more than merely compression. The received data is compressed, which saves transmission time, but this technology is already well known (and the company isn't claiming a compression rate better than entropy [data-compression.com], or anything else silly). The innovation here is the elimination of acknowledgement or ARQ packets. I'm speculating here, but it looks like they are encoding the data by transforming a file into a huge "codeword" -- when the codeword is transmitted, the receiver waits for enough packets to correctly decode the codeword, which results in the recovery of the file. There's no need for ARQ or TCP because transmitting extra codeword elements will automatically correct any errors incurred in transmission.

  • by autocracy ( 192714 ) <(slashdot2007) (at) (storyinmemo.com)> on Thursday December 13, 2001 @10:20AM (#2698536) Homepage
    Just send "restart at" commands to many different servers, then cat the files onto each other. This is how Download Accelerator does it, and FastTrack works on the same theory. The programs just take all the mental work out of it.
  • Re:Vectors... (Score:5, Informative)

    by hburch ( 98908 ) on Thursday December 13, 2001 @10:21AM (#2698544)
    Consider the following (almost certainly bad, but workable) scheme:
    • Convert a chunk of the file into an order-k polynomial (use the coefficients of the polynomial to encode the chunk)
    • Send evaluations of the polynomial at several distinct points (more than k+1 of them).
    • Receiver gets at least k+1 packets.
    • Using math, it recreates the original polynomial, and thus the chunk.


    Please note that I'm not saying this is a good scheme. It is just an example one, and one that doesn't detail the chunk polynomial conversion, which would be very important. There are several papers describing schemes where people have actually worked at making them tenable.

    Modulo compression, if you want such a system to require only receiving k packets (although you send more than that), the sum of the size of the k packets must be at least the size of the original file (otherwise, you could use such a scheme to compress the file).
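    To make the example scheme concrete, here is a toy version over the prime field GF(257) (a real system would use GF(2^8) or the like). It uses a systematic variant: the chunk bytes are the polynomial's values at x = 0..k rather than its coefficients, so recovery is a single Lagrange interpolation. Everything here is illustrative:

    P = 257   # small prime field, so the modular arithmetic stays obvious

    def interpolate(points, x):
        """Lagrange: evaluate at x the unique polynomial through (xi, yi)."""
        total = 0
        for i, (xi, yi) in enumerate(points):
            num = den = 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
        return total

    def encode(chunk, n_packets):
        """Emit n_packets (x, y) evaluations; any len(chunk) of them suffice."""
        pts = list(enumerate(chunk))
        return [(x, interpolate(pts, x)) for x in range(n_packets)]

    def decode(received, k_plus_1):
        """Rebuild the chunk from any k+1 received evaluations."""
        pts = received[:k_plus_1]
        return bytes(interpolate(pts, x) for x in range(k_plus_1))

    chunk = b"UDP+math"                     # k+1 = 8 data bytes
    packets = encode(chunk, n_packets=12)   # send 12, tolerate 4 losses
    survivors = packets[4:]                 # pretend the first 4 were dropped
    assert decode(survivors, len(chunk)) == chunk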
  • by null etc. ( 524767 ) on Thursday December 13, 2001 @10:23AM (#2698552)
    Quoted from the article:
    In this case, the Transporter Fountain creates not equations but hundreds of millions of "symbols" which can be used to reconstruct the data. The sending side transmits these symbols until the box on the receiving end confirms that it's collected enough symbols. The receiving box then performs an XOR operation on the symbols to derive the original data.
    So, assuming that each "symbol" is at least one byte, then creating "hundreds of millions" of these symbols would result in hundreds of megabytes of data. Furthermore, the guy quoted 20MB as being a large amount of data to send.

    Conclusion: Only sales & marketing would try to sell a product that turns 20MB into 100MB, sends it via UDP, only in order to have the results XOR'd together.

    Where do they get these people?

  • Re:Link to product (Score:3, Informative)

    by Omnifarious ( 11933 ) <eric-slash@nOSpam.omnifarious.org> on Thursday December 13, 2001 @10:23AM (#2698554) Homepage Journal

    Oh, yes, here's a link to the SourceForge project for SwarmCast [sourceforge.net].

  • Tornado Codes (Score:5, Informative)

    by Jonas Öberg ( 19456 ) <jonas@gnu.org> on Thursday December 13, 2001 @10:24AM (#2698560) Homepage
    While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free. The abstract:

    Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load. Typically, a client requests service from a single mirror site. We consider enabling a client to access a file from multiple mirror sites in parallel to speed up the download. To eliminate complex client-server negotiations that a straightforward implementation of this approach would require, we develop a feedback-free protocol based on erasure codes. We demonstrate that a protocol using fast Tornado codes can deliver dramatic speedups at the expense of transmitting a moderate number of additional packets into the network. Our scalable solution extends naturally to allow multiple clients to access data from multiple mirror sites simultaneously. Our approach applies naturally to wireless networks and satellite networks as well.

    I don't have the paper in a computer format, but the number is TR-98-021 and John and Michael were both at Berkeley at the time (1998), so it should be fairly easy to find if someone is interested. Doubtless, a number of other reports on the subject exist that deal with the same problem but offer different solutions.
  • Re:Not a new concept (Score:4, Informative)

    by Omnifarious ( 11933 ) <eric-slash@nOSpam.omnifarious.org> on Thursday December 13, 2001 @10:26AM (#2698564) Homepage Journal

    UDP drops packets. What they are saying is they can packetize things in such a way that as soon as you pick up any N packets, you get the file, no matter what. They are also implying that anything less than N packets leaves you gibberish. This is quite different from file compression. It may be related to fractal file compression, but I think it's probably more related to cryptographic key sharing schemes.

  • Re:Vectors... (Score:1, Informative)

    by Izmunuti ( 461052 ) on Thursday December 13, 2001 @10:33AM (#2698606)
    It sounds sort of like how RAID 5 works. Take the original data, expand it with some redundancy, and then splat it across 5 drives. If any one drive goes down, the data from the other four can be combined to recover it. The data is "expanded" 5:4. The math to do the recovery involves a ton of XOR operations. Some hard drives have nifty hardware support for this to speed things up.

    I bet these guys have something like that: add a bit of redundancy, split things up, blast it out over UDP to the destination. The receiving end puts it back in order and tries to recover from missing packets using the XOR operations. It takes more bandwidth but may still be faster than FTP since it doesn't spend most of its time doing handshakes.

    I don't get why the box costs 70-150 kilobucks though. Yikes.
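    In case it helps, the XOR-parity trick is tiny to demonstrate. A toy RAID-5-ish example, one parity block over four data blocks (block contents are made up):

    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR across equal-length blocks."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # four data "drives"
    parity = xor_blocks(data)                      # the fifth "drive"

    # Lose any one data block; XOR of the survivors plus parity restores it.
    lost = data[2]
    survivors = [b for b in data if b is not lost] + [parity]
    assert xor_blocks(survivors) == lost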
  • by buridan ( 122382 ) on Thursday December 13, 2001 @10:39AM (#2698649)
    OK, people, you seem to be far away from your thinking apparatus.
    Compression takes data and applies an algorithm to it to generate new data that is a representation of the whole.

    Contrary to that, this transforms the data into an algorithm that simulates the whole.

    It is a minor point of order, but one worth thinking about, because it is theoretically different.

    This is also not distributed; it is point to point. I don't know where the submitter got the distributed model from, I suspect he pulled it out of the air, but this is very clearly not that. It requires a machine at each end of the point-to-point link.

    However, logically, all data that is not random can be represented by equations that are smaller than the data. The real load is in generating the equations from the data, not in reconstructing the data itself, so this seems to me to be quite a possible project, though I suppose it will take quite a bit of optimization.

    One other thing: the article does not say that it uses a standardized technique, and it would be interesting if they did not use standard vector analysis or the like. If they used vectors, then this could just reduce to a system of file conversion from one standard to another. It would be far more interesting for it to be what it claims to be: a conversion of the file into a mathematical simulation of the file.
  • by mfg ( 16466 ) on Thursday December 13, 2001 @10:40AM (#2698653)
    See http://www.digitalfountain.com/getDocument.htm/technology/DF_MetaContentWhitePaper_v4.pdf
  • The RFC (Score:5, Informative)

    by gargle ( 97883 ) on Thursday December 13, 2001 @10:42AM (#2698668) Homepage
    http://www.ietf.org/internet-drafts/draft-ietf-rmt-info-fec-01.txt

    The use of Forward Error Correction in Reliable Multicast

    Enjoy.
  • by Adam J. Richter ( 17693 ) on Thursday December 13, 2001 @10:49AM (#2698700)
    The technique is called Forward Error Correction [google.com]. I don't know much about it, but I know that you can do things like break up a file of N*B bytes into 2N blocks of B bytes each and then be able to reconstruct the file from any N of the 2N blocks. The GPL'ed Mojonation [mojonation.net] system uses it, if I recall correctly, to distribute each 32kB chunk of data into eight 8kB blocks, allowing reconstruction from any four of them.
  • by Anonymous Coward on Thursday December 13, 2001 @10:50AM (#2698703)

    While not actually related, John Byers, Michael Luby and Michael Mitzenmacher wrote a paper on using Tornado codes to speed up downloads. Basically, what they propose is clients accessing a file from more than one mirror in parallel and using erasure codes to make the system feedback-free.

    The "Tornado codes" you mention were coauthored by one of the executives of this company (Luby is CTO).
  • Re:Article is wrong (Score:1, Informative)

    by Anonymous Coward on Thursday December 13, 2001 @10:50AM (#2698707)
    I think the writer is almost right. In FTP, the packets are ordered, but they don't need to arrive in order, since you can reorder them. However if a file consists of 10 packets, you need all ten packets to recover the document. If you use erasure codes, you can take the 10 packets and generate 15 encoded packets. Once you get ANY 10 of the 15 packets, you can calculate the original file. This is cool because any given packet can be lost, so UDP is just fine to transfer each packet.

    Look at Google for information about forward error correction, erasure codes, tornado codes, fcast, and/or Luigi Rizzo. There is more to this topic than PR from business types :)

    I understand that Microsoft used fcast to multicast nightly builds over a LAN and found the rate-limiting step to be the write speed of the hard drives - the math to generate the message is not a serious strain for modern CPUs. If all of us could speak fcast, getting the latest Linux/*BSD distro via cable modem would go a lot faster.
  • by cloudmaster ( 10662 ) on Thursday December 13, 2001 @10:52AM (#2698717) Homepage Journal
    Since it doesn't use TCP, I'll bet it won't have any problem handling the TCP part of "TCP/IP connections"... The networking end sounds really simple - send the amount of data to expect, wait for confirmation, send all of the data once *without* waiting for confirmation until it's all been sent. That's a lot easier than handling the overhead of tcp, which is the whole problem they're trying to solve.

    BTW, it helps to read the article before posting "insightful" comments. :) There's a nice little demo at http://www.digitalfountain.com/technology/coreTechnology.htm [digitalfountain.com]
  • by bpowell423 ( 208542 ) on Thursday December 13, 2001 @10:55AM (#2698731)
    That's not it. Because of their algorithm, order doesn't matter, and neither do dropped packets. The receiver only needs any n packets, so the transmitter keeps sending until the receiver says it got enough. Then the results are magically XORed to get the original file. So, no, you don't need a sequential number, you don't need to check if all packets arrived, and you don't have to rearrange them.
  • Swarmcast (Score:5, Informative)

    by Orasis ( 23315 ) on Thursday December 13, 2001 @10:59AM (#2698753)
    The "math" they use is called Forward Error Correction (FEC) and is the same stuff that the Swarmcast distributed download system is based on (http://sf.net/projects/swarmcast/ [sf.net]).

    I am the creator of Swarmcast, and we at Onion Networks (http://onionnetworks.com/ [onionnetworks.com]) already have a product that can provide file transfers just as fast as Digital Fountain's, but ours is 3-5.5% more efficient and much, much cheaper.

    On the open source front, we are working on the next generation of the Swarmcast technology which will combine parallel downloads over HTTP, Multicast IP, and UDP (for tunnelling through NAT). We have started work defining a standard for this system called the "Content-Addressable Web" [onionnetworks.com] and hope to see it implemented in everything from Apache to Mozilla.

    Please mod this up, people shouldn't be paying $150k for these types of technologies.
  • by no_such_user ( 196771 ) <jd-slashdot-20071008.dreamallday@com> on Thursday December 13, 2001 @11:01AM (#2698760)
    IANADP (i am not a dsp programmer...), but if I remember correctly, the process of the Fourier transform doesn't reduce data. The transform brings our data into the frequency domain, which, in terms of audio and video data, is where we do all of the tricks to eliminate what we've learned our "ears" can't hear or "eyes" can't see.
  • Tornado codes (Score:5, Informative)

    by srichman ( 231122 ) on Thursday December 13, 2001 @11:08AM (#2698793)
    This technology (from the write-up anyway) uses some kind of proprietary technique to re-map the data into another domain and send the information required to reproduce it. It sounds kind of like sending a waveform as a series of Fourier coefficients rather than as actual data samples.
    Actually, they use [digitalfountain.com] Tornado codes [berkeley.edu] (or a proprietary update thereof), an erasure code. That is, they use forward error correction to encode streaming data or a software distribution over a single stream (or multiple client-independent streams) of multicast. After a client grabs enough packets, it can reconstruct the source file.
  • by mprinkey ( 1434 ) on Thursday December 13, 2001 @11:16AM (#2698845)
    There is nothing "proprietary" here. The techniques for reliable transmission of digital data over unreliable media have been a central area of EE research for at least three decades. The unreliable medium is now UDP instead of broadcast RF or transmission lines.

    Line reliability in the "normal" sense is classified by bit error rate. Here, the analogous rating would be packets dropped per second. So, it seems like a straightforward application of FEC. It is useful for the reasons that they state. TCP transmissions look like this:


    Client: Request Data Packet #00001
    Server: Transmit Data Packet #00001 then wait for Client to say OK or Retransmit.
    Client: Request Data Packet #00002
    Server: Transmit Data Packet #00002 then wait for Client to say OK or Retransmit.
    ....
    Client: Request Data Packet #20343
    Server: Transmit Data Packet #20343 then wait for Client to say OK or Retransmit.
    DONE

    The FEC/UDP form would look like this:

    Client: Request Data FILE
    Server: Transmit FEC Data Packet #00001
    Server: Transmit FEC Data Packet #00002
    ...
    Server: Transmit FEC Data Packet #20343
    Client: OK...I got it. Stop Transmitting.
    DONE

    There is no per-packet dialog between server and client. This removes network latency from the transfer equation. Note that FEC over UDP does require redundant information to be transmitted, so there is no free lunch here, but it is certainly likely to be faster than any TCP/IP implementation.
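    To make the shape of that second exchange concrete, here is a runnable toy on localhost. It skips actual FEC (the sender just keeps re-spraying raw blocks), but it shows the key property: no per-packet dialog, only a single stop message at the end. All names, ports, and numbers are made up:

    import itertools, random, socket, threading

    RECV = ("127.0.0.1", 9998)   # receiver's port
    SEND = ("127.0.0.1", 9999)   # sender's port (listens only for the stop message)
    BLOCKS = {seq: bytes([65 + seq]) * 4 for seq in range(10)}   # the "file"

    recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv_sock.bind(RECV)         # bind before the sender starts spraying

    def sender():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(SEND)
        sock.settimeout(0.001)
        for seq in itertools.cycle(BLOCKS):       # spray forever, no per-packet ACK
            if random.random() > 0.2:             # simulate 20% packet loss
                sock.sendto(bytes([seq]) + BLOCKS[seq], RECV)
            try:
                if sock.recvfrom(16)[0] == b"STOP":   # the only feedback there is
                    return
            except socket.timeout:
                pass

    def receiver():
        got = {}
        while len(got) < len(BLOCKS):             # collect until we have enough
            pkt, _ = recv_sock.recvfrom(64)
            got[pkt[0]] = pkt[1:]
        recv_sock.sendto(b"STOP", SEND)
        print(b"".join(got[s] for s in sorted(got)))

    threading.Thread(target=sender, daemon=True).start()
    receiver()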
  • FEC Library (Score:5, Informative)

    by Orasis ( 23315 ) on Thursday December 13, 2001 @11:17AM (#2698854)
    Oh yeah, and you can download our completely patent-free and open source FEC library from here [onionnetworks.com] and build your own Multicast or UDP based download system very quickly (provided you get the flow control right :)

    --
    Justin Chapweske, Onion Networks [onionnetworks.com]
  • by TheSync ( 5291 ) on Thursday December 13, 2001 @11:21AM (#2698873) Journal
    OK folks, here is the "real deal."

    Digital Fountain's core technology is called "meta-content (tm)". The meta-content engine produces packets based on the original content, such that if the receiver receives enough packets (just slightly more than the size of the original content), the original content can be recreated. The neat part is that it doesn't matter which meta-content packets you get. If you need to receive 10 packets, you can get 5, miss 5, get another 5, and it works. Or you can get 1, miss 10, get 9, and it works as well. As long as you receive some 10 packets from the "fountain," you can recreate the original content.

    Why is this cool? Several reasons. Digital Fountain claims that TCP connections with an RTT of 200 ms and 2% packet loss have a bandwidth limitation of 300 kbps, no matter how fast the actual transport channel is. So you just go to town at full bandwidth with UDP to use up the entire channel, and use Digital Fountain technology so it doesn't matter which 2% of packets get lost, as long as you transmit enough packets to make up for the packet loss.

    OK, why else is this cool? Imagine a Digital Fountain continuously transmitting meta-data on a multicast address. If you want to receive a file, just listen to the multicast address. It doesn't matter when you start listening, just as long as you listen for enough packets to recreate the original file. Multicast file distribution.

    Interestingly enough, Digital Fountain has also pioneered multicast on-demand streaming; the real secret sauce there is something besides meta-content, but meta-content makes it easier.

    As some people have mentioned, you can use UDP with FEC to achieve some error correction. But meta-content can handle long burst errors, whereas FEC is only appropriate for short, random errors. You can literally unplug the ethernet, wait a while, and plug it back in, and you're still good to go with Digital Fountain, as long as you listen long enough.

    I should mention, DF has something called "Fair Layered Increase Decrease Protocol," or FLID, to keep their UDP burst from blowing away other TCP traffic on the network.

    For more information on the basic Digital Fountain technology, see: A Digital Fountain Approach to Reliable Distribution of Bulk Data [berkeley.edu].
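    That 300 kbps ceiling is consistent with the usual back-of-the-envelope TCP model, the Mathis et al. approximation: throughput is roughly MSS / (RTT * sqrt(loss)). A quick sanity check with typical assumed numbers:

    from math import sqrt

    mss_bytes = 1460     # a typical Ethernet-sized TCP segment (assumed)
    rtt = 0.2            # 200 ms round trip
    loss = 0.02          # 2% packet loss

    # Mathis approximation with the constant factor taken as ~1.
    bps = (mss_bytes * 8 / rtt) / sqrt(loss)
    print(f"{bps / 1000:.0f} kbps")   # ~413 kbps, the same ballpark as 300 kbps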
  • NOT NEWS. (Score:5, Informative)

    by tqbf ( 59350 ) on Thursday December 13, 2001 @11:23AM (#2698882) Homepage
    Digital Fountain has been around, with product, for a long time. The technique they are building on for file transfer has been around even longer. Their protocols are even IETF drafts/standards now.

    The concept of "sending equations" instead of data is extremely well-known. It's called Forward Error Correction (FEC). FEC is a very simple idea: take the source data and encode it with parity data so that you can reconstruct the original message from any N chunks of it. One form of FEC that even your stereo components might already do is Reed-Solomon encoding; you can look this up in CS textbooks. If you Google for "Luigi Rizzo Vandermonde", you'll find a fast, free C library for FEC that you can use in your own applications.

    Digital Fountain was started by Mike Luby, who is something of a luminary in information theory/cryptography. The kernel of their company's IP is "tornado codes", an FEC codec that is leaner and faster than competing codecs. The basis for their protocols, last time I checked, is FLID/DL and RLC. These protocols set up multiple channels (or "sources" if you must), each transmitting random chunks of FEC-encoded files.

    The drawbacks to FEC are that it can take a long time to encode data for transfer, which makes FEC hard for some real-time applications like mail, and that the amount of data transferred is going to be some percentage greater than the original file (owing to parity information). But the drawback to FEC file transfer protocols is much more significant: they aren't TCP-friendly.

    The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

    More to the point, what's the benefit? There's obvious payoff for FEC in multicast, where backtraffic from ACKs and NAKs quickly overwhelms the group and kills performance. But in unicast-world, where we will all be living for the next decade, TCP and TCP-friendly forward-reliable protocols like SCTP already provide good performance.

    Slow news week at EETimes I guess. Or Digital Fountain just hired a PR firm.

  • by hqm ( 49964 ) on Thursday December 13, 2001 @11:54AM (#2699054)
    You can easily use FEC for burst errors; you just use a cross-interleaved encoding on top of the basic encoding. That's what CD players do, and they can eat a 4000-bit-long burst error without a hiccup. And CD-ROM encodes another layer on top of CD Audio, and thus can sustain even larger burst errors.
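    A toy illustration of the interleaving idea (parameters made up): write the stream into a matrix by rows and read it out by columns, and a contiguous burst on the wire comes back as isolated single errors, each of which a short per-codeword code can fix.

    def interleave(data, rows, cols):
        """Write row-by-row, read column-by-column."""
        assert len(data) == rows * cols
        return bytes(data[r * cols + c] for c in range(cols) for r in range(rows))

    def deinterleave(data, rows, cols):
        return interleave(data, cols, rows)   # the transpose, with dims swapped

    stream = bytes(range(24))
    tx = bytearray(interleave(stream, rows=4, cols=6))
    tx[8:12] = b"\xff" * 4                    # a 4-byte burst on the "wire"
    rx = deinterleave(bytes(tx), rows=4, cols=6)
    damaged = [i for i, (a, b) in enumerate(zip(stream, rx)) if a != b]
    print(damaged)   # [2, 8, 14, 20]: the burst is scattered into single errors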
  • Re:NOT NEWS. (Score:2, Informative)

    by adadun ( 267785 ) on Thursday December 13, 2001 @12:10PM (#2699125) Homepage
    The whole Internet depends on protocols that have built-in congestion responses that mimic those of TCP. Protocols that don't either starve TCP flows, or are starved by them. Protocols with no real congestion response at all rapidly destabilize Internet links by consuming all available resources. Digital Fountain originally targeted multicast media transfer. Congestion avoidance in multicast schemes is still an open research question. Does this protocol really scale?

    Yes, their protocol uses TCP-friendly congestion control - read about it in their SIGCOMM paper [acm.org].
  • Not quite like Kazaa (Score:1, Informative)

    by Anonymous Coward on Thursday December 13, 2001 @12:10PM (#2699127)
    If I am not mistaken, they are using FECs to transmit data to get over potential latencies incurred by waiting for TCP ACKs and packet ordering. In this way they worry less about dropped packets since any of the FEC packets can help reconstruct lost bits. In addition, if you kick off transfers from multiple sources, you never have to worry about receiving packets in order, etc...
  • I was a skeptic... (Score:5, Informative)

    by DaoudaW ( 533025 ) on Thursday December 13, 2001 @12:21PM (#2699166)
    For example, consider the transfer of a 1 GB file with an average Internet packet loss rate (2%) and global average RTT (just over 200 msec). For this case TCP has an inherent bandwidth limitation of 300 Kbps regardless of link speeds and thus will take more than 7 hours to transfer the file. A Digital Fountain server sending Meta-Content allows almost complete bandwidth utilization. With Digital Fountain, on a fully utilized 1.5 Mbps (T1) line the transfer of the 1 GB file will take about 1.5 hours and on a fully utilized 45 Mbps (T3) line the transfer time will be a little over 3 minutes.

    I was a skeptic, but now that I've read the Digital Fountain white papers, the Meta-Content Technology White Paper [digitalfountain.com] and the TCP White Paper [digitalfountain.com], I'm a True Believer.

    I don't pretend to understand all the details, but the technology is based on the Luby transform [google.com] which was invented by the company's founder. Essentially the content is broken into segments which are then randomly XORed to produce the metacontent. This provides redundancy since each segment of metacontent contains information from several of the original segments. Also the metacontent segments do not require sequential transmission.
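    Here is a toy of that random-XOR construction with the standard "peeling" decoder. For simplicity the degrees are drawn uniformly, whereas an actual Luby transform draws them from a soliton distribution; all of this is illustrative, not Digital Fountain's code:

    import random

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode_symbol(blocks):
        """One fountain symbol: the XOR of a random subset of source blocks."""
        degree = random.randint(1, len(blocks))   # real LT: soliton distribution
        idxs = set(random.sample(range(len(blocks)), degree))
        val = b"\x00" * len(blocks[0])
        for i in idxs:
            val = xor(val, blocks[i])
        return idxs, val

    def peel(symbols, n_blocks):
        """Degree-1 symbols reveal blocks, which are XORed out of the rest."""
        pending = [[set(idxs), val] for idxs, val in symbols]
        known = {}
        progress = True
        while progress and len(known) < n_blocks:
            progress = False
            for sym in pending:
                for i in [i for i in sym[0] if i in known]:
                    sym[0].discard(i)             # strip already-known blocks
                    sym[1] = xor(sym[1], known[i])
                if len(sym[0]) == 1:              # degree 1: a block is revealed
                    (i,) = sym[0]
                    if i not in known:
                        known[i] = sym[1]
                        progress = True
        return [known.get(i) for i in range(n_blocks)]

    blocks = [bytes([65 + i]) * 4 for i in range(8)]   # the "file": 8 blocks
    received = []
    while True:
        received.append(encode_symbol(blocks))         # drink from the fountain...
        decoded = peel(received, len(blocks))
        if None not in decoded:                        # ...until we have enough
            break
    print(len(received), b"".join(decoded))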
  • Re:heh.. (Score:3, Informative)

    by MeerCat ( 5914 ) on Thursday December 13, 2001 @12:51PM (#2699323) Homepage
    TCP/IP adds a 16-bit checksum to the packets. This will generally detect an error burst of 15 bits or less; if the data is uniformly distributed, it will accept a corrupt segment with probability 1/(2^16 - 1) [Snader, p. 70]. This was designed to catch bugs in routers [postel.org] etc. (which may write the wrong data when forwarding packets) rather than to catch all data corruption on the wire.

    Depending on how much noise goes undetected at your physical layers, you should expect a TCP session to pass through incorrect data about once in every 10^9 to 10^12 bytes (that's the metric I use) - and if this is unacceptable, then your application layer should check and/or correct data itself, bearing in mind the end-to-end argument [reed.com] for system design.
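    For reference, that 16-bit checksum is the ones'-complement sum used throughout the TCP/IP family (RFC 1071). A minimal version, plus the classic compensating corruption it cannot see:

    def inet_checksum(data: bytes) -> int:
        """16-bit ones'-complement checksum, RFC 1071 style."""
        if len(data) % 2:
            data += b"\x00"                  # pad odd-length input
        total = sum(int.from_bytes(data[i:i + 2], "big")
                    for i in range(0, len(data), 2))
        while total >> 16:                   # fold the carries back in
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    segment = b"example TCP payload"
    bad = bytearray(segment)
    bad[0] += 1                              # +1 in one 16-bit word...
    bad[2] -= 1                              # ...-1 in another: sum is unchanged
    assert inet_checksum(bytes(bad)) == inet_checksum(segment)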

  • by Anonymous Coward on Thursday December 13, 2001 @02:05PM (#2699749)
    For all that, not a mention of this link from the Digital Fountain site itself?

    http://www.digitalfountain.com/technology/library/standards.htm
  • Re:Tornado Codes (Score:2, Informative)

    by NickPest ( 84591 ) on Thursday December 13, 2001 @02:12PM (#2699798)
    You can find the paper here [bu.edu].

    The folks at Digital Fountain are indeed using a highly-tuned (and proprietary) version of tornado codes. I also recommend the following papers if you're interested in what I think has the potential to be the greatest thing since TCP: tornado codes [bu.edu] + end-system multicast [cmu.edu]

    <shameless plug>
    I'm currently working on a research project [bu.edu] with John and others that integrates tornado codes and end-system multicast into a Freenet-like system. Best of all, it's GPL'd!
    </shameless plug>

  • by Anonymous Coward on Thursday December 13, 2001 @04:35PM (#2700670)
    Check out Bandwiz (www.bandwiz.com) software-based products; it seems they have farther-reaching capabilities than Digital Fountain.
  • Swarm Cast (Score:1, Informative)

    by Anonymous Coward on Thursday December 13, 2001 @04:53PM (#2700806)
    I believe OpenCola's Swarmcast was the info you were looking for. I remember it from a story on /. a while back. Anyway, here's the link:
    http://www.opencola.com/products/4_swarmcast/
    Enjoy, l8r.
  • by RackinFrackin ( 152232 ) on Thursday December 13, 2001 @07:59PM (#2701893)
    Yes. CDs encode data using two Reed-Solomon codes. Each sample (44,100 per second) records two amplitude measurements (one for each channel) as 16-bit words, so there are 4 bytes/sample. Measurements from 6 consecutive samples are grouped together and encoded using a (28,24) Reed-Solomon code over GF(2^8), which adds 4 bytes of redundancy to the 24 bytes of data. After that, the 28 resultant bytes are interleaved, then encoded with a shortened Reed-Solomon code that adds another 4 bytes of redundancy. Then one more byte is added for control purposes (I don't know exactly what). After that, a translation (eight-to-fourteen modulation) is used to make the physical reading/writing more feasible, three merging bits are added after each translated symbol, and then a 27-bit word is added for syncing purposes.

    Overall, 6 "ticks" of stereo music contain 192 bits of data, which are expanded to 588 bits written to the disc. The end result is that any error burst of 8871 or fewer bits of data on the disc can be corrected.
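    The frame arithmetic above checks out; as a sketch, following the steps just described with the standard CD figures:

    # 6 stereo samples -> channel bits on disc, step by step.
    data_bytes = 6 * 2 * 2                 # 6 samples x 2 channels x 2 bytes = 24
    after_c2 = data_bytes + 4              # (28,24) Reed-Solomon adds 4 bytes
    after_c1 = after_c2 + 4                # (32,28) Reed-Solomon adds 4 more
    frame_bytes = after_c1 + 1             # plus 1 control/subcode byte = 33
    channel_bits = frame_bytes * (14 + 3) + 27   # EFM + merging bits, + sync word
    print(data_bytes * 8, channel_bits)    # -> 192 588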
