Technology

UDP + Math = Fast File Transfers 449

Wolfger writes: "A new file transfer protocol talked about in EE Times gets data from one place to another without actually transmitting the data..." Well, the information is transmitted, just not in its original form. I think there are a couple of other places working on similar ideas - wasn't there a company using this for a fast file download application? User would go to download a game demo or something, receive pieces from several different places, and knit them together? Wish I could recall the company's name.
This discussion has been archived. No new comments can be posted.

UDP + Math = Fast File Transfers

Comments Filter:
  • Compression? (Score:3, Interesting)

    by mi ( 197448 ) <slashdot-2017q4@virtual-estates.net> on Thursday December 13, 2001 @10:00AM (#2698444) Homepage Journal

    In essence, isn't this the same as file compression? The amount of information is the same
    (for those who remember what a bit is). It's just that the usual one-character-per-byte encoding is awfully wasteful, which is why the various compressors are so effective.

    Add a modern data transfer protocol and you may
    get some start up money :-)

  • Kinda like IFS? (Score:2, Interesting)

    by pointym5 ( 128908 ) on Thursday December 13, 2001 @10:01AM (#2698452)
    I mean it's not for image compression specifically, but it definitely reminds me of IFS image compression in some ways. I'll bet that compression is very time consuming, but that's fine if you're warehousing data. I wonder if the clients are pre-loaded with a body of parameterized functions, so that the server just sends information describing what functions to run and what the parameters are. I guess if it's all based on polynomials all it needs to send are vectors of constants.

    Neat idea. Patents: here [uspto.gov] and here [uspto.gov].
  • XOLOX is the name (Score:1, Interesting)

    by arnwald ( 468380 ) on Thursday December 13, 2001 @10:03AM (#2698461) Homepage
    The program was called Xolox.
    I know the developer personally, and he was very disappointed by the corporate feedback he got.

    People loved it, corporations didn't, so he shut down his site and with it Xolox (unless you have a hacked version, of course ;)

    Cheers.
  • by seanmceligot ( 21501 ) <{moc.liamg} {ta} {togilecm.naes}> on Thursday December 13, 2001 @10:17AM (#2698515)
    This could be done easily without the proprietary algorithms. Just send UDP packets with a header in each stating that it is packet number N and that there are X total packets. Then request missing packets when you get towards the end, and put them all together in order when you have them all.

    Somewhat unrelated: does anyone else miss Z-Modem? We need a zmodem-like program that works over telnet so we don't have to open a separate FTP session. In the BBS days, you just typed rz filename and the file came to you.
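The numbered-packet scheme this comment describes can be illustrated in a few lines of Python. This is only a toy sketch under my own assumptions (a hypothetical 1 KB chunk size and a simple `!II` header carrying packet number and total count), not the protocol from the article:

```python
# Toy sketch of the "packet N of X" idea from the comment above: each datagram
# carries a header with its index and the total packet count, and the receiver
# reports which indices are still missing so only those need resending.
# The 1 KB chunk size and the "!II" header layout are illustrative assumptions.
import struct

CHUNK = 1024  # payload bytes per datagram (hypothetical)

def make_packets(data: bytes):
    """Split data into numbered packets: 8-byte header (N, X) + payload."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    total = len(chunks)
    return [struct.pack("!II", n, total) + c for n, c in enumerate(chunks)]

def reassemble(packets):
    """Return (data, []) when complete, or (None, missing indices) to NACK."""
    received, total = {}, 0
    for pkt in packets:
        n, total = struct.unpack("!II", pkt[:8])
        received[n] = pkt[8:]
    missing = [n for n in range(total) if n not in received]
    if missing:
        return None, missing
    return b"".join(received[n] for n in range(total)), []
```

Re-feeding just the NACKed indices then completes the transfer, which is essentially the negative-acknowledgement loop Z-Modem used.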
  • Re:Flow Control (Score:3, Interesting)

    by Omnifarious ( 11933 ) <eric-slash@nOsPAM.omnifarious.org> on Thursday December 13, 2001 @10:20AM (#2698537) Homepage Journal

    Quite correct. This protocol does not sound at all TCP friendly [yahoo.com]. It needs some way of dynamically responding to network conditions to be that way. Even something so simple as doing an initial bandwidth test, then rate limiting the UDP packets to 90% of that would be a big help, though for a large file that would still create a lot of congestion problems.

    Does anybody know if IPv6 has any kind of congestion-notification ICMP messages, so stacks can know to throttle applications when the applications are blasting out data that's congesting the network?

  • Article is wrong (Score:4, Interesting)

    by saridder ( 103936 ) on Thursday December 13, 2001 @10:22AM (#2698545) Homepage
    The article quotes that "...FTP requires packets to arrive in sequence, and TCP requires a receiving end to acknowledge every packet that arrives, so that dropped packets can be resent..."

    This is incorrect. TCP has a concept of sliding windows where once a number of packets has been received successfully, the receiver increases the number of packets that can be sent without an ack. This is an exponential number, so if it receives 2 packets successfully, it will then tell the sender that it will take 4 before an ack is needed. The only time you get a 1 for 1 ack ratio is if you miss a packet and the window slams shut.

    Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WANs. Frame Relay switches may drop packets if you exceed your CIR and begin bursting, so the whole transfer will never succeed. Therefore you actually waste bandwidth, because the whole transfer is doomed to fail and the sender will never know it.

    Also some routers have WRED configured in their queues, purposely dropping TCP packets to increase bandwidth on a global scale. This would damage the file transfer process as well if it was UDP based, as this system is.

    Stick with the RFCs and the tried-and-true TCP transport system. This company will fail.
  • Re:Flow Control (Score:5, Interesting)

    by Omnifarious ( 11933 ) <eric-slash@nOsPAM.omnifarious.org> on Thursday December 13, 2001 @10:33AM (#2698607) Homepage Journal

    The 'equations' are broken up into pieces such that if you receive any N pieces, you can reconstruct the entire file. It's like how some key-sharing schemes work, Publius for example: any N 'shares' of the key can be used to reconstruct the entire key. In this case the 'key' is the whole document, and I'm betting they use a different sharing scheme than the ones already used for cryptography.

  • Re:Flow Control (Score:2, Interesting)

    by kmacleod ( 103404 ) on Thursday December 13, 2001 @10:42AM (#2698670) Homepage
    I think you missed the point. Sure, you need only sip from the fire hose to finally quench your thirst, but that wastes a lot of water.

    Where TCP acks and controls flow on every single packet, which must also arrive in order, a protocol like this can occasionally send back a report on what kind and how many packets it is receiving (fragmented, packet loss due to bandwidth, packet loss due to congestion).
  • Sounds like CELP (Score:1, Interesting)

    by Anonymous Coward on Thursday December 13, 2001 @10:46AM (#2698685)
    Hmmm. Let's see. Instead of transmitting the data, put together a set of mathematical equations which can be used to replicate the data, then transmit a description of those equations.

    Sounds like Code Excited Linear Prediction. Linear prediction uses an equation that can be used to replicate a data stream. Code excited, in this case, means you have a variety of equations to choose from, and you can plug different coefficients into the various equations. Consequently, all you need to transmit are:
    • which equation to use (if there are 256 equations to choose from, that's an 8-bit value)
    • what coefficients to use with those equations
    • how long to follow the resulting data sequence, before you need to change the values
    CELP was applied particularly to speech compression. The DOD was using it with 4800 bps (yes, that's 4.8 kb/s, about 1/2 - 2/3 the bandwidth used by PCS cellphones) modems and encryption systems to transmit secure voice.

    It sounds to me like they've got a more general selection of equations, and the resulting datastreams are interleaved. If you don't get the information for all the equations, you can't accurately reconstruct the data. Additionally, since you don't have to respond to every packet, you reduce the turnaround.

    Unless I'm mistaken, Fibre Channel already does the latter part.

    In short: a more general application of existing compression schemes and protocols.

    95% of what you read about these days is just mucking around with new combinations and applications for existing inventions.
  • Luby Transform Codes (Score:5, Interesting)

    by Detritus ( 11846 ) on Thursday December 13, 2001 @10:52AM (#2698715) Homepage
    After looking through some of the material on the company's web site, this product appears to be based on LT (Luby Transform) coding. Each encoded packet is the result of XORing a randomly selected set of segments from the original file. When sufficient packets have been received, they can be used to reconstruct the original file. Insert magic algorithm. The nice thing about this is that the transmitter can continually stream packets, and a receiver can start collecting them at any point in time; packet ordering is totally irrelevant. You just need enough packets to generate a unique solution. The math for the code has not been published yet, but this is supposed to be a successor to tornado codes, which have been described in the literature.
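The XOR-of-random-segments idea can be sketched with a toy fountain code. Note the hedge: real LT codes draw each packet's degree from a carefully tuned (robust soliton) distribution, while this illustration picks source blocks uniformly at random and uses the standard "peeling" decoder:

```python
# Toy fountain-code sketch in the spirit of the LT coding described above:
# each encoded packet is the XOR of a randomly chosen set of source blocks,
# and a "peeling" decoder recovers the file once enough packets arrive.
# The uniform degree choice here is purely illustrative; real LT codes use
# a robust soliton degree distribution.
import random

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks, rng):
    """One encoded packet: (indices of source blocks, XOR of those blocks)."""
    degree = rng.randint(1, len(blocks))
    idxs = rng.sample(range(len(blocks)), degree)
    payload = blocks[idxs[0]]
    for i in idxs[1:]:
        payload = xor(payload, blocks[i])
    return set(idxs), payload

def decode(n_blocks, packets):
    """Peeling decoder: a packet with one unknown block reveals that block,
    which is then XORed out of other packets, possibly cascading."""
    known = {}
    packets = [(set(idxs), payload) for idxs, payload in packets]
    progress = True
    while progress and len(known) < n_blocks:
        progress = False
        for idxs, payload in packets:
            pending = idxs - set(known)
            if len(pending) == 1:
                i = pending.pop()
                for j in idxs & set(known):
                    payload = xor(payload, known[j])
                known[i] = payload
                progress = True
    return known if len(known) == n_blocks else None
```

Any sufficiently large collection of encoded packets decodes, regardless of which ones arrived or in what order, which is exactly the property the comment highlights.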
  • Multicast (Score:4, Interesting)

    by srichman ( 231122 ) on Thursday December 13, 2001 @11:20AM (#2698866)
    Quite correct. This protocol does not sound at all TCP friendly [yahoo.com]. It needs some way of dynamically responding to network conditions to be that way.
    Well, "this protocol" is just multicast (from a network perspective). Though there have been research attempts to introduce pruning in multicast streams, it is inherently not a flow-controllable transmission protocol. If you take offense at the lack of "TCP friendliness" in multicast, then I suggest you complain to Steve Deering, not Digital Fountain.

    Multicast is congestion friendly in that it can inherently reduce bandwidth consumption quite seriously, but it's obviously not congestion adaptive. I think the easiest (and probably best) way to introduce congestion control in multicast is to have multiple streams at different rates, e.g. five simultaneous multicast webcasts of a Victoria's Secret show, so folks can join the multicast group with the video quality that best suits the bandwidth and congestion on their pipes. This idea works very well for Digital Fountain-style software distribution: rather than having one server serve the same stream at 10 different rates, you can have 10 servers streaming their own unsynchronized Tornado encodings, and clients can subscribe to however many their pipes can support. With 10 streams of data coming at you, you can reconstruct the original data ~10 times as fast.

  • Re:Article is wrong (Score:5, Interesting)

    by Minupla ( 62455 ) <`moc.liamg' `ta' `alpunim'> on Thursday December 13, 2001 @11:31AM (#2698921) Homepage Journal
    Stick with the RFCs and the tried-and-true TCP transport system. This company will fail.

    You may be right, they may fall on their noses. Or the win with their system might be large enough that we decide that we need to rework the way we do some things. Hard to say at this point.

    I do take issue, though, with your 'Stick with the RFCs' comment. If we stick with what we know works, we will never evolve. If the folks at CERN had had this attitude, we'd still be using GOPHER. (Side note: my favorite interview question for testing people who claim long-time Net experience: 'Explain the difference between Archie and Veronica.')

    GOPHER was the tried-and-true text retrieval system. Besides, UDP has an RFC and is a perfectly legitimate method of moving data, provided you accept its limitations and ensure you correct for them. TCP has a lot of overhead that is not always required. If our network breaks because someone sends UDP, it's we who need to reread our RFCs.

    'Nuff Said.
  • Re:heh.. (Score:4, Interesting)

    by regen ( 124808 ) on Thursday December 13, 2001 @11:57AM (#2699073) Homepage Journal
    This is really funny. I used to work for the NYSE (Stock Exchange), and it wasn't uncommon to transmit 10GB of data overnight via FTP to a single brokerage. The data being transmitted was at least as valuable as chip designs: it would allow you to alter trade positions after trades had been transacted but before they cleared (e.g. cancel a bad transaction or increase the amount of a good one).
  • Re:my take (Score:3, Interesting)

    by mr3038 ( 121693 ) on Thursday December 13, 2001 @01:30PM (#2699559)
    It appears to me that the transmitting side generates the symbols (parameters of the equations, I guess) and begins sending them to the receiving side as fast as it can. ...don't have to keep up this conversation: [long conversation removed]

    It seems to me that the article indeed speaks about a network that has high latency but high bandwidth, with some loss. How about simply compressing [redhat.com] the data and using bigger packets to transmit it? If you can use a big enough window while sending data, you can push all of it onto the network at the beginning. The conversation comes down to:

    A) Here it comes
    A) Done [64 packets, 125MB]
    B) Okay, listening
    B) Resend 2, 7 and 25
    A) Done [3 packets, 6MB]
    B) OK.

    Note that A starts sending before getting a reply from B. In fact, with a fast long-distance connection, A could reach the end before B even gets "Here it comes".

    I think if we want to speed up file transfer, we need an API to tell the OS that we're going to send lots of data, so it should use big packets (or the opposite). Currently we just open a socket connection to the destination and start write()ing; the OS has no way to guess whether we're going to write 100 or 10e8 bytes. We need a way to tell the OS that the data we're sending isn't worth a dime before it's all done, so it should use big packets to minimize the bandwidth wasted on TCP control traffic.

    You can opt to waste bandwidth to reduce perceived latency, and that's what I think is done here. A sends the file twice, and in case some packets were lost, the second copy is used to fill in the missing parts. A has sent the missing packet before B even knew it was missing. Yeah, A wasted half the bandwidth on redundant data that arrived correctly the first time, but we aren't interested in that. The key here is to use UDP, so that lost packets are really lost instead of being automatically resent. This kind of setup increases overall throughput only if latency is the only problem in your network. Perhaps this will be needed in the future, because increasing bandwidth is easy (if not necessarily cheap), but the light crawling inside fiber is so damn slow!
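The "tell the OS this is a bulk transfer" API wished for above doesn't exist as such, but a few socket options come close. This is only a rough sketch of the nearest current analogue, under stated assumptions: the 1 MB buffer size is arbitrary (the kernel may cap it), and TCP_CORK is Linux-only.

```python
# Rough analogue of a "bulk send" hint using existing socket options:
# SO_SNDBUF asks for a larger kernel send buffer so the application can push
# data ahead of the network, and TCP_CORK (Linux-only) holds back partial
# frames so the stack emits full-sized segments. Illustrative, not the API
# the comment imagines.
import socket

def bulk_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1 << 20)
    if hasattr(socket, "TCP_CORK"):  # not available on all platforms
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    return s
```

Even with these hints, TCP still decides segment sizes itself; the comment's point stands that there is no portable way to declare intent up front.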

  • Re:Article is wrong (Score:4, Interesting)

    by seanadams.com ( 463190 ) on Thursday December 13, 2001 @01:35PM (#2699596) Homepage
    Furthermore, UDP for data is highly unreliable, and I wouldn't trust it across WAN's.

    You're missing the point of UDP. UDP [faqs.org] is just a *tiny* layer on top of IP, which adds a little extra information (basically the port number) so that the OS can deliver a packet to the right application. UDP cannot be compared with TCP: it doesn't provide reliability and flow control, and it has absolutely no notion of a stream of data. If desired, these can be provided in the application layer (see TFTP, NFS, etc.).

    TCP [ibiblio.org] is a reliable transport, but it's much, much more than that. First off, the fact that you're using TCP doesn't make the path between sender/receiver any more reliable. Your packets get dropped just the same as if they were UDP or any other protocol over IP. TCP provides reliability by retransmitting lost packets, but you knew that. It also provides flow control and congestion avoidance - this means detecting when the receiving end (and the router queues in between) are ready for more data, and throttling your transmission rate to match that capacity. It also means being "fair" to other streams sharing the same bottleneck(s). It does this by "backing off" the number of packets in flight, i.e. halving the congestion window, to reduce the data rate. These algorithms are a very active field of research - there's a *lot* more to TCP than meets the eye of a socket programmer.

    When TCP loses a packet, that packet must be retransmitted. This is expensive because it means another RTT.

    Anyhoo...

    You can think of FEC for data transmission as being analogous to RAID 5 for storage. By adding one extra bit (the XOR of the rest of the word) you can lose any single bit and still know its value. It's very simple. If the word is:

    0 1 0 1

    And I add an extra "parity" bit, where the parity bit is 1 if the number of ones in the rest of the word is odd, zero if it's even:

    0 1 0 1 [0]

    I can now lose any one bit (including, of course, the parity bit). E.g. if I have only

    0 1 X 1 [0]

    Then I know the missing bit is a '0', because if it were '1' then the parity bit would be a one.

    Applying this to data transmission, you can see that by sending just one extra packet, you greatly reduce the chance of having to retransmit anything.

    E.g. if I have to send you 10 packets over a link with 10% packet loss, there's a 65% chance that I'll have to retransmit at least one of those packets (and a 10% chance that each retransmitted packet will itself have to be sent again, and so on).

    However, if I'm using FEC and I send one extra "parity" packet, then I only have to retransmit if TWO OR MORE packets are lost. The chance of losing two or more of the eleven packets is only 30%, so for an overhead of 10% I've reduced the number of retransmits by a factor of more than two! I hope those figures are right; I used this tool [vassar.edu] to calculate them. Of course there are a lot of knobs you can turn, depending on how much overhead you can afford for the parity information and what degree of packet loss you want to be able to tolerate.
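The parity scheme above translates directly into code. In this sketch (packet sizes and contents are made up for illustration) one lost packet is rebuilt from the XOR parity packet, and the quoted 65%/30% figures are re-derived from the binomial distribution:

```python
# Sketch of the parity-packet FEC described above: the parity packet is the
# XOR of all data packets, so any single lost packet equals the XOR of the
# survivors and the parity. Packet contents here are made up.
from math import comb

def xor_all(packets):
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)

data = [bytes([n]) * 8 for n in range(10)]  # ten hypothetical 8-byte packets
parity = xor_all(data)

# lose packet 3: XORing the nine survivors with the parity packet restores it
survivors = data[:3] + data[4:]
assert xor_all(survivors + [parity]) == data[3]

def p_at_least(k, n, loss=0.10):
    """Probability of k or more losses among n packets at the given loss rate."""
    return sum(comb(n, j) * loss**j * (1 - loss)**(n - j) for j in range(k, n + 1))

print(round(p_at_least(1, 10), 3))  # 0.651: plain 10-packet send needs a resend
print(round(p_at_least(2, 11), 3))  # 0.303: 11-packet FEC send still needs one
```

So the comment's arithmetic checks out: one parity packet costs 10% overhead and cuts the retransmit probability from about 65% to about 30%.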

    Anyway, you can see that there are lots of possible improvements/alternatives to TCP - it's an old protocol and a lot of research has been done since RFC 793.
  • by justin_squinky ( 38349 ) on Thursday December 13, 2001 @02:04PM (#2699744)
    Bittorrent [bitconjurer.org]
    does that! It's not really trying to be a Napster or Gnutella, but a way to let people host a high-volume website without a lot of bandwidth. It works seamlessly with Mozilla and IE. It's also quite fast, unlike the anonymous P2P networks.
  • Junk or Genius? (Score:2, Interesting)

    by Spazmania ( 174582 ) on Thursday December 13, 2001 @03:22PM (#2700249) Homepage
    This could be true genius or mediocre junk depending on the details. That article wasn't clear: Does it have to be N specific packets (in any order) or can it be any N of the transmitted packets?

    N specific packets is, well, nothing special. The idea of using negative acknowledgements instead of positive acknowledgements and retransmitting only the missing pieces is nothing new; it was used extensively in pre-Internet days on dedicated connections. Ever heard of Z-Modem? It's been avoided on the Internet because it makes the congestion control problem very difficult. Translating the file into symbols is really just saying that they also compress it first. Whoop de doo.

    On the other hand, if it can be ANY set of N of the transmitted packets, well, that's downright incredible. The applications for such a technique are boundless... Everything from finally making multicast viable and desirable to latency and congestion indifferent file transfers to ultra-reliable offline storage.

    How would you like a web server that can serve a T3's worth of clients off of a T1 and do it in such a way that the smart web browser can pre-cache data from the server in anticipation of the reader's next click, realtime adaptive based on the documents currently in multicast transmission to somebody else? Oh yeah, the web server can be on the Moon and respond to you as fast as if it was across the street. Genius.

    So which is it? Any set of N or a specific set?
  • Check out TSBN (Score:1, Interesting)

    by Anonymous Coward on Thursday December 13, 2001 @03:39PM (#2700372)
    A company located at www.tsbn.com has been doing this since at least 1996. Their IP is filed back past then anyway. They claim to do it all in software without devices. Guys from some of the companies mentioned above have stock in TSBN.. interesting.
  • Re:heh.. (Score:3, Interesting)

    by Thomas Charron ( 1485 ) <twaffle@gmai l . com> on Thursday December 13, 2001 @04:44PM (#2700739) Homepage
    Data transferred to various financial routines is routinely transmitted using insecure protocols in an unencrypted format. I've worked in at least two positions, one at a credit card clearinghouse, another at an ECN. This data goes over unsecured channels all the time.
