
BitTorrent For Enterprise File Distribution? 291

Posted by Soulskill
from the make-it-so dept.
HotTuna writes "I'm responsible for a closed, private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet. We have about 4GB of disaster recovery files that need to be replicated at each site, and updated monthly. The challenge is that all the enterprise file replication tools out there seem to be client/server and not peer-to-peer. This crushes our bandwidth at the corporate office and leaves hundreds of 7Mb DSL connections (at the stores) virtually idle. I am dreaming of a tool which can 'seed' different parts of a file to different peers, and then have those peers exchange those parts, rapidly replicating the file across the entire network. Sounds like BitTorrent you say? Sure, except I would need to 'push' the files out, and not rely on users to click a torrent file at each site. I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO. What do you think? Is BitTorrent an appropriate protocol for file distribution in the business sector? If not, why not? If so, how would you implement it?"
This discussion has been archived. No new comments can be posted.


  • Sneakernet (Score:5, Insightful)

    by 91degrees (207121) on Sunday December 14, 2008 @01:04PM (#26111385) Journal
    The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.
    • Re:Sneakernet (Score:5, Insightful)

      by tepples (727027) <tepples AT gmail DOT com> on Sunday December 14, 2008 @01:11PM (#26111451) Homepage Journal

      The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.

      From the summary: "I would need to 'push' the files out, and not rely on users to click a torrent file at each site." I imagine that the following is also true: "I would need to 'push' the files out, and not rely on users to insert a disc and run setup.exe at each site."

      • Re: (Score:3, Insightful)

        by gbjbaanb (229885)

        surely "push the files" to a remote site is the same as "posting the files" via a different transport mechanism. When people say that they need to remotely push the files, its not that the users can't/won't be able to handle them if they're not there already setup, its because they'll forget or just be too lazy to click the button to retrieve them. A DVD in the post is difficult to miss.

        However, a DVD in the post may not arrive or may be corrupt.

      • by unassimilatible (225662) on Sunday December 14, 2008 @07:26PM (#26114315) Journal
        subspace for its communication needs.

        I'm confused.
    • by Sentry21 (8183)

      The bandwidth can be great, it's the latency that kills you.

  • by drsmithy (35869) <drsmithyNO@SPAMgmail.com> on Sunday December 14, 2008 @01:06PM (#26111399)
    No need to get fancy with an "RSS feed". rTorrent, at least, can be configured to monitor a directory for .torrent files and automatically start downloading when one appears. You could set this up, then simply push out your .torrent file to each site with something like scp or rsync.
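The watch-directory approach is easy to prototype before wiring up rTorrent itself; a minimal polling sketch in Python (the directory path and handler are made up for illustration — rTorrent's own watch-directory schedule does this loop for you):

```python
import os

def poll_watch_dir(watch_dir, seen, handler):
    """One polling pass: call handler on any .torrent file not seen before."""
    for name in sorted(os.listdir(watch_dir)):
        if name.endswith(".torrent") and name not in seen:
            seen.add(name)  # remember it so each torrent is started only once
            handler(os.path.join(watch_dir, name))
    return seen

# A cron job or daemon would call poll_watch_dir() every 30 seconds or so and
# hand new .torrent files to the client; pushing a .torrent into watch_dir
# with scp or rsync from corporate is then all it takes to trigger a download.
```
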
  • These are technologies that have been proven effective working together by people everywhere. If you put it together, test it, and build in fail-safes, etc., you should be fine!
  • ask us (Score:5, Informative)

    by TheSHAD0W (258774) on Sunday December 14, 2008 @01:07PM (#26111417) Homepage

    Next time you should ask at the official BitTorrent IRC channel [irc].

    The Python BitTorrent client [bittorrent.com], which runs on Unix, has a version called "launchmany" which is easily controlled via script. It should fit your needs very nicely.

  • Works great (Score:5, Insightful)

    by Anonymous Coward on Sunday December 14, 2008 @01:09PM (#26111427)

    BitTorrent is an excellent intranet content-distribution tool; we used it for years to push software and content releases to 600+ Solaris servers inside Microsoft (WebTV).

    -j

  • Sure, why not? (Score:5, Insightful)

    by sexybomber (740588) <boccilino@@@gmail...com> on Sunday December 14, 2008 @01:09PM (#26111431)

    Is BitTorrent an appropriate protocol for file distribution in the business sector?

    Sure! BitTorrent, remember, is only a protocol, it's just become demonized due to the types of files being shared using it. But if you're sharing perfectly legitimate data, then what's wrong with using a protocol that's already been extensively tested and developed?

    Just because it's been used to pirate everything under the sun doesn't make it inappropriate in other arenas.

    • Re: (Score:3, Insightful)

      by Bert64 (520050)

      Pirates still prefer FTP, it seems all of the big warez groups are still pushing files around using FTP...

      • Re:Sure, why not? (Score:4, Insightful)

        by Fumus (1258966) on Sunday December 14, 2008 @04:12PM (#26112591)
        Don't forget about USENET. It's way more convenient than waiting for days because the seeder-to-leecher ratio is 1:30.
      • Re: (Score:3, Informative)

        by nog_lorp (896553) *

        You're talking about the difference between the provider pirates and the end-user pirates. SCENE people hate p2p. Average Joe-wants-stuff-for-free doesn't know what the "scene" is, and uses p2p (always wondering why torrents say RELOADED or RAZOR1911).

    • Re: (Score:3, Interesting)

      by hedwards (940851)

      The main problem is that it introduces an extra vulnerability: the capability of very efficiently spreading malware and viruses around. Depending upon how locked down things are, it might not be a problem, but it's still definitely something to worry about.

      And yes, I am assuming that somebody's going to get their machine infected or that somebody's going to break into the VPN traffic. Not necessarily likely, but still has to be considered.

    • by Anonymous Coward

      ...or P2P when you first mention it to the CIO.

      I would venture most CIOs' exposure to such things has been limited to what the popular media is pushing: BitTorrent == PIRACY.

      I'd recommend sticking to vague terms like "Distributed file transfer".

    • Re:Sure, why not? (Score:5, Informative)

      by Xugumad (39311) on Sunday December 14, 2008 @03:18PM (#26112225)

      One of the things that always amused me was when people claimed Bram Cohen was "selling out" by working with the movie/music industry. BitTorrent was never intended for piracy; that's merely its most common use.

      It's very regularly used for Linux distros, game patches (World of Warcraft!), etc.

  • rsync (Score:5, Informative)

    by timeOday (582209) on Sunday December 14, 2008 @01:10PM (#26111439)
    How much do these disaster recovery files change every month? If they stay mostly the same, using rsync (or some other binary-diff capable tool) may let you keep your simple client/server model while bringing bandwidth under control.
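For intuition, the core of the binary-diff idea can be sketched in a few lines. This is a simplified fixed-offset comparison, not rsync's actual rolling-checksum algorithm (which also finds blocks that have shifted position); the block size and names are illustrative:

```python
import hashlib

BLOCK = 4  # toy block size; real tools use hundreds of bytes or more

def split_blocks(data, size=BLOCK):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def changed_blocks(old, new, size=BLOCK):
    """Return (index, bytes) for each block of `new` that differs from `old`.
    Only these blocks would need to cross the WAN."""
    old_sums = [hashlib.sha1(b).digest() for b in split_blocks(old, size)]
    changed = []
    for i, block in enumerate(split_blocks(new, size)):
        if i >= len(old_sums) or hashlib.sha1(block).digest() != old_sums[i]:
            changed.append((i, block))
    return changed
```

If the monthly DR image is mostly unchanged, `changed_blocks` returns a small fraction of the file, which is exactly the bandwidth win rsync gives you.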
    • Re:rsync (Score:5, Informative)

      by Anonymous Coward on Sunday December 14, 2008 @04:32PM (#26112737)

      Yes, and there are ways you can use rsync from well-planned scripts that are very powerful beyond just file transfer.

      1. The basic case of "transfer or update existing files at destination to match source." It always takes advantage of existing destination data to reduce network transfers.

      2. The creation of a new destination tree that efficiently reuses existing destination data in another tree without modifying the old tree. See --copy-dest option.

      3. In addition to the previous, don't even create local disk traffic of copying existing files from the old tree to new, but just hard link them. This is useful for things like incremental backup snapshots. See --link-dest option.

      It may not be as sexy as p2p protocols, but you can implement your own "broadcast" network via a scattered set of rsync jobs that incrementally push their data between hops in your network. And a final rsync with the master as the source can guarantee that all data matches source checksums while having pre-fetched most of the bulk data from other locations.

      I've been enjoying various rsync applications such as the following (to give you an idea of its power): Obtain any old or partial mirror of a Fedora repository and update it from an appropriate rsync-enabled mirror site, to fill in any missing packages. This is a file tree of packages and other metadata. Concatenate all of the tree's files into one large file. Then use rsync to "update" this file to match a corresponding DVD re-spin image on a distro website. Rsync will figure out when most of those file extents cooked into the ISO image are already in the destination file, and just go about repositioning them and filling in the ISO filesystem's metadata. An incredibly small amount of traffic is spent performing this amazing feat.

    • by Kozz (7764)

      You know, I'd had a need for an rsync-like tool for Windows (specifically between Windows Server 2003 machines). I found a Windows-based rsync implementation (whose name I can't recall), but the tool was clunky and unreliable. I saw someone suggest Unison, but do you have any other suggestions specifically for Windows?

    • by kinema (630983)
      Even better, rsync against a local binary diff that was distributed via BitTorrent.
  • In a word, Yes (Score:5, Informative)

    by cullenfluffyjennings (138377) <c.jennings@ieee.org> on Sunday December 14, 2008 @01:10PM (#26111443) Homepage

    I've seen BitTorrent used for several business-critical functions. One example is World of Warcraft distributing updates with it.

  • by colinmcnamara (1152427) on Sunday December 14, 2008 @01:12PM (#26111453) Homepage

    It is like rsync on steroids. Cisco's WAN optimization and Application Acceleration product allows you to "seed" your remote locations with files. It also uses some advanced technology called Data Redundancy Elimination that replaces large data segments sent over your WAN with small signatures.

    What this means in a functional sense is that you would push that 4 GB file over the WAN once. On subsequent pushes you would sync only the bit-level changes, effectively transferring just the 10 megabytes that actually changed.

    While it is nice to get the propeller spinning, there is no sense reinventing the wheel.

    Cisco WAAS - http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html [cisco.com]

    • I'm a huge fan of WAN accelerators (though I prefer the products from Riverbed), but I'm not sure of the fit here (and it certainly isn't anything like what the OP is asking about). First, these devices aren't cheap, especially if you need to communicate between tons of locations, as seems to be the case here, since each location will require a unit. Even the lower-end product in the category will easily run 10k. Second, we don't know how similar the files being moved once a month are. If not a majority ident

    • Re: (Score:3, Insightful)

      by jamiebecker (160558)

      presumably "on steroids" means "with a fancy GUI".

      rsync does this too. rsync can push or pull.

      besides, there are plenty of rsync GUIs, too.

      however, bittorrent is almost certainly the best solution for this purpose -- the real question is coherency. You always know that eventually you'll have a complete and perfect copy at each location -- but how do you know WHEN that copy is complete so you can work on it? if this is strictly a backup system, then it's not needed, but it's probably not a good thing to be

  • by aktzin (882293) on Sunday December 14, 2008 @01:15PM (#26111473)

    Personally I like the portable media shipment suggestions. But if your CIO/company requires enterprise software from a large vendor with good support, have a look at IBM's Tivoli Provisioning Manager for Software:

    http://www-01.ibm.com/software/tivoli/products/prov-mgrproductline/ [ibm.com]

    Besides the usual software distribution, this package has a peer-to-peer function. It also senses bandwidth. If there's other traffic it slows down temporarily so it won't saturate the link. Once the other traffic is done (like during your off-hours or maintenance windows) it'll go as fast as it can to finish distributing files.

  • by bistromath007 (1253428) on Sunday December 14, 2008 @01:17PM (#26111493)
    Haven't you been reading the warnings around here about how bad it is for the Internet? If big business starts using BT we'll microwave the baby!
    • Re: (Score:2, Interesting)

      by Mad-Bassist (944409)

      Oooooh... I can see the whole issue of throttling suddenly becoming very amusing as the corporate behemoths start slugging it out.

  • by Manfre (631065) on Sunday December 14, 2008 @01:19PM (#26111515) Homepage Journal

    Have you thought about building up a distribution tree for your sites?

    Group all of your stores based upon geographic location. State, region, country, etc. Pick one or two stores in each group and they are the only ones that interact with the parent group.

    E.g. Corporate will distribute the files to two locations in each country. Then two stores from each region will see that the country store has the files and download them. Repeat down the chain until all stores have the files.
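That fan-out can be computed mechanically; a sketch of how the tiering might be derived from a store list (store and region names are invented, and `hubs_per_region` would be tuned to link capacity):

```python
from collections import defaultdict

def build_tiers(stores, hubs_per_region=2):
    """stores maps store name -> region. The first N stores per region become
    hubs that pull from corporate; everyone else pulls from a regional hub."""
    by_region = defaultdict(list)
    for store, region in sorted(stores.items()):
        by_region[region].append(store)
    plan = {}  # store -> where it downloads from
    for region, members in sorted(by_region.items()):
        hubs = members[:hubs_per_region]
        for hub in hubs:
            plan[hub] = "corporate"
        for i, store in enumerate(members[hubs_per_region:]):
            plan[store] = hubs[i % len(hubs)]  # spread load across hubs
    return plan
```

Each store then runs its monthly transfer against `plan[store]` instead of hammering the corporate link directly.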

  • BitTorrent is incredibly wasteful for the initial seeding and is pretty intense on network equipment. You have to be careful configuring all of the network settings; the last thing you want is all of the stores either crashing their routers or maxing out their connections.

    Why not spread out the backups? Limit the bandwidth of the backups to allow enough regular traffic, and have different stores send their backups on different days.

  • Captain disillusion (Score:5, Informative)

    by jonaskoelker (922170) <{jonaskoelker} {at} {gnu.org}> on Sunday December 14, 2008 @01:28PM (#26111559) Homepage

    with IPsec over DSL, and no access to the public internet.

    Unless you have very long wires, some box is going to route them. Are those your own?

    Otherwise, your ISP's router, diligent in separating traffic though it may be, can get hacked.

    Why am I saying this? Not to make you don your tinfoil hat, certainly, but just to point out that if the scenario is as I describe, you're not 100% GUARANTEED to be invulnerable. Maybe a few tinfoil strips in your hair would look nice... ;)

    About the actual question: bit torrent would probably be fine, but if most of the data is unchanged between updates, you may want to compute the diff and then BT-share that. How do you store the data? If it's just a big tar(.gz|.bz2) archive, bsdiff might be your friend.

    If you push from a single seeder to many clients, maybe multicast would be a good solution. But that's in the early design phase I think, which is not what you need :)

    Best of luck!

    • Seriously... if you do your encryption right, it doesn't matter who is in between. Have the initial exchange between the two ends be the public half of a PGP key pair. There is no man-in-the-middle attack on that.
  • How I would do it... (Score:5, Interesting)

    by LuckyStarr (12445) on Sunday December 14, 2008 @01:37PM (#26111605)

    ...is in fact quite straightforward.

    1. Create a "Master" GnuPG/PGP Key for yourself. This key is used to sign all your data as well as your RSS feed (see below).
    2. Set up an RSS feed to announce your new files. Sign every entry in it using your "Master-Key".
      • All the stores check the validity of your RSS feed via your public key.
      • All the stores have one (or the same) GnuPG/PGP key to decrypt your files. The beauty of GnuPG/PGP is that given many destinations you can encrypt your data so that every recipient (each with their own key) can decrypt them. Nice, eh?
    3. Set up a standard BitTorrent server to distribute your files.
    4. Announce all your new files via your RSS feed.

    This has many advantages:

    The beauty of this system is that it relies heavily on existing technology (BitTorrent, RSS, GnuPG, etc), so you can just throw together a bunch of libraries in your favourite programming language (I would use Python for myself), and you are done. Saves you time, money and a lot of work!

    Furthermore you do not need to have a VPN set up to every destination as your files are already encrypted and properly signed.

    Another advantage is: As this is a custom-built system for your use-case it should be easy to integrate it into your already existing one.
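A toy version of steps 1–2 and 4 above: build an announcement entry plus a detached signature the stores can verify before fetching anything. (Real GnuPG signing would shell out to gpg or use a GPG binding; an HMAC over a shared secret stands in here purely so the sketch is self-contained, and the field names are invented.)

```python
import hashlib
import hmac
import json

SECRET = b"replace-with-real-key-material"  # stand-in for a GnuPG key

def announce(filename, infohash, key=SECRET):
    """Produce a signed feed entry, roughly steps 2 and 4 above."""
    entry = json.dumps({"file": filename, "infohash": infohash}, sort_keys=True)
    sig = hmac.new(key, entry.encode(), hashlib.sha256).hexdigest()
    return entry, sig

def verify(entry, sig, key=SECRET):
    """Store-side check (step 2's bullet): reject tampered announcements."""
    expect = hmac.new(key, entry.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, sig)
```

A store would only hand the announced torrent to its BitTorrent client once `verify` passes, which is what lets you skip the per-site VPN requirement the parent mentions.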

  • How is the VPN setup (Score:5, Informative)

    by eagle486 (553102) on Sunday December 14, 2008 @01:40PM (#26111637)
    If the VPN is set up in a standard hub-and-spoke configuration, then BitTorrent would not help, since all traffic between sites has to go via the central site.

    Your best bet is multicast, there are programs for software distribution that use multicast.

  • it's called dsync (Score:5, Interesting)

    by slashdotmsiriv (922939) on Sunday December 14, 2008 @01:40PM (#26111639)

    and you can find documentation for it here:
    http://www.cs.cmu.edu/~dga/papers/dsync-usenix2008-abstract.html [cmu.edu]

    It is rsync on steroids: it uses a BitTorrent-like P2P protocol that is even more efficient because it exploits file similarity.

    You may have to contact the author of the paper to get the latest version of dsync, but I am sure they would be more than happy to help you with that.

  • I'd get a station wagon and fill it with tapes. Go on, mod me "-1 old fashioned"
  • by Mostly a lurker (634878) on Sunday December 14, 2008 @01:58PM (#26111747)
    CIOs are notoriously conservative. Any solution you suggest that involves building a solution from scratch will scare them. The solution is to use existing proven technology. In the MS Windows world, at least, root kits have been distributing updates successfully for years. You should be looking at simply modifying an existing root kit to your requirements.
  • Are you using IPSec in Tunnel mode or Transport mode? If you're using it in tunnel mode, then you're not going to fix your bandwidth problem, because all data has to go through corporate HQ anyway because that's where the tunnels end.

    • I'm guilty of abstracting away that detail when contemplating his article.

      If it proves his network architecture has the same bottleneck either way, that's all the more reason he needs to take a hard look at his data and how amenable it is to rsync.

  • by anexkahn (935249) on Sunday December 14, 2008 @02:30PM (#26111943) Homepage
    In Windows Server 2003 R2 / Windows Server 2008 they really improved DFS. It lets you set up throttling in 15-minute increments, and with full mesh replication it decentralizes your replication... kind of like BitTorrent. However, you have to make sure you don't accidentally use FRS, because it sucks.

    Where I work we have 5 branches that pull data from our data center. I have DFS replication set up so I can have all our software distribution at the local site. I need to keep the install points at all the sites the same, so I use DFS to replicate all the data. Then to get to it I type \\mydomain.com\DFSSharename; Active Directory determines what site I am in, then points me to the local share. If the local share is not available, it points me to the remote share, or to a secondary share in the same site, so it gives you failover for your file servers.

    If you don't have any Windows boxes this won't work, and it really locks you into Microsoft, but it won't cost you anything more than what you have already paid. Below is a link to Microsoft's page with more information, including how to set it up: http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx [microsoft.com]
  • You could set up an NFS distributed file system. That may be more amenable to your boss and will have other advantages too.
  • What platform is used?
    Is it scriptable readily?
    How scheduled are the updates?
    How similar is the data day to day?

    Things that come to mind, as a traditionally Unix admin:
    -cron job to download the file using screen and btdownloadcurses
    -ssh login to each site and do the same (if need to push at arbitrary times)
    -rsync (if the day-to-day diff is small, might as well do this)

    Analogous procedures can probably be done for whatever platform you choose. Learning how to generically apply this strategy in the platform of choi

  • Kontiki (Score:3, Informative)

    by MikeD83 (529104) on Sunday December 14, 2008 @03:19PM (#26112235)
    I work for a large company (>50,000 employees). IT recently rolled out a new "video delivery service." The system delivers videos to everyone's desktop. The system is designed by Kontiki [kontiki.com]. It's basically an enterprise BitTorrent tool which Kontiki prefers to call "peer-assisted."
    • Re: (Score:2, Informative)

      by thintz (842339)
      I too work for a large company that rolled out Kontiki. As the previous poster mentioned, Kontiki is a commercial, enterprise-class, BitTorrent-like tool. We also use it to deliver video to the desktop. I haven't worked directly with those guys for years, but believe you could use it for most any type of content. I believe they can handle your security needs as well as dynamically adjust how much bandwidth they use based on a number of different criteria. I'd give them a call to at least inquire fu
  • by Antique Geekmeister (740220) on Sunday December 14, 2008 @03:57PM (#26112473)

    It's spelled 'NNTP'. Look at how Usenet newsgroups, especially for binaries, have worked for decades for a robust distribution model. The commands to assemble the messages can be scripted as well.

    Similarly, the BitTorrent files you describe can also be pushed or pulled from a centralized target list and activated via SSH as needed.

  • Consider scripting the process of creating a torrent file of the data that needs replication. At each remote site, run some Linux or BSD system and set up SSH keys so the central server can run a script on each remote machine.

    Set up a local BitTorrent tracker.

    On the main server, script building the torrent file, then run an upload script against a list of remote sites that has each site download the torrent file via scp and run it until it has seeded out a given amount OR has run for x days.

    The only issue here that I
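The "seed until it has uploaded a given amount OR run for x days" stop condition described above is just a predicate a wrapper script would poll; a sketch (the thresholds are examples, not recommendations):

```python
def should_stop_seeding(uploaded_bytes, torrent_bytes, started, now,
                        target_ratio=2.0, max_days=3):
    """True once we've uploaded target_ratio times the payload size, or
    have been seeding for max_days, whichever comes first.
    started/now are Unix timestamps in seconds."""
    ratio = uploaded_bytes / torrent_bytes if torrent_bytes else 0.0
    days = (now - started) / 86400.0
    return ratio >= target_ratio or days >= max_days
```

A cron-driven wrapper would read the client's upload counter, evaluate this, and stop the torrent when it returns True, so the central seed box doesn't serve forever.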

  • Why not send it simultaneously to all locations using multicast?

    What about uploading an encrypted version to S3 which can then be downloaded via torrent or the S3 API?

  • Browsers should support bittorrent-URLs right out of the box, there's really no excuse for not doing this. It would make hosting (large-ish) static content so much easier.
  • Doesn't a BitTorrent folder already allow adding additional stuff later?

    I would recommend making a small modification to an existing open source torrent client:
    Let the download never stop. Make it look for new parts, updates to downloaded parts (via SHA-1), and new files in the directory structure of the torrent, until the end of time.

    That way you have an instant, error-resistant, peer-to-peer backup and replication service that is as easy to use as copying (or linking) the files into the right folder.
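Detecting "new parts, updated parts, and new files" boils down to diffing directory snapshots; a sketch of the re-scan such a modified client would repeat (function names are illustrative):

```python
import hashlib
import os

def snapshot(root):
    """Map relative path -> SHA-1 of contents for every file under root."""
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                out[rel] = hashlib.sha1(f.read()).hexdigest()
    return out

def changed_files(old, new):
    """Files the client should announce/fetch: added or modified since old."""
    return sorted(p for p, h in new.items() if old.get(p) != h)
```

Run `snapshot` periodically and feed `changed_files` to the torrent machinery; for large trees a real client would hash per-piece rather than whole files.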

    • by Archon-X (264195)

      ..and a fantastic way to seed all sorts of nasties into an originally clean download.

  • by asifyoucare (302582) on Sunday December 14, 2008 @08:20PM (#26114765)

    I looked at BitTorrent to distribute Windows XP SP2 (about 250 MB) to about 1,500 clients, most of whom were in small sites containing 3 to 5 computers. Network links were skinny.

    I broke the SP2 file into 1 MB chunks, and I wanted each chunk of the file to be transferred to a remote site only once (i.e. clients check for local sources first).

    BitTorrent didn't do that so I ended up writing a script, and that seemed to work well.
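The chunk bookkeeping that script needs is small: hash fixed-size chunks, then prefer any LAN peer that already holds a chunk so each chunk crosses the skinny link only once. A sketch (1 MB chunks as in the parent; names are illustrative):

```python
import hashlib

CHUNK = 1024 * 1024  # 1 MB, as in the parent comment

def chunk_hashes(data, size=CHUNK):
    """SHA-1 of each fixed-size chunk of the payload, in order."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def plan_fetch(wanted, local_peers):
    """For each wanted chunk hash, prefer a LAN peer that already has it;
    only chunks nobody on the LAN holds go over the WAN."""
    have = {h for peer in local_peers for h in peer}
    return [("lan" if h in have else "wan", h) for h in wanted]
```

Each client advertises its set of chunk hashes to site neighbours; `plan_fetch` then yields the local-first transfer plan the parent wanted BitTorrent to produce.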

  • netapp (Score:3, Insightful)

    by Shotgun (30919) on Monday December 15, 2008 @10:14AM (#26119445)

    What you really want is a solution like NetApp file servers. They distribute the files at the file-system block level, updating only the blocks that have changed. You install a filer at your central office, then have multiple mirrors of it at the various field offices. All the PCs get their boot image off the network file server (the local one). With one update, you can upgrade every PC in the entire company.
