Slashdot
BitTorrent For Enterprise File Distribution?
Posted by
Soulskill
on Sun Dec 14, 2008 12:01 PM
from the make-it-so dept.
HotTuna writes "I'm responsible for a closed, private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet. We have about 4GB of disaster recovery files that need to be replicated at each site, and updated monthly. The challenge is that all the enterprise file replication tools out there seem to be client/server and not peer-to-peer. This crushes our bandwidth at the corporate office and leaves hundreds of 7Mb DSL connections (at the stores) virtually idle. I am dreaming of a tool which can 'seed' different parts of a file to different peers, and then have those peers exchange those parts, rapidly replicating the file across the entire network. Sounds like BitTorrent you say? Sure, except I would need to 'push' the files out, and not rely on users to click a torrent file at each site. I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO. What do you think? Is BitTorrent an appropriate protocol for file distribution in the business sector? If not, why not? If so, how would you implement it?"
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Sneakernet (Score:5, Insightful)
Re:Sneakernet (Score:5, Insightful)
The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.
From the summary: "I would need to 'push' the files out, and not rely on users to click a torrent file at each site." I imagine that the following is also true: "I would need to 'push' the files out, and not rely on users to insert a disc and run setup.exe at each site."
Parent
Re: (Score:3, Insightful)
Surely "pushing the files" to a remote site is the same as "posting the files" via a different transport mechanism. When people say that they need to remotely push the files, it's not that the users can't or won't be able to handle them if they're not already set up there; it's because they'll forget or just be too lazy to click the button to retrieve them. A DVD in the post is difficult to miss.
However, a DVD in the post may not arrive or may be corrupt.
Congressnet (Score:5, Funny)
Hey I resent that. None of my DVDs have ever taken a bribe.
Parent
But I thought the Enterprise used (Score:4, Funny)
I'm confused.
Parent
Re:Sneakernet (Score:4, Insightful)
Also, burning (and packaging and mailing...) a bunch of DVDs isn't necessarily cheap/quick/easy, so it breaks down pretty quickly as the number of stores increases.
Parent
Different torrent client ? (Score:5, Informative)
Re:Different torrent client ? (Score:5, Interesting)
rtorrent [rakshasa.no] watching a directory for .torrent would be the way to go. And then use unison [upenn.edu] to keep the .torrent directory in-sync.
Parent
ask us (Score:5, Informative)
Next time you should ask at the official BitTorrent IRC channel [irc].
The Python BitTorrent client [bittorrent.com], which runs on Unix, has a version called "launchmany" which is easily controlled via script. It should fit your needs very nicely.
Works great (Score:5, Insightful)
BitTorrent is an excellent intranet content-distribution tool; we used it for years to push software and content releases to 600+ Solaris servers inside Microsoft (WebTV).
-j
Sure, why not? (Score:5, Insightful)
Sure! BitTorrent, remember, is only a protocol, it's just become demonized due to the types of files being shared using it. But if you're sharing perfectly legitimate data, then what's wrong with using a protocol that's already been extensively tested and developed?
Just because it's been used to pirate everything under the sun doesn't make it inappropriate in other arenas.
Re: (Score:3, Insightful)
Pirates still prefer FTP, it seems all of the big warez groups are still pushing files around using FTP...
Re:Sure, why not? (Score:4, Insightful)
Parent
Re: (Score:3, Informative)
You're talking about the difference between the provider pirates and the end-user pirates. SCENE people hate p2p. Average Joe-wants-stuff-for-free doesn't know what the "scene" is, and uses p2p (always wondering why torrents say RELOADED or RAZOR1911).
Re: (Score:3, Interesting)
The main problem is that it introduces an extra vulnerability, and with it the capability of spreading malware and viruses around very efficiently. Depending upon how locked down things are, it might not be a problem, but it's still definitely something to worry about.
And yes, I am assuming that somebody's going to get their machine infected or that somebody's going to break into the VPN traffic. Not necessarily likely, but still has to be considered.
Re:Sure, why not? (Score:5, Informative)
One of the things that always amused me was when people claimed Bram Cohen was "selling out" by working with the movie/music industry. BitTorrent was never intended for piracy; that's merely its most common use.
It's very regularly used for Linux distros, game patches (World of Warcraft!), etc.
Parent
rsync (Score:5, Informative)
Re:rsync (Score:5, Informative)
Yes, and there are ways you can use rsync from well-planned scripts that are very powerful beyond just file transfer.
1. The basic case of "transfer or update existing files at destination to match source." It always takes advantage of existing destination data to reduce network transfers.
2. The creation of a new destination tree that efficiently reuses existing destination data in another tree without modifying the old tree. See --copy-dest option.
3. In addition to the previous, don't even create local disk traffic of copying existing files from the old tree to new, but just hard link them. This is useful for things like incremental backup snapshots. See --link-dest option.
It may not be as sexy as p2p protocols, but you can implement your own "broadcast" network via a scattered set of rsync jobs that incrementally push their data between hops in your network. And a final rsync with the master as the source can guarantee that all data matches source checksums while having pre-fetched most of the bulk data from other locations.
I've been enjoying various rsync applications such as the following (to give you an idea of its power): Obtain any old or partial mirror of a Fedora repository and update it from an appropriate rsync-enabled mirror site, to fill in any missing packages. This is a file tree of packages and other metadata. Concatenate all of the tree's files into one large file. Then use rsync to "update" this file to match a corresponding DVD re-spin image on a distro website. Rsync will figure out when most of those file extents cooked into the ISO image are already in the destination file, and just go about repositioning them and filling in the ISO filesystem's metadata. An incredibly small amount of traffic is spent performing this amazing feat.
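For anyone who wants to script the --link-dest snapshot pattern from point 3, here is a minimal Python sketch that only builds the rsync command line. The host name and paths are made up for illustration, and the choice of --archive and --delete is my assumption about what a monthly snapshot job would want, not something stated in the post above.

```python
import shlex
from typing import List, Optional


def build_snapshot_cmd(source: str, dest: str,
                       previous: Optional[str] = None) -> List[str]:
    """Build an rsync argv that copies `source` into a new snapshot `dest`.

    If `previous` is given, files that are unchanged relative to that older
    snapshot are hard-linked instead of copied (rsync's --link-dest option).
    """
    cmd = ["rsync", "--archive", "--delete"]
    if previous is not None:
        # An absolute path avoids surprises: rsync resolves a relative
        # --link-dest directory against the destination directory.
        cmd.append(f"--link-dest={previous}")
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    cmd = build_snapshot_cmd("corp:/dr/", "/backups/2008-12/",
                             previous="/backups/2008-11/")
    print(shlex.join(cmd))
```

Pass the resulting list to subprocess.run on a machine that has rsync installed. Keeping the command as a list (rather than a shell string) sidesteps quoting problems with odd paths.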
Parent
In a word, Yes (Score:5, Informative)
I've seen BitTorrent used for several business-critical functions. One example is World of Warcraft distributing updates using it.
Re:In a word, Yes (Score:5, Insightful)
For Blizzard, updates to World of Warcraft are very much a "business critical function".
Parent
Cisco already makes a product to do this - WAAS (Score:5, Informative)
It is like rsync on steroids. Cisco's WAN optimization and application acceleration product (WAAS) allows you to "seed" your remote locations with files. It also uses a technology called Data Redundancy Elimination (DRE) that replaces large data segments that would be sent over your WAN with small signatures.
What this means in a functional sense is that you would push that 4 GB file over the WAN one time. On any subsequent push you would only sync the bit-level changes, effectively transferring only the data that actually changed (say, 10 megabytes).
While it is nice to get the propeller spinning, there is no sense reinventing the wheel.
Cisco WAAS - http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html [cisco.com]
Re: (Score:3)
I'm a huge fan of WAN accelerators (though I prefer the products from Riverbed), but I'm not sure of the fit here (and it certainly isn't anything like what the OP is asking about). First, these devices aren't cheap, especially if you need to communicate between tons of locations, as seems to be the case here, since each location will require a unit. Even the lower-end product in the category will easily run 10k. Second, we don't know how similar the files being moved once a month are. If not a majority ident
Re:Cisco already makes a product to do this - WAAS (Score:5, Interesting)
BitTorrent will transfer the differences too: if you make a new file overwrite an old one, it will replace any chunks which are different.
Parent
Re:Cisco already makes a product to do this - WAAS (Score:4, Informative)
BitTorrent is not very flexible in this regard, so if you have bits -added- to the middle, then everything after the first added bit will need to be updated.
The worst case is, of course, if you have new material at the beginning and everything is shifted. BitTorrent is not designed for that.
Parent
Re:Cisco already makes a product to do this - WAAS (Score:4, Interesting)
Use bittorrent to distribute git blobs. They are immutable & append only; perfect for something like bittorrent. All you'd really need is a good means of syndication via Atom, & end users capable of understanding SCM.
Parent
Re:Cisco already makes a product to do this - WAAS (Score:4, Informative)
Even with a large file only the differences can be retransmitted with bittorrent, provided that the overall filesize doesn't change. At startup, bittorrent will verify the local data and then discard and redownload the chunks that don't match the checksum in the torrent file.
But rsync would be a better solution in this scenario as it was explicitly designed for such a use and will handle changes to the file much better.
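To make the overwrite-versus-insert point concrete, here is a toy Python sketch of fixed-size piece hashing. It uses SHA-1 per piece as BitTorrent does, but the 4-byte piece size, the sample data, and the function names are all just for illustration (real torrents use pieces of hundreds of kilobytes or more).

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_len: int) -> List[bytes]:
    """SHA-1 digest of each fixed-size piece of `data`."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]


def stale_pieces(local: bytes, target: bytes, piece_len: int) -> List[int]:
    """Indices of target pieces that the local copy cannot satisfy.

    A piece is stale when the local data at the same offset hashes
    differently, or when the local copy is too short to contain it.
    """
    have = piece_hashes(local, piece_len)
    want = piece_hashes(target, piece_len)
    return [i for i, h in enumerate(want)
            if i >= len(have) or have[i] != h]


if __name__ == "__main__":
    piece_len = 4
    old = bytes(range(32))                       # 8 pieces, all different
    overwritten = old[:8] + b"XXXX" + old[12:]   # same size, one piece changed
    inserted = old[:8] + b"XXXX" + old[8:]       # 4 bytes added in the middle

    print(stale_pieces(old, overwritten, piece_len))  # [2]
    print(stale_pieces(old, inserted, piece_len))     # [2, 3, 4, 5, 6, 7, 8]
```

The in-place overwrite costs one piece; the insertion shifts every later piece boundary, so everything from the insertion point onward has to be fetched again. That is the case where rsync's rolling checksum does better.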
Parent
Re: (Score:3, Insightful)
presumably "on steroids" means "with a fancy GUI".
rsync does this too. rsync can push or pull.
besides, there are plenty of rsync GUIs, too.
however, bittorrent is almost certainly the best solution for this purpose -- the real question is coherency. You always know that eventually you'll have a complete and perfect copy at each location -- but how do you know WHEN that copy is complete so you can work on it? if this is strictly a backup system, then it's not needed, but it's probably not a good thing to be
If the CIO expects "official" support... (Score:5, Informative)
Personally I like the portable media shipment suggestions. But if your CIO/company requires enterprise software from a large vendor with good support, have a look at IBM's Tivoli Provisioning Manager for Software:
http://www-01.ibm.com/software/tivoli/products/prov-mgrproductline/ [ibm.com]
Besides the usual software distribution, this package has a peer-to-peer function. It also senses bandwidth. If there's other traffic it slows down temporarily so it won't saturate the link. Once the other traffic is done (like during your off-hours or maintenance windows) it'll go as fast as it can to finish distributing files.
Re:If the CIO expects "official" support... (Score:4, Insightful)
Actually, there is a Tivoli product that does more or less exactly what the OP asks for: IBM Tivoli Provisioning Manager for Dynamic Content Delivery [ibm.com]
Parent
No, you fool! (Score:5, Funny)
Chained client/server (Score:4, Insightful)
Have you thought about building up a distribution tree for your sites?
Group all of your stores based upon geographic location. State, region, country, etc. Pick one or two stores in each group and they are the only ones that interact with the parent group.
E.g. Corporate will distribute the files to two locations in each country. Then two stores from each region will see that the country store has the files and download them. Repeat down the chain until all stores have the files.
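A rough Python sketch of that chain, in case it helps to see it. The site names are invented, and the tree is assumed to be acyclic and hand-maintained; each "round" is the set of sites whose parent already has the files.

```python
from typing import Dict, List


def fetch_rounds(tree: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Breadth-first download rounds for a distribution tree.

    `tree` maps each site to the sites that download from it. Round N lists
    the sites that can start once every site in round N-1 has the files.
    Assumes the tree has no cycles.
    """
    rounds: List[List[str]] = []
    frontier = [root]
    while True:
        nxt = [child for parent in frontier for child in tree.get(parent, [])]
        if not nxt:
            return rounds
        rounds.append(nxt)
        frontier = nxt


if __name__ == "__main__":
    # Hypothetical layout: corporate -> country relays -> regional relays -> stores.
    tree = {
        "corporate": ["us-a", "us-b"],
        "us-a": ["west-1", "west-2"],
        "us-b": ["east-1"],
        "west-1": ["store-17", "store-42"],
    }
    for n, sites in enumerate(fetch_rounds(tree, "corporate"), start=1):
        print(n, sites)
```

Corporate only ever uploads to the first round, so its link carries two copies of the data no matter how many stores hang off the bottom of the tree.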
Captain disillusion (Score:5, Informative)
with IPsec over DSL, and no access to the public internet.
Unless you have very long wires, some box is going to route them. Are those your own?
Otherwise, your ISP's router, diligent in separating traffic though it may be, can get hacked.
Why am I saying this? Not to make you don your tinfoil hat, certainly, but just to point out that if the scenario is as I describe, you're not 100% GUARANTEED to be invulnerable. Maybe a few tinfoil strips in your hair would look nice... ;)
About the actual question: bit torrent would probably be fine, but if most of the data is unchanged between updates, you may want to compute the diff and then BT-share that. How do you store the data? If it's just a big tar(.gz|.bz2) archive, bsdiff might be your friend.
If you push from a single seeder to many clients, maybe multicast would be a good solution. But that's in the early design phase I think, which is not what you need :)
Best of luck!
How I would do it... (Score:5, Interesting)
...is quite straightforward in fact.
This has many advantages:
The beauty of this system is that it relies heavily on existing technology (BitTorrent, RSS, GnuPG, etc), so you can just throw together a bunch of libraries in your favourite programming language (I would use Python for myself), and you are done. Saves you time, money and a lot of work!
Furthermore you do not need to have a VPN set up to every destination as your files are already encrypted and properly signed.
Another advantage is: As this is a custom-built system for your use-case it should be easy to integrate it into your already existing one.
Re:How I would do it... (Score:5, Informative)
Not necessarily true. PGP allows you to sign with multiple keys. Each site would have their own key that they would use to decrypt the file. One file, multiple keys, multiple users. Simple.
Parent
How is the VPN setup (Score:5, Informative)
Your best bet is multicast, there are programs for software distribution that use multicast.
The question remains.. (Score:3, Insightful)
How are they connected to each other? If the same bottleneck router is used to reach each other, then it is a moot point. People often forget about the underlying network workings and abstract away that important detail. They can reach each other's IPs, but that is not to say the traffic doesn't all go through the same weak link in the chain regardless.
Re: (Score:3, Informative)
I also assumed that this was hub and spoke and that the "to each other" statement was just routing. Depending on the number of remote sites, and that he did not mention a specific hardware supplier, I would assume that a meshed ipsec VPN setup would be a task to maintain as it would likely be all manual.
I am all for open source systems but find that Cisco 8xx series routers are well priced (under $500) and easily managed for mesh VPN setups of up to 20 links. I run this setup with an ASA5510 at the ce
it's called dsync (Score:5, Interesting)
and you can find documentation for it here:
http://www.cs.cmu.edu/~dga/papers/dsync-usenix2008-abstract.html [cmu.edu]
It is rsync on steroids that uses a BitTorrent-like P2P protocol that is even more efficient because it exploits file similarity.
You may have to contact the author of the paper to get the latest version of dsync, but I am sure they would be more than happy to help you with that.
Re:it's called dsync (Score:5, Informative)
I hate to reply to my posts, but this link has an even shorter description of the tool:
conferences.sigcomm.org/sigcomm/2008/papers/p505-puchaA.pdf
Parent
Use existing technology (Score:5, Funny)
Windows DFS -- Dont use FRS (Score:5, Informative)
Kontiki (Score:3, Informative)
This is a solved problem (Score:4, Interesting)
It's spelled 'NNTP'. Look at how Usenet newsgroups, especially for binaries, have worked for decades for a robust distribution model. The commands to assemble the messages can be scripted as well.
Similarly, the BitTorrent files you describe can also be pushed or pulled from a centralized target list and activated via SSH as needed.
BitTorrent not efficient for this scenario (Score:4, Interesting)
I looked at BitTorrent to distribute Windows XP SP2 (about 250 MB) to about 1,500 clients, most of whom were in small sites containing 3 - 5 computers. Network links were skinny.
I broke the SP2 file into 1 MB chunks, and I wanted each chunk of the file to be transferred to a remote site only once (i.e. clients check for local sources first).
BitTorrent didn't do that so I ended up writing a script, and that seemed to work well.
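The "check for local sources first" rule can be sketched in a few lines of Python. This is not the script mentioned above (which isn't shown), just one way the selection logic might look; the machine names, site names, and data structures are all hypothetical.

```python
from typing import Dict, Set


def pick_source(chunk: int, my_site: str,
                holders: Dict[str, Set[int]],
                site_of: Dict[str, str],
                origin: str) -> str:
    """Choose where to fetch `chunk` from, preferring a peer on the same site.

    `holders` maps each known peer to the chunk numbers it already has, and
    `site_of` maps each peer to its site. Falls back to `origin` (the central
    server) only when no same-site peer has the chunk.
    """
    for peer in sorted(holders):  # sorted for a deterministic choice
        if (peer != origin
                and chunk in holders[peer]
                and site_of.get(peer) == my_site):
            return peer
    return origin


if __name__ == "__main__":
    # Hypothetical three-PC store where pc-a already pulled chunks 0 and 1.
    holders = {"pc-a": {0, 1}, "pc-b": set()}
    site_of = {"pc-a": "store-9", "pc-b": "store-9"}
    print(pick_source(0, "store-9", holders, site_of, origin="hq"))  # pc-a
    print(pick_source(2, "store-9", holders, site_of, origin="hq"))  # hq
```

With this rule each chunk crosses the skinny WAN link once per site, as long as the first machine to fetch it advertises it to its neighbours before they ask.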
Re:BitTorrent not efficient for this scenario (Score:4, Informative)
Parent
Re:Snail-mail USB sticks (Score:5, Insightful)
Why would they want to pay for those USB sticks (and any shipping fees that might be involved) when they have a perfectly good network already in place to send the data in a secure manner? There are too many variables involved in using USB sticks as a means of transferring back-up data. Sticks could get damaged, lost, stolen, etc, not to mention that the server at each store would need to allow USB access which could potentially open them up to other security risks. Just imagine if someone at a store decided to plug in their own USB stick and swipe a few files. Nice idea, but there are too many risks involved with a physical transfer of data.
Parent
Re: (Score:3, Insightful)
Because depending upon the actual files that might be overkill. For recovery files there's probably a lot of similar or same files in each batch. Something like Jigdo, rsync or distributing diffs might be a lot more efficient.
With those the main concern is having an appropriate client to automatically handle the updating on that end.
Most of those options would also be capable of checking the integrity of previous updates and could be run more frequently just to verify that the data is uncorrupted. I think t
Re:Bittorrent is not secure (Score:5, Informative)
While security is always something to be considered, this from the question:
"private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet"
Private network? Check.
No access to public internet? Check.
So pretty much no way for the files to be seeded outside the company.
And even if there were a way to seed on the internet when they don't have access to it, password protect the file so only a client with the password can download it. That's not unbreakable, but if a competitor wanted the information there are easier ways to get it.
Parent
Re:load balancing? (Score:5, Interesting)
Parent
Re: (Score:3, Insightful)
I don't like the DVD option. If it was a matter of sending out to "the other site," that'd be one thing. But if you need to burn hundreds of DVDs for all the locations, it suddenly becomes practically a full-time job that could be replaced with a shell script and the WAN. I mean, 300 stores, assuming 15 minutes per DVD (in