BitTorrent For Enterprise File Distribution? 291
HotTuna writes "I'm responsible for a closed, private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet. We have about 4GB of disaster recovery files that need to be replicated at each site, and updated monthly. The challenge is that all the enterprise file replication tools out there seem to be client/server and not peer-to-peer. This crushes our bandwidth at the corporate office and leaves hundreds of 7Mb DSL connections (at the stores) virtually idle. I am dreaming of a tool which can 'seed' different parts of a file to different peers, and then have those peers exchange those parts, rapidly replicating the file across the entire network. Sounds like BitTorrent you say? Sure, except I would need to 'push' the files out, and not rely on users to click a torrent file at each site. I could imagine a homebrew tracker, with uTorrent and an RSS feed at each site, but that sounds a little too patchwork to fly by the CIO. What do you think? Is BitTorrent an appropriate protocol for file distribution in the business sector? If not, why not? If so, how would you implement it?"
Sneakernet (Score:5, Insightful)
Re:Sneakernet (Score:5, Insightful)
The bandwidth of a DVD in the postal service isn't great but it's reasonable and quite cost effective.
From the summary: "I would need to 'push' the files out, and not rely on users to click a torrent file at each site." I imagine that the following is also true: "I would need to 'push' the files out, and not rely on users to insert a disc and run setup.exe at each site."
Re: (Score:3, Insightful)
Surely "pushing the files" to a remote site is the same as "posting the files" via a different transport mechanism. When people say they need to push files remotely, it's not that the users can't or won't be able to handle files that aren't already set up; it's that they'll forget, or just be too lazy, to click the button to retrieve them. A DVD in the post is difficult to miss.
However, a DVD in the post may not arrive or may be corrupt.
Congressnet (Score:5, Funny)
Hey I resent that. None of my DVDs have ever taken a bribe.
But I thought the Enterprise used (Score:4, Funny)
I'm confused.
Re: (Score:2)
If that's the case, you would need a store to run setup.exe every time...
Re: (Score:2)
What has that got to do with anything? If that's the case, the files that are currently pushed out to DR still have to be executed, manually or automagically.
The OP is not asking for that - the OP wants the files to be transferred automagically. A DVD works perfectly fine, just has high latency.
Re: (Score:2)
I don't think that putting a DVD into a (hopefully) physically secured computer is as automagical as doing absolutely nothing on the client end while a script/daemon takes care of all the work.
Re:Sneakernet (Score:4, Insightful)
Also, burning (and packaging and mailing...) a bunch of DVDs isn't necessarily cheap/quick/easy, so it breaks down pretty quickly as the number of stores increases.
Re: (Score:2, Funny)
"What to do if there is a fire on store property.pdf"
Step one: turn off the computer.
Oh crap! What's step two?
Re: (Score:2, Insightful)
I dunno, but step three is profit.
Re: (Score:2, Funny)
Slashdot: news for nerds, stiffs that matters.
Re: (Score:2)
The bandwidth can be great, it's the latency that kills you.
Different torrent client ? (Score:5, Informative)
Re:Different torrent client ? (Score:5, Interesting)
rtorrent [rakshasa.no] watching a directory for .torrent files would be the way to go. Then use unison [upenn.edu] to keep the .torrent directory in sync.
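If it helps to visualize it, a minimal rtorrent watch-directory setup might look like the following (the paths and the five-second poll interval are placeholder choices, so treat this as a sketch rather than a drop-in config):

```
# ~/.rtorrent.rc
directory = /data/dr-files
schedule = watch_directory,5,5,load_start=/data/torrents/*.torrent
```

A cron-driven unison (or rsync) job pointed at /data/torrents completes the picture: drop a .torrent file at HQ and every site picks it up and starts downloading on its own.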
Re: (Score:2)
Azureus automatically pulls its updates as well, so maybe there's a way to get Azureus to auto-subscribe to files, too.
technologies working together isn't patchwork (Score:2)
ask us (Score:5, Informative)
Next time you should ask at the official BitTorrent IRC channel [irc].
The Python BitTorrent client [bittorrent.com], which runs on Unix, has a version called "launchmany" which is easily controlled via script. It should fit your needs very nicely.
Works great (Score:5, Insightful)
BitTorrent is an excellent intranet content-distribution tool; we used it for years to push software and content releases to 600+ Solaris servers inside Microsoft (WebTV).
-j
Sure, why not? (Score:5, Insightful)
Sure! BitTorrent, remember, is only a protocol; it's just become demonized because of the types of files being shared with it. But if you're sharing perfectly legitimate data, then what's wrong with using a protocol that's already been extensively tested and developed?
Just because it's been used to pirate everything under the sun doesn't make it inappropriate in other arenas.
Re: (Score:3, Insightful)
Pirates still prefer FTP, it seems all of the big warez groups are still pushing files around using FTP...
Re:Sure, why not? (Score:4, Insightful)
Re: (Score:3, Informative)
You're talking about the difference between the provider pirates and the end-user pirates. SCENE people hate p2p. Average Joe-wants-stuff-for-free doesn't know what the "scene" is, and uses p2p (always wondering why torrents say RELOADED or RAZOR1911).
Re: (Score:3, Interesting)
The main problem is that it introduces an extra vulnerability: with it comes the capability to spread malware and viruses around very efficiently. Depending upon how locked down things are, it might not be a problem, but it's definitely something to worry about.
And yes, I am assuming that somebody's going to get their machine infected or that somebody's going to break into the VPN traffic. Not necessarily likely, but it still has to be considered.
Just don't call it BitTorrent (Score:2, Interesting)
...or P2P when you first mention it to the CIO.
I would venture most CIOs' exposure to such things has been limited to what the popular media is pushing: BitTorrent == PIRACY.
I'd recommend sticking to vague terms like "Distributed file transfer".
Re:Sure, why not? (Score:5, Informative)
One of the things that always amused me was when people claimed Bram Cohen was "selling out" by working with the movie/music industry. BitTorrent was never intended for piracy; it's merely its most common use.
It's very regularly used for Linux distros, game patches (World of Warcraft!), etc.
Re: (Score:2, Informative)
Re: (Score:2)
rsync (Score:5, Informative)
Re:rsync (Score:5, Informative)
Yes, and there are ways you can use rsync from well-planned scripts that are very powerful beyond just file transfer.
1. The basic case of "transfer or update existing files at destination to match source." It always takes advantage of existing destination data to reduce network transfers.
2. The creation of a new destination tree that efficiently reuses existing destination data in another tree without modifying the old tree. See --copy-dest option.
3. In addition to the previous, don't even create local disk traffic of copying existing files from the old tree to new, but just hard link them. This is useful for things like incremental backup snapshots. See --link-dest option.
It may not be as sexy as p2p protocols, but you can implement your own "broadcast" network via a scattered set of rsync jobs that incrementally push their data between hops in your network. And a final rsync with the master as the source can guarantee that all data matches source checksums while having pre-fetched most of the bulk data from other locations.
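That "broadcast via scattered rsync jobs" idea can be sketched in shell. Everything below is hypothetical: the host names, paths, and topology are invented, and the `run` wrapper only prints each command so the plan can be inspected before anything is executed for real.

```shell
#!/bin/sh
# Hypothetical staged "broadcast" of /data/dr from HQ through regional
# hops. Host names, paths, and topology are invented; 'run' only prints
# each command so the plan can be reviewed before anything is executed.
run() { echo "+ $*"; }

SRC=/data/dr/
HOPS="region1 region2"                 # first-tier stores, seeded by HQ
LEAVES_1="store11 store12"             # stores behind region1
LEAVES_2="store21 store22"             # stores behind region2

# Stage 1: HQ pushes the full tree to one hop per region.
for hop in $HOPS; do
    run rsync -az --delete "$SRC" "$hop:$SRC"
done

# Stage 2: each regional hop fans out to its local stores.
for leaf in $LEAVES_1; do
    run ssh region1 rsync -az --delete "$SRC" "$leaf:$SRC"
done
for leaf in $LEAVES_2; do
    run ssh region2 rsync -az --delete "$SRC" "$leaf:$SRC"
done

# Final pass from the master with checksums (-c) to guarantee that every
# store matches the source, having pre-fetched the bulk data from a hop.
for store in $HOPS $LEAVES_1 $LEAVES_2; do
    run rsync -azc "$SRC" "$store:$SRC"
done
```

Swap the `run` wrapper for direct execution (and add locking/logging) once the plan looks right.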
I've been enjoying various rsync applications such as the following (to give you an idea of its power): obtain any old or partial mirror of a Fedora repository and update it from an appropriate rsync-enabled mirror site, to fill in any missing packages. This is a file tree of packages and other metadata. Concatenate all of the tree's files into one large file, then use rsync to "update" this file to match a corresponding DVD re-spin image on a distro website. Rsync will figure out that most of the file extents cooked into the ISO image are already in the destination file, and just go about repositioning them and filling in the ISO filesystem's metadata. An incredibly small amount of traffic is spent performing this amazing feat.
Re: (Score:2)
You know, I've had a need for an rsync-like tool for Windows (specifically between Windows Server 2003 machines). I found a Windows-based rsync implementation (whose name I can't recall), but the tool was clunky and unreliable. I saw someone suggest Unison, but do you have any other suggestions specifically for Windows?
Re: (Score:2, Informative)
Re: (Score:2)
In a word, Yes (Score:5, Informative)
I've seen bittorrent used for several business-critical functions. One example is World of Warcraft, which distributes its updates with it.
Re: (Score:2)
I suspect that the GP is either referring to professional farmers or to Blizzard's own staff.
Re:In a word, Yes (Score:5, Insightful)
For Blizzard, updates to World of Warcraft are very much a "business critical function".
Cisco already makes a product to do this - WAAS (Score:5, Informative)
It is like rsync on steroids. Cisco's WAN optimization and application acceleration product allows you to "seed" your remote locations with files. It also uses a technology called Data Redundancy Elimination (DRE) that replaces large data segments that would be sent over your WAN with small signatures.
What this means in a functional sense is that you would push that 4 GB file over the WAN once. Any subsequent push would sync only the bit-level changes, effectively transferring just the 10 megabytes that actually changed.
While it is nice to get the propeller spinning, there is no sense reinventing the wheel.
Cisco WAAS - http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html [cisco.com]
Re: (Score:3)
I'm a huge fan of WAN accelerators (though I prefer the products from Riverbed), but I'm not sure of the fit here (and it certainly isn't anything like what the OP is asking about). First, these devices aren't cheap, especially if you need to communicate between tons of locations, as seems to be the case here, since each location will require a unit. Even the lower-end products in the category will easily run $10k. Second, we don't know how similar the files being moved once a month are. If not a majority ident
Re:Cisco already makes a product to do this - WAAS (Score:5, Interesting)
Bittorrent will transfer the differences too: if you make a new file overwrite an old one, it will replace any chunks which are different.
Re:Cisco already makes a product to do this - WAAS (Score:4, Informative)
BitTorrent is not very flexible in this regard, so if you have bits -added- to the middle, then everything after the first added bit will need to be re-downloaded.
The worst case, of course, is new material at the beginning, which shifts everything. BitTorrent is not designed for that.
Re:Cisco already makes a product to do this - WAAS (Score:4, Interesting)
Use bittorrent to distribute git blobs. They are immutable & append only; perfect for something like bittorrent. All you'd really need is a good means of syndication via Atom, & end users capable of understanding SCM.
Re:Cisco already makes a product to do this - WAAS (Score:4, Informative)
Even with a large file only the differences can be retransmitted with bittorrent, provided that the overall filesize doesn't change. At startup, bittorrent will verify the local data and then discard and redownload the chunks that don't match the checksum in the torrent file.
But rsync would be a better solution in this scenario as it was explicitly designed for such a use and will handle changes to the file much better.
Re: (Score:3, Insightful)
Presumably "on steroids" means "with a fancy GUI".
rsync does this too. rsync can push or pull.
besides, there are plenty of rsync gui's, too.
However, bittorrent is almost certainly the best solution for this purpose -- the real question is coherency. You always know that eventually you'll have a complete and perfect copy at each location -- but how do you know WHEN that copy is complete so you can work on it? If this is strictly a backup system, then it's not needed, but it's probably not a good thing to be
If the CIO expects "official" support... (Score:5, Informative)
Personally I like the portable media shipment suggestions. But if your CIO/company requires enterprise software from a large vendor with good support, have a look at IBM's Tivoli Provisioning Manager for Software:
http://www-01.ibm.com/software/tivoli/products/prov-mgrproductline/ [ibm.com]
Besides the usual software distribution, this package has a peer-to-peer function. It also senses bandwidth. If there's other traffic it slows down temporarily so it won't saturate the link. Once the other traffic is done (like during your off-hours or maintenance windows) it'll go as fast as it can to finish distributing files.
Re:If the CIO expects "official" support... (Score:4, Insightful)
Actually, there is a Tivoli product that does more or less exactly what the OP asks for: IBM Tivoli Provisioning Manager for Dynamic Content Delivery [ibm.com]
No, you fool! (Score:5, Funny)
Re: (Score:2, Interesting)
Oooooh... I can see the whole issue of throttling suddenly becoming very amusing as the corporate behemoths start slugging it out.
Chained client/server (Score:4, Insightful)
Have you thought about building up a distribution tree for your sites?
Group all of your stores based upon geographic location: state, region, country, etc. Pick one or two stores in each group to be the only ones that interact with the parent group.
E.g. Corporate will distribute the files to two locations in each country. Then two stores from each region will see that the country store has the files and download them. Repeat down the chain until all stores have the files.
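The grouping step is easy to script. As a toy illustration (the store names and regions are invented), a few lines of awk can derive who pulls from whom:

```shell
# Hypothetical store list: name,region
cat > /tmp/stores.csv <<'EOF'
store-nyc-1,east
store-nyc-2,east
store-bos-1,east
store-sf-1,west
store-la-1,west
EOF

# The first store seen in each region becomes that region's hub;
# everyone else in the region pulls from its hub instead of from HQ.
plan=$(awk -F, '
    !($2 in hub) { hub[$2] = $1; print "HQ -> " $1; next }
                 { print hub[$2] " -> " $1 }
' /tmp/stores.csv)
printf '%s\n' "$plan"
```

The printed plan is the distribution tree; feeding each line to whatever transfer tool you pick (rsync, scp, a torrent client) is the easy part.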
load balancing? (Score:2)
Why not spread out the backups? Limit the bandwidth of the backups to leave enough room for regular traffic, and have different stores send their backups on different days.
Re:load balancing? (Score:5, Interesting)
Re: (Score:2)
It doesn't appear that this is meant to get pushed to terribly many computers, so a really low connection cap (say, 20) should be enough, and I have a hard time seeing how that would trash any router.
Re: (Score:2)
Considering it's all going to run over an IPsec tunnel, it could cause problems unless his off-site routers are well built (dlink 4tl).
A basic box running m0n0wall is pretty good at this kind of routing requirement, mainly because all IPsec requires is more memory and processing power, which can be upgraded :)
Captain disillusion (Score:5, Informative)
with IPsec over DSL, and no access to the public internet.
Unless you have very long wires, some box is going to route them. Are those boxes your own?
Otherwise, your ISP's router, diligent in separating traffic though it may be, can get hacked.
Why am I saying this? Not to make you don your tinfoil hat, certainly, but just to point out that if the scenario is as I describe, you're not 100% GUARANTEED to be invulnerable. Maybe a few tinfoil strips in your hair would look nice... ;)
About the actual question: bit torrent would probably be fine, but if most of the data is unchanged between updates, you may want to compute the diff and then BT-share that. How do you store the data? If it's just a big tar(.gz|.bz2) archive, bsdiff might be your friend.
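A hedged sketch of that diff-then-share flow (the archive names are invented, and `run` only prints each step, since bsdiff/bspatch may not be installed everywhere):

```shell
# Sketch of the diff-then-share idea: archive names are invented and
# 'run' only prints each step, since bsdiff/bspatch may not be present.
run() { echo "+ $*"; }

OLD=dr-2008-11.tar      # last month's archive, already at every store
NEW=dr-2008-12.tar      # this month's archive, only at HQ

# At HQ: compute a binary delta between the two monthly archives.
run bsdiff "$OLD" "$NEW" dr-dec.patch

# Distribute dr-dec.patch (typically a small fraction of the 4 GB)
# over BitTorrent, then at each store rebuild the new archive:
run bspatch "$OLD" "$NEW" dr-dec.patch
```

The win depends entirely on how much of the 4 GB actually changes month to month, which is worth measuring before building anything.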
If you push from a single seeder to many clients, maybe multicast would be a good solution. But that's in the early design phase I think, which is not what you need :)
Best of luck!
Re: (Score:2)
How I would do it... (Score:5, Interesting)
...is quite straightforward, in fact.
This has many advantages:
The beauty of this system is that it relies heavily on existing technology (BitTorrent, RSS, GnuPG, etc), so you can just throw together a bunch of libraries in your favourite programming language (I would use Python for myself), and you are done. Saves you time, money and a lot of work!
Furthermore you do not need to have a VPN set up to every destination as your files are already encrypted and properly signed.
Another advantage: as this is a custom-built system for your use case, it should be easy to integrate into your existing one.
Re: (Score:2)
Why would they need to use GnuPG? The submitter did say that this was on a private network over IPsec links with no access to the internet.
Re:How I would do it... (Score:5, Informative)
Not necessarily true. PGP allows you to encrypt a file to multiple keys at once. Each site would then use its own key to decrypt the file. One file, multiple keys, multiple users. Simple.
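Encrypting one file for several recipients is a single gpg invocation. The key IDs and filename below are made up, and `run` only echoes the command rather than executing it:

```shell
# Sketch: encrypt (and sign) one archive for several site keys at once.
# Key IDs and the filename are invented; 'run' only echoes the command.
run() { echo "+ $*"; }

run gpg --encrypt \
    --recipient store-east@example.com \
    --recipient store-west@example.com \
    --sign --local-user hq@example.com \
    dr-files.tar
```

Any one of the matching private keys can decrypt the result, so the same ciphertext can be seeded to every site.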
How is the VPN setup (Score:5, Informative)
Your best bet is multicast, there are programs for software distribution that use multicast.
The question remains.. (Score:3, Insightful)
How are they connected to each other? If the same bottleneck router is used to reach each other, then it's a moot point. People often forget about the underlying network's workings and abstract away that important detail. The sites can reach each other's IPs, but that doesn't mean the traffic avoids a single weak link in the chain.
Re: (Score:2)
I would think that "and to each other" would mean connections to other stores as opposed to a central router, but the summary does kind of suck.
Re: (Score:2)
If using IPSec to maintain a private network over an untrusted provider, I find it hard to believe that they actually have full mesh configured. It's possible, but unlikely...
Re: (Score:3, Informative)
I also assumed that this was hub-and-spoke and that the "to each other" statement was just routing. Given the number of remote sites, and that he did not mention a specific hardware supplier, I would assume that a meshed IPsec VPN setup would be a chore to maintain, as it would likely be all manual.
I am all for open-source systems, but find that Cisco 8xx series routers are well priced (under $500) and easily managed for mesh VPN setups of up to 20 links. I run this setup with an ASA5510 at the ce
Re: (Score:2)
The way that he phrased the summary suggested not to me.
It read like he's got a central VPN server at the "corporate office" with the shops connecting to that. I would guess that shops can route to each other, but it's not going to help corporate office bandwidth if shop A can only get to shop B via the centre.
it's called dsync (Score:5, Interesting)
and you can find documentation for it here:
http://www.cs.cmu.edu/~dga/papers/dsync-usenix2008-abstract.html [cmu.edu]
It is rsync on steroids that uses a BitTorrent-like P2P protocol that is even more efficient because it exploits file similarity.
You may have to contact the author of the paper to get the latest version of dsync, but I am sure they would be more than happy to help you with that.
Re:it's called dsync (Score:5, Informative)
I hate to reply to my own posts, but this link has an even shorter description of the tool:
conferences.sigcomm.org/sigcomm/2008/papers/p505-puchaA.pdf
Call me old fashioned (Score:2)
Use existing technology (Score:5, Funny)
IPSec over DSL (Score:2)
Are you using IPSec in Tunnel mode or Transport mode? If you're using it in tunnel mode, then you're not going to fix your bandwidth problem, because all data has to go through corporate HQ anyway because that's where the tunnels end.
I must say.. (Score:2)
I'm guilty of abstracting away that detail in contemplating this article.
If it turns out his network architecture has the same bottleneck either way, all the more reason he needs to take a hard look at his data and how amenable it is to rsync.
Windows DFS -- Dont use FRS (Score:5, Informative)
NFS with DFS (Score:2)
More questions.. (Score:2)
What platform is used?
Is it scriptable readily?
How scheduled are the updates?
How similar is the data day to day?
A few things come to mind as a traditional Unix admin:
-cron job to download the file using screen and btdownloadcurses
-ssh login to each site and do the same (if need to push at arbitrary times)
-rsync (if the day-to-day diff is small, might as well do this)
Analogous procedures can probably be done for whatever platform you choose. Learning how to generically apply this strategy in the platform of choi
Kontiki (Score:3, Informative)
Re: (Score:2, Informative)
This is a solved problem (Score:4, Interesting)
It's spelled 'NNTP'. Look at how Usenet newsgroups, especially for binaries, have worked for decades for a robust distribution model. The commands to assemble the messages can be scripted as well.
Similarly, the bittorrent files you describe can also be pushed or pulled from a centralized target list and activated via SSH as needed.
linux and bittorrent, and some light scripting. (Score:2)
Consider scripting the process of creating a torrent file of the data that needs replication. At each remote site, run some Linux or BSD system and set up SSH keys so the central server can run a script on each remote machine.
Set up a local BitTorrent tracker.
On the main server, script building the torrent file and run an upload script against a list of remote sites: each would download the torrent file via scp and run it until it has seeded out a given amount OR has run for X days.
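Those steps might be sketched like this. The tracker URL, hosts, and paths are assumptions, `run` prints the commands instead of executing them, and `btmakemetafile`/`btdownloadheadless` are the tools that ship with the original Python BitTorrent client (check your version's usage for the exact argument order):

```shell
#!/bin/sh
# Sketch of the push flow above; 'run' only echoes each command, since
# the tracker URL, host names, and paths here are all assumptions.
run() { echo "+ $*"; }

DATA=/data/dr-files
TRACKER=http://tracker.corp.internal:6969/announce
SITES="store01 store02 store03"

# 1. Build the .torrent against the internal tracker.
run btmakemetafile "$TRACKER" "$DATA"

# 2. Copy the .torrent to each site and start a headless downloader.
for site in $SITES; do
    run scp "$DATA.torrent" "$site:/var/torrents/"
    run ssh "$site" "screen -dmS dr btdownloadheadless /var/torrents/dr-files.torrent"
done

# 3. Seed from HQ until replication finishes (or for N days).
run btdownloadheadless "$DATA.torrent"
```

Wrap step 3 in a timeout or a seeded-ratio check and the whole thing runs unattended from cron.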
The only issue here that I
Multicast? S3? (Score:2)
Why not send it simultaneously to all locations using multicast?
What about uploading an encrypted version to S3 which can then be downloaded via torrent or the S3 API?
bittorrent URLs need to work with browsers ... (Score:2)
Already included! (Score:2)
Doesn't a BitTorrent folder already allow adding additional stuff later?
I would recommend making a small modification of an existing open source torrent client:
Let the download never stop. Make it look for new parts, updates to already-downloaded parts (via SHA-1), and new files in the directory structure of the torrent, until the end of time.
That way you have an instant, error-resistant peer-to-peer backup and replication service that is as easy to use as copying (or linking) the files into the right folder.
Re: (Score:2)
..and a fantastic way to seed all sorts of nasties into an originally clean download.
Comment removed (Score:4, Interesting)
Re:BitTorrent not efficient for this scenario (Score:4, Informative)
netapp (Score:3, Insightful)
What you really want is a solution like NetApp file servers. They distribute files at the file-system block level, updating only the blocks that have changed. You install a filer at your central office, then set up multiple mirrors of it at the various field offices. All the PCs get their boot image off the network file server (the local one). With one update, you can upgrade every PC in the entire company.
Re:Snail-mail USB sticks (Score:5, Insightful)
Why would they want to pay for those USB sticks (and any shipping fees that might be involved) when they have a perfectly good network already in place to send the data in a secure manner? There are too many variables involved in using USB sticks as a means of transferring back-up data. Sticks could get damaged, lost, stolen, etc, not to mention that the server at each store would need to allow USB access which could potentially open them up to other security risks. Just imagine if someone at a store decided to plug in their own USB stick and swipe a few files. Nice idea, but there are too many risks involved with a physical transfer of data.
Re: (Score:3, Insightful)
Because, depending on the actual files, that might be overkill. For recovery files there are probably a lot of similar or identical files in each batch. Something like jigdo, rsync, or distributing diffs might be a lot more efficient.
With those the main concern is having an appropriate client to automatically handle the updating on that end.
Most of those options would also be capable of checking the integrity of previous updates and could be run more frequently just to verify that the data is uncorrupted. I think t
Re: (Score:2)
4GB of files once per month, why bother using the network?
No one ever seems to answer the question. The dude has his reasons.
I find myself in a similar situation. 7 offices connected via Comcast cable. Every single office has a local backup to a USB-attached external hard drive. But they also want off-site backups in case of fire or flood. Making a round-trip between the 7 offices takes half a day. None of the staff at the offices are technically competent. They used to do tape backups at each office, but people would forget, tapes would go bad, staff d
Re: (Score:3, Informative)
We use both to replicate data between windows servers internally and on external sites.
Re: (Score:2, Insightful)
Better yet, tack on:
6. Give the script that handles this a name, build deployment tools, and release them under GPL.
Re:Bittorrent is not secure (Score:5, Informative)
While security is always something to be considered, this from the question:
"private network of retail stores connected to our corporate office (and to each other) with IPsec over DSL, and no access to the public internet"
Private network? Check.
No access to public internet? Check.
So pretty much no way for the files to be seeded outside the company.
And even if there were a way to seed on the internet when they don't have access to it, password protect the file so only a client with the password can download it. That's not unbreakable, but if a competitor wanted the information there are easier ways to get it.
Re: (Score:2)
password protect the file so only a client with the password can download it.
I don't know of a good way to do that with BitTorrent. Simpler to just encrypt the whole file, so anyone who downloads it is just helping seed, and can't read the file.
That's not unbreakable
With a large enough key, and properly applied crypto, it can be unbreakable until quantum computers become feasible.
As for DHT, I don't see where that's a problem -- trivial to simply disable it, or use a client which doesn't support it.
Re: (Score:2)
I would go as far as to recommend encrypting the files before putting them on a seed node. The simplest would be to use an archiving program that offers AES encryption (7Zip, WinZip, WinRAR, StuffIt) and give all branch sites the password.
You can also use TrueCrypt volumes with a keyfile sent via E-mail and encrypted with the site admin's PGP or S/MIME key for better security.
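The 7-Zip variant is nearly a one-liner. As a sketch (the paths and password are placeholders, `run` only echoes, and a real deployment would read the password from a root-only file rather than the command line):

```shell
# Sketch: pre-encrypt the archive before it ever touches a seed node.
# Path and password are placeholders; 'run' only echoes each command.
run() { echo "+ $*"; }

# -p sets the AES password; -mhe=on also encrypts the file names.
run 7z a -p"CHANGEME" -mhe=on dr-files.7z /data/dr-files

# At each store, the monthly job would then extract it:
run 7z x -p"CHANGEME" dr-files.7z -o/data/restore
```

With the payload encrypted up front, an accidental leak of the torrent itself exposes nothing readable.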
Re: (Score:2)
DHT or the like might seed your files outside the company. OK, I'm too lazy to work out whether that really is a threat, but I'm not sure BitTorrent is appropriate for data that you don't want to end up in the public domain.
Every BitTorrent client that supports DHT also has the ability to disable it.
In addition, since this is a VPN network, the client IP addresses are likely to be non-routable, so even if you did leak the torrent through DHT, it's pretty unlikely that anyone outside the company would be able to connect to a client running at 192.168.1.1.
Re: (Score:2)
Please, think of the PFYs. His DR fileset is only 4Gigs. My pr0n is bigger than that. ASCII/text pr0n!
Others have already given him the best solution for his case - DVDs. Overnight them, and he is done. Latency may be a bit much, but not that much more than doing it over DSL or dialup.
Now, let's go back to discussing OT stuff.
Re: (Score:3, Insightful)
I don't like the DVD option. If it were a matter of sending out to "the other site," that'd be one thing. But if you need to burn hundreds of DVDs for all the locations, it suddenly becomes practically a full-time job that could be replaced with a shell script and the WAN. I mean, 300 stores, assuming 15 minutes per DVD (in
Re: (Score:2)
Did you actually read what the OP wrote? IPsec to home office. NO PUBLIC INTERNET. NOTHING IN THE BITTORRENT SPEC WILL HELP because all the bittorrent traffic *still has to come home* to go back out.
The easiest way is just to script a push out to the individual stores.
Explain to me how bit torrent is going to help his home office wan traffic congestion?!
Re: (Score:2)
Well, why provision the data center with more expensive bandwidth, if a p2p solution can solve the problem without spending much/any extra money? Don't ever buy more of a resource until you are efficiently using the resource. Only if you are using it efficiently (or at least, as efficiently as you really can), and it's *still* not enough, should you actually buy more.
Businesses are pretty adamant about expense justification (and they should be). You have to justify any expenses, and even when they are jus
Re: (Score:2)
Agreed. Have you been watching the economy? Well-funded shouldn't imply careless.
If you maintain a culture of appropriate thriftiness at every level of your organization, you will likely never get to the point of having one executive riding a private jet that can move 20 people to a dinner meeting.
That being said, bandwidth can be pretty cheap and at most places around the country you can get 20Mb of fiber for $500-$700/m.
Remember the key word in the phrase: "appropriate" thriftiness.
Re: (Score:2)
So where is the '-1, Spam' mod?
Re: (Score:2)
Looks more like -1 Troll to me.
Re: (Score:3, Funny)