Use BitTorrent To Verify, Clean Up Files

jweatherley writes "I found a new (for me at least) use for BitTorrent. I had been trying to download beta 4 of the iPhone SDK for the last few days. First I downloaded the 1.5GB file from Apple's site. The download completed, but the disk image would not verify. I tried to install it anyway, but it fell over on the gcc4.2 package. Many things are cheap in India, but bandwidth is not one of them. I can't just download files > 1GB without worrying about reaching my monthly cap, and there are Doctor Who episodes to be watched. Fortunately we have uncapped hours in the night, so I downloaded it again. md5sum confirmed that the disk image differed from the previous one, but it still wouldn't verify, and fell over on gcc4.2 once more. Damn." That's not the end of the story, though — read on for a quick description of how BitTorrent saved the day in jweatherley's case.


jweatherley continues: "I wasn't having much success with Apple, so I headed off to the resurgent Demonoid. Sure enough, they had a torrent of the SDK. I was going to set it up to download during the uncapped night hours, but then I had an idea. BitTorrent would be able to identify the bad chunks in the disk image I had downloaded from Apple, so I replaced the placeholder file that Azureus had created with one of my corrupt SDK disk images and then re-imported the torrent file. Sure enough, it checked the file and declared it 99.7% complete. A few minutes later I had a valid disk image and had installed the SDK. Verification and repair of corrupt files is a new use of BitTorrent for me; I thought I would share a useful way of repairing large, corrupt, but widely available files."
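What Azureus did here is essentially a forced recheck. A BitTorrent v1 .torrent file carries a SHA-1 digest for every fixed-size piece of the content, so a client can hash whatever bytes are already on disk, keep the pieces that match, and request only the rest. The following is a minimal illustration of that idea, not Azureus's actual code. It assumes a single-file torrent, holds the data in memory, and uses a toy piece size; the function names are hypothetical.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece; the last piece may be shorter."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def recheck(local: bytes, expected: list[bytes], piece_length: int) -> list[int]:
    """Indices of pieces that are missing or fail their hash, i.e. still to download."""
    bad = []
    for idx, want in enumerate(expected):
        chunk = local[idx * piece_length:(idx + 1) * piece_length]
        if hashlib.sha1(chunk).digest() != want:
            bad.append(idx)
    return bad


if __name__ == "__main__":
    piece_length = 16                             # toy value; real torrents use far larger pieces
    good = bytes(range(256)) * 2                  # 512 bytes -> 32 pieces
    expected = piece_hashes(good, piece_length)   # what the .torrent would carry
    corrupt = bytearray(good)
    corrupt[100] ^= 0xFF                          # one damaged byte, inside piece 100 // 16 = 6
    bad = recheck(bytes(corrupt), expected, piece_length)
    done = 100 * (len(expected) - len(bad)) / len(expected)
    print(f"re-download pieces {bad}; {done:.1f}% complete")
```

Running it prints `re-download pieces [6]; 96.9% complete`: one bad byte costs one piece, which is why a mostly intact 1.5GB image shows up as 99.7% done.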
  • Nice (Score:5, Interesting)

    by Goldberg's Pants ( 139800 ) on Sunday May 04, 2008 @07:31PM (#23295424) Journal
    Awesome idea. I've done this in the past with stuff. If a corrupt version was on one tracker, I'd save the files, get a new torrent and import the old files. Saves a lot of wasted bandwidth.
  • by b4dc0d3r ( 1268512 ) on Sunday May 04, 2008 @07:36PM (#23295460)
    If I happen to see a stuck torrent (many leechers, no seeds), sometimes I can find a good version of the file among those I already have. So I start the torrent, stop it, drop in the single good file (sometimes you need more than one if the file is smaller than the piece size), and upload a few KB to finish the torrent. Then I sit back and watch as everyone fills up.
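The parenthetical about piece size follows from how classic (v1) BitTorrent lays out multi-file torrents: the files are concatenated in the order listed and that byte stream is cut into pieces, so a piece can straddle file boundaries. A small file that shares a piece with its neighbours can only be verified together with them. A minimal sketch with made-up file sizes; the helper names are hypothetical.

```python
def pieces_for_file(file_lengths: list[int], index: int, piece_length: int):
    """(first, last) piece indices covering file `index`, or None for an empty file."""
    start = sum(file_lengths[:index])
    end = start + file_lengths[index]          # exclusive end offset
    if end == start:
        return None
    return start // piece_length, (end - 1) // piece_length


def files_in_piece(file_lengths: list[int], piece: int, piece_length: int) -> list[int]:
    """Indices of every file that contributes bytes to `piece`."""
    lo, hi = piece * piece_length, (piece + 1) * piece_length
    out, offset = [], 0
    for i, n in enumerate(file_lengths):
        if n and offset < hi and offset + n > lo:   # byte ranges overlap
            out.append(i)
        offset += n
    return out


if __name__ == "__main__":
    sizes = [300_000, 50_000, 700_000]   # hypothetical three-file torrent
    plen = 256 * 1024
    print(pieces_for_file(sizes, 1, plen))   # (1, 1)
    print(files_in_piece(sizes, 1, plen))    # [0, 1, 2]
```

Here the 50,000-byte middle file lives entirely inside piece 1, but that piece also holds the tail of file 0 and the head of file 2, so all three must be present before it can pass its hash check.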
  • by CSMatt ( 1175471 ) on Sunday May 04, 2008 @07:40PM (#23295486)
    Are there even MD5 hashes on Apple's download pages for such large files? Judging by how the article was written and the lack of hashes on the QuickTime and iTunes download sites, it doesn't seem like they even bother.
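Where a vendor does publish a digest, checking a large download takes only a few lines, and hashing in chunks keeps memory use flat even for a 1.5GB image. A minimal sketch with hypothetical function names. MD5 is adequate for catching accidental corruption but is not collision-resistant, so SHA-256 is the better choice when offered. On Python 3.11+, `hashlib.file_digest` does the chunked reading for you.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file 1 MiB at a time so a multi-GB image never sits in memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare against a digest copied from a download page (whitespace and case ignored)."""
    return file_digest_hex(path, algorithm) == published_hex.strip().lower()
```

A mismatch tells you the file is bad but not where; that is the gap piece-level hashes, as in BitTorrent, fill.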
  • by Bazar ( 778572 ) on Sunday May 04, 2008 @07:43PM (#23295508)
    You should be more concerned about why your files are becoming corrupted.

    I'd say it's a safe bet that the files from apple.com are in perfect condition.

    Which means they became corrupted either in transit to your machine or on arrival.

    Which raises the question: is your memory defective?
    Run memtest86 to check your memory.
    http://www.memtest86.com/ [memtest86.com]

    Check whether your hard drives support SMART and are reporting anything. A disk checker would also be a good idea.

    The other idea that springs to mind is that you're behind some proxy with the above problems, although I doubt anyone would want to proxy a 1.5GB file.

    Fact is, if files are being corrupted on your disk, it's just a matter of time before something more important is hit by corruption.
  • by Anonymous Coward on Sunday May 04, 2008 @07:47PM (#23295558)
    Those who have never developed P2P software might never understand why they all need to use strong checksums to detect data corruption, and why bad blocks really do appear in the wild, frequently.

    You'd be shocked - SHOCKED - at how much data gets corrupted routinely: by errant antivirus software, flaky network equipment, plain ol' line noise that the checksums don't detect (which happens much more often than you'd expect; see also the birthday paradox), or misbehaving routers that think any occurrence of 0xC0A80102 obviously must be an internal IP address that needs to be changed to your external one. Even if that's in the middle of a ZIP file. Oops.
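The "line noise that the checksums don't detect" point is easy to demonstrate. IP, TCP and UDP protect data with the 16-bit ones' complement checksum described in RFC 1071. Because that sum ignores word order, it cannot see, for example, two 16-bit words that trade places, whereas a cryptographic hash such as BitTorrent's per-piece SHA-1 does. A minimal sketch over a bare payload; the real TCP checksum also covers a pseudo-header, which is left out here.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


if __name__ == "__main__":
    original = b"ABCDEFGH"
    reordered = b"CDABEFGH"    # first two 16-bit words swapped
    print(internet_checksum(original) == internet_checksum(reordered))         # True
    print(hashlib.sha1(original).digest() == hashlib.sha1(reordered).digest())  # False
```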

    Encryption actually helps here somewhat: the same byte patterns don't get repeated, so if an errant IDS is changing things, for example, it tends not to fire the second time.

    I've done this before for file repairs. Works a treat, but you sort of wish that BitTorrent used a Merkle hash tree such as the modified THEX standard Tiger Tree Hash. SHA-1's so last century.
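BitTorrent v1 keeps a flat list of per-piece SHA-1 digests, so corruption is only detected at whole-piece granularity. A hash tree instead lets every small block be checked against one trusted root. (BitTorrent v2, BEP 52, later moved to per-file SHA-256 Merkle trees.) The sketch below follows the THEX layout, assumed here to be 1024-byte leaves, a 0x00 prefix on leaf hashes, a 0x01 prefix on internal nodes, and unpaired nodes promoted unchanged, but it swaps in SHA-256 for Tiger, which Python's hashlib does not guarantee. Its output therefore will not match real TTH values, and all function names are hypothetical.

```python
import hashlib

LEAF_SIZE = 1024  # THEX leaf segment size


def _h(prefix: bytes, payload: bytes) -> bytes:
    # Assumption: SHA-256 stands in for Tiger.
    return hashlib.sha256(prefix + payload).digest()


def _leaves(data: bytes) -> list[bytes]:
    segs = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)]
    return segs or [b""]  # an empty input still has one (empty) leaf


def _next_level(level: list[bytes]) -> list[bytes]:
    nxt = [_h(b"\x01", level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])  # unpaired node is promoted unchanged
    return nxt


def merkle_root(data: bytes) -> bytes:
    level = [_h(b"\x00", leaf) for leaf in _leaves(data)]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def audit_path(data: bytes, index: int) -> list[tuple[bool, bytes]]:
    """Sibling hashes (sibling_is_left, hash) linking leaf `index` to the root."""
    level = [_h(b"\x00", leaf) for leaf in _leaves(data)]
    path = []
    while len(level) > 1:
        sib = index ^ 1
        if sib < len(level):          # no sibling means this node is promoted
            path.append((sib < index, level[sib]))
        level = _next_level(level)
        index //= 2
    return path


def verify_leaf(leaf: bytes, path: list[tuple[bool, bytes]], root: bytes) -> bool:
    node = _h(b"\x00", leaf)
    for sib_is_left, sib in path:
        node = _h(b"\x01", sib + node) if sib_is_left else _h(b"\x01", node + sib)
    return node == root
```

A downloader that trusts only the root can check any single 1 KiB block with about log2(n) sibling hashes, so a bad block is pinned down to 1 KiB rather than a whole piece.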
  • by trawg ( 308495 ) on Sunday May 04, 2008 @07:47PM (#23295560) Homepage
    We have been doing this for ages [ausgamers.com] for certain high-demand game files that we mirror. While offering torrents for some of our download mirrors is only mildly useful (as we're in Australia, we're trying to keep bandwidth on-shore to cut down international traffic, and BT doesn't really help with this), it is extremely helpful for the VAST number of users who appear either to have massively crazy Internet problems or to be simply unable to drive an HTTP-based downloader and resume downloads.

    When a large number of users are having problems downloading or resuming a particular file, I simply create a torrent for them and give them some vague instructions about how to resume it, and then generally I never hear from them again. They're happy because they don't have to download a 4GB game client again from scratch, they don't have to worry about resuming/corrupt downloads, and because it's a torrent it probably feels like they're getting something for free that they shouldn't be.
  • by Anonymous Coward on Sunday May 04, 2008 @07:50PM (#23295590)
    Could also be one's router.

    There was a problem with D-Link routers back in the day that hit a lot of P2P users. If you placed your machine in the DMZ, the router basically did a search-and-replace on all packets, replacing the bit string representing the global address with the bit string representing the local address. On large files, this didn't just hit the IP header but the data as well, corrupting it. If you didn't use the DMZ functionality, just port mapping, it worked fine. So if you were using BitTorrent, you'd get repeated hash fails on some pieces that would never fix themselves, because BitTorrent has no way to work around that (unlike eMule with its extensions).
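A toy model of why those hash failures "would never fix": the rewrite is deterministic, so every retry of an affected piece arrives damaged in exactly the same way. The addresses and function names below are hypothetical, and real router firmware presumably worked on individual packets rather than whole pieces; this only illustrates the failure mode. (Packed into four bytes, 192.168.1.2 is the 0xC0A80102 pattern mentioned in an earlier comment.)

```python
import hashlib
import socket

PUBLIC_IP = "203.0.113.7"    # hypothetical WAN address (documentation range)
PRIVATE_IP = "192.168.1.2"   # hypothetical DMZ host; packs to 0xC0A80102


def buggy_dmz_rewrite(payload: bytes) -> bytes:
    """Hypothetical model of the bug: replace every 4-byte copy of the public
    address with the private one, even inside application data."""
    return payload.replace(socket.inet_aton(PUBLIC_IP), socket.inet_aton(PRIVATE_IP))


def fetch_piece(piece: bytes, attempts: int = 3):
    """Simulate re-requesting one piece through the router; return the attempt
    that passed the hash check, or None if every attempt failed."""
    expected = hashlib.sha1(piece).digest()
    for attempt in range(1, attempts + 1):
        received = buggy_dmz_rewrite(piece)      # same deterministic damage each time
        if hashlib.sha1(received).digest() == expected:
            return attempt
    return None


if __name__ == "__main__":
    unlucky = b"...zip data..." + socket.inet_aton(PUBLIC_IP) + b"...more data..."
    print(fetch_piece(b"ordinary piece data"))   # 1
    print(fetch_piece(unlucky))                  # None: fails on every retry
```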
  • by greerga ( 2924 ) on Sunday May 04, 2008 @08:01PM (#23295658)
    For even more fun, if you have two differently-corrupted copies of a file and a torrent to go with it, then you can have BitTorrent stitch them together into a valid file without involving any third parties.

    I used Azureus's built-in tracker and two computers on a local network, with the torrent modified to track on one of the machines and one corrupted copy of the file on each.

    Obviously this only works if the copies don't have corruption in common (in the same pieces), but it also doesn't require the original torrent's tracker to be working anymore.
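The two-machine setup above boils down to a per-piece choice: for every piece, keep whichever copy hashes to the value in the .torrent. A minimal local sketch, assuming single-file content, full-length copies and a toy piece size; `stitch` is a hypothetical name, not a feature of any client.

```python
import hashlib


def stitch(copies: list[bytes], expected: list[bytes], piece_length: int):
    """For each piece, take the first copy whose bytes match the expected SHA-1.
    Returns (data, indices of pieces that no copy could supply)."""
    out, missing = bytearray(), []
    for idx, want in enumerate(expected):
        lo, hi = idx * piece_length, (idx + 1) * piece_length
        for copy in copies:
            chunk = copy[lo:hi]
            if hashlib.sha1(chunk).digest() == want:
                out += chunk
                break
        else:
            missing.append(idx)
            out += copies[0][lo:hi]   # placeholder; this piece still needs a real download
    return bytes(out), missing
```

If `missing` comes back empty the result is byte-identical to the original; any index it lists is corruption the copies share, which is exactly the case where this trick fails.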
  • Re:Nice (Score:5, Interesting)

    by Goldberg's Pants ( 139800 ) on Sunday May 04, 2008 @08:05PM (#23295704) Journal
    Okay, I had some AVIs, and a bunch of them had issues. All I did was copy them out to a different directory, find a GOOD torrent (with the same rips), and make sure the filenames matched exactly. I chucked them in the directory and voilà: the client checks them all, keeps whatever good data you already have, and downloads the rest.

    Done this with RAR-archived stuff as well. (Multipart RARs on torrents are retarded, but that's another issue entirely.)
  • What a novel idea!!! (Score:3, Interesting)

    by WarJolt ( 990309 ) on Sunday May 04, 2008 @08:06PM (#23295706)
    Using BitTorrent for its actual, legal, intended use. I love it!!!

    I'm not a lawyer, though. I just hope it doesn't violate Apple's NDA. Please, please, please follow the rules. I don't want to see you in prison or slapped with a large fine.

    BitTorrent has received a bad reputation because of pirates. There are legitimate uses, though. I do believe that Doctor Who episodes aren't public domain, so shame on you for that. You might want to be careful what you admit to on /.
  • Re:Nice (Score:5, Interesting)

    by ThePhilips ( 752041 ) on Sunday May 04, 2008 @08:09PM (#23295728) Homepage Journal

    I don't know exactly what GP meant, but I had a similar experience.

    Some game (a very old RPG, no longer sold) was available on Overlord and on BitTorrent. The problem was that the torrent had only a single seed with a minuscule upload speed: in several days I had downloaded only a few megs. Then I tried Overlord, and in a few days I had the game almost complete, but another snag hit me: whether by mistake or intentionally, the file was poisoned and three parts couldn't be downloaded. I was ready to throw everything away, since antique games interest me little (but a friend had recommended it as a milestone RPG I had to play). Then suddenly I was enlightened: I fed the incomplete ISO of the game to BitTorrent. The BT client happily announced something like 98% of the file complete and downloaded the rest in less than one night.

  • by cheesybagel ( 670288 ) on Sunday May 04, 2008 @08:38PM (#23295898)
    Maybe, maybe not.

    IIRC, TCP/IP has a guaranteed maximum bit error rate of 10^-5. Well, the thing is, 1.5 gigabytes is over 10^10 bits long. So even at such an error rate, there is no guarantee that your file will arrive without bit errors.
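Putting numbers on that argument, taking the 10^-5 figure at face value (it is the commenter's own hedged recollection, not a verified TCP property), reading 1.5 gigabytes as 1.5 × 2^30 bytes, and assuming independent bit errors at rate p:

```latex
\begin{aligned}
N &= 1.5 \times 2^{30}\ \text{bytes} \times 8\ \tfrac{\text{bits}}{\text{byte}} \approx 1.29 \times 10^{10}\ \text{bits} \\
\mathbb{E}[\text{bit errors}] &= N p \approx 1.29 \times 10^{10} \times 10^{-5} \approx 1.29 \times 10^{5} \\
P(\text{no errors}) &= (1 - p)^{N} \approx e^{-Np} \approx e^{-1.29 \times 10^{5}} \approx 0
\end{aligned}
```

Even a far lower rate leaves at least one expected error in a file this size whenever p ≥ 1/N ≈ 7.8 × 10^-11.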

  • by erexx23 ( 935832 ) on Sunday May 04, 2008 @10:44PM (#23296652)
    I have been using torrents for this very reason.

    I was sometimes required to copy 10-20GB of virtual machine image files from server to PC, or PC to PC, on up to 40 machines at a time.
    This was taking way too long, and the copies were not perfect.
    Restoring VM images presented the same problem.
    Updating a VM meant redistributing the entire file to all machines again.

    Using µTorrent and my own tracker changed all that.

    I came up with the following solution using all available resources.
    First, I copied all the images to a separate partition on the workstations (about 200GB of VMs).
    Then I created my own internal tracker and web page to host the torrents.

    The results were:
    1. Extremely efficient use of all available hard drive space on the network.
    2. Every machine on the network is used to distribute the files.
    3. Works extremely well for restoring or redistributing the VMs to any one machine or several machines at once. (The more, the better.)
    4. 100% accuracy in distribution.
    5. The ability to quickly modify any one image on any machine, recreate the torrent (hash), and then update that image across hundreds of machines very quickly.
    In other words, after a file is modified, the machines only have to download the parts that changed, not the whole image again.
    6. With µTorrent, any machine can be used as the tracker.
    7. The tracker is also the "master" file server; however, any machine can be used to modify and upload a change.
    Just recreate and re-upload the new torrent, replacing the old one. Remember that a torrent file-serving network is not a server-centric file-sharing system.
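Point 5 relies on piece-level hashing: once the torrent is recreated, each machine rechecks its old copy against the new piece hashes and fetches only the mismatches. A minimal sketch of that comparison, with hypothetical names. One caveat follows directly from fixed-offset pieces: an in-place edit costs only the touched pieces, but inserting or deleting bytes shifts every later piece, so everything after the edit point has to be fetched again.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def pieces_to_fetch(old: bytes, new: bytes, piece_length: int) -> list[int]:
    """Pieces of `new` that a machine still holding `old` has to download."""
    have = piece_hashes(old, piece_length)
    want = piece_hashes(new, piece_length)
    return [i for i, h in enumerate(want) if i >= len(have) or have[i] != h]


if __name__ == "__main__":
    plen = 256
    old = bytes(range(256)) * 16                   # 4 KiB stand-in "image", 16 pieces
    in_place = old[:1000] + b"XY" + old[1002:]     # overwrite 2 bytes
    inserted = old[:1000] + b"XY" + old[1000:]     # insert 2 bytes
    print(pieces_to_fetch(old, in_place, plen))    # [3]
    print(pieces_to_fetch(old, inserted, plen))    # pieces 3 through 16
```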
  • Re:Nice (Score:4, Interesting)

    by i.of.the.storm ( 907783 ) on Sunday May 04, 2008 @11:54PM (#23297056) Homepage
    I think it has to do with the way the "scene" releases things; they usually do it via multipart RARs or something like that. I saw something to that effect in the comments on a torrent a while ago. I think the reason is that things in the "scene" get distributed in ways other than BitTorrent, so breaking them up into pieces makes sense there. I'm still not entirely sure what the "scene" entails, or how they differ from the people who put the torrents up, so I don't know the whole answer to that.
  • Re:Nice (Score:2, Interesting)

    by Jarik_Tentsu ( 1065748 ) on Monday May 05, 2008 @01:15AM (#23297476)
    Mmm, in my experience, Firefox's Download Manager occasionally leaves me with incompletely downloaded files, especially when they're big. Dunno whether this is a bad connection (Telstra, I wouldn't be surprised) or an issue with the actual Download Manager, but I don't get these issues when using Free Download Manager.

    Anyway, I've done this before for a different thing.

    There was a rare, fairly large file I was trying to get my hands on, but the copy I could get was corrupted. There was a torrent which had it too, but it was giving really slow speeds (like... 1-2 seeders, 3-4 leechers who must've been on dial-up or Telstra broadband...). So I downloaded the corrupt file over HTTP, then used the torrent to fix the remaining corrupted parts. Worked perfectly. =)

    ~Jarik
  • by rdebath ( 884132 ) on Monday May 05, 2008 @01:29AM (#23297536)
    Transparent proxies also kill large downloads, especially when the browser is not IE. I hear "not IE" also includes IE7!
  • by Anonymous Coward on Monday May 05, 2008 @02:18AM (#23297706)
    A few years ago, I wrote some software for patching and updating a rather large software installation on multiple clients. Even on a LAN, we saw approximately 1 bit error for every 4GB of data transferred over raw TCP/IP. Errors happen.

    True reliability over TCP requires strong checksums on top of the weak error detection provided by the protocol. The bottom line is that HTTP and FTP aren't really suited for transferring more than a few megabytes of data without assistance.

    But there is a standard for solving this problem: Metalinker [metalinker.org]. It defines an XML-based format for block checksums and multiple sources for a file. Now if only people would actually use it to distribute all their large files...
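For a sense of what such a file contains, here is a minimal generator written against the later IETF Metalink 4 format (RFC 5854), whose `file`, `size`, `hash`, `pieces` and `url` elements carry the block checksums and mirror list described above. The 2008-era Metalink 3.0 schema from metalinker.org used a different element layout, and the file name and mirror URLs below are placeholders.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:metalink"   # Metalink 4 namespace (RFC 5854)


def build_metalink(name: str, data: bytes, urls: list[str], piece_length: int = 1 << 20) -> str:
    """Describe one file: size, whole-file hash, per-piece hashes and mirror URLs."""
    root = ET.Element("metalink", xmlns=NS)
    f = ET.SubElement(root, "file", name=name)
    ET.SubElement(f, "size").text = str(len(data))
    ET.SubElement(f, "hash", type="sha-256").text = hashlib.sha256(data).hexdigest()
    pieces = ET.SubElement(f, "pieces", length=str(piece_length), type="sha-256")
    for i in range(0, len(data), piece_length):
        ET.SubElement(pieces, "hash").text = hashlib.sha256(data[i:i + piece_length]).hexdigest()
    for priority, url in enumerate(urls, start=1):   # lower number = preferred mirror
        ET.SubElement(f, "url", priority=str(priority)).text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_metalink("sdk.dmg", b"example bytes" * 1000,
                         ["http://mirror1.example.com/sdk.dmg",
                          "http://mirror2.example.com/sdk.dmg"],
                         piece_length=4096))
```

A client reading this can verify each piece independently and fetch a bad one from another mirror, which is the same repair the article gets out of BitTorrent.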
  • Re:!new (Score:1, Interesting)

    by Anonymous Coward on Monday May 05, 2008 @02:28AM (#23297732)
    Nobody follows the "scene releases are never compressed" motto as assiduously as you believe.

    People are welcome to "follow the scene" as long as they play on FTP, where using multi-part files is actually useful and efficient. Once they are on a torrent, the uploader can bloody well unrar, check the quality and then seed it. Other users who want to contribute to the torrent can _also_ unrar prior to seeding.

    This way,

    1. I don't have to keep a crappy duplicate rar-ed release for seeding (which I seldom do, i.e., I stop seeding once I unrar) alongside an unrar-ed version for my viewing.

    2. The uploader will actually check the damn files before uploading. Have you noticed how many torrents have corrupt RARs? Or are uploaded without the subs?

    3. Some people argue that if a RAR is corrupt in a release, they can just create another torrent with a "fix" for it containing the offending RAR alone. While this might sound dandy, if you had unrar-ed in the first place, this would have never occurred.

    4. And what the hell is with this anachronistic bullshit of splitting AVI files? Hardly anybody uses 700MB physical media nowadays. Yet, the tards continue to split the AVIs because "that's the way it's always been".

    5. While we are at it, what is with people rar-ing shit twice? This is usually done for sub files.

    It's completely maddening for large releases like DVD / BR.
  • Re:!new (Score:3, Interesting)

    by totally bogus dude ( 1040246 ) on Monday May 05, 2008 @09:26AM (#23299698)

    I actually look for some "group names" in the torrents I get - because they provide one file, not a RAR. In other words, provide what people want, and they will respect you for that. Make their life hard, and they will not care about your 1998 social customs. Like anything else in life.

    Firstly, if you use torrents, then nobody in the "Scene" gives a flying toss about whether you respect them or not. I have nothing to do with the Scene, and even I know that. They are not ripping things for us, they're ripping things for themselves. We're feeding from their scraps, if you like.

    Once you understand that, all the other arguments become moot. Yes, multi-part RARs in torrents annoys me as well, but the people making them aren't doing it for us. Most (all?) Scene members would much prefer their releases never ever made it onto BT or USENET. Telling them that you disapprove of their distribution practices is, well, hilarious. Like a bank robber telling the cops he disapproves of their regular patrols of the street with all the banks on it. Actually, it's more like a bank robber in the US complaining about a pre-school teacher in Japan because he doesn't like the colour of the crayons they use. Thanks for the input, but who asked you, anyway?

    So you're left trying to convince the people who do upload to more public services to unrar before they upload. More power to you, and I wish you luck. But I think the mob has largely spoken on this matter, and the mob says: "I don't give a crap if I have to unrar it first, so long as it's a) complete and b) a fast download". The torrents with multi-part archives tend to be seeded better than those which contain the extracted file, and therefore more people download the multi-part; which results in more seeds on it, resulting in more people downloading it...

    As for using BT in the Scene -- it's up to them, it's their resources and they can do what they want with them -- so the following is purely mental masturbation. I would think BT would make it harder to keep "safe" and maybe easier to infiltrate. Password-protecting the servers (assuming most BT clients and trackers even support such) is probably insufficient; you'd likely want a local firewall to ensure only other Scene members can connect to your client. Keeping such a list updated in a secure manner would be somewhat tricky, I think, and telling everyone else the IP address of every other member sounds like a no-go.
