Use BitTorrent To Verify, Clean Up Files
jweatherley writes "I found a new (for me at least) use for BitTorrent. I had been trying to download beta 4 of the iPhone SDK for the last few days. First I downloaded the 1.5GB file from Apple's site. The download completed, but the disk image would not verify. I tried to install it anyway, but it fell over on the gcc4.2 package. Many things are cheap in India, but bandwidth is not one of them. I can't just download files > 1GB without worrying about reaching my monthly cap, and there are Doctor Who episodes to be watched. Fortunately we have uncapped hours in the night, so I downloaded it again. md5sum confirmed that the disk image differed from the previous one, but it still wouldn't verify, and fell over on gcc4.2 once more. Damn." That's not the end of the story, though — read on for a quick description of how BitTorrent saved the day in jweatherley's case.
jweatherley continues: "I wasn't having much success with Apple, so I headed off to the resurgent Demonoid. Sure enough they had a torrent of the SDK. I was going to set it up to download during the uncapped night hours, but then I had an idea. BitTorrent would be able to identify the bad chunks in the disk image I had downloaded from Apple, so I replaced the placeholder file that Azureus had created with a corrupt SDK disk image, and then reimported the torrent file. Sure enough it checked the file and declared it 99.7% complete. A few minutes later I had a valid disk image and installed the SDK. Verification and repair of corrupt files is a new use of BitTorrent for me; I thought I would share a useful way of repairing large, corrupt, but widely available, files."
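The recheck jweatherley relied on is easy to reproduce outside a client. A rough, hypothetical sketch in Python (single-file torrents only, with a minimal bencode decoder; real clients handle multi-file layouts and many edge cases):

```python
import hashlib

def bdecode(data, i=0):
    """Minimal bencode decoder: returns (value, next_index)."""
    c = data[i:i + 1]
    if c == b'i':                       # integer: i<digits>e
        j = data.index(b'e', i)
        return int(data[i + 1:j]), j + 1
    if c == b'l':                       # list: l<items>e
        out, i = [], i + 1
        while data[i:i + 1] != b'e':
            v, i = bdecode(data, i)
            out.append(v)
        return out, i + 1
    if c == b'd':                       # dict: d<key><value>...e
        out, i = {}, i + 1
        while data[i:i + 1] != b'e':
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    j = data.index(b':', i)             # string: <length>:<bytes>
    n = int(data[i:j])
    return data[j + 1:j + 1 + n], j + 1 + n

def check_pieces(torrent_path, file_path):
    """Yield (piece_index, ok) for a single-file torrent, like a client's recheck."""
    with open(torrent_path, 'rb') as f:
        meta, _ = bdecode(f.read())
    info = meta[b'info']
    piece_len = info[b'piece length']
    hashes = info[b'pieces']            # concatenated 20-byte SHA-1 digests
    with open(file_path, 'rb') as f:
        for idx in range(len(hashes) // 20):
            chunk = f.read(piece_len)
            want = hashes[idx * 20:(idx + 1) * 20]
            yield idx, hashlib.sha1(chunk).digest() == want
```

Pieces that fail the hash are exactly what the client re-downloads - about 0.3% of the file in the story above.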
Nice (Score:5, Interesting)
Re: (Score:2, Interesting)
Anyways, I've done this before for a different thing.
There was a rare file I was trying to get my hands on, which was fairly large, but corrupted. There was a torrent which had it too, but
Re:Nice (Score:4, Insightful)
Just ship everything with a
(Wow, all the authorities we could annoy with one minor change!)
Re:Nice (Score:5, Interesting)
Done this with RAR archived stuff as well. (Multipart rars on torrents are retarded, but that's another issue entirely.)
Re:Nice (Score:5, Funny)
Re: (Score:2, Funny)
Re: (Score:3, Insightful)
The first rule (Score:5, Informative)
Re:The first rule (Score:5, Funny)
Re: (Score:2)
Of course, it's not exactly the most time-effective way to "download" a file, but I've always wondered why I can't
Re:Nice (Score:4, Informative)
Besides that there is the information theory problem too. If the hash is 128 bits long then every 2^128th file will have the same hash. This might seem unlikely if you only compare a few files (such as all the files ever created by man) but compared to the 2^8000000 hashes we were going to calculate it is actually quite substantial.
Re: (Score:3, Funny)
It's ok, I just edited wikipedia to make the age of the universe 10^2299991.
Oh, nearly forgot, ~~~~
Re: (Score:2, Informative)
We still use them, on usenet anyways.
Re: (Score:2, Informative)
For example, if you are missing a total of 3 blocks (one block from 3 different files) you only need to download a
Re: (Score:2, Insightful)
Re:Nice (Score:4, Interesting)
Re: (Score:3, Informative)
With multipart binary posts, a single file is split up across many posts - between fifteen and fifty, say. It's common for usenet providers to not receive all the posts, so folks are sometimes left with incomplete/corrupt files. Enter the small, spanned archive formats. It's quite common to see up to 10% parity per usenet posting, especially for large files. Small split set
Re:Nice (Score:5, Interesting)
I do not know what the GP meant precisely, but I had a similar experience.
Some game (a very old RPG) was available on Overlord and on BitTorrent, and is not sold anymore. The problem was that the torrent had only a single seed with minuscule upload speed - in several days I had downloaded only a few megs. I then tried Overlord, and in a few days I had the game almost complete - but another snag hit me: whether by mistake or intentionally, the file was poisoned and three parts couldn't be downloaded. I was ready to throw everything away - antique games interest me little (but a friend had recommended it as a milestone RPG I had to play). Then suddenly I was enlightened: I fed the incomplete ISO of the game to BitTorrent. The BT client happily announced something like 98% of the file complete, and in less than one night it downloaded the rest of the file.
Re:Nice (Score:5, Insightful)
Re:Nice (Score:5, Informative)
Ok, you load torrentB in your favorite BitTorrent client, and start it up. It will automatically create 0-sized files with the names in filesetB (at least, all clients I know do that). Stop the transfer of torrentB, and substitute the 0-sized files in filesetB with the corresponding files in filesetA (this may require some renaming). As you restart torrentB, your BitTorrent client will recheck the whole filesetB, keeping the valid parts in order to avoid downloading them. Voilà! You have migrated files from one torrent to another.
Note: You should make sure that the files you are substituting in are the same files you want to download through torrentB or, at least, keep a copy around until you see that the restart check accepts most of their contents.
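The substitution step described above can be scripted. A hedged sketch (the `seed_from_existing` helper and its directory-walking approach are hypothetical; real-world use also needs the renaming the parent mentions):

```python
import shutil
from pathlib import Path

def seed_from_existing(old_dir, new_dir):
    """Overwrite the client's placeholder files in new_dir with any
    same-named files from an earlier download in old_dir; the client's
    recheck then keeps whichever pieces still hash correctly."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    copied = []
    for placeholder in new_dir.rglob('*'):
        if placeholder.is_dir():
            continue
        candidate = old_dir / placeholder.relative_to(new_dir)
        if candidate.is_file():
            shutil.copyfile(candidate, placeholder)
            copied.append(placeholder)
    return copied
```

As the note below says, keep the originals around until the recheck accepts most of their contents.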
Re: (Score:2)
Re: (Score:2)
Talking more seriously, fragmenting isn't usually an issue if you don't use FATxx.
Re: (Score:2)
Re:Nice (Score:5, Insightful)
To give you an extreme example, imagine a 100 GB volume which has no files. You create a 1 MB file, and your filesystem places it near the top. Now you create a second file, and your filesystem places it... well, it could place it anywhere except that first 1 MB, so let's say it places it right next to the first file. Uh oh, it turns out that you need to write 1 GB of data to that first file and extend it. Now you have two fragments.
Ok, let's assume our file system is magical and knows that you like to extend files to huge sizes. So it places the second file at the end of the disk, instead. Oops, you fooled your file system: this time, you wanted to extend the second file by 1 GB. There is no room to append to the end of the file, so a second extent is created somewhere else and linked to the second file. You have two fragments again.
This is why performance tuning requires that you anticipate data requirements and allocate space accordingly; for example, by setting the initial size of database files to one that should reasonably accommodate the data requirements for the foreseeable future (and not automatically shrinking the database down when records are deleted).
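One way to apply that advice is to reserve the file's final size before writing. A sketch (assumes a POSIX system for the fallocate path; elsewhere it falls back to a plain truncate, which only fixes the logical size):

```python
import os

def preallocate(path, size):
    """Reserve a file's final size up front, so the filesystem can pick
    one contiguous region instead of growing the file piecemeal."""
    with open(path, 'wb') as f:
        try:
            # Actually allocates blocks on POSIX systems.
            os.posix_fallocate(f.fileno(), 0, size)
        except (AttributeError, OSError):
            # Fallback: sets the logical size only (may be sparse).
            f.truncate(size)
    return os.path.getsize(path)
```

Many BitTorrent clients offer exactly this as a "pre-allocate disk space" option.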
Good for seeding stuck torrents, too (Score:5, Interesting)
Re: (Score:2)
You really need a girlfriend or some hobbies or something...
Anonymous Coward (Score:3, Informative)
It's like getting parity files over on usenet to fix that damned
Scheduling (Score:4, Informative)
Re: (Score:3, Informative)
Re: (Score:2)
!new (Score:2, Insightful)
Re: (Score:3, Insightful)
So, you'd use Jigdo, and if all went well, it'd assemble a working image. But if a few packages couldn't be downloaded, you could always take your mostly-complete Jig
Re:!new (Score:5, Informative)
First of all, scene releases are _never_ compressed; the rar is always created with the -m0 (store) argument, which makes it basically equivalent to the unix split program. If a file is to be compressed, it is done with a zip archive, and the zip archive is placed inside the rar archive. This is because rar archives can be created/extracted easily with FOSS software, but cannot easily be de/compressed. This was more of an issue before Alexander Roshal released source code (note: not FOSS) to decompress rar archives.
Second, people often have parts of, or complete, scene releases and they are unwilling to unrar them (often because it's an intermediary, like a shell account somewhere where law isn't a problem).
Third, people follow "the scene" and try and download the exact releases that are chosen by the social customs of the scene (I am not going to detail those here), thus, "breaking up" (ie, altering) the original scene release is seen as rude.
Fourth, the archives are split at precise sizes so that fitting them onto physical media works better; typically the split size divides evenly into roughly 698, ~4698, or ~8500 MB (CD, single-layer DVD, and dual-layer DVD, respectively).
Fifth, archives are split due to poor data integrity on some transfer protocols (though this is largely historical nowadays); redownloading a corrupted 14.3 MB archive is easier than redownloading a 350 MB file.
Sixth, traffic at this scale is measured in terabytes, with some releases being tens, or sometimes hundreds, of gigabytes in size. Thus, there are efficiency arguments for archive splitting: effective use of connections, limited efficiency of software (sftp scales remarkably poorly, though that is beginning to change - not that sftp is used everywhere), use of multiple coordinated machines, and so on. This is an incomplete list of reasons; it is almost as though every time a new challenge is presented to the scene, splitting in some way helps to solve it.
AC because I'm not stupid enough to expose my knowledge of this either to law enforcement, or to the scene (who might just hand me over for telling you this - it has been done). Suffice to say that this is more complex than you understand, and that even this level of incomplete explanation is rare.
Re: (Score:2)
Re:!new (Score:5, Informative)
Re: (Score:3, Interesting)
I actually look for some "group names" in the torrents I get - because they provide one file, not a RAR. In other words, provide what people want, and they will respect you for that. Make their life hard, and they will not care about your 1998 social customs. Like anything else in life.
Firstly, if you use torrents then nobody in the "Scene" gives a flying toss about whether you respect them or not. I have nothing to do with the Scene, and even I know that. They are not ripping things for us, they're ripping things for themselves. We're feeding from their scraps, if you like.
Once you understand that, all the other arguments become moot. Yes, multi-part RARs in torrents annoys me as well, but the people making them aren't doing it for us. Most (all?) Scene members would much prefer thei
Re: (Score:2)
Any MD5s on Apple's page? (Score:4, Interesting)
Re: (Score:2)
Re: (Score:3, Informative)
Hardware Failure is your bigger concern (Score:5, Interesting)
I'd say it's a safe bet that the files from apple.com are in perfect condition.
Which means the file became corrupted either in transit to, or on arrival at, your machine.
Which raises the question: is your memory defective?
Run memtest86 to check your memory.
http://www.memtest86.com/ [memtest86.com]
Check if your hard drives have SMART and are reporting anything. A disk checker would also be a good idea.
The other idea that springs to mind is that you're behind some proxy with the above problems, although I doubt anyone would want to proxy a 1.5 GB file.
Fact is, if files are being corrupted on your disk, it's just a matter of time before something more important is hit by corruption.
Re:Hardware Failure is your bigger concern (Score:5, Interesting)
There was a problem w/ D-Link routers back in the day that hit a lot of P2P users. If you placed your machine in the DMZ, the router basically did a search and replace on all packets, replacing the bit string representing the global address w/ the bit string representing the local address. On large files, this didn't just hit the IP header, but the data as well, corrupting it. If you didn't use the DMZ functionality, just port mapping, it worked fine. So if you were using BitTorrent, you'd get repeated hash fails on some parts that would never fix themselves, because BitTorrent has no capability to work around that (as opposed to eMule's extensions).
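The firmware bug described amounts to a blind byte-level search and replace. A toy illustration (the addresses are documentation/RFC 1918 examples, not anything specific to D-Link):

```python
import socket

def naive_nat_rewrite(packet, public_ip, private_ip):
    """What the buggy firmware effectively did: blindly replace every
    occurrence of the address bytes - in headers AND payload alike."""
    return packet.replace(socket.inet_aton(public_ip),
                          socket.inet_aton(private_ip))

# Any payload that happens to contain the public address's four bytes
# gets silently corrupted, so the BitTorrent piece hash keeps failing.
```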
Re: (Score:2)
Re:Hardware Failure is your bigger concern (Score:4, Informative)
While I agree that bad ram is most likely the issue, it's still possible bad ram in a router or even something goofy going on in a router, such as the firmware bug described, could have caused problems. The bits were mangled before they were written to the disk. They could have been mangled by anything that processed those bits as they traversed from apple's website to his HD, including Apple's website and the HD itself. That embedded devices tend to be more reliable does not mean they don't break and do weird things sometimes.
Re: (Score:2)
Re: (Score:2)
No such problems on the Tomato firmware I'm using now
Re:Hardware Failure is your bigger concern (Score:5, Interesting)
IIRC TCP/IP only guarantees a maximum undetected error rate of something like 10^-5 per bit. Well, the thing is, 1.5 gigabytes is over 10^10 bits. So even at that error rate, there is no guarantee your file will arrive without bit errors.
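Taking the parent's 10^-5 figure at face value (real residual error rates on TCP are far lower in practice), the arithmetic does come out bleak:

```python
import math

p = 1e-5                 # parent's assumed per-bit undetected-error rate
n = int(1.5 * 8e9)       # bits in a 1.5 GB file

expected_errors = n * p  # ~120,000 flipped bits on average
# P(the whole file arrives clean), assuming independent bit errors:
prob_clean = math.exp(n * math.log1p(-p))   # underflows to 0.0
```

At that error rate a clean 1.5 GB transfer would be essentially impossible, which is a hint the true rate is much lower - but not zero, which is the parent's point.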
Re: (Score:2)
Re: (Score:2)
It's unlikely that TCP is the culprit here, but with a file that big, there are many, many places where things could go wrong, and a single bit error is all it takes to mess things up in a compressed file.
I would bet on the HTTP client/server software being involved.
Re: (Score:2)
I'd say its a safe bet that the files from apple.com are in perfect condition.
Which means it either became corrupted in transit to, or on arrival to your machine.
Which leads the question:
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The point of my post was to remind people that Comcast in the USA has been inserting reset commands into BitTorrent (as well as other P2P protocols) as a way of "managing the network" (whatever the hell that means).
If the seeders were on Comcast, some of the problem getting complete downloads could be due to that practice.
Of course Comcast both 1)denies they were doing this and 2)has promised to stop.
Re: (Score:2)
I generally suspect malware on their clients, but I don't know for sure and it has long baffled me, because it is not rare at all. Something like 40% or so. Surely the malware problem is not so bad that 40% of net users can't download a 130 MB file via http without corruption?
Re: (Score:2)
Re: (Score:2)
Been using bittorrent and rsync for this for years (Score:5, Informative)
Especially with our slow links, or worse yet, on dialup (if I go enough years back) in Australia.
Before bittorrent I would use rsync. That required me to download the large file to a server in the US on a fast connection, then rsync my local copy against the server's copy to fix whatever was corrupt in my copy.
It works beautifully.
Re:Been using bittorrent and rsync for this for ye (Score:2)
Good for game files too (Score:5, Interesting)
When a large number of users are having problems downloading or resuming a particular file, I simply create a torrent for them and give them some vague instructions about how to resume it, and then generally I never hear from them again. They're happy because they don't have to download a 4 GB game client again from scratch, they don't have to worry about resuming/corrupt downloads, and because it's a torrent it probably feels like they're getting something for free that they shouldn't be.
Or synchronize with yourself... (Score:5, Interesting)
I used Azureus's internal tracker ability and two computers on a local network with the torrent modified to track on one of the machines, and one corrupted copy of the file on each.
Obviously only works if they don't have corruption in common, but it also doesn't require the original torrent file tracker to work anymore.
Re: (Score:2)
Re: (Score:2)
For even more fun, if you have two differently-corrupted copies of a file and a torrent to go with it, then you can have BitTorrent stitch them together into a valid file without involving any third parties.
It would be cool if someone built a small utility to do just that, built off of something like cfv [sourceforge.net], which only does torrent (+sfv,crc,csv,md5,etc.) verification.
.par/.par2 files, but it would be nice to have a tool for torrent repairing that works as well as something like QuickPar [quickpar.org.uk] does for newsgroup files.
Torrents are really just fancy networked
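Such a stitching utility is only a few lines if you already have the piece hashes (e.g. extracted from the .torrent). A hypothetical sketch, assuming equal-size pieces known in advance:

```python
import hashlib

def stitch(copy_a, copy_b, piece_hashes, piece_len, out_path):
    """For each piece, keep whichever copy matches the known SHA-1."""
    with open(copy_a, 'rb') as fa, open(copy_b, 'rb') as fb, \
         open(out_path, 'wb') as out:
        for want in piece_hashes:
            a, b = fa.read(piece_len), fb.read(piece_len)
            if hashlib.sha1(a).digest() == want:
                out.write(a)
            elif hashlib.sha1(b).digest() == want:
                out.write(b)
            else:
                raise ValueError('piece corrupt in both copies')
```

This succeeds whenever the two copies are not corrupted in the same piece - exactly the condition the parent mentions.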
What a novel idea!!! (Score:3, Interesting)
I'm not a lawyer, though. I just hope it doesn't violate Apple's NDA. Please please please follow the rules. I don't want to see you in prison or slapped with a large fine.
BitTorrent has received a bad reputation because of pirates. There are legitimate uses, though. I do believe that Doctor Who episodes aren't public domain, so shame on you for that. Might want to be careful what you admit to on
Re: (Score:2)
Most of my BitTorrenting is "100% legal" (Linux distros, UBCD, publicly released media (new NIN, Star Trek fan films,) etc.) Those I make sure to seed to at least a 2:1 ratio, often more. But some of it is to download items that I have the legal right to have, but do not hav
Re: (Score:2)
And 50% of the time it works every time.
simpler home-brew technique (Score:2)
Re: (Score:3, Insightful)
Isn't that almost exactly how rsync works?
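Close - rsync adds a rolling weak checksum so it can cheaply test a block hash at every byte offset, then confirms candidate matches with a strong hash. A simplified sketch of the rolling part (real rsync pairs this with MD4/MD5 confirmation and packs both components into one 32-bit value):

```python
M = 1 << 16  # per-component modulus

def weak_checksum(block):
    """Weak rolling checksum: a = byte sum, b = position-weighted sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte forward without rescanning it."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b
```

The `roll` update is O(1) per offset, which is what makes scanning a multi-gigabyte file for matching blocks practical.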
I hope you verified the file (Score:2)
What asshole tagged this '!news'? (Score:2)
On a related note, I came up with a roundabout way to do something similar to help a friend who was having trouble moving large files. On the remote end, split [hmug.org] the file into small chunks. Then md5 [hmug.org] them all and save those results into a text file. Then, ftp them, and when they arrive, md5 them all again and compare your values to what's in the text file. If any don't match, re-downl
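That split/md5/compare loop can also be done without physically splitting the file, by hashing fixed-size chunks in place. A hypothetical sketch:

```python
import hashlib

def chunk_md5s(path, chunk_size=1 << 20):
    """MD5 of each fixed-size chunk of a file."""
    sums = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sums.append(hashlib.md5(chunk).hexdigest())
    return sums

def bad_chunks(local_sums, remote_sums):
    """Indices whose checksums disagree - the only chunks to re-transfer."""
    return [i for i, (a, b) in enumerate(zip(local_sums, remote_sums))
            if a != b]
```

Run it on both ends, compare the lists, and re-fetch only the chunks that differ.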
Re: (Score:2, Insightful)
The subsequent discussion has revealed that a large chunk of the slashdot population not only doesn't understand how BitTorrent works but does
Torrent Distribution Network - Results: Awesome (Score:4, Interesting)
I was being required to copy sometimes 10-20 GB of virtual machine image files from server to PC, or PC to PC, on up to 40 machines at one time.
This was taking way too long and copies were not perfect.
Restoration of VM images presented the same problem.
Updating a VM meant redistribution of the entire file to all machines again.
Using (Micro) Torrent and my own tracker changed all that.
I came up with the following solution using all available resources.
First I started by copying all images to a separate partition on the workstations (about 200 GB of VMs).
Then I created my own internal tracker and web page to host torrents.
The results were:
1. Extremely efficient use of all available network hard drive space.
2. Utilizes every machine on the network to distribute the files.
3. Works extremely well restoring or redistributing the VM's to any one machine or several machines at once. (The more the better)
4. 100% accuracy in distribution.
5. The ability to quickly modify any one image on any machine, recreate the torrent(hash) and then update that image across hundreds of machines very quickly.
In other words, modifying a file means that the machines only have to download the pieces that changed, not the whole image again.
6. With Micro Torrent any machine can be used as the tracker.
7. The tracker is also the "master" file server; however, any machine can be used to modify and upload a change.
Just recreate and re-upload the new torrent replacing the old one. Remember that a torrent file serving network is Not a server centric file sharing system.
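The "only the bits that changed" behavior works because the client compares piece hashes on recheck. A hypothetical sketch of that comparison between two versions of an image:

```python
import hashlib

def changed_pieces(old_path, new_path, piece_len=4 << 20):
    """Indices of pieces whose SHA-1 differs between two image versions;
    after the torrent is recreated, a recheck re-downloads only these."""
    changed = []
    with open(old_path, 'rb') as fo, open(new_path, 'rb') as fn:
        idx = 0
        while True:
            a, b = fo.read(piece_len), fn.read(piece_len)
            if not a and not b:
                break
            if hashlib.sha1(a).digest() != hashlib.sha1(b).digest():
                changed.append(idx)
            idx += 1
    return changed
```

Note the caveat: an insertion or deletion early in the image shifts every later piece boundary and defeats this, so it works best for in-place modifications.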
In the old days before BitTorrent (Score:2)
Nothing upset me more than downloading an ISO only to find out that after I burned it to CD/DVD, it had CRC errors and random lockups during an install.
After BitTorrent, with its piece-level error checking, the problem was solved. It works for other things as well.
Commercial software companies can offer ISO downloads via BitTorrent trackers and send the install CD Key via email. That way customers just burn the CD/DVD and install the key they got in email.
S
Re: (Score:2)
Nope, better. Kicking the tires and test-driving the car add tiny amounts of wear and tear.
Fix up jigdo file (Score:2)
Re: (Score:2)
How does this sit with RIAA? (Score:2)
Unless I am mistaken, it is perfectly legal to make a backup of data that you own, right? So, if you already own an item, would downloading it to have a backup be a legal thing to do?
And if that's the case, I wonder what the legal implications are in cases where the RIAA comes down on people who have been "participating in file sharing" activities.
Another nice tool for this: rsync (Score:2)
Assuming you can find a source that serves a known-good file via rsync, it's a very efficient way to fix up a damaged copy.
I once had to download a CD image over a dialup connection when I was at a client site in Mexico. I did the initial download via FTP, but it got corrupted and the MD5 sum didn't match the correct value. It had taken almost two full days to download the first time (over a weekend, so shipping a CD wouldn't have been faster), but rsync was able to find and correct the corrupted sectio
Other applications...politics? (Score:2)
Re:What broken software were you using? (Score:5, Insightful)
Re:What broken software were you using? (Score:4, Insightful)
Re: (Score:3, Informative)
Re:What broken software were you using? (Score:4, Informative)
Oh, and TCP checksumming isn't perfect.
Re:What broken software were you using? (Score:5, Informative)
I actually saw this happen once ... the astronomically unlikely [1] happened. TCP accepted the corrupt packet. I'm sure it will never happen again. Fortunately, rsync caught it in the next run.
One problem I ran into once with a certain Intel NIC was that a certain data pattern was always being corrupted. TCP always caught it and dropped the packet. There was no progress beyond that point, because the hardware defect always corrupted that data pattern. It turns out there was a run of zeros followed by a certain data byte (I tried a different data byte and different run lengths, and those never got corrupted). What the NIC did was drop 4 bytes and put 4 bytes of garbage at the end. I suspect it was a clock synchronization error. I got around the problem by adding the -z option to rsync (which I normally would not have done with an ISO of mostly compressed files). Another way would have been to do the rsync through ssh, either as a session agent (like rsync itself can do) or as a forwarded port (how I do it now for a lot of things).
[1] ... approximately 1 in 2^31-1 chance that the TCP checksum will happen to match when the data is wrong (variance depending on what causes the error in the first place) ... which approaches astronomically unlikely. Take 1 Terabyte of random bits. Calculate the CRC-32 checksum for each 256 byte block. Sort all these checksums. You will find 2 (or more) data blocks with the same checksum (or a repeating pattern in your RNG). Why? Because CRC-32 has 2^32-1 possible states, and you have 2^32 random checksums.
Agreed. Since it is at the very least the software's responsibility to detect and fix it, if the problem happens, the famous finger of fault points at the software.
Your $100 is safe.
Chance of CRC clashes is much higher (Score:2, Informative)
First, as rdebath argues, you only get 16 bits of CRC on TCP headers.
And furthermore, if you start calculating CRCs of random data, the chances of a collision (two chunks of data with the same CRC) pass 50% after roughly 300 tries for a 16-bit checksum - on the order of the square root of 65536 (this is the "birthday paradox" from cryptography). Of course, to be absolutely guaranteed a collision you would need 65537 values; but you reach a very high probability of a clash much sooner than intuition suggests.
See birthday attack [wikipedia.org] for the
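The product form of the birthday bound is short enough to compute directly; a sketch (this is the exact formula, not the common e^(-n²/2m) approximation):

```python
import math

def collision_prob(n_values, space):
    """Exact P(at least one collision) among n_values uniform draws
    from a space of `space` distinct checksum values."""
    log_no_collision = sum(math.log1p(-k / space) for k in range(n_values))
    return 1.0 - math.exp(log_no_collision)
```

For the classic birthday case (23 people, 365 days) this gives just over 50%, and for a 16-bit checksum the 50% mark lands near 300 draws.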
Re: (Score:2, Interesting)
Re:What broken software were you using? (Score:5, Interesting)
You'd be shocked - SHOCKED - at how much data gets corrupted routinely - by errant antivirus software, flaky network equipment, plain ol' line noise that the checksums don't detect (which will happen much more often than you expect; see also the birthday paradox), or misbehaving routers which think that any occurrence of 0xC0A80102 obviously must be an internal IP address and needs to be changed to your external one. Even if that's in the middle of a ZIP file. Oops.
Encryption actually aids this somewhat, as the same byte patterns don't get repeated, so if there's an errant IDS changing things for example, it tends not to fire the second time.
I've done this before for file repairs. Works a treat, but you sort of wish that BitTorrent used a Merkle hash tree such as the modified THEX standard Tiger Tree Hash. SHA-1's so last century.
Re:What broken software were you using? (Score:5, Informative)
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:2)
http://techreport.com/discussions.x/9483 [techreport.com]
Re: (Score:2)
Good motherboards, bad motherboard drivers.
Re:What broken software were you using? (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
They could be the first series to kill off all but one of their franchise characters and several of the support crew
Re: (Score:3, Informative)
Re: (Score:2)
Obviously you didn't see the finale of series 2. Two of your wishes were fulfilled.
Torchwood is pretty silly (es
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)