I’ve decided that it’s about time to carve my laptop drive in half and put Fedora Core 3 on there. There are just some things you can’t do with cygwin.
So, I go to the Red Hat site, locate a mirror, and start my download.
36 hours later, I have this 2.5 GB DVD ISO image that I’m all ready to burn. But, being a semi-paranoid security type, I have to first verify the MD5 hash. I jump through all the hoops (get the Fedora release key, verify the fingerprint with multiple sources, validate the MD5SUMS file)… and then:
$ md5sum FC3-i386-DVD.iso
c9620407bbfb0cc2d1844be427da1a29 *FC3-i386-DVD.iso
What? That’s not right. The published MD5 sum for this file is:
ca49964739f84848ca78fc03662272fb FC3-i386-DVD.iso
Okay, so I got a corrupted copy. Maybe it has to do with wget in some way. I’ll try a different approach: bittorrent.
38 hours later:
$ md5sum FC3-i386-DVD.iso
d76a770423958932da8929174a9891c8 *FC3-i386-DVD.iso
What?
So, now I have two different 2.5 GB files, both of which fail the MD5 test. (As an aside, I have to point out that I’m highly disturbed that I got different results from these two different methods of acquiring the image).
They have to both be mostly right, right? So, all I should have to do is figure out what’s different between the two, get ahold of the correct values for those bytes that differ, and patch one of them to be correct.
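The last step of that plan, overwriting a handful of bytes at a known offset, is the easy part. Here is a minimal Python sketch of it. The function name `patch_bytes` is made up for illustration, and it assumes the correct bytes have already been obtained from somewhere trustworthy.

```python
import os


def patch_bytes(path, offset, replacement):
    """Overwrite len(replacement) bytes of the file at the given offset, in place."""
    size = os.path.getsize(path)
    if offset < 0 or offset + len(replacement) > size:
        # Refuse to grow the file; a patch should only touch existing bytes.
        raise ValueError("patch would fall outside the existing file")
    with open(path, "r+b") as f:  # r+b: read/write without truncating
        f.seek(offset)
        f.write(replacement)
```

Working on a copy of the image is the safer choice, since the write happens in place and there is no undo.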
So, I hack diff to point out exactly what differs between two binary files. It turns out that there are only four bytes different between the files. At offset 634999400, the one that I downloaded from a mirror (Georgia Tech, to be specific) looks like:
9e f7 87 23 3b 90 c0 a8 00 6c 7e 0e 63 62 1c a6
Meanwhile, the bittorrent version looks like:
9e f7 87 23 3b 90 d1 1e 21 0d 7e 0e 63 62 1c a6
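For anyone who would rather not hack diff, a byte-by-byte comparison is short enough to write from scratch. This is a rough Python sketch, not the modified diff used above. The name `diff_offsets` and the 1 MiB chunk size are arbitrary choices, and it assumes both files are the same length.

```python
def diff_offsets(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for each position where the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        base = 0  # file offset of the start of the current chunk
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if len(a) != len(b):
                raise ValueError("files differ in length")
            if a != b:  # cheap whole-chunk check before the per-byte scan
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield base + i, x, y
            base += len(a)
```

Fed the two 16-byte rows shown above, it reports differences at relative positions 6 through 9, which are the four bytes in question.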
Alright, there’s our difference. Let’s hack wget to retrieve user-specified byte ranges, and re-fetch those bytes from Georgia Tech.
9e f7 87 23 3b 90 c0 a8 00 6c 7e 0e 63 62 1c a6
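The same re-fetch can be sketched with an HTTP `Range` header instead of a patched wget. This is a Python illustration under a few assumptions: the server honors range requests, the helper names `build_range_request` and `fetch_range` are invented here, and the URL in the usage note is a placeholder rather than a real mirror path.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at `offset`."""
    last = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={offset}-{last}")
    return req


def fetch_range(url, offset, length):
    """Fetch the requested slice, failing if the server ignores the Range header."""
    with urllib.request.urlopen(build_range_request(url, offset, length)) as resp:
        if resp.status != 206:  # 206 Partial Content means the range was honored
            raise RuntimeError(f"server did not honor the range (status {resp.status})")
        return resp.read()
```

A call such as `fetch_range("http://mirror.example/FC3-i386-DVD.iso", 634999400, 16).hex(" ")` would produce a row in the same format as the dumps above.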
Okay, that’s what it said the first time. So, to the best of my ability to tell, I have an accurate copy of what’s on Georgia Tech’s website.
Let’s try another mirror. This time, Duke University.
9e f7 87 23 3b 90 c0 a8 00 6c 7e 0e 63 62 1c a6
Yep, the official mirrors sure seem to think that’s the right set of bytes for positions 634999400 through 634999415 of the Fedora Core 3 DVD ISO image.
But it still fails the MD5 check. How can that be the case? What are the chances that, aside from the 4 bytes we’ve already identified, the torrent feed gave me exactly the same corruptions as the Georgia Tech HTTP server?
Could there be something wrong with my md5sum program?
I download a non-cygwin build of md5sum.
$ ./md5sum FC3-i386-DVD.iso
c9620407bbfb0cc2d1844be427da1a29 *FC3-i386-DVD.iso
Okay, so let’s try a completely different codebase. I grab digestIT, which provides a braindead Windows interface to generate MD5 hashes.
This is getting old quickly. WinMD5 gives the same result.
Meanwhile, for the past 9 hours, I’ve had my cygwin home directory mounted over on my (old, slow) Linux box, which has been diligently computing the md5sum for me. It’s been between 30 and 70% CPU load the whole time. I don’t know what it’s doing over there, but I really want a second opinion from an independent source. My laptop may actually hate me; I can’t tell.
Is there something obvious I’m overlooking here? Something additional I should poke at? I instrumented a copy of the textutils md5sum program to validate that it was reading all the bytes of the file. I did a walkthrough of the code, side-by-side with RFC1321, and it all looks correct to me. (I’ve written an MD5 implementation before, so I’m rather familiar with where the gotchas might lie).
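The "did it read every byte" check is easy to reproduce outside of textutils. Below is a small Python sketch that uses the standard `hashlib` module, not the instrumented md5sum described above. It hashes the file in chunks and counts the bytes it actually consumed, so the total can be compared with the size the filesystem reports. The name `md5_with_count` is just for illustration.

```python
import hashlib
import os


def md5_with_count(path, chunk_size=1 << 20):
    """Return (hex digest, bytes hashed) for the file at `path`."""
    digest = hashlib.md5()
    total = 0
    with open(path, "rb") as f:  # binary mode, so no newline translation
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


def read_whole_file(path):
    """True if the number of bytes hashed matches the reported file size."""
    _, total = md5_with_count(path)
    return total == os.path.getsize(path)
```

If the count and the reported size disagree, the problem is in how the file is being read rather than in the MD5 arithmetic.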
What are the chances that Red Hat flubbed the MD5 sums for their images, and I’m one of the first people to notice and care? That seems slim. What are the chances that I got a corrupted version of the ISO from Georgia Tech that just happens, with one small exception, to have the same defects as the image I got from bittorrent? That’s also nearly impossible to believe. What are the chances that three independent implementations of an MD5 digest verification program are not just wrong, but wrong in precisely the same way? Well, somewhat greater, but still hard to believe, especially considering that the textutils version of md5sum is almost certainly the same tool that was used to generate the hashes in the first place.
Every theory I can come up with about what might be causing the MD5 check to fail is so outlandish as to be beyond belief. What on earth is going on here?