This is something I posted elsewhere in response to a comment on a discussion about aliens, which mentioned that the distances between stars in the galaxy are very large and difficult to cross.
The real problem is that the distances within a galaxy are too short.
We pretty much know how to travel between stars at 1-10% of the speed of light. There are probably a few centuries of detailed engineering work to get from here to there, but the physics works and doesn't require impossible amounts of energy. At the high end of that scale we can colonise the entire galaxy in a million years; at the low end, in ten million.
So if expansionist technological societies typically evolve more than ten million years apart, the first one will take over the galaxy before the second has a chance to evolve.
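The arithmetic behind those timescales is just distance over speed. A quick sanity check, using the usual rough figure of 100,000 light-years for the galaxy's diameter:

```python
# Quick sanity check of the colonisation timescales above: crossing time
# is just distance over speed. 100,000 light-years is the usual rough
# diameter quoted for the Milky Way.
GALAXY_DIAMETER_LY = 100_000

for fraction_of_c in (0.10, 0.01):
    years = GALAXY_DIAMETER_LY / fraction_of_c
    print(f"At {fraction_of_c:.0%} of c: about {years:,.0f} years to cross the galaxy")

# At 10% of c: about 1,000,000 years to cross the galaxy
# At 1% of c: about 10,000,000 years to cross the galaxy
```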
Someone has to be first, and right now it looks like that's us; if someone else were out there and even a few tens of thousands of years ahead of us, we should be able to see evidence of their existence from here. I can quite imagine that others have evolved and then wiped themselves out, or gone introverted rather than spreading across the galaxy, but that makes them just a footnote in galactic history.
We can see oddities in other galaxies which could be signs of engineering on a massive scale. But intergalactic travel at even 10% of the speed of light would take a very long time: Andromeda, the nearest large galaxy, is about 2.5 million light-years away, so the trip would take some 25 million years at that speed.
After performing a full copy of all the data from the failing disk to the good one, I see I have 33 bad blocks on that disk, so it's worse than I thought. Fortunately they only affect a few game installer files that are out of date anyway, so the important data is safe.
Still, now that I have two RAIDs with one disk each, I'll have to get a new disk in to replace the bad one; hopefully that will solve this problem for a couple of years.
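Once the replacement arrives, getting it back into the degraded array should just be a matter of adding it and letting md rebuild. A hypothetical sketch, with invented device names:

```python
# Hypothetical sketch of swapping the replacement disk into the degraded
# array. /dev/md0 and /dev/sdc1 are invented names; check yours first.
import subprocess

# Add the new partition; md starts rebuilding onto it automatically.
subprocess.run(["mdadm", "/dev/md0", "--add", "/dev/sdc1"], check=True)

# Watch the rebuild progress.
subprocess.run(["cat", "/proc/mdstat"], check=True)
```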
So, MD RAID appears to be fundamentally flawed. If the active drive develops a bad sector and the array has to resync, the resync reads that bad sector and, instead of carrying on with the rest of the disk, it… gives up. It then flags the good disk as a spare and carries on writing to the bad disk.
Now, the most likely reason for a resync is an improper shutdown, where the good disk wasn't fully synced and a few blocks need to be fixed up. The odds are therefore very high that the block on the good disk corresponding to the bad one actually contains the correct data, and simply updating the remaining blocks would fix any problems you have.
But it’s too stupid to do that. My RAID was getting 75% through the resync, hitting a single 4k bad block among 3TB of data and aborting. This largely invalidates the benefit of using RAID in the first place, since the whole idea of doing so is to allow you to remove the bad disk and replace it with a new one.
As a result I've had to remove the good disk from the RAID, since there's no way to resync it successfully. To fix the problem I have to manually copy all the files over from the bad disk to the good one, then replace the bad disk.
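For anyone stuck in the same hole, the manual recovery looks roughly like this. A sketch only: device names and mount points are invented, so check everything against your own setup before running any of it.

```python
# Sketch of the manual recovery described above. /dev/md0, /dev/sdb1,
# /mnt/good and /mnt/bad are invented examples; adapt before use.
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Drop the mis-flagged "spare" (the good disk) out of the degraded array.
run("mdadm", "/dev/md0", "--remove", "/dev/sdb1")

# Give the good disk a fresh standalone filesystem and mount it.
run("mkfs.ext4", "/dev/sdb1")
run("mount", "/dev/sdb1", "/mnt/good")

# Mount the degraded array (still on the bad disk) and copy everything off;
# rsync will report any files it can't read because of the bad blocks.
run("mount", "/dev/md0", "/mnt/bad")
run("rsync", "-a", "--progress", "/mnt/bad/", "/mnt/good/")
```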
I thought I’d try ZFS this time, which has much less brain-dead failure behaviour. Unfortunately I ran into two major problems:
- ZFS in Ubuntu 10.04 doesn’t support disks with 4k sectors. Later versions have an option (ashift) to force 4k-aligned writes, but it’s not in the version of ZFS that 10.04 ships.
- You can’t export ZFS filesystems over NFS in Ubuntu 10.04. The block size is only a small performance hit, but this one is a deal breaker: I share the RAID with the various Linux systems in the house over NFS, so if it can’t be shared it’s useless.
So it’s back to building a new MD RAID and hoping that this one works better than the old one. Hopefully by the time I have to worry about replacing another disk I’ll be able to switch over to ZFS after all.
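For reference, here's a sketch of the setup I'd actually want once a newer ZFS is available. The pool and device names (tank, /dev/sda, /dev/sdb) are invented examples:

```python
# Sketch of the ZFS setup I'd want on a release newer than Ubuntu 10.04's:
# ashift=12 forces 4k-aligned writes (2^12 = 4096-byte sectors), and
# sharenfs exports the filesystem over NFS without touching /etc/exports.
# Pool and device names are invented examples.
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Two-disk mirror with 4k sector alignment.
run("zpool", "create", "-o", "ashift=12", "tank", "mirror", "/dev/sda", "/dev/sdb")

# Built-in NFS export.
run("zfs", "set", "sharenfs=on", "tank")
```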
I see someone was hammering the blog yesterday, apparently trying to find a password to break in.
Good luck with that. Long, randomly-generated passwords FTW.
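If you want one of those, Python's standard secrets module will generate it in a few lines. A minimal sketch:

```python
# Minimal sketch: generate a long random password with Python's standard
# secrets module (cryptographically strong randomness, unlike random).
import secrets
import string

alphabet = string.ascii_letters + string.digits + string.punctuation
password = "".join(secrets.choice(alphabet) for _ in range(32))
print(password)
```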
Neat, the world’s first hard drive, storing 5MB on fifty platters back in 1956:
Actually, the first hard drive I worked with was 5MB, but at least it was a single platter.