How to rescue your data, 3/3

On 10/12/2010, by Gabriel Guillon
Tags: Software Engineering

Last time we saw how to rescue your FAT. In this article we'll see a third, and last, way of losing data.

Physically crash a hard disk

Hard disks are made of mechanical parts, so they are subject to ageing. S.M.A.R.T. technology, shipped in hard disks for years, monitors a number of indicators that help you foresee your hard disk's end of life. Under GNU/Linux, smartd is widely used.

Like all indicators, these are useless if no human looks at them. So here is a perfect way to physically crash a hard disk: wait, and don't look at the indicators. But facts are stubborn: a dying hard disk complains a lot in syslog:
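As a rough illustration of what looking at those indicators can mean, the sketch below filters the attribute table printed by `smartctl -A` (smartctl ships in the same smartmontools package as smartd) and reports a few sector-related attributes whose raw value is not zero. The function name `smart_warnings` and the choice of attributes are assumptions made for this example; which attributes matter can vary from one drive to another.

```shell
#!/bin/sh
# smart_warnings: read `smartctl -A` output on stdin and print the
# sector-related attributes whose raw value (10th column) is not zero.
# Hypothetical helper; the attribute list is an assumption for this sketch.
smart_warnings() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) print $2 ": " $10
        }
    '
}

# Typical use (as root, /dev/sdb being the suspect disk):
#   smartctl -A /dev/sdb | smart_warnings
```

No output means none of the listed counters has moved, which is not a guarantee that the disk is healthy.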

Jun 28 08:17:09 rhynn kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 28 08:17:09 rhynn kernel: ata5.00: BMDMA2 stat 0x686c0009
Jun 28 08:17:09 rhynn kernel: ata5.00: failed command: READ DMA EXT
Jun 28 08:17:09 rhynn kernel: ata5.00: cmd 25/00:01:32:03:4a/00:00:12:00:00/e0 tag 0 dma 512 in
Jun 28 08:17:09 rhynn kernel: res 51/40:00:32:03:4a/00:00:12:00:00/e0 Emask 0x9 (media error)
Jun 28 08:17:09 rhynn kernel: ata5.00: status: { DRDY ERR }
Jun 28 08:17:09 rhynn kernel: ata5.00: error: { UNC }
Jun 28 08:17:09 rhynn kernel: ata5.00: configured for UDMA/100
Jun 28 08:17:09 rhynn kernel: ata5: EH complete

You can ignore them too if you want to get into trouble.
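If you would rather not ignore them, messages like these are easy to spot automatically. The following sketch counts the kernel lines that report a media error or an uncorrectable (UNC) read. `count_disk_errors` is a hypothetical name, and the log path in the usage comment is an assumption that depends on your distribution.

```shell
#!/bin/sh
# count_disk_errors: read kernel log lines on stdin and print how many of
# them mention a media error or an uncorrectable (UNC) read error.
count_disk_errors() {
    # grep exits non-zero when nothing matches; the count (0) is still printed.
    grep -c -E 'kernel: .*(\(media error\)|error: \{ UNC \})' || true
}

# Typical use (the log location is an assumption; adjust to your system):
#   count_disk_errors < /var/log/syslog
```

Anything other than 0 is a good reason to go and read the log yourself.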

What's more, a dying hard disk can completely freeze your computer. At this point you can still ignore your hard disk, but you are really asking to waste time and lose data.

That's what I did: ignoring, ignoring, ignoring. And badblocks confirmed that there were bad blocks on my disk. Good news: letting this hard disk die made me write this article :)

Once you are fed up with weird messages in syslog and a freezing computer, you can

Rescue your disk

First step: buy a brand new one.

Not-so-good news: you'll be able to save your data, but

  • It depends on the state of your hard disk (old, dying, dead)
  • It can take a lot of time (from minutes to days, or weeks)
  • You will certainly lose some files

But you played with fire, so don't complain :)

Now, let me introduce a friend I hope you won't need too often: ddrescue.

ddrescue is like dd (it copies data from a file or block device to another) but it tries hard "to rescue data in case of read errors". Please note: two tools with similar names exist, dd_rescue and ddrescue. I used the latter; the package on Debian/Ubuntu is gddrescue.

The documentation of ddrescue is pretty well done; you should read it. In a few words, ddrescue can:

  • rescue data from a sick hard disk to a good one
  • or from 2 sick hard disks (in RAID1, for example)
  • or from one hard disk to another and, if that other one fails, from it to a third.

Here, you "just" crashed a disk, so you "just" need ddrescue in its most common use: rescuing a partition (or a disk).

Make a big partition on your brand new disk. Its size must be equal to or (a bit) greater than the size of your sick partition. It will contain the (hopefully) rescued data of your sick partition.
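A quick way to check the size requirement is to compare the byte sizes of both partitions. The sketch below relies on `blockdev --getsize64` (from util-linux) in its usage comment; the function name `fits` is made up for this example, and the device names are simply the ones used in the commands of this article.

```shell
#!/bin/sh
# fits: succeed when a destination of dst_bytes can hold a source of src_bytes.
fits() {
    src_bytes=$1
    dst_bytes=$2
    [ "$dst_bytes" -ge "$src_bytes" ]
}

# Typical use (as root):
#   if fits "$(blockdev --getsize64 /dev/sdb1)" "$(blockdev --getsize64 /dev/sdc1)"
#   then echo "destination is big enough"
#   else echo "destination is too small"
#   fi
```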

The 'logfile' used in the commands below is a file in which ddrescue records what it has already done, allowing you to interrupt it and restart it later without redoing everything.

First, copy everything that can be copied without retrying bad sectors. Look twice at the source and destination partitions before hitting Enter!
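Because this logfile is plain text, it can also tell you how much has been rescued so far. The sketch below assumes the layout described in the GNU ddrescue manual: comment lines starting with "#", one line giving the current position, then "pos size status" lines with hexadecimal numbers, where "+" marks rescued data and "-" marks bad sectors. `logfile_summary` is a hypothetical helper, and the layout may differ between ddrescue versions, so treat it as a starting point only.

```shell
#!/bin/sh
# logfile_summary: print the number of bytes recorded under each status
# in a ddrescue logfile (layout assumed as described above).
logfile_summary() {
    rescued=0
    bad=0
    other=0
    first=1
    while read -r pos size status; do
        case $pos in
            ''|'#'*) continue ;;          # skip blank and comment lines
        esac
        if [ "$first" -eq 1 ]; then       # skip the current-position line
            first=0
            continue
        fi
        case $status in
            '+') rescued=$((rescued + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            *)   other=$((other + $size)) ;;   # not tried or not finished yet
        esac
    done < "$1"
    echo "rescued=$rescued bad=$bad other=$other"
}

# Typical use:
#   logfile_summary /root/logfile.sdb1
```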

[root@home]# ddrescue -n /dev/sdb1 /dev/sdc1 /root/logfile.sdb1
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:         0 B,  errsize:       0 B,  errors:       0
Current status
rescued:    98671 kB,  errsize:       0 B,  current rate:   72483 kB/s
ipos:    98632 kB,   errors:       0,    average rate:   72483 kB/s
opos:    98632 kB,     time from last successful read:       0 s
Finished

Then run it again, this time retrying the bad sectors. This step can take a very long time. I stopped after 2 weeks: my guess was that the remaining bad sectors contained no more data.

[root@home]# ddrescue -d -r3 /dev/sdb1 /dev/sdc1 /root/logfile.sdb1
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued: 148571 MB, errsize: 2065 kB, errors: 3774
Current status
rescued: 148571 MB, errsize: 2065 kB, current rate: 0 B/s
ipos: 54681 MB, errors: 3774, average rate: 0 B/s
opos: 54681 MB, time from last successful read: 43 s
Retrying bad sectors... Retry 1
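Since both commands overwrite the destination partition, a small guard can back up the "look twice" advice. The sketch below refuses to go on when source and destination are the same device or when the destination is currently mounted. `check_targets` and the `MOUNTS` variable are hypothetical names introduced for this example; it is not a replacement for actually reading the command line before pressing Enter.

```shell
#!/bin/sh
# check_targets: refuse obviously wrong source/destination pairs.
# MOUNTS can point to another file for testing; it defaults to /proc/mounts.
check_targets() {
    src=$1
    dst=$2
    if [ "$src" = "$dst" ]; then
        echo "source and destination are the same: $src" >&2
        return 1
    fi
    # The first column of /proc/mounts is the mounted device.
    if awk -v d="$dst" '$1 == d { found = 1 } END { exit !found }' \
           "${MOUNTS:-/proc/mounts}"; then
        echo "destination is mounted: $dst" >&2
        return 1
    fi
    return 0
}

# Typical use, with the commands from this article:
#   check_targets /dev/sdb1 /dev/sdc1 &&
#       ddrescue -n /dev/sdb1 /dev/sdc1 /root/logfile.sdb1 &&
#       ddrescue -d -r3 /dev/sdb1 /dev/sdc1 /root/logfile.sdb1
```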

Checking the filesystem ensures that your data is coherent:

[root@home]# e2fsck -v -f /dev/sdc1

Finally, mount your rescued partition read-only:

[root@home]# mount -t ext2 -o ro /dev/sdc1 /mnt/sdc

And copy your data to a safe place.

Conclusion

When a hard disk is ready to die, it complains. You'd better not ignore those complaints... If you do, you are in for wasted time, lost data, and a session with ddrescue.

Well, well, as a big conclusion to these three articles: while trying to save a hard disk killed by my laziness, I used different methods to rescue data:

  • To rescue a partition table: testdisk
  • To blindly rescue files (either deleted or held in a deleted partition): photorec
  • To rescue a file allocation table destroyed by pvcreate: dd, mkfs.*
  • To rescue a dying hard disk: ddrescue

You should not need those programs too often :) But when you are in trouble, it is worth remembering that they exist. You should practice with them to know how they behave, so that you don't make more mistakes when something goes wrong.

Tracklist of those articles:

"How to save a life" by The Fray