[Grml] How to use GRML to check whether a hard disk is failing

Tue Dec 22 20:06:38 CET 2009

At Tue, 22 Dec 2009 15:33:30 +0000,
Michael Whapples wrote:
> 
> Hello,
> I am wondering whether GRML can help me here. I have agreed to check a 
> computer (tomorrow) for someone as it isn't booting properly (its a 
> windows XP computer). By the sound of it I suspect the hard disk is 
> failing or totally failed or windows has become corrupted to the point 
> it won't boot. 

IMHO the first question should be how important the data on the hard
disk is. If the hdd sounds unhealthy, chances are good that it has a
mechanical defect that gets worse and destroys data simply by spinning
the discs. I personally refuse to check computers whose hard disks
make unhealthy noises.

If you decide to check it, my second step would be booting grml and
making a backup of the drive using ddrescue.
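
A minimal sketch of such a backup, assuming GNU ddrescue (not
dd_rescue) and a second, healthy disk with enough free space for the
image. The names rescue_image and DDRESCUE and the paths in the usage
line are made up for illustration, and three retry passes is only a
starting point.

```shell
# Sketch only: image a failing disk in two passes with GNU ddrescue.
# rescue_image and DDRESCUE are hypothetical names for this example.
rescue_image() {
    src=$1      # failing disk, e.g. /dev/sdX (placeholder)
    img=$2      # image file on a different, healthy disk
    map=$3      # mapfile, lets ddrescue resume after an interruption
    tool=${DDRESCUE:-ddrescue}

    # Pass 1: copy everything that reads easily, skip the scraping phase.
    "$tool" -n "$src" "$img" "$map" || return 1
    # Pass 2: return to the bad areas with direct disc access, 3 retries.
    "$tool" -d -r3 "$src" "$img" "$map"
}
```

Usage would be something like
rescue_image /dev/sdX /mnt/backup/disk.img /mnt/backup/disk.map
with the image and the mapfile on the backup disk, never on the
failing one.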

To check the hdd I would use smartmontools, which queries the internal
log of the hard disk.

smartctl -a /dev/<disk>

This displays an overview of the hard disk's state. I normally check the line

SMART overall-health self-assessment test result: 

and these SMART attributes:

  - 196: Reallocated_Event_Count

    Physically damaged sectors are reallocated; it's okay if this
    happens sometimes, but an increasing number of reallocated sectors
    means trouble ahead.

  - 197: Current_Pending_Sector

    Pending sectors are sectors that are marked for reallocation but
    have not been reallocated yet.

Please be aware that the attribute table is hard to interpret because
what most of the values actually /mean/ depends on the hard disk
manufacturer. For instance, it is normal for a drive of the "Seagate
Barracuda 7200.10 family" to report a raw value of about 124438548
for attribute 1: Raw_Read_Error_Rate.

In my practical experience as a sysadmin, attributes 196 and 197 are
good indicators of failing hdds.
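
To pull just the health line and these two attributes out of the
smartctl output, something like the following should do. It is a
sketch that assumes the usual ATA attribute table layout, where the
raw value is the 10th column; smart_summary is a name I made up.

```shell
# Sketch only: reduce `smartctl -a` output (read from stdin) to the
# overall health result and the raw values of attributes 196 and 197.
# smart_summary is a hypothetical helper name.
smart_summary() {
    awk '
        # Health line: keep only the text after "result:".
        /overall-health self-assessment test result:/ {
            sub(/^.*result:[ \t]*/, "")
            print "health=" $0
            next
        }
        # Attribute rows: column 1 is the ID, 2 the name, 10 the raw value.
        $1 == "196" || $1 == "197" { print $2 "=" $10 }
    '
}
```

Usage: smartctl -a /dev/<disk> | smart_summary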

You may also start an internal self-test of the hdd (smartctl -t) --
the possible test routines depend on the hdd model, but I would try a
long self-test (smartctl -t long /dev/<disk>).
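
The self-test runs inside the drive and can take a long time on a big
disk, so starting it and reading the result are two separate steps. A
small sketch; the function names and the SMARTCTL override variable
are only illustrative.

```shell
# Sketch only: start a long self-test, and read the self-test log later.
# Function names and the SMARTCTL variable are hypothetical.
start_long_selftest() {
    # $1 is the disk device, e.g. /dev/sdX (placeholder)
    "${SMARTCTL:-smartctl}" -t long "$1"
}

show_selftest_log() {
    # Lists previous self-tests and whether they completed without error.
    "${SMARTCTL:-smartctl}" -l selftest "$1"
}
```

Usage: start_long_selftest /dev/<disk>, wait until the test has
finished, then show_selftest_log /dev/<disk>.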

As I had to debug a failing hdd recently, I can only stress that
whatever you do, you should check the SMART values occasionally. In my
case I noticed an increasing rate of reallocated sectors while trying
to fix the filesystem.
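
One way to notice such an increase is to save the output of
smartctl -A /dev/<disk> to a file now and then and compare two
snapshots. A rough sketch, again assuming the raw value is the 10th
column; smart_raw_growth and the file names are made up.

```shell
# Sketch only: compare two saved `smartctl -A` outputs and report
# attributes 196/197 whose raw value went up.
# smart_raw_growth is a hypothetical helper name.
smart_raw_growth() {
    # $1 = older snapshot file, $2 = newer snapshot file
    awk '
        # First file: remember the raw values of 196 and 197.
        ($1 == "196" || $1 == "197") && NR == FNR { before[$1] = $10; next }
        # Second file: print the attribute if its raw value increased.
        ($1 == "196" || $1 == "197") && ($1 in before) && $10 + 0 > before[$1] + 0 {
            print $2 ": " before[$1] " -> " $10
        }
    ' "$1" "$2"
}
```

Usage: smart_raw_growth before.txt after.txt
No output means neither of the two raw values went up between the
snapshots.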

On the question of how to check and/or fix a broken NTFS filesystem, I
am lost.

HTH

 -- David

-- 
OpenPGP... 0x316F4BE4670716FD
Jabber.... dmjena at jabber.org
Email..... maus.david at gmail.com
ICQ....... 241051416