Hi everyone,

Some background: Recently my RAM went bad, and I realised it too late. Towards the last few days my desktop had crashed more than once. Yesterday I received the replacement RAM modules from RMA. On installing them and turning on my machine I noticed errors like this:

Device: /dev/sdb [SAT], 172 Currently unreadable (pending) sectors

I see that these errors started around the time my desktop started crashing, before I found the faulty RAM.

The problem: On subsequent boots the system failed to boot, with fsck complaining about disk read errors during a forced disk check. I was dropped to a read-only shell to troubleshoot every time, so I ran fsck on all my partitions and found errors on my /home. The error messages said "inode has deleted or empty entries clear", "unlinked inode entries" and so on. Since I was on a read-only partition I couldn't save them to a file (I guess paper would have worked :-p). When prompted by fsck to fix the errors, I answered yes. After a reboot my system booted properly, but I had lost some very important data. All the missing directories were the ones fsck had complained about. I restored whatever I could from some backups.

To confirm that this was a one-off incident and that my disk hasn't gone bad, I ran a SMART long self-test (the drive is only a few months old):

# smartctl -t long /dev/sdb

But after the test I can't understand the output of the logs:

> # smartctl -a /dev/sdb
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-redhat-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Black family
> Device Model:     WDC WD1001FALS-00E8B0
> Serial Number:    WD-WMATV5966482
> Firmware Version: 05.00K05
> User Capacity:    1,000,204,886,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Sat Aug 14 19:37:26 2010 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x84)  Offline data collection activity
>                                          was suspended by an interrupting command from host.
>                                          Auto Offline Data Collection: Enabled.
> Self-test execution status:      ( 121)  The previous self-test completed having
>                                          the read element of the test failed.
> Total time to complete Offline
> data collection:                 (18000) seconds.
> Offline data collection
> capabilities:                    (0x7b)  SMART execute Offline immediate.
>                                          Auto Offline data collection on/off support.
>                                          Suspend Offline collection upon new
>                                          command.
>                                          Offline surface scan supported.
>                                          Self-test supported.
>                                          Conveyance Self-test supported.
>                                          Selective Self-test supported.
> SMART capabilities:            (0x0003)  Saves SMART data before entering
>                                          power-saving mode.
>                                          Supports SMART auto save timer.
> Error logging capability:        (0x01)  Error logging supported.
>                                          General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2)  minutes.
> Extended self-test routine
> recommended polling time:        ( 208)  minutes.
> Conveyance self-test routine
> recommended polling time:        (   5)  minutes.
> SCT capabilities:              (0x3037)  SCT Status supported.
>                                          SCT Feature Control supported.
>                                          SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       1354
>   3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1158
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       40
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1403
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       18
> 194 Temperature_Celsius     0x0022   112   107   000    Old_age   Always       -       38
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       172
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%      1393         1106820646
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.

All the values in the table above seem larger than the threshold, but the report says PASSED. I'm not clear how to interpret this. Could someone help?

Thanks a lot in advance.

-- 
Suvayu

Open source is the future. It sets us free.
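
P.S. As far as I can tell from the smartctl man page, an attribute only counts as failed when its normalized VALUE is less than or equal to THRESH, so a higher VALUE is the healthy direction. Here is a minimal sketch of that comparison, assuming the ten-column row layout printed above; the function name check_attrs is made up for illustration and is not part of smartmontools.

```shell
#!/bin/sh
# Sketch only: compare the normalized VALUE column with THRESH per attribute row.
# Assumption: rows use the ten-column layout shown in this mail
# (ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE).
# check_attrs is a hypothetical helper name, not a smartmontools command.
check_attrs() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value  = $4 + 0    # normalized value; higher is healthier
            thresh = $6 + 0    # vendor threshold for this attribute
            state  = (value <= thresh) ? "FAILED" : "ok"
            printf "%-24s value=%3d thresh=%3d raw=%s %s\n", $2, value, thresh, $10, state
        }'
}

# Three rows copied from the attribute table in this mail.
check_attrs <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       1354
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       172
EOF
```

By that rule all three rows come out as ok, which would be consistent with the PASSED verdict even though the raw pending-sector count is 172 and the long self-test log reports a read failure. To check the live drive, the same function could be fed from `smartctl -A /dev/sdb | check_attrs`.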