On Wed, 2006-03-15 at 06:50, Reuben D. Budiardja wrote:
> Yes, the drives are SMART capable, and smartd is running. In the last failure
> before this one smartd gave me notification so I know something started to go
> wrong and did what I needed to do. In the recent failure I didn't get any
> notification and suddenly the whole system just died.

There are some tests you can run with smartctl, but in normal use I think
smartd only sees soft errors/retries on the parts of the disk that you
actually access. That is, sectors can go bad in unused areas, or in areas
holding files that are rarely read, and you won't know until it is too late.
Then, when you have to rebuild the RAID, the rebuild will fail if it can't
read a single sector on one of the remaining drives.

Doing a cat of the raw device to /dev/null once in a while will give smartd
a chance to report errors. The drive should even remap a certain number of
'soft' errors internally and transparently, if it can recover the data with
retries, but again it will only do that if you access the bad spot.

--
Les Mikesell
lesmikesell@xxxxxxxxx
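
The periodic full read suggested above (cat of the raw device to /dev/null) can be sketched in Python as follows. This is only an illustration, not something from the original post: the function name read_whole_device and the 1 MiB chunk size are assumptions made for the example. It reads the device from start to end, discards the data, and reports how far it got before the first read error, if any.

```python
CHUNK_SIZE = 1024 * 1024  # assumed chunk size: read 1 MiB at a time


def read_whole_device(path, chunk_size=CHUNK_SIZE):
    """Read every byte of `path` and discard it, like `cat path > /dev/null`.

    Returns (bytes_read, error). `error` is None after a clean pass,
    otherwise the OSError raised by the open or read that failed.
    """
    bytes_read = 0
    try:
        # buffering=0 gives unbuffered reads straight from the file or device.
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(chunk_size)
                if not chunk:  # empty result means end of device
                    break
                bytes_read += len(chunk)
    except OSError as exc:
        # An unreadable sector normally surfaces here as an I/O error.
        return bytes_read, exc
    return bytes_read, None
```

Run as root against the raw device from cron, e.g. read_whole_device("/dev/sda") (the device name is a placeholder). A non-None error means a read failed at roughly the returned byte offset. Even a clean pass is useful, since it forces the drive to touch every sector so it can remap marginal ones and update the counters smartd watches.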