On Tuesday 14 March 2006 08:49, Scot L. Harris wrote:
> On Tue, 2006-03-14 at 08:21 -0500, Reuben D. Budiardja wrote:
<snip>
> > I am hoping for comments, etc. Thank you in advance.
>
> Have you monitored the temperature of the system at all times? Heat can
> play a major role in causing drives and other hardware to fail long
> before it should. Good air flow around the system/drives is critical.

I have not. Any recommended tools or methods? Thank you.

> And make sure you have a good UPS system connected. Power fluctuations
> can cause all kinds of havoc.

It's connected to a UPS, with proper shutdown in the event of a power outage.

>
> Also note that using RAID by itself does not replace the need for
> backups. RAID protects against hardware failure. And depending on the
> value of the data it is usually recommended to run RAID with a hot spare
> drive so multiple drive failures won't bring the system down. I am not
> sure if the card you are using allows you to run a hot spare or not.

The machine is a backup machine. Its main job is to back up data from other machines, so if I have to have a backup for the backup ... well, I am going to have a hard time justifying that :).

Yes, I should have had a hot spare ready, but resources are not unlimited, so I did not have one. The data lost was non-critical (I am not losing sleep), but this indicates that something is wrong with the system, and it's getting ridiculous to keep replacing drives under warranty.

> And make sure you have something in place that notifies you that there
> is a problem.

Yes, email notification is in place by default (from mdadm and smartd).

Thank you for responding.

RDB
-- 
Reuben D. Budiardja
Dept. Physics and Astronomy
University of Tennessee, Knoxville, TN