On Tue, 17 Aug 2010 18:07:18 +0300
Gilboa Davara <gilboad@xxxxxxxxx> wrote:

> On Tue, 2010-08-17 at 09:44 -0400, Steve Blackwell wrote:
> > I leave my computer on 24/7 so that my backups can run at night.
> > Lately, it has been crashing during the night usually leaving no
> > trace of what happened. Last night it crashed but left this
> > in /var/log/messages:
> >
> > Aug 17 01:04:56 steve kernel: INFO: task kjournald:1960 blocked for
> > more than 120 seconds.
> > Aug 17 01:04:56 steve kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >
> > > Could a hard drive get shut down because it was getting too hot?
> > > What would be a normal temp for a hard drive that has just
> > > completed a backup? 124C seems really hot. The HD cooling fan had
> > > been broken so I replaced it this past weekend but it doesn't
> > > seem to have helped. Too late? Permanent HD damage already done?
> > Any other comments or suggestions?
>
> Hello Steve,
>
> This is not a crash.
> The kjournald kernel process (which handles various file-system task).
> You assumption that the HD went into some type of sleep/suspend mode
> during write sounds reasonable to me.
>
> 124C seems -very- hot. Even during heavy I/O.
> Two things spring into mind:
> A. Is it a normal desktop SATA drive or high-speed SCSI/SAS drive?
> B. Please post the SMART log of the drive. (smartctl -a /dev/sdX).
>
> - Gilboa
>

Hello Gilboa,

Yes I realize that it was not a crash. When I first saw the kernel
messages I thought it was and started writing the e-mail. I neglected
to correct the subject line after I actually read the messages. Sorry
about that.

I had already run the command:

smartctl -t long /dev/sdb

before I got your reply. The results should be ready soon.

I've been looking at my logs some more. I don't understand these
messages:

Aug 17 10:30:50 steve kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature/speed normal
Aug 17 10:30:50 steve kernel: CPU0: Temperature/speed normal

These messages are repeated every hour or so. It seems unlikely that
every time the threshold is exceeded, it immediately (within one
second) drops back again. What is going on here?

The drive is an old IDE drive: WDC WD1600JB-00F

Thanks,
Steve

-- 
Changing lives one card at a time
http://www.send1cardnow.com
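
For anyone who wants to check the "within one second" observation against a whole log rather than by eye, here is a rough Python sketch. It pairs each "Temperature above threshold" line with the next "Temperature/speed normal" line for the same CPU and reports the gap. The function name throttle_durations, the regular expression, and the assumed year (2010, since syslog timestamps carry none) are illustrative choices, not part of any existing tool, and the sample input is just the four lines quoted in the message.

```python
import re
from datetime import datetime

# Sample lines copied from the message; in practice these would be
# read from /var/log/messages.
SAMPLE = """\
Aug 17 10:30:50 steve kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature/speed normal
Aug 17 10:30:50 steve kernel: CPU0: Temperature/speed normal
"""

# Hypothetical pattern matching only the thermal lines shown above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"(?P<cpu>CPU\d+): Temperature(?P<rest>.*)$"
)


def throttle_durations(lines, year=2010):
    """Return a list of (cpu, start, seconds) for each throttle/normal pair.

    The year is an assumption: syslog timestamps do not include one.
    """
    started = {}   # cpu name -> time the throttle message was logged
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a thermal message
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        cpu = m["cpu"]
        if "above threshold" in m["rest"]:
            started[cpu] = when
        elif "normal" in m["rest"] and cpu in started:
            start = started.pop(cpu)
            results.append((cpu, start, (when - start).total_seconds()))
    return results


if __name__ == "__main__":
    for cpu, start, secs in throttle_durations(SAMPLE.splitlines()):
        print(f"{cpu}: throttled at {start:%b %d %H:%M:%S}, normal {secs:.0f}s later")
```

Since these timestamps only have one-second resolution, a reported gap of 0 seconds means the two messages were logged within the same second, not that no time passed between them.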
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines