Fedora Users — Re: kernel crash

On Tue, 17 Aug 2010 18:07:18 +0300
Gilboa Davara <gilboad@xxxxxxxxx> wrote:

> On Tue, 2010-08-17 at 09:44 -0400, Steve Blackwell wrote:
> > I leave my computer on 24/7 so that my backups can run at night.
> > Lately, it has been crashing during the night usually leaving no
> > trace of what happened. Last night it crashed but left this
> > in /var/log/messages:
> > 
> > Aug 17 01:04:56 steve kernel: INFO: task kjournald:1960 blocked for more than 120 seconds.
> > Aug 17 01:04:56 steve kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > 
> > Could a hard drive get shut down because it was getting too hot?
> > What would be a normal temp for a hard drive that has just
> > completed a backup? 124C seems really hot. The HD cooling fan had
> > been broken so I replaced it this past weekend but it doesn't
> > seem to have helped. Too late? Permanent HD damage already done?
> > Any other comments or suggestions?
> 
> Hello Steve,
> 
> This is not a crash.
> The kjournald kernel process (which handles various file-system tasks)
> was blocked for more than 120 seconds.
> Your assumption that the HD went into some type of sleep/suspend mode
> during write sounds reasonable to me.
> 
> 124C seems -very- hot. Even during heavy I/O.
> Two things spring to mind:
> A. Is it a normal desktop SATA drive or high-speed SCSI/SAS drive?
> B. Please post the SMART log of the drive. (smartctl -a /dev/sdX). 
> 
> - Gilboa
> 

Hello Gilboa,

Yes, I realize that it was not a crash. When I first saw the kernel
messages I thought it was and started writing the e-mail. I neglected
to correct the subject line after I actually read the messages. Sorry
about that.

I had already run the command:
smartctl -t long /dev/sdb
before I got your reply. The results should be ready soon.

I've been looking at my logs some more. I don't understand these
messages:

Aug 17 10:30:50 steve kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 455)
Aug 17 10:30:50 steve kernel: CPU1: Temperature/speed normal
Aug 17 10:30:50 steve kernel: CPU0: Temperature/speed normal

These messages are repeated every hour or so. It seems unlikely that
every time the threshold is exceeded, it immediately (within one
second) drops back again. What is going on here?
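
To see how often this happens, the messages could be tallied per CPU.
What follows is only a minimal sketch, assuming each log entry sits on
one line in the format quoted above. The function name
summarize_throttling and the sample list are hypothetical; reading a
real /var/log/messages would need file handling added.

```python
import re
from collections import Counter

# Pattern derived only from the sample lines quoted in this message;
# other kernel versions may word the message differently.
THROTTLE_RE = re.compile(
    r"kernel: (CPU\d+): Temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)


def summarize_throttling(lines):
    """Return (throttle messages seen per CPU, highest 'total events' per CPU)."""
    counts = Counter()
    highest_total = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:
            cpu, total = match.group(1), int(match.group(2))
            counts[cpu] += 1
            highest_total[cpu] = max(total, highest_total.get(cpu, 0))
    return counts, highest_total


# Hypothetical input: the four log lines from this message, unwrapped.
sample = [
    "Aug 17 10:30:50 steve kernel: CPU0: Temperature above threshold, "
    "cpu clock throttled (total events = 455)",
    "Aug 17 10:30:50 steve kernel: CPU1: Temperature above threshold, "
    "cpu clock throttled (total events = 455)",
    "Aug 17 10:30:50 steve kernel: CPU1: Temperature/speed normal",
    "Aug 17 10:30:50 steve kernel: CPU0: Temperature/speed normal",
]

counts, totals = summarize_throttling(sample)
print(dict(counts), totals)
```

If "total events" is a running count, as it appears to be, then
comparing its value between two logged messages would show how many
throttle events occurred in between, even when only one message per
hour reaches the log.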

The drive is an old IDE drive: WDC WD1600JB-00F

Thanks,
Steve
-- 
Changing lives one card at a time

http://www.send1cardnow.com