Fedora Users — RE: fedora 6 kernel panic issues

I finally captured another log, and it shows something I hadn't seen
before: a soft lockup. After it appeared, the system slowly became less
and less responsive until it stopped altogether.

Aug  8 14:14:44 squidmin kernel: BUG: soft lockup detected on CPU#1!
Aug  8 14:14:44 squidmin kernel:  [<c04051db>] dump_trace+0x69/0x1af
Aug  8 14:14:44 squidmin kernel:  [<c0405339>] show_trace_log_lvl+0x18/0x2c
Aug  8 14:14:44 squidmin kernel:  [<c04058ed>] show_trace+0xf/0x11
Aug  8 14:14:44 squidmin kernel:  [<c04059ea>] dump_stack+0x15/0x17
Aug  8 14:14:44 squidmin kernel:  [<c044d9b5>] softlockup_tick+0xad/0xc4
Aug  8 14:14:44 squidmin kernel:  [<c042e596>] update_process_times+0x39/0x5c
Aug  8 14:14:44 squidmin kernel:  [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Aug  8 14:14:44 squidmin kernel:  [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Aug  8 14:14:44 squidmin kernel: DWARF2 unwinder stuck at apic_timer_interrupt+0x1f/0x24
Aug  8 14:14:44 squidmin kernel: Leftover inexact backtrace:
Aug  8 14:14:44 squidmin kernel:  [<c047703e>] generic_fillattr+0x62/0xa4
Aug  8 14:14:44 squidmin kernel:  [<f8c45290>] cifs_getattr+0x1e/0x24 [cifs]
Aug  8 14:14:44 squidmin kernel:  [<f8c45272>] cifs_getattr+0x0/0x24 [cifs]
Aug  8 14:14:44 squidmin kernel:  [<c0477519>] vfs_getattr+0x40/0x9b
Aug  8 14:14:44 squidmin kernel:  [<c0477895>] vfs_fstat+0x22/0x31
Aug  8 14:14:44 squidmin kernel:  [<c04778b3>] sys_fstat64+0xf/0x23
Aug  8 14:14:44 squidmin kernel:  [<c046de63>] sys_open+0x1c/0x1e
Aug  8 14:14:44 squidmin kernel:  [<c0404013>] syscall_call+0x7/0xb
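
In case it helps anyone reading a trace like this one, here is a rough
Python sketch that pulls the function names out of the log and lists
which loadable modules show up in it. It is only an illustration: the
names parse_frames and modules_in_trace are made up for this example,
and the regular expression assumes the "[<address>] function+0xOFF/0xLEN
[module]" layout seen above. It collapses whitespace first, so frames
that the mail client wrapped onto two lines are still matched.

```python
import re

# Matches one frame such as "[<f8c45290>] cifs_getattr+0x1e/0x24 [cifs]".
# The trailing "[module]" tag is optional; built-in kernel code has none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a (function, module-or-None) pair for every frame found."""
    # Collapse all whitespace runs so frames split across lines rejoin.
    flat = " ".join(log_text.split())
    return [(m.group("func"), m.group("module"))
            for m in FRAME_RE.finditer(flat)]


def modules_in_trace(frames):
    """Return the modules named in the trace, in first-seen order."""
    seen = []
    for _func, module in frames:
        if module and module not in seen:
            seen.append(module)
    return seen


# Three frames copied from the log above, wrapped as they arrived by mail.
SAMPLE = """\
Aug  8 14:14:44 squidmin kernel:  [<c047703e>]
generic_fillattr+0x62/0xa4
Aug  8 14:14:44 squidmin kernel:  [<f8c45290>] cifs_getattr+0x1e/0x24
[cifs]
Aug  8 14:14:44 squidmin kernel:  [<c0477519>] vfs_getattr+0x40/0x9b
"""

frames = parse_frames(SAMPLE)
for func, module in frames:
    print(func, "(module: %s)" % module if module else "(built-in)")
print("modules seen:", modules_in_trace(frames))
```

The only module it finds in this trace is cifs. Those frames sit under
"Leftover inexact backtrace", which the kernel prints after the unwinder
gets stuck, so they are a hint about what the CPU was doing (a stat call
on a CIFS mount) rather than a guaranteed call chain.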

-----Original Message-----
From: Jason Taylor 
Sent: Wednesday, August 08, 2007 10:22 AM
To: fedora-list@xxxxxxxxxx
Subject: RE: fedora 6 kernel panic issues

From this smartctl report (from running smartctl -a -d ata /dev/sda), it
looks like the drive is not reporting any errors.  I am leaning towards
a driver or power issue.  I have moved the hard drive to a machine with
identical hardware, and it has been up for one day at this point.  I
ran Memtest86 on the old machine for 24 hours, and it completed 28
passes with flying colors.


smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3160811AS
Serial Number:    6PT54BA6
Firmware Version: 3.AAE
User Capacity:    160,041,885,696 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug  8 09:26:29 2007 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 ( 430) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  54) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   097   006    Pre-fail  Always       -       230511004
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       6802205
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       122
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   069   049   045    Old_age   Always       -       589103135
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   057   051   000    Old_age   Always       -       11080071
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


-----Original Message-----
From: fedora-list-bounces@xxxxxxxxxx
[mailto:fedora-list-bounces@xxxxxxxxxx] On Behalf Of Tony Nelson
Sent: Wednesday, August 08, 2007 7:27 AM
To: fedora-list@xxxxxxxxxx
Subject: Re: fedora 6 kernel panic issues

At 4:06 PM +0100 8/7/07, Alan Cox wrote:
>On Mon, 6 Aug 2007 20:29:41 -0700
>"Jason Taylor" <jtaylor@xxxxxxxxxxxxx> wrote:
>
>>
>> That was all I saw at the console besides the transaction #'s.
>>
>> I was unable to open any virtual terminals or escape it at all.  I
>> will try and see if there is any more data at the end.
>>
>> I am still pretty Linux green.  Is there something else that I can
>> provide that would help?
>> I ran through /var/log/messages and saw nothing.
>
>Before it choked it will have dumped a set of messages indicating ATA
>error information to the system. That may have scrolled off before it
>died, and if the disk failed then it couldn't write it to the log
>either.
>
>Drives keep their own failure information log usually (partly because
>of this) and there are low level tools to access the information:
>
>open a terminal window
>
>do
>
>su -
>[root password]
>smartctl -a -d ata /dev/sda
>
>and it will dump the data for the first disk.
>
>That will show you various stats including an overall health self
>assessment and also usually the last errors that occurred. Those are
>the important and useful bit.

Let me add that the word "fail" will always appear in the TYPE column of
the report (as "Pre-fail"); that only names the kind of attribute.  Look
at the WHEN_FAILED column instead; if it shows only "-" then the disk
hasn't failed yet.  See `man smartctl` about this and /don't panic/.
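
If you would rather not eyeball the columns, the check can be sketched
in a few lines of Python. This is only a rough illustration: the
function names parse_smart_attributes and failed_attributes are invented
for this example, and the parser assumes the ten-column attribute table
that smartctl 5.36 printed in the report quoted in this thread, with one
attribute per line.

```python
def parse_smart_attributes(report):
    """Parse the attribute table from `smartctl -a` output into dicts."""
    rows = []
    in_table = False
    for line in report.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row; attribute rows follow
            continue
        if in_table:
            # RAW_VALUE may contain spaces, so split at most 9 times.
            fields = line.split(None, 9)
            if len(fields) < 10 or not fields[0].isdigit():
                break  # a blank line or other text ends the table
            rows.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "type": fields[6],
                "when_failed": fields[8],
                "raw": fields[9],
            })
    return rows


def failed_attributes(rows):
    """Return the rows whose WHEN_FAILED column is anything but '-'."""
    return [row for row in rows if row["when_failed"] != "-"]


# Three rows taken from the report earlier in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   097   006    Pre-fail  Always       -       230511004
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (Lifetime Min/Max 0/23)
"""

rows = parse_smart_attributes(SAMPLE)
failed = failed_attributes(rows)
if failed:
    for row in failed:
        print(row["name"], "failed:", row["when_failed"])
else:
    print("no attribute is marked as failed")
```

Note that "Pre-fail" in the TYPE column is deliberately ignored here;
only WHEN_FAILED decides whether a row is reported.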
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson@xxxxxxxxxxxxxxxxx>
      '                              <http://www.georgeanelson.com/>
