So I finally got another log that had something different in it that I
hadn't seen before. After this, the system slowly became less and less
useful until it stopped altogether.

Aug 8 14:14:44 squidmin kernel: BUG: soft lockup detected on CPU#1!
Aug 8 14:14:44 squidmin kernel: [<c04051db>] dump_trace+0x69/0x1af
Aug 8 14:14:44 squidmin kernel: [<c0405339>] show_trace_log_lvl+0x18/0x2c
Aug 8 14:14:44 squidmin kernel: [<c04058ed>] show_trace+0xf/0x11
Aug 8 14:14:44 squidmin kernel: [<c04059ea>] dump_stack+0x15/0x17
Aug 8 14:14:44 squidmin kernel: [<c044d9b5>] softlockup_tick+0xad/0xc4
Aug 8 14:14:44 squidmin kernel: [<c042e596>] update_process_times+0x39/0x5c
Aug 8 14:14:44 squidmin kernel: [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Aug 8 14:14:44 squidmin kernel: [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Aug 8 14:14:44 squidmin kernel: DWARF2 unwinder stuck at apic_timer_interrupt+0x1f/0x24
Aug 8 14:14:44 squidmin kernel: Leftover inexact backtrace:
Aug 8 14:14:44 squidmin kernel: [<c047703e>] generic_fillattr+0x62/0xa4
Aug 8 14:14:44 squidmin kernel: [<f8c45290>] cifs_getattr+0x1e/0x24 [cifs]
Aug 8 14:14:44 squidmin kernel: [<f8c45272>] cifs_getattr+0x0/0x24 [cifs]
Aug 8 14:14:44 squidmin kernel: [<c0477519>] vfs_getattr+0x40/0x9b
Aug 8 14:14:44 squidmin kernel: [<c0477895>] vfs_fstat+0x22/0x31
Aug 8 14:14:44 squidmin kernel: [<c04778b3>] sys_fstat64+0xf/0x23
Aug 8 14:14:44 squidmin kernel: [<c046de63>] sys_open+0x1c/0x1e
Aug 8 14:14:44 squidmin kernel: [<c0404013>] syscall_call+0x7/0xb

-----Original Message-----
From: Jason Taylor
Sent: Wednesday, August 08, 2007 10:22 AM
To: fedora-list@xxxxxxxxxx
Subject: RE: fedora 6 kernel panic issues

From this smartctl report (running smartctl -a -d ata /dev/sda), it looks
like the drive is not having any errors. I am leaning towards a driver or
power issue. I have moved the hard drive to a machine with identical
hardware and it has been up for one day at this point.
I have run Memtest86 on the old machine for 24 hours and it has passed 28
times with flying colors.

smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3160811AS
Serial Number:    6PT54BA6
Firmware Version: 3.AAE
User Capacity:    160,041,885,696 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug 8 09:26:29 2007 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  54) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   097   006    Pre-fail  Always       -       230511004
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       6802205
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       122
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   069   049   045    Old_age   Always       -       589103135
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   057   051   000    Old_age   Always       -       11080071
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
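
The attribute table in this report can also be checked mechanically
instead of by eye. What follows is only a minimal Python sketch, not part
of smartctl: the function names parse_attributes and failed_attributes are
made up for illustration, and the code assumes the ten-column layout that
smartctl 5.36 printed here (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST,
THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). It keeps only the rows
whose WHEN_FAILED column holds something other than "-".

```python
def parse_attributes(report):
    """Return one dict per row of the SMART attribute table found in `report`.

    Assumes the ten-column layout shown in the report from this thread.
    """
    rows = []
    for line in report.splitlines():
        # Split into at most ten fields so RAW_VALUE keeps its inner spaces.
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID and carry a hex FLAG field;
        # headers, blank lines and the self-test tables do not match this.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def failed_attributes(report):
    """Return the rows whose WHEN_FAILED column is not the placeholder '-'."""
    return [row for row in parse_attributes(report) if row["when_failed"] != "-"]


# A few rows copied from the report in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   097   006    Pre-fail  Always       -       230511004
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (Lifetime Min/Max 0/23)
"""

if __name__ == "__main__":
    for row in parse_attributes(SAMPLE):
        print(row["id"], row["name"], row["type"], row["when_failed"])
    print("attributes marked as failed:", failed_attributes(SAMPLE))
```

For the sample rows the list of failed attributes comes back empty even
though two of them are of TYPE Pre-fail, which is the distinction between
the TYPE and WHEN_FAILED columns. To run it against a live disk, one
option is to save the output of smartctl -a -d ata /dev/sda to a file and
pass the file's contents to failed_attributes.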

-----Original Message-----
From: fedora-list-bounces@xxxxxxxxxx [mailto:fedora-list-bounces@xxxxxxxxxx] On Behalf Of Tony Nelson
Sent: Wednesday, August 08, 2007 7:27 AM
To: fedora-list@xxxxxxxxxx
Subject: Re: fedora 6 kernel panic issues

At 4:06 PM +0100 8/7/07, Alan Cox wrote:
>On Mon, 6 Aug 2007 20:29:41 -0700
>"Jason Taylor" <jtaylor@xxxxxxxxxxxxx> wrote:
>
>>
>> That was all I saw at the console besides the transaction #'s.
>>
>> I was unable to open any virtual terminals or escape it at all. I will
>>try and see if there is any more data at the end.
>>
>> I am still pretty Linux green. Is there something else that I can
>>provide that would help?
>> I ran through /var/log/messages and saw nothing.
>
>Before it choked it will have dumped a set of messages indicating ATA
>error information to the system. That may have scrolled off before it
>died, and if the disk failed then it couldn't write it to the log either.
>
>Drives keep their own failure information log usually (partly because of
>this) and there are low level tools to access the information:
>
>open a terminal window
>
>do
>
>su -
>[root password]
>smartctl -a -d ata /dev/sda
>
>and it will dump the data for the first disk.
>
>That will show you various stats including an overall health self
>assessment and also usually the last errors that occurred. Those are the
>important and useful bit.

Let me add that the word "fail" will always appear in the TYPE column in
the report. Look at the "WHEN_FAILED" column; if that is clear then the
disk hasn't failed yet. See `man smartctl` about this and /don't panic/.
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson@xxxxxxxxxxxxxxxxx>
      ' <http://www.georgeanelson.com/>

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list