kernel oops: assertion failure at journal:576 (ext3 issue?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello all:

We have a few (3) systems that are crashing with:

  Assertion failure in journal_next_log_block() at fs/jbd/journal.c:576:
  "journal->j_free > 1" 

  Kernel BUG at journal:576
  invalid operand: 0000 [1] SMP
  CPU 1
  Modules linked in: 
  md5 ipv6 parport_pc lp parport w83627hf eeprom adm1026 hwmon_vid hwmon
  i2c_sensor i2c_isa i2c_amd756 i2c_amd8111 i2c_dev i2c_core nfs lockd
  nfs_acl sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter
  ip_tables button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot
  dm_zero dm_mirror ext3 jbd dm_mod 3w_9xxx sata_mv libata sd_mod
  scsi_mod
  Pid: 1603, comm: kjournald Not tainted 2.6.9-42.0.3.ELsmp
  RIP: 0010:[<ffffffffa006c18a>]
  <ffffffffa006c18a>{:jbd:journal_next_log_block+76}
  RSP: 0018:0000010476327b88 EFLAGS: 00010212
  RAX: 0000000000000060 RBX: 0000010283163e00 RCX: ffffffff803e1fe8
  RDX: ffffffff803e1fe8 RSI: 0000000000000246 RDI: ffffffff803e1fe0
  RBP: 0000000000000040 R08: ffffffff803e1fe8 R09: 0000010283163e00
  R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000010283163e24
  R13: 0000010476327be0 R14: 0000010283163e00 R15: 000000000000002e
  FS: 0000002a95560b00(0000) GS:ffffffff804e5200(0000)
  knlGS:00000000f7ff36c0
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000002a9556c000 CR3: 0000000037e42000 CR4: 00000000000006e0
  Process kjournald (pid: 1603, threadinfo 0000010476326000, task
  0000010478d777f0)
  Stack: 0000010453f4afa8 0000010310072240 0000000000000040
  0000010147528be0
  000001044240a880 ffffffffa0067dfe 00000e7c00000000
  00000101c33f2184
  0000000000000000 0000010310b12f50
  Call Trace:<ffffffffa0067dfe>{:jbd:journal_commit_transaction+1834}
  <ffffffff80135756>{autoremove_wake_function+0}
  <ffffffff80135756>{autoremove_wake_function+0}
  <ffffffffa006a914>{:jbd:kjournald+250}
  <ffffffff80135756>{autoremove_wake_function+0}
  <ffffffff80135756>{autoremove_wake_function+0}
  <ffffffffa006a814>{:jbd:commit_timeout+0}
  <ffffffff80110f47>{child_rip+8}
  <ffffffffa006a81a>{:jbd:kjournald+0}
  <ffffffff80110f3f>{child_rip+0}

  Code: 0f 0b bd e2 06 a0 ff ff ff ff 40 02 48 8b ab 18 01 00 00 48
  RIP <ffffffffa006c18a>{:jbd:journal_next_log_block+76} RSP
  <0000010476327b88>
  <0>Kernel panic - not syncing: Oops

(Note I editied together some lines in the "Modules linked in"
section. The rest is cut from the serial console (size 80x24) on the
system.)

We are running centos 4.4 kernel. Uname -a shows:

  Linux cook05 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:28:26 CDT 2006
  x86_64 x86_64 x86_64 GNU/Linux 

The disk subsystem for this crash are 4 sata disks on a 3ware 9550
(see the attached dmesg output for more info) with a mix of western
digital and seagate drives. It has also crashed with sysrq enabled and
(not surprisingly) the system is totally dead. We have to power cycle
it to reboot it.

Other systems experiencing the same crash have:

   * non-smp version of the same kernel with the software md raid
     drivers
   * same kernel running a megaraid raid card

The same crash has also been seen with an earlier kernel version
2.6.9-42.ELsmp.

It seems to crash when we expect the system to have high IO, but we
don't have any hard evidence of throughput/transactions to disk to
support that.

We can try setting up a remote kernel dump if that would be
useful/would work.

We get a crash every couple of days on average (sometimes two crashes
with 30 min-2 hours between them) so we can try applying patches/new
kernels if needed and see how the system does.

I have attached selected lines from dmesg to give some additional info
about the hardware and config of the system. I tried to attach
/proc/kallsyms from the system as requested by the mailing list FAQ
at: http://www.tux.org/lkml/#s4-3. However it has been two days since
I originally sent that email and I haven't see it arrive in the
archives, so that info is available on request.  The dmesg info is
from a post crash boot that should be identical to the pre-crash boot.

If you require more/different information just let me know and I will
try to obtain it.

Thank you for your help.

--
				-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
Linux version 2.6.9-42.0.3.ELsmp ([email protected])
(gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP
Fri Oct 6 06:28:26 CDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000480000000 (usable)
ACPI: RSDP (v000 ACPIAM                                ) @ 0x00000000000f97a0
ACPI: RSDT (v001 A M I  OEMRSDT  0x04000610 MSFT 0x00000097) @ 0x000000007fff000
0
ACPI: FADT (v002 A M I  OEMFACP  0x04000610 MSFT 0x00000097) @ 0x000000007fff020
0
ACPI: MADT (v001 A M I  OEMAPIC  0x04000610 MSFT 0x00000097) @ 0x000000007fff038
0
ACPI: OEMB (v001 A M I  OEMBIOS  0x04000610 MSFT 0x00000097) @ 0x000000007ffff04
0
ACPI: SRAT (v001 A M I  OEMSRAT  0x04000610 MSFT 0x00000097) @ 0x000000007fff355
0
ACPI: DSDT (v001  H8DAR H8DAR010 0x00000000 INTL 0x02002026) @ 0x000000000000000
0

CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0
CPU0: AMD Opteron(tm) Processor 250 stepping 01
per-CPU timeslice cutoff: 1024.16 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 rip 6000 rsp 10478fd9f58
Initializing CPU#1
Calibrating delay using timer specific routine.. 4810.25 BogoMIPS (lpj=2405126)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1
AMD Opteron(tm) Processor 250 stepping 01
Total of 2 processors activated (9627.98 BogoMIPS).
Using local APIC timer interrupts.
Detected 12.528 MHz APIC timer.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs


Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: CD-224E-N, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide1 at 0x170-0x177,0x376 on irq 15
Probing IDE interface ide0...
Probing IDE interface ide2...
Probing IDE interface ide3...
Probing IDE interface ide4...
Probing IDE interface ide5...
hdc: ATAPI 24X CD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27

SCSI subsystem initialized
libata version 1.20 loaded.
sata_mv 0000:03:03.0: version 0.6
ACPI: PCI interrupt 0000:03:03.0[A] -> GSI 42 (level, low) -> IRQ 185
sata_mv 0000:03:03.0: 32 slots 4 ports unknown mode IRQ via INTx
ata1: SATA max UDMA/133 cmd 0x0 ctl 0xFFFFFF00000A2120 bmdma 0x0 irq 185
ata2: SATA max UDMA/133 cmd 0x0 ctl 0xFFFFFF00000A4120 bmdma 0x0 irq 185
ata3: SATA max UDMA/133 cmd 0x0 ctl 0xFFFFFF00000A6120 bmdma 0x0 irq 185
ata4: SATA max UDMA/133 cmd 0x0 ctl 0xFFFFFF00000A8120 bmdma 0x0 irq 185
ata1: no device found (phy stat 00000000)
scsi0 : sata_mv
ata2: no device found (phy stat 00000000)
scsi1 : sata_mv
ata3: no device found (phy stat 00000000)
scsi2 : sata_mv
ata4: no device found (phy stat 00000000)
scsi3 : sata_mv
3ware 9000 Storage Controller device driver for Linux v2.26.04.010.
ACPI: PCI interrupt 0000:03:01.0[A] -> GSI 40 (level, low) -> IRQ 177
scsi4 : 3ware 9000 Storage Controller
3w-9xxx: scsi4: Found a 3ware 9000 Storage Controller at 0xfe9ff000, IRQ: 177.
3w-9xxx: scsi4: Firmware FE9X 3.04.01.011, BIOS BE9X 3.04.00.002, Ports: 4.
  Vendor: AMCC      Model: 9550SX-4LP DISK   Rev: 3.04
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 1953083392 512-byte hdwr sectors (999979 MB)
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 1953083392 512-byte hdwr sectors (999979 MB)
SCSI device sda: drive cache: write back, no read (daft)
 sda: sda1 sda2
Attached scsi disk sda at scsi4, channel 0, id 0, lun 0
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: [email protected]
cdrom: open failed.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.

EXT3 FS on dm-0, internal journal
cdrom: open failed.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev sda1, type ext3), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-1, type ext3), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-2, type ext3), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-3, type ext3), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-4, type ext3), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-6, type ext3), uses xattr
Adding 4095992k swap on /dev/VolGroup00/LogVol05.  Priority:-1 extents:1

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux