Re: iommu_alloc failure and panic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2006-01-28 at 10:48 +1100, Anton Blanchard wrote:
> Hi,
> 
> > I have been testing large numbers of storage devices.  I have 16000 scsi
> > LUNs configured.  (4000 fiberchannel LUNS seen 4 times).  They are
> > configured as 4000 multipath devices using device mapper.  I have 100 of
> > those devices configured as a logical volume using LVM2.  Once I start
> > bonnie++ using one of those logical volumes I start seeing iommu_alloc
> > failures and then a panic.  I don't have this issue when accessing the
> > individual devices or the individual multipath devices.  Only when
> > conglomerated into a logical volume.
> > 
> > Here is the syslog output:
> > 
> > Jan 26 14:56:41 linux kernel: iommu_alloc failed, tbl c00000000263c480 vaddr c0000000d7967000 npages 10
> 
> This stuff should be OK since the lpfc driver should handle the iommu
> filling up. Im guessing since you have so many LUNs you can get a whole
> lot of outstanding IO, enough to fill up the TCE table.
> 
> > DAR: C000000600711710
> 
> > NIP [C00000000000F7D0] .validate_sp+0x30/0xc0
> > LR [C00000000000FA2C] .show_stack+0xec/0x1d0
> > Call Trace:
> > [C0000000076DBBC0] [C00000000000FA18] .show_stack+0xd8/0x1d0 (unreliable)
> > [C0000000076DBC60] [C000000000433838] .schedule+0xd78/0xfb0
> > [C0000000076DBDE0] [C000000000079FC0] .worker_thread+0x1b0/0x1c0
> 
> This is interesting, an oops in validate_sp which is a pretty boring
> function. We have seen this in the past when you overflow your kernel
> stack. The bottom of the kernel stack contains the threadinfo struct and
> in that we have the task_cpu() data.
> 
> When we overflow the stack and overwrite the threadinfo we end up with 
> a crazy number for task_cpu() and then oops when doing 
> hardirq_ctx[task_cpu(p)]. Can you turn on DEBUG_STACKOVERFLOW and
> DEBUG_STACK_USAGE and see if you get any stack overflow warnings? 

It looks like they are already on:

CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y

> The
> second option will also allow you to do sysrq t and dump the most stack
> each process has used at that point.

The machine is frozen after the panic. 


Mark.


> 
> Anton
-- 
Mark Haverkamp <[email protected]>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux