Re: [BUG?] OOM with large cache....(x86_64, 2.6.24-rc3-git1, nohz)

On Tue, 2007-11-20 at 15:13 +1100, Nick Piggin wrote:
> On Tuesday 20 November 2007 11:59, Ian Kumlien wrote:
> > Hi,
> >
> > I have had this before and sent a mail about it.
> >
> > It seems like the disk cache is still in use and is never shrunk. This
> > happened under an odd load, though: trackerd started indexing a bit late,
> > on top of the other workload, which is a large bittorrent seed/download.
> >
> > The bittorrent app is the one that drives up the diskcache.
> >
> > I don't think that trackerd was triggering it; I actually upgraded the
> > kernel since it kept happening on 2.6.23...
> >
> > I really don't know what other information I can provide.
> >
> > free from now (some hours later)
> > vmstat from now ^
> >
> > and the dmesg log.
> >
> > Ideas? Comments?
> >
> > free:
> >              total       used       free     shared    buffers     cached
> > Mem:       2056484    2039736      16748          0      20776    1585408
> > -/+ buffers/cache:     433552    1622932
> > Swap:      2530180     426020    2104160
> > ---
> >
> > vmstat:
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  0  0 426020  16612  20580 1585848   26   21   684    56   34   51  5  3 88  4
> > ---
> >
> > --- 8<--- 8<---
> > ntpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
> >
> > Call Trace:
> >  [<ffffffff80270fd6>] oom_kill_process+0xf6/0x110
> >  [<ffffffff80271476>] out_of_memory+0x1b6/0x200
> >  [<ffffffff80273a07>] __alloc_pages+0x387/0x3c0
> >  [<ffffffff80275b03>] __do_page_cache_readahead+0x103/0x260
> >  [<ffffffff802703f1>] filemap_fault+0x2f1/0x420
> >  [<ffffffff8027bcbb>] __do_fault+0x6b/0x410
> >  [<ffffffff802499de>] recalc_sigpending+0xe/0x40
> >  [<ffffffff8027d9dd>] handle_mm_fault+0x1bd/0x7a0
> >  [<ffffffff80212cda>] save_i387+0x9a/0xe0
> >  [<ffffffff80227e76>] do_page_fault+0x176/0x790
> >  [<ffffffff8020bacf>] sys_rt_sigreturn+0x35f/0x400
> >  [<ffffffff806acbf9>] error_exit+0x0/0x51
> >
> > Mem-info:
> > DMA per-cpu:
> > CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> > CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> > DMA32 per-cpu:
> > CPU    0: Hot: hi:  186, btch:  31 usd: 148   Cold: hi:   62, btch:  15 usd:  60
> > CPU    1: Hot: hi:  186, btch:  31 usd: 116   Cold: hi:   62, btch:  15 usd:  18
> > Active:241172 inactive:241825 dirty:0 writeback:0 unstable:0
> >  free:3388 slab:8095 mapped:149 pagetables:6263 bounce:0
> > DMA free:7908kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:7436kB pages_scanned:0 all_unreclaimable? yes
> > lowmem_reserve[]: 0 2003 2003 2003
> > DMA32 free:5644kB min:5716kB low:7144kB high:8572kB active:964688kB inactive:967188kB present:2052008kB pages_scanned:5519125 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 0
> > DMA: 5*4kB 4*8kB 3*16kB 4*32kB 6*64kB 5*128kB 4*256kB 3*512kB 0*1024kB 0*2048kB 1*4096kB = 7908kB
> > DMA32: 95*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 2*512kB 0*1024kB 0*2048kB 1*4096kB = 5644kB
> > Swap cache: add 1979600, delete 1979592, find 144656/307405, race 1+17
> > Free swap  = 0kB
> > Total swap = 2530180kB
> > Free swap:            0kB
> > 524208 pages of RAM
> > 10149 reserved pages
> > 5059 pages shared
> > 8 pages swap cached
> > Out of memory: kill process 8421 (trackerd) score 1016524 or a child
> > Killed process 8421 (trackerd)
> 
> It's also used up all your 2.5GB of swap. The output of your `free` shows
> a fair bit of disk cache there, but it also shows a lot of swap free, which
> isn't the case at oom-time.

Yes, as I said, those were from several hours later...
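
To make the difference between the two snapshots concrete, here is a small sketch that redoes the arithmetic on the figures quoted above. The dictionary layout and function names are my own, purely for illustration; the numbers are copied from the `free` output and from the swap lines of the OOM report.

```python
# Figures in kB, copied from the `free` output (taken hours later)
# and from the "Free swap"/"Total swap" lines of the OOM report.
free_snapshot = {
    "used": 2039736, "free": 16748,
    "buffers": 20776, "cached": 1585408,
    "swap_total": 2530180, "swap_free": 2104160,
}
oom_report = {"swap_total": 2530180, "swap_free": 0}


def used_without_cache(m):
    """Memory held by applications: `used` minus buffers and page cache."""
    return m["used"] - m["buffers"] - m["cached"]


def free_with_cache(m):
    """Memory that is free or could be reclaimed from buffers/cache."""
    return m["free"] + m["buffers"] + m["cached"]


def swap_used_fraction(m):
    """Fraction of swap in use, between 0.0 and 1.0."""
    return (m["swap_total"] - m["swap_free"]) / m["swap_total"]


if __name__ == "__main__":
    print("apps, kB:", used_without_cache(free_snapshot))
    print("free + cache, kB:", free_with_cache(free_snapshot))
    print("swap used, later: {:.0%}".format(swap_used_fraction(free_snapshot)))
    print("swap used, at OOM: {:.0%}".format(swap_used_fraction(oom_report)))
```

The first two values reproduce the "-/+ buffers/cache" line of `free`, and the last two show that swap was completely full at OOM time but mostly free again in the later snapshot.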

> Unfortunately, we don't show NR_ANON_PAGES in these stats, but at a guess,
> I'd say that the file cache is mostly shrunk and you still don't have
> enough memory. trackerd probably has a memory leak in it, or else is just
> trying to allocate more memory than you have. Is this a regression?
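
As a rough cross-check of the zone figures in the OOM report, the sketch below compares each zone's free memory with its min watermark. It assumes 4 kB pages and uses a simplified form of the kernel's watermark test (free pages must exceed min plus the applicable lowmem reserve), ignoring the adjustments the allocator makes for specific allocation flags. The names are illustrative only.

```python
PAGE_KB = 4  # assumption: 4 kB pages, the usual x86_64 page size

# Per-zone figures copied from the OOM report. "reserve_pages" is the
# lowmem_reserve[] entry (in pages) assumed to apply to an allocation
# that could also be served from DMA32.
zones = {
    "DMA":   {"free_kb": 7908, "min_kb": 20,   "reserve_pages": 2003},
    "DMA32": {"free_kb": 5644, "min_kb": 5716, "reserve_pages": 0},
}


def passes_min_watermark(zone):
    """Simplified check: free pages must exceed min + lowmem reserve."""
    free_pages = zone["free_kb"] // PAGE_KB
    min_pages = zone["min_kb"] // PAGE_KB
    return free_pages > min_pages + zone["reserve_pages"]


def total_free_kb(all_zones):
    """Sum of free memory over all zones, in kB."""
    return sum(z["free_kb"] for z in all_zones.values())


if __name__ == "__main__":
    for name, zone in zones.items():
        print(name, "passes min watermark:", passes_min_watermark(zone))
    # The global "free:3388" line is in pages; it should match the zones.
    print("free pages in kB:", 3388 * PAGE_KB, "zones sum:", total_free_kb(zones))
```

Under these assumptions neither zone passes: DMA32 is below its own min, and the free memory in DMA is smaller than its lowmem reserve. None of this says how much of the active/inactive memory is anonymous rather than file cache, which is the figure missing from the report.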

I have had it happen twice before, without tracker running...

It didn't quite get to the OOM stage; it just failed a lot of allocations
while having 1.5 GB or more of memory locked in cache.

http://marc.info/?l=linux-kernel&m=118895576924867&w=2
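
Since my `free` and `vmstat` output was taken hours after the event, something like the following could record the numbers while the failures are actually happening. This is only a sketch: the function names are made up, and it assumes the usual "Name: value kB" layout of /proc/meminfo.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Name:  <number> [kB]" lines.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def summarize(info, keys=("MemFree", "Buffers", "Cached", "SwapFree")):
    """One-line summary of the fields relevant to this report."""
    return " ".join("{}={}kB".format(k, info.get(k, "?")) for k in keys)


def sample(path="/proc/meminfo"):
    """Read the file once and return a timestamped summary line."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return "{} {}".format(time.strftime("%H:%M:%S"), summarize(info))
```

Calling `sample()` every few seconds from a loop and appending the result to a file would give a record of cache size and free swap leading up to the next failure.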

I saw that again and thought "Let's update and see what happens", and then
this happened. I haven't had the same version of trackerd go haywire on
me before, and I haven't had an OOM kill gnome-session or metacity before.

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
