Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

On Wed, Nov 07, 2007 at 02:56:45PM +0100, Frank van Maarseveen wrote:
> On Tue, Nov 06, 2007 at 05:13:50PM -0600, Robert Hancock wrote:
> > Frank van Maarseveen wrote:
> > >For quite some time I've been seeing occasional lockups spread over 50 different
> > >machines I'm maintaining. Symptom: a page allocation failure with order:1,
> > >GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
> > >pages, almost no swap used) followed by a lockup (everything dead). I've
> > >collected all (12) crash cases which occurred the last 10 weeks on 50
> > >machines total (i.e. 1 crash per machine every 41 weeks on average). The kernel
> > >messages are summarized to show the interesting part (IMO) they have
> > >in common. Over the years this has become the crash cause #1 for stable
> > >kernels for me (fglrx doesn't count ;).
> > >
> > >One note: I suspect that reporting a GFP_ATOMIC allocation failure in a
> > >network driver via that same driver (netconsole) may not be the smartest
> > >thing to do and this could be responsible for the lockup itself. However,
> > >the initial page allocation failure remains and I'm not sure how to
> > >address that problem.
> > >
> > >I still think the issue is memory fragmentation but if so, it looks
> > >a bit extreme to me: One system with 2GB of ram crashed after a day,
> > >merely running a couple of TCP server programs. All systems have either
> > >1 or 2GB ram and at least 1G of (merely unused) swap.
> > 
> > These are all order-1 allocations for received network packets that need 
> > to be allocated out of low memory (assuming you're using a 32-bit 
> > kernel), so it's quite possible for them to fail on occasion. (Are you 
> > using jumbo frames?)
> 
> I don't use jumbo frames.
> 
> 
> > 
> > That should not be causing a lockup though.. the received packet should 
> > just get dropped.
> 
> Ok, packet loss is recoverable to some extent. When a system crashes
> I often see a couple of page allocation failures in the same second,
> all reported via netconsole.

[snip]

I've grepped for 'Normal free:', assuming that is the low memory you mention,
to see how it correlates. Of the 12 cases, 7 crashed and 5 recovered:

Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB active:235196kB inactive:104336kB present:889680kB pages_scanned:44 all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB active:235196kB inactive:104336kB present:889680kB pages_scanned:44 all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB active:235196kB inactive:104336kB present:889680kB pages_scanned:44 all_unreclaimable? no 
crash

Oct 29 11:48:07 somero Normal free:5412kB min:3736kB low:4668kB high:5604kB active:288068kB inactive:105708kB present:889680kB pages_scanned:0 all_unreclaimable? no 
Oct 29 11:48:07 somero Normal free:6704kB min:3736kB low:4668kB high:5604kB active:287940kB inactive:105084kB present:889680kB pages_scanned:0 all_unreclaimable? no 
Oct 29 11:48:08 somero Normal free:8332kB min:3736kB low:4668kB high:5604kB active:287760kB inactive:104240kB present:889680kB pages_scanned:54 all_unreclaimable? no 
ok (more cases with increasing free memory not received via netconsole)

Oct 26 11:27:01 naantali Normal free:3976kB min:3736kB low:4668kB high:5604kB active:318568kB inactive:152928kB present:889680kB pages_scanned:0 all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB active:318256kB inactive:152856kB present:889680kB pages_scanned:0 all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB active:318256kB inactive:152856kB present:889680kB pages_scanned:0 all_unreclaimable? no 
crash

Oct 12 14:56:44 koli Normal free:11628kB min:3736kB low:4668kB high:5604kB active:238112kB inactive:157232kB present:889680kB pages_scanned:0 all_unreclaimable? no 
ok

Oct  1 08:51:58 salla Normal free:5496kB min:3736kB low:4668kB high:5604kB active:409500kB inactive:46388kB present:889680kB pages_scanned:137 all_unreclaimable? no 
Oct  1 08:51:59 salla Normal free:7396kB min:3736kB low:4668kB high:5604kB active:408292kB inactive:46740kB present:889680kB pages_scanned:0 all_unreclaimable? no 
crash

Sep 17 10:34:49 lokka Normal free:39756kB min:3736kB low:4668kB high:5604kB active:236916kB inactive:175624kB present:889680kB pages_scanned:0 all_unreclaimable? no 
ok

Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB active:424420kB inactive:45380kB present:889680kB pages_scanned:144 all_unreclaimable? no 
Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB active:424420kB inactive:45380kB present:889680kB pages_scanned:144 all_unreclaimable? no 
crash

Sep 20 10:32:50 nivala Normal free:27276kB min:3736kB low:4668kB high:5604kB active:354084kB inactive:104152kB present:889680kB pages_scanned:260 all_unreclaimable? no 
crash

Sep  3 09:46:11 lahti Normal free:26200kB min:3736kB low:4668kB high:5604kB active:242088kB inactive:94900kB present:889680kB pages_scanned:0 all_unreclaimable? no 
Sep  3 09:46:11 lahti Normal free:28096kB min:3736kB low:4668kB high:5604kB active:238756kB inactive:96184kB present:889680kB pages_scanned:0 all_unreclaimable? no 
ok (one additional case with "Normal free:31888kB" not received via netconsole)

Aug 30 10:40:46 ropi Normal free:14372kB min:3736kB low:4668kB high:5604kB active:393508kB inactive:93644kB present:889680kB pages_scanned:0 all_unreclaimable? no 
ok

Aug 30 10:46:58 ivalo Normal free:9808kB min:3736kB low:4668kB high:5604kB active:392388kB inactive:106044kB present:889680kB pages_scanned:96 all_unreclaimable? no 
Aug 30 10:46:58 ivalo Normal free:12324kB min:3736kB low:4668kB high:5604kB active:390276kB inactive:105852kB present:889680kB pages_scanned:32 all_unreclaimable? no 
crash

Aug 31 16:30:02 lokka Normal free:11840kB min:3736kB low:4668kB high:5604kB active:206760kB inactive:172036kB present:889680kB pages_scanned:7 all_unreclaimable? no 
Aug 31 16:30:02 lokka Normal free:13268kB min:3736kB low:4668kB high:5604kB active:205824kB inactive:171976kB present:889680kB pages_scanned:0 all_unreclaimable? no 
crash
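
For anyone who wants to repeat this correlation on their own logs, here is a minimal Python sketch. It assumes lines in exactly the format shown above; the function names (parse_normal_line, summarize) and the SAMPLE structure are made up for illustration, and only two of the twelve cases are included as sample input.

```python
import re

# Matches the "Normal free:" lines as they appear in the logs above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) Normal "
    r"free:(?P<free>\d+)kB min:(?P<min>\d+)kB "
    r"low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def parse_normal_line(line):
    """Return host, timestamp and watermark values (kB), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = {"stamp": m.group("stamp"), "host": m.group("host")}
    for key in ("free", "min", "low", "high"):
        rec[key] = int(m.group(key))
    return rec


def summarize(incidents):
    """incidents: list of (outcome, [log lines]); one summary row per incident."""
    rows = []
    for outcome, lines in incidents:
        recs = [r for r in map(parse_normal_line, lines) if r is not None]
        if not recs:
            continue
        lowest = min(recs, key=lambda r: r["free"])
        rows.append({
            "host": lowest["host"],
            "outcome": outcome,
            "lowest_free_kb": lowest["free"],
            # How far the lowest observed "free" sits above the min watermark.
            "margin_over_min_kb": lowest["free"] - lowest["min"],
        })
    return rows


# Two cases copied from the list above (hypothetical grouping structure).
SAMPLE = [
    ("crash", [
        "Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB "
        "high:5604kB active:235196kB inactive:104336kB present:889680kB "
        "pages_scanned:44 all_unreclaimable? no",
    ]),
    ("ok", [
        "Oct 12 14:56:44 koli Normal free:11628kB min:3736kB low:4668kB "
        "high:5604kB active:238112kB inactive:157232kB present:889680kB "
        "pages_scanned:0 all_unreclaimable? no",
    ]),
]

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Sorting the resulting rows by margin_over_min_kb puts crashes and recoveries side by side, which makes it easier to see whether low free memory in the Normal zone separates the two groups.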

I'll try "echo 40000 >/proc/sys/vm/min_free_kbytes" but I'm not sure
whether it applies to all memory or only to low memory, or whether it
would make a difference in practice.
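
As a rough aid for judging what a larger min would mean, the logged watermarks are consistent with low = min + min/4 and high = min + min/2 when counted in 4 kB pages. The Python sketch below only reproduces that arithmetic. The 4 kB page size and the helper name watermarks_from_min are assumptions, and the relation is inferred from the numbers above rather than taken from the kernel source. How much of min_free_kbytes ends up in the Normal zone is not shown by these logs, so the 40000 kB input is a hypothetical upper bound for that zone.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on 32-bit x86


def watermarks_from_min(min_kb, page_kb=PAGE_KB):
    """Return (low_kb, high_kb) using the ratio observed in the logs above."""
    min_pages = min_kb // page_kb
    low_pages = min_pages + min_pages // 4   # min plus a quarter
    high_pages = min_pages + min_pages // 2  # min plus a half
    return low_pages * page_kb, high_pages * page_kb


if __name__ == "__main__":
    # Reproduces the logged Normal zone values: low:4668kB high:5604kB.
    print(watermarks_from_min(3736))   # (4668, 5604)
    # Hypothetical: if the Normal zone's min were the full 40000 kB.
    print(watermarks_from_min(40000))  # (50000, 60000)
```

If that relation holds, raising min also raises the low and high watermarks, so reclaim would start earlier and leave more headroom for atomic allocations. Whether that helps with order-1 failures caused by fragmentation is a separate question.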

-- 
Frank