Re: Linux 2.6.22: had to reboot after OOM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 15 Jul 2007 15:03:21 +0300 Sami Farin <[email protected]> wrote:

> After I got this error [1], system got real slow, like 386 having 32 MB of RAM
> and swapping constantly.
> My system is P4 SMP with 1GB of RAM.
> 
> I got this same behavior with 2.6.19, too, but then I used GNU cp v6.9
> which had micro-optimization which did not bother doing read() for regular
> files, like /proc/vmstat , instead, it generated 0-byte destination file
> (I noticed that only after rebooting).
> So I do not have useful debug info for 2.6.19.
> 
> And now my version of cp does not have that micro-optimization.
> 
> I also attached vmstat and slabinfo , in case somebody can figure out
> something out of them.
> I also now use --probes=5 for ipset nethash... uses less mem but is slower.

I don't know why your system went bad like that.

> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
>  0  2 762376 748392      0  17104   11    8    52    65   64    53 10  3 85  3  0
>  0  3 762376 748664      0  17144  456   12   908    24  237  3044  2  3 26 70  0
>  0  2 762376 749544      0  16492  464  284  1440   288  248  3633  2  5 14 79  0
>  0  3 762376 749916      0  16392  432    0  1368    16  239  3360  1  2  5 93  0
>  0  3 762376 751108      0  15740  512  432  1656   460  214  3751  1  3 28 69  0
>  0  7 762376 751740      0  15512  532  104  1920   131  237  4014  2  4  5 90  0
>  0 10 762436 755064      0  15652  200 3172   992  3264  274  5086  1  3  1 96  0
>  0 11 762504 757208      0  15156  144  836   624   926  204  2602  1  2  0 98  0
>  0 13 762680 766384      0  12740   36 7284    96  7308  252  8643  0  5  0 95  0
>  0 14 762772 771312      0  11696   12 3908   132  3912  253  5153  0  3  0 97  0
>  0 13 763044 778536      0  10356  184 8088   572  8088  249  6889  1  3  0 97  0
>  0 15 763092 781380      0  10284    0 2136   532  2160  176  2387  1  2  0 98  0
>  0 16 763360 785428      0  10008   60 4884  1312  4985  281  4798  0  5  0 96  0
>  0 19 763636 791192      0   8548   24 6264   476  6268  219  6732  0  3  0 97  0
>  0 18 763824 799184      0   6240    0 7500    52  7504  217  7080  1  6  0 94  0
>  0 18 763872 802196      0   5816    0 2628   552  2640  194  3851  0  4  0 96  0
>  0 28 764048 805072      0   5168   92 1760  1180  1772  185  2703  0  5  0 94  0
>  0 30 764048 806032      0   5256   36  164   364   234  126  1448  0  1  0 99  0
>  0 31 764048 807728      0   4732   32   40    92    84  152  1824  0  3  0 97  0
>  0 35 764056 809372      0   4332 1100  532  9084   804  194  1986  0  7  0 94  0
>  0 74 764012 839544      0   8100 1604    0  5660    12 4216 46853  0  8  0 92  0
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa st
>  0 14 762504 757108   4004  90724   11    8    52    65   64    53 10  3 85  3  0
>  0 13 762676 764608   8768  78084   44 6324   288  6370  251  7697  0  4  0 95  0
>  0 14 762772 770832  12404  68328   12 4868   128  4880  243  6161  0  4  0 96  0
>  0 13 763044 778536  24248  48736  180 7908   548  7912  296  7208  0  3  0 97  0
>  0 16 763092 781380  21196  49004    4 2312   560  2336  185  2524  1  1  0 98  0
>  0 16 763360 784584  40040  27048   60 4088  1312  4189  267  3964  0  4  0 96  0
>  0 19 763636 791192  36552  23804   24 7064   392  7068  227  7600  0  4  0 96  0
>  0 18 763824 798896  36928  15568    0 7352   136  7356  219  6902  0  5  0 94  0
>  0 19 763872 802088  41372   7876    0 2776   548  2788  195  3917  0  4  0 96  0
>  0 25 764048 804636  39896   6932   56 1760   920  1772  170  2602  0  4  0 95  0
>  0 30 764048 806032  39308   6272   72  164   628   234  131  1579  0  2  0 98  0
>  0 30 764048 807500  38316   5768   32   40    84    68  149  1630  0  2  0 98  0
>  1 34 764056 808476  37900   5444   20  192   660   242  184  1883  0  6  0 94  0
>  1 37 764056 811148  38772   1740   20  100   388   104  168  2409  0  5  0 95  0
>  0 38 764056 813056  34164   4536    0    0   124    47  227  2590  0  6  0 94  0
>  0 38 764056 815500  32924   3308   40    0   608     5  228  2448  0  4  0 96  0
> 10 37 764056 817764  30404   3800    0   16   260    20  323  3795  0 10  0 90  0
>  1 37 764056 819548  28552   3744    0   20   148    56  207  2275  0  6  0 94  0
>  1 38 764056 820828  27004   4028   20    8   496    21  216  1703  0  3  0 97  0
>  0 71 764056 820064  28228   3508  600    0  1424     4  140   826  0  2  0 99  0
>  0 73 764028 822568  25928   3404   68   36   332    48  205  2223  0  5  0 95  0
> 
> 
> 
> As a comparison, these are stats after booting, with similar workload:
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
>  2  0 290544  12728      0 444216    2   13   213   285  144   922  7  2 82  9  0
>  0  0 290544  12672      0 444264    0    0    16  1616  270  1881  2  2 96  0  0
>  0  0 290544  12752      0 444280    0    0    16    56  245  1687  2  2 96  1  0
>  0  0 290544  12876      0 444280    0    0     0    31  243  1604  2  1 97  0  0
>  0  0 290544  12520      0 444260   48    0    80    50  333  1708  2  2 94  2  0
>  0  0 290544  12656      0 444424    0    0   136   114  269  1817  3  2 95  1  0
>  0  0 290544  12672      0 444448    0    0    12  1687  330  1822  2  2 97  0  0
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa st
>  2  0 290544  18756 316324 420928    2   13   212   284  144   922  7  2 82  9  0
>  0  0 290544  18716 316360 421112    0    0    32    72  273  1979  2  2 96  0  0
>  0  0 290544  18472 316504 421152    0    0   144    51  256  1644  3  1 97  0  0
>  0  0 290544  18256 316520 421144    0    0    16    63  240  1768  2  3 96  1  0
>  0  0 290544  18264 316520 421160    0    0     0   121  243  1770  2  1 97  0  0
>  0  0 290544  18312 316520 421168    0    0     8    51  235  1789  2  1 97  0  0
>  0  0 290544  18252 316544 421168    0    0    24   204  296  2017  2  1 96  1  0
> 
> ...
> 
> and the rest are after the mem allocation failure.
> 
> 2007-07-15 10:43:02.271042500 <6>[246964.992391] SysRq : Show Memory
> 2007-07-15 10:43:02.271082500 <6>[246964.992399] Mem-info:
> 2007-07-15 10:43:02.271103500 <4>[246964.992401] DMA per-cpu:
> 2007-07-15 10:43:02.271123500 <4>[246964.992405] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> 2007-07-15 10:43:02.271144500 <4>[246964.992409] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> 2007-07-15 10:43:02.271164500 <4>[246964.992412] Normal per-cpu:
> 2007-07-15 10:43:02.271185500 <4>[246964.992415] CPU    0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  50
> 2007-07-15 10:43:02.271205500 <4>[246964.992419] CPU    1: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 usd:  59
> 2007-07-15 10:43:02.271225500 <4>[246964.992422] HighMem per-cpu:
> 2007-07-15 10:43:02.271245500 <4>[246964.992425] CPU    0: Hot: hi:   42, btch:   7 usd:   6   Cold: hi:   14, btch:   3 usd:   1
> 2007-07-15 10:43:02.271266500 <4>[246964.992429] CPU    1: Hot: hi:   42, btch:   7 usd:   4   Cold: hi:   14, btch:   3 usd:   1
> 2007-07-15 10:43:02.271287500 <4>[246964.992434] Active:23152 inactive:648 dirty:8 writeback:50 unstable:0
> 2007-07-15 10:43:02.271307500 <4>[246964.992436]  free:189843 slab:8987 mapped:3892 pagetables:1960 bounce:0
> 2007-07-15 10:43:02.271329500 <4>[246964.992440] DMA free:10384kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
> 2007-07-15 10:43:02.271349500 <4>[246964.992444] lowmem_reserve[]: 0 873 975
> 2007-07-15 10:43:02.271371500 <4>[246964.992451] Normal free:742488kB min:3744kB low:4680kB high:5616kB active:2860kB inactive:212kB present:894080kB pages_scanned:0 all_unreclaimable? no

You have heaps of free ZONE_NORMAL memory.

> 
> -------------------------------------------------------------------------------------
> 
> [1]
> 
> 2007-07-15 10:12:40.640668500 <4>[245143.816358] net/ipv4/netfilter/ip_set_nethash.c: retry: rehashing of set blockedp2pnew triggered: hashsize grows from 262144 to 348651
> 2007-07-15 10:12:40.640672500 <4>[245143.816379] ipset: page allocation failure. order:9, mode:0x20

ipset does an order-9 atomic allocation?  That's optimistic.  Hopefully it's
just doing a probe-for-the-largest-allocation operation, albeit rather stupidly.

ipset is some out-of-tree patch you added, I assume?

> 2007-07-15 10:12:40.640808500 <4>[245143.816548] Normal free:9772kB min:3744kB low:4680kB high:5616kB active:465640kB inactive:228696kB present:894080kB pages_scanned:32 all_unreclaimable? no

No free memory, but lots of presumably-reclaimable pagecache there.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux