Re: Network receive stall avoidance (was [PATCH 2/9] deadlock prevention core)

Andrew Morton wrote:
handwaving

- The mmap(MAP_SHARED)-the-whole-world scenario should be fixed by
  mm-tracking-shared-dirty-pages.patch.  Please test it and if you are
  still able to demonstrate deadlocks, describe how, and why they
  are occurring.

OK, but please see "atomic 0 order allocation failure" below.

- We expect that the lots-of-dirty-anon-memory-over-swap-over-network
  scenario might still cause deadlocks.

  I assert that this can be solved by putting swap on local disks.  Peter
  asserts that this isn't acceptable due to disk unreliability.  I point
  out that local disk reliability can be increased via MD, all goes quiet.

  A good exposition which helps us to understand whether and why a
  significant proportion of the target user base still wishes to do
  swap-over-network would be useful.

I don't care much about swap-over-network, Peter does.  I care about
reliable remote disks in general, and reliable nbd in particular.

- Assuming that exposition is convincing, I would ask that you determine
  at what level of /proc/sys/vm/min_free_kbytes the swap-over-network
  deadlock is no longer demonstrable.

Earlier, I wrote: "we need to actually fix the out of memory
deadlock/performance bug right now so that remote storage works
properly."  It is actually far more likely that memory reclaim stuffups
will cause TCP to drop block IO packets here and there than that we
will hit some endless retransmit deadlock.  But dropping packets and
thus causing TCP stalls during block writeout is a very bad thing too.
I do not think you could call a system that exhibits such behavior a
particularly reliable one.

So rather than just the word deadlock, let us add "or atomic 0 order
alloc failure during TCP receive" to the challenge.  Fair?
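
One way to make that extended challenge checkable is to scan the kernel
log after each test run and count order 0 allocation failures, alongside
the min_free_kbytes value the run used.  The following is only a rough
sketch, not anything from the patch set: the function names are made up
for illustration, the regular expression assumes the usual
"page allocation failure" log wording (which has varied between kernel
versions), and it does not check the allocation mode or whether the
failure happened in the TCP receive path, so each hit still needs its
backtrace read by hand.

```python
import re
from pathlib import Path

# Assumed log wording: older kernels print "page allocation failure. order:N",
# newer ones "page allocation failure: order:N". Adjust if yours differs.
ALLOC_FAILURE = re.compile(r"page allocation failure[.:] order:(\d+)")


def count_order0_failures(log_lines):
    """Count logged page allocation failures whose order is 0."""
    count = 0
    for line in log_lines:
        match = ALLOC_FAILURE.search(line)
        if match and int(match.group(1)) == 0:
            count += 1
    return count


def read_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Return the min_free_kbytes setting stored at `path` as an integer."""
    return int(Path(path).read_text().strip())


def passes_challenge(log_lines):
    """Hypothetical pass criterion: no order 0 allocation failure was logged.

    A deadlock has to be detected separately (the test simply never
    finishes); this only covers the allocation failure half.
    """
    return count_order0_failures(log_lines) == 0
```

Feeding it the dmesg output captured after a run at a given
min_free_kbytes setting gives a yes/no answer for that setting, which can
then be repeated across a range of values.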

On the client send side, Peter already showed an easily reproducible
deadlock which we avoid by using elevator-noop.  The bug is still
there, it is just under a different rock now.

Regards,

Daniel