Re: Back to the future.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sunday, 29 April 2007 10:23, Pavel Machek wrote:
> Hi!
> 
> > > > The freezer has *caused* those deadlocks (eg by stopping threads that were 
> > > > needed for the suspend writeouts to succeed!), not solved them.
> > > 
> > > I can't remember anything like this, but I believe you have a specific test
> > > case in mind.
> > 
> > Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place?
> > 
> > Rafael, you really don't know what you're talking about, do you?
> > 
> > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, 
> > exactly *because* they do IO. You claim that kernel threads shouldn't do 
> > IO, but that's the point: if you cannot do IO when snapshotting to disk, 
> > here's a damn big clue for you: how do you think that snapshot is going to 
> > get written?
> > 
> > I *guarantee* you that we've had a lot more problems with threads that 
> > should *not* have been frozen than with those hypothetical threads that 
> > you think should have been frozen.
> 
> Well, we had nasty corruption on XFS, caused by thread that was not
> frozen and should be. (While the other case leads "only" to deadlocks,
> so it is easier to debug.)
> 
> The locking point.. when I added freezing to swsusp, I knew very
> little about kernel locking, so I "simply" decided to avoid the
> problem altogether... using the freezer.
> 
> You may be right that locks are not a big problem for the hibernation
> after all; I just do not know.

Still, I think, if a kernel thread is a part of a device driver, then _in_
_principle_ it needs _some_ synchronization with the driver's suspend/freeze
and resume/thaw callbacks.  For example, it's reasonable to assume that the
thread should be quiet between suspend/freeze and resume/thaw.

With the freezing of kernel threads we provide a simple means of such
synchronization: use try_to_freeze() in a suitable place of your kernel thread
and you're done.  [Well, there should be a second part for making the thread
die if the thaw callback doesn't find the device, but that's in the works.]

Without it, there may be race conditions that we are not even aware of and that
may trigger in, say, 1 in 10 suspends or so and I wish you good luck with
debugging such things.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux