Re: [PATCH 1b/7] dlm: core locking

On Monday 02 May 2005 16:51, Lars Marowsky-Bree wrote:
> On 2005-04-30T07:12:46, Daniel Phillips <[email protected]> wrote:
> > process.  And obviously, there already is some reliable starting point or
> > cman would not work.  So let's just expose that and have a better cluster
> > stack.
>
> Most memberships internally construct such a fixed starting point from
> voting or other 'chatty' techniques.

But running a whole voting algorithm from square one makes no sense at all, 
because cman has already taken care of the first step.  Cman just fails to 
expose the result in the obvious way.  (I believe this remains the case in 
the current code.  Patrick, could you confirm or deny please, and could we 
please have a pointer to the latest incarnation of cman?)

Now, please actually take a look at one of those voting schemes and chances 
are, you'll just see a perverse way of picking the lowest-numbered node.  But 
cman already knows which one that is, and even better, it knows the exact 
order each node joined the cluster.  So does every other node!

So we can just allow the oldest cluster node to supervise a full-fancy 
election (if indeed anything fancy is needed) or if it is too lazy for that, 
merely to designate the actual election master and then go back to whatever 
it was doing.  In this way, we compress dozens of lines of 
hard-to-read-and-maintain boilerplate cluster code running on multiple nodes 
and taking up valuable recovery time... into... _nothing_.

See?

So let's lose the "chatty" code and use the sensible, obvious approach that 
cman already uses for itself.
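
To make the point concrete, here is a rough userspace sketch (Python, not 
kernel code).  It assumes every node already holds the same membership list 
with a per-node join sequence number; the names Member, join_seq and 
oldest_node are made up for illustration and are not cman's actual 
interface.  Each node computes the same answer locally, with no messages 
exchanged:

```python
from typing import NamedTuple, Sequence


class Member(NamedTuple):
    node_id: int
    join_seq: int  # hypothetical: counter assigned when the node joined


def oldest_node(members: Sequence[Member]) -> int:
    """Return the id of the longest-standing member (lowest join_seq)."""
    if not members:
        raise ValueError("empty membership")
    return min(members, key=lambda m: m.join_seq).node_id


# Two nodes hold the same membership, possibly in a different local order.
view_on_a = [Member(7, 1), Member(2, 3), Member(5, 2)]
view_on_b = [Member(5, 2), Member(7, 1), Member(2, 3)]

# Both views yield the same supervisor without any voting round.
supervisor_a = oldest_node(view_on_a)
supervisor_b = oldest_node(view_on_b)
```

Whether the supervisor then runs a fancier election or simply names a 
master is a separate decision; the sketch only covers the starting point.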

> This is exposed by the membership (providing all nodes in the same order
> on all nodes), however the node level membership does not necessarily
> reflect the service/application level membership. So to get it right,
> you essentially have to run such an algorithm at that level again too.

Yessirree!  But please let's get the easy base-level thing above out of the way 
first, then we can take a good hard look at how service groups need to work 
in order to be simple, sane, etc.  Note: what we want is not so different 
from how cman _already_ handles service groups.  Basically: take the oldest 
node concept (aka stable node enumeration) and apply it to service groups as 
well.  Then we need events from the service groups, just like the main 
cluster membership (which is, in effect, an anonymous service group that all 
cluster nodes must join before they can join any other service group).  To be 
sure, cman is more-or-less designed _and documented_ this way already - we 
just need to do a few iterative improvements to turn it into a truly sensible 
gizmo.
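
A sketch of the shape I have in mind, again in userspace Python and under 
my own assumptions: the Cluster class, its method names and the callback 
signature are all hypothetical, not what cman exports today.  It shows 
join-ordered membership per service group, the rule that a node joins the 
cluster before any service group, and events delivered for both:

```python
from typing import Callable, Dict, List

# Hypothetical callback signature: (group name, event name, node id).
Listener = Callable[[str, str, int], None]


class Cluster:
    def __init__(self) -> None:
        self.members: List[int] = []            # node ids in join order
        self.groups: Dict[str, List[int]] = {}  # per-group ids in join order
        self.listeners: List[Listener] = []

    def _emit(self, group: str, event: str, node_id: int) -> None:
        for callback in self.listeners:
            callback(group, event, node_id)

    def join_cluster(self, node_id: int) -> None:
        if node_id not in self.members:
            self.members.append(node_id)
            self._emit("<cluster>", "join", node_id)

    def join_group(self, group: str, node_id: int) -> None:
        # Cluster membership comes first, as with the anonymous base group.
        if node_id not in self.members:
            raise ValueError("node must join the cluster first")
        order = self.groups.setdefault(group, [])
        if node_id not in order:
            order.append(node_id)
            self._emit(group, "join", node_id)

    def leave_group(self, group: str, node_id: int) -> None:
        order = self.groups.get(group, [])
        if node_id in order:
            order.remove(node_id)
            self._emit(group, "leave", node_id)
```

The join order of a service group is independent of the cluster join 
order, which is exactly what lets each group have its own stable 
enumeration.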

> True enough it would be helpful if the group membership service provided
> such, but here we're at the node level.

It does, we just need to extract the diamond from the, ehm, rough ground.

> > But note that it _can_ use the oldest cluster member as a recovery
> > master, or to designate a recovery master.  It can, and should - there
> > is no excuse for making this any more complex than it needs to be.
>
> The oldest node might not be running that particular service, or it
> might not be healthy. To figure that out, you need to vote.

Not necessary!  Remember, we also have service groups.  Membership in each 
service group can (read: should) follow the same rules as cluster membership, 
and offers a similar, stable enumeration.  That is, the oldest member of each 
service group is, by default, the boss.  Therefore, except for certain 
recovery intervals that have to be handled by barriers, every node in the 
cluster always knows the identity of the boss of a particular service group.
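
For illustration only, a sketch of that default rule.  The function name 
group_boss and the "departed" set are my own inventions, and I assume the 
departures reach every node through membership events.  The recovery 
interval, where barriers are needed, is deliberately not modelled:

```python
from typing import Iterable, List, Optional


def group_boss(join_order: List[int],
               departed: Iterable[int] = ()) -> Optional[int]:
    """Return the oldest member of a service group still present.

    join_order lists node ids in the order they joined this group;
    departed holds nodes known (via membership events) to have left
    or failed.  Returns None if nobody is left.
    """
    gone = set(departed)
    for node_id in join_order:
        if node_id not in gone:
            return node_id
    return None


# Node 4 joined this service group first, so it is the boss by default.
boss = group_boss([4, 9, 1])

# Once node 4 is known to be gone, every node picks node 9, with no vote.
boss_after_failure = group_boss([4, 9, 1], departed=[4])
```

Since only nodes running the service appear in join_order, the case where 
the oldest cluster node does not run the service never comes up.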

> This is straying a bit from LKML issues, maybe it ought to be moved to
> one of the clustering lists.

It is very much on-topic here, and thank you for driving it.

The reason this infrastructure track is on topic is that, without this 
background, no core maintainer has the context they need to know why we 
think things should be done one way versus another in (g)dlm, let alone the 
incoming gfs code.

In the end, we will hatch a lovely kernel API, but not if we cluster mavens 
are the only ones who actually have a sense of direction.  Left to discuss 
the issues only amongst ourselves, we would likely end up with little more 
than eternal out-of-tree status.

Regards,

Daniel
