Re: GFS2 and DLM

* Andrew Morton <[email protected]> wrote:

> > it is relevant to a certain degree, because it creates a (IMO) false 
> > impression of merging showstoppers. After months of being in -mm, 
> > and after addressing all issues that were raised (and there was a 
> > fair amount of review activity December last year iirc), one week 
> > prior to the close of the merge window a 'huge' list of issues is 
> > raised. (after belovingly calling the GFS2 code a "huge mess", to 
> > create a positive and productive tone for the review discussion i 
> > guess.)
> 
> It's a general problem - our reviewing resources do not have the 
> capacity to cover our coding resources.  This is especially the case 
> on filesystems.  We'd have merged (a very different) reiser4 a year 
> ago if things were in balance.

and just this very minute what gets merged upstream? A chunk of OCFS2 
code that has this comment in it:

 * NOTE: the allocation error cases here are scary
 * we really cannot afford to fail an alloc in recovery
 * do we spin?  returning an error only delays the problem really

plus this code:

                        /* sleep for a bit in hopes that we can avoid
                         * another ENOMEM */
                        msleep(100);
                        goto retry;

and this:

                        /* TODO Look into replacing msleep with cond_resched() */
                        msleep(100);
                        goto retry;

and this:

                /* yield a bit to allow any final network messages
                 * to get handled on remaining nodes */
                msleep(100);

and this:

                if (status < 0) {
                        mlog(ML_ERROR, "%s: failed to alloc recovery area, "
                             "retrying\n", dlm->name);
                        msleep(1000);
                }

and this:

                                } else {
                                        /* -ENOMEM on the other node */
                                        mlog(0, "%s: node %u returned "
                                             "%d during recovery, retrying "
                                             "after a short wait\n",
                                             dlm->name, ndata->node_num,
                                             status);
                                        msleep(100);
                                }

and that's just from a 60-second scan.

and we are not merging GFS2, which does an honest __GFP_NOFAIL for a 
hard-to-solve problem? (Btw., __GFP_NOFAIL is actually more robust due 
to the congestion sleep it does, and it is a more reviewable and more 
fixable thing than an open-coded msleep() or cond_resched().)
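
to illustrate the reviewability point, here is a minimal userspace 
sketch, not kernel code and not how the page allocator implements 
__GFP_NOFAIL. All names in it (try_alloc(), backoff(), alloc_nofail(), 
recovery_alloc_open_coded(), fail_budget) are hypothetical, and the 
failure injection is invented purely for the example. It contrasts a 
retry loop copied into a call site with a single helper that owns the 
retry policy:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical failure injection: the next fail_budget attempts fail. */
static int fail_budget;
/* Counts how often a caller had to back off before retrying. */
static int backoff_calls;

/* Stand-in for an allocation that can fail under memory pressure. */
static void *try_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/* Stand-in for the wait between attempts; a real one would sleep. */
static void backoff(void)
{
	backoff_calls++;
}

/* Open-coded variant: every call site carries its own retry loop. */
static void *recovery_alloc_open_coded(size_t size)
{
	void *p;

retry:
	p = try_alloc(size);
	if (!p) {
		backoff();
		goto retry;
	}
	return p;
}

/* Central variant: one place to review and fix the retry policy. */
static void *alloc_nofail(size_t size)
{
	void *p;

	assert(size > 0);
	while ((p = try_alloc(size)) == NULL)
		backoff();
	return p;
}
```

both variants loop until the allocation succeeds, but with the helper 
the wait policy lives in one function instead of being repeated (with 
varying sleep lengths) at each call site.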

"Hypocrisy", "double standard", "pot calling the kettle black" are pretty 
much the nicest words that come to mind :-(

[and again, i'm not blaming XFS or OCFS2 here.]

	Ingo