Re: Fwd: uninterruptable fcntl calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2007-02-02 at 14:28 -0500, Aaron Wiebe wrote:
> Greetings,
> 
> I've run into a situation where fcntl F_SETLKW calls lock up nearly
> completely.  I've tried several approaches to handle this case, and
> have yet to come up with some method of handling this.  I've never
> really ventured outside userspace, so I'm turning to this list to try
> and get a handle on this.
> 
> Over NFSv3 udp, this situation takes place VERY rarely, however with
> the volume I do, its creating a problem.
> 
> In short, I am attempting to read or write lock, and the call hangs to
> the point where a sigkill is not captured - no signal is.  I've tried
> alarming out and I've tried switching the socket to nonblocking -
> nothing I can think of prevents or even allows me to handle the case.
> I understand NFS locking can be rather sketchy at times - but all I
> need is the ability to handle the case.
> 
> I can force the process to die by sending a sigkill, then stracing.
> The strace reports the process as sigstop, then processes the kill
> signal.
> 
> All I need here is a method of capturing this case.  I can "repair"
> the stuck lock by regenerating the file, but I can't capture the case
> in order to handle this in code.
> 
> Any help would be useful - I am currently running 2.6.15.6 compiled
> with the NFS patches from linux-nfs.org, but this case was happening
> before applying those patches.  I'd be happy to provide any more
> information nessecary.  I've been struggling with this one for a few
> months now.
> 
> Thanks,
> -Aaron
> 
> 
> Straces:
> 
> rt_sigaction(SIGALRM, {0xb7f56640, [ALRM], 0}, {SIG_DFL}, 8) = 0
> alarm(120)                              = 0
> fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}
> [hangs]
> 
> Or:
> 
> fcntl64(3, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
> fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}
> 
> 
> 
> Code used for locking:
> 
> static int db_lock(int fd, int type)
> {
>     struct flock fl;
>     struct timespec *tv = (struct timespec *) malloc(sizeof(struct timespec));
>     int ret, c = 0;
> 
>     if(!(fd > 0))
>         return -1;
> 
> #ifdef SIGALRM_HACK
>     /* after two minutes, wig out */
>     sigalrm_set();
>     alarm(120);
> #endif
> 
>     fl.l_whence = SEEK_SET;
>     fl.l_start = 0;
>     fl.l_len = 0;
>     fl.l_type = type;
> 
> #ifdef NONBLOCKING_HACK
>     set_nonblocking(fd);
> #endif
> 
>     while((ret = fcntl(fd, F_SETLKW, &fl)) < 0)
>     {
>         c++;
>         if(c > 600)
>         {
>             /* we've been waiting for 60 seconds... */
>             my_error("stuck on fcntl request, aborting");
>             return -1;
>         }
>         tv->tv_nsec = 100;   /* 10th of a second wait */
>         tv->tv_sec = 0;
>         nanosleep(tv, NULL);
>     }
>     free(tv);
> #ifdef SIGALRM_HACK
>     sigalrm_unset();
> #endif
> #ifdef NONBLOCKING_HACK
>     unset_nonblocking(fd);
> #endif
>     return ret;
> }

Should have been fixed in mainline in 2.6.16 by the following patch

http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commitdiff;h=a9a801787a761616589a6526d7a29c13f4deb3d8;hp=03f28e3a2059fc466761d872122f30acb7be61ae

Cheers,
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux