Fedora Users: Question regarding pthread_cancel and pthread_cond_timedwait

We have a threading library which has been in production for six years and currently runs on Solaris 2.6-2.9 SPARC, Solaris 2.7-2.10 x86, HP-UX 11.00, Tru64 5.1(a,b), AIX 4.3.x and AIX 5.x.

The library starts 5-8 threads within the current process. The operation runs to completion (with or without error), and the threads either complete on their own or are canceled and then complete, depending on what happened during processing.

At some later time this is repeated N times without the main process exiting. The threads are NOT detached.

The problem occurs on Fedora Core 3 if a thread has exited and pthread_cancel is called with the thread id of that completed thread.

If a thread has exited and we call pthread_cancel with that thread id on Fedora Core 3, the application segfaults. Is this the expected behavior?

Version info:

> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.3.4
> uname -a
Linux irl-73-26 2.6.10-1.770_FC3 #1 Thu Feb 24 14:00:06 EST 2005 i686 i686 i386 GNU/Linux

I am also getting a segfault when pthread_cond_timedwait is called. I am still determining the exact state when the segfault occurred. The backtrace shows:

#0  0x005c57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00839dbc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0

The directory listing shows:

ls -l /lib/tls/
total 1936
drwxr-xr-x  2 root root    4096 Mar 23 04:03 i486
drwxr-xr-x  2 root root    4096 Mar 23 04:03 i586
drwxr-xr-x  2 root root    4096 Mar 23 04:03 i686
-rwxr-xr-x  1 root root 1524828 Dec 21 02:04 libc-2.3.4.so
lrwxrwxrwx  1 root root      13 Mar 22 18:42 libc.so.6 -> libc-2.3.4.so
-rwxr-xr-x  1 root root  215272 Dec 21 02:04 libm-2.3.4.so
lrwxrwxrwx  1 root root      13 Mar 22 18:42 libm.so.6 -> libm-2.3.4.so
-rwxr-xr-x  1 root root  108560 Dec 21 02:04 libpthread-2.3.4.so
lrwxrwxrwx  1 root root      19 Mar 22 18:42 libpthread.so.0 -> libpthread-2.3.4.so
-rwxr-xr-x  1 root root   50984 Dec 21 02:04 librt-2.3.4.so
lrwxrwxrwx  1 root root      14 Mar 22 18:42 librt.so.1 -> librt-2.3.4.so
-rwxr-xr-x  1 root root   32308 Dec 21 02:04 libthread_db-1.0.so
lrwxrwxrwx  1 root root      19 Mar 22 18:42 libthread_db.so.1 -> libthread_db-1.0.so

Is this what NPTL on Fedora Core 3 does TODAY? Or is there a problem in the sequence in which our code releases mutexes or condition variables that would cause this behavior on Fedora Core 3?

We maintain internal thread exit status, so I can skip cancelling the threads which have successfully exited. We normally just cancel everything we started, as a big hammer to make sure every thread shuts down and exits. We can make the abort function a bit smarter if need be, since it has access to our internal thread status.
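A smarter abort function along those lines could look like the sketch below. It assumes (this is not confirmed anywhere in the post) that the crash comes from passing pthread_cancel a thread id whose lifetime has ended, for instance one that has already been joined; POSIX leaves that case undefined. The struct worker record, the finished and joined flags, and the functions abort_workers and run_demo are all hypothetical names made up for illustration.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical bookkeeping record; names are illustrative only. */
struct worker {
    pthread_t   tid;
    atomic_bool finished;  /* set by the thread just before it returns */
    bool        joined;    /* set by the owner after pthread_join */
};

static void *quick_worker(void *arg)
{
    struct worker *w = arg;
    atomic_store(&w->finished, true);
    return NULL;
}

static void *slow_worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);          /* sleep() is a cancellation point */
    return NULL;
}

/* Cancel only threads that have not reported completion, then join each
 * thread exactly once. Returns how many threads ended by cancellation. */
static int abort_workers(struct worker *w, int n)
{
    int cancelled = 0;
    for (int i = 0; i < n; i++) {
        void *res = NULL;
        if (w[i].joined)
            continue;      /* id is no longer valid: do not touch it */
        if (!atomic_load(&w[i].finished)) {
            /* Still safe if the thread finishes right now: it has not
             * been joined, so its id remains valid. */
            int rc = pthread_cancel(w[i].tid);
            if (rc != 0)
                fprintf(stderr, "pthread_cancel: %s\n", strerror(rc));
        }
        if (pthread_join(w[i].tid, &res) == 0) {
            w[i].joined = true;
            if (res == PTHREAD_CANCELED)
                cancelled++;
        }
    }
    return cancelled;
}

/* Start one thread that exits at once and one that runs until canceled. */
static int run_demo(void)
{
    struct worker w[2];
    for (int i = 0; i < 2; i++) {
        atomic_init(&w[i].finished, false);
        w[i].joined = false;
    }
    if (pthread_create(&w[0].tid, NULL, quick_worker, &w[0]) != 0)
        return -1;
    if (pthread_create(&w[1].tid, NULL, slow_worker, &w[1]) != 0)
        return -1;
    while (!atomic_load(&w[0].finished))
        sched_yield();     /* wait until the quick worker has finished */
    return abort_workers(w, 2);
}
```

In run_demo the first thread has already finished when abort_workers runs, so only the second one receives a cancel request. The point of the joined flag is that each thread id is used at most once for pthread_join and never again afterwards.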

On the OSes I mentioned above, pthread_cancel returns 0 on success. On failure:

On HP-UX 11.00 pthread_cancel returns the value ESRCH; errno is NOT set.

On Solaris SPARC and x86: same as HP-UX 11.00.

On AIX: same as HP-UX and Solaris.

On Tru64 pthread_cancel returns EINVAL or ESRCH; errno is not set.
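Since the error comes back as the return value rather than through errno, a caller would test the result directly. The sketch below illustrates that convention. The helpers cancel_result and cancel_logged are hypothetical, and the wrapper assumes the caller only passes the id of a thread that has not yet been joined or detached.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Translate a pthread_cancel return value into a short description. */
static const char *cancel_result(int rc)
{
    switch (rc) {
    case 0:      return "cancel request accepted";
    case ESRCH:  return "no such thread";
    case EINVAL: return "invalid thread id";
    default:     return strerror(rc);
    }
}

/* Hypothetical wrapper: reports failures from the return value and never
 * consults errno. The id must belong to a thread not yet joined. */
static int cancel_logged(pthread_t tid)
{
    int rc = pthread_cancel(tid);
    if (rc != 0)
        fprintf(stderr, "pthread_cancel: %s\n", cancel_result(rc));
    return rc;
}
```

A return of 0 only means the cancel request was accepted; the thread still has to reach a cancellation point and be joined before its resources are released.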

Eric Bruno.
