From: Bob Peterson <[email protected]>

The problem boiled down to a race between the gdlm_init_threads()
function initializing thread1 and its setting of blist = 1.
Essentially, "if (current == ls->thread1)" was checked by the thread
before the thread creator set ls->thread1.

Since thread1 is the only thread who is allowed to work on the blocking
queue, and since neither thread thought it was thread1, no one was
working on the queue. So everything just sat.

This patch reuses the ls->async_lock spin_lock to fix the race, and it
fixes the problem. I've done more than 2000 iterations of the loop that
was recreating the failure and it seems to work.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
--
diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
index 1aca51e..bd938f0 100644
--- a/fs/gfs2/locking/dlm/thread.c
+++ b/fs/gfs2/locking/dlm/thread.c
@@ -268,20 +268,16 @@ static inline int check_drop(struct gdlm_ls *ls)
 	return 0;
 }
 
-static int gdlm_thread(void *data)
+static int gdlm_thread(void *data, int blist)
 {
 	struct gdlm_ls *ls = (struct gdlm_ls *) data;
 	struct gdlm_lock *lp = NULL;
-	int blist = 0;
 	uint8_t complete, blocking, submit, drop;
 	DECLARE_WAITQUEUE(wait, current);
 
 	/* Only thread1 is allowed to do blocking callbacks since gfs
 	   may wait for a completion callback within a blocking cb. */
 
-	if (current == ls->thread1)
-		blist = 1;
-
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		add_wait_queue(&ls->thread_wait, &wait);
@@ -333,12 +329,22 @@ static int gdlm_thread(void *data)
 	return 0;
 }
 
+static int gdlm_thread1(void *data)
+{
+	return gdlm_thread(data, 1);
+}
+
+static int gdlm_thread2(void *data)
+{
+	return gdlm_thread(data, 0);
+}
+
 int gdlm_init_threads(struct gdlm_ls *ls)
 {
 	struct task_struct *p;
 	int error;
 
-	p = kthread_run(gdlm_thread, ls, "lock_dlm1");
+	p = kthread_run(gdlm_thread1, ls, "lock_dlm1");
 	error = IS_ERR(p);
 	if (error) {
 		log_error("can't start lock_dlm1 thread %d", error);
@@ -346,7 +352,7 @@ int gdlm_init_threads(struct gdlm_ls *ls)
 	}
 	ls->thread1 = p;
 
-	p = kthread_run(gdlm_thread, ls, "lock_dlm2");
+	p = kthread_run(gdlm_thread2, ls, "lock_dlm2");
 	error = IS_ERR(p);
 	if (error) {
 		log_error("can't start lock_dlm2 thread %d", error);
-- 
1.5.1.2
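
For readers who want to see the idea outside the kernel: the diff as
posted avoids the race by handing each thread its role through the
entry point, instead of having the thread compare itself against
ls->thread1, which the creator only sets after kthread_run() returns.
Below is a minimal userspace sketch of that pattern using POSIX
threads. It is an illustration only, not the GFS2 code: struct
lockspace, struct worker_arg, worker() and run_workers() are
hypothetical names, and pthreads stands in for kthreads.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct gdlm_ls; not the kernel structure. */
struct lockspace {
	pthread_mutex_t lock;
	int blocking_handlers;	/* workers that took the blocking role */
};

/* Hypothetical per-thread argument carrying the role, like "blist". */
struct worker_arg {
	struct lockspace *ls;
	int blist;		/* 1: may service the blocking queue */
};

static void *worker(void *data)
{
	struct worker_arg *arg = data;

	/*
	 * The role is fixed before the thread is created, so it does not
	 * depend on any field the creator fills in after creation.
	 */
	if (arg->blist) {
		pthread_mutex_lock(&arg->ls->lock);
		arg->ls->blocking_handlers++;
		pthread_mutex_unlock(&arg->ls->lock);
	}
	return NULL;
}

/* Returns how many workers took the blocking role, or -1 on error. */
static int run_workers(void)
{
	struct lockspace ls = { PTHREAD_MUTEX_INITIALIZER, 0 };
	struct worker_arg a1 = { &ls, 1 };	/* plays the part of thread1 */
	struct worker_arg a2 = { &ls, 0 };	/* plays the part of thread2 */
	pthread_t t1, t2;

	if (pthread_create(&t1, NULL, worker, &a1))
		return -1;
	if (pthread_create(&t2, NULL, worker, &a2)) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return ls.blocking_handlers;
}
```

In this sketch exactly one worker takes the blocking role no matter how
the two threads are scheduled, which is the property the original
"current == ls->thread1" check could not guarantee.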