Re: BUG: Null pointer dereference in fs/open.c

On Mon, 23 Apr 2007, Andrew Morton wrote:

On Tue, 24 Apr 2007 04:09:18 +0000 (GMT) William Heimbigner <[email protected]> wrote:

This bug occurs in linux-2.6.20 and 2.6.21-rc7-git5, and does not occur in
linux-2.6.19-git22.

After running "pktsetup 0 /dev/hdd", I get (timestamps removed):

pktcdvd: pkt_get_last_written failed
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000e
printing eip:
c0173f69
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: snd_ca0106 snd_ac97_codec ac97_bus 8139cp 8139too iTCO_wdt
CPU:    0
EIP:    0060:[<c0173f69>]    Not tainted VLI
EFLAGS: 00010203   (2.6.21-rc7-git5 #22)
EIP is at do_sys_open+0x59/0xd0
eax: 00000002   ebx: 40000020   ecx: 00000001   edx: 00000002
esi: df1e3000   edi: 00000003   ebp: de17bfa4   esp: de17bf84
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process vol_id (pid: 4273, ti=de17b000 task=df4143f0 task.ti=de17b000)
Stack: 00000000 c013d2a5 ffffff9c 00000002 c059cea3 bfb6bf64 00008000 b7f60ff4
        de17bfb0 c017401c 00000000 de17b000 c01041c6 bfb6bf64 00008000 00000000
        00008000 b7f60ff4 bfb6a798 00000005 0000007b 0000007b 00000000 00000005
Call Trace:
  [<c010521a>] show_trace_log_lvl+0x1a/0x30
  [<c01052d9>] show_stack_log_lvl+0xa9/0xd0
  [<c010551c>] show_registers+0x21c/0x3a0
  [<c01057a4>] die+0x104/0x260
  [<c04c5947>] do_page_fault+0x277/0x610
  [<c04c408c>] error_code+0x74/0x7c
  [<c017401c>] sys_open+0x1c/0x20
  [<c01041c6>] sysenter_past_esp+0x5f/0x99
  =======================
Code: ff 85 c0 89 c7 78 77 8b 45 08 89 d9 89 f2 89 04 24 8b 45 e8 e8 69 ff
ff ff 3d 00 f0 ff ff 89 45 ec 77 71 8b 55 ec bb 20 00 00 40 <8b> 42 0c 8b
48 30 89 4d f0 0f b7 51 66 81 e2 00 f0 00 00 81 fa
EIP: [<c0173f69>] do_sys_open+0x59/0xd0 SS:ESP 0068:de17bf84


Try this:

--- a/drivers/block/pktcdvd.c~packet-fix-error-handling
+++ a/drivers/block/pktcdvd.c
@@ -777,7 +777,8 @@ static int pkt_generic_packet(struct pkt
		rq->cmd_flags |= REQ_QUIET;

	blk_execute_rq(rq->q, pd->bdev->bd_disk, rq, 0);
-	ret = rq->errors;
+	if (rq->errors)
+		ret = -EIO;
out:
	blk_put_request(rq);
	return ret;
_


This patch fixes (or conceals?) the oops.



The packet driver was assuming that request.errors is an errno, but it
isn't - it's some sort of diagnostic bitfield thing.  Now why would the
packet driver have though that?  Let's go read the comments:

	unsigned short nr_hw_segments;

	unsigned short ioprio;

	void *special;
	char *buffer;

	int tag;
	int errors;

	int ref_count;


Well there's your root cause right there.


I don't know why this wasn't oopsing in eariler kernels.  Perhaps something
else is broken.  Please test this urgently.


There's a locking problem in there too.  `pktsetup 0 /dev/scd0' gives me

[   77.720000] pktcdvd: writer pktcdvd0 mapped to sr0
[   77.860000]
[   77.860000] =============================================
[   77.860000] [ INFO: possible recursive locking detected ]
[   77.860000] 2.6.21-rc7 #19
[   77.860000] ---------------------------------------------
[   77.860000] vol_id/2508 is trying to acquire lock:
[   77.860000]  (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[   77.860000]
[   77.860000] but task is already holding lock:
[   77.860000]  (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[   77.860000]
[   77.860000] other info that might help us debug this:
[   77.860000] 2 locks held by vol_id/2508:
[   77.860000]  #0:  (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  #1:  (&ctl_mutex#2){--..}, at: [<f8dc6986>] pkt_open+0x1a/0xcbc [pktcdvd]
[   77.860000]
[   77.860000] stack backtrace:
[   77.860000]  [<c01323c1>] __lock_acquire+0x11e/0xb3b
[   77.860000]  [<c02efe4e>] __mutex_unlock_slowpath+0x109/0x113
[   77.860000]  [<c0132166>] trace_hardirqs_on+0x11e/0x141
[   77.860000]  [<c0132e34>] lock_acquire+0x56/0x6e
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c02f01a5>] mutex_lock_nested+0xf4/0x24f
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c024020c>] kobj_lookup+0xda/0x104
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c018184a>] __blkdev_get+0x5b/0x66
[   77.860000]  [<c0181867>] blkdev_get+0x12/0x14
[   77.860000]  [<f8dc69f9>] pkt_open+0x8d/0xcbc [pktcdvd]
[   77.860000]  [<c0170949>] __d_lookup+0x66/0xed
[   77.860000]  [<c0170949>] __d_lookup+0x66/0xed
[   77.860000]  [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[   77.860000]  [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[   77.860000]  [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[   77.860000]  [<c015f655>] cache_alloc_refill+0x4a/0x444
[   77.860000]  [<c0240165>] kobj_lookup+0x33/0x104
[   77.860000]  [<c0132166>] trace_hardirqs_on+0x11e/0x141
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c02f007f>] __mutex_lock_slowpath+0x222/0x235
[   77.860000]  [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[   77.860000]  [<c0131f85>] mark_held_locks+0x46/0x62
[   77.860000]  [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[   77.860000]  [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[   77.860000]  [<c0132166>] trace_hardirqs_on+0x11e/0x141
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c02f02f8>] mutex_lock_nested+0x247/0x24f
[   77.860000]  [<c01815e2>] do_open+0x5a/0x267
[   77.860000]  [<c024020c>] kobj_lookup+0xda/0x104
[   77.860000]  [<c018160f>] do_open+0x87/0x267
[   77.860000]  [<c0181977>] blkdev_open+0x0/0x4d
[   77.860000]  [<c018199c>] blkdev_open+0x25/0x4d
[   77.860000]  [<c0160b77>] __dentry_open+0xb8/0x16e
[   77.860000]  [<c0160ca7>] nameidata_to_filp+0x24/0x33
[   77.860000]  [<c0160ce8>] do_filp_open+0x32/0x39
[   77.860000]  [<c02f1232>] _spin_unlock+0x14/0x1c
[   77.860000]  [<c0160ab5>] get_unused_fd+0xaa/0xb4
[   77.860000]  [<c01619da>] do_sys_open+0x42/0xbe
[   77.860000]  [<c0161a8f>] sys_open+0x1c/0x1e
[   77.860000]  [<c0103c58>] syscall_call+0x7/0xb
[   77.860000]  =======================
[   77.900000] pktcdvd: pkt_get_last_written failed

What the heck _is_ in request.errors?

Should the packet driver even be looking at it?

I got a similar recursive lock warning too, though I hadn't included it inthe last message:


[10349.402613] =============================================
[10349.423972] [ INFO: possible recursive locking detected ]
[10349.440138] 2.6.21-rc7-git5 #23
[10349.449541] ---------------------------------------------
[10349.465685] vol_id/4312 is trying to acquire lock:
[10349.480033]  (&bdev->bd_mutex){--..}, at: [<c019a82f>] do_open+0x4f/0x2c0
[10349.500561]
[10349.500566] but task is already holding lock:
[10349.518079]  (&bdev->bd_mutex){--..}, at: [<c019a82f>] do_open+0x4f/0x2c0
[10349.538590]
[10349.538591] other info that might help us debug this:
[10349.558186] 2 locks held by vol_id/4312:
[10349.569934]  #0:  (&bdev->bd_mutex){--..}, at: [<c019a82f>] do_open+0x4f/0x2c0
[10349.591767]  #1:  (&ctl_mutex#2){--..}, at: [<c04c221c>] mutex_lock+0x1c/0x20
[10349.613389]
[10349.613390] stack backtrace:
[10349.626463]  [<c010521a>] show_trace_log_lvl+0x1a/0x30
[10349.641904]  [<c0105952>] show_trace+0x12/0x20
[10349.655234]  [<c0105a46>] dump_stack+0x16/0x20
[10349.668594]  [<c013e410>] __lock_acquire+0xbc0/0x1040
[10349.683752]  [<c013e900>] lock_acquire+0x70/0x90
[10349.697626]  [<c04c229e>] mutex_lock_nested+0x7e/0x2e0
[10349.713067]  [<c019a82f>] do_open+0x4f/0x2c0
[10349.725905]  [<c019ab19>] __blkdev_get+0x79/0x90
[10349.739783]  [<c019ab45>] blkdev_get+0x15/0x20
[10349.753121]  [<c03298f7>] pkt_open+0xb7/0xd80
[10349.766221]  [<c019a865>] do_open+0x85/0x2c0
[10349.779030]  [<c019acc3>] blkdev_open+0x33/0x70
[10349.792652]  [<c0173ce4>] __dentry_open+0xf4/0x220
[10349.807048]  [<c0173eb5>] nameidata_to_filp+0x35/0x40
[10349.822232]  [<c0173f09>] do_filp_open+0x49/0x50
[10349.836107]  [<c0173f57>] do_sys_open+0x47/0xd0
[10349.849729]  [<c017401c>] sys_open+0x1c/0x20
[10349.862566]  [<c01041c6>] sysenter_past_esp+0x5f/0x99
[10349.877749]  =======================
[10350.141702] pktcdvd: pkt_get_last_written failed

William Heimbigner
[email protected]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: BUG: Null pointer dereference in fs/open.c
  - From: Andrew Morton <[email protected]>

References:
- BUG: Null pointer dereference (2.6.21-rc7)
  - From: William Heimbigner <[email protected]>
- Re: BUG: Null pointer dereference (2.6.21-rc7)
  - From: Andrew Morton <[email protected]>
- Re: BUG: Null pointer dereference (2.6.21-rc7)
  - From: William Heimbigner <[email protected]>
- Re: BUG: Null pointer dereference (2.6.21-rc7)
  - From: William Heimbigner <[email protected]>
- Re: BUG: Null pointer dereference in fs/open.c
  - From: William Heimbigner <[email protected]>

Prev by Date: Re: MMCv4 support (8-bit support missing)
Next by Date: Re: [PATCH] lazy freeing of memory through MADV_FREE
Previous by thread: Re: BUG: Null pointer dereference in fs/open.c
Next by thread: Re: BUG: Null pointer dereference in fs/open.c
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]