Re: [Q] Bio traversal trouble?

On 06/04/2007 08:41 PM, Pekka Enberg wrote:

[ Jens' email address updated to his oracle address ]

> We can see that we're reading 2048 bytes from port 0x300 and storing
> the data in memory location 0x8c1d2071 which causes the OOPS. What's
> surprising is that EBP is set to 0x8c1d2071 too which suggests stack
> corruption (note that ioread8_rep() is a fastcall so it does not
> create a stack frame of its own). What you could do here is add some
> printks and watch how dst changes over time to see if you can figure
> out a pattern.
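
For reference, the ioread8_rep() here looks to be the generic one from lib/iomap.c; for a port-mapped address it essentially ends up doing a "rep insb" into dst. A rough sketch (not the exact kernel source):

#include <asm/io.h>

/*
 * Rough sketch only, assuming the port-mapped case of the generic
 * lib/iomap.c ioread8_rep() -- not the actual kernel code.  On i386
 * the "rep insb" keeps the byte count in ECX, the port in EDX and dst
 * in EDI, which is what the register dump in the oops below shows
 * (ecx = 0x800, edx = 0x300, edi = ebp = the bogus dst).
 */
static void ioread8_rep_sketch(unsigned long port, void *dst, unsigned long count)
{
	insb(port, dst, count);		/* x86: "rep insb" into dst */
}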

Mmm, I do always run with 4K stacks; I guess I'll try that next. I did follow things a bit with printk. The normal and expected scenario is an N-frame request consisting of N bios, each consisting of 1 segment, each consisting of 1 frame.

That is, with this as mitsumi_read:

static unsigned int mitsumi_read(struct mitsumi_cdrom *mcd, struct bio *bio)
{
        sector_t frame = sector_to_frame(bio->bi_sector);
        unsigned int bytes = 0;
        struct bio_vec *bvec;
        int segno;

        printk("%s: enter\n", __func__);

        bio_for_each_segment(bvec, bio, segno) {
                unsigned char *dst;
                unsigned int len;

                dst = page_address(bvec->bv_page) + bvec->bv_offset;

                printk("%s: dst = %p, len = %#x\n", __func__, dst,
                       bvec->bv_len);

                for (len = bvec->bv_len; len; len -= CD_FRAMESIZE) {
                        if (mitsumi_get_frame(mcd, frame++, dst))
                                goto out;

                        bytes += CD_FRAMESIZE;
                        dst += CD_FRAMESIZE;
                }
        }
  out:
        return bytes;
}
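
(sector_to_frame() isn't quoted above; assuming 512-byte block-layer sectors and the usual 2048-byte CD_FRAMESIZE, I'd expect it to be nothing more than a shift, roughly:)

static inline sector_t sector_to_frame(sector_t sector)
{
	/* assumed helper, not the actual driver source:
	 * a 2048-byte frame is 4 512-byte sectors */
	return sector >> 2;
}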

I get a long uninterrupted string of:

[  903.130588] mitsumi_read: enter
[  903.130630] mitsumi_read: dst = c2cd1000, len = 0x800
[  903.143644] mitsumi_read: enter
[  903.143686] mitsumi_read: dst = c2cd1800, len = 0x800
[  903.156949] mitsumi_read: enter
[  903.156992] mitsumi_read: dst = c3353000, len = 0x800
[  903.170325] mitsumi_read: enter
[  903.170367] mitsumi_read: dst = c3353800, len = 0x800

until it oopses:

[  903.184750] mitsumi_read: enter
[  903.184795] mitsumi_read: dst = c3a7f000, len = 0x800
[  903.299315] mitsumi_read: dst = 8c1d2071, len = 0xc1033bda
[ 903.311457] BUG: unable to handle kernel paging request at virtual address 8c1d2071

And yes, it's the ioread8_rep that's trying to address the wrong dst. By the way, this particular oops, while very similar, doesn't have cfq in the backtrace, so here it is complete again:

===
BUG: unable to handle kernel paging request at virtual address 8c1d2071
 printing eip:
c10a6d6f
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: mitsumi nfsd exportfs lockd sunrpc nls_cp437 msdos fat nls_base
CPU:    0
EIP:    0060:[<c10a6d6f>]    Not tainted VLI
EFLAGS: 00010246   (2.6.21.3 #1)
EIP is at ioread8_rep+0x20/0x31
eax: 00010000   ebx: 00010300   ecx: 00000800   edx: 00000300
esi: c2e26b20   edi: 8c1d2071   ebp: 8c1d2071   esp: c2e26adc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process dd (pid: 1437, ti=c2e26000 task=c1632050 task.ti=c2e26000)
Stack: 00000000 c3c55098 c486504d c4865ab7 00000006 50e20034 c11a5edc 00000034
       00000000 00520300 50d50100 11d73140 00000034 00340300 5b3e343c 30392020
       39322e33 00000000 00000001 00000000 00000000 00000000 dead4ead ffffffff
Call Trace:
 [<c486504d>] __mitsumi_get_frame+0xc/0x16 [mitsumi]
 [<c4865ab7>] mitsumi_get_frame+0x120/0x134 [mitsumi]
 [<c1033bda>] mempool_free+0x5f/0x64
 [<c4865b66>] mitsumi_read+0x9b/0xce [mitsumi]
 [<c1033bda>] mempool_free+0x5f/0x64
 [<c4865be2>] mitsumi_request+0x49/0xb5 [mitsumi]
 [<c1099aae>] blk_start_queueing+0x14/0x1c
 [<c10971c2>] elv_insert+0xa3/0x13f
 [<c114fbc5>] _spin_lock_irq+0x2f/0x3a
 [<c109a577>] __make_request+0x29f/0x2d3
 [<c109a75b>] generic_make_request+0x136/0x146
 [<c1048433>] kmem_cache_alloc+0x93/0x9e
 [<c1033ae0>] mempool_alloc+0x32/0xcd
 [<c1033ae0>] mempool_alloc+0x32/0xcd
 [<c109a80f>] submit_bio+0xa4/0xaa
 [<c106736b>] bio_alloc_bioset+0xb2/0x112
 [<c1066dd1>] submit_bh+0xc2/0xdb
 [<c1065fc4>] block_read_full_page+0x245/0x252
 [<c1068346>] blkdev_get_block+0x0/0x36
 [<c114ffff>] _write_unlock_irq+0x20/0x23
 [<c103134f>] add_to_page_cache+0x71/0x78
 [<c10368f3>] read_pages+0x7c/0xc1
 [<c114ff7e>] _read_unlock_irq+0x20/0x23
 [<c114ff7e>] _read_unlock_irq+0x20/0x23
 [<c1025de0>] trace_hardirqs_on+0x11b/0x13e
 [<c1036a25>] __do_page_cache_readahead+0xed/0x107
 [<c1036b43>] blockable_page_cache_readahead+0x4d/0x9d
 [<c1036c17>] make_ahead_window+0x84/0xa5
 [<c1036d74>] page_cache_readahead+0x13c/0x15b
 [<c1031cbc>] file_read_actor+0x7f/0x102
 [<c1031953>] do_generic_mapping_read+0x110/0x3fa
 [<c1031ec6>] generic_file_aio_read+0x187/0x1b0
 [<c1031c3d>] file_read_actor+0x0/0x102
 [<c104abc3>] do_sync_read+0xbf/0xfc
 [<c101f2b8>] autoremove_wake_function+0x0/0x33
 [<c1070fe1>] dnotify_parent+0x1b/0x66
 [<c104aec2>] vfs_write+0xc9/0xff
 [<c1027210>] __lock_release+0x2d/0x3f
 [<c104ac87>] vfs_read+0x87/0xfd
 [<c104af39>] sys_read+0x41/0x67
 [<c1002490>] syscall_call+0x7/0xb
 =======================
Code: 00 00 89 c8 ef c3 0f c9 89 0a c3 57 3d ff ff 03 00 53 89 d7 89 c3 89 ca 77 15 66 31 c0 3d 00 00 01 00 74 04 0f 0b eb fe 0f b7 d3 <f3> 6c eb 0a 4a 78 07 8a 03 88 07 47 eb f6 5b 5f c3 57 3d ff ff
EIP: [<c10a6d6f>] ioread8_rep+0x20/0x31 SS:ESP 0068:c2e26adc
===

> Also, assuming CFQ fiddles with the request after we've done
> elv_next_request() and dropped the lock, one thing worth trying is to
> do blk_stop_queue() before we drop it and blk_start_queue after we
> reacquire it so that we don't actually get any new I/O while we wait
> for mitsumi_get_frame() to finish.

That sounded like it might work (even if it might not be a proper solution), so I added:

	blk_stop_queue(q);

immediately before the spin_unlock_irq(q->queue_lock) and a

	blk_start_queue(q);

immediately after the spin_lock_irq(q->queue_lock) in mitsumi_request. This results in an immediate deadlock of sorts though. Box dead. :-\
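
For clarity, this is roughly where the two calls went. The real mitsumi_request() isn't quoted in this mail, so the loop around them below is just the usual elv_next_request() pattern with the request completion approximated -- a sketch, not the actual driver source:

#include <linux/blkdev.h>

static void mitsumi_request(struct request_queue *q)
{
	struct request *rq;

	/* ->request_fn is entered with q->queue_lock held */
	while ((rq = elv_next_request(q)) != NULL) {
		unsigned int bytes;

		blk_stop_queue(q);		/* added: hold off new I/O */
		spin_unlock_irq(q->queue_lock);

		bytes = mitsumi_read(q->queuedata, rq->bio);

		spin_lock_irq(q->queue_lock);
		blk_start_queue(q);		/* added: let I/O flow again */

		end_request(rq, bytes != 0);	/* approximate completion */
	}
}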

Rene.

