On Fri, 2005-08-19 at 15:20 +0100, Al Viro wrote: > On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote: > > There is a bug somewhere in 2.6.11.4 and I can't figure out where it is. > > I assume it is present in older and newer kernels, too as the related > > code hasn't changed much AFAICS and googling for "Bad page state" > > returns rather a lot of hits relating to both older (up to 2.5.70!) and > > newer kernels... > > > > Note: PLEASE do not stop reading because you read ncpfs below as I am > > pretty sure it is not ncpfs related! And looking at google a lot of > > people have reported such similar problems since 2.5.70 or so and they > > were all told to go away as they have bad ram. That is impossible > > because this happens on well over 600 workstations and several servers > > 100% reproducible. Many different types of hardware, different makes, > > difference age, all running smp kernels even if single cpu. You can't > > tell me they all have bad ram. Windows works fine and Linux works fine > > except for that one specific problem which is 100% reproducible... > > > > The bug only appears, but it appears 100% reproducibly when a cross > > volume symlink on ncpfs is accessed using nautilus under gnome. I.e. > > double click on a cross volume symlink on ncpfs in nautilus and the > > machine locks up solid. > > Ugh... Could you at least tell what does nautilus attempt to do at that > point? Something that wouldn't show up with simple ls -l <symlink> or > cat <symlink> >/dev/null, judging by the above, but what? One important thing I forgot to add is that the symlink must point to a directory not a file. If it points to a file all works fine. I tried stracing nautilus to answer your question. And this time for the first time instead of a Bad page state I got a BUG() triggered in fs/namei.c, the arrow below marks the spot: void page_put_link(struct dentry *dentry, struct nameidata *nd) { if (!IS_ERR(nd_get_link(nd))) { struct page *page; page = find_get_page(dentry->d_inode->i_mapping, 0); if (!page) ----> BUG(); kunmap(page); page_cache_release(page); page_cache_release(page); } } Here is the BUG output: kernel BUG at fs/namei.c:2329! invalid operand: 0000 [#1] SMP Modules linked in: autofs4 nls_cp850 nls_utf8 ncpfs subfs fuse snd soundcore speedstep_lib freq_table thermal processor fan button battery ac nvram ipt_REJECT iptable_filter ip_tables af_packet edd joydev evdev st video1394 ohci1394 raw1394 ieee1394 capability sg sr_mod dm_mod e1000 uhci_hcd usbcore i2c_i801 i2c_core hw_random parport_pc lp parport reiserfs ide_cd cdrom ide_disk aic7xxx aic79xx via82cxxx ata_piix libata piix ide_core sd_mod scsi_mod CPU: 3 EIP: 0060:[<c0174e3e>] Tainted: G U VLI EFLAGS: 00010246 (2.6.11.4-21.8-smp) EIP is at page_put_link+0x3e/0x50 eax: 00000000 ebx: 00000000 ecx: d9c75654 edx: 00000000 esi: 00000000 edi: d9d30000 ebp: ffac2000 esp: d9d31eac ds: 007b es: 007b ss: 0068 Process nautilus (pid: 21910, threadinfo=d9d30000 task=f5caf550) Stack: d9c75654 c0171659 00000000 00004000 c2025060 00000001 d9d31f1c dff3ec80 d9c75654 001d981f 00000003 d9eb2014 00000000 d9d30000 d9d31f1c 00000000 d9eb2000 c0171e96 d9eb2000 d9d31f1c 00000001 d9eb2000 c017211f 08542878 Call Trace: [<c0171659>] link_path_walk+0x929/0xe80 [<c0171e96>] path_lookup+0xa6/0x1c0 [<c017211f>] __user_walk+0x2f/0x70 [<c016c6fd>] vfs_stat+0x1d/0x60 [<c016ce5f>] sys_stat64+0xf/0x30 [<c0109051>] do_syscall_trace+0xb1/0x175 [<c01040d7>] syscall_call+0x7/0xb Code: 31 d2 8b 80 a8 00 00 00 e8 80 e6 fc ff 85 c0 89 c3 74 18 89 d8 e8 93 5e fa ff 89 d8 e8 2c 80 fd ff 89 d8 5b e9 24 80 fd ff 5b c3 <0f> 0b 19 09 76 bb 32 c0 eb de 90 8d b4 26 00 00 00 00 83 ec 28 The compressed strace output is attached. It was tracing the main nautilus process (strace -f -F -o /strace.log -p 21910) which is where the bug was hit as well. FWIW I attached strace to the process just before double-clicking on the symlink in the nautilus window... Looking at the bottom of the strace.log file it seems like two possibly concurrent stat64() calls of the symlink were running, one of which did not complete... (note this was done on a dual-cpu machine with hyperthreading enabled, i.e. it appears to have four cpus). Does this help? Let me know if I can provide any other details... (I can provide an ssh connection to a machine for testing purposes if required.) Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
Attachment:
strace.log.bz2
Description: application/bzip
- Follow-Ups:
- References:
- Kernel bug: Bad page state: related to generic symlink code and mmap
- From: Anton Altaparmakov <[email protected]>
- Re: Kernel bug: Bad page state: related to generic symlink code and mmap
- From: Al Viro <[email protected]>
- Kernel bug: Bad page state: related to generic symlink code and mmap
- Prev by Date: Re: 2.6.13-rc6-git10 test report [x86_64](WITHOUT NVIDIA MODULE)
- Next by Date: Re: 2.6.13-rc6-mm1
- Previous by thread: Re: Kernel bug: Bad page state: related to generic symlink code and mmap
- Next by thread: Re: Kernel bug: Bad page state: related to generic symlink code and mmap
- Index(es):