On Thu, Aug 09, 2007 at 11:15:42PM -0700, Andrew Morton wrote:
> On Thu, 9 Aug 2007 20:14:35 -0500 Dean Nelson <[email protected]> wrote:
>
> > This patch provides cross partition access to user memory (XPMEM) when
> > running multiple partitions on a single SGI Altix.
> >
>
> - Please don't send multiple patches all with the same title.

Sorry, I'd somehow gotten the impression it was to be done this way.

> - Feed the diff through checkpatch.pl, ponder the result.

I did and made the necessary changes to satisfy the issues it raised,
except for:

1) ERROR: Don't use kernel_thread(): see Documentation/feature-removal-schedule.txt

   As mentioned in the intro to this set of patches, XPMEM is not
   currently using the kthread API because it was in the process of being
   changed to require a kthread_stop() be done for every kthread_create(),
   and the kthread_stop() couldn't be called for a thread that had already
   exited. In talking with Eric Biederman, there was some talk of creating
   a kthread_orphan() which would eliminate the requirement to call
   kthread_stop().

2) WARNING: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt

   I eliminated all but two usages of volatile in XPMEM. One in
   xpmem_flush_cpu_cache() and the other in
   xpmem_recall_cache_lines_phys(). Without the volatile the compiler
   eliminates the necessary instruction in each. Any suggestions as to how
   to eliminate the need for volatile in these two cases?

3) WARNING: declaring multiple variables together should be avoided

   checkpatch.pl is erroneously complaining about the following found in
   five different functions in arch/ia64/sn/kernel/xpmem_pfn.c.

	int n_pgs = xpmem_num_of_pages(vaddr, size);

> ! Avoid needless casts to and from void* (eg, vm_private_data)

Done.

> ! The test for PF_DUMPCORE in xpmem_fault_handler() is mysterious and
>   merits a comment.

Done.

> - xpmem_fault_handler() is scary.

You are very right about that, but consider that this code has been
running at our customer sites on very large systems (18 partitions with
512 CPUs in each partition). The avoid_deadlock_1 and avoid_deadlock_2
issues were discovered at such sites and have been resolved (to the best
of our knowledge).

> - xpmem_fault_handler() appears to have imposed a kernel-wide rule that
>   when taking multiple mmap_sems, one should take the lowest-addressed one
>   first? If so, that probably wants a mention in that locking comment in
>   filemap.c

Sure. After looking at the lock ordering comment block in mm/filemap.c,
it wasn't clear to me how best to document this. Any suggestions/help
would be most appreciated.

> - xpmem_fault_handler() does atomic_dec(&seg_tg->mm->mm_users). What
>   happens if that was the last reference?

When /dev/xpmem is opened by a user process, xpmem_open() incs mm_users
and when it is flushed, xpmem_flush() decs it (via mmput()) after having
ensured that no XPMEM attachments exist of this mm. Thus the dec in
xpmem_fault_handler() will never take it to 0.

> - Has it all been tested with lockdep enabled? Judging from all the use
>   of SPIN_LOCK_UNLOCKED, it has not.
>
>   Oh, ia64 doesn't implement lockdep. For this code, that is deeply
>   regrettable.

No, it hasn't been tested with lockdep. But I have switched it from using
SPIN_LOCK_UNLOCKED to spin_lock_init().

> ! This code all predates the nopage->fault conversion and won't work in
>   current kernels.

I've switched from using nopage to using fault. I read that it is intended
that nopfn also goes away. If this is the case, then the BUG_ON if
VM_PFNMAP is set would make __do_fault() a rather unfriendly replacement
for do_no_pfn().

> - xpmem_attach() does smp_processor_id() in preemptible code. Lucky that
>   ia64 doesn't do preempt?

Actually, the code is fine as is even with preemption configured on. All
it's doing is ensuring that the thread was previously pinned to the CPU
it's currently running on.
If it is, it can't be moved to another CPU via preemption, and if it
isn't, the check will fail and we'll return -EINVAL and all is well.

> - Stuff like this:
>
> +	ap_tg = xpmem_tg_ref_by_apid(apid);
> +	if (IS_ERR(ap_tg))
> +		return PTR_ERR(ap_tg);
> +
> +	ap = xpmem_ap_ref_by_apid(ap_tg, apid);
> +	if (IS_ERR(ap)) {
> +		xpmem_tg_deref(ap_tg);
> +		return PTR_ERR(ap);
> +	}
>
>   is fragile. It is easy to introduce leaks and locking errors as the
>   code evolves. The code has a lot of these deeply-embedded `return'
>   statements preceded by duplicated unwinding. Kernel code generally
>   prefers to do the `goto out' thing.

Done (assuming I correctly interpreted your intent).

> ! "XPMEM_FLAG_VALIDPTEs"? Someone's pinky got tired at the end ;)

Yeah, but it's better now (and so is the code).

> Attention span expired at 19%, sorry. It's a large patch.

I've included the new patch as an attachment.

Thanks,
Dean

Attachment:
xpmem-module.v002.bz2
Description: BZip2 compressed data
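
For readers unfamiliar with the `goto out' convention discussed above, here is a minimal userspace sketch of the idea: each acquired reference is released at exactly one place, and error paths jump to the label that unwinds what has been taken so far. The struct, the helper names (obj_ref, obj_deref, do_attach) and the failure flags are hypothetical stand-ins for illustration only; this is not the XPMEM code from the attached patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reference-counted object; not an XPMEM type. */
struct obj {
	int refs;
};

static struct obj tg = { 0 };
static struct obj ap = { 0 };

/* Take a reference, or return NULL when the caller asks for a failure. */
static struct obj *obj_ref(struct obj *o, int fail)
{
	if (fail)
		return NULL;
	o->refs++;
	return o;
}

/* Drop a reference previously taken with obj_ref(). */
static void obj_deref(struct obj *o)
{
	assert(o->refs > 0);
	o->refs--;
}

/*
 * Acquire tg, then ap, then do some work. Every exit goes through the
 * labels at the bottom, which release in reverse order of acquisition.
 */
static int do_attach(int fail_tg, int fail_ap, int fail_work)
{
	struct obj *t, *a;
	int ret = 0;

	t = obj_ref(&tg, fail_tg);
	if (t == NULL) {
		ret = -ENOENT;
		goto out;
	}

	a = obj_ref(&ap, fail_ap);
	if (a == NULL) {
		ret = -ENOENT;
		goto out_tg;
	}

	if (fail_work) {
		ret = -EINVAL;
		goto out_ap;
	}

	/* The real work would go here. */

out_ap:
	obj_deref(a);
out_tg:
	obj_deref(t);
out:
	return ret;
}
```

Because the success path falls through the same labels as the error paths, adding a third resource later means adding one acquisition and one label rather than touching every early return.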