I was recently disappointed to find that 2.4 kernels very slightly edge out the latest 2.6 kernels in kernel compiling performance on my dual Xeon, even when 2.6 has HZ set at 100.

So I spent a few days trying some different mm/ optimisations, and managed to reduce total kernel residency (excluding idle time) by 7% on that workload, and now manage to beat 2.4 by a good third of a second.

Here's a diffprofile versus a plain 2.6.14-rc3 kernel:

   123   384.4% __get_zone_counts
    54     0.0% __page_set_anon_rmap
    37    17.0% find_lock_page
    28    31.8% lru_cache_add_active
    26    42.6% path_lookup
    25    22.3% kmem_cache_alloc
    20     6.2% find_vma
    19    17.3% _atomic_dec_and_lock
    18    26.5% __copy_from_user_ll
    17   188.9% shmem_nopage
    17    58.6% unmap_vmas
    15     0.0% __page_state
    14    31.1% copy_pte_range
    14    13.6% __wake_up_bit
    14     0.0% remove_vma
    14    46.7% exit_notify
    13    61.9% sys_close
    13    27.1% anon_vma_prepare
    13     0.0% unlink_file_vma
    13    92.9% do_generic_mapping_read
    12   109.1% free_pgd_range
    12     0.0% vm_stat_account
    12   100.0% sys_mmap2
    10    17.5% get_empty_filp
   ...
   -10   -21.7% _spin_unlock_irq
   -10   -90.9% flush_old_exec
   -10   -25.6% vfs_read
   -10    -4.6% __handle_mm_fault
   -12   -17.1% do_shmem_file_read
   -12   -60.0% cond_resched
   -12   -60.0% number
   -12   -46.2% sys_open
   -12  -100.0% __vm_stat_account
   -13   -46.4% inotify_inode_queue_event
   -13   -33.3% dput
   -13    -2.8% release_pages
   -14   -26.9% kmem_cache_free
   -14   -17.7% pte_alloc_map
   -14    -3.2% __link_path_walk
   -16  -100.0% __rmqueue
   -19   -10.9% sysenter_past_esp
   -19   -55.9% vfs_getattr
   -20   -25.0% zone_watermark_ok
   -21   -17.9% may_open
   -21   -15.8% strnlen_user
   -21    -8.8% __pagevec_lru_add_active
   -23  -100.0% remove_vm_struct
   -23    -5.1% zap_pte_range
   -28    -0.6% do_page_fault
   -33   -15.8% do_anonymous_page
   -40    -7.1% __d_lookup
   -44    -5.9% _spin_lock
   -75   -43.1% page_remove_rmap
   -79   -98.8% set_page_dirty
   -81   -23.5% free_hot_cold_page
   -81   -10.0% __copy_to_user_ll
   -94   -98.9% page_add_anon_rmap
  -108   -26.2% do_no_page
  -111   -72.1% prep_new_page
  -143  -100.0% bad_range
  -444   -87.2% __mod_page_state
  -548   -10.1% buffered_rmqueue
 -1634    -7.1% total

Depending on how much interest there is, I might keep a tree around to collect performance improvements. If you have any more[*] I could look at, please send them over. I'll eventually try to get things merged.

* Not just for kbuild, or only mm related, but preferably something that I can easily measure on my little system.

Attached is a rollup against 2.6.14-rc3. I don't currently have any webspace handy, so I can't host a broken-out tarball anywhere yet. Sorry for the big attachment (actually most of it is Hugh's pagefault scalability prep and my lockless pagecache prep that I'm working on top of).

Nick

--
SUSE Labs, Novell Inc.
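
For anyone reading the diffprofile listing for the first time, here is a rough Python sketch of how such rows could be derived from two sets of per-symbol profile tick counts. It is not the actual diffprofile script. It assumes the first column is the change in ticks (patched minus baseline) and the second is that change as a percentage of the baseline count, printed as 0.0% when the symbol has no baseline samples; that reading is consistent with rows such as __page_set_anon_rmap and bad_range. The names diff_profile and format_rows and the sample counts are made up for illustration.

```python
def diff_profile(base, new):
    """Compare two {symbol: ticks} profiles.

    Returns (delta, percent, symbol) tuples, largest increase first.
    Assumption: percent is relative to the baseline count, and is
    reported as 0.0 when the symbol is absent from the baseline.
    """
    rows = []
    for sym in set(base) | set(new):
        old = base.get(sym, 0)
        delta = new.get(sym, 0) - old
        if delta == 0:
            continue  # unchanged symbols are not listed
        pct = 100.0 * delta / old if old else 0.0
        rows.append((delta, pct, sym))
    # Sort by delta descending, then by symbol name for a stable order.
    rows.sort(key=lambda r: (-r[0], r[2]))
    return rows


def format_rows(rows):
    """Render rows in a plain-text column layout."""
    return [f"{delta:>6d} {pct:>7.1f}% {sym}" for delta, pct, sym in rows]


# Illustrative tick counts only (not taken from the real profiles),
# picked so that the output resembles three rows of the listing.
base = {"__get_zone_counts": 32, "bad_range": 143}
new = {"__get_zone_counts": 155, "__page_set_anon_rmap": 54}

for line in format_rows(diff_profile(base, new)):
    print(line)
```

Under this reading, a row at -100.0% means the symbol vanished from the patched profile, and a positive row at 0.0% means it only appears in the patched one.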
Attachment:
2.6.14-rc3-kc1.patch.gz
Description: Unix tar archive