Re: Add pacct_struct to fix some pacct bugs.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The seriese of patches fixes some process accounting bugs.

[PATCH 1/3] two phase process accounting
[PATCH 2/3] avoidance to refer the last thread as a representation
            of the process
[PATCH 3/3] none-delayed process accounting accumulation

* background of the patch.1

    The pacct facility need an i/o operation when an accounting record
    is generated. There is a possibility to wake OOM killer up.
    If OOM killer is activated, it kills some processes to make them
    release process memory regions.
    But acct_process() is called in the killed processes context
    before calling exit_mm(), so those processes cannot release
    own memory. In the results, any processes stop in this point and
    it finally cause a system stall.

    ---- in kernel/exit.c : do_exit() ------------
            group_dead = atomic_dec_and_test(&tsk->signal->live);
            if (group_dead) {
                    hrtimer_cancel(&tsk->signal->real_timer);
                    exit_itimers(tsk->signal);
                    acct_process(code);
            }
               :
           - snip -
               :
            exit_mm(tsk);
    ----------------------------------------------

    This patch separates generating an accounting record facility
    into two-phase.
    In the first one, acct_collect() calculate vitual memory size
    of the process and stores it into pacct_struct before exit_mm().
    Then, acct_process() generates an accounting record and write
    it into medium.


* background of the patch.2

    When pacct facility generate an 'ac_flag' field in accounting record,
    it refers a task_struct of the thread which died last in the process.
    But any other task_structs are ignored.

    Therefore, pacct facility drops ASU flag even if root-privilege
    operations are used by any other threads except the last one.
    In addition, AFORK flag is always set when the thread of group-leader
    didn't die last, although this process has called execve() after fork().

    We have a same matter in ac_exitcode. The recorded ac_exitcode is
    an exit code of the last thread in the process. There is a possibility
    this exitcode is not the group leader's one.

    ---- in kernel/acct.c : do_acct_process() ----
            ac.ac_flag = 0;
            if (current->flags & PF_FORKNOEXEC)
                    ac.ac_flag |= AFORK;
            if (current->flags & PF_SUPERPRIV)
                    ac.ac_flag |= ASU;
            if (current->flags & PF_DUMPCORE)
                    ac.ac_flag |= ACORE;
            if (current->flags & PF_SIGNALED)
                    ac.ac_flag |= AXSIG;
                      :
                   - snip -
                      :
            ac.ac_exitcode = exitcode;
    ----------------------------------------------

    This patch fixes those matters.
    - The exit code of group leader is recorded as ac_exitcode.
    - ASU, ACORE, AXSIG flag are marked if any task_struct satisfy
      the conditions.
    - AFORK flag is marked if only group leader thread satisfy
      the condition.


* background of the patch.3

    In current 2.6.17 implementation, signal_struct refered from task_struct
    is used for per-process data structure. The pacct facility also uses it
    as a per-process data structure to store stime, utime, minflt, majflt.
    But those members are saved in __exit_signal(). It's too late.

    For example, if some threads exits at same time, pacct facility has
    a possibility to drop accountings for a part of those threads.
    (see, the following 'The results of original 2.6.17 kernel')
    I think accounting information should be completely collected into
    the per-process data structure before writing out an accounting record.

    This patch fixes this matter. Accumulation of stime, utime, minflt
    and majflt are done before generating accounting record.


* accounting results

[in original 2.6.17 cases]

# accton acct.log
# time -p ./bugacct
real 10.07
user 5.96
sys 0.10
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00
# time -p ./raceacct 4
real 7.71
user 30.14
sys 0.00
# time -p ./raceacct 4
real 6.94
user 27.21
sys 0.00
# time -p ./raceacct 4
real 6.25
user 24.42
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:41:16      5       0     0    3072     110      0    accton
F--- 13:41:35   1006     596     9  143232    8200      0   bugacct *
F--- 13:41:53    692    2032     0   28528      38      0  raceacct *
---- 13:42:10    771    3014     0   28528     170      0  raceacct
F--- 13:42:19    694    2027     0   28528       8      0  raceacct *
F--- 13:42:26    625    1832     0   28528      40      0  raceacct *
---- 13:45:40    692    2722     0   28528     171      0  raceacct

'P' means this process used root privilege operations.
'F' means this process didn't execve() after fork().

=> bugacct used root privilege operation, but pacct facility droped it.
=> In raceacct, some threads exit on same time. pacct facility often drops
   a part of utime, stime, minflt and majflt.
=> When group leader thread didn't die last in raceacct, incorrent flag 'F'
   is set.


[in patched 2.6.17-kg cases]

# touch acct.log
# accton acct.log
# time -p ./bugacct
real 10.07
user 5.97
sys 0.09
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 6.93
user 27.18
sys 0.00
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 7.12
user 27.77
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.17
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:24:01      0      0      0    3072     111      0    accton
-P-- 13:24:05   1007    597      8  143232    8360      0   bugacct
---- 13:24:35    711   2776      0   28528     171      0  raceacct
---- 13:24:44    693   2718      0   28528     172      0  raceacct
---- 13:24:51    711   2776      0   28528     172      0  raceacct
---- 13:25:05    712   2777      0   28528     174      0  raceacct
---- 13:25:14    692   2717      0   28528     171      0  raceacct

I hope your any comments. Thanks,

KaiGai Kohei wrote:
>>> Hi, I noticed three problems in pacct facility.
>>>
>>> 1. Pacct facility has a possibility to write incorrect ac_flag
>>>    in multi-threading cases.
>>> 2. There is a possibility to be waken up OOM Killer from
>>>    pacct facility. It will cause system stall.
>>> 3. If several threads are killed at same time, There is
>>>    a possibility not to pick up a part of those accountings.
>>>
>>> The attached patch will resolve those matters.
>>> Any comments please. Thanks,
>>
>> Thanks, but you have three quite distinct bugs here, and three quite
>> distinct descriptions and, I think, three quite distinct fixes.
>>
>> Would it be possible for you to prepare three patches?
>
> It may be possible. Please wait for a while to separate it into
> three-part and to confirm its correct behavior.
>
> Thanks,
--
Open Source Software Promotion Center, NEC
KaiGai Kohei <[email protected]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux