• 2009-05-07

    system call - [Notes]

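    The operating system provides a standard interface that lets programmers control low-level hardware and services (such as the file system): the system call. When a program needs to make a system call, it places the arguments in the registers reserved for that purpose and then raises software interrupt 0x80. The interrupt acts as a window through which the program reaches kernel mode: the program hands the arguments and the system call number to the kernel, and the kernel carries out the call.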
     
    On i386, the system call number goes in %eax and its arguments go, in order, in %ebx, %ecx, %edx, %esi, and %edi. For example, the call
        write(2, "Hello", 5)
    looks roughly like this in assembly:
        movl $4, %eax
        movl $2, %ebx
        movl $hello, %ecx
        movl $5, %edx
        int $0x80

    Here $hello is the address of the string "Hello".
    All system call numbers can be found in /usr/include/asm/unistd.h.

  • 2009-05-06

    wstat macro - [Notes]

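    In inter-process communication, wait(int *status) or waitpid(pid_t pid, int *status, int options) is often used to wait for a child process to stop or terminate. The status argument records how the child ended, and the wstat macros let us examine it.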

    #include <sys/wait.h>
    /* int status */
    WIFEXITED(status)   /* non-zero if the child terminated normally */
    WEXITSTATUS(status)   /* the exit code the child passed to exit(); check WIFEXITED first to confirm normal termination */
    WIFSIGNALED(status)   /* true if the child was terminated by a signal */
    WTERMSIG(status)   /* the number of the signal that terminated the child; check WIFSIGNALED first */
    WIFSTOPPED(status)   /* true if the child is currently stopped; generally only seen when WUNTRACED is used */
    WSTOPSIG(status)   /* the number of the signal that stopped the child; check WIFSTOPPED first */
    WIFCONTINUED(status)   /* non-zero if the status indicates the child has resumed execution */
    WCOREDUMP(status)   /* true if a core dump file was produced */

  • 2009-05-05

    ptrace - process trace - [Notes]

    ptrace(2) - Linux man page

    Name
    ptrace - process trace

    Synopsis
    #include <sys/ptrace.h>
    long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

    Description
    The ptrace() system call provides a means by which a parent process may observe and control the execution of another process, and examine and change its core image and registers. It is primarily used to implement breakpoint debugging and system call tracing.

    The parent can initiate a trace by calling fork(2) and having the resulting child do a PTRACE_TRACEME, followed (typically) by an exec(3). Alternatively, the parent may commence trace of an existing process using PTRACE_ATTACH.

    While being traced, the child will stop each time a signal is delivered, even if the signal is being ignored. (The exception is SIGKILL, which has its usual effect.) The parent will be notified at its next wait(2) and may inspect and modify the child process while it is stopped. The parent then causes the child to continue, optionally ignoring the delivered signal (or even delivering a different signal instead).

    When the parent is finished tracing, it can terminate the child with PTRACE_KILL or cause it to continue executing in a normal, untraced mode via PTRACE_DETACH.

    The value of request determines the action to be performed:

    PTRACE_TRACEME
        Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. A process probably shouldn't make this request if its parent isn't expecting to trace it. (pid, addr, and data are ignored.)

    The above request is used only by the child process; the rest are used only by the parent. In the following requests, pid specifies the child process to be acted on. For requests other than PTRACE_KILL, the child process must be stopped.

    PTRACE_PEEKTEXT, PTRACE_PEEKDATA
        Reads a word at the location addr in the child's memory, returning the word as the result of the ptrace() call. Linux does not have separate text and data address spaces, so the two requests are currently equivalent. (The argument data is ignored.)
    PTRACE_PEEKUSR
        Reads a word at offset addr in the child's USER area, which holds the registers and other information about the process (see <linux/user.h> and <sys/user.h>). The word is returned as the result of the ptrace() call. Typically the offset must be word-aligned, though this might vary by architecture. (data is ignored.)
    PTRACE_POKETEXT, PTRACE_POKEDATA
        Copies the word data to location addr in the child's memory. As above, the two requests are currently equivalent.
    PTRACE_POKEUSR
        Copies the word data to offset addr in the child's USER area. As above, the offset must typically be word-aligned. In order to maintain the integrity of the kernel, some modifications to the USER area are disallowed.
    PTRACE_GETREGS, PTRACE_GETFPREGS
        Copies the child's general purpose or floating-point registers, respectively, to location data in the parent. See <linux/user.h> for information on the format of this data. (addr is ignored.)
    PTRACE_GETSIGINFO (since Linux 2.3.99-pre6)
        Retrieve information about the signal that caused the stop. Copies a siginfo_t structure (see sigaction(2)) from the child to location data in the parent. (addr is ignored.)
    PTRACE_SETREGS, PTRACE_SETFPREGS
        Sets the child's general purpose or floating-point registers, respectively, from location data in the parent. As for PTRACE_POKEUSR, some general purpose register modifications may be disallowed. (addr is ignored.)
    PTRACE_SETSIGINFO (since Linux 2.3.99-pre6)
        Set signal information. Copies a siginfo_t structure from location data in the parent to the child. This will only affect signals that would normally be delivered to the child and were caught by the tracer. It may be difficult to tell these normal signals from synthetic signals generated by ptrace() itself. (addr is ignored.)
    PTRACE_SETOPTIONS (since Linux 2.4.6; see BUGS for caveats)
        Sets ptrace options from data in the parent. (addr is ignored.) data is interpreted as a bitmask of options, which are specified by the following flags:
        PTRACE_O_TRACESYSGOOD (since Linux 2.4.6)
            When delivering syscall traps, set bit 7 in the signal number (i.e., deliver (SIGTRAP | 0x80)). This makes it easy for the tracer to tell the difference between normal traps and those caused by a syscall. (PTRACE_O_TRACESYSGOOD may not work on all architectures.)
        PTRACE_O_TRACEFORK (since Linux 2.5.46)
            Stop the child at the next fork() call with SIGTRAP | PTRACE_EVENT_FORK << 8 and automatically start tracing the newly forked process, which will start with a SIGSTOP. The PID for the new process can be retrieved with PTRACE_GETEVENTMSG.
        PTRACE_O_TRACEVFORK (since Linux 2.5.46)
            Stop the child at the next vfork() call with SIGTRAP | PTRACE_EVENT_VFORK << 8 and automatically start tracing the newly vforked process, which will start with a SIGSTOP. The PID for the new process can be retrieved with PTRACE_GETEVENTMSG.
        PTRACE_O_TRACECLONE (since Linux 2.5.46)
            Stop the child at the next clone() call with SIGTRAP | PTRACE_EVENT_CLONE << 8 and automatically start tracing the newly cloned process, which will start with a SIGSTOP. The PID for the new process can be retrieved with PTRACE_GETEVENTMSG. This option may not catch clone() calls in all cases. If the child calls clone() with the CLONE_VFORK flag, PTRACE_EVENT_VFORK will be delivered instead if PTRACE_O_TRACEVFORK is set; otherwise if the child calls clone() with the exit signal set to SIGCHLD, PTRACE_EVENT_FORK will be delivered if PTRACE_O_TRACEFORK is set.
        PTRACE_O_TRACEEXEC (since Linux 2.5.46)
            Stop the child at the next exec() call with SIGTRAP | PTRACE_EVENT_EXEC << 8.
        PTRACE_O_TRACEVFORKDONE (since Linux 2.5.60)
            Stop the child at the completion of the next vfork() call with SIGTRAP | PTRACE_EVENT_VFORK_DONE << 8.
        PTRACE_O_TRACEEXIT (since Linux 2.5.60)
            Stop the child at exit with SIGTRAP | PTRACE_EVENT_EXIT << 8. The child's exit status can be retrieved with PTRACE_GETEVENTMSG. This stop will be done early during process exit when registers are still available, allowing the tracer to see where the exit occurred, whereas the normal exit notification is done after the process is finished exiting. Even though context is available, the tracer cannot prevent the exit from happening at this point.
    PTRACE_GETEVENTMSG (since Linux 2.5.46)
        Retrieve a message (as an unsigned long) about the ptrace event that just happened, placing it in the location data in the parent. For PTRACE_EVENT_EXIT this is the child's exit status. For PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK and PTRACE_EVENT_CLONE this is the PID of the new process. (addr is ignored.)
    PTRACE_CONT
        Restarts the stopped child process. If data is non-zero and not SIGSTOP, it is interpreted as a signal to be delivered to the child; otherwise, no signal is delivered. Thus, for example, the parent can control whether a signal sent to the child is delivered or not. (addr is ignored.)
    PTRACE_SYSCALL, PTRACE_SINGLESTEP
        Restarts the stopped child as for PTRACE_CONT, but arranges for the child to be stopped at the next entry to or exit from a system call, or after execution of a single instruction, respectively. (The child will also, as usual, be stopped upon receipt of a signal.) From the parent's perspective, the child will appear to have been stopped by receipt of a SIGTRAP. So, for PTRACE_SYSCALL, for example, the idea is to inspect the arguments to the system call at the first stop, then do another PTRACE_SYSCALL and inspect the return value of the system call at the second stop. (addr is ignored.)
    PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP (since Linux 2.6.14)
        For PTRACE_SYSEMU, continue and stop on entry to the next syscall, which will not be executed. For PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a syscall. This call is used by programs like User Mode Linux that want to emulate all the child's syscalls. (addr and data are ignored; not supported on all architectures.)
    PTRACE_KILL
        Sends the child a SIGKILL to terminate it. (addr and data are ignored.)
    PTRACE_ATTACH
        Attaches to the process specified in pid, making it a traced "child" of the current process; the behavior of the child is as if it had done a PTRACE_TRACEME. The current process actually becomes the parent of the child process for most purposes (e.g., it will receive notification of child events and appears in ps(1) output as the child's parent), but a getppid(2) by the child will still return the PID of the original parent. The child is sent a SIGSTOP, but will not necessarily have stopped by the completion of this call; use wait() to wait for the child to stop. (addr and data are ignored.)
    PTRACE_DETACH
        Restarts the stopped child as for PTRACE_CONT, but first detaches from the process, undoing the reparenting effect of PTRACE_ATTACH, and the effects of PTRACE_TRACEME. Although perhaps not intended, under Linux a traced child can be detached in this way regardless of which method was used to initiate tracing. (addr is ignored.)

    Notes
    Although arguments to ptrace() are interpreted according to the prototype given, GNU libc currently declares ptrace() as a variadic function with only the request argument fixed. This means that unneeded trailing arguments may be omitted, though doing so makes use of undocumented gcc(1) behavior.

    init(8), the process with PID 1, may not be traced.

    The layout of the contents of memory and the USER area are quite OS- and architecture-specific.

    The size of a "word" is determined by the OS variant (e.g., for 32-bit Linux it's 32 bits, etc.).

    Tracing causes a few subtle differences in the semantics of traced processes. For example, if a process is attached to with PTRACE_ATTACH, its original parent can no longer receive notification via wait() when it stops, and there is no way for the new parent to effectively simulate this notification.

    This page documents the way the ptrace() call works currently in Linux. Its behavior differs noticeably on other flavors of Unix. In any case, use of ptrace() is highly OS- and architecture-specific.

    The SunOS man page describes ptrace() as "unique and arcane", which it is. The proc-based debugging interface present in Solaris 2 implements a superset of ptrace() functionality in a more powerful and uniform way.

    Return Value
    On success, PTRACE_PEEK* requests return the requested data, while other requests return zero. On error, all requests return -1, and errno is set appropriately. Since the value returned by a successful PTRACE_PEEK* request may be -1, the caller must check errno after such requests to determine whether or not an error occurred.

    Bugs
    On hosts with 2.6 kernel headers, PTRACE_SETOPTIONS is declared with a different value than the one for 2.4. This leads to applications compiled with such headers failing when run on 2.4 kernels. This can be worked around by redefining PTRACE_SETOPTIONS to PTRACE_OLDSETOPTIONS, if that is defined.

    Errors

    EBUSY
        (i386 only) There was an error with allocating or freeing a debug register.
    EFAULT
        There was an attempt to read from or write to an invalid area in the parent's or child's memory, probably because the area wasn't mapped or accessible. Unfortunately, under Linux, different variations of this fault will return EIO or EFAULT more or less arbitrarily.
    EINVAL
        An attempt was made to set an invalid option.
    EIO
        request is invalid, or an attempt was made to read from or write to an invalid area in the parent's or child's memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request.
    EPERM
        The specified process cannot be traced. This could be because the parent has insufficient privileges (the required capability is CAP_SYS_PTRACE); non-root processes cannot trace processes that they cannot send signals to or those running set-user-ID/set-group-ID programs, for obvious reasons. Alternatively, the process may already be being traced, or be init (PID 1).
    ESRCH
        The specified process does not exist, or is not currently being traced by the caller, or is not stopped (for requests that require that).

    Conforming to
    SVr4, 4.3BSD

    See Also
    gdb(1), strace(1), execve(2), fork(2), signal(2), wait(2), exec(3), capabilities(7)

    Referenced By
    clone(2), credentials(7), gstack(1), libunwind-ptrace(3), ltrace(1), polkit-auth(1), pstack(1), scanmem(1), syscalls(2)

  • Today I tried using setrlimit() to put resource limits on a process. A few notes so I don't forget:

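    RLIMIT_OFILE/RLIMIT_NOFILE limits the number of open files.
    Usage: setrlimit(RLIMIT_OFILE, &res);
    where: struct rlimit res = {(rlim_t)0, (rlim_t)0};
    In my tests, any of 0, 1, 2, or 3 stops the process from opening new files. 0 makes sense, but why do 1 through 3 work too? My guess is that stdin, stdout, and stderr account for those three files. But then, with the limit set to 0, why can I still use the three std* streams?
    I don't get it...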

    RLIMIT_NPROC limits the number of processes that can be created. The GNU C Library Reference Manual explains it as: "The maximum number of processes that can be created with the same user ID. If you have reached the limit for your user ID, fork will fail with EAGAIN."

    Usage: setrlimit(RLIMIT_NPROC, &res);
    where: struct rlimit res = {(rlim_t)x, (rlim_t)x};
    I haven't given a value for x here, because I still don't understand what the value really means. On my machine, fork() returns -1 for every x in [0, 86] and works normally above that.
    I don't get it... (P.S. sysconf(_SC_CHILD_MAX) is 8189 on my machine.)

    For reference, here is a page with a fairly detailed introduction to setrlimit()/getrlimit()/getrusage(): http://www.bsdlover.cn/html/07/n-207.html