Chapter 3 - Process Management
The Process
- process
- a program in the midst of execution
- includes a set of resources such as open files, processor state, a memory address space, and so on.
- Linux does not differentiate between threads and processes; to the kernel, a thread is just a process that shares certain resources with other processes.
- fork()
- create a new process
- A process is created by fork() system call, which creates a new process by duplicating an existing one.
- exec()
- load a new program
- The exec() system call creates a new address space and loads a new program into it.
- exit(), wait()
- terminate a process
- When a process exits via the exit() system call, it is placed into a special zombie state that represents terminated processes until its parent calls wait() or waitpid().
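The fork()/exec()/exit()/wait() life cycle can be seen from user-space with a short POSIX sketch. The function name run_and_reap() is hypothetical, and the example assumes /bin/sh exists. It is meant as an illustration of the calls above, not of any kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, have it exec() /bin/sh to run "exit <code>", then reap it
 * with waitpid(). Returns the child's exit status, or -1 on error. */
int run_and_reap(int code)
{
    pid_t pid = fork();                 /* duplicate the calling process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: replace this process image with a new program */
        char cmd[32];
        snprintf(cmd, sizeof cmd, "exit %d", code);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* reached only if exec() failed */
    }
    /* parent: the child stays a zombie from its exit until this call reaps it */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_and_reap(7) returns 7: the status passed to exit() in the child is what the parent retrieves through wait.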
Process Descriptor and the Task Structure
- The kernel stores the list of processes in a circular doubly linked list called the task list.
- Each element in the task list is a process descriptor of the type struct task_struct.
- The process descriptor contains all the information about a specific process.
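To make the "circular doubly linked list" concrete, here is a toy task list in plain C. The struct task type and the helpers are hypothetical simplifications (the real task_struct embeds a struct list_head instead of raw next/prev pointers), but the traversal idea is the same: start at init and walk until you wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified process descriptor: only a PID and the task-list links. */
struct task {
    int pid;
    struct task *next, *prev;   /* circular doubly linked task list */
};

/* Insert t just before head, i.e. at the tail of the circular list. */
void task_list_add(struct task *head, struct task *t)
{
    t->next = head;
    t->prev = head->prev;
    head->prev->next = t;
    head->prev = t;
}

/* Unlink t in O(1) using its own prev/next pointers. */
void task_list_del(struct task *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->next = t->prev = t;
}

/* Count every task by walking forward from init until we wrap around. */
int count_tasks(struct task *init)
{
    int n = 1;
    for (struct task *p = init->next; p != init; p = p->next)
        n++;
    return n;
}
```

Because the list is circular and doubly linked, both insertion at the tail and removal of an arbitrary task take constant time.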
Allocating the Process Descriptor
- The task_struct structure is allocated via the slab allocator to provide object reuse and cache coloring.
- Each task's thread_info structure is allocated at the end of its stack. The task element of the structure is a pointer to the task's actual task_struct.
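The "object reuse" half of the slab allocator's benefit can be sketched as a free-list cache: freed objects are kept and handed out again instead of going back to the general allocator. This is a hypothetical, much-simplified stand-in (names like obj_cache are invented here) and does not model slabs, per-CPU caches, or cache coloring.

```c
#include <assert.h>
#include <stdlib.h>

/* A cached object; next_free links it into the cache's free list. */
struct obj { struct obj *next_free; char payload[256]; };

struct obj_cache { struct obj *free_list; };

/* Reuse a previously freed object if one is available, else malloc(). */
struct obj *cache_alloc(struct obj_cache *c)
{
    if (c->free_list) {
        struct obj *o = c->free_list;
        c->free_list = o->next_free;
        return o;
    }
    return malloc(sizeof(struct obj));
}

/* Keep the object for later reuse instead of releasing it. */
void cache_free(struct obj_cache *c, struct obj *o)
{
    o->next_free = c->free_list;
    c->free_list = o;
}

/* Release every cached object back to the system allocator. */
void cache_destroy(struct obj_cache *c)
{
    while (c->free_list) {
        struct obj *o = c->free_list;
        c->free_list = o->next_free;
        free(o);
    }
}
```

Allocating, freeing, and allocating again returns the same object, which is the property that makes frequent task_struct allocation cheap.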
Storing the Process Descriptor
- The system identifies processes by a unique process identification value, PID.
- Most kernel code that deals with processes works directly with struct task_struct.
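- The current macro returns a pointer to the task_struct of the currently executing task. On x86, it is computed from the stack pointer: masking out the low-order bits (13 bits with an 8 KB stack) yields the thread_info structure, whose task member points to the task_struct.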
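The masking trick behind current can be simulated in user-space. The sketch below assumes an 8 KB, 8 KB-aligned "stack" with thread_info at its lowest address; the helper names alloc_stack() and task_from_sp() are hypothetical, and real kernels and architectures differ in the details.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192UL   /* assumed 8 KB kernel stack, as on older x86 */

struct task_struct { int pid; };
struct thread_info { struct task_struct *task; };

/* Allocate a THREAD_SIZE-aligned "stack" with thread_info at its lowest address. */
void *alloc_stack(struct task_struct *t)
{
    void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    if (stack)
        ((struct thread_info *)stack)->task = t;
    return stack;
}

/* Recover the task from any address inside the stack by clearing the low
 * bits, analogous to current_thread_info()->task. */
struct task_struct *task_from_sp(uintptr_t sp)
{
    struct thread_info *ti = (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    return ti->task;
}
```

Since the stack grows down from the top of the region, any stack pointer value inside it masks down to the same base address, so no extra register or lookup is needed.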
Process State
- The state field of the process descriptor describes the current condition of the process.
- TASK_RUNNING
- The process is runnable; it is either currently running or on a runqueue waiting to run.
- TASK_INTERRUPTIBLE
- The process is sleeping, waiting for some condition to exist. When this condition exists, the kernel sets the process's state to TASK_RUNNING.
- TASK_UNINTERRUPTIBLE
- This state is identical to TASK_INTERRUPTIBLE except that it does not wake up and become runnable if it receives a signal.
- This is used in situations where the process must wait without interruption or when the event is expected to occur quite quickly.
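The difference between the two sleeping states can be captured as a tiny state model. This is an illustration only: the enum borrows the kernel's state names, but the functions condition_met() and signal_arrived() are hypothetical and are not how the kernel implements wakeups.

```c
#include <assert.h>

/* Simplified model of the states above; not kernel code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task { enum task_state state; };

/* The awaited condition became true: both sleeping states become runnable. */
void condition_met(struct task *t)
{
    if (t->state == TASK_INTERRUPTIBLE || t->state == TASK_UNINTERRUPTIBLE)
        t->state = TASK_RUNNING;
}

/* A signal arrives: only an interruptible sleeper is made runnable. */
void signal_arrived(struct task *t)
{
    if (t->state == TASK_INTERRUPTIBLE)
        t->state = TASK_RUNNING;
}
```

A signal wakes an interruptible sleeper but leaves an uninterruptible one asleep until its condition is met.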
Manipulating the Current Process State
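set_task_state(task, state);    /* set task 'task' to state 'state' */
set_current_state(state);       /* same as set_task_state(current, state) */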
Process Context
- Normal program execution occurs in user-space.
- When a program executes a system call or triggers an exception, it enters kernel-space. At this point, the kernel is said to be "executing on behalf of the process" and is in process context.
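A system call is the usual way a program crosses into process context. The sketch below (Linux and glibc assumed; raw_getpid() is a hypothetical helper name) issues getpid through the generic syscall() wrapper: the kernel handles the call on behalf of the calling task and returns the result to user-space.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Trap into the kernel with the getpid system call number. While the
 * kernel services it, it runs in process context for this task. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The result matches the ordinary getpid() library call, which performs the same system call.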
The Process Family Tree
- All processes are descendants of the init process, whose PID is one. The kernel starts init in the last step of the boot process.
- Every process on the system has exactly one parent. Likewise, every process has zero or more children.
- Processes that are all direct children of the same parent are called siblings.
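- Each task_struct has a pointer to the parent's task_struct, named parent, and a list of children, named children.
struct task_struct *my_parent = current->parent;   /* parent of the current task */
struct task_struct *task;
struct list_head *list;

/* iterate over the current task's children */
list_for_each(list, &current->children) {
        /* each child is linked into its parent's children list via its sibling member */
        task = list_entry(list, struct task_struct, sibling);
        /* task now points to one of current's children */
}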
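The children/sibling pattern relies on intrusive lists: a struct list_head is embedded in each task, and list_entry() recovers the enclosing structure from a pointer to that member. The following user-space sketch re-creates minimal versions of list_for_each() and list_entry() (the kernel's real definitions live in its own headers and differ in detail); struct task, task_init(), sum_child_pids(), and depth() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Map a pointer to an embedded member back to its containing struct. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

struct task {
    int pid;
    struct task *parent;
    struct list_head children;  /* head of this task's list of children */
    struct list_head sibling;   /* link in the parent's children list */
};

void task_init(struct task *t, int pid, struct task *parent)
{
    t->pid = pid;
    t->parent = parent;
    t->children.next = t->children.prev = &t->children;
    t->sibling.next = t->sibling.prev = &t->sibling;
    if (parent)
        list_add_tail(&t->sibling, &parent->children);
}

/* Sum the PIDs of a task's children, walking them via their sibling links. */
int sum_child_pids(struct task *t)
{
    struct list_head *list;
    int sum = 0;
    list_for_each(list, &t->children) {
        struct task *child = list_entry(list, struct task, sibling);
        sum += child->pid;
    }
    return sum;
}

/* Count the parent links from t up to the root of the tree (init). */
int depth(const struct task *t)
{
    int d = 0;
    for (; t->parent; t = t->parent)
        d++;
    return d;
}
```

Following parent pointers from any task eventually reaches the root, mirroring how every process descends from init.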
Process Creation
- Most operating systems implement a spawn mechanism that creates a new process in a new address space, reads in an executable, and begins executing it.
- Unix takes the unusual approach of separating these steps into two distinct functions: fork() and exec().
- fork()
- fork() system call creates a child process that is a copy of the current task.
- It differs from the parent only in its PID, its PPID, and certain resources and statistics, such as pending signals, which are not inherited.
- exec()
- exec() system call loads a new executable into the address space and begins executing it.
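The claim that the child differs from its parent in PID and PPID can be checked from user-space. In this POSIX sketch (fork_report() is a hypothetical helper), the child sends its getpid() and getppid() values back through a pipe so the parent can compare them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork once; the child writes its PID and PPID into a pipe and exits.
 * The parent reads them back and reaps the child. Returns 0 on success. */
int fork_report(pid_t *child_pid, pid_t *child_ppid)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                      /* fork() returned 0: this is the child */
        pid_t ids[2] = { getpid(), getppid() };
        close(fd[0]);
        if (write(fd[1], ids, sizeof ids) != (ssize_t)sizeof ids)
            _exit(1);
        _exit(0);
    }

    /* fork() returned the child's PID: this is the parent */
    pid_t ids[2];
    close(fd[1]);
    ssize_t n = read(fd[0], ids, sizeof ids);
    close(fd[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof ids || ids[0] != pid)
        return -1;
    *child_pid = ids[0];
    *child_ppid = ids[1];
    return 0;
}
```

The child's PID is new, and its PPID equals the parent's PID; fork() returning different values in each process is how the code tells the two apart.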
Copy-on-Write
- Traditionally, upon fork(), all resources owned by the parent are duplicated and the copy is given to the child. If the new process were to immediately execute a new image, all that copying would go to waste.
- In Linux, fork() is implemented through the use of copy-on-write pages.
- Copy-on-write (COW)
- Rather than duplicate the process address space, the parent and the child can share a single copy.
- The duplication of resources occurs only when they are written.
- The only overhead incurred by fork() is the duplication of the parent's page tables and the creation of a unique process descriptor for the child.
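Copy-on-write is invisible to programs: each process still behaves as if it had a private copy. The POSIX sketch below (cow_demo() and the global data are hypothetical) has the child write to a variable it initially shares with the parent; on Linux, that write makes the kernel give the child its own copy of the page, so the parent's value is untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int data = 1;   /* lives in a page the child initially shares */

/* The child writes to data and exits with the value it sees; the function
 * returns that value (or -1 on error) and stores the parent's value. */
int cow_demo(int *parent_value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        data = 99;                      /* write: the child gets its own copy of the page */
        _exit(data);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *parent_value = data;               /* still 1 in the parent */
    return WEXITSTATUS(status);
}
```

Only pages that are actually written get copied; pages the child merely reads stay shared.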
Forking
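- fork(), vfork(), and __clone() all invoke the clone() system call, which calls do_fork(). do_fork() calls copy_process() and then starts the new process running. copy_process() works as follows: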
- It calls dup_task_struct(), which creates a new kernel stack, thread_info structure, and task_struct for the new process. The new values are identical to those of the current task. At this point, the child and parent process descriptors are identical.
- It then checks that the new child will not exceed the resource limits on the number of processes for the current user.
- The child needs to differentiate itself from its parent. Various members of the process descriptor are cleared or set to initial values. Members of the process descriptor that are not inherited are primarily statistical information. The bulk of the values in task_struct remain unchanged.
- The child's state is set to TASK_UNINTERRUPTIBLE to ensure that it does not yet run.
- copy_process() calls copy_flags() to update the flags member of the task_struct. The PF_SUPERPRIV flag, which denotes whether a task used superuser privileges, is cleared. The PF_FORKNOEXEC flag, which denotes a process that has not called exec(), is set.
- It calls alloc_pid() to assign an available PID to the new task.
- Depending on the flags passed to clone(), copy_process() either duplicates or shares open files, filesystem information, signal handlers, process address space, and namespace. These resources are typically shared between threads in a given process; otherwise they are unique and thus copied here.
- Finally, copy_process() cleans up and returns to the caller a pointer to the new child.
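- Back in do_fork(), if copy_process() returns successfully, the new child is woken up and run. Deliberately, the kernel tries to run the child first: in the common case of the child calling exec() immediately, this avoids the copy-on-write overhead that would occur if the parent ran first and began writing to the address space.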
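The copy_process() steps can be summarized as a toy model in user-space C. Everything here is hypothetical: toy_copy_process(), CLONE_FILES_T, struct files, and the flag values are invented for illustration (only the PF_* names echo the kernel's), and real copy_process() does far more. The model shows the shape of the work: duplicate the descriptor, check the process limit, reset statistics, mark the child not runnable, adjust flags, assign a PID, then share or copy resources depending on clone flags.

```c
#include <assert.h>
#include <stdlib.h>

/* Flag values chosen for illustration only. */
#define PF_FORKNOEXEC 0x040
#define PF_SUPERPRIV  0x100
#define CLONE_FILES_T 0x1            /* hypothetical: share the open-file table */

enum { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct files { int refcount; };      /* stand-in for shared per-process resources */

struct task {
    int pid, state;
    unsigned flags;
    unsigned long utime;             /* a statistic: not inherited */
    struct files *files;
};

static int next_pid = 100;           /* toy stand-in for alloc_pid() */

/* Toy model of copy_process(): returns the new child, or NULL. */
struct task *toy_copy_process(const struct task *parent, unsigned clone_flags,
                              int nr_procs, int max_procs)
{
    if (nr_procs >= max_procs)                    /* per-user process limit */
        return NULL;
    struct task *child = malloc(sizeof *child);
    if (!child)
        return NULL;
    *child = *parent;                             /* dup_task_struct(): identical copy */
    child->utime = 0;                             /* statistics are reset */
    child->state = TASK_UNINTERRUPTIBLE;          /* must not run yet */
    child->flags &= ~PF_SUPERPRIV;                /* copy_flags() */
    child->flags |= PF_FORKNOEXEC;
    child->pid = next_pid++;                      /* alloc_pid() */
    if (clone_flags & CLONE_FILES_T) {
        child->files->refcount++;                 /* share with the parent */
    } else {
        struct files *nf = malloc(sizeof *nf);    /* private copy */
        if (!nf) {
            free(child);
            return NULL;
        }
        *nf = *parent->files;
        nf->refcount = 1;
        child->files = nf;
    }
    return child;
}
```

Passing the sharing flag bumps a reference count on the parent's object, which is essentially what distinguishes creating a thread from creating an ordinary process.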
The Linux Implementation of Threads
- Threads provide multiple threads of execution within the same program in a shared memory address space.
- The Linux kernel does not provide any special scheduling semantics or data structures to represent threads. Instead, a thread is merely a process that shares certain resources with other processes.
Creating Threads
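- Threads are created just like normal tasks, except that clone() is passed flags specifying which resources to share:
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
- normal fork():
clone(SIGCHLD, 0);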
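The effect of the sharing flags is visible through glibc's clone() wrapper on Linux. In this sketch (run_clone(), child_fn(), and the global value are hypothetical), the child writes a global variable; the parent sees the write only when CLONE_VM shares the address space. SIGCHLD is always added as the termination signal so that waitpid() can reap the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int value;

static int child_fn(void *arg)
{
    value = *(int *)arg;   /* visible to the parent only if memory is shared */
    return 0;
}

/* Run child_fn in a new task created with the given sharing flags and
 * return the parent's view of value afterwards (-1 on error). */
int run_clone(int share_flags, int new_value)
{
    char *stack = malloc(STACK_SIZE);
    if (!stack)
        return -1;
    value = 0;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, share_flags | SIGCHLD, &new_value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return value;
}
```

With no sharing flags, the child behaves like a forked process and the parent still reads 0; with the thread flags, the parent reads the child's write.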
Kernel Threads
- It is often useful for the kernel to perform some operations in the background. The kernel accomplishes this via kernel threads: standard processes that exist solely in kernel-space.
- The significant difference between kernel threads and normal processes is that kernel threads do not have an address space (their mm pointer is NULL). They operate only in kernel-space and do not context switch into user-space.
- Kernel threads, however, are schedulable and preemptible, the same as normal processes.
- A kernel thread can be created only by another kernel thread; the kernel handles this by forking all new kernel threads off of the kthreadd kernel process.
Process Termination
- Regardless of how a process terminates, the bulk of the work is handled by do_exit().
- It sets the PF_EXITING flag in the flags member of the task_struct.
- It calls del_timer_sync() to remove any kernel timers. Upon return, it is guaranteed that no timer is queued and that no timer handler is running.
- If BSD process accounting is enabled, do_exit() calls acct_update_integrals() to write out accounting information.
- It calls exit_mm() to release the mm_struct held by this process. If no other process is using this address space, the kernel then destroys it.
- It calls exit_sem(). If the process is queued waiting for an IPC semaphore, it is dequeued here.
- It then calls exit_files() and exit_fs() to decrement the usage count of objects related to file descriptors and filesystem data, respectively. If either usage count reaches zero, the object is no longer in use by any process, and it is destroyed.
- It sets the task's exit code, stored in the exit_code member of the task_struct, to the code provided by exit() or whatever kernel mechanism forced the termination. The exit code is stored here for optional retrieval by the parent.
- It calls exit_notify() to send signals to the task's parent, reparents any of the task's children to another thread in their thread group or the init process, and sets the task's exit state, stored in exit_state in the task_struct structure, to EXIT_ZOMBIE.
- do_exit() calls schedule() to switch to a new process. Because the process is not schedulable, this is the last code the task will ever execute. do_exit() never returns.
- At this point, the only memory the zombie task occupies is its kernel stack, the thread_info structure, and the task_struct structure.
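The "decrement the usage count, destroy at zero" rule used by exit_mm(), exit_files(), and exit_fs() is plain reference counting. The sketch below is hypothetical (shared_obj, obj_get(), and obj_put() are not kernel names) and ignores the atomicity the kernel needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an object shared between tasks, such as an open-file table. */
struct shared_obj { int count; };

/* Another task starts using the object. */
struct shared_obj *obj_get(struct shared_obj *o)
{
    o->count++;
    return o;
}

/* A task stops using the object; destroy it when nobody uses it.
 * Returns 1 if the object was freed. */
int obj_put(struct shared_obj *o)
{
    if (--o->count == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

When two threads share the object, the first to exit only drops the count; the last one destroys it.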
Removing the Process Descriptor
- After do_exit() completes, the process descriptor for the terminated process still exists, but the process is a zombie and is unable to run.
- After the parent has obtained information on its terminated child, or signified to the kernel that it does not care, the child's task_struct is deallocated.
- The wait() family of functions is implemented via the wait4() system call. When it is time to finally deallocate the process descriptor, release_task() is invoked, which does the following:
- It calls __exit_signal(), which calls __unhash_process(), which in turn calls detach_pid() to remove the process from the pidhash and remove the process from the task list.
- __exit_signal() releases any remaining resources used by the now dead process and finalizes statistics and bookkeeping.
- If the task was the last member of a thread group, and the leader is a zombie, then release_task() notifies the zombie leader's parent.
- release_task() calls put_task_struct() to free the pages containing the process's kernel stack and thread_info structure and deallocate the slab cache containing the task_struct.
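On Linux, the zombie state can be observed before the parent reaps the child by reading the state letter from /proc/<pid>/stat. The sketch assumes /proc is mounted and that SIGCHLD is not ignored (otherwise children are reaped automatically); proc_state() and observe_zombie() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the one-letter state from /proc/<pid>/stat, or 0 if unreadable. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');          /* the command name may contain spaces */
    return (p && p[1] == ' ') ? p[2] : 0;
}

/* Fork a child that exits with code, record its state before reaping it,
 * then reap it. Returns the exit code, or -1 on error. */
int observe_zombie(int code, char *state_before_wait)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                       /* the child becomes a zombie */

    struct timespec ts = { 0, 1000000 };   /* poll every 1 ms, up to about 1 s */
    char s = 0;
    for (int i = 0; i < 1000 && (s = proc_state(pid)) != 'Z'; i++)
        nanosleep(&ts, NULL);
    *state_before_wait = s;

    int status;                            /* reaping lets the kernel free the descriptor */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Until waitpid() returns, the child shows state Z; afterwards its /proc entry disappears because the process descriptor has been released.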
The Dilemma of the Parentless Task
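- If a parent exits before its children, the kernel must reparent any child tasks to a new process; otherwise, parentless terminated processes would remain zombies forever, wasting system memory.
- The kernel first attempts to find another task in the process's thread group. If there is no other task in the thread group, it reparents the children to the init process, which periodically calls wait() on its children, cleaning up any zombies assigned to it.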
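Reparenting can be watched from user-space with a grandchild whose parent exits first. In this POSIX sketch (orphan_demo() is a hypothetical helper), the grandchild polls getppid() until it changes and reports the old and new parent PIDs through a pipe. The new parent is usually init, but on Linux a process marked as a child subreaper (as some service managers and container runtimes do) may adopt the orphan instead, so the sketch only checks that the parent changed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Create a grandchild whose parent exits immediately; report the
 * grandchild's parent PID before and after. Returns 0 on success. */
int orphan_demo(pid_t *old_ppid, pid_t *new_ppid)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t me = getpid();                      /* the grandchild's original parent */
        if (fork() == 0) {
            struct timespec ts = { 0, 1000000 };  /* poll every 1 ms, up to about 2 s */
            for (int i = 0; i < 2000 && getppid() == me; i++)
                nanosleep(&ts, NULL);
            pid_t ids[2] = { me, getppid() };     /* new parent after reparenting */
            close(fd[0]);
            if (write(fd[1], ids, sizeof ids) != (ssize_t)sizeof ids)
                _exit(1);
            _exit(0);
        }
        _exit(0);                                 /* parent exits before its child */
    }
    close(fd[1]);
    waitpid(child, NULL, 0);
    pid_t ids[2];
    ssize_t n = read(fd[0], ids, sizeof ids);
    close(fd[0]);
    if (n != (ssize_t)sizeof ids || ids[0] != child)
        return -1;
    *old_ppid = ids[0];
    *new_ppid = ids[1];
    return 0;
}
```

Because the orphan has a new parent that will eventually wait() on it, it does not linger as a zombie after it exits.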