Linux进程状态总结

Demon.Lee 2021年05月30日 2,130次浏览

本文实践环境:
Operating System: Ubuntu 20.04.2 LTS
Kernel: Linux 5.8.0-50-generic
Architecture: x86-64

为了方便后续学习容器中跟进程相关的知识,这里对进程生命周期中相关状态做一个梳理。

进程对于我们来说最熟悉不过了,我们计算机上运行的微信、浏览器、Office软件等,都是一个个进程。当应用程序未启动时,它是一个存储在在磁盘上的普通文件(静态表现),而将其启动后(由其他进程启动),就会产生了一个进程,所以可以将进程理解为一个应用程序运行时的实例(动态表现)。

进程生命周期

在Linux内核里,无论是进程还是线程,统一使用 task_struct{} 结构体来表示,也就是统一抽象为任务(task)。task_struct{} 定义在 include/linux/sched.h 文件中,十分复杂,这里简单了解下。

// include/linux/sched.h
// ... 省略

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
	/*
	 * For reasons of header soup (see current_thread_info()), this
	 * must be the first element of task_struct.
	 */
	struct thread_info		thread_info;
#endif
	/* -1 unrunnable, 0 runnable, >0 stopped: */
	volatile long			state;
	int				exit_state;
	int				exit_code;
	int				exit_signal;

	/*
	 * This begins the randomizable portion of task_struct. Only
	 * scheduling-critical items should be added above here.
	 */
	randomized_struct_fields_start

	void				*stack;
	refcount_t			usage;
	/* Per task flags (PF_*), defined further below: */
	unsigned int			flags;
	unsigned int			ptrace;

#ifdef CONFIG_SMP
	int				on_cpu;
	struct __call_single_node	wake_entry;
#ifdef CONFIG_THREAD_INFO_IN_TASK
	/* Current CPU: */
	unsigned int			cpu;
#endif
	unsigned int			wakee_flips;
	unsigned long			wakee_flip_decay_ts;
	struct task_struct		*last_wakee;
  // ...省略
  struct sched_info		sched_info;
	struct list_head		tasks;   // 链表,将所有task_struct串起来

	pid_t				pid;    // process id,指的是线程id
	pid_t				tgid;   // thread group ID,指的是进程的主线程id
  struct task_struct		*group_leader;  // 指向的是进程的主线程

	/* Signal handlers: */
	struct signal_struct		*signal;
	struct sighand_struct __rcu		*sighand;
	sigset_t			blocked;
	sigset_t			real_blocked;
	/* Restored if set_restore_sigmask() was used: */
	sigset_t			saved_sigmask;
	struct sigpending		pending;
	unsigned long			sas_ss_sp;
	size_t				sas_ss_size;
	unsigned int			sas_ss_flags;
  
  // ... 省略
}

查阅相关资料后,对Linux中进程的生命周期总结如下:

Linux Process Life Cycle

从图上可以看出,进程的睡眠状态是最多的,那进程一般在何时进入睡眠状态呢?答案是I/O操作时,因为I/O操作的速度与CPU运行速度相比,相差太大,所以此时进程会释放CPU,进入睡眠状态。

FLag State Name Description
R TASK_RUNNING Runnable 在调度队列中等待分配CPU时间片
R TASK_RUNNING Running 获得CPU时间片,正在运行
Z EXIT_ZOMBIE Zombie 进程要结束,先进入该状态,相关资源(如内存等)已释放,但进程描述符,PID等资源还未释放,需要父进程使用wait()等系统调用来回收
X EXIT_DEAD Dead 父进程回收完子进程资源,进程结束。该状态无法通过ps等命令观察到
T TASK_STOPPED Stopped 中止状态,进程收到SIGSTOP/SIGTTIN/SIGTSTP/SIGTTOU等信号之后会进入该状态
t TASK_TRACED Tracing stop 表示进程被debugger等进程监视,进程执行被调试程序所停止。当一个进程被另外的进程所监视,每一个信号都会让进程进入该状态
D TASK_UNINTERRUPTIBLE Disk Sleep 不可中断的睡眠状态,即深度睡眠,忽略任何信号,只能死等I/O操作完成。一旦I/O操作因为特殊原因无法完成,则只能重启机器,所以是一个很危险的状态
K TASK_KILLABLE KILLABLE 基于TASK_UNINTERRUPTIBLE状态的进程无法被唤醒,所以有了TASK_KILLABLE,其原理与TASK_UNINTERRUPTIBLE相似,但它可以响应致命信号
S TASK_INTERRUPTIBLE Interruptible Sleep 可中断睡眠状态,即浅睡眠。虽然进程因等待I/O操作完成而进入睡眠,但此时来了一个信号,进程还是会被唤醒,用户可以按需定制信号处理函数
I TASK_REPORT_IDLE Idle 空闲状态,用在不可中断睡眠的内核线程上。也是D状态的一个子集,因为某些内核线程,它们实际上没有任何负载,所以使用I状态进行区分。在统计平均负载时,D状态的进程需要统计,而I状态不会,与K状态类似,它也响应致命信号

进程状态相关的定义同样在 include/linux/sched.h 文件的开头部分,以 #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) 为例,TASK_WAKEKILL表示用于在接收到致命信号时唤醒进程,将它与 TASK_UNINTERRUPTIBLE 按位或,就得到了 TASK_KILLABLE。代码注释中提到了 fs/proc/array.c,所以也将其代码贴出,作为补充。

// include/linux/sched.h
// ... 省略

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */

/* Used in tsk->state: */
#define TASK_RUNNING			0x0000
#define TASK_INTERRUPTIBLE		0x0001
#define TASK_UNINTERRUPTIBLE		0x0002
#define __TASK_STOPPED			0x0004
#define __TASK_TRACED			0x0008
/* Used in tsk->exit_state: */
#define EXIT_DEAD			0x0010
#define EXIT_ZOMBIE			0x0020
#define EXIT_TRACE			(EXIT_ZOMBIE | EXIT_DEAD)
/* Used in tsk->state again: */
#define TASK_PARKED			0x0040
#define TASK_DEAD			0x0080
#define TASK_WAKEKILL			0x0100
#define TASK_WAKING			0x0200
#define TASK_NOLOAD			0x0400
#define TASK_NEW			0x0800
#define TASK_STATE_MAX			0x1000

/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)

#define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Convenience macros for the sake of wake_up(): */
#define TASK_NORMAL			(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* get_task_state(): */
#define TASK_REPORT			(TASK_RUNNING | TASK_INTERRUPTIBLE | \
					 TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
					 TASK_PARKED)

#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)

#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)

#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)

// ... 省略
// fs/proc/array.c
// ... 省略

/*
 * The task state array is a strange "bitmap" of
 * reasons to sleep. Thus "running" is zero, and
 * you can test for combinations of others with
 * simple bit tests.
 */
static const char * const task_state_array[] = {

	/* states in TASK_REPORT: */
	"R (running)",		/* 0x00 */
	"S (sleeping)",		/* 0x01 */
	"D (disk sleep)",	/* 0x02 */
	"T (stopped)",		/* 0x04 */
	"t (tracing stop)",	/* 0x08 */
	"X (dead)",		/* 0x10 */
	"Z (zombie)",		/* 0x20 */
	"P (parked)",		/* 0x40 */

	/* states beyond TASK_REPORT: */
	"I (idle)",		/* 0x80 */
};

static inline const char *get_task_state(struct task_struct *tsk)
{
	BUILD_BUG_ON(1 + ilog2(TASK_REPORT_MAX) != ARRAY_SIZE(task_state_array));
	return task_state_array[task_state_index(tsk)];
}

// ... 省略

在单核的CPU上,同一时刻只有一个task会被调度,所以即使看到了 R 状态,也不代表进程就被分配到了CPU时间片。但了解了进程状态之后,我们再通过 top, ps aux 等命令查看进程,分析问题起来效率就更高了:

top - 17:24:07 up 10:20,  1 user,  load average: 0.15, 0.08, 0.02
Tasks: 216 total,   1 running, 215 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   5945.2 total,   2655.4 free,   1580.8 used,   1709.1 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   4084.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                            
   1914 demonlee  20   0 5252100 395940 132816 S   1.0   6.5   1:33.09 gnome-shell                                                                        
    824 root      20   0 2495652  89512  48936 S   0.7   1.5   0:05.59 dockerd                                                                            
   1687 demonlee  20   0 1153156  81436  50128 S   0.3   1.3   0:15.90 Xorg                                                                               
   1957 demonlee  20   0  206556  28348  18504 S   0.3   0.5   0:00.26 ibus-x11                                                                           
   2897 demonlee  20   0  874684  61296  44532 S   0.3   1.0   0:06.22 gnome-terminal-                                                                    
  19984 demonlee  20   0   20632   4036   3376 R   0.3   0.1   0:00.02 top                                                                                
      1 root      20   0  169076  12952   8288 S   0.0   0.2   0:04.61 systemd                                                                            
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd                                                                           
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                             
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                         
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd                                                               
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                       
     10 root      20   0       0      0      0 S   0.0   0.0   0:00.08 ksoftirqd/0                                                                        
     11 root      20   0       0      0      0 I   0.0   0.0   0:03.60 rcu_sched                                                                          
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.51 migration/0                                                                        
     13 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/0                                                                      
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                            
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                                                                            
     16 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/1                                                                      
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.89 migration/1                                                                        
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.09 ksoftirqd/1                                                                        
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H                                                                       
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2                                                                            
     22 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/2                                                                      
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.83 migration/2                                                                        
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.07 ksoftirqd/2                                                                        
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-kblockd                                                               
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3                                                                            
     28 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/3                                                                      
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.77 migration/3                                                                        
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.18 ksoftirqd/3                                                                        
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H-kblockd                                                               
     33 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs                                                                          
     34 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns                                                                              
     35 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_kthre                                                                    
     36 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_                                                                    
     37 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace                                                                    
     38 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kauditd                                                                            
     39 root      20   0       0      0      0 S   0.0   0.0   0:00.03 khungtaskd                                                                         
demonlee@demonlee-ubuntu:~$ 

最后,再补充一个知识点:使用ps命令查看进程时,会发现状态上面有其他符号,比如 S+Z+ 等,如下所示,

demonlee@demonlee-ubuntu:~$ 
demonlee    1704  0.0  0.6 557904 37256 ?        Sl   05:01   0:00 /usr/libexec/goa-daemon
demonlee    1707  0.0  0.1 172652  6936 tty2     Ssl+ 05:01   0:00 /usr/lib/gdm3/gdm-x-session --run-script env GNOME_SHELL_SESSION_MODE=ubuntu /usr/bin/g
demonlee    1714  0.0  0.1 323388  9068 ?        Sl   05:01   0:00 /usr/libexec/goa-identity-service
demonlee    1720  0.1  1.3 1151880 80576 tty2    Sl+  05:01   1:05 /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background non
demonlee    1723  0.0  0.1 325356  9016 ?        Ssl  05:01   0:02 /usr/libexec/gvfs-afc-volume-monitor
demonlee    1728  0.0  0.1 244336  6532 ?        Ssl  05:01   0:00 /usr/libexec/gvfs-mtp-volume-monitor
demonlee    1759  0.0  0.2 197052 14276 tty2     Sl+  05:01   0:00 /usr/libexec/gnome-session-binary --systemd --systemd --session=ubuntu
...

这个 + 是啥意思呢,其实 man ps 中就有描述,只是我们从来都没认真看说明文档:

PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a
       process:

               D    uninterruptible sleep (usually IO)
               I    Idle kernel thread
               R    running or runnable (on run queue)
               S    interruptible sleep (waiting for an event to complete)
               T    stopped by job control signal
               t    stopped by debugger during the tracing
               W    paging (not valid since the 2.6.xx kernel)
               X    dead (should never be seen)
               Z    defunct ("zombie") process, terminated but not reaped by its parent

       For BSD formats and when the stat keyword is used, additional characters may be displayed:

               <    high-priority (not nice to other users)
               N    low-priority (nice to other users)
               L    has pages locked into memory (for real-time and custom IO)
               s    is a session leader
               l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
               +    is in the foreground process group

参考资料

[1] Beginners Guide to Managing Processes Running on a Linux System

[2] 刘超. 进程数据结构(上):项目多了就需要项目管理系统

[3] 倪朋飞. 系统中出现大量不可中断进程和僵尸进程怎么办?