Python 程序执行的过程

在工作和生活中常常写 Python 程序，今天好不容易有空可以看看 Python VM 执行的过程。

python main.py

本文还是以 CPython 为主，因为它是最早、最受欢迎的解释器实现。

三个阶段

Python 程序的执行可以分为三个阶段：

初始化
编译期
解释执行

CPython 为了运行 Python，要在初始化阶段准备好相应的数据结构，比如内建类型、内建模块、导包系统(import system)等，虽然很容易被忽视，但这是非常重要的阶段。

初始化完成后进入编译阶段。CPython 作为一款解释器是没有生成字节码能力的，而依赖解释器的语言在运行之前需要先将源码翻译成中间码，通常是从源码构建出 AST，再从 AST 生成字节码，顺便执行一些字节码的优化。

我们可以通过一个简短的例子看看这个过程做的事：

# test.py
def add_one(x):
    return x + 1

这是一个很简单的函数，我们用 dis 反汇编它：

Disassembly of <code object add_one at 0x1034dd450, file "test.py", line 4>:
  5           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (1)
              4 BINARY_ADD
              6 RETURN_VALUE

LOAD_FAST 指令表示加载一个本地变量，它的 opcode 为 124，LOAD_CONST 则是加载一个常量，它的 opcode 为 100；BINARY_ADD 和 RETURN_VALUE 的 opcode 分别是 23 和 83，通过 dis 可以看出这两个指令不需要额外的参数。

完整的 opcode 表示可以见 Python 每个版本对应的 opcode.py 文件。

我们还可以从它的字节码确认 opcode：

python3 -m compileall test.py

得到：

xxd ./__pycache__/test.cpython-39.pyc
00000000: 610d 0d0a 0000 0000 f3d5 9f60 4900 0000  a..........`I...
00000010: e300 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0002 0000 0040 0000 0073 0c00 0000 6400  .....@...s....d.
00000030: 6401 8400 5a00 6402 5300 2903 6301 0000  d...Z.d.S.).c...
00000040: 0000 0000 0000 0000 0001 0000 0002 0000  ................
00000050: 0043 0000 0073 0800 0000 7c00 6401 1700  .C...s....|.d...
00000060: 5300 2902 4ee9 0100 0000 a900 2901 da01  S.).N.......)...
00000070: 7872 0200 0000 7202 0000 00fa 0774 6573  xr....r......tes
00000080: 742e 7079 da07 6164 645f 6f6e 6504 0000  t.py..add_one...
00000090: 0073 0200 0000 0001 7205 0000 004e 2901  .s......r....N).
000000a0: 7205 0000 0072 0200 0000 7202 0000 0072  r....r....r....r
000000b0: 0200 0000 7204 0000 00da 083c 6d6f 6475  ....r......<modu
000000c0: 6c65 3e04 0000 00f3 0000 0000            le>.........

7c00 - LOAD_FAST
6401 - LOAD_CONST
1700 - BINARY_ADD
5300 - RETURN_VALUE

或者直接：

print(list(add_one.__code__.co_code))
# Output: [124, 0, 100, 1, 23, 0, 83, 0]

从源码得到机器码，这就是编译阶段要做的事。

CPython 主要目的就是执行机器码，你可能已经知道了，CPython 是基于栈的 VM，使用栈来存储、检索数据，以上面的例子来说，LOAD_FAST 指令 push 一个变量到栈中，LOAD_CONST 继续 push 一个常量，BINARY_ADD pop 出两个变量并执行加法操作，然后将结果 push 回栈中，最后 RETURN_VALUE 从栈中 pop 出数据返回给它的调用者。

字节码在一个巨大的循环中不断执行，直到结束或发生错误。

更近一点

Code Object

我们已经知道了字节码执行的流程，接下来我们继续看看执行的细节，比如在之前的上文中，CPython 怎么知道 x 是一个本地变量？

实际上 CPython 会将一组可执行的代码封装成 Code Object：

struct PyCodeObject {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_posonlyargcount;     /* #positional only arguments */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */

    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
        /* ... more members ... */
};

这个数据结构包含了完整的调用参数和可执行的字节码，不论是运行一个模块还是调用一个方法都有相同的 Code Object 抽象。

Function Object

除了 Code Object，Python 的方法还可以有默认参数和文档注释：

def add_one(x):
    """
    加一
    """
    return x + 1

这些额外信息加上一个 Code Object 共同组成了一个 Functioin Object：

typedef struct {
    PyObject_HEAD
    PyObject *func_code;        /* A code object, the __code__ attribute */
    PyObject *func_globals;     /* A dictionary (other mappings won't do) */
    PyObject *func_defaults;    /* NULL or a tuple */
    PyObject *func_kwdefaults;  /* NULL or a dict */
    PyObject *func_closure;     /* NULL or a tuple of cell objects */
    PyObject *func_doc;         /* The __doc__ attribute, can be anything */
    PyObject *func_name;        /* The __name__ attribute, a string object */
    PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
    PyObject *func_weakreflist; /* List of weak references */
    PyObject *func_module;      /* The __module__ attribute, can be anything */
    PyObject *func_annotations; /* Annotations, a dict or NULL */
    PyObject *func_qualname;    /* The qualified name */
    vectorcallfunc vectorcall;
} PyFunctionObject;

这个工作由 MAKE_FUNCTION 完成：

  4           0 LOAD_CONST               0 (<code object add_one at 0x103411450, file "test.py", line 4>)
              2 LOAD_CONST               1 ('add_one')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (add_one)

Frame Object

VM 执行 Code Object 时，它需要在栈中不断跟踪值的变化，它还需要记住 Code Object 返回的位置，以便当前代码执行完后继续执行之前的代码，这些状态将被记录在一个 Frame Object 中：

struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */

    PyObject **f_stacktop;          /* Next free slot in f_valuestack.  ... */
    PyObject *f_trace;          /* Trace function */
    char f_trace_lines;         /* Emit per-line trace events? */
    char f_trace_opcodes;       /* Emit per-opcode trace events? */

    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* ... */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
};

CPython 会在每次执行另一个 Code Object 时创建一个新的 Frame Object，并且用一个 f_back 指针指向之前的 Frame，这种结构组成了一个 Frame 栈 - 也可以叫调用栈，栈顶是当前正在执行的方法，当方法执行完出栈时，CPython 继续执行之前 Frame 中未执行完的代码。

Thread

线程的状态包含调用栈线程数据、异常状态和调试设置。可以通过 threading 模块开启线程：

from threading import Thread

def f():
    print("f")

t = Thread(target=f)
t.start()
t.join()

当执行 t.start() 时会通过 pthread_create 创建一个新的 OS 线程（在 Windows 上则是通过 _beginthreadex 创建），并在新创建的线程里执行 target。

实际上由于 GIL 的存在，CPython 同一时间只能执行一个线程，具体可参见CPython 中的超级大锁。

如果一定想在 CPython 中执行多线程，只能考虑使用更高级的数据结构，如解释器级别。

Interpreter

解释器拥有一组线程，它们共享模块、内建类型等初始化阶段创建的数据结构。

Runtime

Runtime 维护进程内的全局状态，比如是否已被初始化，另外还有 GIL。Interpreter 与 Runtime 的边界并不太明显，一般来说同一个进程中的线程都在同一个解释器里执行，但也有例外情况，mod_wsgi 就是一个特例，它为每一个 WSGI 应用创建了一个子解释器，这导致虽然处于同一个进程中，但对子解释器中的全局数据进行修改时，对其他 WSGI 应用中的解释器是不可见的。

总结

从架构上看 CPython 采用了分层设计：

Runtime - 表示一个进程全局的状态，包括 GIL、内存管理等组件
Interpreter - 包含一组线程，共享上下文数据，如导入的模块
Thread - 对应 OS 线程，包含一个调用栈
Frame - 调用栈中的元素，一个 Frame 包含一个 Code Object 及执行它的上下文
Evaluation loop - Frame 的执行环境

其中每一层都有对应的数据结构。

同时我们也可以看到 Python 程序在执行时可以分为三个阶段：

初始化 CPython
编译源码为 Code Object
执行 Code Object 的字节码

负责字节码执行的这一部分可以称为 VM，CPython 的 VM 有几个重要的概念：Code Object、Frame Object，还有线程、解释器、运行时等不同的状态管理，这些概念构成了 CPython 架构的核心。