diff --git a/internals/compiler.rst b/internals/compiler.rst
index a820ef260..7312fdcbd 100644
--- a/internals/compiler.rst
+++ b/internals/compiler.rst
@@ -14,8 +14,9 @@ In CPython, the compilation from source code to bytecode involves several steps:
 1. Tokenize the source code (:cpy-file:`Parser/tokenizer.c`)
 2. Parse the stream of tokens into an Abstract Syntax Tree
    (:cpy-file:`Parser/parser.c`)
-3. Transform AST into a Control Flow Graph (:cpy-file:`Python/compile.c`)
-4. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/compile.c`)
+3. Transform AST into an instruction sequence (:cpy-file:`Python/compile.c`)
+4. Construct a Control Flow Graph and apply optimizations to it (:cpy-file:`Python/flowgraph.c`)
+5. Emit bytecode based on the Control Flow Graph (:cpy-file:`Python/assemble.c`)
 
 The purpose of this document is to outline how these steps of the process work.
 
@@ -433,18 +434,6 @@ the variable.
 As for handling the line number on which a statement is defined, this is
 handled by ``compiler_visit_stmt()`` and thus is not a worry.
 
-In addition to emitting bytecode based on the AST node, handling the
-creation of basic blocks must be done. Below are the macros and
-functions used for managing basic blocks:
-
-``NEXT_BLOCK(struct compiler *)``
-    create an implicit jump from the current block
-    to the new block
-``compiler_new_block(struct compiler *)``
-    create a block but don't use it (used for generating jumps)
-``compiler_use_next_block(struct compiler *, basicblock *block)``
-    set a previously created block as a current block
-
 Once the CFG is created, it must be flattened and then final emission of
 bytecode occurs. Flattening is handled using a post-order depth-first
 search. Once flattened, jump offsets are backpatched based on the
@@ -460,15 +449,13 @@ not as simple as just suddenly introducing new bytecode in the AST ->
 bytecode step of the compiler. Several pieces of code throughout Python depend
 on having correct information about what bytecode exists.
 
-First, you must choose a name and a unique identifier number. The official
-list of bytecode can be found in :cpy-file:`Lib/opcode.py`. If the opcode is to
-take an argument, it must be given a unique number greater than that assigned to
-``HAVE_ARGUMENT`` (as found in :cpy-file:`Lib/opcode.py`).
-
-Once the name/number pair has been chosen and entered in :cpy-file:`Lib/opcode.py`,
-you must also enter it into :cpy-file:`Doc/library/dis.rst`, and regenerate
-:cpy-file:`Include/opcode.h` and :cpy-file:`Python/opcode_targets.h` by running
-``make regen-opcode regen-opcode-targets``.
+First, you must choose a name, implement the bytecode in
+:cpy-file:`Python/bytecodes.c`, and add a documentation entry in
+:cpy-file:`Doc/library/dis.rst`. Then run ``make regen-cases`` to
+assign a number for it (see :cpy-file:`Include/opcode_ids.h`) and
+regenerate a number of files with the actual implementation of the
+bytecodes (:cpy-file:`Python/generated_cases.c.h`) and additional
+files with metadata about them.
 
 With a new bytecode you must also change what is called the magic number for
 .pyc files. The variable ``MAGIC_NUMBER`` in
@@ -478,23 +465,21 @@ to be recompiled by the interpreter on import. Whenever ``MAGIC_NUMBER`` is
 changed, the ranges in the ``magic_values`` array in :cpy-file:`PC/launcher.c`
 must also be updated. Changes to :cpy-file:`Lib/importlib/_bootstrap_external.py`
 will take effect only after running ``make regen-importlib``. Running this
-command before adding the new bytecode target to :cpy-file:`Python/ceval.c` will
-result in an error. You should only run ``make regen-importlib`` after the new
-bytecode target has been added.
+command before adding the new bytecode target to :cpy-file:`Python/bytecodes.c`
+(followed by ``make regen-cases``) will result in an error. You should only run
+``make regen-importlib`` after the new bytecode target has been added.
 
 .. note:: On Windows, running the ``./build.bat`` script will automatically
    regenerate the required files without requiring additional arguments.
 
 Finally, you need to introduce the use of the new bytecode. Altering
-:cpy-file:`Python/compile.c` and :cpy-file:`Python/ceval.c` will be the primary
-places to change. You must add the case for a new opcode into the 'switch'
-statement in the ``stack_effect()`` function in :cpy-file:`Python/compile.c`.
-If the new opcode has a jump target, you will need to update macros and
-'switch' statements in :cpy-file:`Python/compile.c`. If it affects a control
-flow or the block stack, you may have to update the ``frame_setlineno()``
-function in :cpy-file:`Objects/frameobject.c`. :cpy-file:`Lib/dis.py` may need
-an update if the new opcode interprets its argument in a special way (like
-``FORMAT_VALUE`` or ``MAKE_FUNCTION``).
+:cpy-file:`Python/compile.c`, :cpy-file:`Python/bytecodes.c` will be the
+primary places to change. Optimizations in :cpy-file:`Python/flowgraph.c`
+may also need to be updated.
+If the new opcode affects a control flow or the block stack, you may have
+to update the ``frame_setlineno()`` function in :cpy-file:`Objects/frameobject.c`.
+:cpy-file:`Lib/dis.py` may need an update if the new opcode interprets its
+argument in a special way (like ``FORMAT_VALUE`` or ``MAKE_FUNCTION``).
 
 If you make a change here that can affect the output of bytecode that
 is already in existence and you do not change the magic number constantly, make