This project provides enhanced error messages for Python. It compiles to a JavaScript-library that can readily be imported into a web page and used together with, e.g., Skulpt, Pyodide, or any other Python interpreter (note, however, that its main support is for Python up to 3.6, with selected features from 3.7+; see below). The library only checks a Python program for syntax errors and is primarily aimed at novice programmers and students.
It has grown out of a PhD thesis by Tobias Kohn and is part of the TigerJython programming environment. There is also an online version of TigerJython, where this JavaScript-library is currently used.
Let us know if you find the library useful and include it in your own project (jython at tobiaskohn.ch).
In order to make the project a self-contained JavaScript-library, additional resources like translations of the error messages are directly inlined into the code.
You can point to this GitHub repository as an NPM dependency. Add the following line to the "dependencies" section of your package.json:
"tigerpython-parser": "git+https://github.com/Tobias-Kohn/TigerPython-Parser.git",
To be used in larger projects, the parser can be compiled to a JavaScript ES module. The file can be found in release/tigerpython-parser.mjs, along with a standalone JavaScript file (see below). The accompanying index.html demonstrates how the module could be loaded and used in a simple HTML page. However, make sure your server delivers the module with MIME type text/javascript.
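The MIME-type requirement can be sketched as follows, assuming you serve the files from a hand-written Node.js server. The helper name contentTypeFor and the extension table are illustrative assumptions and not part of this project.

```javascript
// Hypothetical helper (not part of TigerPython): pick the Content-Type
// header for a file so that browsers accept the .mjs file as a module.
const MIME_TYPES = {
  ".mjs": "text/javascript",
  ".js": "text/javascript",
  ".html": "text/html",
};

function contentTypeFor(fileName) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  // Unknown extensions fall back to a generic binary type.
  return MIME_TYPES[ext] || "application/octet-stream";
}
```

In a request handler, the result would typically be passed on via something like res.setHeader("Content-Type", contentTypeFor(path)). Most off-the-shelf static servers already map .mjs correctly.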
Note: when compiling to an ES module, the tests currently fail. It seems node.js is complaining about using export instead of module.exports in the generated module.
In order to use the parser for error checking in your project, load the respective JavaScript-file, set your preferences in the TPyParser object and then check your code. You can find a simple example in doc/index.html.
TPyParser.rejectDeadCode = true;
TPyParser.setLanguage("en");
var err = TPyParser.checkSyntax(my_code);
if (err !== null) {
    var error_line = err.line;
    var error_msg = err.msg;
    // Display the error...
}
The TPyParser object provides two methods for checking syntax:
TPyParser.checkSyntax(source: string): ErrorInfo
Takes the entire source code and returns either null or the first error found in the Python program.
TPyParser.findAllErrors(source: string): Array[ErrorInfo]
Takes the entire source code and returns a list of all errors found in the Python program.
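As a rough sketch of how findAllErrors might be used, the helper below turns every reported error into a printable line. The helper name describeErrors is an assumption. It relies only on the line and msg fields shown in the example above, and it prints err.line unchanged because the numbering convention is not specified here.

```javascript
// Hypothetical helper: collect every error reported for `source` as text.
// `parser` is expected to be the TPyParser object; it is passed in as an
// argument so the helper can also be tried out with a stand-in object.
function describeErrors(parser, source) {
  const errors = parser.findAllErrors(source) || [];
  // err.line is printed as reported; no offset correction is applied.
  return Array.from(errors, (err) => "line " + err.line + ": " + err.msg);
}
```

With the library loaded, a call would look like describeErrors(TPyParser, my_code).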
A third method returns the AST:
TPyParser.parse(source: string): object
Takes the entire source code and returns the AST as (generic) objects. Each object has a field kind that corresponds to the class in Python. While the AST is as close to CPython as possible, there are some minor differences, which stem mainly from the fact that the parser supports several different versions of Python, or provides slightly more information in some cases. Note that this feature is currently under development and not yet thoroughly tested.
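Since each AST object carries a kind field, a generic traversal is enough for simple statistics. The sketch below counts how often each kind occurs. It assumes that child nodes are stored as nested objects or arrays and that the structure is a tree without cycles. The function name countKinds is made up for illustration.

```javascript
// Hypothetical helper: count the occurrences of each `kind` in an AST
// made of plain objects and arrays (assumed to contain no cycles).
function countKinds(node, counts = Object.create(null)) {
  if (Array.isArray(node)) {
    node.forEach((child) => countKinds(child, counts));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.kind === "string") {
      counts[node.kind] = (counts[node.kind] || 0) + 1;
    }
    // Visit every field; primitive values are skipped by the checks above.
    Object.values(node).forEach((child) => countKinds(child, counts));
  }
  return counts;
}
```

A call such as countKinds(TPyParser.parse(my_code)) would then yield a mapping from node kinds to counts. The actual kind names depend on the parser.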
Available options:
- evalMode: boolean
  Set to true when the given code is from an interactive console / shell. The parser would normally reject simple expressions like 3 + 4 as not having side effects. However, in the context of a shell, this is perfectly legal and normal.
- newDivision: boolean
  (Python 2 only) Python 3 distinguishes between "true" and "integer" division (/ vs //) whereas Python 2 does not (by default). Set this flag to true if you use this "new division" in Python 2.
- pythonVersion: int
  An integer value that is either 2 or 3 and indicates the Python version in use (default 3).
- rejectDeadCode: bool
  If this flag is set to true, dead code will be rejected as erroneous (e.g. code after a return or break statement).
- repeatStatement: bool
  TigerJython allows the use of repeat as a keyword for simple loops. Set this flag to true to mark repeat as a keyword.
- sagePower: bool
  Sage uses the ^ operator as a power operator rather than xor (which becomes ^^). Set this flag to true to mark ^ as power operator and enable ^^ and ^^= etc. as valid operators.
- strictCode: bool
  If set to true, the parser will be very strict and report issues that are usually not considered errors in Python, but are typical student mistakes.
- warningAsErrors: bool
  If set to true, warnings are also reported as errors.
- getLanguage(): string
  Get the two-letter language identifier for the error messages currently in use.
- getLanguages(): string[]
  Get the list of two-letter language identifiers supported by TigerPython for the error messages.
- setLanguage(s: string)
  Change the language for the error messages, where the language s is a two-letter identifier like "en", "de", or "fr".
- setErrorMessage(code: string, msg: string)
  Occasionally, you might want to change just a few of the error messages in a given language to adapt them to your system, but you probably do not want to recompile the entire parser for that. In such a case, use setErrorMessage() to override the built-in error message and replace it with something new. You can find all available error codes in errors.ErrorCode.scala. Example usage:
setErrorMessage("MISSING_SPACE", "Missing whitespace.")
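A small sketch of applying several of the options above in one go. It assumes that the flags are set as plain properties on the parser object, as the rejectDeadCode line in the earlier example does. The helper applyOptions and the constant KNOWN_OPTIONS are illustrative names and not part of the library.

```javascript
// Option names as listed in this document.
const KNOWN_OPTIONS = [
  "evalMode", "newDivision", "pythonVersion", "rejectDeadCode",
  "repeatStatement", "sagePower", "strictCode", "warningAsErrors",
];

// Hypothetical helper: copy known options onto `parser` (expected to be
// TPyParser) and return the names that were not recognised, so that
// typos such as "rejectDeadcode" do not go unnoticed.
function applyOptions(parser, options) {
  const unknown = [];
  for (const [name, value] of Object.entries(options)) {
    if (KNOWN_OPTIONS.includes(name)) {
      parser[name] = value;
    } else {
      unknown.push(name);
    }
  }
  return unknown;
}
```

For a Python 2 shell, this could be called as applyOptions(TPyParser, { pythonVersion: 2, newDivision: true, evalMode: true }).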
You can use the parser to get suggestions for auto-completion. Use TPyParser.autoComplete(source: string, pos: int, filter: bool) for that purpose, which will return a (possibly empty) array of strings. source is the program text and pos is the absolute offset of the current position for which to retrieve the suggestions, counting from the beginning of the text.
The flag filter determines whether the resulting suggestions shall be filtered according to the position of the caret. Take, for instance, math.a|( (with | denoting the caret) and run the auto-completer. If filter is set to false (the default), it will return all possible suggestions for math., ignoring the a. If filter is set to true, however, it will only return a list with acos, asin, atan, ..., that is, those names starting with a.
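Editors often report the caret as a line/column pair, whereas autoComplete expects an absolute offset. The following sketch converts between the two. It assumes zero-based line and column numbers, "\n" as the line separator, and offsets counted in JavaScript string units. Whether this matches the parser's counting in every case is an assumption, and the name offsetOf is made up.

```javascript
// Hypothetical helper: convert a zero-based (line, column) pair into an
// absolute offset from the beginning of `source`.
function offsetOf(source, line, column) {
  const lines = source.split("\n");
  let offset = 0;
  for (let i = 0; i < line && i < lines.length; i++) {
    offset += lines[i].length + 1; // +1 for the newline character
  }
  // Never point past the end of the text.
  return Math.min(offset + column, source.length);
}
```

The result can then be passed on, e.g. TPyParser.autoComplete(source, offsetOf(source, line, column), true).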
Use TPyParser.defineModule(moduleName: string, moduleBody: string) in order to add your own modules that can then be 'imported' when the auto-completer analyses your program code. The module's body consists of individual lines, where each line defines either a function or a constant value. The line can start with a type specified in square brackets, followed by a name (no spaces) and a list of parameters in the case of functions. A function may also be followed by a doc-string. Here is an example:
[int]factorial(x) Return *x* factorial. Raises :exc:`ValueError` if *x* is not integral or is negative.
[float]sqrt(x)
gamma(x) Return the Gamma function at *x*.
[float]pi
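If module descriptions are kept as data, the line format above can be generated rather than typed by hand. This is a minimal sketch under a few assumptions: the entry fields (type, name, params, doc) and the function names are invented for illustration, and joining several parameters with ", " is a guess, since the example only shows single-parameter functions.

```javascript
// Hypothetical helper: render one entry as "[type]name(params) doc".
// `type` and `doc` are optional; `params` is only given for functions.
function moduleLine(entry) {
  const type = entry.type ? "[" + entry.type + "]" : "";
  const params = entry.params ? "(" + entry.params.join(", ") + ")" : "";
  const doc = entry.doc ? " " + entry.doc : "";
  return type + entry.name + params + doc;
}

// Joins all entries into a module body, one definition per line.
function buildModuleBody(entries) {
  return entries.map(moduleLine).join("\n");
}
```

The resulting string would be passed as the second argument, e.g. TPyParser.defineModule("mymath", buildModuleBody(entries)), where "mymath" is a placeholder module name.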
The parser was originally written to support Python 2.7 and Python 3.6. With Python 3.9, the grammar has significantly changed (including the AST nodes), as the old LL(1)-parser was replaced by a Pegen-parser. The structure of this parser and the generated AST therefore deviate quite a bit from those used in Python 3.9+. Nonetheless, we try to support new syntactic elements or changes to the syntax, but cannot guarantee full compatibility. Concerning newer features, the current state is as follows:
- Assignment expressions (aka the 'Walrus' operator :=) are supported and will generate errors when the assignment expression is used as a statement, if the target is anything other than a name or if they are chained;
- f-Strings are partially supported in that the parser will accept f-strings (even nested ones according to PEP 701), but it cannot yet handle errors that end in unterminated strings. At the moment, it will just treat the entire string literal as a single token;
- Positional-only arguments are, likewise, accepted through the slash in the arguments list. However, this is currently not reflected in the AST and no further checks are performed;
- Pattern matching is supported. This one is a bit tricky because of the subtle mechanisms at play for discerning whether match and case have to be considered keywords or just normal identifiers.
The entire project is written in Scala 2.12 / Scala.js and uses sbt.
When sbt is installed, go to the project's root directory and run sbt makeRelease. This will create both the standalone JavaScript file and the module and place them inside the /release folder.
When sbt is installed, go to the project's root directory and run sbt fastOptJS or sbt fullOptJS to compile the project (scala.js supports two compilation modes: fast compilation during development and optimised compilation for production code). The output (JavaScript-files) can then be found in ./tpParser/js/target/scala-2.12/.
The JS-linker was previously configured to output a JavaScript ES module. The line responsible for this is in build.sbt: scalaJSLinkerConfig ~= { _.withModuleKind(ModuleKind.ESModule) }. Uncomment this line in order to get an ES module instead of a classic JavaScript file (however, sbt currently crashes because of an export vs. module.exports error).
We include a number of test cases for programs with and without errors in them. Programs in the folder /test/programs/correct are correct Python programs that should run without any error. Those in the folder /test/programs/erroneous are Python programs that have an error. These are stored as txt-files to keep some overzealous IDEs from reporting the (deliberate) errors in them. The first two lines are comments indicating the line of the error (counting the third actual line in the file as line 1) and the expected error code.
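To illustrate the layout of such a test file, here is a sketch that splits it into the expected line, the expected error code, and the program text. The exact comment syntax is not specified here, so the sketch assumes each header line has the form "# value". The function name splitTestProgram is hypothetical, and the project's own test harness may read the files differently.

```javascript
// Hypothetical helper: split an erroneous test program into its header
// (expected line and error code) and the actual Python program.
// Assumption: both header lines look like "# <value>".
function splitTestProgram(text) {
  const lines = text.split("\n");
  const strip = (s) => s.replace(/^#\s*/, "").trim();
  return {
    expectedLine: parseInt(strip(lines[0] || ""), 10),
    expectedCode: strip(lines[1] || ""),
    // The program starts on the third line of the file, which counts as line 1.
    program: lines.slice(2).join("\n"),
  };
}
```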
If a test with an incorrect program fails, you will find some additional information in the console, but usually higher up than all the [info]s on successful or failed tests. That additional information tells you what error TigerPython thinks there actually is, where it sees the error, and also which location in the parser is responsible for reporting that error.
The parser was initially written by Tobias Kohn. Further contributions by:
The authors who contributed translations for the error messages are noted in the respective files.
Please let us know if you would like to add another language for error messages - we are more than happy to include new languages.