From 45a869a0e69ef26a6bec8c25642fc249cff522c6 Mon Sep 17 00:00:00 2001 From: Nuwan Goonasekera <2070605+nuwang@users.noreply.github.com> Date: Wed, 15 Jun 2022 12:17:37 +0530 Subject: [PATCH 1/3] Updated docs --- docs/topics/concepts.rst | 45 ++++++++++++++++++++-------- docs/topics/tpv_by_example.rst | 55 +++++++++++++++++++++++++++++++++- 2 files changed, 86 insertions(+), 14 deletions(-) diff --git a/docs/topics/concepts.rst b/docs/topics/concepts.rst index facfea9..ae3882d 100644 --- a/docs/topics/concepts.rst +++ b/docs/topics/concepts.rst @@ -5,10 +5,10 @@ Concepts and Organisation Object types ============ -Conceptually, Vortex consists of the following types of objects. +Conceptually, TPV consists of the following types of objects. 1. Entities - An entity is anything that will be considered for scheduling -by vortex. Entities include Tools, Users, Groups, Rules and Destinations. +by TPV. Entities include Tools, Users, Groups, Rules and Destinations. All entities have some common properties (id, cores, mem, env, params, scheduling tags). @@ -76,7 +76,7 @@ User > Role > Tool. 3. Evaluate ----------- -This operation evaluates any python expressions in the vortex config. It is divided into two steps, evaluate_early() +This operation evaluates any python expressions in the TPV config. It is divided into two steps, evaluate_early() and evaluate_late(). The former runs before the combine step and evaluates expressions for cores, mem and gpus. This ensures that at the time of combining entities, these values are concrete and can be compared. After the combine() step, the evaluate_late() function evaluates all remaining variables, ensuring that they have the latest possible @@ -102,7 +102,7 @@ candidate destinations. Job Dispatch Process ==================== -When a typical job is dispatched, vortex follows the process below. +When a typical job is dispatched, TPV follows the process below. .. 
image:: ../images/job-dispatch-process.svg @@ -112,17 +112,18 @@ When a typical job is dispatched, vortex follows the process below. 3. combine() - Combines entity requirements to create a merged entity. Uses lower of gpu, cores and mem requirements 4. evaluate_late() - Evaluates remaining expressions as late as possible 5. match() - Matches the combined entity requirements with a suitable destination -6. rank() - The matching destinations are ranked and the best match chosen +6. rank() - The matching destinations are ranked +7. choose() - The ranked destinations are evaluated, with the first non-failing match chosen (no rule failures) Expressions =========== -Most vortex properties can be expressed as python expressions. The rule of thumb is that all string expressions +Most TPV properties can be expressed as python expressions. The rule of thumb is that all string expressions are evaluated as python f-strings, and all integer or boolean expressions are evaluated as python code blocks. For example, gpus, cores and mem are evaluated as python code blocks, as they evaluate to integer/float values. However, env and params are evaluated as f-strings, as they result in string values. This is to improve the readability -and syntactic simplicity of vortex config files. +and syntactic simplicity of TPV config files. At the point of evaluating these functions, there is an evaluation context, which is a default set of variables that are available to that expression. 
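To make the distinction concrete, here is a hypothetical tool entity (the values are made up for illustration) mixing both styles:

.. code-block:: yaml

   tools:
     default:
       cores: 2
       # mem is an integer property, so it is evaluated as a python code block
       mem: cores * 3
       # env values are string properties, so they are evaluated as python f-strings
       env:
         MEMORY_MB: "{mem * 1024}"
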
The following default variables are available to all expressions: @@ -140,13 +141,19 @@ Default evaluation context +----------+-----------------------------------------------------------------------------+ | job | the Galaxy job object | +----------+-----------------------------------------------------------------------------+ -| mapper | the vortex mapper object, which can be used to access parsed vortex configs | +| mapper | the TPV mapper object, which can be used to access parsed TPV configs | +----------+-----------------------------------------------------------------------------+ -| entity | the vortex entity being currently evaluated. Can be a combined entity. | +| entity | the TPV entity being currently evaluated. Can be a combined entity. | +----------+-----------------------------------------------------------------------------+ -| self | an alias for the current vortex entity. | +| self | an alias for the current TPV entity. | +----------+-----------------------------------------------------------------------------+ +Custom evaluation contexts +--------------------------- +These are user-defined context values that can be defined globally, or locally at the level of each +entity. Any defined context value is available as a regular variable at the time the entity is evaluated. + + Special evaluation contexts --------------------------- In addition to the defaults above, additional context variables are available at different steps. @@ -159,13 +166,13 @@ expressions can be based on gpu values. mem expressions can refer to both cores refer to evaluated env expressions. *rank functions* - these can refer to all prior expressions, and are additionally passed a `candidate_destinations` -array, which is a list of matching vortex destinations. +array, which is a list of matching TPV destinations. Scheduling ========== -Vortex offers several mechanisms for controlling scheduling, all of which are optional. 
+TPV offers several mechanisms for controlling scheduling, all of which are optional. In its simplest form, no scheduling constraints would be defined at all, in which case the entity would schedule on the first available destination. Admins can use additional @@ -216,7 +223,19 @@ can execute that tool. Of course, the destination must also be marked as not rej Scheduling by rules ------------------- - +Rules can be used to conditionally modify any entity requirement. Rules can be given an ID, +which can subsequently be used by an inheriting entity to override thr rule. If no ID is +specified, a unique ID is generated, and the rule can no longer be overridden. Rules +are typically evaluted through an `if` clause, which specifies the logical condition under +which the rule matches. If the rule matches, any cores, memory, scheduling tags etc. can be +specified to override inherited values. The special clause `fail` can be used to immediately +fail the job with an error message. The `execute` clause can be used to execute an arbitrary +code block on rule match. Scheduling by custom ranking functions -------------------------------------- +The default rank function sorts destinations by scoring how well the tags match the job's requirements. +Since this may often be too simplistic, the rank function can be overridden by specifying a custom +rank clause. The rank clause can contain an arbitrary code block, which can do the desired sorting, +for example by determining destination load by querying the job manager, influx statistics etc. +The final statement in the rank clause must be the list of sorted destinations. diff --git a/docs/topics/tpv_by_example.rst b/docs/topics/tpv_by_example.rst index fbd4c38..d4dcfe0 100644 --- a/docs/topics/tpv_by_example.rst +++ b/docs/topics/tpv_by_example.rst @@ -57,7 +57,7 @@ Inheritance provides a mechanism for an entity to inherit properties from anothe gpus: 1 -The `global` section is used to define global vortex properties. 
The `default_inherits` property defines a "base class" for all tools to inherit from. In this example, if the `bwa` tool is executed, it will match the `default` tool, as there are no other matches, @@ -327,3 +327,56 @@ in this example, the candidate destinations are first sorted by the best matchin default ranking function), and then sorted by CPU usage per destination, obtained from the influxdb query. Note that the final statement in the rank function must be the list of sorted destinations. + +Custom contexts +--------------- +In addition to the automatically provided context variables (see :doc:`concepts`), TPV allows you to define arbitrary +custom variables, which are then available whenever an expression is evaluated. Contexts can be defined either globally +or at the level of each entity, with entity-level context variables overriding global ones. + +.. code-block:: yaml + :linenos: + + global: + default_inherits: default + context: + ABSOLUTE_FILE_SIZE_LIMIT: 100 + large_file_size: 10 + _a_protected_var: "some value" + + tools: + default: + context: + additional_spec: --my-custom-param + cores: 2 + mem: 4 + params: + nativeSpecification: "--nodes=1 --ntasks={cores} --ntasks-per-node={cores} --mem={mem*1024} {additional_spec}" + rules: + - if: input_size >= ABSOLUTE_FILE_SIZE_LIMIT + fail: "Job input: {input_size} exceeds absolute limit of: {ABSOLUTE_FILE_SIZE_LIMIT}" + - if: input_size > large_file_size + cores: 10 + + https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.1.0+galaxy7: + context: + large_file_size: 20 + additional_spec: --overridden-param + mem: cores * 4 + gpus: 1 + + +In this example, these global context variables are defined, which are made available to all entities. +Variable names follow Python conventions, where all uppercase variables indicate constants that cannot be overridden. 
+Lower case indicates a public variable that can be overridden and changed, even across multiple TPV config files. +An underscore indicates a protected variable that can be overridden within the same file, but not across files. + +Additionally, the tool defaults section defines an additional context variable named `additional_spec`, which is only +available to inheriting tools. + +If we were to dispatch a job, say bwa, with an input_size of 15, the large file rule in the defaults section would +kick in, and the number of cores would be set to 10. If we were to dispatch a hisat2 job with the same input size +however, the large_file_size rule would not kick in, as it has been overridden to 20. The main takeaway from this +example is that variables are bound late, and therefore, rules and params can be crafted to allow inheriting +tools to conveniently override values, even across files. While this capability can be powerful, it needs to be +treated with the same care as any global variable in a programming language. From 872d200f3bfeb7356ba76bb1ee14134a50608d92 Mon Sep 17 00:00:00 2001 From: Nuwan Goonasekera <2070605+nuwang@users.noreply.github.com> Date: Wed, 15 Jun 2022 12:45:46 +0530 Subject: [PATCH 2/3] Add docs on multiple matches and resubmission --- docs/topics/concepts.rst | 8 ++--- docs/topics/tpv_by_example.rst | 57 +++++++++++++++++++++++++++++++++- 2 files changed, 60 insertions(+), 5 deletions(-) diff --git a/docs/topics/concepts.rst b/docs/topics/concepts.rst index ae3882d..cc1b87e 100644 --- a/docs/topics/concepts.rst +++ b/docs/topics/concepts.rst @@ -224,10 +224,10 @@ can execute that tool. Of course, the destination must also be marked as not rej Scheduling by rules ------------------- Rules can be used to conditionally modify any entity requirement. Rules can be given an ID, -which can subsequently be used by an inheriting entity to override thr rule. +which can subsequently be used by an inheriting entity to override the rule. 
If no ID is specified, a unique ID is generated, and the rule can no longer be overridden. Rules -are typically evaluted through an `if` clause, which specifies the logical condition under -which the rule matches. If the rule matches, any cores, memory, scheduling tags etc. can be +are typically evaluated through an `if` clause, which specifies the logical condition under +which the rule matches. If the rule matches, cores, memory, scheduling tags etc. can be specified to override inherited values. The special clause `fail` can be used to immediately fail the job with an error message. The `execute` clause can be used to execute an arbitrary code block on rule match. @@ -235,7 +235,7 @@ code block on rule match. Scheduling by custom ranking functions -------------------------------------- The default rank function sorts destinations by scoring how well the tags match the job's requirements. -Since this may often be too simplistic, the rank function can be overridden by specifying a custom +As this may often be too simplistic, the rank function can be overridden by specifying a custom rank clause. The rank clause can contain an arbitrary code block, which can do the desired sorting, for example by determining destination load by querying the job manager, influx statistics etc. The final statement in the rank clause must be the list of sorted destinations. diff --git a/docs/topics/tpv_by_example.rst b/docs/topics/tpv_by_example.rst index d4dcfe0..4b64c0d 100644 --- a/docs/topics/tpv_by_example.rst +++ b/docs/topics/tpv_by_example.rst @@ -366,7 +366,7 @@ or at the level of each entity, with entity level context variables overriding g gpus: 1 -In this example, these global context variables are defined, which are made available to all entities. +In this example, three global context variables are defined, which are made available to all entities. Variable names follow Python conventions, where all uppercase variables indicate constants that cannot be overridden. 
Lower case indicates a public variable that can be overridden and changed, even across multiple TPV config files. An underscore indicates a protected variable that can be overridden within the same file, but not across files. @@ -380,3 +380,58 @@ however, the large_file_size rule would not kick in, as it has been overridden t example is that variables are bound late, and therefore, rules and params can be crafted to allow inheriting tools to conveniently override values, even across files. While this capability can be powerful, it needs to be treated with the same care as any global variable in a programming language. + +Multiple matches +---------------- +If multiple regular expressions match, the matches are applied in order of appearance. Therefore, the convention is +to specify more general rule matches first, and more specific matches later. This matching also applies across +multiple TPV config files, again based on order of appearance. + +.. code-block:: yaml + :linenos: + + tools: + default: + cores: 2 + mem: 4 + params: + nativeSpecification: "--nodes=1 --ntasks={cores} --ntasks-per-node={cores} --mem={mem*1024}" + + https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/*: + mem: cores * 4 + gpus: 1 + + https://toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/2.1.0+galaxy7: + env: + MY_ADDITIONAL_FLAG: "test" + + +In this example, dispatching a hisat2 job would result in a mem value of 8, with 1 gpu. However, dispatching +the specific version of `2.1.0+galaxy7` would result in the additional env variable, with mem remaining at 8. + +Job Resubmission +---------------- +TPV has explicit support for job resubmissions, so that advanced control over job resubmission is possible. + +.. 
code-block:: yaml + :linenos: + + tools: + default: + cores: 2 + mem: 4 * int(job.destination_params.get('SCALING_FACTOR', 1)) if job.destination_params else 4 + params: + SCALING_FACTOR: "{2 * int(job.destination_params.get('SCALING_FACTOR', 2)) if job.destination_params else 2}" + resubmit: + with_more_mem_on_failure: + condition: memory_limit_reached and attempt <= 3 + destination: tpv_dispatcher + +In this example, we have defined a resubmission handler that resubmits the job if the memory limit is reached. +Note that the resubmit section looks exactly the same as Galaxy's, except that it follows a dictionary structure +instead of being a list. Refer to the Galaxy job configuration docs for more information on resubmit handlers. One +twist in this example is that we automatically increase the amount of memory provided to the job on each resubmission. +This is done by setting the SCALING_FACTOR param, a custom parameter that we have chosen for this example and +increase on each resubmission. Since each resubmission's destination is TPV, the param is re-evaluated on each +resubmission and scaled accordingly. The memory is allocated based on the scaling factor and therefore also +scales accordingly. From af0bb3fe4f1ca6d4656bf5e02e8ba53752a3d68f Mon Sep 17 00:00:00 2001 From: Nuwan Goonasekera <2070605+nuwang@users.noreply.github.com> Date: Wed, 15 Jun 2022 12:18:21 +0530 Subject: [PATCH 3/3] Add change log and set release version to 1.2.0 --- CHANGELOG.rst | 11 +++++++++++ tpv/__init__.py | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.rst b/CHANGELOG.rst index d47e7d1..c7f200f 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -1,3 +1,14 @@ +1.2.0 - Jun 15, 2022. (sha 872d200f3bfeb7356ba76bb1ee14134a50608d92) +-------------------------------------------------------------------- + +* vortex package and cli renamed to tpv for consistency. +* All matching entity regexes are applied, not just the first. 
Order of application is in the order of definition. +* When a particular entity type is matched, its definitions are cached, so that future lookups are O(1). +* Support for job resubmission handling, with integration tests for Galaxy. +* Allow destinations to be treated as regular entities, with support for rules and expressions. +* Support for global and local context variables that can be referenced in expressions. +* Improved support for complex job param types like dicts and lists, which are now recursively evaluated. + 1.1.0 - Mar 25, 2022. (sha 0e65d9a6a16bbbfd463031677067e1af9f4dac64) -------------------------------------------------------------------- diff --git a/tpv/__init__.py b/tpv/__init__.py index 74fc4b6..523f71a 100644 --- a/tpv/__init__.py +++ b/tpv/__init__.py @@ -1,7 +1,7 @@ """Total Perspective Vortex library setup.""" # Current version of the library -__version__ = "1.1.0" +__version__ = "1.2.0" def get_version():