# MolEvolvR Backend

The backend is implemented as a RESTful API. It currently provides endpoints for
just the `analysis` entity, but will be expanded to include other entities as
well.

## Usage

Run the `launch_api.sh` script to start the API server in a hot-reloading
development mode. The server runs on port 9050 unless the env var `API_PORT` is
set to another value. Once it's running, you can access it at
http://localhost:9050.
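
For example (a minimal sketch; the only requirement described above is that
`API_PORT` be set in the script's environment):

```sh
# default: serve on port 9050
./launch_api.sh

# serve on a different port instead
API_PORT=9051 ./launch_api.sh
```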

If the env var `USE_SLURM` is equal to 1, the script will create a basic SLURM
configuration and then launch `munge`, the service used to authenticate to the
SLURM cluster. The template that configures the backend's connection to SLURM
can be found at `./cluster_config/slurm.conf.template`.
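
SLURM support is likewise toggled via the environment, e.g.:

```sh
# enable the SLURM setup steps described above
USE_SLURM=1 ./launch_api.sh
```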

The script then applies any outstanding database migrations via
[atlas](https://github.com/ariga/atlas). Finally, the API server is started by
executing the `entrypoint.R` script via
[drip](https://github.com/siegerts/drip), which restarts the server whenever
there are changes to the code.

*(Side note: the entrypoint contains a bit of custom logic to defer actually
launching the server until the port it listens on is free, since drip doesn't
cleanly shut down the old instance of the server.)*

## Implementation

The backend is implemented in [Plumber](https://www.rplumber.io/index.html), a
package for R that allows for the creation of RESTful APIs. The API is defined
in the `api/plumber.R` file, which defines the router and some shared metadata
routes. The rest of the routes are brought in from the `endpoints/` directory.
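
For orientation, a Plumber route looks roughly like the sketch below; this is
illustrative only (the annotations are standard Plumber syntax, but the handler
body is a placeholder, not code from `endpoints/`):

```r
# illustrative sketch of a Plumber route; not the repo's actual handler
#* Get just the status field for an analysis
#* @get /analyses/<id>/status
function(id) {
  # placeholder: the real handler would look the analysis up in the database
  list(id = id, status = "queued")
}
```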

Currently implemented endpoints:
- `POST /analyses`: Create a new analysis
- `GET /analyses`: Get all analyses
- `GET /analyses/:id`: Get a specific analysis by its ID
- `GET /analyses/:id/status`: Get just the status field for an analysis by its ID

*(TBC: more comprehensive docs; see the [Swagger docs](http://localhost:9050/__docs__/) for now)*

## Database Schema

The backend uses a PostgreSQL database to store analyses. The database's schema
is managed by [atlas](https://github.com/ariga/atlas); you can find the current
schema definition at `./schema/schema.pg.hcl`. After changing the schema, you
can create a "migration", i.e. a set of SQL statements that will bring the
database up to date with the new schema, by running `./schema/makemigration.sh
<reason>`; if all is well with the schema, the new migration will be put in
`./schema/migrations/`.

Any pending migrations are applied automatically when the backend starts up, but
you can manually apply new migrations by running `./schema/apply.sh`.
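
A typical workflow after editing `./schema/schema.pg.hcl` might look like this
(the migration reason is just an example):

```sh
# generate a new migration under ./schema/migrations/
./schema/makemigration.sh "add_some_column"

# apply it now rather than waiting for the next backend startup
./schema/apply.sh
```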

## Testing

You can run the backend's tests with the `run_tests.sh` script. The script
recursively searches the `tests/` directory for files matching the pattern
`test_*.R` and runs them. Tests are written using the
[testthat](https://testthat.r-lib.org/) package.
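
A test file under `tests/` follows the usual testthat layout. The sketch below
is hypothetical (the file name and the use of `httr` are assumptions, not code
from the repo), and it relies on the running stack described next:

```r
# hypothetical tests/test_analyses.R; assumes the stack is up (see below)
library(testthat)
library(httr)

test_that("listing analyses returns a successful response", {
  res <- GET("http://localhost:9050/analyses")
  expect_equal(status_code(res), 200)
})
```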

Note that the tests currently depend on the stack's services being available, so
you should run the tests from within the backend container after having started
the stack normally. An easy way to do that is to execute `./run_stack.sh shell`
in the repo root, which will give you an interactive shell in the backend
container. Eventually, we'll have them run in their own environment, which the
`run_tests.sh` script will likely orchestrate.
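
Putting that together (paths assume you invoke the scripts from the repo root
and from the container's working directory, respectively):

```sh
# from the repo root: open a shell inside the backend container...
./run_stack.sh shell

# ...then, inside the container, run the test suite
./run_tests.sh
```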

## Implementation Details

### Domain Entities

*NOTE: the backend is as of now a work in progress, so expect this to change.*

The backend includes, or will include, the following entities:

- `User`: Represents a user of the system. At the moment logins aren't required,
so all regular users are the special "Anonymous" user. Admins have individual
accounts.
- `Analysis`: Represents an analysis submitted by a user. Each analysis has a
unique ID and is associated with a user. Analyses contain the following
sub-entities:
  - `AnalysisSubmission`: Represents the submission of an Analysis, e.g. the
  data itself as well as the submission's parameters (both selected by the
  user and supplied by the system).
  - `AnalysisStatus`: Represents the status of an Analysis. Each Analysis has a
  status associated with it, which is updated as the Analysis proceeds through
  its processing stages.
  - `AnalysisResult`: Represents the result of an Analysis.
- `Queue`: Represents the status of processing analyses, including how many
analyses have been completed, how many are in the queue, and other statistics.
- `System`: Represents the system as a whole, including the version of the
backend, the version of the frontend, and other metadata about the system.
Includes runtime statistics about the execution environment as well, such as RAM
and CPU usage. Includes cluster information, too, such as node uptime and
health.

### Job Processing

*NOTE: we use the term "job" here to indicate any asynchronous task that the
backend needs to perform outside of the request-response cycle. It's not related
to the app domain's terminology of a "job" (i.e. an analysis).*

The backend makes use of
[future.batchtools](https://future.batchtools.futureverse.org/), an extension
that adds [futures](https://future.futureverse.org/) support to
[batchtools](https://mllg.github.io/batchtools/index.html), a package for
processing asynchronous jobs. The package provides support for many
job-processing systems, including
[SLURM](https://slurm.schedmd.com/documentation.html); more details on
alternative systems can be found in the [`batchtools` package
documentation](https://mllg.github.io/batchtools/articles/batchtools.html).

In our case, we use SLURM; `batchtools` basically wraps SLURM's `sbatch` command
and handles producing a job script for an R callable, submitting the script to
the cluster for execution, and collecting the results to be returned to R. The
template for the job submission script can be found at
`./cluster_config/slurm.tmpl`.
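
To make the flow concrete, here is a minimal sketch of dispatching work to
SLURM via `future.batchtools`; the exact plan and arguments the backend uses
are assumptions, only the template path is taken from above:

```r
# minimal sketch: run an R expression as a SLURM job via future.batchtools
library(future)
library(future.batchtools)

# point the futures framework at SLURM, using the job script template above
plan(batchtools_slurm, template = "./cluster_config/slurm.tmpl")

# the expression below is written into a job script, submitted with sbatch,
# and its result is collected back into this R session
job <- future({
  Sys.info()[["nodename"]]
})
value(job)
```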