THIS README IS WIP
Pipeline for running all dataloader scripts for covidgraph in a controlled manner.
Maintainer: Tim Bleimehl @tim.bleimehl:meet.dzd-ev.de https://github.com/motey
covidgraph.org is a project with the aim to build a knowledge graph around data concerning covid-19. For details go to https://covidgraph.org
The graph is fed by many independent scripts or scripts that are building on each other (called dataloaders here)
Motherlode helps to run these dataloaders in the correct sequence.
Scope of Motherlode is
-
Resolving dependencies of dataloaders
-
Keep track of which dataloaders are already fed in the graph
-
Keep track of versions of the dataloaders (and rerun if updated)
Copy the .env.template
to .env
cp .env.template .env
Edit the .env
to match your Neo4j setup.
e.g. if your run your database localy
NEO4J={"host":"localhost","user":"neo4j","password":"neo"}
# legacy parameters.
# this is the old format, to provide the neo4j access data.
NEO4J_LECAGY_PARAMS="{'GC_NEO4J_URL':'localhost','GC_NEO4J_USER':'neo4j','GC_NEO4J_PASSWORD':'neo'}"
Pull the latest pipeline docker image
docker-compose pull
Run the pipeline
docker-compose up -d
To see what is happening, you can monitor logs with
docker-compose logs -f
To provide connection details for the pipeline (and containers in the pipeline) we supply a json string via an environemt variable.
The name of the environment variable is NEO4J
.
The json string can consist of all parameter described in https://py2neo.org/2020.0/database/index.html#individual-settings
A common example for a local database would be
NEO4J={"host":"localhost", "user":"neo4j","password":"mypw"}
A more complex example for a remote ssl secured database would be:
NEO4J={"host":"myremotehost.example.com", "port":10947, "user":"write-user","password":"mypw", "secure":True}
NOTE: To see how to recieve/parse the string on client side see https://github.com/covidgraph/data_template#your-tasks-in-detail
Have a look at pipeline.yaml which is the pipeline definition. based on copili and motherlode written by the DZD
The format Motherlode accepts dataloaders is only as docker images from a registry (e.g. DockerHub) Motherlode will run these images as containers an handle over some ENV variables to help dataloaders to connect to the database.
For details have a look in at the dataloader template which comes as a python dataloader example: https://github.com/covidgraph/data_template
You can see which pipelinemembers had a run with checking the logging node in neo4j.
MATCH (n:`_PipelineLogNode`) RETURN n
The property exit_code
should be 0
to verify that a loader run without any errors.
If the exit_code
is not 0
have a look into the loaders log file in ./log