Merge pull request #331 from RTXteam/kg2.8.4prep
`KG2.8.4pre` Code Release
ecwood authored Jul 24, 2023
2 parents cb0fca6 + 7634b86 commit d3f31e5
Showing 60 changed files with 1,039 additions and 1,657 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,44 @@
# This workflow is based on GitHub's CI example for Python

name: RTX-KG2 Continous Integration

on: [push, pull_request]

permissions:
  contents: read

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - name: Export Path
      run: |
        export PATH=$PATH:~/kg2-build/
    - name: Setup KG2 Build
      run: |
        git clone https://github.com/RTXteam/RTX-KG2
        cd RTX-KG2
        git checkout $GITHUB_REF_NAME
        bash -x ./setup-kg2-build.sh ci
    - name: Run Tests
      run: |
        cd /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2
        bash -x ./run-validation-tests.sh
        bash -x ./build-kg2-snakemake.sh all -n ci
        bash -x ./build-kg2-snakemake.sh all -n ci
        bash -x ./build-kg2-snakemake.sh all -n nodes ci
        bash -x ./build-kg2-snakemake.sh all -R_Merge -n ci
        bash -x ./build-kg2-snakemake.sh all -R_Finish -n ci
        bash -x ./build-kg2-snakemake.sh all -F -n ci
        bash -x ./build-kg2-snakemake.sh all -R_Merge -n nodes ci
        bash -x ./build-kg2-snakemake.sh all -R_Finish -n nodes ci
        bash -x ./build-kg2-snakemake.sh all -F -n nodes ci
    - name: Test Building One File
      run: |
        cd /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2
        bash -x ./extract-mirbase.sh ~/kg2-build/miRNA.dat
        ~/kg2-venv/bin/python3 -u mirbase_dat_to_kg_json.py ~/kg2-build/miRNA.dat ~/kg2-build/kg2-mirbase.json --test
        ~/kg2-venv/bin/python3 -u report_stats_on_json_kg.py ~/kg2-build/kg2-mirbase.json ~/kg2-build/kg2-mirbase-report.json
        cat ~/kg2-build/kg2-mirbase-report.json
2 changes: 1 addition & 1 deletion Adding-properties.md
@@ -811,7 +811,7 @@ if __name__ == '__main__':
build_node = kg2_util.make_node(kg2_util.CURIE_PREFIX_RTX + ':' + 'KG2',
kg2_util.BASE_URL_RTX + 'KG2',
build_name,
kg2_util.BIOLINK_CATEGORY_INFORMATION_RESOURCE,
kg2_util.SOURCE_NODE_CATEGORY,
update_date,
kg2_util.CURIE_PREFIX_RTX + ':')
build_info = {'version': build_node['name'], 'timestamp_utc': build_node['update_date']}
18 changes: 16 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
[![Build Status](https://travis-ci.com/RTXteam/RTX-KG2.svg?branch=master)](https://travis-ci.com/RTXteam/RTX-KG2)
[![RTX-KG2 Continous Integration](https://github.com/RTXteam/RTX-KG2/actions/workflows/main.yml/badge.svg?branch=kg2.8.4prep)](https://github.com/RTXteam/RTX-KG2/actions/workflows/main.yml)
# KG2: the second-generation RTX knowledge graph

KG2 is the second-generation knowledge graph for the
@@ -336,6 +336,17 @@ this command:

touch ~/kg2-build/major-release

[**MORE COMMON ALTERNATIVE**] For regular releases, you want to increment the "minor"
release number. This is for situations where changes to the code have been made and
the build will likely be deployed. If you want to increment the "minor" release number
for KG2, you would run this command:

touch ~/kg2-build/minor-release

If you do not increment the release number at all, you should not plan to deploy
the build. Skipping the increment is useful when you are testing the build system
itself rather than new code or bug fixes.
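
As a rough illustration of the marker-file convention described above, the sketch below checks which release marker is present in a build directory. The helper name `requested_release` is hypothetical, and giving `major-release` precedence when both files exist is an assumption; how the KG2 build scripts actually read these files is not shown in this commit.

```python
from pathlib import Path


def requested_release(build_dir):
    """Report which release marker file exists in build_dir.

    Hypothetical helper: returns "major", "minor", or None.
    Assumption: "major-release" wins if both marker files exist.
    """
    build_dir = Path(build_dir).expanduser()
    if (build_dir / "major-release").exists():
        return "major"
    if (build_dir / "minor-release").exists():
        return "minor"
    # No marker file: the build is not intended for deployment.
    return None
```

Calling `requested_release("~/kg2-build")` before a build is one way to confirm that the intended marker file was actually created.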

(7) Run a "dry-run" build:

bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n
@@ -434,7 +445,10 @@ build, in Step (8) above, you would run

(note the absence of the `all` argument to `build-kg2-snakemake.sh`). A partial build of KG2
may take about 31 hours. Note, you have to have previously run an `all` build
of KG2, or else the partial build will not work.
of KG2, or else the partial build will not work. Note that during a partial build,
existing KG2 JSON files in the `/home/ubuntu/kg2-build` directory from previous
builds are reused as-is and are not regenerated; if you want any of those files
to be updated, delete them before running the partial build.
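
Before a partial build, it may help to see which JSON files are already sitting in the build directory, since those are the ones that would be reused. The following is a minimal sketch only: the helper name `reusable_json_files` is hypothetical, and matching on `*.json` at the top level of the build directory is an assumption about how the KG2 JSON files are named and laid out.

```python
from pathlib import Path


def reusable_json_files(build_dir):
    """List the names of *.json files directly inside build_dir, sorted.

    Hypothetical helper; assumes KG2 JSON outputs end in ".json" and
    live at the top level of the build directory.
    """
    build_dir = Path(build_dir).expanduser()
    return sorted(path.name for path in build_dir.glob("*.json"))
```

Reviewing the list from `reusable_json_files("/home/ubuntu/kg2-build")` and deleting the entries that should be rebuilt is one way to follow the note above.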
</details>

<details>
3 changes: 2 additions & 1 deletion Snakefile-conversion
@@ -62,13 +62,14 @@ rule SemMedDB_Conversion:
input:
real = config['SEMMED_TUPLELIST_FILE'],
mrcui_req = config['UMLS_CUI_FILE'],
exclusion_list = config['SEMMED_EXCLUSION_FILE'],
validation = config['VALIDATION_PLACEHOLDER']
output:
config['SEMMED_OUTPUT_FILE']
log:
config['BUILD_DIR'] + "/semmeddb-tuple-list-json-to-kg-json" + config['TEST_SUFFIX'] + ".log"
shell:
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/semmeddb_tuple_list_json_to_kg_json.py " + config['TEST_ARG'] + " --mrcuiFile ~/kg2-build/umls/META/MRCUI.RRF {input.real} {output} > {log} 2>&1"
config['VENV_DIR'] + "/bin/python3 -u " + config['CODE_DIR'] + "/semmeddb_tuple_list_json_to_kg_json.py " + config['TEST_ARG'] + " --mrcuiFile ~/kg2-build/umls/META/MRCUI.RRF {input.real} {input.exclusion_list} {output} > {log} 2>&1"

rule UniProtKB_Conversion:
input:
4 changes: 2 additions & 2 deletions Snakefile-maintenance.md
@@ -69,12 +69,12 @@ Here are the available options: (Format: `flag` [slots it works in, starting at
- `-R_*` [1-3]: This is our version of Snakemake's `-R` flag. However, rather than using it in the form `-R Rule` (e.g., `-R Merge`), we add an underscore between them (`-R_Rule`) to simplify decoding of the command-line options. This forces a rerun of all the rules that provide an input to the rule listed. For example, if you wanted to rerun all of the conversion rules, you might use `-R_Merge`. This flag is trickier to use, and I'd recommend both reading what Snakemake's documentation says about `-R` and doing dry runs until you get the effect you are looking for.
- `-F` [1-3]: This flag forces a rerun of all of the rules that lead up to the first rule in the Snakefile, which is `Finish` and depends on all of the rules. Thus, this will rebuild everything.
- `graphic` [1-3]: This flag generates the PNG diagram of the Snakemake workflow
- `travisci` [3-5]: This flag should only be used in the `.travis.yml` file (for usage on a Travis CI instance). It ensures that the commands are configured to run on a Travis CI instance (where we can't use a virtualenv).
- `ci` [3-5]: This flag should only be used in the `.github/workflows/main.yml` file (for usage on a GitHub Actions instance). It ensures that the commands are configured to run on a GitHub Actions instance (where certain paths are required to be different).

Examples:

- Bad: `bash -x build-kg2-snakemake.sh -n test` (`test` flag **must** be in position 1)
- Good: `bash -x build-kg2-snakemake.sh all -F nodes -n travisci` (every flag is in an allowable position for it)
- Good: `bash -x build-kg2-snakemake.sh all -F nodes -n ci` (every flag is in an allowable position for it)
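
The positional rules above can be checked mechanically. The sketch below transcribes the allowed flags per slot from the usage message of `build-kg2-snakemake.sh` and reports the first flag that sits in a slot where it is not allowed. The table name `ALLOWED_BY_SLOT` and the function `first_misplaced_flag` are hypothetical; the build script itself does not perform this validation, so treat this as an illustration of the documented positions rather than the script's behavior.

```python
import fnmatch

# Allowed flags per position, transcribed from the usage message of
# build-kg2-snakemake.sh. "-R_*" is a wildcard pattern (e.g. "-R_Merge").
ALLOWED_BY_SLOT = [
    ["test", "alltest", "all", "-n", "nodes", "graphic", "-R_*", "-F"],
    ["-n", "nodes", "graphic", "-R_*", "-F"],
    ["-n", "nodes", "graphic", "-R_*", "-F", "ci"],
    ["nodes", "ci", "-n"],
    ["ci"],
]


def first_misplaced_flag(flags):
    """Return (slot, flag) for the first flag in a disallowed slot, else None.

    Slots are numbered from 1. Any flag beyond the fifth slot is reported
    as misplaced.
    """
    for slot, flag in enumerate(flags, start=1):
        if slot > len(ALLOWED_BY_SLOT):
            return (slot, flag)
        patterns = ALLOWED_BY_SLOT[slot - 1]
        if not any(fnmatch.fnmatchcase(flag, pattern) for pattern in patterns):
            return (slot, flag)
    return None


# The "Bad" example: `test` appears in slot 2, where it is not allowed.
print(first_misplaced_flag(["-n", "test"]))
# The "Good" example: every flag is in an allowable slot.
print(first_misplaced_flag(["all", "-F", "nodes", "-n", "ci"]))
```

A table-driven check like this keeps the slot rules in one place, so it only needs one edit when a flag is renamed (as `travisci` was renamed to `ci` in this commit).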



7 changes: 4 additions & 3 deletions Snakefile-semmeddb-extraction
@@ -1,10 +1,11 @@
rule SemMedDB:
input:
config['BUILD_DIR'] + "/validation-placeholder.empty"
config['VALIDATION_PLACEHOLDER']
output:
config['SEMMED_TUPLELIST_FILE']
tuplelist = config['SEMMED_TUPLELIST_FILE'],
exclusion_list = config['SEMMED_EXCLUSION_FILE']
log:
config['BUILD_DIR'] + "/extract-semmeddb" + config['TEST_SUFFIX'] + ".log"
shell:
"bash -x " + config['CODE_DIR'] + "/extract-semmeddb.sh {output} " + config['TEST_FLAG'] + " > {log} 2>&1"
"bash -x " + config['CODE_DIR'] + "/extract-semmeddb.sh {output.tuplelist} {output.exclusion_list} " + config['TEST_FLAG'] + " > {log} 2>&1"

38 changes: 13 additions & 25 deletions build-kg2-snakemake.sh
@@ -7,12 +7,12 @@ set -o nounset -o pipefail -o errexit

if [[ "${1:-}" == "--help" || "${1:-}" == "-h" ]]; then
echo Usage: "$0 [test|alltest|all|-n|nodes|graphic|-R_*|-F] [-n|nodes|graphic|-R_*|-F] "
echo "[-n|nodes|graphic|-R_*|-F|travisci] [nodes|travisci|-n] [travisci]"
echo "[-n|nodes|graphic|-R_*|-F|ci] [nodes|ci|-n] [ci]"
exit 2
fi

# Usage: build-kg2-snakemake.sh [test|alltest|all|-n|nodes|graphic|-R_*|-F] [-n|nodes|graphic|-R_*|-F]
# [-n|nodes|graphic|-R_*|-F|travisci] [nodes|travisci|-n] [travisci]
# [-n|nodes|graphic|-R_*|-F|ci] [nodes|ci|-n] [ci]

config_dir=`dirname "$0"`
source ${config_dir}/master-config.shinc
@@ -23,10 +23,10 @@ tertiary_build_flag=${3-""}
quaternary_build_flag=${4-""}
quinary_build_flag=${5-""}

travisci_flag=""
if [[ "${tertiary_build_flag}" == "travisci" || "${quaternary_build_flag}" == "travisci" || "${quinary_build_flag}" == "travisci" ]]
ci_flag=""
if [[ "${tertiary_build_flag}" == "ci" || "${quaternary_build_flag}" == "ci" || "${quinary_build_flag}" == "ci" ]]
then
travisci_flag="travisci"
ci_flag="ci"
fi

if [[ "${build_flag}" == "test" || "${build_flag}" == "alltest" ]]
@@ -66,25 +66,20 @@ fi

build_kg2_log_file=${BUILD_DIR}/build-kg2-snakemake${dryrun}${test_suffix}.log
touch ${build_kg2_log_file}
if [[ "${travisci_flag}" == "travisci" ]]
if [[ "${ci_flag}" == "ci" ]]
then
trap "cat ${build_kg2_log_file}" EXIT
fi

{
echo "================= starting build-kg2-snakemake.sh =================="
date

snakemake_config_file=${CODE_DIR}/snakemake-config.yaml
snakefile=${CODE_DIR}/Snakefile

if [[ "${travisci_flag}" != "travisci" ]]
then
${VENV_DIR}/bin/python3 -u ${CODE_DIR}/generate_snakemake_config_file.py ${test_arg} ${config_dir}/master-config.shinc \
${VENV_DIR}/bin/python3 -u ${CODE_DIR}/generate_snakemake_config_file.py ${test_arg} ${config_dir}/master-config.shinc \
${CODE_DIR}/snakemake-config-var.yaml ${snakemake_config_file}
else
python3 -u ${CODE_DIR}/generate_snakemake_config_file.py ${test_arg} ${config_dir}/master-config.shinc \
${CODE_DIR}/snakemake-config-var.yaml ${snakemake_config_file}
fi

# Run snakemake from the virtualenv but run the snakefile in kg2-code
# -F: Run all of the rules in the snakefile
@@ -118,9 +113,9 @@ then
sed -i '/\ shell("gzip -fk {input.simplified_output_nodes_file_full}")/d' ${CODE_DIR}/Snakefile-finish
sed -i "/\ shell(config\['S3_CP_CMD'\] + ' {input.simplified_output_nodes_file_full}.gz s3:\/\/' + config\['S3_BUCKET'\])/d" ${CODE_DIR}/Snakefile-finish
else
git fetch origin
git checkout -- ${CODE_DIR}/Snakefile-post-etl
git checkout -- ${CODE_DIR}/Snakefile-finish
git fetch origin
git checkout -- ${CODE_DIR}/Snakefile-post-etl
git checkout -- ${CODE_DIR}/Snakefile-finish
fi

echo configfile: \"${snakemake_config_file}\" > ${snakefile}
@@ -148,20 +143,13 @@ then
echo 'include: "Snakefile-generate-nodes"' >> ${snakefile}
fi

if [[ "${travisci_flag}" != "travisci" ]]
then
command="cd ~ && ${VENV_DIR}/bin/snakemake --snakefile ${snakefile} ${run_flag} -R Finish -j 16 ${dryrun} ${graphic}"
else
command="cd ~ && snakemake --snakefile ${snakefile} ${run_flag} -j 16 ${dryrun} ${graphic}"
fi

eval "$command"
cd ~ && ${VENV_DIR}/bin/snakemake --snakefile ${snakefile} ${run_flag} -R Finish -j 16 ${dryrun} ${graphic}

date
echo "================ script finished ============================"
} > ${build_kg2_log_file} 2>&1

if [[ "${travisci_flag}" != "travisci" && "${dryrun}" != "-n" ]]
if [[ "${ci_flag}" != "ci" && "${dryrun}" != "-n" ]]
then
${s3_cp_cmd} ${build_kg2_log_file} s3://${s3_bucket_public}/
${s3_cp_cmd} ${build_kg2_log_file} s3://${s3_bucket_versioned}/
8 changes: 6 additions & 2 deletions build-multi-ont-kg.sh
@@ -22,8 +22,10 @@ source ${config_dir}/master-config.shinc
## supply a default value for the build_flag string
build_flag=${3:-""}
biolink_base_url_no_version=https://raw.githubusercontent.com/biolink/biolink-model/
biolink_raw_base_url=${biolink_base_url_no_version}${biolink_model_version}/biolink-model.owl.ttl
biolink_raw_base_url_curies_urls_map=${biolink_base_url_no_version}${biolink_model_version}/

# Issue #300: Need "v" before version number for URL to resolve
biolink_raw_base_url=${biolink_base_url_no_version}v${biolink_model_version}/biolink-model.owl.ttl
biolink_raw_base_url_curies_urls_map=${biolink_base_url_no_version}v${biolink_model_version}/
curies_urls_map_replace_string="\ biolink_download_source: ${biolink_raw_base_url_curies_urls_map}"
ont_load_inventory_replace_string="\ url: ${biolink_raw_base_url}"

@@ -67,6 +69,8 @@ ${VENV_DIR}/bin/python3 -u ${CODE_DIR}/save_owl_datatypeproperties.py \
${BUILD_DIR}/umls-omim.owl \
--outputFile ${node_datatype_properties_file}

${s3_cp_cmd} s3://${s3_bucket}/foodon.pickle ${BUILD_DIR}/

## run the multi_ont_to_json_kg.py script
cd ${BUILD_DIR} && ${VENV_DIR}/bin/python3 -u ${CODE_DIR}/multi_ont_to_json_kg.py \
${test_arg} \
2 changes: 1 addition & 1 deletion chembl_mysql_to_kg_json.py
@@ -478,7 +478,7 @@ def make_node(id: str,
nodes.append(kg2_util.make_node(CHEMBL_KB_CURIE_ID,
CHEMBL_KB_URL,
'ChEMBL v' + version,
kg2_util.BIOLINK_CATEGORY_INFORMATION_RESOURCE,
kg2_util.SOURCE_NODE_CATEGORY,
update_date,
CHEMBL_KB_CURIE_ID))

1 change: 1 addition & 0 deletions curies-to-categories.yaml
@@ -176,6 +176,7 @@ term-mappings:
MESH:D001523: disease
MESH:D003933: procedure
MESH:D004864: device
MESH:D005007: named thing
MESH:D005159: agent
MESH:D006281: agent
MESH:D008919: procedure