Skip to content

Latest commit

 

History

History
121 lines (76 loc) · 5.09 KB

README.md

File metadata and controls

121 lines (76 loc) · 5.09 KB

gene-oracle-docker

***Currently a funtional work-in-progress

This is a dockerized version of gene-oracle, a deep learning algorithm built using Tensorflow.

Funtionality

This software allows the user to create a pod that creates N gene-oracle containers on a Kubernertes cluster. It creates a framework for the pod, starts a pod based on that framework, copies input data into the pod, then runs gene-oracle inside the pod once data is done copying. The user can then run another script to pull log data from each container to their local machine.

Assume N will be the number of sets in the experiment.

Building and pushing image

Since this image utilizes GPUs, you will need to install nvidia-docker if you want to run the image on your local machine. You can still build and push the image with regular docker commands.

Once changes are made, build the image with docker build -t <IMG_NAME> .

If you plan on running locally, build the GPU image with nvidia-docker build -t <IMG_NAME> .

You can then test on your local machine with nvidia-docker run ...., or you can push to DockerHub to test the image on Nautilus.

Pushing to DockerHub

Build the image, then run docker images to see your image like so:

REPOSITORY                  TAG                 IMAGE ID            CREATED             SIZE
gene-oracle                 latest              a88adcfb02de        3 hours ago         1.24GB
...

Tag the image with docker tag <IMG_ID> <HUB_USER>/<REPO>:<VERSION>.

  • ex: docker tag a88adcfb02de cbmckni/gene-oracle:latest

Push with docker push <HUB_USER>/<REPO>.

  • ex: docker push cbmckni/gene-oracle

Source

Once an image is on DockerHub, it can be pulled. If you do not use the default image, you must change the line in deploy-gene-oracle.sh to pull your image:

apiVersion: v1
kind: Pod
metadata:
  name: gene-oracle
spec:
  containers:
  - name: gene-oracle-container-1
    image: <YOUR_IMAGE>
    imagePullPolicy: Always
    resources:
      limits:
        nvidia.com/gpu: 1
...

Running on Nautilus

This can be run on any kubernertes cluster, but these instructions are specifically for the Nautilus system.

Configuration

After installing kubectl on your local machine, it must be configured to the Nautilus system.

To do this, log in to Nautilus and click the "Get config" link at the top right taskbar. A file named "config" will be downloaded, which will need to be put in the ~/.kube/ directory on your local machine.

Test with ```kubectl config view''' Source

Pod Framework

The Kubernertes pod that will be deployed is outlined through a framework in .yaml format. This framework is automatically generated through the script deploy-gene-oracle.sh.

You can change the pod's framework to suit your needs by adding to the script.

The number of GPUs and the image that will be pulled from DockerHub can be specified, among other things.

Ex: To run the pod on a specific node/gpu(ex. a FIONA node), add the nodeSelector attribute to the yaml file.

  • Examples of this can be found here

After kubectl is installed and configured, you can run deploy-gene-oracle.sh to deploy your pod.

Deployment

Pre-deployment

The user will create a folder of input data, which is located in the parent directory of the gene-oracle-docker repo. So $(pwd)/.. must point to the folder where the input folder is located. Name the folder "data".

The user must place a simple shell script named "command.sh" inside the folder. This script will only be the gene-oracle command to be run. Ex:

#!/bin/sh

python scripts/classify.py \
 --dataset ./data/gtex_gct_data_float_v7.npy \
 --gene_list ./data/gtex_gene_list_v7.npy \
 --sample_json ./data/gtex_tissue_count_v7.json \
 --subset_list ./data/oncogenetic_sets.txt \
 --config ./models/net_config.json \
 --out_file ./data/oncogenetic_classify_kfold10.log

Usage

Once all input data is in the folder "data", run the script. Ex. ./deploy-gene-oracle.sh 3 $(pwd)/.. creates a 3 container pod with input data located in the parent directory.

If you are authenticated correctly and input data is correct, the script should run without error. The terminal will start printing output from each container as it runs gene-oracle.

Post-deployment

Check on your deployment using kubectl get pods

Once gene-oracle has finished, you can copy the log files back to your machine with kubectl cp ..., or run the script get-logs.sh.

Ex. ./get-logs.sh 3 to pull logs from a 3-container pod.

Deletion

Delete the deployment using kubectl delete pod gene-oracle

Do not leave an idle deployment running for too long!