Fix docs and use torch as an exported object
egillax committed May 16, 2024
1 parent 14bb13c commit 153b611
Showing 9 changed files with 71 additions and 35 deletions.
4 changes: 0 additions & 4 deletions DESCRIPTION
@@ -27,20 +27,16 @@ Imports:
withr,
reticulate (>= 1.31)
Suggests:
devtools,
Eunomia,
knitr,
markdown,
plyr,
testthat,
PRROC,
ResultModelManager (>= 0.2.0),
DatabaseConnector (>= 6.0.0),
Andromeda
Remotes:
ohdsi/PatientLevelPrediction,
ohdsi/FeatureExtraction,
ohdsi/Eunomia,
ohdsi/ResultModelManager
RoxygenNote: 7.3.1
Encoding: UTF-8
1 change: 1 addition & 0 deletions NAMESPACE
@@ -10,6 +10,7 @@ export(setFinetuner)
export(setMultiLayerPerceptron)
export(setResNet)
export(setTransformer)
export(torch)
export(trainingCache)
importFrom(dplyr,"%>%")
importFrom(reticulate,py_to_r)
14 changes: 14 additions & 0 deletions R/DeepPatientLevelPrediction.R
@@ -27,6 +27,20 @@
#' @importFrom rlang .data
"_PACKAGE"

# package level global state
.globals <- new.env(parent = emptyenv())

#' PyTorch module
#'
#' The `torch` module object is the equivalent of
#' `reticulate::import("torch")` and is provided mainly as a convenience.
#'
#' @returns the torch Python module
#' @export
#' @usage NULL
#' @format An object of class `python.builtin.module`
torch <- NULL

.onLoad <- function(libname, pkgname) {
# use superassignment to update global reference
reticulate::configure_environment(pkgname)
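
For context, a hedged sketch of how the exported `torch` binding is typically populated inside `.onLoad()` via `reticulate` (the `delay_load` argument and the exact import call are assumptions and are not shown in this hunk):

```{r, eval=FALSE}
# Hedged sketch (not part of this diff): the package-level `torch` object
# defined above is filled in lazily when the package loads; delay_load = TRUE
# defers the Python import until torch is first accessed.
.onLoad <- function(libname, pkgname) {
  reticulate::configure_environment(pkgname)
  # superassignment updates the `torch` binding in the package namespace
  torch <<- reticulate::import("torch", delay_load = TRUE)
}
```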
3 changes: 2 additions & 1 deletion R/TrainingCache-class.R
@@ -102,7 +102,8 @@ trainingCache <- R6::R6Class(

#' @description
#' Trims the performance of the hyperparameter results by removing
#' the predictions from all but the best performing hyperparameter
#' the predictions from all but the best performing hyperparameter
#' @param hyperparameterResults List of hyperparameter results
trimPerformance = function(hyperparameterResults) {
indexOfMax <-
which.max(unlist(
17 changes: 17 additions & 0 deletions man/torch.Rd


7 changes: 7 additions & 0 deletions man/trainingCache.Rd


47 changes: 23 additions & 24 deletions vignettes/BuildingDeepModels.Rmd
@@ -29,7 +29,7 @@ editor_options:

```{=html}
<!--
%\VignetteEngine{knitr}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Building Deep Learning Models}
-->
```
@@ -83,9 +83,7 @@ current implementation allows us to perform research at scale on the
value and limitations of Deep Learning using observational healthcare
data.

In the package we have used
[torch](https://cran.r-project.org/web/packages/torch/index.html) but we
invite the community to add other backends.
In the package we use `pytorch` through the `reticulate` package.

Many network architectures have recently been proposed and we have
implemented a number of them; however, this list will grow in the near
@@ -98,7 +96,7 @@ examples below.

Note that training Deep Learning models is computationally intensive;
our implementation therefore supports both GPU and CPU. A GPU
is highly recommended for Deep Learning!
is highly recommended and necessary for most deep learning models!

## Requirements

@@ -110,7 +108,7 @@ installing the package can be found

The `DeepPatientLevelPrediction` package provides additional model
settings that can be used within the `PatientLevelPrediction` package
`runPlp()` function. To use both packages you first need to pick the
`runPlp()` and `runMultiplePlp()` functions. To use both packages you first need to pick the
deep learning architecture you wish to fit (see below) and then you
specify this as the modelSettings inside `runPlp()`.

@@ -120,7 +118,7 @@ specify this as the modelSettings inside `runPlp()`.
plpData <- PatientLevelPrediction::loadPlpData('locationOfData')
# pick the set<Model> from DeepPatientLevelPrediction
deepLearningModel <- DeepPatientLevelPrediction::setResNet()
deepLearningModel <- DeepPatientLevelPrediction::setDefaultResNet()
# use PatientLevelPrediction to fit model
deepLearningResult <- PatientLevelPrediction::runPlp(
@@ -147,7 +145,7 @@ input layer, one or more hidden layers and an output layer. The model
takes in the input feature values and feeds these forward through the
graph to determine the output class. A process known as
'backpropagation' is used to train the model. Backpropagation requires
labelled data and involves automatically calculating the derivative of
some ground truth and involves automatically calculating the derivative of
the model parameters with respect to the error between the model's
predictions and ground truth. Then the model learns how to adjust the
model's parameters to reduce the error.
@@ -170,8 +168,8 @@ value of `0.2` means that 20% of the layers inputs will be set to
0. This is used to reduce overfitting.

The `sizeEmbedding` input specifies the size of the embedding used. The first
layer is an embedding layer which converts each sparse feature to a dense vector
which it learns. An embedding is a lower dimensional projection of the features
layer is an embedding layer which converts each sparse feature to a dense learned
vector. An embedding is a lower dimensional projection of the features
where distance between points is a measure of similarity.

The `weightDecay` input corresponds to the weight decay in the objective
@@ -269,11 +267,11 @@ mlpResult <- PatientLevelPrediction::runPlp(
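
As a rough illustration, the hyperparameters described above might be combined as follows (a hedged sketch: the argument names match the hyperparameters discussed in the text, while the `setEstimator()` helper and the overall signature are assumptions that may differ between package versions):

```{r, eval=FALSE}
# Hedged sketch of an MLP configuration; all values are illustrative only
mlpSettings <- DeepPatientLevelPrediction::setMultiLayerPerceptron(
  numLayers = c(2L, 4L),       # candidate numbers of hidden layers
  sizeHidden = c(128L, 256L),  # candidate neurons per hidden layer
  dropout = c(0.1, 0.2),       # fraction of layer inputs randomly zeroed
  sizeEmbedding = 128L,        # size of the learned feature embedding
  estimatorSettings = DeepPatientLevelPrediction::setEstimator(
    learningRate = 3e-4,
    weightDecay = 1e-6,
    device = "cpu",
    batchSize = 256L,
    epochs = 5L
  ),
  hyperParamSearch = "random",
  randomSample = 4L            # number of random hyperparameter draws to try
)
```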
### Overall concept

Deep learning models are often trained via a process known as gradient
descent during backpropogation. During this process the network weights
descent. During this process the network weights
are updated based on the gradient of the error function for the current
weights. However, as the number of layers in the network increases, there
is a greater chance of experiencing an issue known as the vanishing or
exploding gradient during this process. The vanishing or exploding
is a greater chance of experiencing an issue known as vanishing or
exploding gradients. The vanishing or exploding
gradient is when the gradient goes to 0 or infinity, which negatively
impacts the model fitting.

@@ -284,7 +282,7 @@ non-adjacent layers, termed a 'skip connection'.
The ResNet calculates embeddings for every feature and then averages
them to compute an embedding per patient.

This implementation of a ResNet for tabular data is based on [this
Our implementation of a ResNet for tabular data is based on [this
paper](https://arxiv.org/abs/2106.11959).

### Example
@@ -302,7 +300,7 @@ function to specify the hyperparameter settings for the network.

`sizeHidden`: How many neurons in each hidden layer

`hiddenFactor`: How much to increase number of neurons in each layer
`hiddenFactor`: How much to increase number of neurons in each layer (see paper)

`residualDropout` and `hiddenDropout` : How much dropout to apply in the
hidden layer or residual connection
@@ -338,11 +336,11 @@ random samples to use

For example, the following code will fit a two layer ResNet where each
layer has 32 neurons which increases by a factor of two before
decreasing againg (hiddenFactor). 10% of inputs to each layer and
residual connection within the layer are randomly zeroed. The embedding
layer has 32 neurons. Learning rate of 3e-4 with weight decay of 1e-6 is
used for the optimizer. No hyperparameter search is done since each
input only includes one option.
decreasing again (hiddenFactor). 10% of inputs to each layer and
residual connection within the layer are randomly zeroed during training but
not testing. The embedding layer has 32 neurons. A learning rate of 3e-4 with
weight decay of 1e-6 is used for the optimizer. No hyperparameter search is done
since each input only includes one option.

```{r, eval=FALSE}
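# The example call itself is collapsed in this diff view. The code below is a
# hedged reconstruction based on the surrounding prose (two layers of 32
# neurons, hiddenFactor 2, 10% dropout, embedding size 32, learning rate 3e-4,
# weight decay 1e-6); the exact argument names are assumptions.
resnet <- DeepPatientLevelPrediction::setResNet(
  numLayers = 2L,
  sizeHidden = 32L,
  hiddenFactor = 2L,
  residualDropout = 0.1,
  hiddenDropout = 0.1,
  sizeEmbedding = 32L,
  estimatorSettings = DeepPatientLevelPrediction::setEstimator(
    learningRate = 3e-4,
    weightDecay = 1e-6
  ),
  hyperParamSearch = "random",
  randomSample = 1L  # only one combination, so no real search takes place
)
```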
@@ -422,14 +420,15 @@ ResNet.
`numBlocks` : How many Transformer blocks to use, each block includes a
self-attention layer and a feedforward block with two linear layers.

`dimToken` : Dimension of the embedding for each feature's embedding
`dimToken` : Dimension of the embedding for each feature.

`dimOut` : Dimension of output, for binary problems this is 1.

`numHeads` : Number of attention heads for the self-attention
`numHeads` : Number of attention heads for the self-attention, `dimToken` needs
to be divisible by `numHeads`.

`attDropout` , `ffnDropout` and `resDropout` : How much dropout to apply
on attentions, in feedforward block or in residual connections
`attDropout`, `ffnDropout` and `resDropout` : How much dropout to apply
on the attention, the feedforward block, or the residual connections

`dimHidden` : How many neurons in linear layers inside the feedforward
block
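
As a rough illustration, these settings might be combined as follows (a hedged sketch: the argument names follow the descriptions above, while the `setEstimator()` call and the overall signature are assumptions that may differ between package versions):

```{r, eval=FALSE}
# Hedged sketch of a Transformer configuration; all values are illustrative
transformerSettings <- DeepPatientLevelPrediction::setTransformer(
  numBlocks = 3L,    # number of Transformer blocks
  dimToken = 192L,   # embedding dimension per feature, divisible by numHeads
  dimOut = 1L,       # single output for a binary prediction problem
  numHeads = 8L,     # attention heads (192 / 8 = 24 dimensions per head)
  attDropout = 0.2,  # dropout on the attention weights
  ffnDropout = 0.1,  # dropout inside the feedforward block
  resDropout = 0.0,  # dropout on the residual connections
  dimHidden = 256L,  # neurons in the feedforward block's linear layers
  estimatorSettings = DeepPatientLevelPrediction::setEstimator(
    learningRate = 3e-4,
    weightDecay = 1e-6
  ),
  hyperParamSearch = "random",
  randomSample = 1L
)
```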
9 changes: 5 additions & 4 deletions vignettes/FirstModel.Rmd
@@ -26,7 +26,7 @@ output:

```{=html}
<!--
%\VignetteEngine{knitr}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Developing your first DeepPLP model}
-->
```
@@ -73,7 +73,7 @@ covariateSettings <- FeatureExtraction::createCovariateSettings(
)
```

This means we are extracting gender as a binary variable, age as a continuous variable and conditions occurring in the long term window, which is by default 365 days prior.
This means we are extracting gender as a binary variable, age as a continuous variable and conditions occurring in the long term window, which is by default 365 days prior to index. If you want to know more about these terms we recommend checking out the [book of OHDSI](https://ohdsi.github.io/TheBookOfOhdsi/).

Next we need to define our database details, which specify which database we are getting our cohorts from. Since we don't have a database we are using Eunomia.

@@ -126,11 +126,11 @@ modelSettings <- setDefaultResNet(
```

We still need to define a few parameters. Device defines on which device to train the model. Usually deep learning models are slow to train so they need a GPU. However this example is small enough that we can use a CPU If you have access to a GPU you can try changing the device to `'cuda'` and see how much faster it goes.
We still need to define a few parameters. Device defines on which device to train the model. Usually deep learning models are slow to train so they need a GPU. However this example is small enough that we can use a CPU. If you have access to a GPU you can try changing the device to `'cuda'` and see how much faster it goes.

We also need to define our batch size. Usually in deep learning the model sees only a small chunk of the data at a time, in this case 256 patients. After that the model is updated before seeing the next batch. The batch order is random. This is called stochastic gradient descent.

Finally we define our epochs. This is how long we will train the model. One epoch means the model has seen all the data once.
Finally we define our epochs. This is how long we will train the model. One epoch means the model has seen all the data once. In this case we will train the model for 3 epochs.
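
Putting those settings together, the call might look roughly like this (a hedged sketch: the `setEstimator()` helper and its argument names are assumptions based on the package interface, and the values simply mirror the text above):

```{r, eval=FALSE}
# Hedged sketch: device, batch size and epochs are passed via the estimator
modelSettings <- setDefaultResNet(
  estimatorSettings = setEstimator(
    device = "cpu",    # switch to "cuda" if a GPU is available
    batchSize = 256L,  # patients per stochastic gradient descent update
    epochs = 3L        # full passes over the training data
  )
)
```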

Now all that is left is using the PLP to train our first deep learning model. If you have used the PLP this should look familiar to you.

@@ -143,6 +143,7 @@ plpResults <- PatientLevelPrediction::runPlp(plpData = plpData,
populationSettings = populationSettings
)
```

On my computer this takes about 20 seconds per epoch. While you probably won't see any kind of good performance using this model and this data, at least the training loss should be decreasing in the printed output.

Congratulations, you have just developed your first deep learning model!
4 changes: 2 additions & 2 deletions vignettes/Installing.Rmd
Expand Up @@ -26,7 +26,7 @@ output:

```{=html}
<!--
%\VignetteEngine{knitr}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Installing DeepPLP}
-->
```
@@ -106,7 +106,7 @@ This should install the required python packages. If that doesn't happen it can

```
library(DeepPatientLevelPrediction)
torch$trandn(10L)
torch$randn(10L)
```

This should print out a tensor with ten different values.
