Release 2.0.1 #94

Merged: 99 commits, Nov 3, 2023

Commits
64fce91
speed up data conversion about 3-5x
egillax Nov 14, 2022
30e67fd
updated news
egillax Nov 14, 2022
131c434
make sure tensorList matches dataset length
egillax Nov 14, 2022
38504b3
user integers for tensorList
egillax Nov 14, 2022
5d89faa
test a different way of getting torch binaries
egillax Dec 15, 2022
2f8d6d6
test a different way of getting torch binaries
egillax Dec 15, 2022
e79a43f
fix tidyselect warnings
egillax Dec 15, 2022
6c69427
Merge branch 'main' into 43-hotfix-lantern-binaries
egillax Dec 15, 2022
887fac9
Update description and news
egillax Dec 15, 2022
72f813e
remove torch install env variable from actions
egillax Dec 15, 2022
d66c47e
merge hotfix to develop
egillax Dec 16, 2022
ad0ad19
fixed dataset
egillax Jan 20, 2023
03ff99b
fix numericalIndex and address pull warning
egillax Jan 20, 2023
517f675
Allow dimToken and numHeads to take the form of vectors
lhjohn Jan 26, 2023
b87b769
Merge pull request #47 from OHDSI/transformer-bug
lhjohn Jan 26, 2023
dfca29a
Modeltype fix (#48)
egillax Jan 27, 2023
65302bd
update default ResNet and Transformer to have custom LR and WD
egillax Feb 13, 2023
353e0ba
Add seed for sampling hyperparameter combinations (#50)
lhjohn Feb 15, 2023
926e7d0
Lr find (#51)
egillax Mar 1, 2023
e580c2a
Derive dimension of feedforward block from embedding dimension (#53)
lhjohn Mar 5, 2023
fba85ec
Divisible check for Transformer not comprehensive (#55)
lhjohn Mar 5, 2023
e50d2e4
Update NEWS.md
egillax Mar 6, 2023
b66de7f
Merge branch 'main' into develop
egillax Mar 22, 2023
051ea88
update website and docs
egillax Mar 22, 2023
793338a
remove docs folder from code branches
egillax Mar 22, 2023
d0cb45b
render dev website
egillax Mar 22, 2023
609df5b
fix action
egillax Mar 22, 2023
3330c28
fix action
egillax Mar 22, 2023
dcb9423
fix badge in readme
egillax Mar 22, 2023
ce5e235
prepare version for release
egillax Mar 22, 2023
2d07b6b
Update DESCRIPTION
egillax Mar 23, 2023
05fecc2
modelType as attribute and tests to cover database upload
egillax Mar 24, 2023
32a7e23
modelType as attribute and tests to cover database upload
egillax Mar 24, 2023
0a0ac73
Merge branch '59-hotfix-modelType' of https://github.com/OHDSI/DeepPa…
egillax Mar 24, 2023
85ce433
Merge branch '59-hotfix-modelType' of https://github.com/OHDSI/DeepPa…
egillax Mar 24, 2023
6d433a3
Merge branch '59-hotfix-modelType' of https://github.com/OHDSI/DeepPa…
egillax Mar 24, 2023
c5a984f
fix dependanceis
egillax Mar 24, 2023
7541a38
prepare version and news for release
egillax Mar 24, 2023
3b02ce6
merged with hotfix branch
egillax Mar 24, 2023
4cd2e30
modelType attribute back to modelSettings functions
egillax Mar 24, 2023
a390d4e
Merge branch 'develop' of https://github.com/OHDSI/DeepPatientLevelPr…
egillax Mar 24, 2023
db13265
Merge branch 'main' into develop
egillax Mar 24, 2023
9c19173
Update DESCRIPTION
egillax Mar 27, 2023
808ead8
debug actions
egillax Apr 16, 2023
8e1de7f
Update R_CDM_check_hades.yaml
egillax Apr 17, 2023
d18d579
torch install environment variable
egillax Apr 17, 2023
55bfca2
Merge branch 'debug-actions' of https://github.com/OHDSI/DeepPatientL…
egillax Apr 17, 2023
510f4f1
update version and news
egillax Apr 17, 2023
5e09d1d
merged with debug-actions
egillax Apr 17, 2023
a1cb2e7
add device as expression with tests (#66)
egillax Apr 18, 2023
01ce148
Merge branch 'main' into develop
egillax Apr 18, 2023
2113a48
remove torchopt
egillax Apr 18, 2023
575d2e5
update news and version
egillax Apr 18, 2023
20d4a0d
fix docs
egillax Apr 18, 2023
f97b37f
update version number
egillax Apr 19, 2023
783f417
LRFinder works with device fun (#68)
egillax Apr 24, 2023
0a6f9de
update version and news
egillax Apr 24, 2023
58df70c
Merge branch 'main' into develop
egillax Apr 24, 2023
2883b5e
update version
egillax Apr 25, 2023
8a01ed7
fix bug when test subject has no features
egillax Jun 18, 2023
e555710
Add parameter caching for training persistence and continuity (#63)
lhjohn Jun 18, 2023
1e640ee
fix incs issue
egillax Jun 18, 2023
6326cd6
Merge branch 'develop' of https://github.com/OHDSI/DeepPatientLevelPr…
egillax Jun 18, 2023
5d9dc59
Release version and news updated
egillax Jun 18, 2023
af02541
Merge branch 'main' into develop
egillax Jun 18, 2023
506b940
Release and NEWS
egillax Jun 18, 2023
74608ff
Resolve an issue with hidden dimension ratio (#74)
lhjohn Jun 22, 2023
ba60c28
Cache single hyperparameter combination (#78)
lhjohn Jul 20, 2023
bd9b357
Change backend to pytorch (#80)
egillax Aug 28, 2023
04c2b36
Merge branch 'main' into develop
egillax Aug 28, 2023
e501124
fix dataset
egillax Aug 28, 2023
31ef832
update PLP version in DESCRIPTION
egillax Sep 7, 2023
66b8c84
integer handling in python and input checks (#83)
egillax Sep 7, 2023
85be689
Ensure that param search is completed in empty cache test (#84)
lhjohn Sep 7, 2023
4c83897
use ubuntu 22.04 in CI (#85)
egillax Sep 7, 2023
904e926
Update NEWS.md
egillax Sep 8, 2023
ff9e22e
Update DESCRIPTION
egillax Sep 13, 2023
216c7af
Only cache best prediction
lhjohn Oct 6, 2023
2220ff2
Clean up
lhjohn Oct 6, 2023
9f8d23a
Add logger message when caching
lhjohn Oct 6, 2023
f93a53f
Add test to ensure prediction is cached for optimal parameters
lhjohn Oct 6, 2023
53af472
Resolve an issue with case sensitivity on Ubuntu
lhjohn Oct 6, 2023
1f50fa5
Merge pull request #90 from OHDSI/88-reduce-cache-size
lhjohn Oct 6, 2023
84bbb18
Fix lr schedule (#91)
egillax Oct 10, 2023
2d8f9af
Fix numerical embeddings + add tests (#92)
egillax Oct 12, 2023
56279c8
Fix transformer (#93)
egillax Oct 12, 2023
e4f158f
Merge branch 'main' into develop
egillax Oct 12, 2023
d056998
fix sneaky typo
egillax Oct 12, 2023
def727f
fix quosures
egillax Oct 13, 2023
bdf8bba
optimize tests
egillax Oct 13, 2023
ced60d1
fix hardcoded learning rate and add seed for sampling batch
egillax Oct 17, 2023
357b14e
remove incorrect import
egillax Oct 17, 2023
74c346f
Type cast batch size to int (#96)
lhjohn Oct 18, 2023
69b8bec
Add fix to the full cache issue (#99)
egillax Oct 21, 2023
9bbe364
improve docs (#100)
egillax Oct 22, 2023
dac66a0
dont track example
egillax Oct 26, 2023
20e855b
fix numerical features order issue
egillax Oct 30, 2023
7b31c73
updated news
egillax Oct 31, 2023
61bd834
update NEWS
egillax Oct 31, 2023
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,4 +7,5 @@ config.yml
docs
.idea/
renv.lock
extras/
extras/
.Renviron
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: DeepPatientLevelPrediction
Type: Package
Title: Deep Learning For Patient Level Prediction Using Data In The OMOP Common Data Model
Version: 2.0.0
Version: 2.0.1
Date: 18-04-2023
Authors@R: c(
person("Egill", "Fridgeirsson", email = "[email protected]", role = c("aut", "cre")),
12 changes: 12 additions & 0 deletions NEWS.md
@@ -1,3 +1,15 @@
DeepPatientLevelPrediction 2.0.1
======================
- Connection parameter fixed to be in line with newest polars
- Fixed a bug where LRFinder used a hardcoded batch size
- Seed is now used in LRFinder so it's reproducible
- Fixed a bug in NumericalEmbedding
- Fixed a bug for Transformer and numerical features
- Fixed a bug when resuming from a full TrainingCache (thanks Zoey Jiang and Linying Zhang)
- Updated installation documentation after feedback from HADES hackathon
- Fixed a bug where order of numeric features wasn't conserved between training and test set
- TrainingCache now only saves prediction dataframe for the best performing model

DeepPatientLevelPrediction 2.0.0
======================
- New backend which uses pytorch through reticulate instead of torch in R
40 changes: 27 additions & 13 deletions R/Estimator.R
@@ -310,7 +310,8 @@ gridCvDeep <- function(mappedData,

fitParams <- names(paramSearch[[1]])[grepl("^estimator", names(paramSearch[[1]]))]
findLR <- modelSettings$estimatorSettings$findLR
for (gridId in trainCache$getLastGridSearchIndex():length(paramSearch)) {
if (!trainCache$isFull()) {
for (gridId in trainCache$getLastGridSearchIndex():length(paramSearch)) {
ParallelLogger::logInfo(paste0("Running hyperparameter combination no ", gridId))
ParallelLogger::logInfo(paste0("HyperParameters: "))
ParallelLogger::logInfo(paste(names(paramSearch[[gridId]]), paramSearch[[gridId]], collapse = " | "))
@@ -363,25 +364,38 @@
)
}
maxIndex <- which.max(unlist(sapply(learnRates, `[`, 2)))
paramSearch[[gridId]]$learnSchedule <- learnRates[[maxIndex]]

gridSearchPredictons[[gridId]] <- list(
prediction = prediction,
param = paramSearch[[gridId]]
param = paramSearch[[gridId]],
gridPerformance = PatientLevelPrediction::computeGridPerformance(prediction, paramSearch[[gridId]])
)
gridSearchPredictons[[gridId]]$gridPerformance$hyperSummary$learnRates <- rep(list(unlist(learnRates[[maxIndex]]$LRs)),
nrow(gridSearchPredictons[[gridId]]$gridPerformance$hyperSummary))
gridSearchPredictons[[gridId]]$param$learnSchedule <- learnRates[[maxIndex]]


# remove all predictions that are not the max performance
indexOfMax <- which.max(unlist(lapply(gridSearchPredictons, function(x) x$gridPerformance$cvPerformance)))
for (i in seq_along(gridSearchPredictons)) {
if (!is.null(gridSearchPredictons[[i]])) {
if (i != indexOfMax) {
gridSearchPredictons[[i]]$prediction <- list(NULL)
}
}
}
ParallelLogger::logInfo(paste0("Caching all grid search results and prediction for best combination ", indexOfMax))
trainCache$saveGridSearchPredictions(gridSearchPredictons)
}
}
paramGridSearch <- lapply(gridSearchPredictons, function(x) x$gridPerformance)
# get best params
indexOfMax <- which.max(unlist(lapply(gridSearchPredictons, function(x) x$gridPerformance$cvPerformance)))
finalParam <- gridSearchPredictons[[indexOfMax]]$param

paramGridSearch <- lapply(gridSearchPredictons, function(x) x$gridPerformance)

# get best para (this could be modified to enable any metric instead of AUC, just need metric input in function)
paramGridSearch <- lapply(gridSearchPredictons, function(x) {
do.call(PatientLevelPrediction::computeGridPerformance, x)
}) # cvAUCmean, cvAUC, param

optimalParamInd <- which.max(unlist(lapply(paramGridSearch, function(x) x$cvPerformance)))
finalParam <- paramGridSearch[[optimalParamInd]]$param

cvPrediction <- gridSearchPredictons[[optimalParamInd]]$prediction
# get best CV prediction
cvPrediction <- gridSearchPredictons[[indexOfMax]]$prediction
cvPrediction$evaluationType <- "CV"

ParallelLogger::logInfo("Training final model using optimal parameters")
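
The caching change above is easier to follow in isolation. Below is a conceptual sketch in Python (illustration only; the package implements this in R, and the names are assumed): after each hyperparameter combination is evaluated, every cached prediction except the one belonging to the best-performing combination is dropped before the cache is written, which keeps the training cache small.

grid_results = [
    {"param": {"learning_rate": 1e-3}, "cv_performance": 0.71, "prediction": "prediction frame 1"},
    {"param": {"learning_rate": 3e-4}, "cv_performance": 0.74, "prediction": "prediction frame 2"},
]
# index of the best-performing combination evaluated so far
best = max(range(len(grid_results)), key=lambda i: grid_results[i]["cv_performance"])
for i, result in enumerate(grid_results):
    if i != best:
        result["prediction"] = None  # keep only the best combination's prediction
# the pruned list is what gets saved to the training cache
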
7 changes: 7 additions & 0 deletions R/TrainingCache-class.R
@@ -69,6 +69,13 @@ TrainingCache <- R6::R6Class(
return(private$.paramPersistence$gridSearchPredictions)
},

#' @description
#' Check if cache is full
#' @returns Boolean
isFull = function() {
return(all(unlist(lapply(private$.paramPersistence$gridSearchPredictions, function(x) !is.null(x$gridPerformance)))))
},

#' @description
#' Gets the last index from the cached grid search
#' @returns Last grid search index
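
A rough sketch of the new isFull() rule, again in Python for illustration with toy data: the cache counts as full once every stored grid-search slot already carries a gridPerformance result, so a finished search is not rerun when an analysis is resumed.

cached_slots = [
    {"gridPerformance": {"cvPerformance": 0.74}},
    {"gridPerformance": {"cvPerformance": 0.71}},
]
# full means: no slot is still waiting for its grid performance
is_full = all(slot.get("gridPerformance") is not None for slot in cached_slots)
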
73 changes: 0 additions & 73 deletions extras/example.R

This file was deleted.

4 changes: 2 additions & 2 deletions inst/python/Dataset.py
@@ -21,7 +21,7 @@ def __init__(self,
if pathlib.Path(data).suffix == '.sqlite':
data = urllib.parse.quote(data)
data = pl.read_database("SELECT * from covariates",
connection_uri=f"sqlite://{data}").lazy()
connection=f"sqlite://{data}").lazy()
else:
data = pl.scan_ipc(pathlib.Path(data).joinpath('covariates/*.arrow'))
observations = data.select(pl.col('rowId').max()).collect()[0, 0]
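
For context, a minimal sketch of the updated call (hypothetical file path): newer polars releases expect the URI under the connection keyword rather than connection_uri.

import urllib.parse

import polars as pl

# hypothetical sqlite file holding the covariates table
path = urllib.parse.quote("/data/plpData/covariates.sqlite")
covariates = pl.read_database(
    "SELECT * from covariates",
    connection=f"sqlite://{path}",
).lazy()
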
@@ -67,7 +67,7 @@ def __init__(self,
if pl.count(self.numerical_features) == 0:
self.num = None
else:
numerical_data = data.filter(pl.col('columnId').is_in(self.numerical_features)). \
numerical_data = data.filter(pl.col('columnId').is_in(self.numerical_features)).sort(by='columnId'). \
with_row_count('newColumnId').with_columns(pl.col('newColumnId').first().over('columnId').
rank(method="dense") - 1, pl.col('rowId') - 1) \
.select(pl.col('rowId'), pl.col('newColumnId').alias('columnId'), pl.col('covariateValue')).collect()
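
A toy sketch (made-up data) of why the added sort matters: the dense rank that assigns the new column indices follows the order in which column ids are first encountered, so sorting by columnId first makes the numeric-feature order deterministic and identical between training and test data.

import polars as pl

df = pl.DataFrame({
    "rowId": [1, 1, 2],
    "columnId": [1002, 1001, 1001],
    "covariateValue": [0.5, 1.2, 0.9],
})
mapped = (
    df.sort(by="columnId")  # without this, the mapping depends on row order
      .with_row_count("newColumnId")
      .with_columns(pl.col("newColumnId").first().over("columnId").rank(method="dense") - 1)
      .select("rowId", pl.col("newColumnId").alias("columnId"), "covariateValue")
)
# columnId 1001 always maps to 0 and 1002 to 1, regardless of the input row order
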
7 changes: 4 additions & 3 deletions inst/python/LrFinder.py
@@ -37,6 +37,7 @@ def __init__(self,
smooth = lr_settings.get("smooth", 0.05)
divergence_threshold = lr_settings.get("divergence_threshold", 4)
torch.manual_seed(seed=estimator_settings["seed"])
self.seed = estimator_settings["seed"]
self.model = model(**model_parameters)
if callable(estimator_settings["device"]):
self.device = estimator_settings["device"]()
@@ -55,18 +56,18 @@
self.scheduler = ExponentialSchedulerPerBatch(self.optimizer, self.max_lr, self.num_lr)

self.criterion = estimator_settings["criterion"]()
self.batch_size = estimator_settings['batch_size']
self.batch_size = int(estimator_settings['batch_size'])
self.losses = None
self.loss_index = None

def get_lr(self, dataset):
batch_index = torch.arange(0, len(dataset), 1).tolist()

random.seed(self.seed)
losses = torch.empty(size=(self.num_lr,), dtype=torch.float)
lrs = torch.empty(size=(self.num_lr,), dtype=torch.float)
for i in tqdm(range(self.num_lr)):
self.optimizer.zero_grad()
random_batch = random.sample(batch_index, 32)
random_batch = random.sample(batch_index, self.batch_size)
batch = dataset[random_batch]
batch = batch_to_device(batch, self.device)

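A minimal sketch of the updated sampling (toy values, names assumed): casting the batch size to int guards against it arriving as a float (for example when passed from R via reticulate), and seeding random makes the sampled batch, and therefore the learning-rate search, reproducible.

import random

import torch

estimator_settings = {"seed": 42, "batch_size": 128.0}  # batch size may arrive as a float

random.seed(estimator_settings["seed"])
batch_size = int(estimator_settings["batch_size"])  # replaces the previously hardcoded 32
batch_index = torch.arange(0, 1000, 1).tolist()
random_batch = random.sample(batch_index, batch_size)  # identical batch for a given seed
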
4 changes: 2 additions & 2 deletions inst/python/ResNet.py
@@ -130,9 +130,9 @@ def __init__(self,
nn.init.kaiming_uniform_(parameter, a=math.sqrt(5))

def forward(self, input):
x = self.weight.unsqueeze(0) * input.unsqueeze(-1)
x = self.weight[None] * input[..., None]
if self.bias is not None:
x = x + self.bias.unsqueeze(-1)
x = x + self.bias[None]
return x


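A shape sketch of the fixed numerical embedding (sizes assumed): with weight and bias both of shape (n_features, dim), indexing with [None] prepends a batch axis, so the bias is broadcast over the batch dimension instead of being added along the wrong axis.

import torch

batch, n_features, dim = 4, 3, 8
weight = torch.randn(n_features, dim)
bias = torch.randn(n_features, dim)
x_num = torch.randn(batch, n_features)

x = weight[None] * x_num[..., None]  # (1, n, d) * (b, n, 1) -> (b, n, d)
x = x + bias[None]                   # (1, n, d) broadcasts over the batch
assert x.shape == (batch, n_features, dim)
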
5 changes: 4 additions & 1 deletion inst/python/Transformer.py
@@ -49,6 +49,9 @@ def __init__(self,

if num_features != 0 and num_features is not None:
self.numerical_embedding = NumericalEmbedding(num_features, dim_token)
self.use_numerical = True
else:
self.use_numerical = False
self.class_token = ClassToken(dim_token)

self.layers = nn.ModuleList([])
@@ -78,7 +81,7 @@ def __init__(self,
def forward(self, x):
mask = torch.where(x["cat"] == 0, True, False)
cat = self.categorical_embedding(x["cat"])
if "num" in x.keys() and self.numerical_embedding is not None:
if self.use_numerical:
num = self.numerical_embedding(x["num"])
x = torch.cat([cat, num], dim=1)
mask = torch.cat([mask, torch.zeros([x.shape[0],
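
A minimal sketch of the pattern introduced above (names and layers assumed): deciding use_numerical once at construction means forward() never has to touch self.numerical_embedding on a model built without numeric features, which could otherwise fail because that attribute is never created.

import torch
from torch import nn

class TinyTokenizer(nn.Module):
    def __init__(self, num_features, dim_token):
        super().__init__()
        if num_features is not None and num_features != 0:
            self.numerical_embedding = nn.Linear(num_features, dim_token)
            self.use_numerical = True
        else:
            self.use_numerical = False

    def forward(self, x):
        if self.use_numerical:  # no attribute lookup on a module that may not exist
            return self.numerical_embedding(x["num"])
        return torch.empty(0)
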
13 changes: 13 additions & 0 deletions man/TrainingCache.Rd

Generated documentation file; diff not rendered.

18 changes: 18 additions & 0 deletions tests/testthat/setup.R
@@ -78,3 +78,21 @@ dataset <- Dataset$Data(
)
small_dataset <- torch$utils$data$Subset(dataset, (1:round(length(dataset)/3)))

modelSettings <- setResNet(
numLayers = 1, sizeHidden = 16, hiddenFactor = 1,
residualDropout = c(0, 0.2), hiddenDropout = 0,
sizeEmbedding = 16, hyperParamSearch = "random",
randomSample = 2,
setEstimator(epochs=1,
learningRate = 3e-4)
)
fitEstimatorPath <- file.path(testLoc, 'fitEstimator')
if (!dir.exists(fitEstimatorPath)) {
dir.create(fitEstimatorPath)
}
fitEstimatorResults <- fitEstimator(trainData$Train,
modelSettings = modelSettings,
analysisId = 1,
analysisPath = fitEstimatorPath)


25 changes: 6 additions & 19 deletions tests/testthat/test-Estimator.R
@@ -146,25 +146,12 @@ test_that("early stopping works", {
testthat::expect_true(earlyStop$early_stop)
})

modelSettings <- setResNet(
numLayers = 1, sizeHidden = 16, hiddenFactor = 1,
residualDropout = 0, hiddenDropout = 0,
sizeEmbedding = 16, hyperParamSearch = "random",
randomSample = 1,
setEstimator(epochs=1,
learningRate = 3e-4)
)

sink(nullfile())
results <- fitEstimator(trainData$Train, modelSettings = modelSettings, analysisId = 1, analysisPath = testLoc)
sink()

test_that("Estimator fit function works", {
expect_true(!is.null(results$trainDetails$trainingTime))
expect_true(!is.null(fitEstimatorResults$trainDetails$trainingTime))

expect_equal(class(results), "plpModel")
expect_equal(attr(results, "modelType"), "binary")
expect_equal(attr(results, "saveType"), "file")
expect_equal(class(fitEstimatorResults), "plpModel")
expect_equal(attr(fitEstimatorResults, "modelType"), "binary")
expect_equal(attr(fitEstimatorResults, "saveType"), "file")
fakeTrainData <- trainData
fakeTrainData$train$covariateData <- list(fakeCovData <- c("Fake"))
expect_error(fitEstimator(fakeTrainData$train, modelSettings, analysisId = 1, analysisPath = testLoc))
@@ -184,7 +171,7 @@ test_that("predictDeepEstimator works", {
# input is a plpModel and data
sink(nullfile())
predictions <- predictDeepEstimator(
plpModel = results, data = trainData$Test,
plpModel = fitEstimatorResults, data = trainData$Test,
trainData$Test$labels
)
sink()
@@ -369,4 +356,4 @@ test_that("estimatorSettings can be saved and loaded with correct python objects
testthat::expect_false(reticulate::py_is_null_xptr(optimizer))
testthat::expect_false(reticulate::py_is_null_xptr(scheduler$fun))
testthat::expect_false(reticulate::py_is_null_xptr(criterion))
})
})