Integrating GLMNet and Lasso #51
@ShouvikGhosh2048 @codetalker7 We need to do a performance check against the R implementation.

Julia code:

```julia
using RDatasets
using GLMNet
using Random

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

Random.seed!(1234)
cv = glmnetcv(x, y)
```
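As a side note, here is a minimal sketch of inspecting the cross-validation result (assuming the GLMNet.jl fields `lambda` and `meanloss` and the `coef` method shown in the package README; not confirmed in this thread):

```julia
using RDatasets, GLMNet, Random

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

Random.seed!(1234)
cv = glmnetcv(x, y)

# Index of the lambda with the smallest mean cross-validated loss
best = argmin(cv.meanloss)
println("best lambda: ", cv.lambda[best])

# Coefficients of the model at that lambda
println(coef(cv))
```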
R code:

```r
library(glmnet)

attach(mtcars)
y <- mtcars$mpg
x <- data.matrix(mtcars[, c('hp', 'wt', 'drat', 'qsec')])

# perform 10-fold cross-validation to find the optimal lambda value
set.seed(100)
cv_model <- cv.glmnet(x, y, alpha = 1, nfolds = 10)
```
I tried benchmarking the two programs (on WSL).

Julia:

```julia
using RDatasets, GLMNet, Random, BenchmarkTools

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

@benchmark glmnetcv(x, y)
```

R:

```r
library(glmnet)
library(microbenchmark)

attach(mtcars)
y <- mtcars$mpg
x <- data.matrix(mtcars[, c('hp', 'wt', 'drat', 'qsec')])

microbenchmark(cv.glmnet(x, y, alpha = 1, nfolds = 10))
```

For Julia I got a memory estimate of 178.95 KiB and an allocs estimate of 679. The Julia program takes about 1/100th of the time taken by the R program. This seems wrong; I'm not sure if I made a mistake.
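One thing worth ruling out (a suggestion, not a confirmed cause of the discrepancy): `@benchmark` was called on the globals `x` and `y`, and the BenchmarkTools manual recommends interpolating non-constant globals with `$` so the benchmark measures the call itself rather than access to untyped globals:

```julia
using RDatasets, GLMNet, BenchmarkTools

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

# Interpolate the globals into the benchmark expression
@benchmark glmnetcv($x, $y)
```

For a call taking hundreds of microseconds this is unlikely to change the ranking, but it keeps the comparison clean.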
Let me check it on my end.
Julia performance:

```julia
julia> @benchmark glmnetcv(x, y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  293.541 μs …   2.972 ms  ┊ GC (min … max): 0.00% … 88.98%
 Time  (median):     305.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   313.728 μs ± 135.655 μs  ┊ GC (mean ± σ):  2.22% ±  4.59%
 Memory estimate: 175.67 KiB, allocs estimate: 649.
```

(The timing histogram from the BenchmarkTools output is omitted here.)

R performance:

```r
> microbenchmark(cv.glmnet(x, y, alpha = 1, nfolds = 10), times = 100)
Unit: milliseconds
                                     expr     min       lq     mean   median       uq      max neval
 cv.glmnet(x, y, alpha = 1, nfolds = 10) 16.8904 17.04585 19.07382 17.23759 19.43857 88.26255   100
```
Old-style benchmarking.

R performance:

```r
library(glmnet)
library(microbenchmark)

attach(mtcars)
y <- mtcars$mpg
x <- data.matrix(mtcars[, c('hp', 'wt', 'drat', 'qsec')])

microbenchmark(cv.glmnet(x, y, alpha = 1, nfolds = 10), times = 200)

start = Sys.time()
set.seed(10101)
for (i in 1:1000) {
  fit = cv.glmnet(x, y, alpha = 1, nfolds = 10)
}
Sys.time() - start
# Time difference of 18.86479 secs
```

Julia performance:

```julia
using RDatasets, GLMNet, Random

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

start = time()  # Julia's time(); there is no Sys.time() as in R
for i = 1:1000
    Random.seed!(1234)
    cv = glmnetcv(x, y)
end
time() - start
# 0.35875892639160156
```

Performance gain: the Julia loop is about 52 times faster (18.86 s vs 0.36 s for 1000 fits).
Julia:

```julia
using RDatasets, GLMNet, BenchmarkTools

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
x = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

@benchmark glmnet(x, y)
```

R:

```r
library(glmnet)
library(microbenchmark)

attach(mtcars)
y <- mtcars$mpg
x <- data.matrix(mtcars[, c('hp', 'wt', 'drat', 'qsec')])

microbenchmark(glmnet(x, y))
```

(The Julia and R benchmark results and the glmnet outputs from both languages were posted as attachments and are not preserved in this text.)
Okay, this looks like only about 4.5 times faster, which makes sense.
As per our discussion and debate at the Goa conference, it appears it would be better to have a native Julia development of GLMNet.jl. However, Lasso.jl is already a native Julia package, so we will integrate Lasso.jl into CRRao.
@ShouvikGhosh2048 @mousum-github @ajaynshah @codetalker7 @ayushpatnaikgit Look at this doc: https://juliastats.org/Lasso.jl/stable/lasso/ Lasso.jl depends completely on GLM.jl, which means Lasso.jl should work for all GLM.jl models. On top of that, Lasso.jl is a native Julia development. Hence we should integrate Lasso.jl as a top priority.
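A minimal sketch of what the Lasso.jl calls could look like (based on the `fit(LassoPath, ...)` / `fit(LassoModel, ...)` API in the linked docs; the default distribution/link and lambda selection are assumptions from those docs, not confirmed in this thread):

```julia
using RDatasets, Lasso

mtcars = dataset("datasets", "mtcars")
y = Vector(mtcars[:, "MPG"])
X = Matrix(mtcars[:, ["HP", "WT", "DRat", "QSec"]])

# Fit the full lasso regularization path (Gaussian/identity by default,
# mirroring GLM.jl's distribution/link interface)
path = fit(LassoPath, X, y)

# Fit a single lasso model with an automatically selected lambda
m = fit(LassoModel, X, y)
coef(m)
```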
@ajaynshah @ayushpatnaikgit @mousum-github @SusanXKDR
I am tempted to integrate GLMNet and Lasso:
https://github.com/JuliaStats/GLMNet.jl
https://github.com/JuliaStats/Lasso.jl/blob/master/docs/src/index.md
https://github.com/simonster/LARS.jl
I understand GLMNet.jl wraps the Fortran code from glmnet.
Shall we integrate it into CRRao?