Random Forest in R with Large Sample Sizes
Author: Jacob Nearing
First Created: 17 April 2019
Last Edited: 17 April 2019
- Introduction
- Background
- Load Packages and Read in Data
- Pre-processing
- Assessing Model Fit
- Identifying Important Features
This tutorial is aimed at individuals with a basic background in the R programming language who want to test how well they can use microbiome sequencing data either to classify samples into different categories or to predict the outcome of a continuous variable. In this tutorial we will go through how to set up your data for training a random forest (RF) model, as well as the basic principles that surround model training. By the end you will have learned how to create random forest models in R, assess how well they perform, and identify the features of importance. Note that this tutorial is generally aimed at larger studies (greater than 100 samples). If you would like to see a similar tutorial on using random forest with lower sample sizes, please see this tutorial.
To run through this tutorial you will need the following data, software, and R packages:
- Tutorial [ASV Table](link here)
- R (v3.3.2)
- RStudio - recommended, but not necessary (v1.0.136)
- randomForest R package (v4.6-12)
- caret R package (v6.0-73)
- pROC R package
- doMC R package
- DMwR R package
If you would like to install and load all of the listed R packages run the following command within your R session:
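deps = c("randomForest", "pROC", "caret", "DMwR", "doMC")
for (dep in deps){
  # Install the package only if it is not already installed
  if (dep %in% installed.packages()[,"Package"] == FALSE){
    install.packages(as.character(dep), repos = "http://cran.us.r-project.org")
  }
  # Load the package; character.only = TRUE lets library() take the name as a string
  library(dep, character.only = TRUE)
}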
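Because the list above names specific versions for some packages, it can be useful to see which versions you actually have installed. The following is a minimal sketch using base R only. The helper `report_versions` and the vector `tutorial_versions` are hypothetical names introduced here for illustration and are not part of any package. Packages with no version given in the list above are recorded as `NA`.

```r
# Versions listed in this tutorial; packages without a listed version are NA.
tutorial_versions <- c(randomForest = "4.6-12", caret = "6.0-73",
                       pROC = NA, doMC = NA, DMwR = NA)

# Hypothetical helper (an illustration, not part of any package): returns a
# data frame with the listed and installed version of each package.
# The installed version is NA when the package cannot be found.
report_versions <- function(listed) {
  pkgs <- names(listed)
  installed <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = pkgs,
             listed = unname(listed),
             installed = unname(installed),
             stringsAsFactors = FALSE)
}

# Print the R version string, then the package version table
print(R.version.string)
print(report_versions(tutorial_versions))
```

Newer versions than the ones listed will often work, but results may differ slightly from those shown in this tutorial.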
- Please feel free to post a question on the Microbiome Helper Google group if you have any issues.
- General comments or inquiries about Microbiome Helper can be sent to [email protected].