The run_analysis.R script takes as input the data from the UCI Machine Learning Repository's Human Activity Recognition Using Smartphones Data Set and produces as output a wide tidy data set with the average of the means and standard deviations of each measurement, for each activity and each subject. The tidy data set meets the principles of tidiness stated in [1] and [2], and it has a code book.
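To make the shape of the output concrete, here is a minimal sketch of the "average per subject and activity" step on a toy table. The values are made up, and base R's aggregate() stands in for whatever run_analysis.R does internally (that code is not shown here). The avg- prefix follows the column naming used in the tidy data set.

```r
# Toy input: made-up readings for two subjects and two activities.
toy <- data.frame(
  subject  = c(1, 1, 1, 1, 2, 2, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "SITTING",
               "WALKING", "WALKING", "SITTING", "SITTING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7, 0.6, 0.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One row per (subject, activity) pair; the measurement column is averaged.
tidy <- aggregate(
  toy["tBodyAcc-mean()-X"],
  by  = list(subject = toy$subject, activity = toy$activity),
  FUN = mean
)

# Prefix the averaged column so its name reflects what it now holds.
names(tidy)[3] <- paste0("avg-", names(tidy)[3])

print(tidy)
```

The result is "wide" in the sense that each averaged measurement gets its own column, with one row per subject and activity.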
Regarding the "what columns are measurements on the mean and standard deviation" issue, I have decided to include only features whose names contain mean() or std(), because "measurement" here appears to refer only to the smartphone sensor signals listed in Table 2 of [3].
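The selection rule can be illustrated with a short sketch. The feature names below are a small sample in the style of the data set's feature list, and the matching approach (a literal substring test) is an assumption about how such a filter could be written, not a copy of the script's code.

```r
# A small sample of feature names in the style of the HAR data set.
features <- c(
  "tBodyAcc-mean()-X",
  "tBodyAcc-std()-X",
  "tBodyAcc-mad()-X",
  "fBodyAcc-meanFreq()-X",
  "angle(tBodyAccMean,gravity)"
)

# fixed = TRUE matches the literal text "mean()" / "std()", so that
# meanFreq() and the angle(...) features are not picked up by accident.
keep <- grepl("mean()", features, fixed = TRUE) |
        grepl("std()",  features, fixed = TRUE)

selected <- features[keep]
print(selected)
```

Matching on the literal parentheses is what excludes meanFreq() and the angle features, which mention "mean" but are not in the mean()/std() family.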
- Install the dplyr package if you have not already done so.
- Download run_analysis.R and the HAR data set zipfile into R's working directory.
- Execute the command source("run_analysis.R"). The script will print messages showing the steps taken.
- If all goes well, the tidy data set will be in tidy_data_2.txt in the UCI HAR Dataset subdirectory of the working directory.
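The steps above can be sketched as a small wrapper. The function name run_har_analysis is hypothetical and not part of run_analysis.R; it only chains the install check, the source() call, and a check that the output file exists.

```r
# Hypothetical convenience wrapper; not part of run_analysis.R.
run_har_analysis <- function(script = "run_analysis.R",
                             output = file.path("UCI HAR Dataset",
                                                "tidy_data_2.txt")) {
  # Fail early if the script is not in the working directory.
  if (!file.exists(script)) {
    stop("Cannot find ", script, " in ", getwd())
  }
  # Install dplyr only if it is missing.
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
  }
  # Run the analysis; the script prints its own progress messages.
  source(script)
  # TRUE if the tidy data set was written where the README says it will be.
  file.exists(output)
}
```

Calling run_har_analysis() from the directory that holds the script and the zipfile should return TRUE once the tidy data set has been written.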
After you have run the run_analysis.R script, you can read and view the tidy data set using the following code:
data <- read.table("UCI HAR Dataset/tidy_data_2.txt", header = TRUE, check.names = FALSE)
View(data)
Note that although check.names = FALSE keeps the original column names, which are not syntactically valid R names, the columns are still accessible from R. You just need to surround the column names with backticks, e.g. data$`avg-tBodyAcc-mean()-X`. The code book contains the rationale for the column names.
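The effect of check.names can be seen on a tiny stand-in file. The file contents below are made up for illustration; only the column name mirrors the real tidy data set.

```r
# Write a one-row stand-in for the tidy data set to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c('"subject" "activity" "avg-tBodyAcc-mean()-X"',
             '1 "WALKING" 0.3'), tmp)

# check.names = FALSE keeps the column names exactly as written.
raw <- read.table(tmp, header = TRUE, check.names = FALSE)
names(raw)[3]      # "avg-tBodyAcc-mean()-X"

# The default (check.names = TRUE) rewrites them into valid R names.
mangled <- read.table(tmp, header = TRUE)
names(mangled)[3]  # "avg.tBodyAcc.mean...X"

# Two equivalent ways to reach the unmodified column.
by_backtick <- raw$`avg-tBodyAcc-mean()-X`
by_string   <- raw[["avg-tBodyAcc-mean()-X"]]
```

Keeping the original names preserves the link to the feature names in the source data set, at the cost of needing backticks or [[ ]] when accessing a column.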
[1] Hadley Wickham (2014). "Tidy Data". Journal of Statistical Software, vol. 59, no. 10.
[2] David Hood (2015). "Tidy Data and the Assignment". https://class.coursera.org/getdata-030/forum/thread?thread_id=107 Accessed on 24 July 2015.
[3] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz (2013). "A Public Domain Dataset for Human Activity Recognition Using Smartphones". 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium.