# ukbtools <img src="man/figures/logo.png" align="right" alt="" width="120" />
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/ukbtools)](https://cran.r-project.org/package=ukbtools)
[![Travis-CI Build Status](https://travis-ci.org/kenhanscombe/ukbtools.svg?branch=master)](https://travis-ci.org/kenhanscombe/ukbtools)
After downloading and decrypting your UK Biobank (UKB) data with the supplied [UKB programs](http://biobank.ctsu.ox.ac.uk/crystal/docs/UsingUKBData.pdf), you have multiple files that need to be brought together to give you a dataset to explore. The data file has column names that are edited field-codes from the [UKB data showcase](http://www.ukbiobank.ac.uk/data-showcase/). ukbtools makes it easy to collapse the multiple UKB files into a single dataset for analysis, in the process giving meaningful names to the variables. The package also includes functionality to retrieve ICD diagnoses, explore a sample subset in the context of the UKB sample, and collect genetic metadata.
## Installation
```{r, eval = FALSE}
# Install from CRAN
install.packages("ukbtools")
# Install latest development version
devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)
```
## Prerequisite: Make a UKB fileset
Download<sup>§</sup> then decrypt your data and create a "UKB fileset" (.tab, .r, .html):
```{bash, eval = FALSE}
ukb_unpack ukbxxxx.enc key
ukb_conv ukbxxxx.enc_ukb r
ukb_conv ukbxxxx.enc_ukb docs
```
`ukb_unpack` decrypts your downloaded `ukbxxxx.enc` file, outputting a `ukbxxxx.enc_ukb` file. `ukb_conv` with the `r` flag converts the decrypted data to a tab-delimited file `ukbxxxx.tab` and an R script `ukbxxxx.r` that reads the tab file. The `docs` flag creates an HTML file `ukbxxxx.html` containing, among other tables, a field-code-to-description table.
<sup>§</sup> Full details of the data download and decrypt process are given in the [Using UK Biobank Data](http://biobank.ctsu.ox.ac.uk/crystal/docs/UsingUKBData.pdf) documentation.
## Make a UKB dataset
The function `ukb_df()` takes two arguments, the stem of your fileset and the path, and returns a dataframe with usable column names. This will take a few minutes. The rate-limiting step is reading and parsing the code in the UKB-generated .r file, not `ukb_df()` itself.
```{r, eval = FALSE}
library(ukbtools)
my_ukb_data <- ukb_df("ukbxxxx")
```
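Because reading the fileset takes a few minutes, you may prefer to do it once and reuse the result in later sessions. The following is a minimal sketch using base R's `saveRDS()` and `readRDS()`. The helper `load_cached()` and the cache file name are hypothetical and are not part of ukbtools.

```{r, eval = FALSE}
# Hypothetical helper (not part of ukbtools): return the cached object if the
# cache file exists; otherwise call loader(), save its result, and return it.
load_cached <- function(cache_file, loader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data <- loader()
  saveRDS(data, cache_file)
  data
}

# Usage (assumes the fileset "ukbxxxx" is in the current directory):
# my_ukb_data <- load_cached("my_ukb_data.rds",
#                            function() ukbtools::ukb_df("ukbxxxx"))
```

The first call pays the full cost of `ukb_df()`. Later calls only read the `.rds` file. Delete the cache file whenever you regenerate the fileset.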
You can also specify the path to your fileset if it is not in the current directory. For example, if your fileset is in a directory called data, pass the path to that directory:
```{r, eval = FALSE}
my_ukb_data <- ukb_df("ukbxxxx", path = "/full/path/to/my/data")
```
__Note:__ You can move the three files in your fileset after creating them with `ukb_conv`, but they should be kept together. `ukb_df()` automatically updates the read call in the R source file to point to the correct directory (the current directory by default, or a directory specified by `path`).
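Since the three files must stay together, it can be worth checking that the fileset is complete before calling `ukb_df()`. The sketch below uses only base R. The helper `missing_fileset_files()` is hypothetical (not part of ukbtools) and assumes the fileset uses the `.tab`, `.r` and `.html` extensions described above.

```{r, eval = FALSE}
# Hypothetical helper (not part of ukbtools): return the paths of any
# fileset files (.tab, .r, .html) that are not found in `path`.
missing_fileset_files <- function(fileset, path = ".") {
  files <- file.path(path, paste0(fileset, c(".tab", ".r", ".html")))
  files[!file.exists(files)]
}

missing <- missing_fileset_files("ukbxxxx", path = "/full/path/to/my/data")

if (length(missing) > 0) {
  # Report what is missing instead of attempting to read the data
  message("Incomplete fileset, missing: ", paste(missing, collapse = ", "))
} else {
  my_ukb_data <- ukbtools::ukb_df("ukbxxxx", path = "/full/path/to/my/data")
}
```

If any file is reported missing, re-run the corresponding `ukb_conv` step or move the file back alongside the others.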
## Other tools
All tools are described on the [ukbtools webpage](https://kenhanscombe.github.io/ukbtools/) and in the package vignette "Explore UK Biobank Data":
```{r, eval = FALSE}
vignette("explore-ukb-data", package = "ukbtools")
```
For a list of all functions:
```{r, eval = FALSE}
help(package = "ukbtools")
```