---
title: "WaveformInCSV"
author: "Rob Donald"
date: "`r format(Sys.time(), '%A %d %B %Y, %H:%M')`"
output:
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true
---
# Introduction
This analysis looks at extracting waveform data that is encoded in the
columns of a .csv file.

Each row of the .csv file contains the complete 11 point waveform along
with metadata that describes the conditions at the time the waveform was captured.

This is an example of a complex data structure contained within a seemingly
simple .csv file. This looks a lot trickier than it turns out to be :D.

If you want to get to the answer quickly, use the menu to select

+ Construct data for ggplot()
+ the 'gather() in two lines' subsection
```{r setup, include=FALSE}
#knitr::opts_chunk$set(echo = TRUE,fig.height=8,fig.width=12)
knitr::opts_chunk$set(echo = TRUE)
#options(width=1500)
```
## Libraries
```{r library_setup}
suppressMessages({suppressWarnings({
library(dplyr)
library(tidyr)
library(readr)
library(ggplot2)
library(gridExtra)
library(data.table)
library(RobsRUtils)
library(futile.logger)
})})
```
# Generate Data
```{r}
all.traces <- NULL
all.decay.profiles <- c(5,10,15,20)

for(decay.profile in all.decay.profiles)
{
    # One 11 point decaying trace, sampled every 10 ms from 0 to 100 ms.
    time.ms <- seq(0,100,by=10)
    response <- exp(-(time.ms/decay.profile))
    decay.setting <- rep(decay.profile,length(time.ms))

    # tibble() replaces the deprecated data_frame().
    waveform.df <- tibble(time.ms,response,decay.setting)

    if(is.null(all.traces))
    {
        all.traces <- waveform.df
    }
    else
    {
        all.traces <- bind_rows(all.traces,waveform.df)
    }
}
```
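The loop above can also be written without the NULL check, by building one trace per decay setting and binding them all in a single call. This is only a minimal sketch of that idea, not the method used in the rest of this analysis; `make_trace` and `all.traces.alt` are hypothetical names introduced here.

```r
library(dplyr)

# Hypothetical helper: one 11 point exponential decay trace for a given setting.
make_trace <- function(decay.profile, time.ms = seq(0, 100, by = 10))
{
    tibble(time.ms = time.ms,
           response = exp(-(time.ms / decay.profile)),
           decay.setting = decay.profile)   # length 1, recycled to 11 rows
}

# bind_rows() accepts a list of data frames, so no accumulator is needed.
all.traces.alt <- bind_rows(lapply(c(5, 10, 15, 20), make_trace))
dim(all.traces.alt)
```

The result should have the same three columns and 44 rows (4 settings x 11 time points) as the loop version.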
We now add in some batch and experiment day information. We also add columns to control
the alpha, line type and line size settings. Each of these settings is a reflection of the experiment day.
```{r}
all.traces$batch <- ifelse(all.traces$decay.setting > 10,1234,5678)
all.traces$exp.day <- ifelse(all.traces$decay.setting %% 10 == 0,'Day 1','Day 2')
all.traces$alpha.setting <- ifelse(all.traces$decay.setting %% 10 == 0,1,0.3)
all.traces$line.type <- ifelse(all.traces$decay.setting %% 10 == 0,'solid','dotdash')
all.traces$line.size <- ifelse(all.traces$decay.setting %% 10 == 0,2,0.5)
```
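The same derived columns can be produced in a single mutate() call, which avoids repeating the `%% 10 == 0` test. The sketch below works on a cut-down table with one row per decay setting; `demo.settings` and `is.day1` are hypothetical names that do not appear elsewhere in this document.

```r
library(dplyr)

# Cut-down stand-in for all.traces: one row per decay setting.
demo.settings <- tibble(decay.setting = c(5, 10, 15, 20))

demo.settings <- mutate(demo.settings,
    is.day1       = decay.setting %% 10 == 0,   # settings 10 and 20 ran on Day 1
    batch         = ifelse(decay.setting > 10, 1234, 5678),
    exp.day       = ifelse(is.day1, 'Day 1', 'Day 2'),
    alpha.setting = ifelse(is.day1, 1, 0.3),
    line.type     = ifelse(is.day1, 'solid', 'dotdash'),
    line.size     = ifelse(is.day1, 2, 0.5))

demo.settings
```

Columns created earlier in a mutate() call can be used by later arguments in the same call, which is what lets `is.day1` be reused here.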
Let's do an initial plot.
```{r}
p <- ggplot(data=all.traces,aes(x=time.ms,y=response
                                ,colour = as.factor(decay.setting)
                                ,alpha=alpha.setting
                                ,linetype=line.type))
p <- p + geom_point(size=3)
p <- p + geom_line(aes(size=line.size))
p <- p + labs(title='Experiment Response'
              ,x='Time (ms)', y='Response proportion'
              ,colour='Decay Setting')
# 'none' replaces the deprecated FALSE for switching off a legend.
p <- p + guides(alpha='none',linetype='none',size='none')
p <- p + scale_alpha_continuous(range = c(0.2, 1))
p <- p + scale_linetype_identity()
p <- p + scale_size_identity()
print(p)
```
We now do a plot where we 'facet' (i.e. draw a separate panel) based on batch.
```{r}
p <- ggplot(data=all.traces,aes(x=time.ms,y=response,colour = as.factor(decay.setting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experiment Response [Panel: batch]'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
p <- p + facet_grid(. ~ batch)
print(p)
```
We can subdivide further by faceting on both batch and experiment day.
```{r}
p <- ggplot(data=all.traces,aes(x=time.ms,y=response,colour = as.factor(decay.setting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experiment Response [Panel: batch, experiment day]'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
p <- p + facet_grid(exp.day ~ batch)
print(p)
```
Now let's save that out in a .csv format where each row is the trace from an
experimental run along with the metadata (in this case the decay setting,
batch and experiment day) from that run.
We have four experimental runs:

+ Day 1
    + Batch 1234
        + Decay setting 20
+ Day 1
    + Batch 5678
        + Decay setting 10
+ Day 2
    + Batch 1234
        + Decay setting 15
+ Day 2
    + Batch 5678
        + Decay setting 5

So this means we will have four rows in our .csv file.
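One way to confirm the list of runs from the data itself is to ask for the distinct combinations of the metadata columns. The sketch below uses a cut-down stand-in for the long data (`demo.traces` and `runs` are hypothetical names), keeping just two time points per run.

```r
library(dplyr)

# Cut-down stand-in for all.traces: two time points per run are enough here.
demo.traces <- tibble(
    time.ms       = rep(c(0, 10), times = 4),
    decay.setting = rep(c(5, 10, 15, 20), each = 2))
demo.traces$batch   <- ifelse(demo.traces$decay.setting > 10, 1234, 5678)
demo.traces$exp.day <- ifelse(demo.traces$decay.setting %% 10 == 0, 'Day 1', 'Day 2')

# One row per unique combination of the metadata columns.
runs <- distinct(demo.traces, exp.day, batch, decay.setting)
runs
```

The number of rows in `runs` is the number of rows we should expect in the .csv file.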
Let's pull out each run's results into an R object.
```{r}
d1.b1234.ds20 <- filter(all.traces, exp.day == 'Day 1', batch == 1234, decay.setting == 20)
d1.b5678.ds10 <- filter(all.traces, exp.day == 'Day 1', batch == 5678, decay.setting == 10)
d2.b1234.ds15 <- filter(all.traces, exp.day == 'Day 2', batch == 1234, decay.setting == 15)
d2.b5678.ds05 <- filter(all.traces, exp.day == 'Day 2', batch == 5678, decay.setting == 5)
```
For each of these objects we have 11 rows of 8 variables. We need to flatten this data
into a single row. In actual fact we only need 3 bits of metadata (decay setting,
batch and experiment day) and the 11 readings from the waveform 0 to 100 ms. We are going
to construct a 14 column row which we can write out in .csv format.

First we collect the waveform data into a single numeric vector.
```{r}
exp.df <- d1.b1234.ds20

wf <- NULL
num.time.pts <- nrow(exp.df)
for (count in 1:num.time.pts)
{
    wf <- c(wf,exp.df$response[count])
}
```
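The loop makes the flattening step explicit, but the response column is already a numeric vector, so it can be taken directly. A small sketch to show that the two are the same, using a stand-in data frame (`exp.demo`, `wf.loop` and `wf.direct` are hypothetical names):

```r
# Stand-in for one run: an 11 point trace with decay setting 20.
time.ms <- seq(0, 100, by = 10)
exp.demo <- data.frame(time.ms = time.ms, response = exp(-(time.ms / 20)))

# Element-by-element version, as in the loop above.
wf.loop <- NULL
for (count in seq_len(nrow(exp.demo)))
{
    wf.loop <- c(wf.loop, exp.demo$response[count])
}

# Direct version: the column already is the numeric vector we want.
wf.direct <- exp.demo$response

identical(wf.loop, wf.direct)
```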
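Now we make another vector with the metadata. Every row of exp.df carries the same
metadata, so we only need to grab the first element of each required column. Note that
c() coerces this mix of text and numbers to a character vector.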
```{r}
meta.vec <- c(exp.df$exp.day[1],exp.df$batch[1],exp.df$decay.setting[1])
```
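Because c() builds an atomic vector, mixing 'Day 1' with numbers turns everything into character. The short sketch below shows the coercion, how type.convert() recovers a type for each element, and how a list sidesteps the issue. The names `meta.demo`, `restored` and `meta.list` are hypothetical.

```r
# c() returns an atomic vector, so mixing text and numbers gives all character.
meta.demo <- c('Day 1', 1234, 20)
class(meta.demo)

# type.convert() recovers a sensible type for each element in turn.
restored <- lapply(meta.demo, type.convert, as.is = TRUE)
sapply(restored, class)

# A list keeps each element's own type, so no round trip is needed.
meta.list <- list(ExpDay = 'Day 1', Batch = 1234, DecaySetting = 20)
sapply(meta.list, class)
```

One consequence of the character round trip is that numeric values are restored from their printed form, so it is worth checking that the precision is adequate for your data.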
Then we can stick these two vectors together and give them names.
```{r}
full.row <- c(meta.vec,wf)
names(full.row) <- c('ExpDay','Batch','DecaySetting'
,'t=0','t=10','t=20','t=30','t=40','t=50'
,'t=60','t=70','t=80','t=90','t=100')
full.row
```
Let's put the above techniques into a function.
```{r}
build_WF_Row <- function(exp.df)
{
    wf <- NULL
    num.time.pts <- nrow(exp.df)
    for (count in 1:num.time.pts)
    {
        wf <- c(wf,exp.df$response[count])
    }

    meta.vec <- c(exp.df$exp.day[1],exp.df$batch[1],exp.df$decay.setting[1])
    full.row <- c(meta.vec,wf)

    # Having got a full row vector we want it as a data frame with
    # a column for each element. The as.is stops the 'Day x' becoming a factor.
    full.row.df <- data.frame(lapply(full.row, type.convert,as.is=TRUE), stringsAsFactors=FALSE)

    # We then set the names including the slightly odd t= pattern.
    names(full.row.df) <- c('ExpDay','Batch','DecaySetting'
                            ,'t=0','t=10','t=20','t=30','t=40','t=50'
                            ,'t=60','t=70','t=80','t=90','t=100')

    return(full.row.df)
}
```
Now we'll build up a 4 row object with the data from the experiment. If this were real data from a reasonable size experiment we would do this in a for() loop or using lapply but for this example we'll do this manually.
```{r}
full.exp.list <- list()
full.exp.list[[1]] <- build_WF_Row(d1.b1234.ds20)
full.exp.list[[2]] <- build_WF_Row(d1.b5678.ds10)
full.exp.list[[3]] <- build_WF_Row(d2.b1234.ds15)
full.exp.list[[4]] <- build_WF_Row(d2.b5678.ds05)
```
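For a larger experiment, the manual list building above could be replaced by split() plus lapply(). The following is a sketch only: it rebuilds a small copy of the long data so that it runs on its own, and it uses a hypothetical compact row builder (`build_row_sketch`) rather than build_WF_Row(). It also assumes each decay setting identifies exactly one run, which is true for this example but may not be for real data.

```r
library(dplyr)

# Rebuild a small version of the long data: four runs of 11 points each.
time.ms <- seq(0, 100, by = 10)
demo.traces <- bind_rows(lapply(c(5, 10, 15, 20), function(ds)
{
    tibble(time.ms = time.ms, response = exp(-(time.ms / ds)), decay.setting = ds)
}))
demo.traces$batch   <- ifelse(demo.traces$decay.setting > 10, 1234, 5678)
demo.traces$exp.day <- ifelse(demo.traces$decay.setting %% 10 == 0, 'Day 1', 'Day 2')

# Hypothetical compact row builder: metadata first, then one column per time point.
build_row_sketch <- function(exp.df)
{
    wf <- setNames(as.list(exp.df$response), paste0('t=', exp.df$time.ms))
    meta <- list(ExpDay = exp.df$exp.day[1],
                 Batch = exp.df$batch[1],
                 DecaySetting = exp.df$decay.setting[1])
    as_tibble(c(meta, wf))
}

# split() gives one data frame per run; lapply() builds a row from each.
run.list <- split(demo.traces, demo.traces$decay.setting)
full.exp.sketch <- bind_rows(lapply(run.list, build_row_sketch))
dim(full.exp.sketch)
```

Building each row from a list keeps the numeric columns numeric throughout, so no type.convert() step is needed in this version.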
We have a list of data_frames but what we want is a single data_frame.
Use rbindlist from the data.table package to achieve this. Note that rbindlist
returns a data.table, which can still be used as a data frame.
```{r}
full.exp.df <- rbindlist(full.exp.list)
```
Let's check that looks as we expect.
```{r}
full.exp.df[,c('ExpDay','Batch','DecaySetting','t=0','t=10','t=90','t=100')]
```
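If you have tidyr 1.0.0 or later, pivot_wider() can produce the same wide layout from the long data in one call, without building the rows individually. This is a sketch of that alternative on a small stand-in table, not the approach used above; `demo.long` and `demo.wide` are hypothetical names, and the column names follow the long data rather than the renamed ExpDay, Batch and DecaySetting.

```r
library(dplyr)
library(tidyr)

# Small stand-in for all.traces: two runs, three time points each.
demo.long <- tibble(
    exp.day       = rep(c('Day 1', 'Day 2'), each = 3),
    batch         = rep(c(1234, 5678), each = 3),
    decay.setting = rep(c(20, 5), each = 3),
    time.ms       = rep(c(0, 10, 20), times = 2))
demo.long$response <- exp(-(demo.long$time.ms / demo.long$decay.setting))

# One output row per unique metadata combination, one column per time point.
demo.wide <- pivot_wider(demo.long,
                         names_from = time.ms,
                         values_from = response,
                         names_prefix = 't=')
names(demo.wide)
```

Any column not named in names_from or values_from is treated as an identifier, so plotting helper columns such as alpha.setting would need to be dropped first to avoid them appearing in the output.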
# Export the data
We can now write this out to a .csv file so that you can prove to yourself
that it is what you would expect.
I'll assume that your current directory is the project dir.
```{r}
exp.file.name <- 'ExperimentFullData.csv'
write.csv(full.exp.df,file = exp.file.name, row.names = FALSE )
```
# Import the data
What we want to do now is show how you can read in a file in this format
and construct an object that ggplot can use to produce graphs like those above.
```{r}
raw.exp.data <- read_csv(file = exp.file.name)
```
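The choice of reader matters for the `t=` column names. As far as I know read_csv() keeps them as written by default, whereas base read.csv() rewrites them into syntactic R names unless told otherwise. The sketch below demonstrates the base R behaviour on a tiny temporary file; `demo.wide`, `demo.file`, `mangled` and `kept` are hypothetical names.

```r
# Write a tiny wide file with 't=' style headers to a temporary location.
demo.wide <- data.frame(ExpDay = 'Day 1', `t=0` = 1, `t=10` = 0.6,
                        check.names = FALSE)
demo.file <- tempfile(fileext = '.csv')
write.csv(demo.wide, file = demo.file, row.names = FALSE)

# Default base import rewrites the headers into syntactic R names.
mangled <- read.csv(demo.file)
names(mangled)

# check.names = FALSE keeps the headers exactly as written.
kept <- read.csv(demo.file, check.names = FALSE)
names(kept)
```

This matters later on because the reshaping steps pick out the waveform columns by looking for `t=` in the column names.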
# Construct data for ggplot()
We now show two ways to build up an object that we can use with ggplot().
## The hard way
What we want is a function that can take a row of metadata *and* waveform data
and return a data_frame that we can use in a bind_rows construct to form an
object for ggplot().
```{r}
buildWaveformAndMetaData <- function(single.row)
{
    # Our first task is to split out the waveform from the metadata.
    wf <- select(single.row,contains('t='))
    col.names <- names(wf)

    time.ms <- rep(NaN,length(col.names))
    response <- rep(NaN,length(col.names))
    waveform.df <- tibble(time.ms,response)

    for(count in seq_along(col.names))
    {
        # Strip the 't=' prefix to recover the time in ms.
        waveform.df$time.ms[count] <- as.numeric(gsub('t=','',col.names[count]))
        # [[ ]] extracts the single value rather than a 1x1 data frame.
        waveform.df$response[count] <- wf[[count]]
    }

    # Copy each metadata column onto every row of the waveform.
    meta.data <- select(single.row,-contains('t='))
    col.names <- names(meta.data)
    for(column.name in col.names)
    {
        waveform.df[[column.name]] <- rep(meta.data[[column.name]],nrow(waveform.df))
    }

    return(waveform.df)
}
```
Let's test this function on the first row.
```{r}
single.row <- raw.exp.data[1,]
one.row <- buildWaveformAndMetaData(single.row)
one.row
```
Now do it for the whole .csv file.
```{r}
all.experimental.data <- NULL

for(count in 1:nrow(raw.exp.data))
{
    single.row <- raw.exp.data[count,]
    one.row.df <- buildWaveformAndMetaData(single.row)

    if(is.null(all.experimental.data))
    {
        all.experimental.data <- one.row.df
    }
    else
    {
        all.experimental.data <- bind_rows(all.experimental.data,one.row.df)
    }
}
```
```{r}
p <- ggplot(data=all.experimental.data,aes(x=time.ms,y=response,colour = as.factor(DecaySetting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experimental Response'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
print(p)
```
```{r}
p <- ggplot(data=all.experimental.data,aes(x=time.ms,y=response,colour = as.factor(DecaySetting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experimental Response [Panel: batch, experiment day]'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
p <- p + facet_grid(ExpDay ~ Batch)
print(p)
```
## Using tidyr::gather()
An alternative approach is to use the gather() function, which comes from tidyr rather than dplyr.
Here we show using gather() one row at a time.
```{r}
gather.all.experimental.data <- NULL

for(count in 1:nrow(raw.exp.data))
{
    single.row <- raw.exp.data[count,]
    one.row.df <- gather(single.row,value='response',key=time.pt,contains('t='))

    if(is.null(gather.all.experimental.data))
    {
        gather.all.experimental.data <- one.row.df
    }
    else
    {
        gather.all.experimental.data <- bind_rows(gather.all.experimental.data,one.row.df)
    }
}

gather.all.experimental.data$time.ms <- as.numeric(gsub('t=','',gather.all.experimental.data$time.pt))
```
## Using tidyr::gather() in two lines
But in actual fact you can use gather() to process the *whole* data frame in one go. This amazingly reduces the complete operation down to two lines!!

In this first example we use the contains() function to scoop up all the timepoints.
```{r}
long.data <- gather(raw.exp.data,value='response',key=time.pt,contains('t='))
long.data$time.ms <- as.numeric(gsub('t=','',long.data$time.pt))
```
Let's see the plot.
```{r}
p <- ggplot(data=long.data,aes(x=time.ms,y=response,colour = as.factor(DecaySetting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experimental Response from Long Data [Panel: batch, experiment day]'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
p <- p + facet_grid(ExpDay ~ Batch)
print(p)
```
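In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which can also strip the `t=` prefix as part of the reshape. The sketch below shows what I would expect the equivalent call to look like, on a tiny stand-in for the imported data; `demo.wide` and `demo.long` are hypothetical names.

```r
library(tidyr)

# Tiny stand-in for raw.exp.data: two runs, two time points.
demo.wide <- data.frame(ExpDay = c('Day 1', 'Day 2'),
                        Batch = c(1234, 5678),
                        DecaySetting = c(20, 5),
                        `t=0` = c(1, 1),
                        `t=10` = c(exp(-10 / 20), exp(-10 / 5)),
                        check.names = FALSE)

# names_prefix strips 't=' while reshaping, so no separate gsub() is needed.
demo.long <- pivot_longer(demo.wide,
                          cols = starts_with('t='),
                          names_to = 'time.ms',
                          names_prefix = 't=',
                          values_to = 'response')

# The stripped names are still text, so convert them for plotting.
demo.long$time.ms <- as.numeric(demo.long$time.ms)
demo.long
```

The rows come out grouped by run rather than by time point, which makes no difference to ggplot() but is worth knowing if you compare the output with gather().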
In this next example we specify the start and end columns, here `t=0` to `t=50`, which is useful if we are only interested in a particular section of the waveform.
```{r}
long.data.v2 <- gather(raw.exp.data,value='response',key=time.pt,`t=0`:`t=50`)
long.data.v2$time.ms <- as.numeric(gsub('t=','',long.data.v2$time.pt))
```
Let's see the plot.
```{r}
p <- ggplot(data=long.data.v2,aes(x=time.ms,y=response,colour = as.factor(DecaySetting)))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(title='Experimental Response from Long Data V2 [Panel: batch, experiment day]'
,x='Time (ms)', y='Response proportion'
,colour='Decay Setting')
p <- p + facet_grid(ExpDay ~ Batch)
print(p)
```
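Selecting a column range works on the wide data. If the data is already in long form, a similar result can be had by filtering on the numeric time column instead. A minimal sketch, using a stand-in for the long data (`demo.long` and `first.half` are hypothetical names):

```r
library(dplyr)

# Stand-in for long.data: one run with the full 0 to 100 ms trace.
time.ms <- seq(0, 100, by = 10)
demo.long <- tibble(DecaySetting = 20, time.ms = time.ms,
                    response = exp(-(time.ms / 20)))

# Keep only the first half of the waveform.
first.half <- filter(demo.long, time.ms <= 50)
range(first.half$time.ms)
```

Filtering after the reshape has the advantage that the cut-off can be any time value, not just one that happens to be a column name.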
Using the long.data object, let's add in the alpha, line type, line size and point size columns.
```{r}
long.data$alpha.setting <- ifelse(long.data$DecaySetting %% 10 == 0,1,0.6)
long.data$line.type <- ifelse(long.data$DecaySetting %% 10 == 0,'solid','dotdash')
long.data$line.size <- ifelse(long.data$DecaySetting %% 10 == 0,1,0.5)
long.data$point.size <- ifelse(long.data$DecaySetting %% 10 == 0,3,1)
```
Now do a facet plot as above but also show the use of subtitle and caption. Note the caption
also has a theme() line to control size and font face.
```{r}
p <- ggplot(data=long.data,aes(x=time.ms,y=response
                               ,colour = as.factor(DecaySetting)
                               ,alpha=alpha.setting
                               ,linetype=line.type))
p <- p + geom_point(size=3)
p <- p + geom_line(aes(size=line.size))
p <- p + labs(title='Experimental Response from Long Data [Panel: batch, experiment day]'
              ,subtitle='Using alpha, linetype and size'
              ,caption='Data Object: long.data'
              ,x='Time (ms)', y='Response proportion'
              ,colour='Decay Setting')
p <- p + theme(plot.caption = element_text(size = 6,face = 'italic'))
# 'none' replaces the deprecated FALSE for switching off a legend.
p <- p + guides(alpha='none',linetype='none',size='none')
p <- p + scale_alpha_continuous(range = c(0.2, 1))
p <- p + scale_linetype_identity()
p <- p + scale_size_identity()
p <- p + facet_grid(ExpDay ~ Batch)
print(p)
```
A slightly different version of this is to put the alpha setting only in the
geom_line(). We also use the point.size column to control the size of the points for the two different conditions. This allows the active points to stand out.
```{r}
p <- ggplot(data=long.data,aes(x=time.ms,y=response
                               ,colour = as.factor(DecaySetting)
                               ,linetype=line.type))
p <- p + geom_point(aes(size=point.size))
p <- p + geom_line(aes(size=line.size,alpha=alpha.setting))
p <- p + labs(title='Experimental Response from Long Data [Panel: batch, experiment day]'
              ,subtitle='Using alpha, linetype and size'
              ,caption='Data Object: long.data'
              ,x='Time (ms)', y='Response proportion'
              ,colour='Decay Setting')
p <- p + theme(plot.caption = element_text(size = 6,face = 'italic'))
# 'none' replaces the deprecated FALSE for switching off a legend.
p <- p + guides(alpha='none',linetype='none',size='none')
p <- p + scale_alpha_continuous(range = c(0.5, 1))
p <- p + scale_linetype_identity()
p <- p + scale_size_identity()
p <- p + facet_grid(ExpDay ~ Batch)
print(p)
```