forked from yihui/knitr-talks
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathknitr-RStudio-2012.Rmd
284 lines (188 loc) · 7.71 KB
/
knitr-RStudio-2012.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
% New Tools for Reproducible Research with R
% JJ Allaire and Yihui Xie
% 2012/06/14
```{r setup, include=FALSE}
options(width = 50, reindent.spaces = 2)
```
# Reproducible Research with R
- Single document containing analysis, code, and results
- Sweave (included in R)
- Created by Friedrich Leisch in 2002
- Enables embedding R code into LaTeX documents
- New tools
- Add powerful new capabilities to Sweave
- Make authoring easier and more productive
- Support publishing to the web
# Why Reproducible Research?
- Automatically regenerate documents when code, data, or assumptions change.
- Eliminate transposition errors that occur when copying results into documents.
- Preserve contextual narrative about why analysis was performed in a certain fashion.
- Documentation for the analytic and computational processes from which conclusions are drawn.
# Prime Directive: Trustworthy Software
*Those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it.*
*This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the __Prime Directive__.*
<br/>
**Chambers, Software for Data Analysis: Programming with R**
# New Tools
- RStudio
- Productivity tools for Sweave / LaTeX
- knitr
- Many enhancements to Sweave
- Extensible design for new capabilities and formats
- R Markdown
- Reproducible research for the web
- Integerated support in RStudio and knitr
# Productivity Tools for Sweave / LaTeX
- Multi-mode editor (R and TeX aware)
- Line by line R code execution
- TeX formatting and spell checking
- One-click Compile PDF
- Two-way sync between source and PDF (SyncTeX)
- TeX error log parsing and navigation
- Chunk navigation, execution, and code-completion
# R Markdown
- What is Markdown?
- Easy to write plain text format for web authoring
- Allows arbitrary HTML for complex formatting
- What is R Markdown?
- Standard markdown parser (Sundown)
- Execution of R code chunks
- MathJax extensions for LaTeX and MathML equations
- [http://www.rstudio.org/docs/r_markdown](http://www.rstudio.org/docs/r_markdown)
# The markdown Package
This code produces an identical result to Knit HTML in RStudio (with no run-time dependency on RStudio):
```{r knit-to-html, tidy=FALSE, eval=FALSE}
knit("foo.Rmd")
markdownToHTML("foo.md")
browseURL("foo.html")
```
Enables use of R Markdown with any editor or IDE
# Distributing R Markdown Documents
- Standalone HTML files
- Embedded images (Base64 encoded)
- Send as email attachment
- Easy deployment to web server or Dropbox
- Use the `markdown` package for custom publishing
- Many blogs and wikis accept markdown input
- Scheduled batch generation of reports on a server
- New services for publishing data analysis
# The knitr Package
- Next-generation re-implementation of Sweave
- Adds many features to Sweave including caching, syntax highlighting, code externalization, and new graphics capabilities
- Very extensible design
- Supports publishing to the web (R Markdown and R HTML)
# Motivation (as a student and TA)
I do homework, I grade homework, and I saw this:
![](http://i.imgur.com/ce1BT.png)
# Principle 1: Beautiful output by default
- code highlighting (package **highlight**)
- carefully designed default themes
- hundreds of themes
- reformat code for lazy users (package **formatR**)
- plain text?
# Code highlighting
![](http://i.imgur.com/Tk0OF.png)
# Code reformatting
```{r reformat-no, tidy=FALSE, eval=FALSE}
## option tidy=FALSE
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
```
Same code, reformatted:
```{r reformat-yes, tidy=TRUE, eval=FALSE}
## option tidy=TRUE
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
```
# Principle 2: What you imagined is what you get
- running **knitr** gives you whatever you see in a normal R session
- `qplot(carat, price, data = diamonds)` just works; no need `print()` or `fig=TRUE`
- if your code produces two plots, you get two in the output
- what if you have 50 plots? (demo)
# Principle 3: Focus on R programming
![](http://i.imgur.com/jrwbX.jpg)
# Principle 4: Be sustainable
Compare
```{r prompt-yes, prompt=TRUE, comment=NA}
(x = 0)
x = x + 1
```
to (default):
```{r prompt-no, prompt=FALSE, comment='##'}
(x = 0)
x = x + 1
```
You do not appreciate this unless you have been a homework grader.
# Principle 5: I write the core, you can define the decoration
- I parse/evaluate the code and give you the raw results, then you have control of everything
- for example
- source code `1 + 1`
- output `[1] 2`
- _`\begin{verbatim}`_ `[1] 2` _`\end{verbatim}`_
- or _`<div class="output">`_ `[1] 2` _`</div>`_
- or ```` ```[1] 2``` ````
# Output hooks
You can control how the source code, normal output, warnings, messages, errors and plots are written in the output document.
```{r output-hook, eval=FALSE, tidy=FALSE}
knit_hooks$set(source = function(x, options) {
paste("\\begin{DearSource}", x,
"\\end{DearSource}", sep = "")
})
```
LaTeX, HTML, Markdown and reStructuredText have been supported, and it is straightforward to support other formats.
# Principle 6: All your base are belong to us
- the `tikz()` device supported by **pgfSweave**
- cache in **cacheSweave**
- automatic detection of chunk dependencies in **weaver**
- animations in my **animation** package
- **knitr** ≈ Sweave + cacheSweave + pgfSweave + weaver + `animation::saveLatex` + `R2HTML::RweaveHTML` + `highlight::HighlightWeaveLatex` + 0.2 \* brew + 0.1 \* SweaveListingUtils + more
- without copying 700 lines of code
# Principle 7: Do not scare beginners
- you get reasonable output even if you do not set any options; just `knit()` it (convention over configuration)
- if you only have an R script, you can still do a report: `stitch()` or `spin()`
- if clicking a button works, click it
- RStudio
- LyX (demo)
# Principle 8: Open source is open
In theory you can use any language with **knitr**, e.g.
<<test-python, engine='python'>>=
x = 'hello, python world!'
print x
print x.split(' ')
@
Contributions needed!
# Principle 9: Literate programming is programming
- we can go beyond the model of `Code + Documentation`
- a LP document can be programmable
# Programmable reports: Example 1
<<setup>>=
library(gridExtra)
g = tableGrob(head(iris, 4))
@
<<draw-table, fig.width=convertWidth(grobWidth(g), "in", value=TRUE), fig.height=convertHeight(grobHeight(g), "in", value=TRUE), dev='png', dpi=150>>=
grid.draw(g)
@
# Programmable reports: Example 2
Chunk hooks are functions associated with code chunks.
```{r tweet-hook, eval=FALSE, tidy=FALSE}
knit_hooks$set(lord = function(before, options, envir) {
library(twitteR)
# Authentication with OAuth here, then
if (!before) {
msg = paste('I have finished the chunk',
options$label, ', my Lord!')
tweet(msg)
}
})
# enable the chunk hook
opts_chunk$set(lord = TRUE)
```
# Conclusions
- make reproducible research enjoyable first (new tools to hide gory details)
- all people do homework assignments before they do research, so make your homework reproducible first
- build extensible tools to satisfy both beginners and advanced users
- **knitr** allows any input languages and any output formats
- reproducible blogging, exams/solutions, books, web tutorials, ...
- from reproducible reports to reproducible research
# IN CODE WE TRUST
Questions and comments?
- RStudio: <http://www.rstudio.org>
- knitr: <http://yihui.name/knitr/>