<!DOCTYPE html>
<html>
<head>
<title>Statistical Tests</title>
<meta charset="utf-8">
<meta name="Description" content="R Language Tutorials for Advanced Statistics">
<meta name="Keywords" content="R, Tutorial, Machine learning, Statistics, Data Mining, Analytics, Data science, Linear Regression, Logistic Regression, Time series, Forecasting">
<meta name="Distribution" content="Global">
<meta name="Author" content="Selva Prabhakaran">
<meta name="Robots" content="index, follow">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="/screenshots/iconb-64.png" type="image/x-icon" />
<link href="www/bootstrap.min.css" rel="stylesheet">
<link href="www/highlight.css" rel="stylesheet">
<link href='http://fonts.googleapis.com/css?family=Inconsolata:400,700'
rel='stylesheet' type='text/css'>
<!-- Color Script -->
<style type="text/css">
a {
color: #3675C5;
color: rgb(25, 145, 248);
color: #4582ec;
color: #3F73D8;
}
li {
line-height: 1.65;
}
/* reduce spacing around math formula*/
.MathJax_Display {
margin: 0em 0em;
}
</style>
<!-- Add Google search -->
<script language="Javascript" type="text/javascript">
function my_search_google()
{
var query = document.getElementById("my-google-search").value;
window.open("http://google.com/search?q=" + query
+ "%20site:" + "http://r-statistics.co");
}
</script>
</head>
<body>
<div class="container">
<div class="masthead">
<!--
<ul class="nav nav-pills pull-right">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">
Table of contents<b class="caret"></b>
</a>
<ul class="dropdown-menu pull-right" role="menu">
<li class="dropdown-header"></li>
<li class="dropdown-header">Tutorial</li>
<li><a href="R-Tutorial.html">R Tutorial</a></li>
<li class="dropdown-header">ggplot2</li>
<li><a href="ggplot2-Tutorial-With-R.html">ggplot2 Short Tutorial</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">ggplot2 Tutorial 1 - Intro</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">ggplot2 Tutorial 2 - Theme</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">ggplot2 Tutorial 3 - Masterlist</a></li>
<li><a href="ggplot2-cheatsheet.html">ggplot2 Quickref</a></li>
<li class="dropdown-header">Foundations</li>
<li><a href="Linear-Regression.html">Linear Regression</a></li>
<li><a href="Statistical-Tests-in-R.html">Statistical Tests</a></li>
<li><a href="Missing-Value-Treatment-With-R.html">Missing Value Treatment</a></li>
<li><a href="Outlier-Treatment-With-R.html">Outlier Analysis</a></li>
<li><a href="Variable-Selection-and-Importance-With-R.html">Feature Selection</a></li>
<li><a href="Model-Selection-in-R.html">Model Selection</a></li>
<li><a href="Logistic-Regression-With-R.html">Logistic Regression</a></li>
<li><a href="Environments.html">Advanced Linear Regression</a></li>
<li class="dropdown-header">Advanced Regression Models</li>
<li><a href="adv-regression-models.html">Advanced Regression Models</a></li>
<li class="dropdown-header">Time Series</li>
<li><a href="Time-Series-Analysis-With-R.html">Time Series Analysis</a></li>
<li><a href="Time-Series-Forecasting-With-R.html">Time Series Forecasting </a></li>
<li><a href="Time-Series-Forecasting-With-R-part2.html">More Time Series Forecasting</a></li>
<li class="dropdown-header">High Performance Computing</li>
<li><a href="Parallel-Computing-With-R.html">Parallel computing</a></li>
<li><a href="Strategies-To-Improve-And-Speedup-R-Code.html">Strategies to Speedup R code</a></li>
<li class="dropdown-header">Useful Techniques</li>
<li><a href="Association-Mining-With-R.html">Association Mining</a></li>
<li><a href="Multi-Dimensional-Scaling-With-R.html">Multi Dimensional Scaling</a></li>
<li><a href="Profiling.html">Optimization</a></li>
<li><a href="Information-Value-With-R.html">InformationValue package</a></li>
</ul>
</li>
</ul>
-->
<ul class="nav nav-pills pull-right">
<div class="input-group">
<form onsubmit="my_search_google()">
<input type="text" class="form-control" id="my-google-search" placeholder="Search..">
</form>
</div><!-- /input-group -->
</ul><!-- /.col-lg-6 -->
<h3 class="muted"><a href="/">r-statistics.co</a><small> by Selva Prabhakaran</small></h3>
<hr>
</div>
<div class="row">
<div class="col-xs-12 col-sm-3" id="nav">
<div class="well">
<li>
<ul class="list-unstyled">
<li class="dropdown-header"></li>
<li class="dropdown-header">Tutorial</li>
<li><a href="R-Tutorial.html">R Tutorial</a></li>
<li class="dropdown-header">ggplot2</li>
<li><a href="ggplot2-Tutorial-With-R.html">ggplot2 Short Tutorial</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">ggplot2 Tutorial 1 - Intro</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">ggplot2 Tutorial 2 - Theme</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">ggplot2 Tutorial 3 - Masterlist</a></li>
<li><a href="ggplot2-cheatsheet.html">ggplot2 Quickref</a></li>
<li class="dropdown-header">Foundations</li>
<li><a href="Linear-Regression.html">Linear Regression</a></li>
<li><a href="Statistical-Tests-in-R.html">Statistical Tests</a></li>
<li><a href="Missing-Value-Treatment-With-R.html">Missing Value Treatment</a></li>
<li><a href="Outlier-Treatment-With-R.html">Outlier Analysis</a></li>
<li><a href="Variable-Selection-and-Importance-With-R.html">Feature Selection</a></li>
<li><a href="Model-Selection-in-R.html">Model Selection</a></li>
<li><a href="Logistic-Regression-With-R.html">Logistic Regression</a></li>
<li><a href="Environments.html">Advanced Linear Regression</a></li>
<li class="dropdown-header">Advanced Regression Models</li>
<li><a href="adv-regression-models.html">Advanced Regression Models</a></li>
<li class="dropdown-header">Time Series</li>
<li><a href="Time-Series-Analysis-With-R.html">Time Series Analysis</a></li>
<li><a href="Time-Series-Forecasting-With-R.html">Time Series Forecasting </a></li>
<li><a href="Time-Series-Forecasting-With-R-part2.html">More Time Series Forecasting</a></li>
<li class="dropdown-header">High Performance Computing</li>
<li><a href="Parallel-Computing-With-R.html">Parallel computing</a></li>
<li><a href="Strategies-To-Improve-And-Speedup-R-Code.html">Strategies to Speedup R code</a></li>
<li class="dropdown-header">Useful Techniques</li>
<li><a href="Association-Mining-With-R.html">Association Mining</a></li>
<li><a href="Multi-Dimensional-Scaling-With-R.html">Multi Dimensional Scaling</a></li>
<li><a href="Profiling.html">Optimization</a></li>
<li><a href="Information-Value-With-R.html">InformationValue package</a></li>
</ul>
</li>
</div>
<div class="well">
<p>Stay up-to-date. <a href="https://docs.google.com/forms/d/1xkMYkLNFU9U39Dd8S_2JC0p8B5t6_Yq6zUQjanQQJpY/viewform">Subscribe!</a></p>
<p><a href="https://docs.google.com/forms/d/13GrkCFcNa-TOIllQghsz2SIEbc-YqY9eJX02B19l5Ow/viewform">Chat!</a></p>
</div>
<h4>Contents</h4>
<ul class="list-unstyled" id="toc"></ul>
<!--
<hr>
<p><a href="/contribute.html">How to contribute</a></p>
<p><a class="btn btn-primary" href="">Edit this page</a></p>
-->
</div>
<div id="content" class="col-xs-12 col-sm-8 pull-right">
<h1>Statistical Tests</h1>
<blockquote>
<p>This chapter explains the purpose of some of the most commonly used statistical tests and shows how to implement them in R.</p>
</blockquote>
<h2>1. One Sample t-Test</h2>
<h4>Why is it used?</h4>
<p>It is a parametric test used to test if the mean of a sample from a normal distribution could reasonably be a specific value.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">100</span>)
x <-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">50</span>, <span class="dt">mean =</span> <span class="dv">10</span>, <span class="dt">sd =</span> <span class="fl">0.5</span>)
<span class="kw">t.test</span>(x, <span class="dt">mu=</span><span class="dv">10</span>) <span class="co"># test whether the mean of x could be 10</span>
<span class="co">#=> One Sample t-test</span>
<span class="co">#=> </span>
<span class="co">#=> data: x</span>
<span class="co">#=> t = 0.70372, df = 49, p-value = 0.4849</span>
<span class="co">#=> alternative hypothesis: true mean is not equal to 10</span>
<span class="co">#=> 95 percent confidence interval:</span>
<span class="co">#=> 9.924374 10.157135</span>
<span class="co">#=> sample estimates:</span>
<span class="co">#=> mean of x </span>
<span class="co">#=> 10.04075 </span></code></pre></div>
<h4>How to interpret?</h4>
<p>In the above case, the p-Value is not less than the significance level of 0.05, so the null hypothesis that the mean is 10 cannot be rejected. Note also that the 95% confidence interval includes the value 10. The data are therefore consistent with a mean of 10, although failing to reject the null hypothesis does not prove that the mean is exactly 10. This reasoning relies on ‘x’ being normally distributed. If a normal distribution cannot be assumed, use the Wilcoxon signed rank test shown in the next section.</p>
<p>Note: Use the <code>conf.level</code> argument to adjust the confidence level.</p>
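<p>As a minimal sketch of the note above, the confidence level can be changed and the individual results read from the object that t.test() returns. The variable names <code>res95</code> and <code>res99</code> are illustrative and not part of the original example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(100)
x <- rnorm(50, mean = 10, sd = 0.5)

res95 <- t.test(x, mu = 10)                     # default conf.level = 0.95
res99 <- t.test(x, mu = 10, conf.level = 0.99)  # request a 99% interval

res95$p.value   # the p-value does not depend on conf.level
res95$conf.int  # 95% confidence interval for the mean
res99$conf.int  # 99% confidence interval, wider than the 95% one</code></pre></div>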
<h2>2. Wilcoxon Signed Rank Test</h2>
<h4>Why / When is it used?</h4>
<p>To test the location of a sample when a normal distribution is not assumed. The Wilcoxon signed rank test is a non-parametric alternative to the one sample t-Test: it tests whether the (pseudo)median of the sample differs from a hypothesised value, assuming only that the data are distributed symmetrically around their centre.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">numeric_vector <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">20</span>, <span class="dv">29</span>, <span class="dv">24</span>, <span class="dv">19</span>, <span class="dv">20</span>, <span class="dv">22</span>, <span class="dv">28</span>, <span class="dv">23</span>, <span class="dv">19</span>, <span class="dv">19</span>)
<span class="kw">wilcox.test</span>(numeric_vector, <span class="dt">mu=</span><span class="dv">20</span>, <span class="dt">conf.int =</span> <span class="ot">TRUE</span>)
<span class="co">#> Wilcoxon signed rank test with continuity correction</span>
<span class="co">#></span>
<span class="co">#> data: numeric_vector</span>
<span class="co">#> V = 30, p-value = 0.1056</span>
<span class="co">#> alternative hypothesis: true location is not equal to 20</span>
<span class="co">#> 90 percent confidence interval:</span>
<span class="co">#> 19.00006 25.99999</span>
<span class="co">#> sample estimates:</span>
<span class="co">#> (pseudo)median </span>
<span class="co">#> 23.00002</span></code></pre></div>
<h4>How to interpret?</h4>
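<p>If the p-Value is less than 0.05, reject the null hypothesis and accept the alternative hypothesis stated in the output. In the example above the p-Value is 0.1056, so the null hypothesis that the true location equals 20 cannot be rejected. Type <code>example(wilcox.test)</code> in the R console for more illustrations.</p>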
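<p>The sample above contains ties and values equal to 20, which is why the output reports a test "with continuity correction" rather than an exact one. As a hedged sketch, the normal approximation can be requested explicitly with <code>exact = FALSE</code>, and the decision can be read from the returned object. The name <code>res</code> is illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">numeric_vector <- c(20, 29, 24, 19, 20, 22, 28, 23, 19, 19)

# exact = FALSE asks for the normal approximation explicitly
res <- wilcox.test(numeric_vector, mu = 20, exact = FALSE)

res$statistic        # V, the sum of the ranks of the positive differences
res$p.value          # about 0.1056, as in the output above
res$p.value < 0.05   # FALSE: the null hypothesis is not rejected</code></pre></div>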
<h2>3. Two Sample t-Test and Wilcoxon Rank Sum Test</h2>
<p>Both the t-Test and the Wilcoxon rank sum test can be used to compare the locations of 2 samples. The difference is that the t-Test compares means and assumes the samples being tested are drawn from normal distributions, while the Wilcoxon rank sum test does not make that assumption.</p>
<h4>How to implement in R?</h4>
<p>Pass the two numeric vector samples into t.test() when the samples can be assumed to be normally distributed, and into wilcox.test() when they cannot.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">c</span>(<span class="fl">0.80</span>, <span class="fl">0.83</span>, <span class="fl">1.89</span>, <span class="fl">1.04</span>, <span class="fl">1.45</span>, <span class="fl">1.38</span>, <span class="fl">1.91</span>, <span class="fl">1.64</span>, <span class="fl">0.73</span>, <span class="fl">1.46</span>)
y <-<span class="st"> </span><span class="kw">c</span>(<span class="fl">1.15</span>, <span class="fl">0.88</span>, <span class="fl">0.90</span>, <span class="fl">0.74</span>, <span class="fl">1.21</span>)
<span class="kw">wilcox.test</span>(x, y, <span class="dt">alternative =</span> <span class="st">"g"</span>) <span class="co"># g for greater</span>
<span class="co">#=> Wilcoxon rank sum test</span>
<span class="co">#=> </span>
<span class="co">#=> data: x and y</span>
<span class="co">#=> W = 35, p-value = 0.1272</span>
<span class="co">#=> alternative hypothesis: true location shift is greater than 0</span></code></pre></div>
<p>With a p-Value of 0.1272, we cannot reject the null hypothesis that x and y have the same location.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">t.test</span>(<span class="dv">1</span>:<span class="dv">10</span>, <span class="dt">y =</span> <span class="kw">c</span>(<span class="dv">7</span>:<span class="dv">20</span>)) <span class="co"># P = .00001855</span>
<span class="co">#=> Welch Two Sample t-test</span>
<span class="co">#=> </span>
<span class="co">#=> data: 1:10 and c(7:20)</span>
<span class="co">#=> t = -5.4349, df = 21.982, p-value = 1.855e-05</span>
<span class="co">#=> alternative hypothesis: true difference in means is not equal to 0</span>
<span class="co">#=> 95 percent confidence interval:</span>
<span class="co">#=> -11.052802 -4.947198</span>
<span class="co">#=> sample estimates:</span>
<span class="co">#=> mean of x mean of y </span>
<span class="co">#=> 5.5 13.5</span></code></pre></div>
<p>Since the p-Value (1.855e-05) is far below 0.05, we reject the null hypothesis that there is no difference in means.</p>
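<p>The output above is a Welch test, which is what t.test() runs by default and which does not assume equal variances. As a sketch for comparison, the classical pooled-variance test can be requested with <code>var.equal = TRUE</code>. The names <code>a</code>, <code>b</code>, <code>welch</code> and <code>pooled</code> are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <- 1:10
b <- 7:20

welch  <- t.test(a, b)                    # default: unequal variances (Welch)
pooled <- t.test(a, b, var.equal = TRUE)  # classical pooled-variance t-test

welch$parameter    # fractional degrees of freedom (21.982 in the output above)
pooled$parameter   # n1 + n2 - 2 = 22 degrees of freedom
c(welch = welch$p.value, pooled = pooled$p.value)</code></pre></div>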
<h4>What if we want to do a 1-to-1 comparison of means for values of x and y?</h4>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Use paired = TRUE for 1-to-1 comparison of observations; x and y must then have the same length.</span>
<span class="kw">t.test</span>(x, y, <span class="dt">paired =</span> <span class="ot">TRUE</span>) <span class="co"># tests whether the mean of the pairwise differences is zero</span>
<span class="kw">wilcox.test</span>(x, y, <span class="dt">paired =</span> <span class="ot">TRUE</span>) <span class="co"># signed rank test on the pairwise differences, assumed symmetric</span></code></pre></div>
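<p>A small runnable sketch of the paired case follows. The <code>before</code> and <code>after</code> values are made-up measurements on the same 8 subjects, used only for illustration. It also shows that a paired t-Test is the same as a one sample t-Test on the pairwise differences.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical measurements on the same 8 subjects, before and after a treatment
before <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0)
after  <- c(12.9, 11.8, 13.1, 13.5, 12.4, 12.6, 13.9, 12.3)

paired_res <- t.test(after, before, paired = TRUE)

# A paired t-test is a one sample t-test on the pairwise differences
diff_res <- t.test(after - before, mu = 0)

all.equal(paired_res$p.value, diff_res$p.value)  # TRUE</code></pre></div>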
<h4>When can I conclude that the means are different?</h4>
<p>Conventionally, if the p-Value is less than the significance level (commonly 0.05), reject the null hypothesis that both means are equal.</p>
<h2>4. Shapiro Test</h2>
<h4>Why is it used?</h4>
<p>To test if a sample follows a <em>normal distribution</em>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">shapiro.test</span>(numericVector) <span class="co"># Does numericVector follow a normal distribution?</span></code></pre></div>
<p>Let’s see how to do the test on a sample from a normal distribution.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Example: Test a normal distribution</span>
<span class="kw">set.seed</span>(<span class="dv">100</span>)
normaly_disb <-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">100</span>, <span class="dt">mean=</span><span class="dv">5</span>, <span class="dt">sd=</span><span class="dv">1</span>) <span class="co"># generate a normal distribution</span>
<span class="kw">shapiro.test</span>(normaly_disb) <span class="co"># the shapiro test.</span>
<span class="co">#=> Shapiro-Wilk normality test</span>
<span class="co">#=></span>
<span class="co">#=> data: normaly_disb</span>
<span class="co">#=> W = 0.98836, p-value = 0.535</span></code></pre></div>
<h4>How to interpret?</h4>
<p>The null hypothesis here is that the sample being tested is normally distributed. Since the p-Value is not less than the significance level of 0.05, we don’t reject the null hypothesis. The sample is therefore consistent with a normal distribution (though we already know that it was drawn from one!). Strictly, a non-significant result does not prove normality; it only means the test found no evidence against it.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Example: Test a uniform distribution</span>
<span class="kw">set.seed</span>(<span class="dv">100</span>)
not_normaly_disb <-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">100</span>) <span class="co"># uniform distribution.</span>
<span class="kw">shapiro.test</span>(not_normaly_disb)
<span class="co">#=> Shapiro-Wilk normality test</span>
<span class="co">#=> data: not_normaly_disb</span>
<span class="co">#=> W = 0.96509, p-value = 0.009436</span></code></pre></div>
<h4>How to interpret?</h4>
<p>If p-Value is less than the significance level of 0.05, the null-hypothesis that it is normally distributed can be rejected, which is the case here.</p>
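<p>The decision rule used in both examples can be wrapped in a small helper. This is only a sketch: <code>looks_normal()</code> is a hypothetical function written for this page, not part of R, and the exponential sample is an added example of clearly skewed data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: TRUE when the Shapiro-Wilk test does not reject normality
looks_normal <- function(v, alpha = 0.05) {
  shapiro.test(v)$p.value >= alpha
}

set.seed(100)
normal_sample <- rnorm(100, mean = 5, sd = 1)  # same sample as the first example
skewed_sample <- rexp(100, rate = 1)           # strongly right-skewed data

looks_normal(normal_sample)  # normality is not rejected (p-Value 0.535 above)
looks_normal(skewed_sample)  # normality is rejected for the skewed sample</code></pre></div>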
<h2>5. Kolmogorov-Smirnov Test</h2>
<p>The Kolmogorov-Smirnov test is used to check whether 2 samples follow the same distribution.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ks.test</span>(x, y) <span class="co"># x and y are two numeric vectors</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># From different distributions</span>
x <-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">50</span>)
y <-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">50</span>)
<span class="kw">ks.test</span>(x, y) <span class="co"># perform ks test</span>
<span class="co">#=> Two-sample Kolmogorov-Smirnov test</span>
<span class="co">#=> </span>
<span class="co">#=> data: x and y</span>
<span class="co">#=> D = 0.58, p-value = 4.048e-08</span>
<span class="co">#=> alternative hypothesis: two-sided</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Both from normal distribution</span>
x <-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">50</span>)
y <-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">50</span>)
<span class="kw">ks.test</span>(x, y) <span class="co"># perform ks test</span>
<span class="co">#=> Two-sample Kolmogorov-Smirnov test</span>
<span class="co">#=> </span>
<span class="co">#=> data: x and y</span>
<span class="co">#=> D = 0.18, p-value = 0.3959</span>
<span class="co">#=> alternative hypothesis: two-sided</span></code></pre></div>
<h4>How to tell if they are from the same distribution?</h4>
<p>If the p-Value is less than 0.05 (the significance level), we reject the null hypothesis that they are drawn from the same distribution. In other words, a p-Value below 0.05 suggests that x and y come from different distributions.</p>
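<p>To make the D statistic in the output less of a black box, the sketch below recomputes it by hand as the largest gap between the two empirical distribution functions. The seed and the names <code>pooled</code> and <code>D_manual</code> are additions for reproducibility and illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(100)
x <- rnorm(50)
y <- runif(50)

res <- ks.test(x, y)

# D is the largest vertical gap between the two empirical CDFs
pooled   <- sort(c(x, y))
D_manual <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))

all.equal(unname(res$statistic), D_manual)  # the two values agree
res$p.value < 0.05   # TRUE here: the samples come from different distributions</code></pre></div>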
<h2>6. Fisher’s F-Test</h2>
<p>Fisher’s F test can be used to check if two samples have the same variance. It assumes that both samples are drawn from normal distributions.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">var.test</span>(x, y) <span class="co"># Do x and y have the same variance?</span></code></pre></div>
<p>Alternatively, fligner.test() and bartlett.test() can be used to test for equal variances across groups.</p>
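<p>A short sketch with simulated data, assuming two normal samples whose standard deviations differ by a factor of 3 (the data and names are illustrative). It also shows that the F statistic reported by var.test() is the ratio of the two sample variances.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(100)
x <- rnorm(50, mean = 0, sd = 1)
y <- rnorm(50, mean = 0, sd = 3)   # three times the standard deviation

res <- var.test(x, y)

# The F statistic is simply the ratio of the two sample variances
all.equal(unname(res$statistic), var(x) / var(y))  # TRUE
res$p.value   # a small value indicates unequal variances</code></pre></div>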
<h2>7. Chi Squared Test</h2>
<p>The chi-squared test in R can be used to test if two categorical variables are dependent, by means of a contingency table.</p>
<p>Example use case: You may want to figure out if big budget films become box-office hits. We have 2 categorical variables (Budget of film, Success Status), each with 2 levels (Big/Low budget and Hit/Flop), which form a 2 x 2 contingency table.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">chisq.test</span>(<span class="kw">table</span>(categorical_X, categorical_Y), <span class="dt">correct =</span> <span class="ot">FALSE</span>) <span class="co"># Yates continuity correction not applied</span>
<span class="co">#or</span>
<span class="kw">summary</span>(<span class="kw">table</span>(categorical_X, categorical_Y)) <span class="co"># performs a chi-squared test.</span>
<span class="co"># Sample results</span>
<span class="co">#=> Pearson's Chi-squared test</span>
<span class="co">#=> data: M</span>
<span class="co">#=> X-squared = 30.0701, df = 2, p-value = 2.954e-07</span></code></pre></div>
<h4>How to tell if x, y are independent?</h4>
<p>There are two ways to tell if they are independent:</p>
<ol style="list-style-type: decimal">
<li><p><strong>By looking at the p-Value</strong>: If the p-Value is less than 0.05, we reject the null hypothesis that x and y are independent. So for the example output above (p-Value=2.954e-07), we reject the null hypothesis and conclude that x and y are not independent.</p></li>
<li><p><strong>From Chi.sq value</strong>: For 2 x 2 contingency tables, which have 1 degree of freedom (d.o.f), if the calculated Chi-Squared value is greater than 3.841 (the critical value at the 0.05 significance level), we reject the null hypothesis that the variables are independent. To find the critical value for larger contingency tables, use qchisq(0.95, df), where df = (r-1) x (c-1) for a table with r rows and c columns.</p></li>
</ol>
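<p>Both ways of deciding can be tried on the film example. The counts below are invented for illustration (they are not real box-office data), and the names <code>films</code>, <code>res</code> and <code>critical</code> are assumptions of this sketch.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical counts of films by budget and outcome (not real data)
films <- rbind(Big = c(Hit = 30, Flop = 10),
               Low = c(Hit = 15, Flop = 45))

res <- chisq.test(films, correct = FALSE)

critical <- qchisq(0.95, df = res$parameter)  # df = (2-1) x (2-1) = 1

res$statistic             # about 24.24 for these counts
critical                  # about 3.841
res$statistic > critical  # TRUE, so independence is rejected
res$p.value < 0.05        # TRUE, the same conclusion from the p-Value</code></pre></div>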
<h2>8. Correlation</h2>
<h4>Why is it used?</h4>
<p>To test the linear relationship between two continuous variables.</p>
<p>The cor.test() function computes the correlation between two continuous variables and tests whether it differs significantly from zero. The null hypothesis is that the true correlation between <em>x</em> and <em>y</em> is zero.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">cor.test</span>(x, y) <span class="co"># where x and y are numeric vectors.</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">cor.test</span>(cars$speed, cars$dist)
<span class="co">#=> Pearson's product-moment correlation</span>
<span class="co">#=> </span>
<span class="co">#=> data: cars$speed and cars$dist</span>
<span class="co">#=> t = 9.464, df = 48, p-value = 1.49e-12</span>
<span class="co">#=> alternative hypothesis: true correlation is not equal to 0</span>
<span class="co">#=> 95 percent confidence interval:</span>
<span class="co">#=> 0.6816422 0.8862036</span>
<span class="co">#=> sample estimates:</span>
<span class="co">#=> cor </span>
<span class="co">#=> 0.8068949</span></code></pre></div>
<h4>How to interpret?</h4>
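<p>If the p-Value is less than 0.05, we reject the null hypothesis that the true correlation is zero (i.e. that there is no linear relationship). So in this case, we reject the null hypothesis and conclude that <em>dist</em> and <em>speed</em> are linearly correlated. Note that a significant correlation does not by itself show that one variable depends causally on the other.</p>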
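<p>As a sketch of where the numbers in the output come from, the t statistic can be recomputed from the sample correlation and the degrees of freedom. The names <code>r</code>, <code>df</code> and <code>t_manual</code> are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">res <- cor.test(cars$speed, cars$dist)

r  <- unname(res$estimate)    # sample correlation, same value as cor()
df <- unname(res$parameter)   # n - 2 degrees of freedom

all.equal(r, cor(cars$speed, cars$dist))    # TRUE

# The t statistic reported by cor.test() follows from r and df
t_manual <- r * sqrt(df) / sqrt(1 - r^2)
all.equal(unname(res$statistic), t_manual)  # TRUE</code></pre></div>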
<h2>9. More Commonly Used Tests</h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">fisher.test</span>(contingencyMatrix, <span class="dt">alternative =</span> <span class="st">"greater"</span>) <span class="co"># Fisher's exact test to test independence of rows and columns in contingency table</span>
<span class="kw">friedman.test</span>() <span class="co"># Friedman's rank sum non-parametric test </span></code></pre></div>
<p>There are more useful tests available in various other packages.</p>
<p>The package <code>lawstat</code> has a good collection. The <code>outliers</code> package has a number of tests for the presence of outliers.</p>
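<p>For the two base R tests listed at the start of this section, a minimal runnable sketch could look like the following. All counts and scores are made up for illustration, and <code>scores</code>, <code>fisher_res</code> and <code>friedman_res</code> are hypothetical names.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical 2 x 2 table of counts (not real data)
contingencyMatrix <- rbind(Big = c(Hit = 30, Flop = 10),
                           Low = c(Hit = 15, Flop = 45))
fisher_res <- fisher.test(contingencyMatrix, alternative = "greater")
fisher_res$p.value   # a small value suggests the odds ratio is greater than 1

# Hypothetical scores: 6 judges (rows, blocks) rate 3 products (columns, treatments)
scores <- matrix(c(7, 5, 3,
                   8, 6, 4,
                   6, 7, 2,
                   9, 5, 4,
                   7, 6, 3,
                   8, 4, 5),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))
friedman_res <- friedman.test(scores)
friedman_res$parameter   # degrees of freedom = number of treatments - 1</code></pre></div>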
</div>
</div>
<div class="footer">
<hr>
<p>© 2016-17 Selva Prabhakaran. Powered by <a href="http://jekyllrb.com/">jekyll</a>,
<a href="http://yihui.name/knitr/">knitr</a>, and
<a href="http://johnmacfarlane.net/pandoc/">pandoc</a>.
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">Creative Commons License.</a>
</p>
</div>
</div> <!-- /container -->
<script src="//code.jquery.com/jquery.js"></script>
<script src="www/bootstrap.min.js"></script>
<script src="www/toc.js"></script>
<!-- MathJax Script -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript"
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<!-- Google Analytics Code -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-69351797-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
/* reduce spacing around math formula*/
.MathJax_Display {
margin: 0em 0em;
}
body {
font-family: 'Helvetica Neue', Roboto, Arial, sans-serif;
font-size: 16px;
line-height: 27px;
font-weight: 400;
}
blockquote p {
line-height: 1.75;
color: #717171;
}
.well li{
line-height: 28px;
}
li.dropdown-header {
display: block;
padding: 0px;
font-size: 14px;
}
</style>
</body>
</html>