# Scores and Scales {#scoresScales}
Assessments yield information.
The information is encoded in scores or in other [types of data](#dataTypes).\index{data!type}
It is important to consider which [type of data](#dataTypes) you have because the type of data restricts which analyses are available.\index{data!type}
## Getting Started {#gettingStarted-scoresScales}
### Load Libraries {#loadLibraries-scoresScales}
First, we load the `R` packages used for this chapter:
```{r}
library("petersenlab") #to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("dplyr")
library("tidyverse")
library("tinytex")
library("knitr")
library("rmarkdown")
library("bookdown")
```
### Prepare Data {#prepareData-scoresScales}
#### Simulate Data {#simulateData-scoresScales}
For this example, we simulate data below.\index{simulate data}
For reproducibility, we set the seed below.
Using the same seed will yield the same answer every time.
There is nothing special about this particular seed.
```{r}
set.seed(52242)
rawData <- rnorm(
n = 1000,
mean = 200,
sd = 30)
scores <- data.frame(
rawData = rawData,
rawDataNoMissing = rawData)
```
#### Add Missing Data {#addMissingData-scoresScales}
Adding missing data to dataframes makes the examples more similar to real-life data and helps you get in the habit of writing code that accounts for missing data.
```{r}
scores$rawData[c(5,10)] <- NA
```
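As a minimal sketch of what accounting for missing data can look like, the example below uses a small hypothetical vector (`exampleWithMissing`, which is not part of the simulated data above).
Many base `R` summary functions return `NA` when any value is missing unless you set `na.rm = TRUE`.

```{r}
# Hypothetical scores with one missing value (illustration only)
exampleWithMissing <- c(190, NA, 210, 200)

# Without na.rm, the missing value propagates to the result
meanWithNA <- mean(exampleWithMissing)

# With na.rm = TRUE, the mean is computed from the observed values only
meanObserved <- mean(exampleWithMissing, na.rm = TRUE)

# Count how many values are missing
numMissing <- sum(is.na(exampleWithMissing))

meanWithNA
meanObserved
numMissing
```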
## Data Types {#dataTypes}
There are four general data types: [nominal](#nominalData), [ordinal](#ordinalData), [interval](#intervalData), and [ratio](#ratioData).\index{data!type}
Depending on the use of the variable, the data could fall into more than one category.\index{data!type}
The type of data influences what kinds of data analysis you can do.\index{data!type}
For instance, parametric statistical analysis (e.g., *t* test, analysis of variance [ANOVA], and linear regression) assumes that data are [interval](#intervalData) or [ratio](#ratioData).\index{data!type}
### Nominal {#nominalData}
Nominal data are distinct categories.\index{data!nominal}
They are categorical and unordered.\index{data!nominal}
Nominal data make no quantitative claims.\index{data!nominal}
Nominal data represent things that we can name (e.g., cat and dog).\index{data!nominal}
Nominal data can be represented with numbers.\index{data!nominal}
For example, zip codes are nominal.\index{data!nominal}
Numbers that represent a participant's sex, race, or ethnicity are also nominal.\index{data!nominal}
Categorical data are often dummy-coded into binary (i.e., dichotomous) variables, which represent nominal data.\index{data!nominal}\index{data!dichotomous}
Higher numbers of nominal data do not reflect higher (or lower) levels of the construct because the numbers represent categories that do not have an order.\index{data!nominal}
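The sketch below illustrates dummy coding with a hypothetical nominal variable (`petType`, invented for this example).
It uses `factor()` and `model.matrix()` from base `R`; other approaches to dummy coding are possible.

```{r}
# Hypothetical nominal variable (illustration only)
petType <- factor(c("cat", "dog", "bird", "dog", "cat"))

# The integer codes behind a factor are arbitrary labels, not quantities
as.integer(petType)

# Dummy-code the categories; the first level ("bird") is the reference
# category, so it is represented by zeros in both indicator columns
petDummies <- model.matrix(~ petType)[, -1]
petDummies
```

Each indicator column is binary, and changing which category serves as the reference changes the codes without changing the information they carry.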
### Ordinal {#ordinalData}
Ordinal data are ordered categories: they have a name and an order.\index{data!ordinal}
They make no claim about the conceptual distance between the ranks, only that higher values represent higher (or lower) levels of the construct.\index{data!ordinal}
For example, ranks following a race are ordinal: the person with rank 1 finished before the person with rank 2, who finished before the person with rank 3.\index{data!ordinal}
Ordinal data make a limited claim because the conceptual distance between adjacent numbers is not the same.\index{data!ordinal}
For instance, the person who finished the race first might have finished 10 minutes before the second-place finisher; whereas the third-place finisher might have finished 1 second after the second-place finisher.\index{data!ordinal}
That is, just because the numbers have the same *mathematical* distance does not mean that they represent the same *conceptual* distance on the construct.\index{data!ordinal}
For example, if the respondent is asked how many drinks they had in the past day, and the options are 0 = 0 drinks; 1 = 1–2 drinks; 2 = 3 or more drinks, the scale is ordinal.\index{data!ordinal}
Even though the numbers have the same mathematical distance (0, 1, 2), they do not represent the same conceptual distance.\index{data!ordinal}
Most data in psychology are ordinal data even though they are often treated as if they were [interval](#intervalData) data.\index{data!ordinal}\index{data!interval}
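One way to represent ordinal data in `R` is with an ordered factor.
The sketch below uses hypothetical responses to the drinks item described above (the vector `drinksCode` is invented for illustration).

```{r}
# Hypothetical responses coded 0, 1, 2 (illustration only)
drinksCode <- c(0, 2, 1, 1, 0)

drinksOrdinal <- factor(
  drinksCode,
  levels = 0:2,
  labels = c("0 drinks", "1-2 drinks", "3 or more drinks"),
  ordered = TRUE)

# Order comparisons are meaningful for ordered factors
drinksOrdinal[2] > drinksOrdinal[3]

# The median depends only on order, so it is defined for ordinal data
drinksMedian <- levels(drinksOrdinal)[median(as.integer(drinksOrdinal))]
drinksMedian
```

Storing the variable as an ordered factor keeps the order while discouraging arithmetic (e.g., means) that would assume equal spacing between categories.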
### Interval {#intervalData}
Interval data are ordered and have meaningful distances (i.e., equal spacing between intervals).\index{data!interval}
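You can add and subtract interval data (e.g., the distance from 2 to 4 is the same as the distance from 4 to 6), but you cannot form meaningful ratios (a value of 4 does not reflect twice as much of the construct as a value of 2).\index{data!interval}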
Examples of interval data are temperatures in Fahrenheit and Celsius—100 degrees Fahrenheit is not twice as hot as 50 degrees Fahrenheit.\index{data!interval}
A person's number of years of education is interval, whereas educational attainment (e.g., high school degree, college degree, graduate degree) is only ordinal.\index{data!interval}
Although much data in psychology involve numbers that have the same mathematical distance between intervals, the intervals likely do not represent the same conceptual distance.\index{data!interval}
For example, the difference in severity of two people who have two symptoms and four symptoms of depression, respectively, may not be the same difference in depression severity as two people who have four symptoms and six symptoms, respectively.\index{data!interval}
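The temperature example can be checked numerically.
The sketch below converts three illustrative Fahrenheit temperatures to Celsius: equal differences stay equal, but the ratio between two temperatures changes with the (arbitrary) zero point, which is why ratios of interval data are not meaningful.

```{r}
# Illustrative temperatures in degrees Fahrenheit
fahrenheitExample <- c(50, 75, 100)

# Convert to degrees Celsius
celsiusExample <- (fahrenheitExample - 32) * 5 / 9

# Equal differences in Fahrenheit remain equal differences in Celsius
diff(fahrenheitExample)
diff(celsiusExample)

# The ratio of the highest to the lowest temperature depends on the scale
ratioFahrenheit <- fahrenheitExample[3] / fahrenheitExample[1]
ratioCelsius <- celsiusExample[3] / celsiusExample[1]
ratioFahrenheit
ratioCelsius
```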
### Ratio {#ratioData}
Ratio data are ordered, have meaningful distances, and have a true (absolute) zero that represents absence of the construct.\index{data!ratio}
With ratio data, multiplicative relationships are true.\index{data!ratio}
An example of ratio data is temperature on the Kelvin scale: 100 kelvins is twice as hot as 50 kelvins.\index{data!ratio}
There is a dream of having ratio scales in psychology, but we still do not have a true zero with psychological constructs—what does total absence of depression mean (apart from a dead person)?\index{data!ratio}
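As a brief sketch of why a true zero matters, the example below converts two illustrative Kelvin temperatures to the Rankine scale, another temperature scale that starts at absolute zero (the conversion multiplies by 9/5).
Because both scales share a true zero, the ratio between the two temperatures is the same in either unit.

```{r}
# Illustrative temperatures on the Kelvin scale
kelvinExample <- c(50, 100)

# Convert to the Rankine scale, which also starts at absolute zero
rankineExample <- kelvinExample * 9 / 5

# The ratio is preserved when the unit changes
ratioKelvin <- kelvinExample[2] / kelvinExample[1]
ratioRankine <- rankineExample[2] / rankineExample[1]
ratioKelvin
ratioRankine
```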
## Score Transformation {#scoreTransformation}
There are a number of score transformations, depending on the goal.\index{data!transformation}
Some score transformations (e.g., log transform) seek to make data more normally distributed to meet assumptions of particular analysis approaches.\index{data!transformation}
Score transformations alter the original (raw) data.\index{data!transformation}\index{data!raw}
If you change the data, it can change the results.\index{data!transformation}
Score transformations are not neutral.\index{data!transformation}
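To illustrate how a transformation changes the shape of a distribution, the sketch below applies a log transform to hypothetical positively skewed values.
The values are evenly spaced quantiles of a log-normal distribution (so no random numbers are drawn), and `sampleSkewness()` is a simple helper defined here for illustration rather than a function from a package.

```{r}
# Hypothetical positively skewed values: evenly spaced quantiles of a
# log-normal distribution (deterministic; illustration only)
skewedExample <- qlnorm(ppoints(1000), meanlog = 0, sdlog = 1)

# Simple moment-based skewness helper (illustration only)
sampleSkewness <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

# Skewness before and after the log transform
skewBefore <- sampleSkewness(skewedExample)
skewAfter <- sampleSkewness(log(skewedExample))
skewBefore
skewAfter
```

The transformed values are symmetric, whereas the original values are strongly skewed; any analysis that is sensitive to distribution shape could therefore give different results before versus after the transformation.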
### Raw Scores {#rawScores}
Raw scores are the original data, or they may be aggregations (e.g., sums or means) of multiple items.\index{data!raw}
Raw scores are the purest because they are closest to the original operation (e.g., behavior).\index{data!raw}
A disadvantage of raw scores is that they are scale dependent, and therefore they may not be comparable across different measures with different scales.\index{data!raw}
An example histogram of raw scores is in Figure \@ref(fig:rawScores).\index{data!raw}\index{histogram}
```{r rawScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Raw Scores."}
hist(scores$rawData,
xlab = "Raw Scores",
main = "Histogram of Raw Scores")
```
### Norm-Referenced Scores {#norm}
Norm-referenced scores are scores that are referenced to some norm.\index{norm-referenced}\index{norm}
A norm is a standard of comparison.\index{norm}
For instance, you may be interested in how well a participant performed relative to other children of the same sex, age, grade, or ethnicity.\index{norm-referenced}\index{norm}
However, interpretation of norm-referenced scores depends on the measure and on the normative sample.\index{norm-referenced}\index{norm}
A person's norm-referenced score can vary widely depending on which norms are used.\index{norm-referenced}\index{norm}
Which reference group should you use?\index{norm-referenced}\index{norm}
Age?\index{norm-referenced}\index{norm}
Sex?\index{norm-referenced}\index{norm}
Age and sex?\index{norm-referenced}\index{norm}
Grade?\index{norm-referenced}\index{norm}
Ethnicity?\index{norm-referenced}\index{norm}
The optimal reference group depends on the purpose of the assessment.\index{norm-referenced}\index{norm}
The quality of the norms also depends on the representativeness of the reference group compared to the population of interest.\index{norm-referenced}\index{norm}
Pros and cons of group-based norms are discussed in Section \@ref(withinGroupNorming).\index{norm-referenced}\index{norm}
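The sketch below shows how the same raw score can look quite different depending on the norm.
The two reference groups, their means, and their standard deviations are hypothetical values chosen for illustration, and the percentile ranks assume that scores are normally distributed within each group.

```{r}
# One person's raw score (illustration only)
rawScoreExample <- 210

# Two hypothetical reference groups with different means and SDs
normGroups <- data.frame(
  group = c("Hypothetical norm A", "Hypothetical norm B"),
  normMean = c(200, 220),
  normSD = c(30, 20))

# Standardize the same raw score against each norm
normGroups$zScore <-
  (rawScoreExample - normGroups$normMean) / normGroups$normSD

# Percentile rank implied by each norm, assuming normality
normGroups$percentileRank <- pnorm(normGroups$zScore) * 100

normGroups
```

Against the first norm, the person scores above average; against the second, the same raw score is below average.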
A standard normal distribution on various norm-referenced scales is depicted in Figure \@ref(fig:normedDistribution), as adapted from @Bandalos2018.\index{norm-referenced}
```{r normedDistribution, echo = FALSE, results = "hide", out.width = "100%", fig.align = "center", fig.cap = "Various Norm-Referenced Scales."}
standardNormal <- data.frame(zScore = -4:4)
standardNormal$Tscore <- seq(from = 10, to = 90, by = 10)
standardNormal$standardScore <- seq(from = 40, to = 160, by = 15)
standardNormal$cumulativePercent <- substr(as.character(pnorm(standardNormal$zScore) * 100), 1, 4)
standardNormal$cumulativePercentFigure <- c("0","0.1","2.3","15.9","50","84.1","97.7","99.9","100")
standardNormal$nonCumulativePercent[1] <- round((pnorm(-3) - pnorm(-4)) * 100, 2)
standardNormal$nonCumulativePercent[2] <- round((pnorm(-2) - pnorm(-3)) * 100, 2)
standardNormal$nonCumulativePercent[3] <- round((pnorm(-1) - pnorm(-2)) * 100, 2)
standardNormal$nonCumulativePercent[4] <- round((pnorm(0) - pnorm(-1)) * 100, 2)
standardNormal$nonCumulativePercent[5] <- round((pnorm(1) - pnorm(0)) * 100, 2)
standardNormal$nonCumulativePercent[6] <- round((pnorm(2) - pnorm(1)) * 100, 2)
standardNormal$nonCumulativePercent[7] <- round((pnorm(3) - pnorm(2)) * 100, 2)
standardNormal$nonCumulativePercent[8] <- round((pnorm(4) - pnorm(3)) * 100, 2)
standardNormal$nonCumulativePercentFigure <- paste(standardNormal$nonCumulativePercent, "%", sep = "")
standardNormal$midpointsZ <- c(-3.5,-2.5,-1.5,-0.5,0.5,1.5,2.5,3.5,NA)
standardNormal$nonCumulativePercent[9] <- NA
standardNormal$nonCumulativePercentFigure[9] <- NA
standardNormalPercentiles <- data.frame(percentileRank = c(NA,1,5,10,20,30,40,50,60,70,80,90,95,99,NA))
standardNormalPercentiles$percentileRankZ <- qnorm(standardNormalPercentiles$percentileRank / 100)
standardNormalPercentiles$percentileRankZ[1] <- -4
standardNormalPercentiles$percentileRankZ[nrow(standardNormalPercentiles)] <- 4
standardNormalStanines <- data.frame(stanine = 1:9)
standardNormalStanines$staninePercent <- c("4%","7%","12%","17%","20%","17%","12%","7%","4%")
standardNormalStanines$stanineCumulativeStartingPercent <- c(0,4,11,23,40,60,77,89,96)
standardNormalStanines$stanineCumulativeEndingPercent <- c(4,11,23,40,60,77,89,96,100)
standardNormalStanines$stanineCumulativeStartingPercentZ <- qnorm(standardNormalStanines$stanineCumulativeStartingPercent/100)
standardNormalStanines$stanineCumulativeEndingPercentZ <- qnorm(standardNormalStanines$stanineCumulativeEndingPercent/100)
standardNormalStanines[standardNormalStanines == -Inf] <- -4
standardNormalStanines[standardNormalStanines == Inf] <- 4
standardNormalStanines$midpointsZ <- rowMeans(standardNormalStanines[,c("stanineCumulativeStartingPercentZ","stanineCumulativeEndingPercentZ")])
par(xpd = NA,
mgp = c(3, 0.5 , 0), # gap between axis label and axis
mar = c(12, 8, 0, 0) + 0.1) # plot margins: default of the form c(bottom, left, top, right): c(5, 4, 4, 2) + 0.1
x <- seq(-4, 4, length = 10000)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", lwd = 8, axes = FALSE, xlab = NA, ylab = NA, col = "gray")
segments(x0 = -4, y0 = 0, y1 = 1)
segments(x0 = -3, y0 = 0, y1 = 1)
segments(x0 = -2, y0 = 0, y1 = 1)
segments(x0 = -1, y0 = 0, y1 = 1)
segments(x0 = 0, y0 = 0, y1 = 1)
segments(x0 = 1, y0 = 0, y1 = 1)
segments(x0 = 2, y0 = 0, y1 = 1)
segments(x0 = 3, y0 = 0, y1 = 1)
segments(x0 = 4, y0 = 0, y1 = 1)
axis(side = 1, at = standardNormal$zScore, pos = 0)
axis(side = 1, at = standardNormal$zScore, pos = -.06, labels = standardNormal$Tscore)
axis(side = 1, at = standardNormal$zScore, pos = -.12, labels = standardNormal$standardScore)
axis(side = 1, at = standardNormal$zScore, pos = -.18, labels = standardNormal$cumulativePercentFigure)
axis(side = 1, at = standardNormalPercentiles$percentileRankZ, pos = -.24, labels = standardNormalPercentiles$percentileRank, cex.axis = 0.5)
rect(xleft = standardNormalStanines$stanineCumulativeStartingPercentZ, ybottom = -.36, xright = standardNormalStanines$stanineCumulativeEndingPercentZ, ytop = -.31)
mtext(standardNormalStanines$stanine, side = 1, line = 9, at = standardNormalStanines$midpointsZ)
text(0.05, .02, labels = "Standard Deviations")
text(standardNormal$midpointsZ, .06, labels = standardNormal$nonCumulativePercentFigure)
mtext("Percent of cases \n under segment of \n the normal curve", side = 2, line = 0, at = .05, las = 1)
mtext("z scores", side = 2, line = 0, at = -.033, las = 1)
mtext("T scores", side = 2, line = 0, at = -.0934, las = 1)
mtext("Standard scores", side = 2, line = 0, at = -.1538, las = 1)
mtext("Cumulative percent", side = 2, line = 0, at = -.2142, las = 1)
mtext("Percentile ranks", side = 2, line = 0, at = -.2746, las = 1)
mtext("Stanines", side = 2, line = 0, at = -.335, las = 1)
```
#### Percentile Ranks {#percentileRanks}
A percentile rank reflects the percentage of people in a given group (i.e., norm) whom a person scored higher than.\index{data!percentile rank}\index{norm}
Percentile ranks are frequently used for tests of intellectual/cognitive ability, academic achievement, academic aptitude, and grant funding.\index{data!percentile rank}\index{intellectual assessment}\index{achievement!testing}\index{aptitude!testing}
They seem like interval data, but they are not intervals because the conceptual spacing between the numbers is not equal.\index{data!percentile rank}\index{data!interval}
The difference in ability for two people who scored at the 99th and 98th percentile, respectively, is not the same as the difference in ability for two people who scored at the 49th and 50th percentile, respectively.\index{data!percentile rank}
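Percentile ranks are interpreted only against a baseline (the norm); because their spacing is unequal, differences between percentile ranks should not be computed by subtraction.\index{data!percentile rank}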
Percentile ranks have unusual effects.\index{data!percentile rank}
There are lots of people in the middle of a distribution, so a very small difference in raw scores gets expanded out in percentiles.\index{data!percentile rank}
For instance, a raw score of 20 may have a percentile rank of 50, but a raw score of 24 may have a percentile rank of 68.\index{data!percentile rank}\index{data!raw}
However, a larger raw score change at the ends of the distribution may have a smaller percentile change.\index{data!percentile rank}\index{data!raw}
For example, a raw score of 120 may have a percentile rank of 97, whereas a raw score of 140 may have a percentile rank of 99.\index{data!percentile rank}\index{data!raw}
Thus, percentile ranks stretch out differences for some people but constrict differences for others.\index{data!percentile rank}
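The stretching and constricting can be sketched with a normal curve.
The example below assumes a normally distributed population with a mean of 200 and a standard deviation of 30 (values chosen for illustration); `percentileOf()` is a helper defined here, not a package function.

```{r}
# Assumed normal population for illustration (mean = 200, SD = 30)
exampleMean <- 200
exampleSD <- 30

# Percentile rank of a raw score under the assumed normal distribution
percentileOf <- function(rawScore) {
  pnorm(rawScore, mean = exampleMean, sd = exampleSD) * 100
}

# The same 4-point raw score difference, in the middle versus in the tail
gainMiddle <- percentileOf(204) - percentileOf(200)
gainTail <- percentileOf(264) - percentileOf(260)
gainMiddle
gainTail

# The distance in z score units between adjacent percentile ranks
zGapMiddle <- qnorm(.50) - qnorm(.49)
zGapTail <- qnorm(.99) - qnorm(.98)
zGapMiddle
zGapTail
```

Under these assumptions, the same raw score difference yields a much larger percentile change near the mean than in the tail, and one percentile point spans a much wider range of scores in the tail than near the mean.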
Here is an example of how to calculate percentile ranks using the `dplyr` package from the `tidyverse` [@R-tidyverse; @tidyverse2019]:\index{data!percentile rank}
```{r}
scores$percentileRank <-
percent_rank(scores$rawDataNoMissing) * 100
```
A histogram of percentile rank scores is in Figure \@ref(fig:percentileRanks).\index{data!percentile rank}\index{histogram}
```{r percentileRanks, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Percentile Ranks."}
hist(scores$percentileRank,
xlab = "Percentile Ranks",
main = "Histogram of Percentile Ranks")
```
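To make the calculation more transparent, the sketch below computes percentile ranks by hand in base `R`.
It follows our understanding of how `percent_rank()` is documented (the minimum rank rescaled to range from 0 to 1); `percentRankByHand()` and `tiedScores` are names introduced here for illustration.

```{r}
# Hand calculation: (minimum rank - 1) / (number of observed scores - 1)
percentRankByHand <- function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) /
    (sum(!is.na(x)) - 1)
}

# Hypothetical scores that include a tie (illustration only)
tiedScores <- c(10, 20, 20, 40, 50)

percentileRanksByHand <- percentRankByHand(tiedScores) * 100
percentileRanksByHand
```

With this definition, the lowest score receives a percentile rank of 0, the highest receives 100, and tied scores share the same percentile rank.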
#### Deviation (Standardized) Scores {#standardizedScores}
Deviation or standardized scores are transformations of raw scores onto a standardized metric (i.e., a metric with a fixed mean and standard deviation) using some norm.\index{data!standardized}\index{norm}
The norm could be a comparison group, or it could be the sample itself.\index{data!standardized}\index{norm}
With deviation scores, you have similar challenges as with [percentile ranks](#percentileRanks), including which reference group to use, but there are additional challenges.\index{data!standardized}\index{data!percentile rank}
Deviation scores are more informative when the scores are normally distributed compared to when the scores are skewed.\index{data!standardized}\index{data!not normally distributed}
If scores are skewed, two *z* scores that are equally far from the mean but on opposite sides of it can have different probabilities.\index{data!standardized}\index{data!\textit{z} score}
Many constructs we study in psychology are not normally distributed.\index{data!not normally distributed}
For example, the frequency of hallucinations among people would show a positively skewed distribution with a truncation at zero, representing a floor effect—i.e., most people do not show hallucinations.\index{data!not normally distributed}
For instance, consider a hypothetical distribution of hallucinations.
It might follow a folded distribution, as depicted in Figure \@ref(fig:foldedDistribution):\index{data!not normally distributed}\index{histogram}
```{r foldedDistribution, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Hallucinations (Raw Score)."}
hist(rbinom(100000, 300, .01),
breaks = 8,
xlab = "Hallucinations (Raw Score)",
main = "Histogram of Hallucinations (Raw Score)")
```
Now consider the same distribution converted to a standardized (*z*) score, as depicted in Figure \@ref(fig:foldedDistributionZscore):\index{data!standardized}\index{data!\textit{z} score}\index{histogram}
```{r foldedDistributionZscore, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Hallucinations (z Score)."}
hist(scale(rbinom(100000, 300, .01)),
breaks = 8,
xlab = "Hallucinations (z Score)",
main = "Histogram of Hallucinations (z Score)")
```
Thus, you can compute a deviation score, but it may not be meaningful if the data and underlying construct are not normally distributed.\index{data!standardized}\index{data!not normally distributed}
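The asymmetry can be quantified with a short sketch.
The example below builds a hypothetical skewed distribution of counts from evenly spaced binomial quantiles (so no random numbers are drawn), standardizes it, and compares the proportion of scores at least two standard deviations above versus below the mean.

```{r}
# Hypothetical skewed counts: evenly spaced quantiles of a binomial
# distribution (deterministic; illustration only)
hallucinationExample <- qbinom(ppoints(10000), size = 300, prob = .01)

# Standardize using the sample's own mean and standard deviation
hallucinationZ <- as.vector(scale(hallucinationExample))

# Proportion of scores at least 2 SDs above versus below the mean
propAbove <- mean(hallucinationZ >= 2)
propBelow <- mean(hallucinationZ <= -2)
propAbove
propBelow

# For comparison: the proportion in each tail of a normal distribution
pnorm(-2)
```

Because counts cannot fall below zero, no score is two standard deviations below the mean, even though a normal curve would place about 2% of scores there.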
##### *z* scores {#zScores}
The *z* score is the most common standardized score, and it can help put scores from different measures with different scales onto the same playing field.\index{data!standardized}\index{data!\textit{z} score}
*z* scores have a mean of zero and a standard deviation of one.\index{data!standardized}\index{data!\textit{z} score}
To get a *z* score that uses the sample as its own norm, subtract the mean from all scores and divide by the standard deviation.\index{data!standardized}\index{data!\textit{z} score}
Every *z* score represents how far that person's score is from the (normed) average, represented in standard deviation units.\index{data!standardized}\index{data!\textit{z} score}
With a standard normal curve, approximately 68% of scores fall within one standard deviation of the mean.\index{data!standardized}\index{data!\textit{z} score}
Approximately 95% of scores fall within two standard deviations of the mean.\index{data!standardized}\index{data!\textit{z} score}
Approximately 99.7% of scores fall within three standard deviations of the mean.\index{data!standardized}\index{data!\textit{z} score}
The area under a normal curve within one standard deviation of the mean is calculated below using the `pnorm()` function, which calculates the cumulative distribution function for a normal curve.\index{data!standardized}
```{r}
stdDeviations <- 1
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within one standard deviation of the mean is depicted in Figure \@ref(fig:zScoreDensity1SD).\index{data!standardized}
```{r zScoreDensity1SD, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Density of Standard Normal Distribution. The blue region represents the area within one standard deviation of the mean.", fig.scap = "Density of Standard Normal Distribution: One Standard Deviation of the Mean."}
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within two standard deviations of the mean is calculated below:\index{data!standardized}
```{r}
stdDeviations <- 2
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within two standard deviations of the mean is depicted in Figure \@ref(fig:zScoreDensity2SD).\index{data!standardized}
```{r zScoreDensity2SD, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Density of Standard Normal Distribution. The blue region represents the area within two standard deviations of the mean.", fig.scap = "Density of Standard Normal Distribution: Two Standard Deviations of the Mean."}
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
The area under a normal curve within three standard deviations of the mean is calculated below:\index{data!standardized}
```{r}
stdDeviations <- 3
pnorm(stdDeviations) - pnorm(stdDeviations * -1)
```
The area under a normal curve within three standard deviations of the mean is depicted in Figure \@ref(fig:zScoreDensity3SD).\index{data!standardized}
```{r zScoreDensity3SD, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Density of Standard Normal Distribution. The blue region represents the area within three standard deviations of the mean.", fig.scap = "Density of Standard Normal Distribution: Three Standard Deviations of the Mean."}
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
xlab = "z Score",
ylab = "Normal Density")
x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
c(0, y, 0),
col = "blue")
```
Alternatively, if you want to determine the *z* score associated with a particular percentile in a normal distribution, you can use the `qnorm()` function.\index{data!standardized}\index{data!\textit{z} score}\index{data!percentile rank}
For instance, the *z* score associated with the 37th percentile is:\index{data!standardized}\index{data!percentile rank}
```{r}
qnorm(.37)
```
*z* scores are calculated using Equation \@ref(eq:zScore):\index{data!standardized}\index{data!\textit{z} score}
\begin{equation}
z = \frac{x - \mu}{\sigma}
(\#eq:zScore)
\end{equation}
where $x$ is the observed score, $\mu$ is the mean observed score, and $\sigma$ is the standard deviation of observed scores.\index{data!\textit{z} score}
```{r}
scores$zScore <- scale(scores$rawData)
all.equal(
as.vector(scores$zScore),
(scores$rawData - mean(scores$rawData, na.rm = TRUE)) /
sd(scores$rawData, na.rm = TRUE))
```
An example histogram of *z* scores is in Figure \@ref(fig:zScores).\index{data!standardized}\index{data!\textit{z} score}\index{histogram}
(ref:zScores) Histogram of *z* Scores.
```{r zScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "(ref:zScores)"}
hist(scores$zScore,
xlab = "z Scores",
main = "Histogram of z Scores")
```
##### *T* scores {#tScores}
*T* scores have a mean of 50 and a standard deviation of 10.\index{data!standardized}\index{data!\textit{T} score}
*T* scores are frequently used with personality and symptom measures, where clinical cutoffs are often set at 70 (i.e., two standard deviations above the mean).\index{data!standardized}\index{data!\textit{T} score}\index{personality assessment}
For the Minnesota Multiphasic Personality Inventory (MMPI), you would examine peaks (elevations $\ge$ 70) and absences ($\le$ 30).\index{data!\textit{T} score}\index{Minnesota Multiphasic Personality Inventory}
*T* scores are calculated using Equation \@ref(eq:TScore):\index{data!standardized}\index{data!\textit{T} score}
\begin{equation}
T = 50 + 10z
(\#eq:TScore)
\end{equation}
where $z$ are *z* scores.\index{data!\textit{z} score}
```{r}
scores$tScore <- 50 + 10*scale(scores$rawData)
```
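Equation \@ref(eq:TScore) can also be inverted to interpret a given *T* score.
The sketch below converts the clinical cutoff of 70 back to a *z* score and, assuming normally distributed scores, to the percentage of people expected to score below it.

```{r}
# Clinical cutoff on the T score metric
tScoreCutoff <- 70

# Invert T = 50 + 10z to recover the z score
zFromT <- (tScoreCutoff - 50) / 10

# Percent of scores below the cutoff, assuming a normal distribution
percentBelowCutoff <- pnorm(zFromT) * 100

zFromT
percentBelowCutoff
```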
An example histogram of *T* scores is in Figure \@ref(fig:tScores).\index{data!\textit{T} score}\index{histogram}
(ref:tScores) Histogram of *T* Scores.
```{r tScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "(ref:tScores)"}
hist(scores$tScore,
xlab = "T Scores",
main = "Histogram of T Scores")
```
##### Standard scores {#standardScores}
Standard scores have a mean of 100 and a standard deviation of 15.\index{data!standardized}\index{data!standard score}
Standard scores are frequently used for tests of intellectual ability, academic achievement, and cognitive ability.\index{data!standardized}\index{data!standard score}\index{intellectual assessment}\index{achievement!testing}
Intellectual disability is generally considered an IQ standard score less than 70 (two standard deviations below the mean), whereas giftedness is generally considered a standard score of 130 or higher (two standard deviations above the mean).\index{data!standardized}\index{data!standard score}\index{intellectual assessment}
Standard scores with a mean of 100 and standard deviation of 15 are calculated using Equation \@ref(eq:standardScore):\index{data!standardized}\index{data!standard score}
\begin{equation}
\text{standard score} = 100 + 15z
(\#eq:standardScore)
\end{equation}
where $z$ are *z* scores.\index{data!\textit{z} score}
```{r}
scores$standardScore <- 100 + 15*scale(scores$rawData)
```
An example histogram of standard scores is in Figure \@ref(fig:standardScores).\index{data!standardized}\index{data!standard score}\index{histogram}
```{r standardScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Standard Scores."}
hist(scores$standardScore,
xlab = "Standard Scores",
main = "Histogram of Standard Scores")
```
##### Scaled scores {#scaledScores}
Scaled scores are raw scores that have been converted to a standardized metric.\index{data!standardized}\index{data!scaled score}
The particular metric of the scaled score depends on the measure.\index{data!scaled score}
On tests of intellectual or cognitive ability, scaled scores commonly have a mean of 10 and a standard deviation of 3.\index{data!standardized}\index{intellectual assessment}\index{data!scaled score}
Using this metric, scaled scores can be calculated using Equation \@ref(eq:scaleScore):\index{data!standardized}\index{data!scaled score}
\begin{equation}
\text{scaled score} = 10 + 3z
(\#eq:scaleScore)
\end{equation}
where $z$ are *z* scores.\index{data!\textit{z} score}
```{r}
scores$scaledScore <- 10 + 3*scale(scores$rawData)
```
An example histogram of scaled scores is in Figure \@ref(fig:scaledScores).\index{data!standardized}\index{data!scaled score}\index{histogram}
```{r scaledScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Scaled Scores."}
hist(scores$scaledScore,
xlab = "Scaled Scores",
main = "Histogram of Scaled Scores")
```
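*T* scores, standard scores, and scaled scores are all linear transformations of *z* scores that differ only in their mean and standard deviation.
The sketch below makes this explicit with a small helper, `rescaleZ()`, which is defined here for illustration and applied to a few example *z* scores.

```{r}
# Convert z scores to a metric with a chosen mean and standard deviation
rescaleZ <- function(z, newMean, newSD) {
  newMean + newSD * z
}

# Example z scores (illustration only)
exampleZ <- c(-2, 0, 1.5)

metricExamples <- data.frame(
  zScore = exampleZ,
  tScore = rescaleZ(exampleZ, newMean = 50, newSD = 10),
  standardScore = rescaleZ(exampleZ, newMean = 100, newSD = 15),
  scaledScore = rescaleZ(exampleZ, newMean = 10, newSD = 3))

metricExamples
```

Because the metrics are linear transformations of one another, they preserve each person's relative standing; the choice among them is a matter of convention for a given measure.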
##### Stanine scores {#stanineScores}
Stanine scores (short for STANdard Nine) have a mean of 5 and a standard deviation of 2.\index{data!standardized}\index{data!stanine score}
The scores range from 1–9.\index{data!standardized}\index{data!stanine score}
Stanine scores are calculated using the bracketed proportions in Table \@ref(tab:stanineCalculation).\index{data!standardized}\index{data!stanine score}
The lowest 4% receive a stanine score of 1, the next 7% receive a stanine score of 2, etc.\index{data!standardized}\index{data!stanine score}
```{r, include = FALSE}
stanineConversionTable <- data.frame(stanine = 1:9)
stanineConversionTable$bracketedPercent <- c(4,7,12,17,20,17,12,7,4)
stanineConversionTable$bracketedPercentTable <- paste(stanineConversionTable$bracketedPercent, "%", sep = "")
stanineConversionTable$bracketedPercent <- NULL
stanineConversionTable$cumulativePercent <- c(4,11,23,40,60,77,89,96,100)
stanineConversionTable$cumulativePercentTable <- paste(stanineConversionTable$cumulativePercent, "%", sep = "")
stanineConversionTable$cumulativePercent <- NULL
stanineConversionTable$zScore <- c("below −1.75","−1.75 to −1.25","−1.25 to −0.75","−0.75 to −0.25","−0.25 to +0.25","+0.25 to +0.75","+0.75 to +1.25","+1.25 to +1.75","above +1.75")
stanineConversionTable$standardScore <- c("below 74","74 to 81","81 to 89","89 to 96","96 to 104","104 to 111","111 to 119","119 to 126","above 126")
names(stanineConversionTable) <- c("Stanine","Bracketed Percent","Cumulative Percent","z Score","Standard Score")
```
```{r stanineCalculation, echo = FALSE}
kable(stanineConversionTable,
caption = "Calculating Stanine Scores",
booktabs = TRUE)
```
```{r}
scores$stanineScore <- NA
scores$stanineScore[which(scores$percentileRank <= 4)] <- 1
scores$stanineScore[
which(scores$percentileRank > 4 & scores$percentileRank <= 11)] <- 2
scores$stanineScore[
which(scores$percentileRank > 11 & scores$percentileRank <= 23)] <- 3
scores$stanineScore[
which(scores$percentileRank > 23 & scores$percentileRank <= 40)] <- 4
scores$stanineScore[
which(scores$percentileRank > 40 & scores$percentileRank <= 60)] <- 5
scores$stanineScore[
which(scores$percentileRank > 60 & scores$percentileRank <= 77)] <- 6
scores$stanineScore[
which(scores$percentileRank > 77 & scores$percentileRank <= 89)] <- 7
scores$stanineScore[
which(scores$percentileRank > 89 & scores$percentileRank <= 96)] <- 8
scores$stanineScore[which(scores$percentileRank > 96)] <- 9
```
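A more compact way to apply the same brackets is the base `R` function `cut()`.
The sketch below is an alternative to the step-by-step assignment, shown with a hypothetical vector of percentile ranks (`percentileExample`) chosen to include values at the bracket boundaries.

```{r}
# Hypothetical percentile ranks, including bracket boundaries
percentileExample <- c(0, 4, 4.1, 11, 50, 60, 77.5, 96, 96.1, 100)

# Upper bounds of the cumulative percent brackets for stanines 1 to 8;
# right = TRUE makes each bracket include its upper bound
stanineExample <- cut(
  percentileExample,
  breaks = c(-Inf, 4, 11, 23, 40, 60, 77, 89, 96, Inf),
  labels = FALSE,
  right = TRUE)

stanineExample
```

With `labels = FALSE`, `cut()` returns the bracket number, which here is the stanine score, and missing percentile ranks remain `NA`.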
A histogram of stanine scores is in Figure \@ref(fig:stanineScores).\index{data!standardized}\index{data!stanine score}\index{histogram}
```{r stanineScores, out.width = "100%", fig.align = "center", class.source = "fold-hide", fig.cap = "Histogram of Stanine Scores."}
hist(scores$stanineScore,
xlab = "Stanine Scores",
main = "Histogram of Stanine Scores",
breaks = seq(from = 0.5, to = 9.5, by = 1),
xlim = c(0,10),
xaxp = c(1,9,8))
```
## Conclusion {#conclusion-scoresScales}
It is important to consider which [type of data](#dataTypes) you have because the type of data restricts which analyses are available.\index{data!type}
It is also important to consider whether the data were [transformed](#scoreTransformation) because score transformations are not neutral—they can impact the results.\index{data!transformation}
## Suggested Readings {#readings-scoresScales}
@Childs2021: https://dzchilds.github.io/stats-for-bio/data-transformations.html (archived at https://perma.cc/3YEB-QW2V)
## Exercises {#exercises-scoresScales}