In my current position at the University of Vienna I frequently work with data sets from the Austrian Corona Panel Project (ACPP) and the Austrian National Election Study (AUTNES). Both projects store their data in the Dataverse of the Austrian Social Science Data Archive (AUSSDA).
I'm a big fan of reproducible research, and publicly available data sets are key for this. The mushrooming of dataverse repositories is an exciting development because it makes data sets available to everyone and brings us one step closer to fully reproducible research. However, I do not like to have data sets stored locally on my hard drive. The key concerns here are reproducibility across different computers and users, as well as sacrificing disk space to data sets that might be outdated after an update, although the latter is not really an issue with the data sets from the ACPP and AUTNES. Luckily, there is an easy solution: we can access the ACPP and AUTNES data sets through the Dataverse API. This also speeds up the workflow of any project involving these data sets and allows us to share code without sharing the data sets themselves, which require each user to have individual access rights.
We'll be using the dataverse package available on CRAN (more information here). You can install it with the following line of code:
# Install from CRAN
install.packages("dataverse", dependencies = TRUE)
In order to connect to the Dataverse API you will need an account with the AUSSDA dataverse. You can sign up through the SSO of your institution or with your email address. After you have created your account you'll need the DATAVERSE_KEY, which is the API Token that connects your API request with your registered dataverse account. You can obtain the API Token by logging into your account, clicking on your name in the top left, and selecting API Token, as can be seen in the picture below.
After clicking on API Token you will be taken to a page (image below) where you can generate a 37-digit Token that is valid for one year. Under no circumstances should you share this token with anyone. Treat it like your username/password combination and make sure it is never included in code you share with others or push to a publicly available repository.
We can now set up the R script for the dataverse call. Start by loading the library and specifying the DATAVERSE_KEY, which takes the API Token you copied from the AUSSDA website between the quotation marks.
## Loading the dataverse library
library("dataverse")
## Specifying the API Token we received from AUSSDA
Sys.setenv("DATAVERSE_KEY" = "YOUR_API_KEY")
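Since the token must never end up in shared code, one option is to keep it out of the script entirely. The following is a minimal sketch, assuming you store the token in your personal ~/.Renviron file, which R reads at startup. The helper name require_dataverse_key() is hypothetical and not part of the dataverse package; it only checks that the variable is set and never prints it.

```r
# Assumes a line like the following in ~/.Renviron (a file R reads at startup):
# DATAVERSE_KEY=YOUR_API_KEY

# Hypothetical helper: stop early if the token is missing, without printing it
require_dataverse_key <- function() {
  key <- Sys.getenv("DATAVERSE_KEY", unset = "")
  if (!nzchar(key)) {
    stop("DATAVERSE_KEY is not set. Add it to ~/.Renviron and restart R.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling require_dataverse_key() at the top of a script replaces the Sys.setenv() line, so the script can be shared or pushed to a public repository without exposing the token.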
Calling data sets from the AUSSDA dataverse is done using the get_dataframe_by_name() function, which takes the following arguments:
- filename = The file name of the data set we want to download.
- dataset = The DOI of the repository that holds the data set we want to download.
- .f = The function we want to use to read the data set. The files are listed as .tab on the dataverse, but since we request the original Stata files in both cases we use read_dta() from the haven package.
- original = Specifies whether we want the original version of the file (TRUE) or the archival version that the dataverse generated (FALSE).
- server = The server address. For the AUSSDA data this is always data.aussda.at.
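As an optional shortcut, the dataverse package also reads a default server from the DATAVERSE_SERVER environment variable, so the server argument can be left out of individual calls. A minimal sketch, assuming you only work with the AUSSDA dataverse in the current session:

```r
# Set the default server once per session; later dataverse calls that do not
# pass `server` explicitly fall back to this value
Sys.setenv("DATAVERSE_SERVER" = "data.aussda.at")

# Check what is currently configured
Sys.getenv("DATAVERSE_SERVER")
```

The examples in this post still pass server explicitly, which keeps each call self-explanatory when read in isolation.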
The ACPP data exists in two versions: the SUF version for scientific use and the OA version for open access. To download the ACPP data we gather the relevant information from the repository pages of the scientific use or open access versions of the data. The filenames and DOIs for those are:
Data Set | filename | DOI | Server | Language |
---|---|---|---|---|
ACPP SUF | 10094_da_de_v2_0.tab | 10.11587/28KQNS | data.aussda.at | German |
ACPP OA | 10095_da_de_v1_0.tab | 10.11587/P5YJ0O | data.aussda.at | German |
If we want to download the scientific use version of the ACPP data we can do so by using the filename 10094_da_de_v2_0.tab and the DOI 10.11587/28KQNS. Hence, we can request the SUF data with the following R code:
# Coronapanel
df_acpp_suf <-
  get_dataframe_by_name(
    filename = "10094_da_de_v2_0.tab",
    dataset = "10.11587/28KQNS",
    .f = haven::read_dta,
    original = TRUE,
    server = "data.aussda.at")
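Because only the filename and DOI change between AUSSDA requests, the call above can be wrapped in a small function. This is a sketch rather than part of the dataverse package: get_aussda() is a hypothetical name, and the reader and fetch arguments are assumptions added so that the defaults can be swapped out, for example when trying the wrapper without network access.

```r
# Hypothetical wrapper with the AUSSDA settings from this post as defaults.
# `reader` and `fetch` are illustrative arguments, not dataverse parameters.
get_aussda <- function(filename, doi,
                       reader = haven::read_dta,
                       fetch = dataverse::get_dataframe_by_name) {
  fetch(
    filename = filename,
    dataset  = doi,
    .f       = reader,
    original = TRUE,
    server   = "data.aussda.at"
  )
}

# Example call for the open access version (requires a valid DATAVERSE_KEY):
# df_acpp_oa <- get_aussda("10095_da_de_v1_0.tab", "10.11587/P5YJ0O")
```

The defaults are only evaluated when the function is called, so defining the wrapper does not require the packages to be loaded beforehand.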
The AUTNES panel surveys for the most recent Austrian election in 2017 were collected in two different ways. There exists an online panel study with six waves (4 pre- and 2 post-election) as well as a multi-mode study with phone and online modes (2 pre- and 1 post-election waves each). The modes, sample sizes, and survey times are very well explained here.
From the repository pages of the AUTNES Online Panel Study 2017 (SUF edition) and the AUTNES Multi-Mode Panel Study 2017 (SUF edition) we can collect the filenames and DOIs of the data sets:
Data Set | Year | filename | DOI | Server | Language |
---|---|---|---|---|---|
AUTNES Online | 2017 | 10017_da_en_v2_0.tab | 10.11587/I7QIYJ | data.aussda.at | English |
AUTNES Multi-mode | 2017 | 10025_da_en_v1_0.dta | 10.11587/NXDDPE | data.aussda.at | English |
The R code for requesting the AUTNES Online 2017 SUF version of the data set would look like this:
# Autnes
df_autnes_online <-
  get_dataframe_by_name(
    filename = "10017_da_en_v2_0.tab",
    dataset = "10.11587/I7QIYJ",
    .f = haven::read_dta,
    original = TRUE,
    server = "data.aussda.at")
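When a project uses several of these data sets, it can help to keep the filenames and DOIs from the two tables above in one place instead of scattering them across scripts. The sketch below copies the table values verbatim; the short labels and the aussda_lookup() helper are hypothetical names introduced for illustration.

```r
# Filenames and DOIs copied from the tables in this post; labels are made up
aussda_files <- data.frame(
  label    = c("acpp_suf", "acpp_oa", "autnes_online", "autnes_multimode"),
  filename = c("10094_da_de_v2_0.tab", "10095_da_de_v1_0.tab",
               "10017_da_en_v2_0.tab", "10025_da_en_v1_0.dta"),
  doi      = c("10.11587/28KQNS", "10.11587/P5YJ0O",
               "10.11587/I7QIYJ", "10.11587/NXDDPE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the filename and DOI for one label
aussda_lookup <- function(label) {
  row <- aussda_files[aussda_files$label == label, ]
  if (nrow(row) != 1) {
    stop("Unknown data set label: ", label, call. = FALSE)
  }
  as.list(row[c("filename", "doi")])
}

# Example use with the call shown above (requires a valid DATAVERSE_KEY):
# info <- aussda_lookup("autnes_online")
# df <- dataverse::get_dataframe_by_name(
#   filename = info$filename, dataset = info$doi,
#   .f = haven::read_dta, original = TRUE, server = "data.aussda.at")
```

If AUSSDA publishes a new version of a file, only the corresponding row of the lookup table needs to change.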