diff --git a/DESCRIPTION b/DESCRIPTION index 6cc2b3c..9e59232 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -32,7 +32,6 @@ Suggests: tidytext, wordcloud, purrr, - ggplot2, - glue + ggplot2 URL: https://michalovadek.github.io/eurlex/ VignetteBuilder: knitr diff --git a/NEWS.md b/NEWS.md index 2c107f5..99b3bda 100644 --- a/NEWS.md +++ b/NEWS.md @@ -4,7 +4,7 @@ - it is now possible to select all resource types available with `elx_make_query(resource_type = "any")`. Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times - results can be restricted to a particular directory code with `elx_make_query(directory = "18")` (directory code "18" denotes Common Foreign and Security Policy) -- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 3 denotes EU international agreements) +- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 2 denotes EU international agreements) ## Minor changes diff --git a/R/elx_make_query.R b/R/elx_make_query.R index b72251b..756e983 100644 --- a/R/elx_make_query.R +++ b/R/elx_make_query.R @@ -1,8 +1,8 @@ -#' Create SPARQL quries +#' Create SPARQL queries #' #' Generates pre-defined or manual SPARQL queries to retrieve document ids from Cellar. #' List of available resource types: http://publications.europa.eu/resource/authority/resource-type . -#' Note that not all resource types are compatible with the pre-defined query. +#' Note that not all resource types are compatible with default parameter values. 
#' #' @importFrom magrittr %>% #' @@ -46,6 +46,7 @@ elx_make_query <- function(resource_type = c("directive","regulation","decision" include_directory = FALSE, include_sector = FALSE, order = FALSE, limit = NULL){ + if (missing(resource_type)) stop("'resource_type' must be defined") if (!resource_type %in% c("any","directive","regulation","decision","recommendation","intagr","caselaw","manual","proposal","national_impl")) stop("'resource_type' must be defined") if (resource_type == "manual" & nchar(manual_type) < 2){ diff --git a/README.md b/README.md index d384a33..51d079c 100644 --- a/README.md +++ b/README.md @@ -26,11 +26,11 @@ For the moment, it is recommended to retrieve metadata one variable at a time. F 2. `dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()` 3. `ids %>% dplyr::left_join(lbs) %>% dplyr::left_join(dates)` -rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data. +rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data. Always keep an eye on whether the `work` and `celex` columns identify rows uniquely or not. One of the main contributions of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the `rvest` package), the function `elx_fetch_data()` enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI). The function currently enables downloading the title and the full text of a document in all available languages. 
-See the [vignette](https://michalovadek.github.io/eurlex/articles/eurlexpkg.html) for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features. +See the [vignette](https://michalovadek.github.io/eurlex/articles/eurlexpkg.html) for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features. Example use cases are shown in this [paper](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150). ## Cite Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: [10.1080/2474736X.2020.1870150](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150) @@ -40,6 +40,19 @@ This package nor its author are in any way affiliated with the EU Publications O Please consider contributing to the maintanance and development of the package by reporting bugs or suggesting new features. +## Latest changes + +### eurlex 0.3.5 + +- it is now possible to select all resource types available with `elx_make_query(resource_type = "any")`. 
Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times +- results can be restricted to a particular directory code with `elx_make_query(directory = "18")` (directory code "18" denotes Common Foreign and Security Policy) +- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 2 denotes EU international agreements) + +- new feature: request date of court case submission `elx_make_query(include_date_lodged = TRUE)` +- new feature: request type of court procedure and outcome `elx_make_query(include_court_procedure = TRUE)` +- new feature: request directory code of legal act `elx_make_query(include_directory = TRUE)` +- `elx_curia_list()` has a new default parameter `parse = TRUE` which creates separate columns for `ecli`, `see_case`, `appeal` applying regular expressions on `case_info` + ## Useful resources Guide to CELEX numbers: https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html diff --git a/doc/eurlexpkg.R b/doc/eurlexpkg.R index 488fba1..fb4df0b 100644 --- a/doc/eurlexpkg.R +++ b/doc/eurlexpkg.R @@ -17,27 +17,36 @@ results <- dirs %>% select(-force,-date) ## ----------------------------------------------------------------------------- query_dir %>% - glue::as_glue() # for nicer printing + cat() # for nicer printing elx_make_query(resource_type = "caselaw") %>% - glue::as_glue() + cat() elx_make_query(resource_type = "manual", manual_type = "SWD") %>% - glue::as_glue() + cat() ## ----------------------------------------------------------------------------- elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "directive") elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "recommendation") +## 
----------------------------------------------------------------------------- +# request documents from directory 18 ("Common Foreign and Security Policy") +# and sector 3 ("Legal acts") + +elx_make_query(resource_type = "any", + directory = "18", + sector = 3) %>% + cat() + ## ----runquery, eval=FALSE----------------------------------------------------- # results <- elx_run_query(query = query_dir) # @@ -65,18 +74,14 @@ rec_eurovoc %>% ## ----eurovoctable------------------------------------------------------------- - eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc) print(eurovoc_lookup) - ## ----appendlabs--------------------------------------------------------------- - rec_eurovoc %>% left_join(eurovoc_lookup) - ## ----------------------------------------------------------------------------- eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc, alt_labels = TRUE, @@ -86,7 +91,6 @@ rec_eurovoc %>% left_join(eurovoc_lookup) %>% select(celex, eurovoc, labels) - ## ----getdatapur, message = FALSE, warning=FALSE, error=FALSE------------------ # the function is not vectorized by default elx_fetch_data(results$work[1],"title") @@ -117,7 +121,9 @@ dirs %>% ## ----------------------------------------------------------------------------- dirs %>% - ggplot(aes(x = as.Date(date), y = celex)) + + filter(!is.na(force)) %>% + mutate(date = as.Date(date)) %>% + ggplot(aes(x = date, y = celex)) + geom_point(aes(color = force), alpha = 0.1) + theme(axis.text.y = element_blank(), axis.line.y = element_blank(), diff --git a/doc/eurlexpkg.Rmd b/doc/eurlexpkg.Rmd index 7258dbd..05b1e3d 100644 --- a/doc/eurlexpkg.Rmd +++ b/doc/eurlexpkg.Rmd @@ -2,7 +2,7 @@ title: "eurlex: Retrieve data on European Union law in R" output: rmarkdown::html_vignette description: > - Retrieve efficiently tidy data on European Union law in R with + Retrieve data on European Union law in R with pre-defined SPARQL and REST queries. 
vignette: > %\VignetteIndexEntry{eurlex: Retrieve data on European Union law in R} @@ -29,6 +29,8 @@ The `eurlex` R package attempts to significantly reduce the overhead associated The `eurlex` package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: `elx_make_query()` to create pre-defined or customized SPARQL queries; `elx_run_query()` to execute the pre-made or any other manually input query; and `elx_fetch_data()` to fire GET requests for certain metadata to the REST API. +The package also contains largely self-explanatory functions for retrieving data on EU court cases (`elx_curia_list()`) and Council votes (`elx_council_votes()`) from outside Eur-Lex. + ## `elx_make_query()`: Generate SPARQL queries The function `elx_make_query` takes as its first argument the type of resource to be retrieved from the semantic database that powers Eur-Lex (and other publications) called Cellar. @@ -55,13 +57,13 @@ The choice of resource type is then reflected in the SPARQL query generated by t ```{r} query_dir %>% - glue::as_glue() # for nicer printing + cat() # for nicer printing elx_make_query(resource_type = "caselaw") %>% - glue::as_glue() + cat() elx_make_query(resource_type = "manual", manual_type = "SWD") %>% - glue::as_glue() + cat() ``` @@ -69,21 +71,33 @@ There are various ways of querying the same information in the Cellar database d The other arguments in `elx_make_query()` relate to additional metadata to be returned. The results include by default the [CELEX number](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) and exclude corrigenda (corrections of errors in legislation). Other data needs to be opted into. Make sure to select ones that are logically compatible (e.g. case law does not have a legal basis). More options should be added in the future. 
-Note that availability of data for each variable has an impact on the results. The data frame returned by the query will be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids. +Note that availability of data for each variable might have an impact on the results. The data frame returned by the query might be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids. ```{r} elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "directive") elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "recommendation") ``` +You can also decide to not specify any resource types, in which case all types of documents will be returned. As there are over a million documents with a CELEX identifier, this is likely not efficient for a majority of users. But since version 0.3.5 it is possible to request documents belonging to a particular ["sector"](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) or [directory code](https://eur-lex.europa.eu/browse/directories/legislation.html). + +```{r} +# request documents from directory 18 ("Common Foreign and Security Policy") +# and sector 3 ("Legal acts") + +elx_make_query(resource_type = "any", + directory = "18", + sector = 3) %>% + cat() +``` + Now that we have a query, we are ready to run it. ## `elx_run_query()`: Execute SPARQL queries @@ -135,20 +149,16 @@ rec_eurovoc %>% By default, the endpoint returns the EuroVoc concept codes rather than the labels (keywords). 
The function `elx_label_eurovoc()` needs to be called to obtain a look-up table with the labels. ```{r eurovoctable} - eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc) print(eurovoc_lookup) - ``` The results include labels only for unique identifiers, but with `dplyr::left_join()` it is straightforward to append the labels to the entire dataset. ```{r appendlabs} - rec_eurovoc %>% left_join(eurovoc_lookup) - ``` As elsewhere in the API, we can tap into the multilingual nature of EU documents also when it comes to the EuroVoc keywords. Moreover, most concepts in the thesaurus are associated with alternative labels; these can be returned as well (separated by a comma). @@ -161,7 +171,6 @@ eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc, rec_eurovoc %>% left_join(eurovoc_lookup) %>% select(celex, eurovoc, labels) - ``` ## `elx_fetch_data()`: Fire GET requests @@ -186,7 +195,7 @@ print(dir_titles) ``` -Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Currently, no method for downloading text in non-html/plain formats is implemented, which means pdf-only texts will be missing from the results.^[It is worth pointing out that the html and pdf contents of older case law differs. Whereas typically the html file is only going to contain a summary and grounds of a judgment, the pdf should also contain background to the dispute.] +Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Texts are retrieved from html by priority, but methods for pdfs and .docs are also implemented.^[It is worth pointing out that the html and pdf contents of older case law differs. Whereas typically the html file is only going to contain a summary and grounds of a judgment, the pdf should also contain background to the dispute.] 
The function even handles multi-document resources (by pasting them together). # Application @@ -213,7 +222,9 @@ Directives become naturally outdated with time. It might be all the more interes ```{r} dirs %>% - ggplot(aes(x = as.Date(date), y = celex)) + + filter(!is.na(force)) %>% + mutate(date = as.Date(date)) %>% + ggplot(aes(x = date, y = celex)) + geom_point(aes(color = force), alpha = 0.1) + theme(axis.text.y = element_blank(), axis.line.y = element_blank(), @@ -251,6 +262,6 @@ dirs_1970_title %>% I use term-frequency inverse-document frequency (tf-idf) to weight the importance of the words in the wordcloud. If we used pure frequencies, the wordcloud would largely consist of words conveying little meaning ("the", "and", ...). -This is an extremely basic application of the `eurlex` package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing it. +This is an extremely basic application of the `eurlex` package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing the [accompanying paper](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150).^[Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: [10.1080/2474736X.2020.1870150](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150)] diff --git a/doc/eurlexpkg.html b/doc/eurlexpkg.html index b4098ba..495b333 100644 --- a/doc/eurlexpkg.html +++ b/doc/eurlexpkg.html @@ -311,6 +311,7 @@

Introduction

The eurlex package

The eurlex package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() to create pre-defined or customized SPARQL queries; elx_run_query() to execute the pre-made or any other manually input query; and elx_fetch_data() to fire GET requests for certain metadata to the REST API.

+

The package also contains largely self-explanatory functions for retrieving data on EU court cases (elx_curia_list()) and Council votes (elx_council_votes()) from outside Eur-Lex.

elx_make_query(): Generate SPARQL queries

The function elx_make_query takes as its first argument the type of resource to be retrieved from the semantic database that powers Eur-Lex (and other publications) called Cellar.

@@ -321,266 +322,306 @@

elx_make_query(): Generate SPARQL queries

Currently, it is possible to choose from among a host of resource types, including directives, regulations and even case law (see function description for the full list). It is also possible to manually specify a resource type from the eligible list.1

The choice of resource type is then reflected in the SPARQL query generated by the function:

query_dir %>% 
-  glue::as_glue() # for nicer printing
+  cat() # for nicer printing
 #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
-#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
-#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
-#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
-#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
-#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>) 
-#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} ?work cdm:resource_legal_id_celex ?celex. }
-
-elx_make_query(resource_type = "caselaw") %>% 
-  glue::as_glue()
-#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
-#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
-#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
-#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
-#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
-#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/ORDER>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_JUR>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/THIRDPARTY_PROCEED>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/GARNISHEE_ORDER>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/RULING>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG_EXTRACT>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/INFO_JUDICIAL>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/VIEW_AG>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_AG>) ?work cdm:resource_legal_id_celex ?celex. }
-
-elx_make_query(resource_type = "manual", manual_type = "SWD") %>% 
-  glue::as_glue()
-#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
-#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
-#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
-#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
-#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
-#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type.FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/SWD>) 
-#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} ?work cdm:resource_legal_id_celex ?celex. }
+#> PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#> PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#> PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#> PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#> select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>)
+#> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }
+
+elx_make_query(resource_type = "caselaw") %>%
+  cat()
+#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
+#> PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#> PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#> PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#> PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#> select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/ORDER>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_JUR>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/THIRDPARTY_PROCEED>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/GARNISHEE_ORDER>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RULING>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG_EXTRACT>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/INFO_JUDICIAL>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/VIEW_AG>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_AG>) OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }
+
+elx_make_query(resource_type = "manual", manual_type = "SWD") %>%
+  cat()
+#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
+#> PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#> PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#> PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#> PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#> select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type.FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/SWD>)
+#> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }

There are various ways of querying the same information in the Cellar database due to the existence of several overlapping classes and identifiers describing the same resources. The queries generated by the function should offer a reliable way of obtaining exhaustive results, as they have been validated by the helpdesk of the Publication Office. At the same time, it is always possible there will be issues either on the query or the database side; please report any you encounter through Github.

The other arguments in elx_make_query() relate to additional metadata to be returned. The results include by default the CELEX number and exclude corrigenda (corrections of errors in legislation). Other data needs to be opted into. Make sure to select ones that are logically compatible (e.g. case law does not have a legal basis). More options should be added in the future.

-

Note that availability of data for each variable has an impact on the results. The data frame returned by the query will be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.

+

Note that availability of data for each variable might have an impact on the results. The data frame returned by the query might be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.

elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
-  glue::as_glue()
+  cat()
 #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
-#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
-#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
-#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
-#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
-#>   select distinct ?work ?type ?celex str(?date) ?force where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
-#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>) 
-#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} ?work cdm:resource_legal_id_celex ?celex. ?work cdm:work_date_document ?date. ?work cdm:resource_legal_in-force ?force. }
-
-# minimal query: elx_make_query(resource_type = "directive")
-
-elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% 
-  glue::as_glue()
-#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
-#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
-#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
-#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
-#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
-#>   select distinct ?work ?type ?celex str(?date) ?lbs ?lbcelex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/RECO>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DEC>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DIR>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_OPIN>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RES>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_REG>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RECO>||
-#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DRAFT>) 
-#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} ?work cdm:resource_legal_id_celex ?celex. ?work cdm:work_date_document ?date. ?work cdm:resource_legal_based_on_resource_legal ?lbs.
-#>                    ?lbs cdm:resource_legal_id_celex ?lbcelex. }
-
-# minimal query: elx_make_query(resource_type = "recommendation")
+#> PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#> PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#> PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#> PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#> select distinct ?work ?type ?celex str(?date) ?force where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>)
+#> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} OPTIONAL{?work cdm:work_date_document ?date.} OPTIONAL{?work cdm:resource_legal_in-force ?force.} }
+
+# minimal query: elx_make_query(resource_type = "directive")
+
+elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>%
+  cat()
+#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
+#> PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#> PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#> PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#> PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#> select distinct ?work ?type ?celex str(?date) ?lbs ?lbcelex ?lbsuffix where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/RECO>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DEC>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DIR>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_OPIN>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RES>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_REG>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RECO>||
+#> ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DRAFT>)
+#> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} OPTIONAL{?work cdm:work_date_document ?date.} OPTIONAL{?work cdm:resource_legal_based_on_resource_legal ?lbs.
+#> ?lbs cdm:resource_legal_id_celex ?lbcelex.
+#> OPTIONAL{?bn owl:annotatedSource ?work.
+#> ?bn owl:annotatedProperty <http://publications.europa.eu/ontology/cdm#resource_legal_based_on_resource_legal>.
+#> ?bn owl:annotatedTarget ?lbs.
+#> ?bn annot:comment_on_legal_basis ?lbsuffix}} }
+
+# minimal query: elx_make_query(resource_type = "recommendation")
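
The advice above to compare a richer query against a minimal one can be sketched offline. The snippet below is only an illustration under stated assumptions: the two data frames are made-up stand-ins for what `elx_make_query(...) %>% elx_run_query()` might return, the CELEX values and the `lbs` labels are hypothetical, and only base R is used.

```r
# Hypothetical stand-ins for query results (values invented for illustration).
# 'minimal' mimics a query requesting only celex ids,
# 'full' mimics a query that also requests legal bases.
minimal <- data.frame(
  celex = c("31979L0001", "31989L0002", "31984L0003"),
  stringsAsFactors = FALSE
)
full <- data.frame(
  celex = c("31979L0001", "31979L0001", "31984L0003"),
  lbs   = c("basis_a", "basis_b", "basis_c"),
  stringsAsFactors = FALSE
)

# ids returned by the minimal query but absent from the richer one
missing_ids <- setdiff(minimal$celex, full$celex)

# ids that occupy more than one row in the richer result
duplicated_ids <- unique(full$celex[duplicated(full$celex)])

print(missing_ids)
print(duplicated_ids)
```

A non-empty `missing_ids` would suggest that requesting the extra variable dropped documents, while a non-empty `duplicated_ids` would suggest that `celex` no longer identifies rows uniquely, which matters before any subsequent join.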
+

You can also decide to not specify any resource types, in which case all types of documents will be returned. As there are over a million documents with a CELEX identifier, this is likely not efficient for a majority of users. But since version 0.3.5 it is possible to request documents belonging to a particular “sector” or directory code.

+
# request documents from directory 18 ("Common Foreign and Security Policy")
+# and sector 3 ("Legal acts")
+
+elx_make_query(resource_type = "any",
+               directory = "18",
+               sector = 3) %>% 
+  cat()
+#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
+#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
+#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
+#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
+#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
+#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
+#>   select distinct ?work ?type ?celex where{
+#>     VALUES (?value)
+#>     { (<http://publications.europa.eu/resource/authority/fd_555/18>)
+#>       (<http://publications.europa.eu/resource/authority/dir-eu-legal-act/18>)
+#>     }
+#>     {?work cdm:resource_legal_is_about_concept_directory-code ?value.
+#>     }
+#>     UNION
+#>     {?work cdm:resource_legal_is_about_concept_directory-code ?directory.
+#>       ?value skos:narrower+ ?directory.
+#>     }
+#>     
+#>     ?work cdm:resource_legal_id_sector ?sector.
+#>     FILTER(str(?sector)='3')
+#>      
+#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }

Now that we have a query, we are ready to run it.

elx_run_query(): Execute SPARQL queries

elx_run_query() sends SPARQL queries to a pre-specified endpoint. The function takes the query string as the main argument, which means you can manually pass it any working SPARQL query (relevant to official EU publications).

-
results <- elx_run_query(query = query_dir)
-
-# the functions are compatible with piping
-# 
-# elx_make_query("directive") %>% 
-#   elx_run_query()
-
as_tibble(results)
-#> # A tibble: 4,192 x 3
-#>   work                                   type                            celex  
-#>   <chr>                                  <chr>                           <chr>  
-#> 1 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31979L~
-#> 2 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31989L~
-#> 3 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31984L~
-#> 4 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31966L~
-#> # ... with 4,188 more rows
+
results <- elx_run_query(query = query_dir)
+
+# the functions are compatible with piping
+# 
+# elx_make_query("directive") %>% 
+#   elx_run_query()
+
as_tibble(results)
+#> # A tibble: 4,317 x 3
+#>   work                                   type                            celex  
+#>   <chr>                                  <chr>                           <chr>  
+#> 1 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31979L~
+#> 2 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31989L~
+#> 3 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31984L~
+#> 4 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31966L~
+#> # ... with 4,313 more rows

The function outputs a data.frame where each column corresponds to one of the requested variables, while the rows accumulate observations of the resource type satisfying the query criteria. Obviously, the more data is to be returned, the longer the execution time, varying from a few seconds to several minutes, depending also on your connection.

The first column always contains the unique URI of a “work” (legislative act or court judgment) which identifies each resource in Cellar. Several human-readable identifiers are normally associated with each “work” but the most useful one is CELEX, retrieved by default.2

One column you should always pay attention to is type (as in resource_type). The URIs contained there reflect the FILTER argument in the SPARQL query, which is manually pre-specified. All resources are indexed as being of one type or another. For example, when retrieving directives, the results are going to return also delegated directives, which might not be desirable, depending on your needs. You can filter results by type to make the necessary adjustments. The queries are expansive by default in the spirit of erring on the side of over-inclusiveness rather than vice versa.

-
head(results$type,5)
-#> [1] "http://publications.europa.eu/resource/authority/resource-type/DIR"
-#> [2] "http://publications.europa.eu/resource/authority/resource-type/DIR"
-#> [3] "http://publications.europa.eu/resource/authority/resource-type/DIR"
-#> [4] "http://publications.europa.eu/resource/authority/resource-type/DIR"
-#> [5] "http://publications.europa.eu/resource/authority/resource-type/DIR"
-
-results %>% 
-  distinct(type)
-#> # A tibble: 3 x 1
-#>   type                                                                   
-#>   <chr>                                                                  
-#> 1 http://publications.europa.eu/resource/authority/resource-type/DIR     
-#> 2 http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL
-#> 3 http://publications.europa.eu/resource/authority/resource-type/DIR_DEL
+
head(results$type,5)
+#> [1] "http://publications.europa.eu/resource/authority/resource-type/DIR"
+#> [2] "http://publications.europa.eu/resource/authority/resource-type/DIR"
+#> [3] "http://publications.europa.eu/resource/authority/resource-type/DIR"
+#> [4] "http://publications.europa.eu/resource/authority/resource-type/DIR"
+#> [5] "http://publications.europa.eu/resource/authority/resource-type/DIR"
+
+results %>% 
+  distinct(type)
+#> # A tibble: 3 x 1
+#>   type                                                                   
+#>   <chr>                                                                  
+#> 1 http://publications.europa.eu/resource/authority/resource-type/DIR     
+#> 2 http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL
+#> 3 http://publications.europa.eu/resource/authority/resource-type/DIR_DEL
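
As a minimal sketch of the type filtering mentioned above, the snippet below keeps only ordinary directives and drops implementing and delegated ones. It runs on a small hand-made data frame rather than a live query; `toy_results`, `type_base` and the two placeholder identifiers are assumptions made for illustration, while the three type URIs are those shown in the output above.

```r
library(dplyr)

# common prefix of the resource-type URIs returned in the `type` column
type_base <- "http://publications.europa.eu/resource/authority/resource-type/"

# hypothetical stand-in for the output of elx_run_query()
toy_results <- data.frame(
  celex = c("31979L0173", "placeholder_impl", "placeholder_del"),
  type  = paste0(type_base, c("DIR", "DIR_IMPL", "DIR_DEL")),
  stringsAsFactors = FALSE
)

# keep ordinary directives only
dirs_only <- toy_results %>%
  filter(type == paste0(type_base, "DIR"))
```

The same `filter()` call should work on a real `results` object, since the `type` column holds plain character URIs.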

The data is returned in the long format, which means that rows are recycled up to the length of the variable with the most data points. For example, if 20 directives are returned, each with two legal bases, the resulting data.frame will have 40 rows. Some variables, such as dates, unexpectedly contain several entries for some documents. You should always check the number of unique identifiers in the results instead of assuming that each row is a unique observation.
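
One way to run that check is sketched below on a toy data frame, so that it works without a live query. The object names (`toy_lbs`, `n_obs`, `n_docs`, `repeated`) and the legal basis values are hypothetical; only the `work`, `celex` and `lbs` column names mirror the package output.

```r
library(dplyr)

# hypothetical long-format result: the first directive has two legal bases
toy_lbs <- data.frame(
  work  = c("work_1", "work_1", "work_2"),
  celex = c("31979L0173", "31979L0173", "31989L0194"),
  lbs   = c("basis_a", "basis_b", "basis_a"),
  stringsAsFactors = FALSE
)

n_obs  <- nrow(toy_lbs)               # number of rows
n_docs <- n_distinct(toy_lbs$celex)   # number of unique documents

# documents that occupy more than one row
repeated <- toy_lbs %>%
  count(celex, name = "rows_per_doc") %>%
  filter(rows_per_doc > 1)
```

If `n_obs` exceeds `n_docs`, at least one document is spread over several rows, and `repeated` lists which ones.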

EuroVoc descriptors

EuroVoc is a multilingual thesaurus whose keywords are used to describe the content of European Union documents. Most resource types that can be retrieved with the pre-defined queries in this package can be accompanied by EuroVoc keywords, which are retrieved in the same way as other variables.

- -

By default, the endpoint returns the EuroVoc concept codes rather than the labels (keywords). The function elx_label_eurovoc() needs to be called to obtain a look-up table with the labels.

+rec_eurovoc <- elx_make_query("recommendation", include_eurovoc = TRUE, limit = 10) %>% 
+  elx_run_query() # truncated results for sake of the example
+
+rec_eurovoc %>% 
+  select(celex, eurovoc)
+#> # A tibble: 10 x 2
+#>   celex      eurovoc                      
+#>   <chr>      <chr>                        
+#> 1 32012H0090 http://eurovoc.europa.eu/1425
+#> 2 31962H0816 http://eurovoc.europa.eu/1004
+#> 3 31974H0435 http://eurovoc.europa.eu/1085
+#> 4 31996H0592 http://eurovoc.europa.eu/1076
+#> # ... with 6 more rows
+

By default, the endpoint returns the EuroVoc concept codes rather than the labels (keywords). The function elx_label_eurovoc() needs to be called to obtain a look-up table with the labels.

+
eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc)
+
+print(eurovoc_lookup)
+#> # A tibble: 9 x 2
+#>   eurovoc                       labels         
+#>   <chr>                         <chr>          
+#> 1 http://eurovoc.europa.eu/1085 France         
+#> 2 http://eurovoc.europa.eu/1442 food inspection
+#> 3 http://eurovoc.europa.eu/1076 form           
+#> 4 http://eurovoc.europa.eu/1318 Germany        
+#> # ... with 5 more rows

The results include labels only for unique identifiers, but with dplyr::left_join() it is straightforward to append the labels to the entire dataset.

-

-rec_eurovoc %>% 
-  left_join(eurovoc_lookup)
-#> Joining, by = "eurovoc"
-#> # A tibble: 10 x 5
-#>   work                      type                   celex    eurovoc      labels 
-#>   <chr>                     <chr>                  <chr>    <chr>        <chr>  
-#> 1 http://publications.euro~ http://publications.e~ 31962H0~ http://euro~ welfare
-#> 2 http://publications.euro~ http://publications.e~ 32015H0~ http://euro~ tax sy~
-#> 3 http://publications.euro~ http://publications.e~ 32016H0~ http://euro~ tax sy~
-#> 4 http://publications.euro~ http://publications.e~ 32017H0~ http://euro~ tax sy~
-#> # ... with 6 more rows
+
rec_eurovoc %>% 
+  left_join(eurovoc_lookup)
+#> Joining, by = "eurovoc"
+#> # A tibble: 10 x 5
+#>   work                      type                   celex  eurovoc      labels   
+#>   <chr>                     <chr>                  <chr>  <chr>        <chr>    
+#> 1 http://publications.euro~ http://publications.e~ 32012~ http://euro~ consumer~
+#> 2 http://publications.euro~ http://publications.e~ 31962~ http://euro~ welfare  
+#> 3 http://publications.euro~ http://publications.e~ 31974~ http://euro~ France   
+#> 4 http://publications.euro~ http://publications.e~ 31996~ http://euro~ form     
+#> # ... with 6 more rows

As elsewhere in the API, we can tap into the multilingual nature of EU documents also when it comes to the EuroVoc keywords. Moreover, most concepts in the thesaurus are associated with alternative labels; these can be returned as well (separated by a comma).

-
eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
-                                    alt_labels = TRUE,
-                                    language = "sk")
-
-rec_eurovoc %>% 
-  left_join(eurovoc_lookup) %>% 
-  select(celex, eurovoc, labels)
-#> Joining, by = "eurovoc"
-#> # A tibble: 10 x 3
-#>   celex          eurovoc                       labels                      
-#>   <chr>          <chr>                         <chr>                       
-#> 1 31962H0816     http://eurovoc.europa.eu/1004 blahobyt                    
-#> 2 32015H0818(10) http://eurovoc.europa.eu/1021 danová sústava,danový systém
-#> 3 32016H0818(09) http://eurovoc.europa.eu/1021 danová sústava,danový systém
-#> 4 32017H0809(10) http://eurovoc.europa.eu/1021 danová sústava,danový systém
-#> # ... with 6 more rows
+
eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
+                                    alt_labels = TRUE,
+                                    language = "sk")
+
+rec_eurovoc %>% 
+  left_join(eurovoc_lookup) %>% 
+  select(celex, eurovoc, labels)
+#> Joining, by = "eurovoc"
+#> # A tibble: 10 x 3
+#>   celex     eurovoc                   labels                                    
+#>   <chr>     <chr>                     <chr>                                     
+#> 1 32012H00~ http://eurovoc.europa.eu~ informácie pre spotrebitela,vzdelávanie s~
+#> 2 31962H08~ http://eurovoc.europa.eu~ blahobyt                                  
+#> 3 31974H04~ http://eurovoc.europa.eu~ Francúzska republika,Francúzsko           
+#> 4 31996H05~ http://eurovoc.europa.eu~ formulár                                  
+#> # ... with 6 more rows

elx_fetch_data(): Fire GET requests

A core contribution of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the rvest package), the function elx_fetch_data() enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI).

The text is among the most sought-after data in the Eur-Lex dataverse. It is now possible to automate the pipeline for downloading html and plain texts from Eur-Lex. Similarly, you can retrieve the title of the document. For both you can also specify the desired language (English by default). Other metadata might be added in the future.

-
# the function is not vectorized by default
-elx_fetch_data(results$work[1],"title")
-#> [1] "Council Directive 79/173/EEC of 6 February 1979 on the programme for the acceleration and guidance of collective irrigation works in Corsica"
-
-# we can use purrr::map() to play that role
-library(purrr)
-
-dir_titles <- results[1:10,] %>% # take the first 10 directives only to save time
-  mutate(title = map_chr(work,elx_fetch_data, "title")) %>% 
-  as_tibble() %>% 
-  select(celex, title)
-
-print(dir_titles)
-#> # A tibble: 10 x 2
-#>   celex      title                                                              
-#>   <chr>      <chr>                                                              
-#> 1 31979L0173 Council Directive 79/173/EEC of 6 February 1979 on the programme f~
-#> 2 31989L0194 Council Directive 89/194/EEC of 13 March 1989 amending Directive 6~
-#> 3 31984L0378 Council Directive 84/378/EEC of 28 June 1984 amending the Annexes ~
-#> 4 31966L0683 Commission Directive 66/683/EEC of 7 November 1966 eliminating all~
-#> # ... with 6 more rows
-

Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Currently, no method for downloading text in non-html/plain formats is implemented, which means pdf-only texts will be missing from the results.3

+
# the function is not vectorized by default
+elx_fetch_data(results$work[1],"title")
+#> [1] "Council Directive 79/173/EEC of 6 February 1979 on the programme for the acceleration and guidance of collective irrigation works in Corsica"
+
+# we can use purrr::map() to play that role
+library(purrr)
+
+dir_titles <- results[1:10,] %>% # take the first 10 directives only to save time
+  mutate(title = map_chr(work,elx_fetch_data, "title")) %>% 
+  as_tibble() %>% 
+  select(celex, title)
+
+print(dir_titles)
+#> # A tibble: 10 x 2
+#>   celex      title                                                              
+#>   <chr>      <chr>                                                              
+#> 1 31979L0173 Council Directive 79/173/EEC of 6 February 1979 on the programme f~
+#> 2 31989L0194 Council Directive 89/194/EEC of 13 March 1989 amending Directive 6~
+#> 3 31984L0378 Council Directive 84/378/EEC of 28 June 1984 amending the Annexes ~
+#> 4 31966L0683 Commission Directive 66/683/EEC of 7 November 1966 eliminating all~
+#> # ... with 6 more rows
+

Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Texts are retrieved from html by priority, but methods for pdfs and .docs are also implemented.3 The function even handles multi-document resources (by pasting them together).

Application

In this section I showcase a simple application of eurlex to making overviews of EU legislation. First, we collate data on directives.

-
dirs <- elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
-  elx_run_query() %>% 
-  rename(date = `callret-3`)
+
dirs <- elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
+  elx_run_query() %>% 
+  rename(date = `callret-3`)

Let’s calculate the proportion of directives currently in force in the entire set of directives ever adopted. This variable offers a particularly good demonstration of the usefulness of the package to retrieve EU law data, because it changes every day, as new acts enter into force and old ones drop out. Regularly scraping webpages for this purpose and at this scale is simply impractical and disproportionate.

-
library(ggplot2)
-
-dirs %>% 
-  count(force) %>% 
-  ggplot(aes(x = force, y = n)) +
-  geom_col()
-

+
library(ggplot2)
+
+dirs %>% 
+  count(force) %>% 
+  ggplot(aes(x = force, y = n)) +
+  geom_col()
+

Directives naturally become outdated with time. It might be all the more interesting to see which older acts are still surviving.

-
dirs %>% 
-  ggplot(aes(x = as.Date(date), y = celex)) +
-  geom_point(aes(color = force), alpha = 0.1) +
-  theme(axis.text.y = element_blank(),
-        axis.line.y = element_blank(),
-        axis.ticks.y = element_blank())
-

+
dirs %>% 
+  filter(!is.na(force)) %>% 
+  mutate(date = as.Date(date)) %>% 
+  ggplot(aes(x = date, y = celex)) +
+  geom_point(aes(color = force), alpha = 0.1) +
+  theme(axis.text.y = element_blank(),
+        axis.line.y = element_blank(),
+        axis.ticks.y = element_blank())
+

We want to know a bit more about the directives from the 1970s that are still in force today. Their titles could give us a clue.

-
dirs_1970_title <- dirs %>% 
-  filter(between(as.Date(date), as.Date("1970-01-01"), as.Date("1980-01-01")),
-         force == "true") %>% 
-  mutate(title = map_chr(work,elx_fetch_data,"title")) %>% 
-  as_tibble()
-
-print(dirs_1970_title)
-#> # A tibble: 78 x 6
-#>   work                 type               celex  date  force title              
-#>   <chr>                <chr>              <chr>  <chr> <chr> <chr>              
-#> 1 http://publications~ http://publicatio~ 31975~ 1975~ true  Council Directive ~
-#> 2 http://publications~ http://publicatio~ 31977~ 1977~ true  First Commission D~
-#> 3 http://publications~ http://publicatio~ 31977~ 1977~ true  Council Directive ~
-#> 4 http://publications~ http://publicatio~ 31973~ 1973~ true  Council Directive ~
-#> # ... with 74 more rows
+
dirs_1970_title <- dirs %>% 
+  filter(between(as.Date(date), as.Date("1970-01-01"), as.Date("1980-01-01")),
+         force == "true") %>% 
+  mutate(title = map_chr(work,elx_fetch_data,"title")) %>% 
+  as_tibble()
+
+print(dirs_1970_title)
+#> # A tibble: 70 x 6
+#>   work                 type               celex  date  force title              
+#>   <chr>                <chr>              <chr>  <chr> <chr> <chr>              
+#> 1 http://publications~ http://publicatio~ 31975~ 1975~ true  Council Directive ~
+#> 2 http://publications~ http://publicatio~ 31977~ 1977~ true  First Commission D~
+#> 3 http://publications~ http://publicatio~ 31977~ 1977~ true  Council Directive ~
+#> 4 http://publications~ http://publicatio~ 31973~ 1973~ true  Council Directive ~
+#> # ... with 66 more rows

I will use the tidytext package to get a quick idea of what the legislation is about.

-
library(tidytext)
-library(wordcloud)
-
-dirs_1970_title %>% 
-  select(celex,title) %>% 
-  unnest_tokens(word, title) %>% 
-  count(celex, word, sort = TRUE) %>% 
-  filter(!grepl("\\d", word)) %>% 
-  bind_tf_idf(word, celex, n) %>% 
-  with(wordcloud(word, tf_idf, max.words = 40, scale = c(1.8,0.1)))
-

+
library(tidytext)
+library(wordcloud)
+
+dirs_1970_title %>% 
+  select(celex,title) %>% 
+  unnest_tokens(word, title) %>% 
+  count(celex, word, sort = TRUE) %>% 
+  filter(!grepl("\\d", word)) %>% 
+  bind_tf_idf(word, celex, n) %>% 
+  with(wordcloud(word, tf_idf, max.words = 40, scale = c(1.8,0.1)))
+

I use term-frequency inverse-document frequency (tf-idf) to weight the importance of the words in the wordcloud. If we used pure frequencies, the wordcloud would largely consist of words conveying little meaning (“the”, “and”, …).
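
To make the weighting concrete, here is a small hand computation in base R. It assumes the usual definition that `tidytext::bind_tf_idf()` follows: term frequency is a word's share of all words in its document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The toy counts and object names are invented for illustration.

```r
# hypothetical word counts for two documents
counts <- data.frame(
  celex = c("doc1", "doc1", "doc2", "doc2"),
  word  = c("the", "irrigation", "the", "seeds"),
  n     = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# term frequency: count divided by the total number of words in the document
doc_totals <- tapply(counts$n, counts$celex, sum)
counts$tf <- as.numeric(counts$n / doc_totals[counts$celex])

# inverse document frequency: ln(number of documents / documents containing the word)
n_docs <- length(unique(counts$celex))
docs_with_word <- tapply(counts$celex, counts$word, function(x) length(unique(x)))
counts$idf <- as.numeric(log(n_docs / docs_with_word[counts$word]))

counts$tf_idf <- counts$tf * counts$idf
```

Because “the” occurs in both documents, its idf is `log(2 / 2)`, which is zero, so it drops out of the weighting however often it appears.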

-

This is an extremely basic application of the eurlex package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing it.

+

This is an extremely basic application of the eurlex package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing the accompanying paper.4


@@ -588,6 +629,7 @@

Application

  • Note, however, that not all resource types will work properly with the pre-specified query.

  • Occasionally, you may encounter legal acts without CELEX numbers, especially when digging through older legislation. It is good to report these to the Eur-Lex helpdesk.

  • It is worth pointing out that the html and pdf contents of older case law differ. Whereas the html file is typically only going to contain a summary and the grounds of a judgment, the pdf should also contain the background to the dispute.

  • +
  • Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150

  • diff --git a/docs/404.html b/docs/404.html index 45cec7e..21866c3 100644 --- a/docs/404.html +++ b/docs/404.html @@ -79,7 +79,7 @@ eurlex - 0.3.4 + 0.3.5 diff --git a/docs/articles/eurlexpkg.html b/docs/articles/eurlexpkg.html index 5739615..dfc9641 100644 --- a/docs/articles/eurlexpkg.html +++ b/docs/articles/eurlexpkg.html @@ -39,7 +39,7 @@ eurlex - 0.3.4 + 0.3.5 @@ -105,6 +105,7 @@

    The eurlex package

    The eurlex package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() to create pre-defined or customized SPARQL queries; elx_run_query() to execute the pre-made or any other manually input query; and elx_fetch_data() to fire GET requests for certain metadata to the REST API.

    +

    The package also contains largely self-explanatory functions for retrieving data on EU court cases (elx_curia_list()) and Council votes (elx_council_votes()) from outside Eur-Lex.

    elx_make_query(): Generate SPARQL queries

    @@ -116,7 +117,7 @@

    Currently, it is possible to choose from among a host of resource types, including directives, regulations and even case law (see function description for the full list). It is also possible to manually specify a resource type from the eligible list.1

    The choice of resource type is then reflected in the SPARQL query generated by the function:

    query_dir %>%
    -  glue::as_glue() # for nicer printing
    +  cat() # for nicer printing
     #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
     #>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
     #>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
    @@ -130,7 +131,7 @@ 

    #> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} } elx_make_query(resource_type = "caselaw") %>% - glue::as_glue() + cat() #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#> #> PREFIX annot: <http://publications.europa.eu/ontology/annotation#> #> PREFIX skos:<http://www.w3.org/2004/02/skos/core#> @@ -150,7 +151,7 @@

    #> ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_AG>) OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} } elx_make_query(resource_type = "manual", manual_type = "SWD") %>% - glue::as_glue() + cat() #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#> #> PREFIX annot: <http://publications.europa.eu/ontology/annotation#> #> PREFIX skos:<http://www.w3.org/2004/02/skos/core#> @@ -162,9 +163,9 @@

    #> FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }

    There are various ways of querying the same information in the Cellar database due to the existence of several overlapping classes and identifiers describing the same resources. The queries generated by the function should offer a reliable way of obtaining exhaustive results, as they have been validated by the helpdesk of the Publications Office. At the same time, it is always possible there will be issues either on the query or the database side; please report any you encounter through GitHub.

    The other arguments in elx_make_query() relate to additional metadata to be returned. The results include by default the CELEX number and exclude corrigenda (corrections of errors in legislation). Other data needs to be opted into. Make sure to select ones that are logically compatible (e.g. case law does not have a legal basis). More options should be added in the future.

    -

    Note that availability of data for each variable has an impact on the results. The data frame returned by the query will be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.

    +

    Note that availability of data for each variable might have an impact on the results. The data frame returned by the query might be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.

    elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>%
    -  glue::as_glue()
    +  cat()
     #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
     #>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
     #>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
    @@ -180,7 +181,7 @@ 

    # minimal query: elx_make_query(resource_type = "directive") elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% - glue::as_glue() + cat() #> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#> #> PREFIX annot: <http://publications.europa.eu/ontology/annotation#> #> PREFIX skos:<http://www.w3.org/2004/02/skos/core#> @@ -204,31 +205,62 @@

    #> ?bn annot:comment_on_legal_basis ?lbsuffix}} } # minimal query: elx_make_query(resource_type = "recommendation")

    +

    You can also choose not to specify any resource type, in which case documents of all types will be returned. As there are over a million documents with a CELEX identifier, this is unlikely to be efficient for most users. Since version 0.3.5, however, it is possible to narrow the request to documents belonging to a particular “sector” or directory code.

    +
    # request documents from directory 18 ("Common Foreign and Security Policy")
    +# and sector 3 ("Legal acts")
    +
    +elx_make_query(resource_type = "any",
    +               directory = "18",
    +               sector = 3) %>%
    +  cat()
    +#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
    +#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
    +#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
    +#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
    +#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    +#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    +#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
    +#>   select distinct ?work ?type ?celex where{
    +#>     VALUES (?value)
    +#>     { (<http://publications.europa.eu/resource/authority/fd_555/18>)
    +#>       (<http://publications.europa.eu/resource/authority/dir-eu-legal-act/18>)
    +#>     }
    +#>     {?work cdm:resource_legal_is_about_concept_directory-code ?value.
    +#>     }
    +#>     UNION
    +#>     {?work cdm:resource_legal_is_about_concept_directory-code ?directory.
    +#>       ?value skos:narrower+ ?directory.
    +#>     }
    +#>     
    +#>     ?work cdm:resource_legal_id_sector ?sector.
    +#>     FILTER(str(?sector)='3')
    +#>      
    +#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} }

    Now that we have a query, we are ready to run it.

    elx_run_query(): Execute SPARQL queries

    elx_run_query() sends SPARQL queries to a pre-specified endpoint. The function takes the query string as the main argument, which means you can manually pass it any working SPARQL query (relevant to official EU publications).

    -
    results <- elx_run_query(query = query_dir)
    +
    results <- elx_run_query(query = query_dir)
     
     # the functions are compatible with piping
     # 
     # elx_make_query("directive") %>% 
     #   elx_run_query()
    -
    as_tibble(results)
    -#> # A tibble: 4,316 x 3
    +
    as_tibble(results)
    +#> # A tibble: 4,317 x 3
     #>   work                                   type                            celex  
     #>   <chr>                                  <chr>                           <chr>  
     #> 1 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31979L~
     #> 2 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31989L~
     #> 3 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31984L~
     #> 4 http://publications.europa.eu/resourc~ http://publications.europa.eu/~ 31966L~
    -#> # ... with 4,312 more rows
    +#> # ... with 4,313 more rows

    The function outputs a data.frame where each column corresponds to one of the requested variables, while the rows accumulate observations of the resource type satisfying the query criteria. Obviously, the more data is to be returned, the longer the execution time, varying from a few seconds to several minutes, depending also on your connection.

    The first column always contains the unique URI of a “work” (legislative act or court judgment) which identifies each resource in Cellar. Several human-readable identifiers are normally associated with each “work” but the most useful one is CELEX, retrieved by default.2

    One column you should always pay attention to is type (as in resource_type). The URIs contained there reflect the FILTER argument in the SPARQL query, which is manually pre-specified. All resources are indexed as being of one type or another. For example, when retrieving directives, the results are going to return also delegated directives, which might not be desirable, depending on your needs. You can filter results by type to make the necessary adjustments. The queries are expansive by default in the spirit of erring on the side of over-inclusiveness rather than vice versa.

    -
    head(results$type,5)
    +
    head(results$type,5)
     #> [1] "http://publications.europa.eu/resource/authority/resource-type/DIR"
     #> [2] "http://publications.europa.eu/resource/authority/resource-type/DIR"
     #> [3] "http://publications.europa.eu/resource/authority/resource-type/DIR"
    @@ -248,7 +280,7 @@ 

    EuroVoc descriptors

    EuroVoc is a multilingual thesaurus whose keywords are used to describe the content of European Union documents. Most resource types that can be retrieved with the pre-defined queries in this package can be accompanied by EuroVoc keywords, which are retrieved in the same way as other variables.

    -
    +
     rec_eurovoc <- elx_make_query("recommendation", include_eurovoc = TRUE, limit = 10) %>%
       elx_run_query() # truncated results for sake of the example
     
    @@ -263,7 +295,7 @@ 

    #> 4 31996H0592 http://eurovoc.europa.eu/1076 #> # ... with 6 more rows

    By default, the endpoint returns the EuroVoc concept codes rather than the labels (keywords). The function elx_label_eurovoc() needs to be called to obtain a look-up table with the labels.

    -
    eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc)
    +
    eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc)
     
     print(eurovoc_lookup)
     #> # A tibble: 9 x 2
    @@ -275,7 +307,7 @@ 

    #> 4 http://eurovoc.europa.eu/1318 Germany #> # ... with 5 more rows

    The results include labels only for unique identifiers, but with dplyr::left_join() it is straightforward to append the labels to the entire dataset.

    -
    rec_eurovoc %>%
    +
    rec_eurovoc %>%
       left_join(eurovoc_lookup)
     #> Joining, by = "eurovoc"
     #> # A tibble: 10 x 5
    @@ -287,7 +319,7 @@ 

    #> 4 http://publications.euro~ http://publications.e~ 31996~ http://euro~ form #> # ... with 6 more rows

    As elsewhere in the API, we can tap into the multilingual nature of EU documents also when it comes to the EuroVoc keywords. Moreover, most concepts in the thesaurus are associated with alternative labels; these can be returned as well (separated by a comma).

    -
    eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
    +
    eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
                                         alt_labels = TRUE,
                                         language = "sk")
     
    @@ -310,7 +342,7 @@ 

    elx_fetch_data(): Fire GET requests

    A core contribution of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the rvest package), the function elx_fetch_data() enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI).

    The text of documents is among the most sought-after data in the Eur-Lex dataverse. It is now possible to automate the pipeline for downloading html and plain texts from Eur-Lex. Similarly, you can retrieve the title of the document. For both you can also specify the desired language (English by default). Other metadata might be added in the future.

    -
    # the function is not vectorized by default
    +
    # the function is not vectorized by default
     elx_fetch_data(results$work[1],"title")
     #> [1] "Council Directive 79/173/EEC of 6 February 1979 on the programme for the acceleration and guidance of collective irrigation works in Corsica"
     
    @@ -338,11 +370,11 @@ 

    Application

    In this section I showcase a simple application of eurlex to building overviews of EU legislation. First, we collate data on directives.

    -
    dirs <- elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>%
    +
    dirs <- elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>%
       elx_run_query() %>%
       rename(date = `callret-3`)

    Let’s calculate the proportion of directives currently in force in the entire set of directives ever adopted. This variable offers a particularly good demonstration of the usefulness of the package to retrieve EU law data, because it changes every day, as new acts enter into force and old ones drop out. Regularly scraping webpages for this purpose and at this scale is simply impractical and disproportionate.

    -
    library(ggplot2)
    +
    library(ggplot2)
     
     dirs %>%
       count(force) %>%
    @@ -350,15 +382,17 @@ 

    geom_col()

    Directives naturally become outdated with time. It is all the more interesting to see which older acts still survive.

    -
    dirs %>%
    -  ggplot(aes(x = as.Date(date), y = celex)) +
    +
    dirs %>%
    +  filter(!is.na(force)) %>%
    +  mutate(date = as.Date(date)) %>%
    +  ggplot(aes(x = date, y = celex)) +
       geom_point(aes(color = force), alpha = 0.1) +
       theme(axis.text.y = element_blank(),
             axis.line.y = element_blank(),
             axis.ticks.y = element_blank())
    -

    +

    We want to know a bit more about the directives from the 1970s that are still in force today. Their titles could give us a clue.

    -
    dirs_1970_title <- dirs %>%
    +
    dirs_1970_title <- dirs %>%
       filter(between(as.Date(date), as.Date("1970-01-01"), as.Date("1980-01-01")),
              force == "true") %>%
       mutate(title = map_chr(work,elx_fetch_data,"title")) %>%
    @@ -374,7 +408,7 @@ 

    #> 4 http://publications~ http://publicatio~ 31973~ 1973~ true Council Directive ~ #> # ... with 66 more rows

    I will use the tidytext package to get a quick idea of what the legislation is about.

    -
    library(tidytext)
    +
    library(tidytext)
     library(wordcloud)
     
     dirs_1970_title %>%
    @@ -386,7 +420,7 @@ 

    with(wordcloud(word, tf_idf, max.words = 40, scale = c(1.8,0.1)))

    I use term-frequency inverse-document frequency (tf-idf) to weight the importance of the words in the wordcloud. If we used pure frequencies, the wordcloud would largely consist of words conveying little meaning (“the”, “and”, …).
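To make the weighting concrete, here is a minimal base-R sketch of tf-idf on three made-up titles. The titles and all object names are illustrative assumptions, not data retrieved from Eur-Lex, and the sketch uses the common definition (term frequency multiplied by the natural log of the inverse document frequency) rather than reproducing the exact pipeline above.

```r
# Toy titles, invented for illustration only
titles <- c(
  doc1 = "the directive on the marketing of seed",
  doc2 = "the directive on the quality of water",
  doc3 = "the directive on seed potatoes"
)

# Split each title into lower-case words
tokens <- strsplit(tolower(titles), "[^a-z]+")
names(tokens) <- names(titles)

# One row per (document, word) pair with its within-document count
counts <- do.call(rbind, lapply(names(tokens), function(d) {
  tab <- table(tokens[[d]])
  data.frame(doc = d, word = names(tab), n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

# Term frequency: share of the document's words taken up by this word
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# Inverse document frequency: log(number of documents / documents containing the word)
n_docs <- length(titles)
doc_freq <- table(counts$word)
counts$idf <- log(n_docs / as.numeric(doc_freq[counts$word]))

# tf-idf is the product of the two
counts$tf_idf <- counts$tf * counts$idf

# Words with the highest weight first
counts[order(-counts$tf_idf), c("doc", "word", "tf_idf")]
```

Words that occur in every title ("the", "directive", "on") receive a weight of zero, which is why they would not dominate a wordcloud built on these weights.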

    -

    This is an extremely basic application of the eurlex package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing it.

    +

    This is an extremely basic application of the eurlex package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing the accompanying paper.4


    @@ -394,6 +428,7 @@

  • Note, however, that not all resource types will work properly with the pre-specified query.

  • Occasionally, you may encounter legal acts without CELEX numbers, especially when digging through older legislation. It is good to report these to the Eur-Lex helpdesk.

  • It is worth pointing out that the html and pdf contents of older case law differ. Whereas typically the html file is only going to contain a summary and grounds of a judgment, the pdf should also contain background to the dispute.

  • +
  • Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150

  • diff --git a/docs/articles/eurlexpkg_files/figure-html/unnamed-chunk-9-1.png b/docs/articles/eurlexpkg_files/figure-html/unnamed-chunk-9-1.png index 8f93a10..e7a0faf 100644 Binary files a/docs/articles/eurlexpkg_files/figure-html/unnamed-chunk-9-1.png and b/docs/articles/eurlexpkg_files/figure-html/unnamed-chunk-9-1.png differ diff --git a/docs/articles/eurlexpkg_files/figure-html/wordcloud-1.png b/docs/articles/eurlexpkg_files/figure-html/wordcloud-1.png index 10f938c..bf1c904 100644 Binary files a/docs/articles/eurlexpkg_files/figure-html/wordcloud-1.png and b/docs/articles/eurlexpkg_files/figure-html/wordcloud-1.png differ diff --git a/docs/articles/index.html b/docs/articles/index.html index e5b156b..1a3e54b 100644 --- a/docs/articles/index.html +++ b/docs/articles/index.html @@ -79,7 +79,7 @@ eurlex - 0.3.4 + 0.3.5
    diff --git a/docs/authors.html b/docs/authors.html index b8fc41e..6a118cd 100644 --- a/docs/authors.html +++ b/docs/authors.html @@ -79,7 +79,7 @@ eurlex - 0.3.4 + 0.3.5
    diff --git a/docs/index.html b/docs/index.html index 52ee2a8..064a425 100644 --- a/docs/index.html +++ b/docs/index.html @@ -38,7 +38,7 @@ eurlex - 0.3.4 + 0.3.5
    @@ -108,9 +108,9 @@

  • dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()
  • ids %>% dplyr::left_join(lbs) %>% dplyr::left_join(dates)
  • -

    rather than elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE). This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data.

    +

    rather than elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE). This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data. Always keep an eye on whether the work and celex columns identify rows uniquely or not.
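One way to check whether the identifiers are unique is sketched below in base R. The data frame is a hand-made stand-in for a joined query result (its values are placeholders, not real Cellar output), and the object names are assumptions made for the example.

```r
# Hypothetical joined result: the second document has two legal bases,
# so its identifiers appear on two rows
res <- data.frame(
  work  = c("work-1", "work-2", "work-2", "work-3"),
  celex = c("celex-1", "celex-2", "celex-2", "celex-3"),
  lbs   = c("lbs-a", "lbs-b", "lbs-c", NA),
  stringsAsFactors = FALSE
)

# TRUE only if every row has a distinct work/celex pair
ids_unique <- anyDuplicated(res[, c("work", "celex")]) == 0
ids_unique

# Rows that share a CELEX number with at least one other row
dups <- res[duplicated(res$celex) | duplicated(res$celex, fromLast = TRUE), ]
dups
```

If `ids_unique` is `FALSE`, counting rows is not the same as counting documents, and any later join on `celex` will multiply rows accordingly.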

    One of the main contributions of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the rvest package), the function elx_fetch_data() enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI). The function currently enables downloading the title and the full text of a document in all available languages.

    -

    See the vignette for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features.

    +

    See the vignette for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features. Example use cases are shown in this paper.

    @@ -123,6 +123,23 @@

    Neither this package nor its author is in any way affiliated with the EU Publications Office. Please refer to the applicable data reuse policies.

    Please consider contributing to the maintenance and development of the package by reporting bugs or suggesting new features.

    +
    +

    +Latest changes

    +
    +

    +eurlex 0.3.5

    + +
    +

    Useful resources

    diff --git a/docs/news/index.html b/docs/news/index.html index e985a87..01d636d 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -79,7 +79,7 @@ eurlex - 0.3.4 + 0.3.5
    @@ -139,7 +139,7 @@

    • it is now possible to select all resource types available with elx_make_query(resource_type = "any"). Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
    • results can be restricted to a particular directory code with elx_make_query(directory = "18") (directory code “18” denotes Common Foreign and Security Policy)
    • -
    • results can be restricted to a particular sector with elx_make_query(sector = 2) (sector code 3 denotes EU international agreements)
    • +
    • results can be restricted to a particular sector with elx_make_query(sector = 2) (sector code 2 denotes EU international agreements)

    diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 26598c4..c90be6b 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -3,5 +3,5 @@ pkgdown: 1.5.1 pkgdown_sha: ~ articles: eurlexpkg: eurlexpkg.html -last_built: 2021-03-08T23:15Z +last_built: 2021-03-14T13:10Z diff --git a/docs/reference/elx_council_votes.html b/docs/reference/elx_council_votes.html index 5c46eb8..07882fe 100644 --- a/docs/reference/elx_council_votes.html +++ b/docs/reference/elx_council_votes.html @@ -80,7 +80,7 @@ eurlex - 0.3.4 + 0.3.5
    diff --git a/docs/reference/elx_curia_list.html b/docs/reference/elx_curia_list.html index 67f64ed..4a9e65f 100644 --- a/docs/reference/elx_curia_list.html +++ b/docs/reference/elx_curia_list.html @@ -81,7 +81,7 @@ eurlex - 0.3.4 + 0.3.5
    @@ -137,7 +137,10 @@

    Scrape list of court cases from Curia

    CELEX identifiers are extracted from hyperlinks where available.

    -
    elx_curia_list(data = c("all", "ecj_old", "ecj_new", "gc_all", "cst_all"))
    +
    elx_curia_list(
    +  data = c("all", "ecj_old", "ecj_new", "gc_all", "cst_all"),
    +  parse = TRUE
    +)

    Arguments

    @@ -147,11 +150,16 @@

    Arg

    + + + +

    Data to be scraped from four separate lists of cases maintained by Curia, defaults to "all" which contains cases from Court of Justice, General Court and Civil Service Tribunal.

    parse

    If `TRUE`, references to cases and appeals are parsed out from `case_info` into separate columns

    Value

    -

    A data frame containing case identifiers and information as character columns.

    +

    A data frame containing case identifiers and information as character columns. Where the case id +contains a hyperlink to Eur-Lex, the CELEX identifier is retrieved as well.

    Examples

    # \donttest{ diff --git a/docs/reference/elx_fetch_data.html b/docs/reference/elx_fetch_data.html index d61b510..cc5b191 100644 --- a/docs/reference/elx_fetch_data.html +++ b/docs/reference/elx_fetch_data.html @@ -80,7 +80,7 @@ eurlex - 0.3.4 + 0.3.5
    diff --git a/docs/reference/elx_label_eurovoc.html b/docs/reference/elx_label_eurovoc.html index eef7929..b20829e 100644 --- a/docs/reference/elx_label_eurovoc.html +++ b/docs/reference/elx_label_eurovoc.html @@ -80,7 +80,7 @@ eurlex - 0.3.4 + 0.3.5
    diff --git a/docs/reference/elx_make_query.html b/docs/reference/elx_make_query.html index f146474..12f39b1 100644 --- a/docs/reference/elx_make_query.html +++ b/docs/reference/elx_make_query.html @@ -82,7 +82,7 @@ eurlex - 0.3.4 + 0.3.5
    @@ -141,8 +141,10 @@

    Create SPARQL queries

    elx_make_query(
       resource_type = c("directive", "regulation", "decision", "recommendation", "intagr",
    -    "caselaw", "manual", "proposal", "national_impl"),
    +    "caselaw", "manual", "proposal", "national_impl", "any"),
       manual_type = "",
    +  directory = NULL,
    +  sector = NULL,
       include_corrigenda = FALSE,
       include_celex = TRUE,
       include_lbs = FALSE,
    @@ -150,10 +152,14 @@ 

    Create SPARQL queries

    include_date_force = FALSE, include_date_endvalid = FALSE, include_date_transpos = FALSE, + include_date_lodged = FALSE, include_force = FALSE, include_eurovoc = FALSE, include_author = FALSE, include_citations = FALSE, + include_court_procedure = FALSE, + include_directory = FALSE, + include_sector = FALSE, order = FALSE, limit = NULL )
    @@ -169,6 +175,14 @@

    Arg manual_type

    Define manually the type of resource to be retrieved

    + + directory +

    Restrict the results to a given directory code

    + + + sector +

    Restrict the results to a given sector code

    + include_corrigenda

    If `TRUE`, results include corrigenda

    @@ -197,6 +211,10 @@

    Arg include_date_transpos

    If `TRUE`, results include date of transposition deadline for directives

    + + include_date_lodged +

    If `TRUE`, results include date a court case was lodged with the court

    + include_force

    If `TRUE`, results include whether legislation is in force

    @@ -213,6 +231,18 @@

    Arg include_citations

    If `TRUE`, results include citations (CELEX-labelled)

    + + include_court_procedure +

    If `TRUE`, results include type of court procedure and outcome

    + + + include_directory +

    If `TRUE`, results include the Eur-Lex directory code

    + + + include_sector +

    If `TRUE`, results include the Eur-Lex sector code

    + order

    Order results by ids

    diff --git a/docs/reference/elx_parse_xml.html b/docs/reference/elx_parse_xml.html index 1bf612e..958f88d 100644 --- a/docs/reference/elx_parse_xml.html +++ b/docs/reference/elx_parse_xml.html @@ -80,7 +80,7 @@ eurlex - 0.3.4 + 0.3.5

    diff --git a/docs/reference/elx_run_query.html b/docs/reference/elx_run_query.html index 9f0551a..53a9bc9 100644 --- a/docs/reference/elx_run_query.html +++ b/docs/reference/elx_run_query.html @@ -81,7 +81,7 @@ eurlex - 0.3.4 + 0.3.5
    @@ -162,7 +162,7 @@

    Value

    Examples

    # \donttest{ -elx_run_query(elx_make_query("directive", include_force = TRUE))
    #> # A tibble: 4,316 x 4 +elx_run_query(elx_make_query("directive", include_force = TRUE))
    #> # A tibble: 4,317 x 4 #> work type celex force #> <chr> <chr> <chr> <chr> #> 1 http://publications.europa.eu/res~ http://publications.europa.e~ 31979~ false @@ -175,7 +175,7 @@

    Examp #> 8 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ false #> 9 http://publications.europa.eu/res~ http://publications.europa.e~ 31974~ false #> 10 http://publications.europa.eu/res~ http://publications.europa.e~ 31982~ false -#> # ... with 4,306 more rows

    # } +#> # ... with 4,307 more rows
    # }
    diff --git a/man/elx_make_query.Rd b/man/elx_make_query.Rd index 042c2e6..d99885b 100644 --- a/man/elx_make_query.Rd +++ b/man/elx_make_query.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/elx_make_query.R \name{elx_make_query} \alias{elx_make_query} -\title{Create SPARQL quries} +\title{Create SPARQL queries} \usage{ elx_make_query( resource_type = c("directive", "regulation", "decision", "recommendation", "intagr", @@ -78,7 +78,7 @@ A character string containing the SPARQL query \description{ Generates pre-defined or manual SPARQL queries to retrieve document ids from Cellar. List of available resource types: http://publications.europa.eu/resource/authority/resource-type . -Note that not all resource types are compatible with the pre-defined query. +Note that not all resource types are compatible with default parameter values. } \examples{ elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) diff --git a/vignettes/eurlexpkg.Rmd b/vignettes/eurlexpkg.Rmd index df6e722..05b1e3d 100644 --- a/vignettes/eurlexpkg.Rmd +++ b/vignettes/eurlexpkg.Rmd @@ -29,6 +29,8 @@ The `eurlex` R package attempts to significantly reduce the overhead associated The `eurlex` package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: `elx_make_query()` to create pre-defined or customized SPARQL queries; `elx_run_query()` to execute the pre-made or any other manually input query; and `elx_fetch_data()` to fire GET requests for certain metadata to the REST API. +The package also contains largely self-explanatory functions for retrieving data on EU court cases (`elx_curia_list()`) and Council votes (`elx_council_votes()`) from outside Eur-Lex. 
+ ## `elx_make_query()`: Generate SPARQL queries The function `elx_make_query` takes as its first argument the type of resource to be retrieved from the semantic database that powers Eur-Lex (and other publications) called Cellar. @@ -55,13 +57,13 @@ The choice of resource type is then reflected in the SPARQL query generated by t ```{r} query_dir %>% - glue::as_glue() # for nicer printing + cat() # for nicer printing elx_make_query(resource_type = "caselaw") %>% - glue::as_glue() + cat() elx_make_query(resource_type = "manual", manual_type = "SWD") %>% - glue::as_glue() + cat() ``` @@ -69,21 +71,33 @@ There are various ways of querying the same information in the Cellar database d The other arguments in `elx_make_query()` relate to additional metadata to be returned. The results include by default the [CELEX number](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) and exclude corrigenda (corrections of errors in legislation). Other data needs to be opted into. Make sure to select ones that are logically compatible (e.g. case law does not have a legal basis). More options should be added in the future. -Note that availability of data for each variable has an impact on the results. The data frame returned by the query will be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids. +Note that availability of data for each variable might have an impact on the results. The data frame returned by the query might be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids. 
```{r} elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "directive") elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% - glue::as_glue() + cat() # minimal query: elx_make_query(resource_type = "recommendation") ``` +You can also decide to not specify any resource types, in which case all types of documents will be returned. As there are over a million documents with a CELEX identifier, this is likely not efficient for a majority of users. But since version 0.3.5 it is possible to request documents belonging to a particular ["sector"](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) or [directory code](https://eur-lex.europa.eu/browse/directories/legislation.html). + +```{r} +# request documents from directory 18 ("Common Foreign and Security Policy") +# and sector 3 ("Legal acts") + +elx_make_query(resource_type = "any", + directory = "18", + sector = 3) %>% + cat() +``` + Now that we have a query, we are ready to run it. ## `elx_run_query()`: Execute SPARQL queries @@ -208,7 +222,9 @@ Directives become naturally outdated with time. It might be all the more interes ```{r} dirs %>% - ggplot(aes(x = as.Date(date), y = celex)) + + filter(!is.na(force)) %>% + mutate(date = as.Date(date)) %>% + ggplot(aes(x = date, y = celex)) + geom_point(aes(color = force), alpha = 0.1) + theme(axis.text.y = element_blank(), axis.line.y = element_blank(), @@ -246,6 +262,6 @@ dirs_1970_title %>% I use term-frequency inverse-document frequency (tf-idf) to weight the importance of the words in the wordcloud. If we used pure frequencies, the wordcloud would largely consist of words conveying little meaning ("the", "and", ...). -This is an extremely basic application of the `eurlex` package. 
Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing it. +This is an extremely basic application of the `eurlex` package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing the [accompanying paper](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150).^[Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: [10.1080/2474736X.2020.1870150](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150)]