diff --git a/docs/404.html b/docs/404.html
index f9b0671..dd2783d 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -100,8 +100,7 @@
-Retrieve data on European Union law in R with pre-defined SPARQL and REST queries.
+An R package for retrieving official data on European Union law and policy.
Install from CRAN via install.packages("eurlex"). The development version is available via remotes::install_github("michalovadek/eurlex").
The eurlex R package attempts to significantly reduce the overhead associated with using the SPARQL and REST APIs made available by the EU Publications Office. Compared to web-scraping, the package provides simpler, more efficient and more transparent access to data on European Union laws and policies.
The eurlex package currently envisions the typical use case as getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() to create pre-defined or customized SPARQL queries; elx_run_query() to execute the pre-made or any other manually written query; and elx_fetch_data() to fire GET requests for certain metadata to the REST API.
The function elx_make_query() takes as its first argument the type of resource to be retrieved (such as “directive”) from Cellar, the semantic database that powers EUR-Lex and other publications. If you are familiar with SPARQL, you can always write your own queries and execute them with elx_run_query().
elx_run_query() executes SPARQL queries on a pre-specified endpoint of the EU Publications Office. It outputs a data.frame where each column corresponds to one of the requested variables, while the rows accumulate observations of the resource type satisfying the query criteria. The more data a query returns, the longer it takes to execute: anywhere from a few seconds to several hours, depending also on your connection. The first column always contains the unique URI of a “work” (legislative act or court judgment), which identifies each resource in Cellar. Several human-readable identifiers are normally associated with each “work”, but the most useful one is CELEX, which is retrieved by default.
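Because a CELEX number packs several pieces of information into one string, it can be handy to split it into parts after retrieval. The sketch below is a hypothetical helper, parse_celex(), which is not part of eurlex; it assumes the common layout of a one-character sector, a four-digit year, a one- or two-letter document type and a document number, and it will not handle every CELEX variant.

```r
# Hypothetical helper (not part of eurlex): split CELEX numbers into components.
# Assumed layout: sector (1 char), year (4 digits), document type (1-2 letters),
# then the document number (the rest of the string).
parse_celex <- function(celex) {
  pattern <- "^([0-9A-Z])([0-9]{4})([A-Z]{1,2})(.+)$"
  ok <- grepl(pattern, celex)
  # Extract capture group i; codes not matching the pattern get NA
  part <- function(i) ifelse(ok, sub(pattern, paste0("\\", i), celex), NA_character_)
  data.frame(
    celex  = celex,
    sector = part(1),
    year   = as.integer(part(2)),
    type   = part(3),
    number = part(4),
    stringsAsFactors = FALSE
  )
}

parse_celex(c("31979L0007", "32019H1115(01)"))
```

Codes that do not fit the assumed pattern come back with NA in every component, which makes them easy to spot and inspect by hand.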
For the moment, it is recommended to retrieve metadata one variable at a time. For example, if you wish to obtain the legal bases of directives and the date of transposition, you should run separate calls:
library(eurlex)
library(dplyr) # provides %>% and left_join()

ids <- elx_make_query("directive") %>% elx_run_query()
lbs <- elx_make_query("directive", include_lbs = TRUE) %>% elx_run_query()
dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()
ids %>% left_join(lbs) %>% left_join(dates)
rather than elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE). This approach is usually faster and also makes the returned data frame(s) easier to understand, especially when some variables contain missing or duplicated data. Always check whether the work and celex columns identify rows uniquely.
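One way to act on this advice is to test the key columns before and after each join. The base R sketch below uses two hypothetical helpers, rows_unique() and collapse_by(), neither of which is part of eurlex; it assumes the work and celex column names described above, while the name of the multi-valued column has to be supplied by you.

```r
# Hypothetical helpers (not part of eurlex), base R only.

# TRUE if the combination of `cols` identifies every row of `df` exactly once
rows_unique <- function(df, cols = c("work", "celex")) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

# Collapse a multi-valued column (e.g. several legal bases per act) into a
# single sep-separated string per key, so a later join adds no extra rows.
# Keys come back in sorted order; rows with an NA key are dropped.
collapse_by <- function(df, key, value_col, sep = "; ") {
  collapsed <- tapply(df[[value_col]], df[[key]],
                      function(v) paste(unique(v), collapse = sep))
  out <- data.frame(names(collapsed), as.vector(collapsed),
                    stringsAsFactors = FALSE)
  names(out) <- c(key, value_col)
  out
}

# Usage with the objects from the directive example (column name is a placeholder):
# rows_unique(ids)
# if (!rows_unique(lbs)) lbs <- collapse_by(lbs, "celex", "<legal basis column>")
```

Collapsing trades a tidy one-value-per-row layout for a guarantee that joining onto ids keeps one row per act; which of the two you want depends on the analysis.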
One of the main contributions of the SPARQL queries is a comprehensive list of identifiers that can subsequently be used to obtain more data on the documents in question. While the results of the SPARQL queries are also useful for web-scraping (with the rvest package), the function elx_fetch_data() enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URIs). The function currently enables downloading the title and the full text of a document in all available languages.
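Since elx_fetch_data() works from URLs, a common preparatory step is turning the celex column into request URLs. The helper below, celex_to_url(), is a hypothetical sketch rather than package code: the base address http://publications.europa.eu/resource/celex/ is an assumed CELEX-based URL pattern, and percent-encoding reserved characters such as the parentheses in 32019H1115(01) is one plausible way to keep such URLs valid.

```r
# Hypothetical helper (not part of eurlex): build CELEX-based resource URLs.
# The base address is an assumed URL pattern, not taken from the package.
celex_to_url <- function(celex,
                         base = "http://publications.europa.eu/resource/celex/") {
  # URLencode() is not vectorised, hence vapply(); reserved = TRUE also
  # percent-encodes "(" and ")", e.g. "(01)" becomes "%2801%29"
  encoded <- vapply(celex, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, encoded)
}

urls <- celex_to_url(c("32014R0001", "32019H1115(01)"))

# The URLs could then be passed to the package one at a time
# (requires network access; argument names assumed from the package docs):
# titles <- lapply(urls, eurlex::elx_fetch_data, type = "title")
```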
See the vignette for a walkthrough on how to use the package. Check the function documentation for the most up-to-date overview of features. Example use cases are shown in this paper.
Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150
Neither this package nor its author is in any way affiliated with the EU Publications Office. Please refer to the applicable data reuse policies.
Please consider contributing to the maintenance and development of the package by reporting bugs or suggesting new features.
elx_download_xml()
elx_make_query(include_ecli = TRUE)
elx_run_query() now fails gracefully in the presence of internet or server problems
elx_fetch_data() now automatically fixes URLs with parentheses (e.g. “32019H1115(01)” used to fail)
elx_parse_xml() is no longer an exported function
it is now possible to select all available resource types with elx_make_query(resource_type = "any"). Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
results can be restricted to a particular directory code with elx_make_query(directory = "18") (directory code “18” denotes Common Foreign and Security Policy)
results can be restricted to a particular sector with elx_make_query(sector = 2) (sector code 2 denotes EU international agreements)
new feature: request the date of court case submission with elx_make_query(include_date_lodged = TRUE)
new feature: request the type of court procedure and its outcome with elx_make_query(include_court_procedure = TRUE)
new feature: request the directory code of a legal act with elx_make_query(include_directory = TRUE)
elx_curia_list() has a new default parameter parse = TRUE, which creates separate columns for ecli, see_case and appeal by applying regular expressions to case_info
elx_make_query(include_author = TRUE)
elx_fetch_data() now prefers CELEX-based URLs (instead of Cellar URIs) as input, as they appear to yield fewer missing documents
elx_fetch_data("text") now retrieves plain text from HTML, PDF and MS Word documents
elx_curia_list()
elx_label_eurovoc()
elx_council_votes() made fully operational
elx_curia_list() to retrieve the full list of EU court cases
elx_fetch_data()
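The parse = TRUE option of elx_curia_list() works by applying regular expressions to case_info. As a rough illustration of that technique (not the package's actual patterns), the hypothetical extract_ecli() helper below pulls the first identifier of the assumed form ECLI:EU:C:2019:123 out of free text using base R only.

```r
# Hypothetical helper (not part of eurlex): extract the first EU ECLI from text.
# Assumed format: "ECLI:EU:" + court letter + ":" + year + ":" + number.
extract_ecli <- function(x) {
  m <- regexpr("ECLI:EU:[A-Z]:[0-9]{4}:[0-9]+", x)
  out <- rep(NA_character_, length(x))
  # regexpr() gives -1 for no match and NA for NA input; skip both
  hit <- !is.na(m) & m != -1
  # regmatches() returns only the successful matches, in input order
  out[hit] <- regmatches(x, m)
  out
}

extract_ecli(c("Judgment of the Court, ECLI:EU:C:2019:123", "no identifier here"))
```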
-elx_run_query.Rd
Executes a cURL request to a pre-defined endpoint of the EU Publications Office.
Relies on elx_make_query() to generate valid SPARQL queries.
+Results are capped at 1 million rows.
elx_run_query(
  query = "",
  endpoint = "http://publications.europa.eu/webapi/rdf/sparql"
)
query: A valid SPARQL query generated by `elx_make_query()` or written manually
endpoint: SPARQL endpoint
A data frame containing the results of the SPARQL query. Column `work` contains the Cellar URI of the resource. Rows with a missing value in any of the requested variables are dropped.
# \donttest{
elx_run_query(elx_make_query("directive", include_force = TRUE))
#> # A tibble: 4,367 x 4
#>    work                               type                          celex  force
#>    <chr>                              <chr>                         <chr>  <chr>
#>  1 http://publications.europa.eu/res~ http://publications.europa.e~ 31979~ false
#>  2 http://publications.europa.eu/res~ http://publications.europa.e~ 31989~ false
#>  3 http://publications.europa.eu/res~ http://publications.europa.e~ 31984~ false
#>  4 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ true
#>  5 http://publications.europa.eu/res~ http://publications.europa.e~ 31993~ false
#>  6 http://publications.europa.eu/res~ http://publications.europa.e~ 31992~ false
#>  7 http://publications.europa.eu/res~ http://publications.europa.e~ 31983~ false
#>  8 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ false
#>  9 http://publications.europa.eu/res~ http://publications.europa.e~ 31974~ false
#> 10 http://publications.europa.eu/res~ http://publications.europa.e~ 31982~ false
#> # ... with 4,357 more rows
# }
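In the example output the force column holds the strings "true" and "false", and the description mentions a cap of 1 million rows. A small post-processing sketch could handle both; tidy_force() is a hypothetical helper (not part of eurlex), and the force column name is taken from the example output.

```r
# Hypothetical post-processing helper (not part of eurlex).
tidy_force <- function(res, cap = 1e6) {
  # A result that reaches the documented row cap may have been truncated
  if (nrow(res) >= cap) {
    warning("Result has ", nrow(res), " rows; it may have been truncated at the cap.")
  }
  # as.logical() maps "true"/"false" to TRUE/FALSE; unrecognised strings become NA
  res$force <- as.logical(res$force)
  res
}

# Usage (requires network access):
# directives <- tidy_force(elx_run_query(elx_make_query("directive", include_force = TRUE)))
# table(directives$force, useNA = "ifany")
```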
Developed by Michal Ovadek.
Site built with pkgdown 2.0.2.
All functions

Function | Description
---|---
elx_council_votes() | Retrieve Council votes on EU acts
elx_curia_list() | Scrape list of court cases from Curia
elx_download_xml() | Download XML notice associated with a URL
elx_fetch_data() | Retrieve additional data on EU documents
elx_label_eurovoc() | Label EuroVoc concepts
elx_make_query() | Create SPARQL queries
elx_run_query() | Execute SPARQL queries