diff --git a/docs/404.html b/docs/404.html
index f9b0671..dd2783d 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -100,8 +100,7 @@

Page not found (404)

-

Site built with pkgdown -2.0.2.

+

Site built with pkgdown 2.0.2.

diff --git a/docs/articles/index.html b/docs/articles/index.html
index 400d232..ef66ec5 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -1,82 +1,80 @@
- -Articles • eurlex - - -
-
- - - -
-
- - -
-

All vignettes

-

- -
eurlex: Retrieve data on European Union law in R
-

Retrieve data on European Union law in R with pre-defined SPARQL and -REST queries.

-
-
-
- - -
- - - - - - - - + +Articles • eurlex + + +
+
+ + + +
+
+ + +
+

All vignettes

+

+ +
eurlex: Retrieve data on European Union law in R
+

Retrieve data on European Union law in R with pre-defined SPARQL and REST queries.

+
+
+
+ + +
+ + + + + + + + +
diff --git a/docs/authors.html b/docs/authors.html
index fa1e85f..e5bea49 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -1,106 +1,105 @@
- -Authors and Citation • eurlex - - -
-
- - - -
-
-
- - - -
  • -

    Michal Ovadek. Author, maintainer, copyright holder. -

    -
  • -
-
-
-

Citation

- -
-
- - -

Ovadek M (2021). -“Facilitating access to data on European Union laws.” -Political Research Exchange, 3. -doi: 10.1080/2474736X.2020.1870150. -

-
@Article{,
-  title = {Facilitating access to data on European Union laws},
-  author = {Michal Ovadek},
-  year = {2021},
-  journal = {Political Research Exchange},
-  volume = {3},
-  issue = {1},
-  doi = {10.1080/2474736X.2020.1870150},
-}
- -
- -
- - - -
- - - - - - - - + +Authors and Citation • eurlex + + +
+
+ + + +
+
+
+ + + +
  • +

    Michal Ovadek. Author, maintainer, copyright holder. +

    +
  • +
+
+
+

Citation

+ +
+
+ + +

Ovadek M (2021). +“Facilitating access to data on European Union laws.” +Political Research Exchange, 3. +doi: 10.1080/2474736X.2020.1870150. +

+
@Article{,
+  title = {Facilitating access to data on European Union laws},
+  author = {Michal Ovadek},
+  year = {2021},
+  journal = {Political Research Exchange},
+  volume = {3},
+  issue = {1},
+  doi = {10.1080/2474736X.2020.1870150},
+}
+ +
+ +
+ + + +
+ + + + + + + + +
diff --git a/docs/index.html b/docs/index.html
index 2ae1f25..aacef63 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -78,98 +78,42 @@
-

CRAN_Status_Badge CRAN_Downloads # eurlex -

-

An R package for retrieving official data on European Union law and -policy.

+

CRAN_Status_Badge CRAN_Downloads # eurlex

+

An R package for retrieving official data on European Union law and policy.

Installation

Install from CRAN via install.packages("eurlex").

-

The development version is available via -remotes::install_github("michalovadek/eurlex").

+

The development version is available via remotes::install_github("michalovadek/eurlex").

Usage

-

The eurlex R package attempts to significantly reduce -the overhead associated with using SPARQL and REST APIs made available -by the EU Publications Office. Compared to web-scraping, the package -provides simpler, more efficient and transparent access to data on -European Union laws and policies.

-

The eurlex package currently envisions the typical -use-case to consist of getting bulk information about EU legislation -into R as fast as possible. The package contains three core functions to -achieve that objective: elx_make_query() to create -pre-defined or customized SPARQL queries; elx_run_query() -to execute the pre-made or any other manually input query; and -elx_fetch_data() to fire GET requests for certain metadata -to the REST API.

-

The function elx_make_query takes as its first argument -the type of resource to be retrieved (such as “directive”) from the -semantic database that powers Eur-Lex (and other publications) called -Cellar. If you are familiar with SPARQL, you can always specify your own -queries and execute them with elx_run_query().

-

elx_run_query() executes SPARQL queries on a -pre-specified endpoint of the EU Publications Office. It outputs a -data.frame where each column corresponds to one of the -requested variables, while the rows accumulate observations of the -resource type satisfying the query criteria. Obviously, the more data is -to be returned, the longer the execution time, varying from a few -seconds to several hours, depending also on your connection. The first -column always contains the unique URI of a “work” (legislative act or -court judgment) which identifies each resource in Cellar. Several -human-readable identifiers are normally associated with each “work” but -the most useful one is CELEX, -retrieved by default.

-

For the moment, it is recommended to retrieve metadata one variable -at a time. For example, if you wish to obtain the legal bases of -directives and the date of transposition, you should run separate -calls:

+

The eurlex R package attempts to significantly reduce the overhead associated with using SPARQL and REST APIs made available by the EU Publications Office. Compared to web-scraping, the package provides simpler, more efficient and transparent access to data on European Union laws and policies.

+

The eurlex package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() to create pre-defined or customized SPARQL queries; elx_run_query() to execute the pre-made or any other manually input query; and elx_fetch_data() to fire GET requests for certain metadata to the REST API.

+

The function elx_make_query takes as its first argument the type of resource to be retrieved (such as “directive”) from the semantic database that powers Eur-Lex (and other publications) called Cellar. If you are familiar with SPARQL, you can always specify your own queries and execute them with elx_run_query().

+

elx_run_query() executes SPARQL queries on a pre-specified endpoint of the EU Publications Office. It outputs a data.frame where each column corresponds to one of the requested variables, while the rows accumulate observations of the resource type satisfying the query criteria. Obviously, the more data is to be returned, the longer the execution time, varying from a few seconds to several hours, depending also on your connection. The first column always contains the unique URI of a “work” (legislative act or court judgment) which identifies each resource in Cellar. Several human-readable identifiers are normally associated with each “work” but the most useful one is CELEX, retrieved by default.

+

For the moment, it is recommended to retrieve metadata one variable at a time. For example, if you wish to obtain the legal bases of directives and the date of transposition, you should run separate calls:

-  1. ids <- elx_make_query("directive") %>%
-     elx_run_query()
-  2. lbs <- elx_make_query("directive", include_lbs = TRUE)
-     %>% elx_run_query()
-  3. dates <- elx_make_query("directive", include_date_transpos
-     = TRUE) %>% elx_run_query()
-  4. ids %>% dplyr::left_join(lbs) %>%
-     dplyr::left_join(dates)
+  1. ids <- elx_make_query("directive") %>% elx_run_query()
+  2. lbs <- elx_make_query("directive", include_lbs = TRUE) %>% elx_run_query()
+  3. dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()
+  4. ids %>% dplyr::left_join(lbs) %>% dplyr::left_join(dates)
-

rather than elx_make_query("directive", include_lbs = TRUE, -include_date_transpos = TRUE). This approach is usually faster -and should also make it easier to understand the returned data frame(s), -especially when some variables contain missing or duplicated data. -Always keep an eye on whether the work and -celex columns identify rows uniquely or not.

-

One of the main contributions of the SPARQL requests is that we -obtain a comprehensive list of identifiers that we can subsequently use -to obtain more data relating to the document in question. While the -results of the SPARQL queries are useful also for webscraping (with the -rvest package), the function elx_fetch_data() -enables us to fire GET requests to retrieve data on documents with known -identifiers (including Cellar URI). The function currently enables -downloading the title and the full text of a document in all available -languages.

-

See the vignette -for a walkthrough on how to use the package. Check the function -documentation for the most up-to-date overview of features. Example use -cases are shown in this paper.

+

rather than elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE). This approach is usually faster and should also make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data. Always keep an eye on whether the work and celex columns identify rows uniquely or not.
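
The advice about checking whether the work and celex columns identify rows uniquely can be sketched on toy data. This is only an illustration in base R: the data frames below are invented stand-ins shaped like query results (the identifiers and the contents of the lbs column are hypothetical), no request is sent to Cellar, and merge() is used in place of dplyr::left_join() to keep the sketch dependency-free.

```r
# Toy stand-ins for query results; all values are invented for illustration.
ids <- data.frame(
  work  = c("w1", "w2", "w3"),
  celex = c("31979L0001", "31989L0002", "31984L0003"),
  stringsAsFactors = FALSE
)
lbs <- data.frame(
  work  = c("w1", "w1", "w3"),
  celex = c("31979L0001", "31979L0001", "31984L0003"),
  lbs   = c("basis_a", "basis_b", "basis_c"),
  stringsAsFactors = FALSE
)

# Left join on explicit keys (base R counterpart of dplyr::left_join).
joined <- merge(ids, lbs, by = c("work", "celex"), all.x = TRUE)

# Count rows per CELEX; a count above 1 means the key no longer
# identifies rows uniquely after the join.
rows_per_celex <- table(joined$celex)
duplicated_keys <- names(rows_per_celex[rows_per_celex > 1])

# TRUE only if every CELEX appears exactly once.
celex_is_unique <- anyDuplicated(joined$celex) == 0
```

Here the first act has two legal bases, so it occupies two rows after the join, and the act without a legal basis is kept with a missing value. Naming the join keys explicitly makes it easier to see which columns are expected to match.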

+

One of the main contributions of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the rvest package), the function elx_fetch_data() enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI). The function currently enables downloading the title and the full text of a document in all available languages.

+

See the vignette for a walkthrough on how to use the package. Check the function documentation for the most up-to-date overview of features. Example use cases are shown in this paper.

Cite

-

Michal Ovádek (2021) Facilitating access to data on European Union -laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150

+

Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: 10.1080/2474736X.2020.1870150

Note

-

Neither this package nor its author is in any way affiliated with the EU -Publications Office. Please refer to the applicable data -reuse policies.

-

Please consider contributing to the maintenance and development of -the package by reporting bugs or suggesting new features.

+

Neither this package nor its author is in any way affiliated with the EU Publications Office. Please refer to the applicable data reuse policies.

+

Please consider contributing to the maintenance and development of the package by reporting bugs or suggesting new features.

Latest changes
@@ -178,11 +122,9 @@

Latest changes

eurlex 0.4.0

    -
  • download XML notices associated with Cellar URLs with -elx_download_xml() +
  • download XML notices associated with Cellar URLs with elx_download_xml()
  • -
  • retrieve European Case Law Identifier (ECLI) with -elx_make_query(include_ecli = TRUE) +
  • retrieve European Case Law Identifier (ECLI) with elx_make_query(include_ecli = TRUE)
@@ -191,11 +133,9 @@

eurlex 0.3.6

elx_run_query() now fails gracefully in presence of -internet/server problems
+elx_run_query() now fails gracefully in presence of internet/server problems
  • -elx_fetch_data() now automatically fixes urls with -parentheses (e.g. “32019H1115(01)” used to fail)
  • +elx_fetch_data() now automatically fixes urls with parentheses (e.g. “32019H1115(01)” used to fail)
  • minor fixes to vignette
  • elx_parse_xml no longer an exported function
@@ -205,26 +145,13 @@

    eurlex 0.3.6

    eurlex 0.3.5

    @@ -283,8 +210,7 @@

    Developers

    -

    Site built with pkgdown -2.0.2.

    +

    Site built with pkgdown 2.0.2.

diff --git a/docs/news/index.html b/docs/news/index.html
index 2ff82d1..6bb0376 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -1,74 +1,12 @@
- - - - - - - -Changelog • eurlex - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Changelog • eurlex - + + - - - -
    -
    - -
    - -
    +
    -
    -

    -eurlex 0.3.6 Unreleased -

    -
    -

    -Major changes

    -
      -
    • +
      + +
      +

      Major changes

      +
      • download XML notices associated with Cellar URLs with elx_download_xml() +
      • +
      • retrieve European Case Law Identifier (ECLI) with elx_make_query(include_ecli = TRUE) +
      • +
      +
      +

      Minor changes

      +
      • host of smaller code improvements in elx_fetch_data() +
      • +
      • more consistent and strict error generation across all server-interacting functions
      • +
      • started adding unit tests
      • +
      +
      +
      + +
      +

      Major changes

      +
      • elx_run_query() now fails gracefully in presence of internet/server problems
      • elx_fetch_data() now automatically fixes urls with parentheses (e.g. “32019H1115(01)” used to fail)
      • -
      -
      -
      -

      -Minor changes

      -
        -
      • minor fixes to vignette
      • +
      +
      +

      Minor changes

      +
      • minor fixes to vignette
      • elx_parse_xml no longer an exported function
      • -
      -
      -
      -
      -

      -eurlex 0.3.5 2021-03-14 -

      -
      -

      -Major changes

      -
        -
      • it is now possible to select all resource types available with elx_make_query(resource_type = "any"). Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
      • -
      • results can be restricted to a particular directory code with elx_make_query(directory = "18") (directory code “18” denotes Common Foreign and Security Policy)
      • -
      • results can be restricted to a particular sector with elx_make_query(sector = 2) (sector code 2 denotes EU international agreements)
      • -
      -
      -
      -

      -Minor changes

      -
      +
      +
      + +
      +

      Major changes

      +
      • it is now possible to select all resource types available with elx_make_query(resource_type = "any"). Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
      • +
      • results can be restricted to a particular directory code with elx_make_query(directory = "18") (directory code “18” denotes Common Foreign and Security Policy)
      • +
      • results can be restricted to a particular sector with elx_make_query(sector = 2) (sector code 2 denotes EU international agreements)
      • +
      +
      +

      Minor changes

      +
      • new feature: request date of court case submission elx_make_query(include_date_lodged = TRUE)
      • -
      • new feature: request type of court procedure and outcome elx_make_query(include_court_procedure = TRUE) +
      • new feature: request type of court procedure and outcome elx_make_query(include_court_procedure = TRUE)
      • -
      • new feature: request directory code of legal act elx_make_query(include_directory = TRUE) +
      • new feature: request directory code of legal act elx_make_query(include_directory = TRUE)
      • elx_curia_list() has a new default parameter parse = TRUE which creates separate columns for ecli, see_case, appeal applying regular expressions on case_info
      • -
      -
      -
      -
      -

      -eurlex 0.3.4 2020-11-08 -

      -
      -

      -Major changes

      -
        -
      • new feature: request citations referenced in target resource with elx_make_query(include_citations = TRUE); retrieved in CELEX form
      • -
      • new feature: request document author(s) with elx_make_query(include_author = TRUE) +
      +
      +
      + +
      +

      Major changes

      +
      • new feature: request citations referenced in target resource with elx_make_query(include_citations = TRUE); retrieved in CELEX form
      • +
      • new feature: request document author(s) with elx_make_query(include_author = TRUE)
      • XML parsing is now more efficient due to utilising (rather than stripping) namespaces (but still room for improvement)
      • -
      -
      -
      -

      -Minor changes

      -
        -
      • fixed bug in elx_label_eurovoc whereby resulting data frames contained list-columns
      • -
      -
      -
      -
      -

      -eurlex 0.3.3 Unreleased -

      -
      -

      -Minor changes

      -
        -
      • hotfix for critical bug in xml parsing that scrambled column with legal basis where this was requested
      • -
      -
      -
      -
      -

      -eurlex 0.3.2 Unreleased -

      -
      -

      -Major changes

      -
        -
      • improvement to legal basis harvesting thanks to help from Eur-Lex insiders
      • +
      +
      +

      Minor changes

      +
      • fixed bug in elx_label_eurovoc whereby resulting data frames contained list-columns
      • +
      +
      +
      + +
      +

      Minor changes

      +
      • hotfix for critical bug in xml parsing that scrambled column with legal basis where this was requested
      • +
      +
      +
      + +
      +

      Major changes

      +
      • improvement to legal basis harvesting thanks to help from Eur-Lex insiders
      • legal basis results are now slightly more comprehensive and correct
      • legal basis results now include a new column detailing the “suffix” (paragraph, subparagraph, etc.) in string form
      • -
      -
      -
      -

      -Minor changes

      -
        -
      • minor updates to documentation
      • -
      -
      -
      -
      -

      -eurlex 0.3.1 2020-09-11 -

      -
      -

      -Minor changes

      -
        -
      • +
      +
      +

      Minor changes

      +
      • minor updates to documentation
      • +
      +
      +
      + +
      +

      Minor changes

      +
      • elx_fetch_data() now prefers CELEX-based URLs (instead of Cellar URIs) as input, as they appear to yield fewer missing documents
      • -
      -
      -
      -
      -

      -eurlex 0.3.0 Unreleased -

      -
      -

      -Major changes

      -
      +
      +
      + +
      +

      Major changes

      +
      • +elx_fetch_data("text") now retrieves plain text from html, pdf and MS Word documents
      • the type of source file is documented
      • added handling of multiple files: all available text is retrieved and concatenated
      • so far no support for images requiring OCR for text extraction for the sake of limiting dependencies and avoiding prolonging execution time
      • -
      +
    -
    -
    -

    -eurlex 0.2.3 Unreleased -

    -
    -

    -Minor changes

    -
    -
    -

    -eurlex 0.2.2 Unreleased -

    -
    -

    -Major changes

    -
      -
    • +
      + +
      +

      Major changes

      + -
      -
      -
      -

      -eurlex 0.2.1 2020-08-19 -

      -
      -

      -Minor changes

      -
        -
      • optimization, reducing dependencies, etc.
      • -
      -
      -
      -
      -

      -eurlex 0.2.0 Unreleased -

      -
      -

      -Major changes

      -
        -
      • addition of proposals and national implementing laws to possible SPARQL queries
      • +
      +
      +
      + +
      +

      Minor changes

      +
      • optimization, reducing dependencies, etc.
      • +
      +
      +
      + +
      +

      Major changes

      +
      • addition of proposals and national implementing laws to possible SPARQL queries
      • EuroVoc topics, retrievable in all EU languages, can now be included in SPARQL results
      • new date options (force, end of validity, transposition)
      • added elx_curia_list() to retrieve full list of EU court cases
      • -
      -
      -
      -

      -Minor changes

      -
        -
      • switch from XML to xml2
      • +
      +
      +

      Minor changes

      +
      • switch from XML to xml2
      • SPARQL package dependency removed
      • cascading language options for elx_fetch_data()
      • -
      -
      +
    +
    - - - + +
diff --git a/docs/reference/elx_run_query.html b/docs/reference/elx_run_query.html
index 2944aa5..26710fb 100644
--- a/docs/reference/elx_run_query.html
+++ b/docs/reference/elx_run_query.html
@@ -1,126 +1,127 @@
- -Execute SPARQL queries — elx_run_query • eurlex - - -
    -
    - - - -
    -
    - - -
    -

    Executes cURL request to a pre-defined endpoint of the EU Publications Office. -Relies on elx_make_query to generate valid SPARQL queries

    -
    - -
    -
    elx_run_query(
    -  query = "",
    -  endpoint = "http://publications.europa.eu/webapi/rdf/sparql"
    -)
    -
    - -
    -

    Arguments

    -
    query
    -

    A valid SPARQL query specified by `elx_make_query` or manually

    -
    endpoint
    -

    SPARQL endpoint

    -
    -
    -

    Value

    -

    A data frame containing the results of the SPARQL query. -Column `work` contains the Cellar URI of the resource. Rows with even one missing variable are dropped.

    -
    - -
    -

    Examples

    -
    # \donttest{
    -elx_run_query(elx_make_query("directive", include_force = TRUE))
    -#> # A tibble: 4,367 x 4
    -#>    work                               type                          celex  force
    -#>    <chr>                              <chr>                         <chr>  <chr>
    -#>  1 http://publications.europa.eu/res~ http://publications.europa.e~ 31979~ false
    -#>  2 http://publications.europa.eu/res~ http://publications.europa.e~ 31989~ false
    -#>  3 http://publications.europa.eu/res~ http://publications.europa.e~ 31984~ false
    -#>  4 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ true 
    -#>  5 http://publications.europa.eu/res~ http://publications.europa.e~ 31993~ false
    -#>  6 http://publications.europa.eu/res~ http://publications.europa.e~ 31992~ false
    -#>  7 http://publications.europa.eu/res~ http://publications.europa.e~ 31983~ false
    -#>  8 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ false
    -#>  9 http://publications.europa.eu/res~ http://publications.europa.e~ 31974~ false
    -#> 10 http://publications.europa.eu/res~ http://publications.europa.e~ 31982~ false
    -#> # ... with 4,357 more rows
    -# }
    -
    -
    -
    - -
    - - -
    - - - - - - - - + +Execute SPARQL queries — elx_run_query • eurlex + + +
    +
    + + + +
    +
    + + +
    +

    Executes cURL request to a pre-defined endpoint of the EU Publications Office. +Relies on elx_make_query to generate valid SPARQL queries. +Results are capped at 1 million rows.

    +
    + +
    +
    elx_run_query(
    +  query = "",
    +  endpoint = "http://publications.europa.eu/webapi/rdf/sparql"
    +)
    +
    + +
    +

    Arguments

    +
    query
    +

    A valid SPARQL query specified by `elx_make_query()` or manually

    +
    endpoint
    +

    SPARQL endpoint

    +
    +
    +

    Value

    +

    A data frame containing the results of the SPARQL query. +Column `work` contains the Cellar URI of the resource. Rows with even one missing variable are dropped.

    +
    + +
    +

    Examples

    +
    # \donttest{
    +elx_run_query(elx_make_query("directive", include_force = TRUE))
    +#> # A tibble: 4,367 x 4
    +#>    work                               type                          celex  force
    +#>    <chr>                              <chr>                         <chr>  <chr>
    +#>  1 http://publications.europa.eu/res~ http://publications.europa.e~ 31979~ false
    +#>  2 http://publications.europa.eu/res~ http://publications.europa.e~ 31989~ false
    +#>  3 http://publications.europa.eu/res~ http://publications.europa.e~ 31984~ false
    +#>  4 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ true 
    +#>  5 http://publications.europa.eu/res~ http://publications.europa.e~ 31993~ false
    +#>  6 http://publications.europa.eu/res~ http://publications.europa.e~ 31992~ false
    +#>  7 http://publications.europa.eu/res~ http://publications.europa.e~ 31983~ false
    +#>  8 http://publications.europa.eu/res~ http://publications.europa.e~ 31966~ false
    +#>  9 http://publications.europa.eu/res~ http://publications.europa.e~ 31974~ false
    +#> 10 http://publications.europa.eu/res~ http://publications.europa.e~ 31982~ false
    +#> # ... with 4,357 more rows
    +# }
    +
    +
    +
    + +
    + + +
    + + + + + + + + +
diff --git a/docs/reference/index.html b/docs/reference/index.html
index 2f493af..9192ab7 100644
--- a/docs/reference/index.html
+++ b/docs/reference/index.html
@@ -1,110 +1,109 @@
- -Function reference • eurlex - - -
    -
    - - - -
    -
    - - - - - - - - - - - - - - - - - -
    -

    All functions

    -

    -
    -

    elx_council_votes()

    -

    Retrieve Council votes on EU acts

    -

    elx_curia_list()

    -

    Scrape list of court cases from Curia

    -

    elx_download_xml()

    -

    Download XML notice associated with a URL

    -

    elx_fetch_data()

    -

    Retrieve additional data on EU documents

    -

    elx_label_eurovoc()

    -

    Label EuroVoc concepts

    -

    elx_make_query()

    -

    Create SPARQL queries

    -

    elx_run_query()

    -

    Execute SPARQL queries

    - - -
    - - -
    - - - - - - - - + +Function reference • eurlex + + +
    +
    + + + +
    +
    + + + + + + + + + + + + + + + + + +
    +

    All functions

    +

    +
    +

    elx_council_votes()

    +

    Retrieve Council votes on EU acts

    +

    elx_curia_list()

    +

    Scrape list of court cases from Curia

    +

    elx_download_xml()

    +

    Download XML notice associated with a URL

    +

    elx_fetch_data()

    +

    Retrieve additional data on EU documents

    +

    elx_label_eurovoc()

    +

    Label EuroVoc concepts

    +

    elx_make_query()

    +

    Create SPARQL queries

    +

    elx_run_query()

    +

    Execute SPARQL queries

    + + +
    + + +
    + + + + + + + + +
diff --git a/docs/search.json b/docs/search.json
new file mode 100644
index 0000000..fe51488
--- /dev/null
+++ b/docs/search.json
@@ -0,0 +1 @@
+[]