diff --git a/articles/flowmaps-interactive.html b/articles/flowmaps-interactive.html
index a0ce85f..31b960b 100644
--- a/articles/flowmaps-interactive.html
+++ b/articles/flowmaps-interactive.html
@@ -70,7 +70,7 @@
-
+ Making interactive flow maps
@@ -84,7 +84,7 @@
- Making interactive flow maps
This tutorial shows how to make interactive ‘flow maps’ with data from spanishoddata and the flowmapblue (Boyandin 2024) data visualisation package. We cover two examples. First, we only visualise the total flows for a single day. In the second, more advanced example, we also use the time component, which allows you to interactively filter flows by time of day. For both examples, make sure you first go through the initial setup steps. To make static flow maps, please see the static flow maps tutorial.
+ This tutorial shows how to make interactive ‘flow maps’ with data from spanishoddata and the flowmapblue (Boyandin 2024) data visualisation package. We cover two examples. First, we only visualise the total flows for a single day. In the second, more advanced example, we also use the time component, which allows you to interactively filter flows by time of day. For both examples, make sure you first go through the initial setup steps. To make static flow maps, please see the static flow maps tutorial.
1 Setup
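As a taste of what the tutorial builds toward: an interactive flow map boils down to passing two data frames — locations and flows — to flowmapblue(). The sketch below is a minimal, hedged example; the column names (id, name, lat, lon for locations; origin, dest, count for flows) and the mapboxAccessToken argument reflect the flowmapblue package's documented interface, but check ?flowmapblue::flowmapblue against your installed version, and note the coordinates and counts here are made up for illustration:

```r
library(flowmapblue)

# Zone centroids: flowmapblue expects columns id, name, lat, lon
locations <- data.frame(
  id   = c("A", "B"),
  name = c("Zone A", "Zone B"),
  lat  = c(40.42, 41.39),   # illustrative coordinates (Madrid, Barcelona)
  lon  = c(-3.70, 2.17)
)

# Origin-destination flows: columns origin, dest, count
flows <- data.frame(
  origin = "A", dest = "B", count = 1500  # illustrative trip count
)

# Renders an interactive map in the RStudio viewer or a browser.
# A free Mapbox token is optional but improves the basemap.
flowmapblue(locations, flows, mapboxAccessToken = NULL, darkMode = TRUE)
```

In the tutorial proper, the locations come from the zone geometries returned by spod_get_zones() and the flows from the origin-destination data, aggregated to one row per zone pair.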
diff --git a/articles/flowmaps-static.html b/articles/flowmaps-static.html
index 50b62c7..78566ae 100644
--- a/articles/flowmaps-static.html
+++ b/articles/flowmaps-static.html
@@ -70,7 +70,7 @@
-
+ Making static flow maps
or how to re-create the spanishoddata logo
@@ -84,7 +84,7 @@
- Making static flow maps
This tutorial shows how to make static ‘flow maps’ with data from spanishoddata and the flowmapper (Mast 2024) data visualisation package. We cover two examples. First, we only use the origin-destination flows and district zones that you can get using the spanishoddata package. In the second, more advanced example, we also use the mapSpain and hexSticker packages to re-create the spanishoddata logo. For both examples, make sure you first go through the initial setup steps. To make interactive flow maps, please see the interactive flow maps tutorial.
+ This tutorial shows how to make static ‘flow maps’ with data from spanishoddata and the flowmapper (Mast 2024) data visualisation package. We cover two examples. First, we only use the origin-destination flows and district zones that you can get using the spanishoddata package. In the second, more advanced example, we also use the mapSpain and hexSticker packages to re-create the spanishoddata logo. For both examples, make sure you first go through the initial setup steps. To make interactive flow maps, please see the interactive flow maps tutorial.
1 Setup
diff --git a/articles/quick-get.html b/articles/quick-get.html
index d61e40a..78d0a43 100644
--- a/articles/quick-get.html
+++ b/articles/quick-get.html
@@ -70,7 +70,7 @@
-
+ Quickly get daily data
@@ -84,7 +84,7 @@
- Quickly get daily data
+
1 Introduction
This vignette demonstrates how to get minimal daily aggregated data on the number of trips between municipalities using the spod_quick_get_od() function. With this function you only get the total trips for a single day, without the additional variables that are available in the full v2 (2022 onwards) data set. The advantage of this function is that it is much faster than downloading the full data from the source CSV files with spod_get(), as each CSV file for a single day is about 200 MB in size. This way of getting the data is also much less demanding on your computer: you only fetch a small table from the internet (less than 1 MB), and none of the data processing (such as the aggregation from more detailed hourly data with extra columns that happens when you use spod_get()) is required on your machine.
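The workflow described above fits in a few lines. This is a minimal sketch assuming a recent release of spanishoddata; the min_trips argument and the result's column names (id_origin, id_destination, n_trips) follow the package's documented interface, but check ?spod_quick_get_od before running, since the exact signature may differ between versions:

```r
library(spanishoddata)
library(dplyr)

# Fetch the small aggregated table for one day (under 1 MB,
# not the ~200 MB CSV that spod_get() would download).
# `min_trips` drops low-volume origin-destination pairs.
od_day <- spod_quick_get_od(date = "2024-01-01", min_trips = 100)

# The result is a regular in-memory tibble, so normal dplyr
# verbs apply, e.g. total outbound trips per origin municipality:
od_day |>
  group_by(id_origin) |>
  summarise(total_trips = sum(n_trips), .groups = "drop") |>
  arrange(desc(total_trips))
```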
diff --git a/articles/v1-2020-2021-mitma-data-codebook.html b/articles/v1-2020-2021-mitma-data-codebook.html
index c6dd9af..f0b202c 100644
--- a/articles/v1-2020-2021-mitma-data-codebook.html
+++ b/articles/v1-2020-2021-mitma-data-codebook.html
@@ -70,7 +70,7 @@
-
+ Codebook and cookbook for v1 (2020-2021) Spanish mobility data
@@ -84,7 +84,7 @@
- Codebook and cookbook for v1 (2020-2021) Spanish mobility data
You can view this vignette any time by running:
+ You can view this vignette any time by running:
spanishoddata::spod_codebook(ver = 1)
diff --git a/articles/v2-2022-onwards-mitma-data-codebook.html b/articles/v2-2022-onwards-mitma-data-codebook.html
index a4dbe93..b9648b7 100644
--- a/articles/v2-2022-onwards-mitma-data-codebook.html
+++ b/articles/v2-2022-onwards-mitma-data-codebook.html
@@ -70,7 +70,7 @@
-
+ Codebook and cookbook for v2 (2022 onwards) Spanish mobility data
@@ -84,7 +84,7 @@
- Codebook and cookbook for v2 (2022 onwards) Spanish mobility data
You can view this vignette any time by running:
+ You can view this vignette any time by running:
spanishoddata::spod_codebook(ver = 2)
diff --git a/pkgdown.yml b/pkgdown.yml
index 1b1cb4e..22d5348 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -9,7 +9,7 @@ articles:
quick-get: quick-get.html
v1-2020-2021-mitma-data-codebook: v1-2020-2021-mitma-data-codebook.html
v2-2022-onwards-mitma-data-codebook: v2-2022-onwards-mitma-data-codebook.html
-last_built: 2025-01-03T11:33Z
+last_built: 2025-01-07T16:43Z
urls:
reference: https://rOpenSpain.github.io/spanishoddata/reference
article: https://rOpenSpain.github.io/spanishoddata/articles
diff --git a/reference/spod_get_data_dir.html b/reference/spod_get_data_dir.html
index 7020b62..d6bc022 100644
--- a/reference/spod_get_data_dir.html
+++ b/reference/spod_get_data_dir.html
@@ -68,9 +68,9 @@ Value
Examples
spod_set_data_dir(tempdir())
#> Data directory is writeable.
-#> Data directory successfully set to: /tmp/RtmpZ6rjic
+#> Data directory successfully set to: /tmp/Rtmp7Zoro3
spod_get_data_dir()
-#> /tmp/RtmpZ6rjic
+#> /tmp/Rtmp7Zoro3
diff --git a/reference/spod_graphql_valid_dates.html b/reference/spod_graphql_valid_dates.html
new file mode 100644
index 0000000..c87b0d5
--- /dev/null
+++ b/reference/spod_graphql_valid_dates.html
@@ -0,0 +1,101 @@
+
+Get valid dates from the GraphQL API — spod_graphql_valid_dates • spanishoddata
+ Get valid dates from the GraphQL API
+ Value
+ A Date vector of dates that are valid to request data with spod_quick_get_od().
+ Site built with pkgdown 2.1.1.
+ Template rostemplate by dieghernan, based on Bootstrapious.
diff --git a/reference/spod_set_data_dir.html b/reference/spod_set_data_dir.html
index 79841bb..e915ab5 100644
--- a/reference/spod_set_data_dir.html
+++ b/reference/spod_set_data_dir.html
@@ -69,7 +69,7 @@ Value
Examples
spod_set_data_dir(tempdir())
#> Data directory is writeable.
-#> Data directory successfully set to: /tmp/RtmpZ6rjic
+#> Data directory successfully set to: /tmp/Rtmp7Zoro3
diff --git a/search.json b/search.json
index d85dcdc..5c42205 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://rOpenSpain.github.io/spanishoddata/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2024 spanishoddata authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"intro","dir":"Articles","previous_headings":"","what":"Introduction","title":"Download and convert mobility datasets","text":"TL;DR (long, didn’t read): analysing 1 week data, use spod_convert() convert data DuckDB spod_connect() connect analysis using dplyr. Skip section . main focus vignette show get long periods origin-destination data analysis. First, describe compare two ways get mobility data using origin-destination data example. package functions overall approaches working types data available package, number trips, overnight stays data. show get days origin-destination data spod_get(). Finally, show download convert multiple weeks, months even years origin-destination data analysis-ready formats. 
See description datasets Codebook cookbook v1 (2020-2021) Spanish mobility data Codebook cookbook v2 (2022 onwards) Spanish mobility data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"two-ways-to-get-the-data","dir":"Articles","previous_headings":"","what":"Two ways to get the data","title":"Download and convert mobility datasets","text":"two main ways import datasets: -memory object spod_get(); connection DuckDB Parquet files disk spod_convert() + spod_connect(). latter recommended large datasets (1 week), much faster memory efficient, demonstarte . spod_get() returns objects appropriate small datasets representing days national origin-destination flows. recommend converting data analysis-ready formats (DuckDB Parquet) using spod_convert() + spod_connect(). allow work much longer time periods (months years) consumer laptop (8-16 GB memory). See section details.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"analysing-large-datasets","dir":"Articles","previous_headings":"","what":"Analysing large datasets","title":"Download and convert mobility datasets","text":"mobility datasets available {spanishiddata} large. Particularly origin-destination data, contains millions rows. data sets may fit memory computer, especially plan run analysis multiple days, weeks, months, even years. work datasets, highly recommend using DuckDB Parquet. systems efficiently processing larger--memory datasets, user-firendly presenting data familiar data.frame/tibble object (almost). great intoroduction , recommend materials Danielle Navarro, Jonathan Keane, Stephanie Hazlitt: website, slides, video tutorial. can also find examples aggregating origin-destination data flows analysis visualisation vignettes static interactive flows visualisation. Learning use DuckDB Parquet easy anyone ever worked dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. 
However, since learning curve master new tools, provide helper functions novices get started easily open datasets DuckDB Parquet. Please read relevant sections , first show convert data, use . main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. 
end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb-vs-parquet-csv","dir":"Articles","previous_headings":"","what":"How to choose between DuckDB, Parquet, and CSV","title":"Download and convert mobility datasets","text":"main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. 
filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. 
case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"speed-comparison","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Analysis Speed","title":"Download and convert mobility datasets","text":"data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. 
Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convenience-of-data-analysis","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Convenience of data analysis","title":"Download and convert mobility datasets","text":"Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"scenarios-of-getting-the-data","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Scenarios of getting the data","title":"Download and convert mobility datasets","text":"choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. 
example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Download and convert mobility datasets","text":"Make sure loaded package: Choose spanishoddata download (convert) data setting data directory following command: function also ensure directory created sufficient permissions write . can also set data directory environment variable: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. 
Set ‘envar’ working directory editing .Renviron file root project:","code":"library(spanishoddata) spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Download and convert mobility datasets","text":"Choose spanishoddata download (convert) data setting data directory following command: function also ensure directory created sufficient permissions write . can also set data directory environment variable: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"spod-get","dir":"Articles","previous_headings":"","what":"Getting a single day with spod_get()","title":"Download and convert mobility datasets","text":"might seen codebooks v1 v2 data, can get single day’s worth data -memory object spod_get(): output look like : Note lazily-evaluated -memory object (note :memory: database path). means data loaded memory call collect() . 
useful quick exploration data, recommended large datasets, demonstrated .","code":"dates <- c(\"2024-03-01\") d_1 <- spod_get(type = \"od\", zones = \"distr\", dates = dates) class(d_1) # Source: table [?? x 19] # Database: DuckDB v1.0.0 [... 6.5.0-45-generic:R 4.4.1/:memory:] date time_slot id_origin id_destination distance activity_origin 1 2024-03-01 19 01009_AM 01001 0.5-2 frequent_activity 2 2024-03-01 15 01002 01001 10-50 frequent_activity"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb","dir":"Articles","previous_headings":"","what":"Analysing the data using DuckDB database","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable. can read introductory information connect DuckDB files , however simplify things created helper function. 
connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE) my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-duckdb","dir":"Articles","previous_headings":"","what":"Convert to DuckDB","title":"Download and convert mobility datasets","text":"can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. 
default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-duckdb","dir":"Articles","previous_headings":"","what":"Load the converted DuckDB","title":"Download and convert mobility datasets","text":"can read introductory information connect DuckDB files , however simplify things created helper function. connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. 
Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"parquet","dir":"Articles","previous_headings":"","what":"Analysing the data using Parquet","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. Alternatively, can set save location setting save_path argument. Working parquet files exactly DuckDB Arrow files. Just like , can use helper function spod_connect() connect parquet files: Mind though, first converted data period 14 17 February 2020, converted data period 16 19 February 2020 save default location, od_parquet contains path data, therefore my_od_data_3 connect data. 
can check like : analysis, please refer recommended external tutorials vignettes Analysing large datasets section.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update') my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-parquet","dir":"Articles","previous_headings":"","what":"Convert to Parquet","title":"Download and convert mobility datasets","text":"process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. 
Alternatively, you can set a different save location with the save_path argument.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update')"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-parquet","dir":"Articles","previous_headings":"","what":"Load the converted Parquet","title":"Download and convert mobility datasets","text":"Working with parquet files is exactly the same as with DuckDB and Arrow files. Just like before, you can use the helper function spod_connect() to connect to the parquet files: Mind though, that since we first converted the data for the period 14-17 February 2020, and then converted the data for the period 16-19 February 2020 and saved it to the default location, od_parquet contains the path to all of this data, and therefore my_od_data_3 connects to all of it. You can check this like so: For analysis, please refer to the recommended external tutorials and vignettes in the Analysing large datasets section.","code":"my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"all-dates","dir":"Articles","previous_headings":"","what":"Download all available data","title":"Download and convert mobility datasets","text":"To prepare the origin-destination data for v1 (2020-2021) for analysis over the whole period of data availability, please follow the steps below: Warning Due to mobile network outages, the data for certain dates is missing. Kindly keep this in mind when calculating mean monthly or weekly flows. Please check the original data page for the currently known missing dates. At the time of writing, the following dates were missing: 26, 27, 30, 31 October; 1, 2 and 3 November 2023; 4, 18, 19 April 2024; 10 and 11 November 2024. You can use the spod_get_valid_dates() function to get all available dates. Below is an example for the origin-destination district level v1 data.
You can change the type to “number_of_trips” and zones to “municipalities” for the v1 data. For v2 data, just use dates starting with 2022-01-01 from dates_v2 above. Use the function arguments for v2 the same way as shown for v1, but also consult the v2 data codebook, as there are many more datasets in addition to “origin-destination” and “number_of_trips”. Now convert the downloaded data into DuckDB format for lightning fast analysis. You can change save_format to parquet if you want to save the data in Parquet format. For a comparison and overview of the two formats please see Converting the data to DuckDB/Parquet for faster analysis. By default, spod_convert_data() will save the converted data into the SPANISH_OD_DATA_DIR directory. You can change this with the save_path argument of spod_convert_data() if you want to save the data to a different location. For the conversion, 4 GB of operating memory should be enough, but the speed of the process depends on the number of processor cores and the speed of your disk storage. An SSD is preferred. By default, spod_convert_data() will use all except one of the processor cores of your computer. You can adjust this with the max_n_cpu argument of spod_convert_data(). You can also increase the maximum amount of memory used with the max_mem_gb argument, but it makes more of a difference at the analysis stage. Finally, analysis_data_storage will simply store the path to the converted data. It is either a path to a DuckDB database file or a path to a folder with Parquet files. For reference, converting the whole v1 origin-destination data to DuckDB takes about 20 minutes with 4 GB of memory and 3 processor cores. The final size of the DuckDB database is about 18 GB; in Parquet format it is about 26 GB. The raw CSV files in gzip archives are about 20 GB. The v2 data is much larger, with the origin-destination tables for 2022 - mid-2024 taking 150+ GB in raw CSV.gz format. You can pass the analysis_data_storage path to the spod_connect() function, whether it is DuckDB or Parquet. The function will determine the data type automatically and give you back a tbl_duckdb_connection1. Here we set max_mem_gb to 16 GB. Generally, the more the better, so feel free to increase it, but also consult Figure 1 with the speed testing results in the Speed section. You can try different combinations of the max_mem_gb and max_n_cpu arguments to fit your needs. Compared to the conversion process, you might want to increase the available memory for the analysis step. The more, the better. You can control it with the max_mem_gb argument. You can manipulate my_data using dplyr functions such as select(), filter(), mutate(), group_by(), summarise(), etc.
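The lazy dplyr workflow described here (chain the verbs, then pull the result with collect()) can be sketched with a toy in-memory DuckDB table; the table name, columns, and values below are invented for illustration and are not the real converted mobility data:

```r
library(duckdb)
library(dplyr)

# Toy stand-in for the converted mobility data
con <- dbConnect(duckdb())
dbWriteTable(con, "od", data.frame(
  id_origin = c("A", "A", "B"),
  id_destination = c("B", "B", "A"),
  n_trips = c(10, 5, 7)
))

# The chain below builds a query; nothing is computed until
# collect() executes it and returns an ordinary R tibble
totals <- tbl(con, "od") |>
  group_by(id_origin, id_destination) |>
  summarise(count = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  collect()

dbDisconnect(con, shutdown = TRUE)
totals
```

The same shape of pipeline applies to the tbl_duckdb_connection returned by spod_connect(), just with the real columns and far more rows.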
At the end of the sequence of commands you need to add collect() to execute the whole chain of data manipulations and load the results into memory as an R data.frame/tibble. For analysis, please refer to the recommended external tutorials and vignettes in the Analysing large datasets section. After finishing working with my_data we advise you to “disconnect” to free up memory:","code":"dates_v1 <- spod_get_valid_dates(ver = 1) dates_v2 <- spod_get_valid_dates(ver = 2) type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data ) save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE ) my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"download-all-data","dir":"Articles","previous_headings":"","what":"Download all data","title":"Download and convert mobility datasets","text":"Here is an example for the origin-destination district level v1 data. You can change the type to “number_of_trips” and zones to “municipalities” for the v1 data. For v2 data, just use dates starting with 2022-01-01 from dates_v2 above.
Use the function arguments for v2 the same way as shown for v1, but also consult the v2 data codebook, as there are many more datasets in addition to “origin-destination” and “number_of_trips”.","code":"type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-all-data-into-analysis-ready-format","dir":"Articles","previous_headings":"","what":"Convert all data into analysis ready format","title":"Download and convert mobility datasets","text":"Now convert the downloaded data into DuckDB format for lightning fast analysis. You can change save_format to parquet if you want to save the data in Parquet format. For a comparison and overview of the two formats please see Converting the data to DuckDB/Parquet for faster analysis. By default, spod_convert_data() will save the converted data into the SPANISH_OD_DATA_DIR directory. You can change this with the save_path argument of spod_convert_data() if you want to save the data to a different location. For the conversion, 4 GB of operating memory should be enough, but the speed of the process depends on the number of processor cores and the speed of your disk storage. An SSD is preferred. By default, spod_convert_data() will use all except one of the processor cores of your computer. You can adjust this with the max_n_cpu argument of spod_convert_data(). You can also increase the maximum amount of memory used with the max_mem_gb argument, but it makes more of a difference at the analysis stage. Finally, analysis_data_storage will simply store the path to the converted data.
It is either a path to a DuckDB database file or a path to a folder with Parquet files.","code":"save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"conversion-speed","dir":"Articles","previous_headings":"","what":"Conversion speed","title":"Download and convert mobility datasets","text":"For reference, converting the whole v1 origin-destination data to DuckDB takes about 20 minutes with 4 GB of memory and 3 processor cores. The final size of the DuckDB database is about 18 GB; in Parquet format it is about 26 GB. The raw CSV files in gzip archives are about 20 GB. The v2 data is much larger, with the origin-destination tables for 2022 - mid-2024 taking 150+ GB in raw CSV.gz format.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"connecting-to-and-analysing-the-converted-datasets","dir":"Articles","previous_headings":"","what":"Connecting to and analysing the converted datasets","title":"Download and convert mobility datasets","text":"You can pass the analysis_data_storage path to the spod_connect() function, whether it is DuckDB or Parquet. The function will determine the data type automatically and give you back a tbl_duckdb_connection1. Here we set max_mem_gb to 16 GB. Generally, the more the better, so feel free to increase it, but also consult Figure 1 with the speed testing results in the Speed section. You can try different combinations of the max_mem_gb and max_n_cpu arguments to fit your needs. Compared to the conversion process, you might want to increase the available memory for the analysis step. The more, the better. You can control it with the max_mem_gb argument. You can manipulate my_data using dplyr functions such as select(), filter(), mutate(), group_by(), summarise(), etc. At the end of the sequence of commands you need to add collect() to execute the whole chain of data manipulations and load the results into memory as an R data.frame/tibble. For analysis, please refer to the recommended external tutorials and vignettes in the Analysing large datasets section.
After finishing working with my_data we advise you to “disconnect” to free up memory:","code":"my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"OD data disaggregation","text":"This vignette demonstrates origin-destination (OD) data disaggregation using the {odjitter} package. The package is an implementation of the method described in the paper “Jittering: A Computationally Efficient Method for Generating Realistic Route Networks from Origin-Destination Data” (Lovelace, Félix, and Carlino 2022) for adding value to OD data by disaggregating desire lines. This can be especially useful for transport planning purposes when high levels of geographic resolution are required (see also od2net for direct network generation from OD data).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"data-preparation","dir":"Articles","previous_headings":"","what":"Data preparation","title":"OD data disaggregation","text":"We’ll start by loading a week’s worth of origin-destination data for the city of Salamanca, building on the example in the README (note: these chunks are not evaluated):","code":"od_db <- spod_get( type = \"od\", zones = \"distritos\", dates = c(start = \"2024-03-01\", end = \"2024-03-07\") ) distritos <- spod_get_zones(\"distritos\", ver = 2) distritos_wgs84 <- distritos |> sf::st_simplify(dTolerance = 200) |> sf::st_transform(4326) od_national_aggregated <- od_db |> group_by(id_origin, id_destination) |> summarise(Trips = sum(n_trips), .groups = \"drop\") |> filter(Trips > 500) |> collect() |> arrange(desc(Trips)) od_national_aggregated od_national_interzonal <- od_national_aggregated |> filter(id_origin != id_destination) salamanca_zones <- zonebuilder::zb_zone(\"Salamanca\") distritos_salamanca <- distritos_wgs84[salamanca_zones, ] ids_salamanca <- distritos_salamanca$id od_salamanca <- od_national_interzonal |> filter(id_origin %in% ids_salamanca) |>
filter(id_destination %in% ids_salamanca) |> arrange(Trips) od_salamanca_sf <- od::od_to_sf( od_salamanca, z = distritos_salamanca )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"disaggregating-desire-lines","dir":"Articles","previous_headings":"","what":"Disaggregating desire lines","title":"OD data disaggregation","text":"We’ll need some additional dependencies: We’ll get the road network from OSM: We can use the road network to disaggregate the desire lines: Let’s plot the disaggregated desire lines: The results show how we can add value to OD data by disaggregating desire lines with the {odjitter} package. This can be useful for understanding the spatial distribution of trips within zones and for transport planning. We plotted the disaggregated desire lines on top of the major road network of Salamanca. A possible next step is routing, which can help prioritise infrastructure improvements.","code":"remotes::install_github(\"dabreegster/odjitter\", subdir = \"r\") remotes::install_github(\"nptscot/osmactive\") salamanca_boundary <- sf::st_union(distritos_salamanca) osm_full <- osmactive::get_travel_network(salamanca_boundary) osm <- osm_full[salamanca_boundary, ] drive_net <- osmactive::get_driving_network(osm) drive_net_major <- osmactive::get_driving_network_major(osm) cycle_net <- osmactive::get_cycling_network(osm) cycle_net <- osmactive::distance_to_road(cycle_net, drive_net_major) cycle_net <- osmactive::classify_cycle_infrastructure(cycle_net) map_net <- osmactive::plot_osm_tmap(cycle_net) map_net od_jittered <- odjitter::jitter( od_salamanca_sf, zones = distritos_salamanca, subpoints = drive_net, disaggregation_threshold = 1000, disaggregation_key = \"Trips\" ) od_jittered |> arrange(Trips) |> ggplot() + geom_sf(aes(colour = Trips), size = 1) + scale_colour_viridis_c() + geom_sf(data = drive_net_major, colour = \"black\") + theme_void()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making interactive flow
maps","text":"For the basemap in the final visualisation you will need a free Mapbox access token. You can get one at account.mapbox.com/access-tokens/ (you will need a Mapbox account, which is free). You may skip this step, in which case the interactive flowmap will have no basemap, and the flows will just flow over a solid colour background. Once you have the access token, you can set the MAPBOX_TOKEN environment variable like so: Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. To permanently set the directory for all projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project. Set the ‘envar’ in the working directory by editing the .Renviron file in the root of the project:","code":"Sys.setenv(MAPBOX_TOKEN = \"YOUR_MAPBOX_ACCESS_TOKEN\") library(spanishoddata) library(flowmapblue) library(tidyverse) library(sf) spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making interactive flow maps","text":"Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. To permanently set the directory for all projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project.
Set the ‘envar’ in the working directory by editing the .Renviron file in the root of the project:","code":"spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making interactive flow maps","text":"Let us get the flows between districts for a typical working day, 2021-04-07: We also get the district zone polygons to match the flows. We use version 1 polygons, because the selected date is in 2021, which corresponds to the v1 data (see the relevant codebook). To visualise the flows, flowmapblue expects two data.frames in the following format (we use the package’s built-in data for Switzerland as an illustration): Locations: a data.frame with an id, an optional name, as well as lat and lon coordinates of the locations in the WGS84 (EPSG: 4326) coordinate reference system. Flows: a data.frame with origin, dest, and count of flows between locations, where origin and dest must match the id’s in the locations data.frame, and count is the number of trips. We need the coordinates for each origin and destination. We can use the centroids of the districts_v1 polygons for that. Remember, for the map to have a basemap, you need to set up the Mapbox access token as described in the setup section of this vignette. Create an interactive flowmap with the flowmapblue function. In this example we use darkMode and clustering, and disable the animation. We recommend disabling clustering when plotting flows between hundreds or thousands of locations, as it may reduce the readability of the map. Video demonstrating the standard interactive flowmap You can play around with the arguments of the flowmapblue function.
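As a minimal illustration of the two-table format flowmapblue expects (the ids, names, and coordinates below are invented, not real district centroids):

```r
# Locations: one row per zone, WGS84 lat/lon coordinates
locations <- data.frame(
  id   = c("z1", "z2"),
  name = c("Zone 1", "Zone 2"),
  lat  = c(41.39, 41.49),
  lon  = c(2.17, 2.15)
)

# Flows: origin/dest must match locations$id, count is the number of trips
flows <- data.frame(
  origin = c("z1", "z2"),
  dest   = c("z2", "z1"),
  count  = c(120, 95)
)

# Sanity check before passing both tables to flowmapblue()
stopifnot(all(c(flows$origin, flows$dest) %in% locations$id))
```

Any flow whose origin or dest is missing from the locations table cannot be drawn, so this check is worth running after building your own centroid table.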
example, can turn animation mode: Video Video demonstrating animated interactive flowmap Screenshot demonstrating animated interactive flowmap","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 
4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61 districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005 flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07: also get district zones polygons mathch flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 
2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips 
trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making interactive flow maps","text":"also get district zones polygons mathch flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén 
distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 
4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"expected-data-format","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Expected data format","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ..."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Aggregate data - count total flows","title":"Making interactive flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 
4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-locations-table","dir":"Articles","previous_headings":"","what":"Create locations table with coordinates","title":"Making interactive flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-the-plot","dir":"Articles","previous_headings":"","what":"Create the plot","title":"Making interactive flow maps","text":"Remember, map basemap, need setup Mapbox access token setup section vignette. Create interactive flowmap flowmapblue function. example use darkMode clustering, disable animation. recommend disabling clustering plotting flows hundreds thousands locations, reduce redability map. Video Video demonstrating standard interactive flowmap can play around arguments flowmapblue function. 
For example, you can turn on the animation mode: Video demonstrating the animated interactive flowmap Screenshot demonstrating the animated interactive flowmap","code":"flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - time filter","title":"Making interactive flow maps","text":"Following the simple example, let us now add a time filter to the flows. We use the flowmapblue function to plot the flows between districts_v1_centroids for a typical working day, 2021-04-07. Just like before, we aggregate the data and rename the columns. This time we also keep and combine date and time_slot (which corresponds to the hour of the day) to produce timestamps, so that the flows can be interactively filtered by time of day. Since we are now using flows for each hour of the day, we have about 24 times more rows of data than in the simple example. Therefore it will take longer to generate the plot, and the resulting visualisation may work slower. To create a manageable example, let us filter the data to Barcelona and the surrounding areas. Let us select the districts that correspond to Barcelona and a 10 km radius around it. Thanks to the district_names_in_v2 column in the zones data, we can easily select the districts that correspond to Barcelona and apply a spatial join to select the districts around the polygons that correspond to Barcelona. District zone boundaries of Barcelona and nearby areas Now we prepare the table with coordinates for the flowmap: Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and a 10 km radius around it. Now we can create a new plot with this data.
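The date-plus-time_slot combination described here can be sketched in plain R (toy rows for illustration; in the vignette the equivalent expression runs lazily inside the database connection, and the format argument is added here because base R's as.POSIXct does not parse the ISO "T" separator by default):

```r
# Toy rows with an hourly time_slot, as in the OD data
df <- data.frame(
  date = as.Date(c("2021-04-07", "2021-04-07")),
  time_slot = c(0L, 17L)
)

# Combine date and hour into a POSIXct timestamp for the time filter
df$time <- as.POSIXct(
  sprintf("%sT%02d:00:00", df$date, df$time_slot),
  format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"
)
format(df$time, "%Y-%m-%d %H:%M")
```

flowmapblue then picks up the extra time column in the flows table and exposes the interactive time-of-day filter.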
Video Video demonstrating time filtering flowmap Screnshot demonstrating time filtering flowmap","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization-1","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making 
interactive flow maps","text":"Just like before, we aggregate the data and rename the columns. This time we also keep and combine the date and time_slot (which corresponds to the hour of the day) to produce timestamps, so that the flows can be interactively filtered by the time of day. We are now using flows for every hour of the day, so we have 24 times more rows of data than in the simple example. Therefore it will take longer to generate the plot, and the resulting visualisation may work slower. To create a manageable example, let us filter the data to Barcelona and its surrounding areas. Let us select the districts that correspond to Barcelona and a 10 km radius around it. Thanks to the district_names_in_v2 column in the zones data, we can easily select the districts that correspond to Barcelona and apply a spatial join to select the districts around the polygons that correspond to Barcelona. District zone boundaries of Barcelona and nearby areas. Now we prepare the table with coordinates for the flowmap: Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and the 10 km radius around it. Now, we can create a new plot with this data. Video demonstrating the time-filtering flowmap. Screenshot demonstrating the time-filtering flowmap.","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"filter-the-zones","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Filter the zones","title":"Making interactive flow maps","text":"We are now using flows for every hour of the day, so we have 24 times more rows of data than in the simple example. Therefore it will take longer to generate the plot, and the resulting visualisation may work slower. To create a manageable example, let us filter the data to Barcelona and its surrounding areas. Let us select the districts that correspond to Barcelona and a 10 km radius around it. Thanks to the district_names_in_v2 column in the zones data, we can easily select the districts that correspond to Barcelona and apply a spatial join to select the districts around the polygons that correspond to Barcelona. District zone boundaries of Barcelona and nearby areas. Now we prepare the table with coordinates for the flowmap:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Prepare the flows","title":"Making interactive flow maps","text":"Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and the 10 km radius around it.","code":"od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making interactive flow maps","text":"Now, we can create a new plot with this data. Video demonstrating the time-filtering flowmap. Screenshot demonstrating the time-filtering flowmap.","code":"flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making static flow maps","text":"Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. To permanently set the directory for all projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project. Set the 'envar' in the working directory by editing the .Renviron file in the root of the project:","code":"library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making static flow maps","text":"Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. 
To permanently set the directory for all projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project. Set the 'envar' in the working directory by editing the .Renviron file in the root of the project:","code":"spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making static flow maps","text":"Let us get the flows between the districts on a typical working day, 2021-04-07: We also get the district zone polygons to match the flows. We use version 1 of the polygons, because the selected date is in 2021, which corresponds to the v1 data (see the relevant codebook). The flowmapper package was developed to visualise origin-destination ‘flow’ data (Mast 2024). The package expects the data in the following format: A data.frame with origin-destination pairs and flow counts, with the following columns: o: unique id of the origin node; d: unique id of the destination node; value: intensity of the flow between the origin and the destination. Another data.frame with node ids or names and coordinates. The coordinate reference system must match that of whichever data you are planning to use for the plot. name: unique id or name of the node, which must match the o and d in the flows data.frame above; x: x coordinate of the node; y: y coordinate of the node. The previous code chunk created od_20210407_total with the column names expected by flowmapper. We need the coordinates of the origin and destination points. We can use the centroids of the districts_v1 polygons for that. Now that the data structure matches flowmapper’s expected data format, we can plot the sample data (a plot containing all flows would be too ‘busy’ and the whole world would resemble a haystack!). The k_node argument of the add_flowmap function can be used to reduce this busyness. Let us filter the flows and zones data down to just one specific functional urban area and take a closer look at the flows. 
Let us select the districts that correspond to Barcelona and a 10 km radius around it. Thanks to the district_names_in_v2 column in the zones data, we can easily select the districts that correspond to Barcelona and apply a spatial join to select the districts around the polygons that correspond to Barcelona. We also prepare the nodes for the add_flowmap function: Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and the 10 km radius around it. Now, we can create a new plot with this data. Here, we also need the k_node argument to tweak the aggregation of nodes and flows. Feel free to tweak it and see how the results change.","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 # ℹ abbreviated names: ¹residence_province_ine_code, ²residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value) head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 
2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005 # create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = \"none\") # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + 
geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) 
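# A possible next step, not from the original vignette: export the finished static
# flow map with ggplot2's ggsave(); the file name and dimensions below are example
# values only, adjust them as needed.
ggsave(\"flows_plot_barcelona.png\", plot = flows_plot_barcelona, width = 8, height = 8, dpi = 300, bg = \"transparent\")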
flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Let us get the flows between the districts on a typical working day, 2021-04-07: We also get the district zone polygons to match the flows. We use version 1 of the polygons, because the selected date is in 2021, which corresponds to the v1 data (see the relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 
2021 4 # ℹ abbreviated names: ¹residence_province_ine_code, ²residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making static flow maps","text":"Let us get the flows between the districts on a typical working day, 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 # ℹ abbreviated names: ¹residence_province_ine_code, ²residence_province_name # ℹ 1 more variable: day "},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making static flow maps","text":"We also get the district zone polygons to match the flows. We use version 1 of the polygons, because the selected date is in 2021, which corresponds to the v1 data (see the relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 
429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"","what":"Aggregate data - count total flows","title":"Making static flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"The flowmapper package was developed to visualise origin-destination ‘flow’ data (Mast 2024). The package expects the data in the following format: A data.frame with origin-destination pairs and flow counts, with the following columns: o: unique id of the origin node; d: unique id of the destination node; value: intensity of the flow between the origin and the destination. Another data.frame with node ids or names and coordinates. The coordinate reference system must match that of whichever data you are planning to use for the plot. name: unique id or name of the node, which must match the o and d in the flows data.frame above; x: x coordinate of the node; y: y coordinate of the node. The previous code chunk created od_20210407_total with the column names expected by flowmapper. We need the coordinates of the origin and destination points. We can use the centroids of the districts_v1 polygons for that.","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows table","title":"Making static flow maps","text":"The previous code chunk created od_20210407_total with the column names expected by flowmapper.","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"We need the coordinates of the origin and destination points. We can use the centroids of the districts_v1 polygons for that.","code":"districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now that the data structure matches flowmapper’s expected data format, we can plot the sample data (a plot containing all flows would be too ‘busy’ and the whole world would resemble a haystack!). The k_node argument of the add_flowmap function can be used to reduce this busyness. Let us filter the flows and zones data down to just one specific functional urban area and take a closer look at the flows. Let us select the districts that correspond to Barcelona and a 10 km radius around it. Thanks to the district_names_in_v2 column in the zones data, we can easily select the districts that correspond to Barcelona and apply a spatial join to select the districts around the polygons that correspond to Barcelona. We also prepare the nodes for the add_flowmap function: Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and the 10 km radius around it. Now, we can create a new plot with this data. Here, we also need the k_node argument to tweak the aggregation of nodes and flows. Feel free to tweak it and see how the results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = \"none\") # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 
4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-entire-country","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Plot the entire country","title":"Making static flow maps","text":"Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). 
k_node argument add_flowmap function can used reduce business.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = \"none\") # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zoom-in-to-the-city-level","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zoom in to the city level","title":"Making static flow maps","text":"Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. 
We also prepare the nodes for the add_flowmap function: Now we can use the zone ids from the zones_barcelona_fua data to select the flows that correspond to Barcelona and the 10 km radius around it. Now we can create a new plot with this data. Here, we need the k_node argument to tweak the aggregation of nodes and flows. Feel free to tweak it and see how the results change.","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, 
arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"filter-the-zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Filter the zones","title":"Making static flow maps","text":"Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows","title":"Making static flow maps","text":"Now can use zone ids 
zones_barcelona_fua data select flows correspond Barcelona 10 km radius around .","code":"od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making static flow maps","text":"Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", legend_gradient = TRUE, k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) 
flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - aggregate flows for {spanishoddata} logo","title":"Making static flow maps","text":"advanced example need two additional packages: mapSpain (Hernangómez 2024) hexSticker (R-hexSticker?). Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook). Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata. Let us count total number trips made locations selected day 2022-04-06: Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names. can now add ids total flows districts id pairs calculate total flows autonomous communities: going use flowmapper (Mast 2024) package plot flows. package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coorindates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; data right now flows_by_ca already correct format expected flowmapper. 
need coordinates origin destination. can use centroids districts_v1 polygons . Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great. make sticker using hexSticker (Yu 2020) package.","code":"# two new packages library(mapSpain) library(hexSticker) # load these too, if you have not already library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE) flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. 
# ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 
spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria # create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") flows_plot sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution 
filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data-1","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Just like in the simple example above, we need the flows to visualise. Let us get the origin-destination flows between districts for a typical working day, 2022-04-06. We also get the spatial data for the zones. We are using version 2 of the zones, because the data we got is from 2022 onwards, which corresponds to the v2 data (see the relevant codebook). Ultimately, we would like to plot the flows on a map of Spain and aggregate the flows for visualisation to avoid visual clutter. We therefore also need a nice map of Spain, which we get using the mapSpain (Hernangómez 2024) package. We are getting two sets of boundaries. The first one has the Canary Islands moved closer to mainland Spain, for a nicer visualisation. The second one has the islands in their original location, so that we can spatially join the zones with the districts data we got from spanishoddata.","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Flows","title":"Making static flow maps","text":"Just like in the simple example above, we need the flows to visualise. Let us get the origin-destination flows between districts for a typical working day, 2022-04-06. We also get the spatial data for the zones. 
We are using version 2 of the zones, because the data we got is from 2022 onwards, which corresponds to the v2 data (see the relevant codebook).","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"map-of-spain","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Map of Spain","title":"Making static flow maps","text":"Ultimately, we would like to plot the flows on a map of Spain and aggregate the flows for visualisation to avoid visual clutter. We therefore also need a nice map of Spain, which we get using the mapSpain (Hernangómez 2024) package. We are getting two sets of boundaries. The first one has the Canary Islands moved closer to mainland Spain, for a nicer visualisation. The second one has the islands in their original location, so that we can spatially join the zones with the districts data we got from spanishoddata.","code":"spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-aggregation","dir":"Articles","previous_headings":"","what":"Flows aggregation","title":"Making static flow maps","text":"Let us count the total number of trips made between locations on the selected day, 2022-04-06. We now need to spatially join the districts with spain_for_join to find out which districts fall within which autonomous community. We use spain_for_join because, if we used spain_for_vis, the districts in the Canary Islands would not match the boundaries of the islands. This way we get a table of district ids and the corresponding autonomous community names. 
can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. 
# ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-raw-origin-destination-data-by-original-ids","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Aggregate raw origin destination data by original ids","title":"Making static flow maps","text":"Let us count total number trips made locations selected day 2022-04-06:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"match-ids-of-districts-with-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Match ids of districts with autonomous communities","title":"Making static flow maps","text":"Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. 
way get table districts ids corresponding autonomous community names.","code":"district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"count-flows-between-pairs-of-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Count flows between pairs of autonomous communities","title":"Making static flow maps","text":"can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. 
# ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization-1","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"We are going to use the flowmapper (Mast 2024) package to plot the flows. The package expects the data in the following format: a data.frame of origin-destination pairs with flow counts and the following columns: o: a unique id of the origin node; d: a unique id of the destination node; value: the intensity of the flow between the origin and destination. Another data.frame should contain the node ids or names and their coordinates. The coordinate reference system must match that of whichever data you are planning to use for the plot. name: a unique id or name of the node, which must match o and d in the flows data.frame above; x: the x coordinate of the node; y: the y coordinate of the node. Our data in flows_by_ca is already in the correct format expected by flowmapper. We only need the coordinates of the origins and destinations. We can use the centroids of the spain_for_vis polygons for that.","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 
spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the flows table","title":"Making static flow maps","text":"data right now flows_by_ca already correct format expected flowmapper.","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"need coordinates origin destination. 
can use centroids districts_v1 polygons .","code":"spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows-1","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great.","code":"# create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank() ) + guides(fill = 'none') # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") 
flows_plot"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"make-the-sticker","dir":"Articles","previous_headings":"","what":"Make the sticker","title":"Making static flow maps","text":"We make the sticker using the hexSticker (Yu 2020) package.","code":"sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/quick-get.html","id":"intro","dir":"Articles","previous_headings":"","what":"Introduction","title":"Quickly get daily data","text":"This vignette demonstrates how to get minimal daily aggregated data on the number of trips between municipalities using the spod_quick_get_od() function. With this function, you only get the total trips for a single day, without the additional variables that are available in the full v2 (2022 onwards) data set. The advantage of this function is that it is much faster than downloading the full data from the source CSV files using spod_get(), as a CSV file for a single day can be around 200 MB in size. Also, this way of getting the data is much less demanding on your computer, as you are only getting a small table from the internet (less than 1 MB), and none of the data processing (the aggregation of detailed hourly data with extra columns that happens when you use the spod_get() function) is required on your computer.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/quick-get.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Quickly get daily data","text":"Make sure you have loaded the package: Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. 
To permanently set the directory for all your projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project. Set the 'envar' in the working directory by editing the .Renviron file in the root of the project: Note: Setting a local data directory is optional in this case, as the data is downloaded directly from a web API without caching to disk. However, some metadata is downloaded to check the range of valid dates available at the time of the request. This metadata is downloaded to a temporary location by default, not to the data directory, unless you set one.","code":"library(spanishoddata) library(dplyr) spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/quick-get.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Quickly get daily data","text":"Choose where spanishoddata should download (and convert) the data by setting the data directory with the following command: The function will also ensure that the directory is created and that you have sufficient permissions to write to it. You can also set the data directory with an environment variable: The package will create this directory if it does not exist on the first run of any function that downloads the data. To permanently set the directory for all your projects, you can specify the data directory globally by setting the SPANISH_OD_DATA_DIR environment variable, e.g. with the following command: You can also set the data directory locally, just for the current project. Set the 'envar' in the working directory by editing the .Renviron file in the root of the project: Note: Setting a local data directory is optional in this case, as the data is downloaded directly from a web API without caching to disk. However, some metadata is downloaded to check the range of valid dates available at the time of the request. 
This metadata is downloaded to a temporary location by default, not to the data directory, unless you set one.","code":"spod_set_data_dir(data_dir = \"~/spanish_od_data\") Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/quick-get.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get the data","title":"Quickly get daily data","text":"To get the data, use the spod_quick_get_od() function. You do not need to specify whether you want municipalities or districts, as only municipal level data can be accessed with this function. The min_trips argument specifies the minimum number of trips to include in the data. If you set min_trips to 0, you will get the data for all origin-destination pairs for the specified date. The data is returned as a tibble that contains the requested date, the identifiers of the origin and destination municipalities, the number of trips, and the total length of the trips in kilometers. To only get trips of a certain length, use the distances argument. To only get trips between certain municipalities, use the id_origin and id_destination arguments. You can get the valid municipality identifiers with the spod_get_zones(\"muni\", ver = 2) function. This function will need to download the spatial data, which might take some time, so you might want to set up a data download folder with spod_setup_cache() if you have not done so already. 
Let us select locations Madrid name: Now let use use IDs origins gett trips Madrid rest Spain: Similarly, can set limits destination municipalities: can now proceed analyse flows visualise static interactive flow maps tutorials.","code":"od_1000 <- spod_quick_get_od( date = \"2022-01-01\", min_trips = 1000 ) glimpse(od_1000) glimpse(od_1000) Rows: 8,524 Columns: 5 $ date 2022-01-01, 2022-01-01, 2022-01-01, 2022… $ id_origin \"01001\", \"01002\", \"01002\", \"01002\", \"0100… $ id_destination \"01059\", \"01036\", \"01002\", \"01054_AM\", \"0… $ n_trips 2142, 1215, 8899, 1105, 2250, 4621, 1992,… $ trips_total_length_km 27130, 13743, 26700, 10603, 12228, 69999,… od_1000 # A tibble: 8,524 × 5 date id_origin id_destination n_trips trips_total_length_km 1 2022-01-01 01001 01059 2142 27130 2 2022-01-01 01002 01036 1215 13743 3 2022-01-01 01002 01002 8899 26700 4 2022-01-01 01002 01054_AM 1105 10603 5 2022-01-01 01002 01010 2250 12228 6 2022-01-01 01009_AM 01059 4621 69999 7 2022-01-01 01009_AM 01009_AM 1992 16395 8 2022-01-01 01009_AM 01051 2680 18554 9 2022-01-01 01010 01002 2147 11578 10 2022-01-01 01017_AM 01017_AM 1847 12695 # ℹ 8,514 more rows # ℹ Use `print(n = ...)` to see more rows od_long <- spod_quick_get_od( date = \"2022-01-01\", min_trips = 0, distances = c(\"10-50km\", \"50+km\") ) glimpse(od_long) Rows: 247,208 Columns: 5 $ date 2022-01-01, 2022-01-01, 2022-01-01, 2022… $ id_origin \"08015\", \"08015\", \"08015\", \"08015\", \"0801… $ id_destination \"08285\", \"17902_AM\", \"43014\", \"08007\", \"0… $ n_trips 5, 1, 5, 165, 210, 111, 1486, 39, 52, 166… $ trips_total_length_km 339, 161, 924, 5052, 2955, 2453, 29630, 1… od_long # A tibble: 247,208 × 5 date id_origin id_destination n_trips trips_total_length_km 1 2022-01-01 08015 08285 5 339 2 2022-01-01 08015 17902_AM 1 161 3 2022-01-01 08015 43014 5 924 4 2022-01-01 08015 08007 165 5052 5 2022-01-01 08015 08030 210 2955 6 2022-01-01 08015 08051 111 2453 7 2022-01-01 08015 08121 1486 29630 8 2022-01-01 08015 
08122_AM 39 1886 9 2022-01-01 08015 08300_AM 52 1301 10 2022-01-01 08015 08902 166 2042 # ℹ 247,198 more rows # ℹ Use `print(n = ...)` to see more rows municipalities <- spod_get_zones(\"muni\", ver = 2) head(municipalities) madrid_muni_ids <- municipalities |> filter(str_detect(name, \"Madrid\")) |> pull(id) madrid_muni_ids [1] \"28073\" \"28079\" \"28127\" \"45087\" flows_from_Madrid <- spod_quick_get_od( date = \"2022-01-01\", min_trips = 0, id_origin = madrid_muni_ids ) glimpse(flows_from_Madrid) Rows: 2,232 Columns: 5 $ date
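Putting the spod_quick_get_od() arguments described above together, here is a hedged sketch of a combined query. The argument values are illustrative; the function name, the distance band labels ("10-50km", "50+km") and the idea of passing municipality ids come from the article text and output shown above, and madrid_muni_ids is the vector built there from spod_get_zones("muni", ver = 2).

```r
library(spanishoddata)
library(dplyr)

# Illustrative sketch: flows into Madrid on one day, long trips only,
# keeping only origin-destination pairs with at least 100 trips.
od_into_madrid <- spod_quick_get_od(
  date = "2022-01-01",
  min_trips = 100,
  distances = c("10-50km", "50+km"),
  id_destination = madrid_muni_ids # ids from spod_get_zones("muni", ver = 2)
)
glimpse(od_into_madrid)
```

The same pattern works with id_origin instead of (or together with) id_destination to constrain both ends of the trips.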
Download and convert mobility datasets
1 Introduction
TL;DR (too long, didn’t read): For analysing more than 1 week of data, use spod_convert()
to convert the data into DuckDB
and spod_connect()
to connect to it for analysis using dplyr. Skip to the section about it.
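The TL;DR above can be sketched as follows. This is a minimal sketch, not the article's own code: the exact arguments of spod_convert() and spod_connect() (and the date-range format) are assumptions to be checked against the function reference.

```r
library(spanishoddata)
library(dplyr)

spod_set_data_dir("~/spanish_od_data")

# Convert a multi-week range of OD data to DuckDB once (slow, done one time)...
db_path <- spod_convert(
  type = "od",
  zones = "distr",
  dates = c(start = "2022-04-01", end = "2022-04-14"), # assumed range format
  save_format = "duckdb"
)

# ...then connect to the converted database and analyse it lazily with dplyr
od <- spod_connect(db_path)
od |>
  group_by(date) |>
  summarise(total_trips = sum(n_trips, na.rm = TRUE)) |>
  collect()
```

The point of the two-step workflow is that subsequent sessions skip the conversion and go straight to spod_connect().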
OD data disaggregation