get_animals(): info of 2 tags collapsed into one column #287
Comments
I'd love to hear some opinions on this. I'm not one to stand in the way of progress. As it stands, I would expect you to be able to do this anyway. Except... I played around with this, and at the moment this wouldn't actually work directly in cases where there is a comma in the
I also noticed the error you get isn't super informative, as it doesn't actually tell you what tags weren't found. I'll patch get_tags() so it'll also accept comma separated
I'm aiming for something like this to result in the data.frame you are looking for:
get_animals() %>%
pull(tag_serial_number) %>%
stringr::str_split(stringr::fixed(",")) %>%
unlist() %>%
get_tags(tag_serial_number = .) |
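For anyone who wants to try the splitting step without a database connection, here is a minimal base R sketch. It assumes a toy data frame in place of the get_animals() output (values copied from this thread) and a hypothetical vector `known_serials` standing in for what the database knows; neither is part of etn. It also shows one way an error message could name the serial numbers that were not found.

```r
# Toy stand-in for the output of get_animals(); values copied from this thread.
animals <- data.frame(
  animal_id = c(3171L, 3172L),
  tag_serial_number = c("1293314,1293314", "1293321,1293321"),
  stringsAsFactors = FALSE
)

# Split every cell on literal commas, flatten to one vector, drop repeats.
serials <- unlist(strsplit(animals$tag_serial_number, ",", fixed = TRUE))
serials <- unique(trimws(serials))

# Hypothetical list of serial numbers known to the database (an assumption).
known_serials <- c("1293314", "1293315")

# Work out which requested serial numbers are unknown, so they can be named.
missing_serials <- setdiff(serials, known_serials)
if (length(missing_serials) > 0) {
  message(
    "Tag serial number(s) not found: ",
    paste(missing_serials, collapse = ", ")
  )
}
```

The resulting `serials` vector is what would be passed on as the tag_serial_number argument in the pipeline above.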
@lottepohl could you provide an example table of the output you get so I can better understand what you mean? |
@PieterjanVerhelst sure! Here you go. So I will reformulate my suggestion:
This approach is similar to the different length measurements that can be retrieved with
@PieterjanVerhelst do you understand the situation better now? It's nothing super urgent, just something that would be nice to have. Because indeed, like @PietrH said, this could be done with just a couple of lines of code.
Cheers,
library(etn)
library(dplyr)
con <- etn::connect_to_etn()
smoothhounds <- etn::get_animals(scientific_name = "Mustelus asterias")
# many columns contain double information, e.g. 'tag_serial_number' has the same info twice
smoothhounds %>%
dplyr::select(tag_serial_number, tag_type, tag_subtype, acoustic_tag_id) %>%
head()
#> # A tibble: 6 × 4
#> tag_serial_number tag_type tag_subtype acoustic_tag_id
#> <chr> <chr> <chr> <chr>
#> 1 1293314,1293314 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3820,…
#> 2 1293321,1293321 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3835,…
#> 3 1293315,1293315 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3822,…
#> 4 1293316,1293316 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3825,…
#> 5 1293322,1293322 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3836,…
#> 6 1293317,1293317 acoustic-archival,acoustic-arch… animal,ani… A69-9006-3826,…
# acoustic_tag_id has two different values
smoothhounds %>%
dplyr::select(acoustic_tag_id) %>%
head()
#> # A tibble: 6 × 1
#> acoustic_tag_id
#> <chr>
#> 1 A69-9006-3820,A69-9006-3821
#> 2 A69-9006-3835,A69-9006-3834
#> 3 A69-9006-3822,A69-9006-3823
#> 4 A69-9006-3825,A69-9006-3824
#> 5 A69-9006-3836,A69-9006-3837
#> 6 A69-9006-3826,A69-9006-3827
# there are different columns for different length measurements
smoothhounds %>%
dplyr::select(length1:length4_unit) %>%
colnames()
#> [1] "length1" "length1_unit" "length2_type" "length2" "length2_unit"
#> [6] "length3_type" "length3" "length3_unit" "length4_type" "length4"
#> [11] "length4_unit" |
Aha indeed, now I understand. I am not a fan of having multiple values in one cell, separated by a separator (here a comma). This makes data processing like filtering hard and cumbersome. Why not add two records (or the number needed) instead of collapsing everything in one record, separated with commas? |
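To make the "one record per value" idea concrete, here is a minimal base R sketch. It uses a toy data frame with values copied from this thread rather than a real get_animals() call, and the name `animals_long` is only illustrative.

```r
# Toy stand-in for two rows of get_animals() output.
animals <- data.frame(
  animal_id = c(3171L, 3172L),
  acoustic_tag_id = c(
    "A69-9006-3820,A69-9006-3821",
    "A69-9006-3835,A69-9006-3834"
  ),
  stringsAsFactors = FALSE
)

# Split each cell on commas; `ids` is a list with one character vector per animal.
ids <- strsplit(animals$acoustic_tag_id, ",", fixed = TRUE)

# Repeat each animal_id once per tag id, giving one record per acoustic tag.
animals_long <- data.frame(
  animal_id = rep(animals$animal_id, times = lengths(ids)),
  acoustic_tag_id = unlist(ids),
  stringsAsFactors = FALSE
)

# Filtering is now a plain equality test instead of string matching.
hit <- animals_long[animals_long$acoustic_tag_id == "A69-9006-3821", ]
```

The trade-off is that animal-level columns get repeated on every row, which is the kind of duplication the rest of this thread tries to avoid.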
@PieterjanVerhelst as I understand it, that would be the same as what I propose? To have a column for each unique acoustic tag id, serial number, etc.? Cheers! |
Not sure, but I would suggest the following format, with the example of two transmitters (hence two tag serial numbers) each having two acoustic tag IDs with a specific sensor value as an example:
|
@PieterjanVerhelst Yes I agree! However, to my understanding, the table that you show is similar to the output you get from
library(etn)
library(dplyr)
con <- etn::connect_to_etn()
tags <- etn::get_tags(acoustic_tag_id = c("A69-9006-3820", "A69-9006-3821", "A69-9006-3835", "A69-9006-3834"))
tags %>% dplyr::select(tag_serial_number, sensor_type, acoustic_tag_id)
#> # A tibble: 4 × 3
#> tag_serial_number sensor_type acoustic_tag_id
#> <chr> <chr> <chr>
#> 1 1293314 pressure A69-9006-3821
#> 2 1293314 temperature A69-9006-3820
#> 3 1293321 pressure A69-9006-3835
#> 4 1293321 temperature A69-9006-3834
@PietrH @PieterjanVerhelst could the following work? To illustrate, the output from
The output from
If desired, these two tables could be joined ( |
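A join of the two tables on tag_serial_number could look roughly like the sketch below. Both data frames are toy stand-ins built from values shown in this thread: `animals` mimics the proposed (not current) get_animals() output with a single serial number per animal, and `tags` mimics the get_tags() output above. Base R merge() is used so that no extra packages are needed; dplyr::left_join() would work similarly.

```r
# Proposed shape of the animals table: one tag_serial_number per animal (assumption).
animals <- data.frame(
  animal_id = c(3171L, 3172L),
  tag_serial_number = c("1293314", "1293321"),
  scientific_name = "Mustelus asterias",
  stringsAsFactors = FALSE
)

# Tag details, one row per acoustic tag id, as in the get_tags() output above.
tags <- data.frame(
  tag_serial_number = c("1293314", "1293314", "1293321", "1293321"),
  sensor_type = c("pressure", "temperature", "pressure", "temperature"),
  acoustic_tag_id = c(
    "A69-9006-3821", "A69-9006-3820",
    "A69-9006-3835", "A69-9006-3834"
  ),
  stringsAsFactors = FALSE
)

# Inner join on the shared key; each animal appears once per sensor.
animals_tags <- merge(animals, tags, by = "tag_serial_number")
```

Because the key is a single value per animal, no string splitting is needed before the join.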
@peterdesmet There is a proposal to change the output of
Do you happen to remember why you chose to do it this way back in the day? 6b81948 |
@lottepohl indeed, your suggestion also seems easier for data processing and manipulation 👍. |
Thank you for all your input! I'm trying to visualise your proposed flow of functions.
Could you write me some (pseudo-)code of how you'd like this to work: how would you connect the functions in your script, and what the intermediary tables would look like? |
@PietrH The reason tag information was collapsed with commas is naming. With
The solution that @lottepohl proposes solves this and
Notes:
|
I looked a bit more into it and want to change my suggestion.
Scope
I propose the following:
1. Collapse repeated tag info in animals
If one of the
library(etn)
library(dplyr)
con <- etn::connect_to_etn()
# Before
animals <- get_animals(tag_serial_number = c(1293314, 1293321))
animals %>% select(animal_id, animal_project_code, tag_serial_number, tag_type, tag_subtype, acoustic_tag_id, acoustic_tag_id_alternative, scientific_name)
#> # A tibble: 2 × 8
#> animal_id animal_project_code tag_serial_number tag_type tag_subtype
#> <int> <chr> <chr> <chr> <chr>
#> 1 3171 ADST-Shark 1293314,1293314 acoustic-archival… animal,ani…
#> 2 3172 ADST-Shark 1293321,1293321 acoustic-archival… animal,ani…
#> # ℹ 3 more variables: acoustic_tag_id <chr>, acoustic_tag_id_alternative <chr>,
#> # scientific_name <chr>
# After
#> # A tibble: 2 × 8
#> animal_id animal_project_code tag_serial_number tag_type tag_subtype
#> <int> <chr> <chr> <chr> <chr>
#> 1 3171 ADST-Shark 1293314 acoustic-archival animal
#> 2 3172 ADST-Shark 1293321 acoustic-archival animal
#> # ℹ 3 more variables: acoustic_tag_id <chr>, acoustic_tag_id_alternative <chr>,
#> # scientific_name <chr>
2. One step further: remove
|
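Step 1 (collapsing repeated identical values within a cell) can be sketched in base R as follows. The helper name `collapse_repeats()` is hypothetical and not part of etn; it only illustrates the intended before/after behaviour on comma-separated strings. Turning empty results into NA is an assumption based on the "after" tables in this thread.

```r
# Hypothetical helper: drop repeated values inside a comma-separated cell.
collapse_repeats <- function(x, sep = ",") {
  out <- vapply(
    strsplit(x, sep, fixed = TRUE),
    function(parts) paste(unique(parts), collapse = sep),
    character(1)
  )
  # Keep missing input missing, and treat cells that hold only separators as NA.
  out[is.na(x) | !nzchar(out)] <- NA_character_
  out
}

serial_before <- c("1293314,1293314", "1293321,1293321")
serial_after <- collapse_repeats(serial_before)

# Cells whose values differ are left untouched.
ids_after <- collapse_repeats("A69-9006-3820,A69-9006-3821")
```

Applied column by column to the tag columns, this would leave acoustic_tag_id as the only one that still contains commas for the animals shown here.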
@peterdesmet a good suggestion which I follow, but to be able to better judge it, could you provide a full table of a before and after example? |
@PieterjanVerhelst here you go:
Current implementation (notice repeated identical values in
animals
# A tibble: 80 × 66
animal_id animal_project_code tag_serial_number tag_type tag_subtype acoustic_tag_id acoustic_tag_id_alte…¹
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 3171 ADST-Shark 1293314,1293314 acoustic-archival,acou… animal,ani… A69-9006-3820,… ,
2 3172 ADST-Shark 1293321,1293321 acoustic-archival,acou… animal,ani… A69-9006-3835,… ,
3 3173 ADST-Shark 1293315,1293315 acoustic-archival,acou… animal,ani… A69-9006-3822,… ,
4 3174 ADST-Shark 1293316,1293316 acoustic-archival,acou… animal,ani… A69-9006-3825,… ,
5 3175 ADST-Shark 1293322,1293322 acoustic-archival,acou… animal,ani… A69-9006-3836,… ,
6 3176 ADST-Shark 1293317,1293317 acoustic-archival,acou… animal,ani… A69-9006-3826,… ,
7 3177 ADST-Shark 1293318,1293318 acoustic-archival,acou… animal,ani… A69-9006-3828,… ,
8 3178 ADST-Shark 1293319,1293319 acoustic-archival,acou… animal,ani… A69-9006-3831,… ,
9 3179 ADST-Shark 1293320,1293320 acoustic-archival,acou… animal,ani… A69-9006-3832,… ,
10 3180 ADST-Shark 1293293,1293293 acoustic-archival,acou… animal,ani… A69-9006-3778,… ,
# ℹ 70 more rows
# ℹ abbreviated name: ¹acoustic_tag_id_alternative
# ℹ 59 more variables: scientific_name <chr>, common_name <chr>, aphia_id <int>, animal_label <chr>, animal_nickname <chr>,
# tagger <chr>, capture_date_time <dttm>, capture_location <chr>, capture_latitude <dbl>, capture_longitude <dbl>,
# capture_method <chr>, capture_depth <chr>, capture_temperature_change <chr>, release_date_time <dttm>,
# release_location <chr>, release_latitude <dbl>, release_longitude <dbl>, recapture_date_time <dttm>, length1_type <chr>,
# length1 <dbl>, length1_unit <chr>, length2_type <chr>, length2 <dbl>, length2_unit <chr>, length3_type <chr>, …
# ℹ Use `print(n = ...)` to see more rows
Solution 1 (don't repeat identical values,
animals
# A tibble: 80 × 66
animal_id animal_project_code tag_serial_number tag_type tag_subtype acoustic_tag_id acoustic_tag_id_alte…¹
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 3171 ADST-Shark 1293314 acoustic-archival animal A69-9006-3820,A69-90… NA
2 3172 ADST-Shark 1293321 acoustic-archival animal A69-9006-3835,A69-90… NA
3 3173 ADST-Shark 1293315 acoustic-archival animal A69-9006-3822,A69-90… NA
4 3174 ADST-Shark 1293316 acoustic-archival animal A69-9006-3825,A69-90… NA
5 3175 ADST-Shark 1293322 acoustic-archival animal A69-9006-3836,A69-90… NA
6 3176 ADST-Shark 1293317 acoustic-archival animal A69-9006-3826,A69-90… NA
7 3177 ADST-Shark 1293318 acoustic-archival animal A69-9006-3828,A69-90… NA
8 3178 ADST-Shark 1293319 acoustic-archival animal A69-9006-3831,A69-90… NA
9 3179 ADST-Shark 1293320 acoustic-archival animal A69-9006-3832,A69-90… NA
10 3180 ADST-Shark 1293293 acoustic-archival animal A69-9006-3778,A69-90… NA
# ℹ 70 more rows
# ℹ abbreviated name: ¹acoustic_tag_id_alternative
# ℹ 59 more variables: scientific_name <chr>, common_name <chr>, aphia_id <int>, animal_label <chr>, animal_nickname <chr>,
# tagger <chr>, capture_date_time <dttm>, capture_location <chr>, capture_latitude <dbl>, capture_longitude <dbl>,
# capture_method <chr>, capture_depth <chr>, capture_temperature_change <chr>, release_date_time <dttm>,
# release_location <chr>, release_latitude <dbl>, release_longitude <dbl>, recapture_date_time <dttm>, length1_type <chr>,
# length1 <dbl>, length1_unit <chr>, length2_type <chr>, length2 <dbl>, length2_unit <chr>, length3_type <chr>, …
# ℹ Use `print(n = ...)` to see more rows
Solution 1+2 (don't repeat identical values, only retain
animals
# A tibble: 80 × 62
animal_id animal_project_code tag_serial_number scientific_name common_name aphia_id animal_label animal_nickname tagger
<int> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
1 3171 ADST-Shark 1293314 Mustelus asterias Starry smo… 105821 7901 NA ,
2 3172 ADST-Shark 1293321 Mustelus asterias Starry smo… 105821 7912 NA ,
3 3173 ADST-Shark 1293315 Mustelus asterias Starry smo… 105821 7913 NA ,
4 3174 ADST-Shark 1293316 Mustelus asterias Starry smo… 105821 7902 NA ,
5 3175 ADST-Shark 1293322 Mustelus asterias Starry smo… 105821 7903 NA ,
6 3176 ADST-Shark 1293317 Mustelus asterias Starry smo… 105821 7904 NA ,
7 3177 ADST-Shark 1293318 Mustelus asterias Starry smo… 105821 7905 NA ,
8 3178 ADST-Shark 1293319 Mustelus asterias Starry smo… 105821 7906 NA ,
9 3179 ADST-Shark 1293320 Mustelus asterias Starry smo… 105821 7907 NA ,
10 3180 ADST-Shark 1293293 Mustelus asterias Starry smo… 105821 7908 NA ,
# ℹ 70 more rows
# ℹ 53 more variables: capture_date_time <dttm>, capture_location <chr>, capture_latitude <dbl>, capture_longitude <dbl>,
# capture_method <chr>, capture_depth <chr>, capture_temperature_change <chr>, release_date_time <dttm>,
# release_location <chr>, release_latitude <dbl>, release_longitude <dbl>, recapture_date_time <dttm>, length1_type <chr>,
# length1 <dbl>, length1_unit <chr>, length2_type <chr>, length2 <dbl>, length2_unit <chr>, length3_type <chr>,
# length3 <dbl>, length3_unit <chr>, length4_type <chr>, length4 <dbl>, length4_unit <chr>, weight <dbl>,
# weight_unit <chr>, age <dbl>, age_unit <chr>, sex <chr>, life_stage <chr>, wild_or_hatchery <chr>, stock <chr>, …
# ℹ Use `print(n = ...)` to see more rows |
I would go for option 1+2 as |
Hi everyone,
In the table retrieved with etn::get_animals(), the information of two tags is collapsed into one row if the animal was equipped with a tag containing 2 sensors (resulting in 1 unique tag_serial_number with two different acoustic_tag_ids), see here. Then, a few columns (tag_serial_number, tag_type, tag_subtype, acoustic_tag_id and acoustic_tag_id_alternative) have double entries, separated with commas.
Would it be possible to only have the tag_serial_number in the output from get_animals()? Then, the collapsing could be avoided and the column contents, when e.g. animals are tagged both with one and two sensors, would be consistent.
The remaining tag details, if needed, could be joined from the output from get_tags().
Just an idea, what do you think?
All the best,
Lotte