---
title: 'A metadata approach to evaluate the state of ocean knowledge: Strengths, limitations, and application to Mexico'
author: "Palacios-Abrantes J.1*¶, Cisneros-Montemayor A.M.1¶, Cisneros-Mata M.A.2¶, Rodríguez L.3¶, Arreguín-Sánchez F.4¶, Aguilar V.5&, Domínguez-Sánchez S.6&, Fulton S.7&, López-Sagástegui R.6&, Reyes-Bonilla H.8&, Rivera R.9&, Salas S.10&, Simoes N.11-14& & Cheung, W.W.L.1¶"
csl: plos.csl
output:
  word_document:
    reference_docx: PlosTemplate.docx
  pdf_document:
    fig_caption: yes
geometry: margin=1in
editor_options:
  chunk_output_type: console
bibliography: Metadata_References.bib
---
^1^Institute for the Oceans and Fisheries, The University of British Columbia, Vancouver, Canada. ^2^ Instituto Nacional de Pesca y Acuacultura, Guaymas, Sonora, México. ^3^ Environmental Defense Fund de México, La Paz, México. ^4^ Instituto Politécnico Nacional, Centro Interdisciplinario de Ciencias Marinas, La Paz, México. ^5^ Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, Ciudad de México, México. ^6^ University of California, San Diego, Scripps Institution of Oceanography, La Jolla, CA, USA. ^7^ Comunidad y Biodiversidad, Cancún, México. ^8^ Universidad Autónoma de Baja California Sur, La Paz, México. ^9^ SmartFish Rescate de Valor, A.C., La Paz, México. ^10^ Instituto Politécnico Nacional, Centro de Investigación y Estudios Avanzados del Instituto Politécnico Nacional, México. ^11^ Universidad Marista de Mérida. ^12^ Universidad Nacional Autónoma de México, Unidad Multidisciplinaria de Docencia e Investigación – Sisal, México. ^13^ Laboratorio Nacional de Resiliencia Costera, Laboratorios Nacionales, Ciudad de México, México. ^14^ Texas A&M University, Corpus Christi, Texas, USA.
^*^ Corresponding author: [email protected]
^¶^ These authors contributed equally to this work
^&^ These authors contributed equally to this work
```{r setup, eval=T, echo=F, warning=F,message=F, results='hide'}
#### READ ME !!! ####
# Run this chunk before knitting to make sure all required packages are installed in R
# bibliography: Metadata_References.bib
ipak <- function(pkg){
  # Install any packages that are missing, then load all of them
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
#### Library ####
packages <- c(
  "data.table",  # data wrangling
  "dplyr",       # data wrangling
  "tidyr",       # data wrangling
  "ggplot2",     # plotting figures
  "cowplot",     # arranging figures
  "ggpubr",      # arranging figures
  "wesanderson", # colour palettes
  "ggrepel",     # non-overlapping plot labels
  "gridExtra",   # arranging figures
  "networkD3",   # for flow figure
  "sf",          # for Mexico's map
  "tools",       # for Mexico's map
  "taxize"       # for getting species names
)
ipak(packages)
```
```{r Libraries and Data, eval=T, echo=F, warning=F,message=F}
#--------------------------#
# Data needed ####
#--------------------------#
# Metadata Template (Results)
# Version December 2018
Meta_Template <- suppressMessages(
  fread("~/Dropbox/Metadata_Mexico/English/Templates/Template_7.csv",
        colClasses = c(Location = 'character',
                       Notes = 'character'
        )
  )
)
# Monitoreo Noroeste cleaned and processed template
### Note ##
# This template is not included in the final Metadata Template but is accounted for in the analysis.
Monitoreo_T <- read.csv("~/Dropbox/Metadata_Mexico/Manuscript/Data/Monitoreo_Template.csv")
# Merging both #
Template <- Meta_Template %>%
  bind_rows(Monitoreo_T)
# For effort map (Methodology) #
Congresos <- fread("Data/Lugares.csv")
# Metadata Key (Annex) #
Key <- fread("Data/Metadata_Key.csv")
#--------------------------#
# Functions needed ####
#--------------------------#
# For plotting the time-series figure (Fig 3)
source('Functions/ts_fun.R')
# For plot standardization
ggtheme_plot <- function() {
  theme(
    plot.title = element_text(size = rel(1), hjust = 0, face = "bold"),
    panel.background = element_blank(),
    strip.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank(),
    axis.ticks = element_blank(),
    axis.text.x = element_text(size = 18,
                               angle = 0,
                               face = "plain"),
    axis.text.y = element_text(size = 20),
    axis.title = element_text(size = 20),
    legend.key = element_rect(colour = NA, fill = NA),
    legend.position = "top",
    legend.title = element_text(size = 22),
    legend.text = element_text(size = 16),
    strip.text.x = element_text(size = 24, colour = "darkgrey")
  )
}
# For map standardization (the `Region` argument is currently unused)
ggtheme_map <- function(base_size = 9, Region = "NA") {
  theme(text = element_text(
    color = "gray30", size = base_size),
    plot.title = element_text(size = rel(1.25), hjust = 0, face = "bold"),
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "transparent"),
    strip.background = element_rect(fill = "transparent"),
    strip.text.x = element_text(size = 18, colour = "black", face = "bold.italic", angle = 0),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    axis.title = element_blank(),
    legend.position = "bottom"
  )
}
```
# Abstract
Climate change, mismanaged resource extraction, and pollution are reshaping global marine ecosystems, with direct consequences for human societies. Sustainable ocean development requires knowledge and data across disciplines, scales and knowledge types. Although several disciplines are generating large amounts of data on marine socio-ecological systems, such information is often underutilized due to fragmentation across institutions or stakeholders, limited standardization across scales, time or disciplines, and the fact that information is often not searchable within existing databases. Compiling metadata, the information that describes existing sets of data, is an effective tool to address these challenges, particularly when metadata corresponding to multiple datasets can be combined to integrate, organize and classify multidisciplinary data. Here, using Mexico as a case study, we describe the compilation and analysis of a metadatabase of ocean knowledge that aims to improve access to information, facilitate multidisciplinary data sharing and integration, and foster collaboration among stakeholders. We also evaluate knowledge trends and gaps to inform ocean management. Analysis of the metadatabase shows that past and current research in Mexico focuses strongly on ecology and fisheries, with biological data more consistent over time and space than data on human dimensions. Regional imbalances in available information were also evident, with most available information corresponding to the Gulf of California, Campeche Bank and Caribbean, and less available for the central and south Pacific and the western Gulf of Mexico. Despite existing knowledge gaps in Mexico and elsewhere, we argue that systematic efforts such as this can often reveal an abundance of information for decision-makers to develop policies that meet key commitments on ocean sustainability. 
Surmounting current cross-scale social and ecological challenges for sustainability requires transdisciplinary approaches. Metadatabases are critical tools to make efficient use of existing data, highlight and address strengths and deficiencies, and develop scenarios to inform policies for managing complex marine social-ecological systems.
# Introduction
The ocean contributes to human wellbeing by providing a diversity of goods and services, such as food, energy, and transport, as well as cultural and recreational values [@Gattuso:2015jz; @Costello:2016kp]. However, drivers from human activities, including climate change, excessive extraction of marine resources, and pollution, are impacting global marine biodiversity and ecosystem services [@Poloczanska:2013kj; @Weatherdon:2016ws; @Halpern:2008tu; @Portner:2014wm] and causing undesired social and economic outcomes [@Singh:2017ds]. Mitigating and managing these human drivers, and achieving sustainable ocean development, requires data from different disciplines that span the longest possible time ranges and cover different geographic scales. Only with this diverse and complementary knowledge can policymakers evaluate status and trends and set clear targets for effective policy design and implementation [@IPBES:2016uq]. The value of a multidisciplinary approach has recently been recognized in partnerships aiming to achieve the cross-disciplinary United Nations (UN) Sustainable Development Goals [@UnitedNations2018]. Yet despite calls for a global shift towards open science and its inherent benefits [@Michener:2006ib], data identification, access, and sharing continue to be a challenge throughout the world [@Tai:2018fj].
Metadata is important for harmonizing existing data across scales, disciplines and domains. Metadata refers to the information required to understand the data, such as the data type, content, source, quality, format, structure, and accessibility [@Michener:1997vb; @Michener:2006ib]. Metadata repositories (and their development itself) can help address the challenges of data sharing by improving data access, fostering collaboration among stakeholders, and facilitating subsequent analyses and data refinement [@CisnerosMontemayor:2016jn; @CisnerosMontemayor:2017eq]. Various research fields related to socio-ecological marine systems have generated large amounts of data. However, such information is often underutilized because it is scattered and held by different institutions or stakeholders, is not standardized, and is neither readily found nor widely accessible [@Portner:2014wm; @IPBES:2016uq; @Sagarminaga:2017vf]. Metadata is particularly useful for developing nations with limited research capacity [@Tai:2018fj] and where data exist but are perceived to be limited or unavailable [@OECD:2016eq].
Country-level repositories for marine systems that include metadata have been created, with examples including Australia [@Hoenner:2018ki], Canada [@CisnerosMontemayor:2016jn], and the Canary Islands in Spain [@REDMIC:tg]. The Integrated Marine Observing System (IMOS) is an Australian national collaborative research project that includes a metadatabase allowing users to see dynamic graphs, enter metadata, and access data [@Hoenner:2018ki]. This database has supported hundreds of peer-reviewed publications, book chapters and reports [@IMOS:cldB6FWT]. In Canada [@CisnerosMontemayor:2016jn], a metadata repository was created with the objective of identifying thematic and information gaps in marine research for the Arctic, Pacific, and Atlantic regions, and was subsequently used to evaluate national policy progress towards the Convention on Biological Diversity (CBD) Aichi Targets [@CisnerosMontemayor:2017eq]. The Integrated Marine Data Repository of the Canary Islands (REDMIC) includes data, metadata, research documents, maps, and interactive graphs related to the marine environment, which have supported regional decision making and research [@REDMIC:tg]. All of these initiatives aim to increase data access, support metadata research, and improve science-based decision making related to marine environmental policies.
In this study, we develop a framework for an interdisciplinary metadatabase of marine systems, with the aim of assessing the status and trends of existing research and information to support decision making for sustainable ocean development. We applied this framework to Mexico as an example of a developing nation with extensive marine and coastal areas [@Sagarminaga:2017vf]. As in other parts of the world, multiple academic (e.g. research institutions [@PortaldeDatosAbie:2018ui]), government [@INEGI:8XWgQ3Xx], civil society (CSOs) [@COBI:2018to], and private organizations and institutions generate and host a wealth of data from multiple research fields. However, information on these data (and the data themselves) is not always visible, accessible, or searchable in a standardized format, so individuals working in specific fields may be unaware of past or current related research. Further, the full scope of research, both temporal and spatial, is not easily available to policymakers. These limitations can be addressed through a dedicated effort centered on building and maintaining a metadata repository.
This study describes the processes of metadatabase design, compilation, and methods to link and harmonize datasets from different scales and domains; we then offer examples of metadata-based analyses of historical, regional, and thematic trends. Creating and maintaining an open-source metadata repository can facilitate interpretation of information through public consultation and data sharing. Metadata analyses are critical to help identify data gaps and promote networking and collaboration among a wide array of individuals, institutions and organizations.
# Materials and methods
To develop a metadatabase of ocean research in Mexico (hereafter the MDB), we followed a four-stage process: (1) development of the MDB structure; (2) identification and compilation of available repositories and datasets, including outreach to data holders; (3) development of protocols for metadata inclusion and sharing [@CisnerosMontemayor:2016jn]; and (4) publication of the MDB in an accessible, open-source and long-term stable platform with a partner institution (the National Commission for the Knowledge and Use of Biodiversity, CONABIO [@CONABIO:Nx5xZZHT]). We then provide examples of meta-analyses for identifying information trends and gaps. The final MDB can be found at https://www.infoceanos.conabio.gob.mx.
## Metadata structure
There are five hierarchical levels in the MDB structure: Metadatabase > Repository > Dataset > Record > Data point (Fig 1). The metadatabase holds the metadata of datasets, while repositories are structures that compile multiple datasets. Repositories can exist as web-based data sources (e.g. Ocean Biogeographic Information System (OBIS) [@OBIS:av1nRut2]), thematic reports that contain data (e.g. Mexican Official Catch Statistics [@SAGARPACONAPESCA:2013wa]), or institutional, laboratory or research-project collections encompassing multiple datasets (e.g. the species catalogue of the National University’s Institute for Marine Science and Limnology, UNAM-ICMyL [@UNAMUNINMAR:TlMZFU99]). Metadata records are individual entries that describe each dataset within a repository (e.g. ‘clam landings in region A’ or ‘clam landings in region B’; Fig 1). Metadata records contain descriptions of existing data, but not the data themselves; in marine metadatabases these descriptions may cover, for example, fisheries landings, species distributions, or the fuel cost of fishing. A data point is a single item of information within a record. For example, a metadata record of annual species-specific fish abundance from 2000 to 2003 includes four data points (one yearly average per year). Records are spatially scale-specific; for example, fisheries catch can be recorded at the regional or the country level.
**Fig 1. A schematic diagram of the metadata compilation process.** From the original repository, three different datasets are represented: the first dataset contains one topic, “landings”; the second contains two topics, “landings” and “revenue”; and the third contains three topics, “landings”, “aquaculture”, and “totals”. In addition, each dataset has multiple spatial components. The last column shows how the records would appear in the metadatabase.
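To make the record/data-point relationship concrete, the chunk below is a minimal base-R sketch, not the MDB's actual code. It assumes a hypothetical table with one row per metadata record (one time series at one spatial scale); the column names and values are illustrative and are not the MDB's published fields. For an annual series, the number of data points is the inclusive year span, so 2000 to 2003 gives four points, as in the example above. The records are then rolled up one level of the hierarchy.

```r
# Hypothetical metadata records: one row per record (one time series at one
# spatial scale). Column names and values are illustrative, not the MDB's fields.
records <- data.frame(
  Repository    = c("Catch statistics", "Catch statistics", "Monitoring program"),
  Dataset_Title = c("Clam landings", "Clam landings", "Fish abundance"),
  Region        = c("Region A", "Region B", "Region A"),
  Start_Year    = c(2000, 2000, 2000),
  End_Year      = c(2010, 2010, 2003),
  stringsAsFactors = FALSE
)

# For annual series, the number of (yearly) data points is the inclusive year span,
# e.g. 2000-2003 gives 2003 - 2000 + 1 = 4 data points
records$Data_Points <- records$End_Year - records$Start_Year + 1

# Each row counts as one record when rolled up the hierarchy
records$Records <- 1

# Roll records up one level: records and data points per dataset within each repository
by_dataset <- aggregate(
  cbind(Records, Data_Points) ~ Repository + Dataset_Title,
  data = records,
  FUN  = sum
)
print(by_dataset)
```

Here the two regional clam-landings records collapse into one dataset with two records. Keeping region as a separate record, rather than a sub-field, is what lets each spatial scale be counted and mapped on its own.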
## Metadata categories
Standardization of information within a metadatabase structure provides guidance for consistent description of new data subjects (e.g. abalone, clam, tuna) and types (e.g. methods, units of measurement, and details of experimental design) [@Michener:1997vb; @Reichman:2011kv; @Hoenner:2018ki]. Here, we assigned metadata fields (information categories) to maximize flexibility, accommodating multidisciplinary data and allowing for various meta-analyses. Initially, the structure was adapted from a previous metadatabase developed for Canadian oceans [@CisnerosMontemayor:2016jn], with subsequent modifications (mainly to ensure compatibility of geographical and species nomenclature with existing frameworks in Mexico) following suggestions in meetings with ocean experts, as described in the following section on metadata collection. The key difference between the structure of the MDB and the previous effort for Canada is that each metadata record in the latter represents a particular repository of information (e.g. a report or a database), with a metadata field indicating the number of unique time series within the record. In the MDB, each time series is a unique metadata record and a field notes its corresponding repository. While this structure requires somewhat more effort, because each time series must be entered individually, the resulting metadatabase is easier to analyze and allows more specific information to be added to each record if necessary. The final MDB structure includes 29 categories, ranging from general information (e.g. region or subject) to specific metadata such as the number of data points in the dataset and the corresponding research fields (S1 Table).
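The contrast between the two designs can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the procedure used to build the MDB. `Compilation_Title` and `Dataset_Title` mirror column names used elsewhere in this manuscript's R code. `N_Series`, the repositories, and the series titles are all invented. The sketch expands repository-level records (Canada-style) into one record per time series (MDB-style).

```r
# Hypothetical Canada-style records: one row per repository, with a field giving
# the number of unique time series it contains (rows are invented for illustration).
canada_style <- data.frame(
  Repository = c("Fisheries yearbook", "Reef monitoring report"),
  N_Series   = c(3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical titles of the time series inside each repository
series_titles <- list(
  "Fisheries yearbook"     = c("Clam landings", "Clam revenue", "Tuna landings"),
  "Reef monitoring report" = c("Fish abundance", "Coral cover")
)

# MDB-style: one metadata record per time series, with a field naming its repository
mdb_style <- do.call(rbind, lapply(canada_style$Repository, function(repo) {
  data.frame(
    Compilation_Title = repo,                  # repository the record belongs to
    Dataset_Title     = series_titles[[repo]], # one row per time series
    stringsAsFactors  = FALSE
  )
}))
print(mdb_style)

# The repository-level count can be recovered from the per-series table
series_per_repo <- table(mdb_style$Compilation_Title)
print(series_per_repo)
```

The asymmetry illustrates the design choice described above. A per-series table can always be summarized back into per-repository counts. A per-repository count, however, cannot be expanded into per-series records without returning to the source.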
## Metadata collection
Compilation of metadata began with a review of public online repositories, including OBIS [@OceanicBiogeographi:2018uc] and the UN Food and Agriculture Organization (UN-FAO) fisheries statistics [@FishStatJsoftware:2016uf], followed by federal government catalogues such as Mexico’s Fisheries and Aquaculture Yearbook [@SAGARPACONAPESCA:2013wa], and datasets produced and hosted by universities and CSOs working on the marine environment. Using the initial version of the MDB, built from public data, as a platform for discussion, we held a series of 20 workshops (~30 people each) with research groups (including universities, government researchers and CSOs) in eight cities across Mexico’s regions (Fig 2). These were followed up by in-person and virtual meetings, as well as presentations at national and international conferences to highlight progress and encourage others to contribute and collaborate (S2 Table). We additionally met with four Mexican federal government institutions (CONACyT, National Council of Science and Technology [@CONACyT:2018wt]; INAPESCA, National Institute of Fisheries and Aquaculture [@INAPESCA:2017vj]; INECC, Ecology and Climate Change Institute [@INECC:n21eS6Cb]; and CONABIO [@CONABIO:Nx5xZZHT]) and with well-established data repository initiatives (e.g. dataMares [@dataMaresWorkPubl:MRU2oArL], FMCN-Monitoreo Noroeste [@FMCN:fXq_s_Z4]) to include their data in the metadatabase. While this represents an important first effort, it does not comprise all potential data sources in Mexico, highlighting the importance of continuing the current effort.
```{r Effort_map, echo=F, message =F, warning=F, eval=F, fig.cap="Fig 2. Data collection effort. Location of the places where data were collected. CSO = civil society organizations."}
#--------------------------#
# Load data
#--------------------------#
# Lugares.csv is provided as supplementary material of the paper (S2 Table)
Congresos <- fread("Data/Lugares.csv") %>%
  filter(Event != "AFS",   # Did not happen
         Type != "Other")  # Remove other events from dataset
# Shapefile from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
path.ne.coast <- "./Data/ne_50m_admin_0_countries"
file_name <- "ne_50m_admin_0_countries.shp"
# Read shapefile
data_coast <- st_read(dsn = path.ne.coast,
                      layer = file_path_sans_ext(file_name)
)
# data_coast$NAME_ES
# names(data_coast)
# head(data_coast)
# Filter countries to include in figure
Countries <- c("Estados Unidos", "Belice", "Guatemala", "El Salvador", "Costa Rica", "Panamá", "Honduras", "Nicaragua")
# Create two datasets for different fill colours
Mex <- filter(data_coast, NAME_ES == "México")
Central_A <- filter(data_coast, NAME_ES %in% Countries)
#--------------------------#
# Plot Effort map ####
#--------------------------#
ggplot() +
  geom_sf(data = Mex, fill = "grey90", colour = "black") +
  geom_sf(data = Central_A, fill = "grey80", colour = "black") +
  coord_sf(ylim = c(32, 7),
           xlim = c(-120, -75)
  ) +
  # Points of effort
  geom_point(data = Congresos,
             aes(
               x = Long,
               y = Lat,
               colour = Type,
               shape = Type
             ),
             size = 5
  ) +
  # Labels and locations for the Gulf of Mexico (GoM)
  geom_text_repel(data = subset(Congresos, Long > -98),
                  aes(
                    x = Long,
                    y = Lat,
                    label = Event,
                    color = Type
                  ),
                  show.legend = FALSE,  # Don't display "a" in legend
                  size = 5,             # Text size
                  point.padding = 0.2,  # Distance from the segment to the point
                  box.padding = 0.5,
                  force = 1,            # Overlapping labels
                  segment.alpha = 0.5,
                  nudge_x = 3 - subset(Congresos, Long > -98)$Long,
                  direction = "y",
                  hjust = 0.5
  ) +
  # Labels and locations for the Gulf of California (GoC)
  geom_text_repel(data = subset(Congresos, Long < -100),
                  aes(
                    x = Long,
                    y = Lat,
                    label = Event,
                    color = Type
                  ),
                  show.legend = FALSE,  # Don't display "a" in legend
                  size = 5,             # Text size
                  point.padding = 1,    # Distance from the segment to the point
                  box.padding = .2,
                  force = .5,           # Overlapping labels
                  segment.alpha = 0.5,
                  nudge_x = subset(Congresos, Long < -100)$Long,
                  direction = "y",
                  hjust = 1
  ) +
  # Labels and locations for Mexico City (DF)
  geom_text_repel(data = subset(Congresos, Location == "DF"),
                  aes(
                    x = Long,
                    y = Lat,
                    label = Event,
                    color = Type
                  ),
                  show.legend = FALSE,  # Don't display "a" in legend
                  size = 5,             # Text size
                  force = 3,            # Overlapping labels
                  segment.alpha = 0.5
  ) +
  scale_colour_manual(values = c("#3B9AB2", "#EBCC2A", "#F21A00", "#E1AF00", "black")) +
  annotate("text",
           label = "Mexico",
           x = -102,
           y = 25,
           size = 6,
           colour = "black"
  ) +
  theme_classic() +
  theme(
    panel.grid.major = element_line(color = "transparent"),
    strip.background = element_rect(fill = "transparent"),
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    legend.position = "top",
    legend.text = element_text(size = 20),
    legend.title = element_text(size = 20)
  ) +
  labs(x = "",
       y = "")
# Save plot as TIFF for PLOS (.png for GitHub due to size limits)
ggsave("Fig2.tiff",
       plot = last_plot(),
       width = 11,
       height = 9,
       units = "in",
       path = "./Figures/")
```
```{r Repository_Exploration, eval = T, echo = F}
#### Main repositories ####
# Number and percentage of records per repository
Repositories <- Template %>%
  filter(Compilation_Title != "NA") %>%
  group_by(Compilation_Title) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  mutate(Percentage = round((n / nrow(Template)) * 100))
# Exploratory only: the same summary weighted by data points
Repositories_DP <- Template %>%
  filter(Compilation_Title != "NA") %>%
  group_by(Compilation_Title) %>%
  summarise(n = sum(Data_Time_Points, na.rm = T)) %>%
  arrange(desc(n)) %>%
  mutate(Percentage = round((n / sum(Template$Data_Time_Points, na.rm = T)) * 100))
#### N Institution_Type ####
# Number of unique institutions per institution type
# (Cuales = "which": the authors listed under each type)
Institutions <- Template %>%
  filter(Institution_Type != "NA") %>%
  group_by(Institution_Type) %>%
  summarise(n = length(unique(Institution)),
            Cuales = paste(unique(Author),
                           collapse = "; ")) %>%
  arrange(desc(n))
# Main three repositories
## dataMares
# Note: this summary is currently identical to `Repositories` above
DataMares <- Template %>%
  filter(Compilation_Title != "NA") %>%
  group_by(Compilation_Title) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  mutate(Percentage = round((n / nrow(Template)) * 100))
## OBIS
OBIS <- Template %>%
  filter(Institution == "OBIS") %>%
  group_by(Compilation_Title) %>%
  summarise(n = n())
OBIS_Spp <- Template %>%
  filter(Institution == "OBIS") %>%
  group_by(Subject_name) %>%
  summarise(n = n())
## Datos Abiertos Mx (Mexican Open Data Portal)
DatosMX <- Template %>%
  filter(Compilation_Title == "Datos Abiertos Mx") %>%
  group_by(Dataset_Title,
           Institution
  ) %>%
  summarise(
    n = n()
  ) %>%
  arrange(desc(n))
```
**Fig 2. Locations where metadata workshops were held and contributing institutions.** Abbreviations in S4 Table. Map reprinted from Natural Earth (naturalearthdata.com).
## Types of data sources
We aimed to include all available data sources in the MDB. First, we sought all data related to Mexican oceans that were publicly available online. These include data from academic, environmental CSO, governmental, international, and private (e.g. industry or personal non-academic) institutions and organizations. Another source was unpublished data kept and maintained directly by stakeholders and/or institutions. The following sections summarize some of the institutions that contributed data to the MDB; a full list of contributing institutions is given in S3 Table.
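The institution-type tallies in the sections below come from grouped counts of unique institutions rather than of records. The base-R sketch below shows that counting step. It runs on a hypothetical subset of the metadata template in which `Institution` and `Institution_Type` mirror column names from this manuscript's code. The type codes ACA, GOV, NGO and the international variants Int/IGO/INT/Igo also come from that code. The rows are invented. Collapsing the international variants into one "Int" label follows a comment in the manuscript's code and is an assumption, not a documented rule.

```r
# Hypothetical subset of the metadata template (rows are invented for illustration)
template <- data.frame(
  Institution      = c("UNAM-UAY", "UNAM-UAY", "CONAPESCA", "COBI",
                       "OBIS", "UBC", "FishBase Consortium"),
  Institution_Type = c("ACA", "ACA", "GOV", "NGO", "Int", "INT", "IGO"),
  stringsAsFactors = FALSE
)

# Harmonize inconsistent spellings of the international code to a single label
international_codes <- c("Int", "IGO", "INT", "Igo")
template$Type_Std <- ifelse(template$Institution_Type %in% international_codes,
                            "Int", template$Institution_Type)

# Count unique institutions (not records) contributing to each institution type
institutions_per_type <- tapply(template$Institution, template$Type_Std,
                                function(x) length(unique(x)))
print(institutions_per_type)
```

Counting unique institutions keeps a single prolific contributor (here the two UNAM-UAY rows) from inflating its type's tally. Record counts per type can be obtained separately with `table(template$Type_Std)`.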
### a. Academia
```{r ACA_Source, eval=T, echo=F, warning=F, message=F}
#--------------------------#
# Academic repositories ####
#--------------------------#
# Number of unique datasets per academic institution
Aca_Repositories <- Template %>%
  filter(
    Institution_Type == "ACA",
    Compilation_Title != "NA") %>%
  group_by(Institution) %>%
  summarise(n = length(unique(Dataset_Title))) %>%
  arrange(desc(n))
Aca_Repo <- length(unique(Aca_Repositories$Institution))
Aca_Dataset <- max(Aca_Repositories$n)
### Exploring the top institutions
UNIATMOS <- Template %>%
  filter(Institution == "UNAM-UNIATMOS") %>%
  group_by(Dataset_Title,
           Compilation_Title) %>%
  summarise(n())
UNAM_UAY <- Template %>%
  filter(Institution == "UNAM-UAY") %>%
  group_by(Dataset_Title,
           Compilation_Title) %>%
  summarise(n())
CINVESTAV <- Template %>%
  filter(Institution == "CINVESTAV-Merida") %>%
  group_by(Dataset_Title,
           Compilation_Title) %>%
  summarise(n())
```
Academic data sources include any database hosted by a public or private academic institution in Mexico. Sources with comparatively large amounts of available data include the Digital Climatic Atlas of Mexico, hosted by the National University (UNAM) [@AtlasClimaticoDigi:2018vh], which offers an extensive open-access compilation of datasets on physicochemical parameters used in, among other applications, climate change models. UNAM’s academic unit in Sisal, Yucatán (UNAM-UAY) provided information on topics including oceanographic, ecological, fisheries, biological, and tourism data [@UNAMUAY:2NhrxmaQ]. Finally, the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN) holds extensive information on fisheries and tourism, mainly in the Yucatán Peninsula [@CINVESTAVIPN:eLD1W_RK].
### b. Governmental institutes
```{r GOV_Sources, eval=T, echo=F}
#--------------------------#
# Gov repositories ####
#--------------------------#
# Number of unique datasets (and their references) per government institution and repository
Gov <- Template %>%
  filter(Institution_Type == "GOV") %>%
  # filter(Compilation_Title == "Datos Abiertos Mx") %>%
  group_by(Institution,
           Compilation_Title) %>%
  summarise(n = length(unique(Dataset_Title)),
            web = paste(unique(Reference),
                        collapse = " ")) %>%
  arrange(desc(n))
Gov_Repo <- length(unique(Gov$Institution))
# Datos Abiertos Mx (Mexican Open Data Portal)
Datos_Abiertos <- Template %>%
  # filter(Institution_Type == "GOV") %>%
  filter(Compilation_Title == "Datos Abiertos Mx") %>%
  group_by(Institution) %>%
  summarise(n = length(unique(Dataset_Title))) %>%
  arrange(desc(n))
```
Through a 2015 Mexican decree that established regulations for open data, the Mexican federal government made an unprecedented effort to host and make available thousands of public datasets through a national Open Data Portal [@DOF:2015vv; @DatosAbiertos:2017vn]. While the site does not comprise all information generated through decades of public programs, it is a source of more than 500 datasets related to corruption, economic development, public services, climate change and human rights [@DatosAbiertos:2017vn]. Although these data are not specific to marine ecosystems, they capture many aspects of socio-ecological interactions that matter for ocean policy design [@IPBES:2016uq]. In addition to the portal, government agencies also publish data on their institutional websites. Among the largest repositories in the metadata set are the Secretariat of Economy [@SistemaNacionalde:2017wf], the fisheries commission CONAPESCA [@SAGARPACONAPESCA:2013wa], and CONABIO [@CONABIO:2017uq]. All data from these and other institutions featured in the metadatabase were public and immediately available at the time of consultation through reports, internet portals, and yearbooks.
### c. Civil Society Organizations (CSOs)
```{r NGO_Sources, eval=T, echo=F}
#--------------------------#
# NGO repositories ####
#--------------------------#
CSO <- Template %>%
  filter(Institution_Type == "NGO") %>%
  group_by(Institution) %>%
  summarise(n())
CSO_Repo <- length(unique(CSO$Institution))
CSO_Monitoreo <- Monitoreo_T %>%
  filter(Institution_Type == "NGO") %>%
  group_by(Institution,
           Author) %>%
  summarise(n())
```
CSOs provide information on fisheries, conservation, oceanography, and sociology. Comunidad y Biodiversidad, A.C. (COBI) contributed the largest CSO repository in the metadatabase. This CSO aims to preserve marine ecosystems that are deteriorating due to unsustainable exploitation of natural resources and has extensive monitoring programs dating back over two decades [@COBI:2018to]. The FMCN-Monitoreo Noroeste project is the second-largest source of CSO metadata in the MDB and is itself a repository for monitoring data (~1,000 datasets), including efforts from 20 CSOs [@FMCN:fXq_s_Z4].
### d. International academic sources
```{r Int_Sources, eval=T, echo=F}
#--------------------------#
# International repositories ####
#--------------------------#
# Institution-type labels treated as international (standardized to "Int")
List <- c(
  "Int",
  "IGO",
  "INT",
  "Igo"
)
International <- Template %>%
  filter(Institution_Type %in% List) %>%
  group_by(Institution) %>%
  summarise(n())
Inter_Repo <- length(unique(International$Institution))
# Top repositories
Units <- Template %>%
  filter(Institution == "UBC") %>%
  group_by(Author) %>%
  summarise(n())
Fishbase <- Template %>%
  filter(Institution == "FishBase Consortium") %>%
  group_by(Subject_name) %>%
  summarise(n())
```
International research groups hold a variety of data on Mexico, both country-specific and at the global scale. dataMares and OBIS (Ocean Biodiversity Information System) are the main international repositories in the MDB. dataMares is an open-source platform based at the University of California, San Diego, that hosts and facilitates access to robust scientific data on Mexican coasts [@dataMaresWorkPubl:MRU2oArL]. OBIS is a global open-access data and information repository on marine biodiversity [@OBIS:av1nRut2]. In addition, the Arizona-Sonora Desert Museum has an extensive checklist of invertebrates of the Gulf of California, and the University of British Columbia, through the Changing Ocean Research Unit [@CORU:Auy8sh-X] and the Fisheries Economics Research Unit [@FERU:V1kclIoK], holds more than three thousand records on fisheries economics, model projections of climate change, and the associated changes in biodiversity and fisheries catches. Lastly, FishBase [@FishBase:2018wx] and SeaLifeBase [@SeaLife:kJFsMA4], online databases of marine life, provide life-history, trophic-ecology, and other data for more than two thousand species occurring in Mexico.
## Metadatabase analysis
The MDB analysis was performed in R using RStudio (version 1.1.463) with the packages data.table [@Packagedatatable:2019uh] and tidyverse [@PackagetidyverseE:2017vq]. We compared metadata categories by the number and percentage of records available per research field. Analyses include the spatial and temporal distribution of the metadata collected; the amount of metadata by taxon, research field, and type of data source; and the socio-ecological interactions represented in the metadata. All figures were produced using the R packages ggplot2 [@PackageggplotCre:2018uv], cowplot [@CowplotStreamlined:2019wt], ggpubr [@Packageggpubrggp:2018tv], ggrepel [@PackageggrepelAut:2018to], gridExtra [@PackagegridExtraM:2017wx], and wesanderson [@Pckagewesanderson:2018ug].
For the spatial component we used the packages ggplot2 [@PackageggplotCre:2018uv] and sf [@PackagesfSimpleF:2018vp], and Mexico's shapefile was built from Natural Earth data (http://naturalearthdata.com). Although other spatial divisions exist for Mexico (e.g. CONABIO identifies five marine ecoregions and CONAPESCA six fishing regions), we needed a single standardized spatial division to integrate multidisciplinary data (Fig 2). In addition, subject names such as "shrimp", "shrimps", and "shrimp without head" were standardized as "Shrimp", and scientific names were updated and corrected for typos with the package taxize [@Chamberlain:GvjOpci4].
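The standardization of common names can be pictured as a lookup table that maps spelling and descriptive variants onto one canonical label. The sketch below is a minimal base-R illustration under stated assumptions, not the pipeline used for the MDB: the function `standardize_subject()` and its lookup table are hypothetical, and only the "shrimp" variants are taken from the text. Scientific names would still be resolved with taxize as described above.

```r
# Hypothetical lookup table: raw variant (lower case, trimmed) -> canonical label.
# Only the shrimp variants come from the text; a real table would be longer.
subject_lookup <- c(
  "shrimp"              = "Shrimp",
  "shrimps"             = "Shrimp",
  "shrimp without head" = "Shrimp"
)

# Map raw subject names to canonical labels; names with no entry in the
# lookup table are returned unchanged.
standardize_subject <- function(x, lookup = subject_lookup) {
  key <- tolower(trimws(x))
  out <- unname(lookup[key])
  ifelse(is.na(out), x, out)
}

raw <- c("shrimp", "Shrimps ", "shrimp without head", "Octopus maya")
standardize_subject(raw)
# returns c("Shrimp", "Shrimp", "Shrimp", "Octopus maya")
```

Keeping the mapping in a named vector makes every standardization decision explicit and easy to review, which matters when the same table is reused across thousands of records.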
To identify thematic trends, we counted the number of records in the metadatabase and the number of data points (years of data) available in each record over its years of collection. All metadata were categorized by socio-ecological interaction using the DPSIR (Drivers, Pressures, State, Impacts, and Response) framework [@OECD:1993ui]. Accordingly, *Benefits* represent social benefits from natural systems (e.g. fisheries landings); *Pressure* (which we here equate with *Drivers*) represents any pressure from human activities on nature (e.g. fishing effort); *Response* covers actions that reduce pressure on natural systems (e.g. limiting fishing effort); and *State* refers to the status of natural systems (e.g. stock assessments). We used the package networkD3 [@PackagenetworkDD:2017wd] to visualize and analyze the relationships among records, institutions, research topics, and DPSIR categories. Finally, we used chi-square goodness-of-fit tests [@PackagestatsTheR:izARar7B] to test whether the number of records differed significantly among the categories of each variable.
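The chi-square comparison of record counts is a goodness-of-fit test against a uniform expectation (every category equally likely), which is what `chisq.test()` does when it receives a single vector of counts, as in the Results chunk. The sketch below uses made-up counts for four hypothetical research fields and recomputes the statistic by hand to show what the test measures; with k categories the test has k - 1 degrees of freedom.

```r
# Made-up record counts for four hypothetical research fields (illustration only).
record_counts <- c(Fisheries = 500, Ecology = 300, Conservation = 150, Other = 50)

# Goodness-of-fit test; H0: records are equally distributed across fields.
gof <- chisq.test(record_counts)

# The same statistic by hand: sum((observed - expected)^2 / expected),
# where expected = total / number of categories under H0.
expected <- sum(record_counts) / length(record_counts)
x2 <- sum((record_counts - expected)^2 / expected)
dof <- length(record_counts) - 1

x2                   # 460
dof                  # 3
gof$p.value < 0.001  # TRUE
```

A large statistic relative to its degrees of freedom means the observed counts are far from an even split; it says nothing about *which* categories drive the difference, which is why the text also reports the percentages per field.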
Some records may include duplicated datasets. We used R to automate the identification of redundant sources of information (e.g. institutions holding the same database). In addition, when possible, we asked data owners and repository curators whether a database had already been published in another repository. Given the size of the metadatabase and these efforts to identify duplicated records, we do not expect duplication to be a significant issue. Records representing the same dataset (e.g. CONAPESCA catches and dataMares catches) but with different levels of processing (e.g. cleaned-up data or different years) were kept as separate records in the MDB.
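One simple way to flag candidate duplicates is to normalize the fields that identify a dataset and look for keys that occur in more than one repository. This is a hedged base-R sketch, not the procedure actually used for the MDB: the column names `Compilation_Title`, `Dataset_Title`, and `Subject_name` exist in the metadatabase, but the toy rows, the `normalize()` helper, and the matching rule are assumptions. Flagged rows would still need manual review, since records of the same dataset with different levels of processing are kept as separate records.

```r
# Toy metadata (hypothetical rows); column names follow the metadatabase.
mdb <- data.frame(
  Compilation_Title = c("CONAPESCA", "dataMares", "OBIS"),
  Dataset_Title     = c("Shrimp catches", "shrimp  catches ", "Reef fish census"),
  Subject_name      = c("Shrimp", "Shrimp", "Epinephelus spp."),
  stringsAsFactors  = FALSE
)

# Hypothetical normalization: lower case, collapse repeated whitespace, trim ends.
normalize <- function(x) trimws(gsub("\\s+", " ", tolower(x)))

# Comparison key built from the normalized identifying fields.
key <- paste(normalize(mdb$Dataset_Title), normalize(mdb$Subject_name), sep = "|")

# Number of distinct repositories sharing each key.
n_repos <- tapply(mdb$Compilation_Title, key, function(r) length(unique(r)))

# Flag rows whose key appears in more than one repository for manual review.
mdb$possible_duplicate <- as.vector(n_repos[key] > 1)
mdb$possible_duplicate
# TRUE TRUE FALSE
```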
# Results
```{r Results, eval=T, echo=F}
#--------------------------#
# Number of repositories and institutions ####
#--------------------------#
Repo <- length(unique(Template$Compilation_Title))
Inst <- length(unique(Template$Institution))
Datasets <- length(unique(Template$Dataset_Title))
#--------------------------#
# Research fields (e.g. Sociology) ####
#--------------------------#
Total_records <- nrow(Template)
Research_Field <- Template %>%
  group_by(Research_Field) %>%
  summarise(
    Records = n(),
    DP = sum(Data_Time_Points,
             na.rm = TRUE)) %>%
  mutate(
    Rate_Log = log10(DP/Records),
    Rate = round(DP/Records, 2),
    Record_Per = round((Records/Total_records)*100)
  ) %>%
  arrange(desc(Records))
#--------------------------#
### Chi-square on research field (Type) ##
#--------------------------#
# Goodness-of-fit test against a uniform distribution of records
# H0: records are equally distributed across research fields.
# H1: records are NOT equally distributed across research fields.
Research_Chi <- chisq.test(Research_Field$Records)
# Research_Chi
# Reject null hypothesis (p < 0.001)
# Data exploration
# First place
First_Place <- Research_Field %>%
  arrange(desc(Record_Per)) %>%
  slice(1)
# For text #
Main_RF <- First_Place$Research_Field
Main_RF_n <- round(First_Place$Records/1000)
Main_RF_Per <- First_Place$Record_Per
# Second place
Second_Place <- Research_Field %>%
  arrange(desc(Record_Per)) %>%
  slice(2)
# For text #
Second_RF <- Second_Place$Research_Field
Second_RF_Per <- Second_Place$Record_Per
Second_RF_n <- round(Second_Place$Records/1000)
# Third place
Third_Place <- Research_Field %>%
  arrange(desc(Record_Per)) %>%
  slice(3)
# For text #
Third_RF <- Third_Place$Research_Field
Third_RF_Per <- Third_Place$Record_Per
Third_RF_n <- round(Third_Place$Records/1000)
###____ End of paragraph _____ ###
#--------------------------#
### Chi-square on type of sources ####
#--------------------------#
Sources <- Template %>%
  filter(!is.na(Institution_Type),
         Institution_Type != "Unknown",
         Institution_Type != "") %>%
  group_by(Institution_Type) %>%
  summarise(Records = n()) %>%
  mutate(Percentage = round((Records/nrow(Template)*100), 2)) %>%
  arrange(desc(Percentage))
# National_Sources <- sum(Sources$Percentage[2:5])
# H0: records are equally distributed across source types.
# H1: records are NOT equally distributed across source types.
# Source_Chi <- chisq.test(Sources$Records)
# Source_Chi
# Reject null hypothesis (p < 0.001)
```
As of October 2018, the metadatabase of marine research in Mexico includes `r Total_records` records from `r Datasets` datasets contained in `r Repo` repositories held by academic institutions (n = `r Aca_Repo`), governmental agencies (n = `r Gov_Repo`), inter-governmental organizations (n = 2), CSOs (n = `r CSO_Repo`), and international data sources (n = `r Inter_Repo`). Records are not equally distributed across research fields ($\chi^2$ = 337060, d.f. = 10, *p* < 0.001), with `r Main_RF` comprising `r Main_RF_Per`% of all records, followed by `r Second_RF` with `r Second_RF_Per`% (Fig 3).
**Fig 3. Number of records per research field.** A: Thousands of Records. B: Data points per records. Category Other in A represents all of the color-matching categories in B. Category Other in B represents mainly shipping.
```{r Bar_Plot, eval=F, echo=F, fig.align="center", fig.height=6, fig.width=12, fig.cap="Fig 3. Number of records per research field. A: Thousands of Records. B: Data points per records. Category Other in A represents all of the color-matching categories in B. Category Other in B represents mainly shipping."}
# Group small categories into "Other"
Other <- c("Oceanography",
           "Other",
           "Sociology",
           "Tourism",
           "Turism",
           "Aquaculture")
#--------------------------#
# Plot figure 3 ####
#--------------------------#
# Left panel: records per research field
P1 <- Research_Field %>%
  filter(Research_Field != "",
         !is.na(Research_Field)) %>%
  mutate(Research_Topic = ifelse(Research_Field %in% Other, "Other", Research_Field)) %>%
  ggplot(aes(x = reorder(Research_Topic,
                         -Records),
             y = Records/1000, # thousands
             fill = Research_Topic)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("#3B9AB2", # Conservation
                               "#78B7C5", # Ecology
                               "#EBCC2A", # Fisheries
                               "#F21A00") # Other
  ) +
  coord_flip() +
  theme_classic() +
  ylab("Thousands Records") +
  xlab("Research Field") +
  ggtheme_plot() +
  theme(legend.position = "none")
# Right panel: data points per record
P2 <- Research_Field %>%
  filter(!is.na(Research_Field)) %>%
  ggplot(aes(x = reorder(Research_Field,
                         -Rate),
             y = Rate,
             fill = Research_Field)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  ylab("Data Points per Record") +
  xlab("") +
  coord_flip() +
  scale_fill_manual(values = c(
    "#F21A00", # Aquaculture - Other
    "#3B9AB2", # Conservation
    "#78B7C5", # Ecology
    "#EBCC2A", # Fisheries
    "#F21A00", # Oceanography - Other
    "#F21A00", # Other - Other (mainly shipping)
    "#F21A00", # Sociology - Other
    "#F21A00"  # Tourism - Other
  )) +
  ggtheme_plot() +
  theme(legend.position = "none")
# Combine both panels side by side as a grob (gridExtra::arrangeGrob)
gt <- arrangeGrob(P1,
                  P2,
                  ncol = 2)
# Convert the grob back to a ggplot (ggpubr::as_ggplot) and add panel labels (cowplot)
Fig3 <- as_ggplot(gt) +
  draw_plot_label(label = c("A", "B"),
                  size = 25,
                  x = c(0, 0.5),
                  y = c(1, 1))
Fig3
ggsave("Fig3.tiff",
       plot = Fig3,
       width = 12,
       height = 6,
       units = "in",
       path = "./Figures")
```
```{r dataMares, eval=F, echo = F}
# Exploration of dataMares
dataMares <- Template %>%
  filter(Compilation_Title == "dataMares") %>%
  group_by(Dataset_Title) %>%
  summarise(n())
# dataMares records from Santa Clara (same spelling as the filter above)
dataMares_SC <- Template %>%
  filter(
    Compilation_Title == "dataMares",
    Location == "Santa Clara"
  )
# head(dataMares_SC)
# names(dataMares_SC)
# unique(dataMares_SC$Compilation_Title)
# View(dataMares_SC)
```
International sources (e.g. the Global Biodiversity Information Facility, GBIF; dataMares) contributed the highest number of records for Mexico (49%), although these include data collected by Mexican researchers, in Mexican institutions, or funded by the Mexican government [@Alonso:hv; @Fuentes:2017vr]. Overall, metadata records are dominated by academic sources (across multiple topics) and government sources (mainly "Fisheries"). While data sources varied among types of institutions, dataMares (52 datasets, mostly on "Fisheries", representing more than 22,000 metadata records), Datos Abiertos Mx (90 datasets from nine government agencies), and OBIS (19,000 records for more than 13,000 species) together represent 46% of all records. Only 20 datasets are classified as private in the metadata ("Dataset Available" category), suggesting that virtually all data analyzed here are open access and available for consultation, and that their authors are likely open to collaboration.
Analyzing the years of data collection sheds light on historical research trends as reflected in the available data (Fig 4). The earliest metadata records date back to 1791 (plankton records), and ecological data are well represented historically, with several collection events through time. Most fishery records begin in the early 1950s and expand later as local research increased, with a remarkable increase in records on conservation topics during the first decade of the 21st century. Our analysis also shows a downward trend in total records starting around 2010 and an abrupt drop around 2015 (Fig 4). We believe the decline from 2015 onward is probably due to the lag between data collection and publication, as information must be gathered and prepared before it is made available.
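The yearly series in Fig 4 can be read as counting, for each year, the metadata records whose collection period covers that year. The Fig 4 chunk relies on a project helper, `ts_subset()`, defined elsewhere; the base-R sketch below is an independent, hypothetical illustration of this kind of counting and not that helper. It assumes each record stores a first and last collection year in columns named `Start_Year` and `End_Year` (names assumed here, not taken from the metadatabase).

```r
# Hypothetical records with an assumed first and last collection year.
records <- data.frame(
  Research_Field = c("Fisheries", "Fisheries", "Ecology"),
  Start_Year     = c(1990, 1995, 1992),
  End_Year       = c(2000, 1996, 1992)
)

# For every year in `years`, count the records of one research field
# whose collection period covers that year.
records_per_year <- function(df, years, field) {
  sub <- df[df$Research_Field == field, ]
  vapply(years,
         function(y) sum(sub$Start_Year <= y & sub$End_Year >= y),
         integer(1))
}

years <- 1990:2000
fisheries <- records_per_year(records, years, "Fisheries")
fisheries
# 1 1 1 1 1 2 2 1 1 1 1
```

Under this counting rule a long monitoring record contributes to every year it spans, so a drop in recent years can reflect both fewer new datasets and series that have not yet been updated.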
**Fig 4. Yearly metadata records by major research category.** Results shown from year 1950 onward. See Fig 1-B for categories included within "Other".
```{r time_plot, eval=F, echo=F,message=F, warning=F, fig.cap="Metadata records (thousands of records) included in a given year per research topic (results shown from year 1950 onward). ‘Other’ includes aquaculture, oceanography, tourism and sociology."}
# Global variables
YInicio <- 1700 # first year of the series
YFin <- 2017    # last year of the series
# Yearly records for Conservation, Ecology, Fisheries and pooled "Other" fields ####
C <- ts_subset(Template, YInicio, YFin, "Conservation")
E <- ts_subset(Template, YInicio, YFin, "Ecology")
Fi <- ts_subset(Template, YInicio, YFin, "Fisheries")
Ot <- ts_subset(Template, YInicio, YFin, c("Oceanography",
                                           "Other",
                                           "Sociology",
                                           "Tourism",
                                           "Turism",
                                           "Aquaculture"))
# Bind the series and name the columns (sets the plot order)
Fin <- cbind(C, E, Fi, Ot)
colnames(Fin) <- c("A", "Conservation",
                   "AA", "Ecology",
                   "AAA", "Fisheries",
                   "AAAA", "Other")
# Keep only the four record-count columns
Fin <- na.omit(Fin[, c("Conservation",
                       "Ecology",
                       "Fisheries",
                       "Other")])
###___ end of step ___###
# Transform the results to an annual time series (one value per year)
J_TS <- ts(Fin,
           start = YInicio,
           end = YFin,
           frequency = 1)
Fin$Date <- seq(YInicio, YFin, 1)
# Subset data for 1950 to 2017
GFin <- Fin %>%
  gather("Research Topic", "Value", 1:4) %>%
  filter(Date >= 1950 & Date <= 2017)
# Plot it
Fig4 <- ggplot(GFin) +
  geom_area(aes(x = Date,
                y = Value/1000,
                fill = `Research Topic`,
                colour = `Research Topic`),
            alpha = 0.5) +
  # Dashed vertical lines mark the end of the series and major data changes
  geom_vline(xintercept = 2017, # end of the series
             linetype = "dashed") +
  annotate("text", x = 2016, y = 50, label = "2017", angle = 90) +
  geom_vline(xintercept = 1951, # catch statistics in Mexico
             linetype = "dashed") +
  annotate("text", x = 1950, y = 30, label = "Early catch statistics in Mexico", angle = 90, size = 6) +
  geom_vline(xintercept = 2000, # catch statistics in Mexico (2013 yearbook)
             linetype = "dashed") +
  annotate("text", x = 1999, y = 28, label = "Release of disaggregated fisheries data", angle = 90, size = 6) +
  geom_vline(xintercept = 2008, # biological data on fish from the Yucatan Peninsula
             linetype = "dashed") +
  annotate("text", x = 2007, y = 30, label = "Biological Info. of fish from Yucatan", angle = 90, size = 6) +
  ggtheme_plot() +
  theme(
    legend.position = "top",
    legend.text = element_text(size = 18),
    legend.title = element_text(size = 18),
    axis.title.y = element_text(size = 20),
    axis.title.x = element_text(size = 20),
    axis.text.x = element_text(size = 20),
    axis.text.y = element_text(size = 20)
  ) +
  scale_colour_manual(values = c("#3B9AB2", "#78B7C5", "#EBCC2A", "#F21A00")) +
  scale_fill_manual(values = c("#3B9AB2", "#78B7C5", "#EBCC2A", "#F21A00")) +
  scale_x_continuous(name = "Date",
                     limits = c(1950, 2020),
                     breaks = seq(1950, 2020, 10)) +
  scale_y_continuous("Metadata Records (Thousands)",
                     limits = c(0, 50),
                     breaks = seq(0, 50, 10))
Fig4
ggsave("Fig4.png",
       plot = Fig4,
       width = 12,
       height = 6,
       units = "in",
       path = "./Figures")
```
```{r Species_analysis, eval=T, echo = F, warning=F, message=F}
#### Species information ####
Species <- Template %>%
  filter(Subject_name != "TBD") %>%
  filter(!is.na(Subject_name)) %>%
  group_by(Subject_name) %>%
  summarise(x = n(),
            DP = sum(Data_Time_Points,
                     na.rm = TRUE)) %>%
  arrange(-x)
Total <- sum(Species$x)
Species_Text <- length(unique(Species$Subject_name))
Species <- Species %>%
  mutate(SP_Percentage = (x/Total)*100) %>%
  arrange(desc(SP_Percentage))
# First 10%: share of records in subjects ranked 2 to 28
Per_10 <- round(sum(Species$SP_Percentage[2:28]), 2)
# 24/nrow(Species)*100
# First 50%: share of records in subjects ranked 2 to 970
Per_50 <- round(sum(Species$SP_Percentage[2:970]), 2)
# 970/nrow(Species)*100
#### Where do they come from? ####
Per_50_Spp <- Species$Subject_name[2:970]
Per_50_Source <- Template %>%
  filter(Subject_name %in% Per_50_Spp) %>%
  group_by(Research_Field) %>%
  summarise(n = n())
Per_50_Spp_Totl <- sum(Per_50_Source$n)
Per_50_Source <- Per_50_Source %>%
  mutate(Percen = round((n/Per_50_Spp_Totl)*100))
###########################################
### Subjects with 100 records or fewer
Under_100_R <- Species %>%
  filter(x <= 100)
Under_100 <- round((nrow(Under_100_R)/nrow(Species))*100, 2)
# Subjects with a single record
One_Recod <- Species %>%
  filter(x <= 1)
One_R <- round((nrow(One_Recod)/nrow(Species))*100, 2)
#_____________ NOTE __________ ##
# Slow: avoid running this step when knitting ##
#_____________________________ ##
### Subset records with taxa ###
Correct_Taxa <- gnr_resolve(names = Species$Subject_name, # names to resolve (Global Names Resolver)
                            best_match_only = TRUE,       # keep only the best match per name
                            canonical = TRUE)             # return canonical names (no authorities)
# Records at the taxon level
Taxa_Records <- Template %>%
  filter(Subject_name %in% Correct_Taxa$submitted_name) %>%
  group_by(Subject_name) %>%
  summarise(n())
Taxa_R <- round((nrow(Taxa_Records)/nrow(Species))*100, 2) # 97.46
Non_Taxa <- Template %>%
  filter(!Subject_name %in% Correct_Taxa$submitted_name) %>%
  group_by(Subject_name) %>%
  summarise(n())
#### Shark records ("tiburcios") by Silva, A. ####
Tiburcios <- Template %>%
  filter(Author == "Silva, A.")
```
There are 24,083 subjects (the taxa targeted by data collection) represented in the metadatabase. Most single-subject records (97%) represented taxa (e.g. *Octopus maya* or *Epinephelus* spp.), and only 3% were identified by common names such as "Octopus" or "Mangrove". Assessments not differentiated by a single subject are grouped under "Multiple species" and comprise only 3% of all records. While the list of species in the metadata is large, data availability is uneven: the 3.7% of subjects with the most metadata records comprise `r Per_50`% of all records. The subjects with the most records were the Carcharhinidae shark species *Carcharhinus porosus* and *C. falciformis*, with 1,200 records each, followed by *C. limbatus* with almost 1,000 records.
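The concentration of records in a few subjects can be checked by sorting subjects by record count and accumulating their shares of all records. The Species_analysis chunk does this with fixed index ranges (e.g. `SP_Percentage[2:970]`); the sketch below is a hypothetical alternative, `top_share()`, that finds the smallest set of top-ranked subjects reaching a target share, applied to made-up counts.

```r
# Made-up record counts per subject (illustration only).
counts <- c(sp_a = 50, sp_b = 20, sp_c = 10, sp_d = 10, sp_e = 5, sp_f = 5)

# Smallest number of top-ranked subjects whose records reach `target`
# (a proportion of all records), also expressed as percentages.
top_share <- function(counts, target = 0.5) {
  sorted <- sort(counts, decreasing = TRUE)
  cum_share <- cumsum(sorted) / sum(sorted)
  n_top <- unname(which(cum_share >= target)[1])
  c(n_subjects   = n_top,
    pct_subjects = 100 * n_top / length(counts),
    pct_records  = 100 * cum_share[[n_top]])
}

top_share(counts, target = 0.75)
# n_subjects = 3, pct_subjects = 50, pct_records = 80
```

Deriving the cut-off from the cumulative share, rather than hard-coding row indices, keeps the reported percentages consistent if the metadatabase is updated.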
```{r Geographic, echo = F, eval=T}
### Areas ####
Area <- Template %>%
  group_by(Area) %>%
  summarise(Entradas = n()) %>% # Entradas = number of records (entries)
  filter(Area != "na") %>%
  filter(Area != "TBD") %>%
  arrange(-Entradas) %>%
  mutate(Per_Area = round((Entradas/nrow(Template)*100)))
First_ID <- Area$Area[1]
Second_ID <- Area$Area[2]
Third_ID <- Area$Area[3]
First_Value <- Area$Per_Area[1]
Second_Value <- Area$Per_Area[2]
Third_Value <- Area$Per_Area[3]
### Regions ###
# Reject null hypothesis (p < 0.001)