library(here)
library(tidyverse)
# Load all the RDA files
rda_files <- list.files(path = here("data", "output", "rda"), pattern = "\\.rda$", full.names = TRUE)
lapply(rda_files, load, envir = .GlobalEnv)
Explore DwC-DP
Explore the DwC-DP tables
You can load all the RDA files with the code above and explore the tables as demonstrated on this page. Pull requests are welcome.
Community measurements
Does DwC-DP handle community measurements better than DwCA?
DwCA
In DwCA, every individual has to be an Occurrence record so that measurements such as “Standard Length” and “Total Length” in eMoF have an Occurrence record to point to. However, nothing in this structure tells users whether the organismQuantity values of these records can be summed: the total-catch record may already include the individually recorded larvae.
occurrenceID | eventID | scientificName | lifeStage | organismQuantity | organismQuantityType |
---|---|---|---|---|---|
BROKE_WEST_RMT_051_RMT8_217697 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | | 28 | individuals |
AAV3FF_00231 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00232 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00233 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00234 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00235 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00236 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00237 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00238 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00239 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00240 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
AAV3FF_00241 | BROKE_WEST_RMT_051_RMT8 | Electrona antarctica | larvae | 1 | individual |
This has improved with the term eco:isLeastSpecificTargetCategoryQuantityInclusive from the Humboldt Extension. If the event has isLeastSpecificTargetCategoryQuantityInclusive = true, the organismQuantity of the total-catch Occurrence includes all of the larval individuals.
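The double-counting risk can be sketched with rows modelled on the table above. This is only an illustration: the data frame `occ`, the helper `total_catch()` and its `inclusive` argument are hypothetical names, and treating the record without a lifeStage as the least specific one is an assumption made for this example.

```r
library(dplyr)

# Toy copy of the DwCA table above: one total-catch record plus 11 larvae.
occ <- tibble(
  occurrenceID = c("BROKE_WEST_RMT_051_RMT8_217697",
                   sprintf("AAV3FF_%05d", 231:241)),
  lifeStage = c(NA, rep("larvae", 11)),
  organismQuantity = c(28, rep(1, 11))
)

# Hypothetical helper: `inclusive` stands in for the event's
# isLeastSpecificTargetCategoryQuantityInclusive value.
total_catch <- function(occ, inclusive) {
  if (inclusive) {
    # Assumption: the least specific record is the one without a lifeStage,
    # and its count already includes the more specific records.
    occ %>% filter(is.na(lifeStage)) %>% pull(organismQuantity) %>% sum()
  } else {
    # Counts are independent, so they can be added together.
    sum(occ$organismQuantity)
  }
}

total_catch(occ, inclusive = TRUE)   # 28
total_catch(occ, inclusive = FALSE)  # 39
```

Without the flag, a user cannot tell which of the two totals is the intended one.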
DwC-DP
Occurrence and Material are now separate concepts, so we do not need a separate Occurrence just to indicate the total catch of a taxon. A single Occurrence (the total catch) can have multiple Materials (the preserved individuals). The measurements of the individuals (Assertions) can be linked directly to the Materials via the Material Assertion table.
event_id <- "BROKE_WEST_RMT_051_RMT8"
taxon_id <- "urn:lsid:marinespecies.org:taxname:217697"
# isLeastSpecificTargetCategoryQuantityInclusive for the Event?
survey %>% filter(eventID == event_id) %>% select(eventID, isLeastSpecificTargetCategoryQuantityInclusive)
# total catch of Electrona antarctica from BROKE_WEST_RMT_051_RMT8
# because isLeastSpecificTargetCategoryQuantityInclusive = true, the count in organismQuantity of occurrenceID BROKE_WEST_RMT_051_RMT8_217697 includes the larvae from BROKE_WEST_RMT_051_RMT8_217697_Larvae
occurrence %>% filter(eventID == event_id & taxonID == taxon_id) %>%
  select(eventID, occurrenceID, scientificName, lifeStage, organismQuantity, organismQuantityType)
# all preserved Electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat <- material %>% filter(eventID == event_id & taxonID == taxon_id) %>%
  select(materialEntityID, eventID, materialEntityType, scientificName, preparations)
mat
# all measurements of individual Electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat %>% left_join(material_assertion, by = "materialEntityID") %>%
  select(materialEntityID, assertionType, assertionValueNumeric, assertionValue, assertionUnit)
Body part measurements
We often receive datasets with measurements performed on a specific body part of an organism. Example: https://www.gbif.org/occurrence/3344249657
DwCA
It is difficult to model with DwCA because:
- a specific body part or sample of an organism is neither an Occurrence nor an Event
- it is difficult to express the relationship between a Material (e.g. a body part) and the Organism
- it is difficult to distinguish measurements performed on a Material of a dwc:Organism, and to express that the Organism was preserved and is held in a particular collection of an institution.
Currently, I model this using eMoF records that point to the Occurrence, with the body part in measurementRemarks. The body part could instead be embedded in the matrix of a NERC vocabulary term, but it is not practical to mint a NERC term for every body part, and body parts are specific to a taxon.
occurrenceID | scientificName |
---|---|
SO_Isotope_1985_2017_1013 | Glabraster antarctica (E.A.Smith, 1876) |
occurrenceID | measurementType | measurementValue | measurementUnit | measurementRemarks |
---|---|---|---|---|
SO_Isotope_1985_2017_1013 | The carbon elemental content measured in the tegument of the considered sea star specimen, expressed in relative percentage of dry mass | 12.28 | relative percentage of dry mass | tegument |
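With this workaround, the body part can only be recovered by matching free text. A minimal sketch of what that looks like in practice: the data frame `emof` is a toy table, its first row is taken from the table above (with measurementType shortened), and its second row is entirely hypothetical, added only to show how a spelling variant slips past an exact match.

```r
library(dplyr)

emof <- tibble(
  occurrenceID = c("SO_Isotope_1985_2017_1013", "hypothetical_occurrence_2"),
  measurementType = "carbon elemental content",   # shortened for the example
  measurementValue = c(12.28, NA),                # NA: no real value invented
  measurementUnit = "relative percentage of dry mass",
  measurementRemarks = c("tegument", "Tegument ") # second spelling is made up
)

# Exact matching on the free-text remark misses the variant spelling.
exact <- emof %>% filter(measurementRemarks == "tegument")

# Normalising case and whitespace first recovers both rows.
normalised <- emof %>%
  filter(tolower(trimws(measurementRemarks)) == "tegument")

nrow(exact)       # 1
nrow(normalised)  # 2
```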
DwC-DP
Relationships between Materials can be specified through the derivedFromMaterialEntityID field. For example, a krill eaten by a fish can be modeled within a single Material table.
material %>% filter(str_starts(materialEntityID, "AAV3FF_00025")) %>%
  select(materialEntityID, derivedFromMaterialEntityID, materialEntityType, scientificName, materialEntityRemarks)
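To make the parent-child linkage concrete without the RDA files, here is a small sketch on a toy table. All identifiers, types and taxa in `material_toy` are hypothetical and do not come from the dataset; `root_of()` is an illustrative helper that assumes the derivation chain contains no cycles.

```r
library(dplyr)

# Hypothetical Material table: a fish, its stomach content, and a krill in it.
material_toy <- tibble(
  materialEntityID = c("fish_1", "fish_1_stomach", "fish_1_stomach_krill_1"),
  derivedFromMaterialEntityID = c(NA, "fish_1", "fish_1_stomach"),
  materialEntityType = c("whole organism", "stomach content", "whole organism"),
  scientificName = c("Electrona antarctica", NA, "Euphausia superba")
)

# Self-join: attach the type of the parent Material to each row.
parents <- material_toy %>%
  select(parentID = materialEntityID, parentType = materialEntityType)

with_parent <- material_toy %>%
  left_join(parents, by = c("derivedFromMaterialEntityID" = "parentID"))

# Follow derivedFromMaterialEntityID upwards until a Material has no parent.
root_of <- function(id, tbl) {
  parent <- tbl$derivedFromMaterialEntityID[match(id, tbl$materialEntityID)]
  while (!is.na(parent)) {
    id <- parent
    parent <- tbl$derivedFromMaterialEntityID[match(id, tbl$materialEntityID)]
  }
  id
}

with_parent
root_of("fish_1_stomach_krill_1", material_toy)  # "fish_1"
```

Because every level lives in the same table, the krill can be traced back to the fish with nothing more than repeated lookups on one column.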
Non-detections
DwCA
Before the Humboldt Extension was developed, non-detections were represented in DwCA as Occurrence records with occurrenceStatus = absent.
With the Humboldt Extension, non-detections can be inferred by looking at the target scopes that do not have an Occurrence record, provided the target scope was fully reported.
DwC-DP
In DwC-DP, non-detections can be represented as an Occurrence with occurrenceStatus = absent, just like in DwCA. Non-detections can also be inferred by looking at the SurveyTarget records that do not have an Occurrence record if only detections were reported.
library(ggplot2)
target_taxa <- survey_target %>%
  filter(surveyTargetType == "taxon") %>%
  select(surveyID, surveyTargetID, surveyTargetValue) %>%
  rename(scientificName = surveyTargetValue) %>%
  distinct()
occurrence_taxa <- occurrence %>%
  select(surveyTargetID, eventID, scientificName) %>%
  rename(surveyID = eventID) %>%
  distinct()
detections_and_non_detections <- occurrence_taxa %>%
  full_join(target_taxa, by = c("surveyID", "surveyTargetID")) %>%
  rename(targetTaxon = scientificName.y, occurrenceTaxon = scientificName.x) %>%
  mutate(occurrenceStatus = case_when(is.na(occurrenceTaxon) ~ "notDetected", TRUE ~ "detected"))
# Swap the axes so survey IDs are on the y-axis
ggplot(detections_and_non_detections, aes(y = surveyID, x = targetTaxon, fill = occurrenceStatus)) +
  geom_tile(color = "white", linewidth = 0.5) +
  scale_fill_manual(values = c("detected" = "#1e88e5", "notDetected" = "#ffcdd2")) +
  labs(
    title = "Occurrence Status by Target Taxon and Survey",
    y = "Survey ID", # Now on y-axis
    x = "Target Taxon", # Now on x-axis
    fill = "occurrenceStatus"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 7.5), # Small text for survey IDs
    axis.text.x = element_text(size = 9, angle = 45, hjust = 1), # Angled species names
    panel.grid = element_blank(),
    legend.position = "bottom"
  )
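If only the non-detections are needed rather than the full detection matrix, an anti-join is a possible alternative to the full join above. The sketch below uses hypothetical toy tables (`targets_toy`, `occurrences_toy`) with made-up identifiers, and it assumes the reporting conditions described above hold, so that a target without an Occurrence can be read as a non-detection.

```r
library(dplyr)

# Hypothetical survey targets: two surveys, each targeting the same two taxa.
targets_toy <- tibble(
  surveyID = c("survey_A", "survey_A", "survey_B", "survey_B"),
  surveyTargetID = c("A_t1", "A_t2", "B_t1", "B_t2"),
  targetTaxon = c("Electrona antarctica", "Euphausia superba",
                  "Electrona antarctica", "Euphausia superba")
)

# Hypothetical occurrences: survey_B has no record for its second target.
occurrences_toy <- tibble(
  surveyID = c("survey_A", "survey_A", "survey_B"),
  surveyTargetID = c("A_t1", "A_t2", "B_t1")
)

# Keep only the targets that have no matching Occurrence.
non_detections <- targets_toy %>%
  anti_join(occurrences_toy, by = c("surveyID", "surveyTargetID"))

non_detections
```

The result contains a single row, the target `B_t2` in `survey_B`.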