Explore DwC-DP

Published

April 24, 2025

Explore the DwC-DP tables

You can load all the RDA files like below and play around with the tables as demonstrated in this page. Any pull request is also welcomed.

library(here)
library(tidyverse)

# Load all the RDA files
rda_files <- list.files(path = here("data", "output", "rda"), pattern = "\\.rda$", full.names = TRUE)
lapply(rda_files, load, envir = .GlobalEnv)

Community measurements

Does DwC-DP handles community measurements better than DwCA?

DwCA

In DwCA, we will have to have all these as Occurrence records so that measurements like “Standard Length” and “Total Length” of each individual in emof has an Occurrence record to point to. However, this does not warrant user to add the organismQuantity of all individuals together.

Occurrence table in DwCA
occurrenceID eventID scientificName lifeStage organismQuantity organismQuantityType
BROKE_WEST_RMT_051_RMT8_217697 BROKE_WEST_RMT_051_RMT8 Electrona antarctica 28 individuals
AAV3FF_00231 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00232 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00233 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00234 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00235 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00236 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00237 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00238 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00239 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00240 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual
AAV3FF_00241 BROKE_WEST_RMT_051_RMT8 Electrona antarctica larvae 1 individual

This has improved with the term eco:isLeastSpecificTargetCategoryQuantityInclusive from the Humboldt Extension. If the event has isLeastSpecificTargetCategoryQuantityInclusive = true, the count in organismQuantity of the Occurrence includes all of the larvae individuals.

DwC-DP

The Occurrence and Material are separate concepts now. We do not need to have a separate Occurrence just to indicate the total catch of a taxon. We can have a single Occurrence (total catch) with multiple Materials (individual) to indicate the total catch and preserved individuals. The measurements of the individuals (Assertions) can be directly linked to the Materials via the Material Assertion table.

event_id <- "BROKE_WEST_RMT_051_RMT8"
taxon_id <- "urn:lsid:marinespecies.org:taxname:217697"

# isLeastSpecificTargetCategoryQuantityInclusive for the Event?
survey %>% filter(eventID == event_id) %>% select(eventID, isLeastSpecificTargetCategoryQuantityInclusive)
# total catch of electrona antarctica from BROKE_WEST_RMT_051_RMT8
# because isLeastSpecificTargetCategoryQuantityInclusive = true, the count in organismQuantity of occurrenceID BROKE_WEST_RMT_051_RMT8_217697 includes the larvae from BROKE_WEST_RMT_051_RMT8_217697_Larvae
occurrence %>% filter(eventID == event_id & taxonID == taxon_id) %>% 
  select(eventID, occurrenceID, scientificName, lifeStage, organismQuantity, organismQuantityType)
# all preserved electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat <- material %>% filter(eventID == event_id & taxonID == taxon_id) %>% 
  select(materialEntityID, eventID, materialEntityType, scientificName, preparations)
mat
# all measurements of individual electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat %>% left_join(material_assertion, by = "materialEntityID") %>% 
  select(materialEntityID, assertionType, assertionValueNumeric, assertionValue, assertionUnit)

Body part measurements

Very often, we received dataset with measurements performed on a specific body part of an organism. Example: https://www.gbif.org/occurrence/3344249657

DwCA

It is difficult to model with DwCA because:

  • a specific body part/sample of an organism is not an Occurrence nor an Event
  • it is difficult to express relationship between a Material (e.g. body part) and the Organism
  • it is difficult to distinguish measurements performed on a Material of an dwc:Organism and to express that the Organism was preserved and located in certain collection from an institution.

Currently, I modeled it using eMoF pointing to the Occurrence with body part in measurementRemarks. Specifying the body part can be embedded in a matrix of a NERC vocabulary but it is not practical to mint NERC for every body part and body part is specific to a taxon.

Occurrence table in DwCA
occurrenceID scientificName
SO_Isotope_1985_2017_1013 Glabraster antarctica (E.A.Smith, 1876)
eMoF table in DwCA
occurrenceID measurementType measurementValue measurementUnit measurementRemarks
SO_Isotope_1985_2017_1013 The carbon elemental content measured in the tegument of the considered sea star specimen, expressed in relative percentage of dry mass 12.28 relative percentage of dry mass tegument

DwC-DP

Relationship between Materials can be specified through the derivedFromMaterialEntityID field. Example of a krill eaten by a fish can be modeled within a single Material table.

material %>% filter(str_starts(materialEntityID, "AAV3FF_00025")) %>% 
  select(materialEntityID, derivedFromMaterialEntityID, materialEntityType, scientificName, materialEntityRemarks)

Non-detections

DwCA

Before Humboldt Extension was developed, non-detections are represented as an Occurrence record with occurrenceStatus = absent in DwCA.

When Humboldt Extension comes along, non-detections can be inferred by looking at the target scopes that do not have an Occurrence record if the target scope was fully reported.

DwC-DP

In DwC-DP, non-detections can be represented as an Occurrence with occurrenceStatus = absent, just like in DwCA. Similarly, non-detections can also be inferred by looking at the SurveyTarget that do not have an Occurrence record if only detections were reported.

library(ggplot2)

target_taxa <- survey_target %>% 
  filter(surveyTargetType == "taxon") %>%
  select(surveyID, surveyTargetID, surveyTargetValue) %>%
  rename(scientificName = surveyTargetValue) %>%
  distinct()

occurrence_taxa <- occurrence %>% 
  select(surveyTargetID, eventID, scientificName, eventID) %>%
  rename(surveyID = eventID) %>%
  distinct()

detections_and_non_detections <- occurrence_taxa %>% 
  full_join(target_taxa, by = c("surveyID", "surveyTargetID")) %>%
  rename(targetTaxon = scientificName.y, occurrenceTaxon = scientificName.x) %>%
  mutate(occurrenceStatus = case_when(is.na(occurrenceTaxon) ~ "notDetected", TRUE ~ "detected"))

# Swap the axes so survey IDs are on the y-axis
ggplot(detections_and_non_detections, aes(y = surveyID, x = targetTaxon, fill = occurrenceStatus)) +
  geom_tile(color = "white", linewidth = 0.5) +
  scale_fill_manual(values = c("detected" = "#1e88e5", "notDetected" = "#ffcdd2")) +
  labs(
    title = "Occurrence Status by Target Taxon and Survey",
    y = "Survey ID",  # Now on y-axis
    x = "Target Taxon",  # Now on x-axis
    fill = "occurrenceStatus"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 7.5),  # Small text for survey IDs
    axis.text.x = element_text(size = 9, angle = 45, hjust = 1),  # Angled species names
    panel.grid = element_blank(),
    legend.position = "bottom"
  )