Explore DwC-DP

Published

April 24, 2025

Explore the DwC-DP tables

You can load all the RDA files as shown below and explore the tables as demonstrated on this page. Pull requests are welcome.

library(here)
library(tidyverse)

# Load all the RDA files
rda_files <- list.files(path = here("data", "output", "rda"), pattern = "\\.rda$", full.names = TRUE)
invisible(lapply(rda_files, load, envir = .GlobalEnv))  # invisible() hides the list of object names load() returns
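Loading into .GlobalEnv is convenient for interactive exploration. If you would rather keep the tables together and out of the global workspace, here is a minimal base-R sketch. The helper `load_rda_dir()` is hypothetical and not part of any package. It loads every RDA file into a dedicated environment:

```r
# Hypothetical helper: load every .rda file in `dir` into a fresh environment
load_rda_dir <- function(dir) {
  env <- new.env()
  files <- list.files(path = dir, pattern = "\\.rda$", full.names = TRUE)
  for (f in files) {
    # load() restores the saved objects into `env`; its return value is not needed
    load(f, envir = env)
  }
  env
}

# Usage, with the same path as above:
# dwc_dp <- load_rda_dir(here::here("data", "output", "rda"))
# ls(dwc_dp)           # names of the loaded tables
# dwc_dp$occurrence    # access a single table
```

Keeping the tables in one environment makes it easy to list them with `ls()` and avoids silently overwriting objects of the same name in your session.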

Community measurements

Does DwC-DP handle community measurements better than DwCA?

DwCA

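In DwCA, every measured individual must be its own Occurrence record so that its measurements in the eMoF extension, such as “Standard Length” and “Total Length”, have an Occurrence record to point to. However, this means users cannot safely sum organismQuantity across all of these records, because nothing tells them whether the 28 individuals in the total-catch record already include the 11 larvae recorded individually.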

Occurrence table in DwCA
occurrenceID	eventID	scientificName	lifeStage	organismQuantity	organismQuantityType
BROKE_WEST_RMT_051_RMT8_217697	BROKE_WEST_RMT_051_RMT8	Electrona antarctica		28	individuals
AAV3FF_00231	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00232	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00233	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00234	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00235	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00236	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00237	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00238	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00239	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00240	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual
AAV3FF_00241	BROKE_WEST_RMT_051_RMT8	Electrona antarctica	larvae	1	individual

This has improved with the term eco:isLeastSpecificTargetCategoryQuantityInclusive from the Humboldt Extension. If the Event has isLeastSpecificTargetCategoryQuantityInclusive = true, the organismQuantity of the total-catch Occurrence includes all of the individually recorded larvae.
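The difference the flag makes can be shown with the DwCA rows from the table above. The following is a minimal base-R sketch. The helper `total_count()` is hypothetical, and it assumes that the least specific record is the one without a lifeStage:

```r
# The DwCA Occurrence rows from the table above: one total-catch record plus 11 larvae
occ <- data.frame(
  occurrenceID = c("BROKE_WEST_RMT_051_RMT8_217697", sprintf("AAV3FF_%05d", 231:241)),
  lifeStage = c(NA, rep("larvae", 11)),
  organismQuantity = c(28, rep(1, 11))
)

# Hypothetical helper: total number of individuals of the taxon in the Event.
# inclusive = TRUE : the least specific record (assumed: no lifeStage) already
#                    contains the more specific ones, so only it is counted.
# inclusive = FALSE: the categories are disjoint, so all records are summed.
total_count <- function(occ, inclusive) {
  if (inclusive) {
    sum(occ$organismQuantity[is.na(occ$lifeStage)])
  } else {
    sum(occ$organismQuantity)
  }
}

total_count(occ, inclusive = FALSE)  # 39: the 28 and the 11 larvae are disjoint
total_count(occ, inclusive = TRUE)   # 28: the 28 already includes the larvae
```

Without the flag, a user cannot tell which of the two totals is correct.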

DwC-DP

In DwC-DP, Occurrence and Material are separate concepts. We no longer need one Occurrence per individual alongside a separate total-catch Occurrence. Instead, a single Occurrence (the total catch) can be associated with multiple Materials (the preserved individuals). The measurements of the individuals (Assertions) link directly to the Materials through the Material Assertion table.

event_id <- "BROKE_WEST_RMT_051_RMT8"
taxon_id <- "urn:lsid:marinespecies.org:taxname:217697"

# isLeastSpecificTargetCategoryQuantityInclusive for the Event?
survey %>% filter(eventID == event_id) %>% select(eventID, isLeastSpecificTargetCategoryQuantityInclusive)

# total catch of Electrona antarctica from BROKE_WEST_RMT_051_RMT8
# because isLeastSpecificTargetCategoryQuantityInclusive = true, the organismQuantity of
# occurrenceID BROKE_WEST_RMT_051_RMT8_217697 includes the larvae from BROKE_WEST_RMT_051_RMT8_217697_Larvae
occurrence %>% filter(eventID == event_id & taxonID == taxon_id) %>% 
  select(eventID, occurrenceID, scientificName, lifeStage, organismQuantity, organismQuantityType)

# all preserved Electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat <- material %>% filter(eventID == event_id & taxonID == taxon_id) %>% 
  select(materialEntityID, eventID, materialEntityType, scientificName, preparations)
mat

# all measurements of individual Electrona antarctica from BROKE_WEST_RMT_051_RMT8
mat %>% left_join(material_assertion, by = "materialEntityID") %>% 
  select(materialEntityID, assertionType, assertionValueNumeric, assertionValue, assertionUnit)
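Because the total catch and the preserved individuals now live in separate tables, one natural sanity check is that the number of preserved Materials for an Event and taxon does not exceed the total count. The following is a minimal base-R sketch on hypothetical toy tables. The `_toy` names are made up, and the Material IDs reuse the DwCA IDs above purely for illustration. It joins on eventID and taxonID as the queries above do, and it assumes a single total-catch Occurrence per Event and taxon:

```r
# Hypothetical toy tables with the columns used in the queries above
occ_toy <- data.frame(
  eventID = "BROKE_WEST_RMT_051_RMT8",
  taxonID = "urn:lsid:marinespecies.org:taxname:217697",
  occurrenceID = "BROKE_WEST_RMT_051_RMT8_217697",
  organismQuantity = 28
)
mat_toy <- data.frame(
  materialEntityID = sprintf("AAV3FF_%05d", 231:241),  # illustrative IDs
  eventID = "BROKE_WEST_RMT_051_RMT8",
  taxonID = "urn:lsid:marinespecies.org:taxname:217697"
)

# Count preserved Materials per Event and taxon
n_mat <- aggregate(
  list(nMaterials = mat_toy$materialEntityID),
  by = list(eventID = mat_toy$eventID, taxonID = mat_toy$taxonID),
  FUN = length
)

# Attach the counts to the total-catch Occurrence and flag impossible combinations
check <- merge(occ_toy, n_mat, by = c("eventID", "taxonID"), all.x = TRUE)
check$nMaterials[is.na(check$nMaterials)] <- 0L
check$consistent <- check$nMaterials <= check$organismQuantity
check[, c("occurrenceID", "organismQuantity", "nMaterials", "consistent")]
```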

Body part measurements

We often receive datasets with measurements performed on a specific body part of an organism, for example https://www.gbif.org/occurrence/3344249657

DwCA

This is difficult to model in DwCA because:

a specific body part or sample of an organism is neither an Occurrence nor an Event
it is difficult to express the relationship between a Material (e.g. a body part) and the Organism
it is difficult to distinguish measurements performed on a Material of a dwc:Organism, and to express that the Organism was preserved and held in a particular institution's collection.

Currently, I model this with an eMoF record that points to the Occurrence, with the body part given in measurementRemarks. The body part could instead be embedded in the matrix component of a NERC vocabulary term, but it is not practical to mint a NERC term for every body part, especially since body parts are taxon-specific.

Occurrence table in DwCA
occurrenceID	scientificName
SO_Isotope_1985_2017_1013	Glabraster antarctica (E.A.Smith, 1876)

eMoF table in DwCA
occurrenceID	measurementType	measurementValue	measurementUnit	measurementRemarks
SO_Isotope_1985_2017_1013	The carbon elemental content measured in the tegument of the considered sea star specimen, expressed in relative percentage of dry mass	12.28	relative percentage of dry mass	tegument

DwC-DP

Relationships between Materials can be expressed through the derivedFromMaterialEntityID field. For example, a krill eaten by a fish can be modeled within a single Material table, linked to the fish through derivedFromMaterialEntityID.

material %>% filter(str_starts(materialEntityID, "AAV3FF_00025")) %>% 
  select(materialEntityID, derivedFromMaterialEntityID, materialEntityType, scientificName, materialEntityRemarks)
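The same mechanism could cover the sea star example above. The tegument sample becomes its own Material derived from the preserved specimen, and the measurement is asserted on the tegument Material instead of being described in measurementRemarks. The following is a minimal base-R sketch. The IDs `SEASTAR_1` and `SEASTAR_1_tegument` and the shortened assertionType are hypothetical; only the value 12.28 and its unit come from the eMoF table above. The hypothetical helper `root_material()` walks derivedFromMaterialEntityID back to the whole organism:

```r
# Hypothetical Materials: a preserved sea star and a tegument sample derived from it
material_demo <- data.frame(
  materialEntityID = c("SEASTAR_1", "SEASTAR_1_tegument"),
  derivedFromMaterialEntityID = c(NA, "SEASTAR_1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow derivedFromMaterialEntityID upwards until a Material
# without a (known) parent is reached; stops with an error on a cycle
root_material <- function(id, material) {
  seen <- character(0)
  repeat {
    if (id %in% seen) stop("cycle in derivedFromMaterialEntityID at ", id)
    seen <- c(seen, id)
    parent <- material$derivedFromMaterialEntityID[material$materialEntityID == id]
    if (length(parent) != 1 || is.na(parent)) return(id)
    id <- parent
  }
}

# The measurement is asserted on the tegument Material, not on the Occurrence
assertion_demo <- data.frame(
  materialEntityID = "SEASTAR_1_tegument",
  assertionType = "carbon elemental content",  # shortened, illustrative
  assertionValueNumeric = 12.28,
  assertionUnit = "relative percentage of dry mass"
)

# Recover the whole-organism Material each measured sample came from
assertion_demo$organismMaterialID <- vapply(
  assertion_demo$materialEntityID, root_material, character(1),
  material = material_demo, USE.NAMES = FALSE
)
assertion_demo
```

The same traversal works for the krill-in-fish case, since it only relies on derivedFromMaterialEntityID.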

Non-detections

DwCA

Before the Humboldt Extension was developed, non-detections were represented in DwCA as Occurrence records with occurrenceStatus = absent.

With the Humboldt Extension, non-detections can also be inferred from the target scopes that have no Occurrence record, provided that the target scope was fully reported.
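Here is a minimal base-R sketch of this inference for a single Event. The target list is hypothetical: only Electrona antarctica comes from this page, and the other names are placeholders. The `fully_reported` argument is a hypothetical stand-in for the Humboldt Extension's scope-completeness terms, such as eco:isTaxonomicScopeFullyReported:

```r
# Hypothetical targets of one Event, and the taxa that have an Occurrence record
targets  <- c("Electrona antarctica", "Target taxon B", "Target taxon C")
detected <- c("Electrona antarctica")

# Hypothetical helper: targets without an Occurrence are non-detections, but only
# when the target scope was fully reported; otherwise nothing can be inferred
infer_non_detections <- function(targets, detected, fully_reported) {
  if (!fully_reported) return(character(0))
  setdiff(targets, detected)
}

infer_non_detections(targets, detected, fully_reported = TRUE)   # "Target taxon B" "Target taxon C"
infer_non_detections(targets, detected, fully_reported = FALSE)  # character(0)
```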

DwC-DP

In DwC-DP, non-detections can be represented as an Occurrence with occurrenceStatus = absent, just as in DwCA. Alternatively, if only detections were reported, non-detections can be inferred from the SurveyTargets that have no Occurrence record.

library(ggplot2)

# Taxon targets of each Survey
target_taxa <- survey_target %>% 
  filter(surveyTargetType == "taxon") %>%
  select(surveyID, surveyTargetID, targetTaxon = surveyTargetValue) %>%
  distinct()

# Taxa recorded as Occurrences, keyed by the SurveyTarget they were reported against.
# eventID is used as surveyID here, which assumes each Survey shares its Event's ID.
occurrence_taxa <- occurrence %>% 
  select(surveyTargetID, surveyID = eventID, occurrenceTaxon = scientificName) %>%
  distinct()

# Start from the targets so every target appears; targets without an Occurrence are non-detections
detections_and_non_detections <- target_taxa %>% 
  left_join(occurrence_taxa, by = c("surveyID", "surveyTargetID")) %>%
  mutate(occurrenceStatus = if_else(is.na(occurrenceTaxon), "notDetected", "detected"))

# Heatmap with surveys on the y-axis and target taxa on the x-axis
ggplot(detections_and_non_detections, aes(y = surveyID, x = targetTaxon, fill = occurrenceStatus)) +
  geom_tile(color = "white", linewidth = 0.5) +
  scale_fill_manual(values = c("detected" = "#1e88e5", "notDetected" = "#ffcdd2")) +
  labs(
    title = "Occurrence Status by Target Taxon and Survey",
    y = "Survey ID",
    x = "Target Taxon",
    fill = "occurrenceStatus"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 7.5),  # Small text for survey IDs
    axis.text.x = element_text(size = 9, angle = 45, hjust = 1),  # Angled species names
    panel.grid = element_blank(),
    legend.position = "bottom"
  )