ARCHIVED CONTENT: In December 2020, the CZO program was succeeded by the Critical Zone Collaborative Network (CZ Net)

Vocabularies

Use controlled vocabularies to describe data

SUMMARY: CZO investigators benefit from describing their data using terms from a Controlled Vocabulary (CV) that is shared by the broader scientific community. The use of CVs enables better data sharing, discovery and integration. The CZOData team has developed a number of systems to help CZO investigators select appropriate terms.

CZOData Team Contacts: Jeffery Horsburgh, Kerstin Lehnert, and Megan Carter.
Questions? Email the CZOData Project team.



Benefits and Outcomes

The use of a Controlled Vocabulary (CV) is a critical, early component of the data sharing, discovery and integration workflow that scientists need as they investigate complex processes across multiple CZOs over expanding spatial and temporal scales. Without CVs, different investigators commonly use different terms to describe the same concepts and sometimes disagree about the meaning of terms. Such semantic heterogeneity across data sources can make discovery, integration, and synthesis of data difficult or even impossible.

CZO investigators, CZO data managers and CZO data users can all realize the following benefits from the use of controlled vocabularies in describing datasets:

  • CVs reduce the amount of semantic heterogeneity in data from multiple sources/CZOs
  • CVs make it easier for the CZOData Project to catalog data for discovery purposes
  • CVs make it easier for CZO data managers to create consistent descriptions of similar datasets
  • CVs make it easier for potential data users to understand and interpret the data.

A number of controlled vocabulary systems have been developed to meet the needs of the critical zone science community, all with input from CZO investigators and data managers. These community-shared vocabularies represent a consensus among community members on the terms that should be used to describe data.


Instructions

  1. Select the appropriate controlled vocabulary system for your data from our recommended CV systems, below.
    1. Select based on scientific discipline, data type and the targeted archival data center (see our Data DOI page for recommendations for CZOs).
  2. Select the appropriate controlled vocabulary term list that best matches each metadata field.
    1. Many databases and data exchange formats are explicit about which CV term list should be used to populate each field. Follow those specifications.
    2. Many data systems have more generic dataset descriptors and keywords. As much as possible, select terms from a CV that covers the same or a similar concept.
  3. Use the appropriate term(s) from the selected CV term list.
    1. Read term definitions to confirm your choice.
    2. Select terms from a drop-down list if the data entry system has that feature, or cut & paste terms to avoid errors.
    3. Use the entire URI/URL for the term (e.g., http://vocabulary.odm2.org/unitstype/concentrationMassPerVolume/) to link to definitions and other associated information.
  4. Contribute to the community CVs
    1. Suggest new terms if you can’t find a term you need.
    2. Edit or add to the content of existing terms.
    3. Both the ODM2 CV and CUAHSI HIS systems have online, community moderation systems (links given below).
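
Steps 3.2 and 3.3 (use exact terms, then cite the full URI) can be illustrated with a short Python sketch. It assumes that every ODM2 term URI follows the "<base>/<list>/<term>/" pattern of the example URI in the instructions, and it uses a hypothetical local copy of the term list (LOCAL_TERM_LISTS and term_uri are illustrative names, not part of any ODM2 tool). It rejects a term that is not in the list and suggests the closest valid term, which is the kind of error that copy-and-paste or a drop-down list prevents.

```python
from difflib import get_close_matches

# Base URL taken from the example URI in the instructions; the
# "<base>/<list>/<term>/" pattern is ASSUMED to hold for every ODM2 term list.
ODM2_BASE = "http://vocabulary.odm2.org"

# Hypothetical local copy of term lists, keyed by list name. Only the one term
# quoted in the instructions is included; load the full lists in real use.
LOCAL_TERM_LISTS = {
    "unitstype": {"concentrationMassPerVolume"},
}


def term_uri(term_list: str, term: str) -> str:
    """Return the full URI for `term`, or raise ValueError with a suggestion."""
    terms = LOCAL_TERM_LISTS.get(term_list)
    if terms is None:
        raise ValueError(f"unknown term list: {term_list!r}")
    if term not in terms:
        # Suggest the closest valid term, which catches most typing errors.
        close = get_close_matches(term, sorted(terms), n=1)
        hint = f"; did you mean {close[0]!r}?" if close else ""
        raise ValueError(f"{term!r} is not in {term_list!r}{hint}")
    return f"{ODM2_BASE}/{term_list}/{term}/"


if __name__ == "__main__":
    print(term_uri("unitstype", "concentrationMassPerVolume"))
    # prints http://vocabulary.odm2.org/unitstype/concentrationMassPerVolume/
    try:
        term_uri("unitstype", "concentrationMassPerVolum")  # missing final "e"
    except ValueError as err:
        print(err)
```

Matching is exact and case-sensitive on purpose: a term that is "almost right" still breaks automated discovery and integration, so the sketch flags it instead of silently correcting it.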

 


Recommended Controlled Vocabulary Systems for CZO Data

Observation Data Model v2 (ODM2) Controlled Vocabularies
  • The CZOData team developed the ODM2 CVs specifically to meet the needs of the critical zone science community. We combined the best terms from existing CVs (e.g., CUAHSI CVs, IEDA EarthChem CVs) and developed a number of new CV lists to facilitate data integration across disciplines (e.g., http://vocabulary.odm2.org/methodtype/).
  • ODM2 CVs were designed to work natively with data systems developed around the ODM2 information model, which is focused on integrating both sensor AND sample-based observations. ODM2-based data systems include the new cyber-infrastructure for the Interdisciplinary Earth Data Alliance (IEDA) EarthChem, the new System for Earth Sample Registration (SESAR), and the planned cyber-infrastructure for the CUAHSI Water Data Center and HydroShare. As a result, these data systems will be migrating to the use of ODM2 Controlled Vocabularies.
  • ODM2 CVs are required for sharing data via the new YAML Observation Data Archive & exchange (YODA) file format, which will replace the CZO Display File format (see our YODA Files page for guidelines for CZOs).
  • The ODM2 controlled vocabularies can be accessed and edited at:
  • CZO investigators and data managers are strongly encouraged to use ODM2 Controlled Vocabularies as much as possible.
    • Most terms contained within CUAHSI CVs and IEDA EarthChem CVs have been preserved in the ODM2 CV system. If a term has been removed (for good reason), we are developing means to suggest appropriate replacements.
    • The two exceptions are data for the CUAHSI HIS and IEDA geochemical variables and methods, as described below.
CUAHSI HIS ODM Version 1.1.1 Controlled Vocabularies
  • The CUAHSI Water Data Center currently runs and curates the controlled vocabularies that are used to define metadata for instances of Version 1.1.1 of the CUAHSI Observations Data Model (ODM).
  • With respect to the CZOData project, the CUAHSI ODM Version 1.1.1 controlled vocabularies have been used up to this point to validate CZO Display Files for hydrologic time series data submitted by each of the individual CZOs.
    • CZO Display Files are harvested from each of the individual CZO websites by the CZOData Central system, which is hosted at SDSC. Harvested files are validated to make sure that they are using terms from the existing controlled vocabularies and are then parsed into ODM 1.1.1 database instances for each CZO. The CZOData Central system maintains the ODM databases and has deployed CUAHSI HIS WaterOneFlow web services for each CZO that has published display files. The WaterOneFlow web services have been registered with the CUAHSI Water Data Center, and all data in the CZO ODM databases are discoverable and accessible via the CUAHSI HIS/CUAHSI Water Data Center.
  • The CUAHSI HIS controlled vocabularies can be accessed and edited at:
  • CZO investigators and data managers should use CUAHSI HIS ODM Version 1.1.1 Controlled Vocabularies only in these cases:
    • If you are publishing hydrologic time series data using the older CZO Display File format
    • If you are currently managing your hydrologic time series data using Version 1.1.1 of the CUAHSI HIS. 
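
The validation step described above (checking that harvested files use terms from the existing controlled vocabularies) can be sketched as a check of each CV-controlled field against its term list. This is a minimal Python sketch, not the CZOData Central code: the row layout, the field names SampleMedium and DataType, and the term subsets are assumptions for illustration, and the CZO Display File format itself is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# ASSUMED illustrative excerpts of two ODM 1.1.1 CV term lists. A real validator
# would load the complete lists curated by the CUAHSI Water Data Center.
CV_TERMS: Dict[str, Set[str]] = {
    "SampleMedium": {"Surface Water", "Ground Water", "Precipitation", "Soil"},
    "DataType": {"Continuous", "Sporadic", "Average", "Cumulative"},
}


@dataclass
class Violation:
    row: int              # 1-based row number in the harvested file
    field: str            # field whose value failed the check
    value: Optional[str]  # offending value (None if the field was missing)


def validate_rows(rows: List[Dict[str, str]]) -> List[Violation]:
    """Report every CV-controlled field whose value is not an exact CV term."""
    violations = []
    for row_number, row in enumerate(rows, start=1):
        for field, allowed in CV_TERMS.items():
            value = row.get(field)
            if value not in allowed:
                violations.append(Violation(row_number, field, value))
    return violations


# Hypothetical harvested rows; the second uses the wrong capitalization.
rows = [
    {"SampleMedium": "Surface Water", "DataType": "Continuous"},
    {"SampleMedium": "surface water", "DataType": "Average"},
]
for v in validate_rows(rows):
    print(v)
# prints Violation(row=2, field='SampleMedium', value='surface water')
```

Only rows with an empty violation list would be safe to parse into an ODM 1.1.1 database; reporting every violation at once, rather than stopping at the first, lets a data manager fix a file in a single pass.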
IEDA EarthChem Controlled Vocabularies
  • The IEDA EarthChem CVs were designed to guide metadata entry for PetDB, SedDB, VentDB, the EarthChem Library and SESAR.
  • The IEDA cyber-infrastructure is being rebuilt around the ODM2 information model, and as such IEDA is in the process of adopting many of the ODM2 Controlled Vocabularies. However, ODM2 CVs do not presently cover all of IEDA’s needs.
  • The IEDA EarthChem CVs can presently be accessed at:
  • CZO investigators and data managers should use IEDA EarthChem Controlled Vocabularies in these cases:
    • The ODM2 Variable Name CV does not presently contain all relevant geochemical variables, because it was taken directly from CUAHSI without merging with analogous IEDA CVs. If a required Variable Name does not yet exist in the ODM2 Variable Name CV, use terms from these IEDA CV Lists:
    • If equivalents do not exist in ODM2 CVs for specialized IEDA CVs, such as for Methods, Minerals, Tectonic Setting, etc., use terms from those specialized IEDA CVs.
    • If using an IEDA EarthChem data entry template or form, use their CVs.
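
The recommendations in the three sections above can be summarized as a small decision function. This Python sketch is illustrative only: the Situation flags and the choose_cv_system name are hypothetical, and checking the CUAHSI HIS cases before the IEDA cases is an assumption, since the page does not say which rule takes precedence when more than one applies.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical flags describing a dataset and the term being chosen."""
    # Hydrologic time series published in the older CZO Display File format.
    time_series_in_czo_display_file: bool = False
    # Hydrologic time series managed in CUAHSI HIS ODM Version 1.1.1.
    time_series_in_his_odm_1_1_1: bool = False
    # Data entered through an IEDA EarthChem data entry template or form.
    ieda_entry_template: bool = False
    # Geochemical Variable Name, or a specialized list such as Methods,
    # Minerals or Tectonic Setting, with no ODM2 equivalent yet.
    ieda_term_missing_from_odm2: bool = False


def choose_cv_system(s: Situation) -> str:
    """Return the recommended CV system; ODM2 unless an exception applies."""
    # ASSUMED precedence: the CUAHSI HIS cases are checked first.
    if s.time_series_in_czo_display_file or s.time_series_in_his_odm_1_1_1:
        return "CUAHSI HIS ODM 1.1.1"
    if s.ieda_entry_template or s.ieda_term_missing_from_odm2:
        return "IEDA EarthChem"
    return "ODM2"


print(choose_cv_system(Situation()))  # prints ODM2
print(choose_cv_system(Situation(ieda_term_missing_from_odm2=True)))  # prints IEDA EarthChem
```

A term that is missing from ODM2 but does not fall under the IEDA-specific cases is better handled by suggesting a new term to the community CV (step 4 of the instructions) than by switching systems.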


IMPORTANT NOTE:  The CZOData project is in the process of defining a new archival/exchange file format called YAML Observations Data Archive and Exchange (YODA) that will replace the older CZO Display File format for hydrologic time series. The newer format will support both hydrologic time series and data derived from physical samples. When the new format and associated software tools are complete, we will encourage data managers to move to the new YODA format rather than using the CZO Display File format.

 


Additional information