ARCHIVED CONTENT: In December 2020, the CZO program was succeeded by the Critical Zone Collaborative Network (CZ Net).

YODA files

Share data as YODA files (not fully operational)

SUMMARY: The CZOData team has been developing the new YAML Observations Data Archive and Exchange (YODA) file format, which extends the original CZO Display File specification to accommodate the full diversity of critical zone science data (such as hydrological time series, soil profile geochemistry, and biodiversity transects) that can be organized with the Observations Data Model v2 (ODM2). As an implementation of ODM2, YODA will serve as a text-file encoding both for archiving observational data at recommended data centers and for integrating diverse data from multiple sources using ODM2-based cyberinfrastructure at these data centers and the CZO Central system.

CZOData Team Contacts: Anthony Aufdenkampe and Jeffery Horsburgh
Questions?  Email the CZOData Project team



Benefits and Outcomes

The new YAML Observations Data Archive and Exchange (YODA) file format was specifically designed by the CZOData team in collaboration with CZO data managers to substantially extend and replace the existing CZO Display File format, which could encode only hydrologic time series. YODA can encode both sensor time series datasets and specimen-based laboratory datasets. In addition, the YODA File will meet the following requirements:

  • Easy for humans to read and write. Anyone opening the file in a text editor or spreadsheet application should be able to quickly and intuitively understand the file contents and how to use the data. YODA encodes data files in YAML, a human-readable data serialization format.
  • Easy for machines to parse and generate. The file should be very easy to parse and validate with the wide variety of software tools used by scientists.
  • Conform to the metadata requirements of an ODM2 Dataset, and yet have the flexibility to utilize a variety of controlled vocabularies.
  • Serve as a self-describing archival file format that contains all necessary data and metadata to define a dataset. These kinds of self-describing data files are strongly encouraged by our recommended data centers, and we are working with IEDA EarthChem and CUAHSI HydroShare to develop capabilities to parse and read the contents of YODA files for enhanced data integration.

CZO investigators, data managers and data users will benefit from the following:

  • Quality-controlled datasets from sensors or samples can be archived at recommended data centers, listed at CriticalZone.org, and cataloged by the CZO Data Search Portal all by using a consistent, self-describing file format that captures the data and all associated metadata within a single file.
  • YODA Excel templates designed for specific dataset profiles (i.e., multivariate time series; multi-specimen measurements) will guide CZO investigators to type and/or paste their data and metadata into a series of data entry forms, which will then be used by the Excel template to automatically generate a YODA file.
  • CZO data managers who do not wish to use the YODA Excel templates can automate the generation of YODA files by using code developed by the ODM2 team or by writing their own code.
  • All YODA files, regardless of how they are generated, can be harvested and validated using a single codebase, enabling granular data integration that would not be readily possible when dealing with many different file formats.
  • YODA and the Excel data entry templates developed by the CZOData Team use controlled vocabularies developed as part of the ODM2 project. These CVs can be modified by CZO data managers and community members to meet their needs using the online moderation system at http://vocabulary.odm2.org.
  • For CZO data users, YODA provides a standardized file format around which visualization and analysis software can be built.

Details of the YODA file format and associated Excel data entry templates can be found at the YODA-File GitHub source code repository: https://github.com/ODM2/YODA-File.

 


Instructions

In general, there are two workflows for creating YODA files. Details are provided in the following sections.

Generate YODA Files using CZOData Excel Data Entry Templates

The CZOData Team has created two Microsoft Excel data entry templates that Data Managers can use to create YODA files: a Time Series YODA template and a Specimen YODA template. These Excel templates provide pre-formatted tables into which CZO investigators and CZO data managers can paste or type metadata and data values for a particular dataset. The templates provide access to ODM2 Controlled Vocabulary terms directly within the template files (i.e., users can choose terms from pre-populated lists to populate metadata fields rather than typing in their own terms). Once data entry is complete, an automated script within the template files can be executed to export a valid YODA file that can then be listed at CriticalZone.org.

We are currently finishing up development of the Excel template files, but prototypes are available for download from the YODA-File GitHub repository: https://github.com/ODM2/YODA-File.

Generate YODA Files Using Code

In some cases, Data Managers may need to generate large numbers of YODA files, or they may want to automate the process of YODA file creation. YODA is a text file specification based on YAML (YAML Ain't Markup Language), a data serialization and interchange format that is a superset of JSON (JavaScript Object Notation). YAML can be readily generated or parsed in most modern programming languages using well-tested libraries (see the Projects list at http://yaml.org). Therefore, one valid option for creating YODA files is for CZO Data Managers to develop scripts or other code that interacts with their underlying data system to automatically generate YODA files. Data can thus continue to be managed in the system a CZO currently uses and be exported to the YODA format for exchange with CZOData Central. Where a CZO adopts ODM2 as part of its underlying data management infrastructure, the CZOData Team is developing tools for exporting YODA files directly from ODM2 databases (see the following section).
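As a rough illustration of the scripted workflow, the following Python sketch writes a nested data structure out as block-style YAML text. It is a minimal sketch, not a conformant YODA exporter: the section and field names (Dataset, Variables, Values, and so on) are illustrative assumptions rather than the YODA schema, which is defined in the YODA-File repository, and the small emit_yaml helper is hypothetical. It exists only to keep the example within the Python standard library; in practice an established YAML library from the yaml.org Projects list would normally be used.

```python
import json


def emit_yaml(node, indent=0):
    """Yield block-style YAML lines for a dict or list.

    Scalars are written with json.dumps, which works because JSON
    scalars are valid YAML. Keys are assumed to be plain identifiers.
    """
    pad = " " * indent
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)) and value:
                yield f"{pad}{key}:"
                yield from emit_yaml(value, indent + 2)
            else:
                yield f"{pad}{key}: {json.dumps(value)}"
    else:  # list
        for item in node:
            if isinstance(item, (dict, list)) and item:
                # Put the first line of the nested block on the "- " line.
                first, *rest = emit_yaml(item, indent + 2)
                yield f"{pad}- {first.lstrip()}"
                yield from rest
            else:
                yield f"{pad}- {json.dumps(item)}"


# Hypothetical records, standing in for a query against a local data system.
# The key names are illustrative only and are NOT the YODA schema.
dataset = {
    "Dataset": {"Title": "Stream temperature", "Profile": "TimeSeries"},
    "Variables": [{"Code": "Temp", "Units": "degC"}],
    "Values": [
        {"DateTime": "2015-06-01T00:00:00", "Temp": 14.2},
        {"DateTime": "2015-06-01T00:15:00", "Temp": 14.1},
    ],
}

yaml_text = "\n".join(emit_yaml(dataset)) + "\n"
print(yaml_text)
```

Because the records are assembled as ordinary Python dictionaries and lists, the same export step could be run in a loop over many datasets without changing how the data are stored locally.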

Other Functionality for YODA Files

The CZOData Team is currently working on a set of Python-based tools for working with YODA files. These tools are being developed within an open-source GitHub repository at: https://github.com/ODM2/YODA-Tools. Tools under development include a YODA file validator, i.e., a Python-based utility that will parse a YODA file and ensure that it is complete, conformant with ODM2 controlled vocabularies, and ready for posting at CriticalZone.org.
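The kind of checks such a validator performs can be sketched as follows. This is not the YODA-Tools implementation: the required section names, the units list, and the validate function are all hypothetical, and the sketch starts from a dataset that has already been parsed into Python dictionaries (parsing the YAML text itself would be handled by a YAML library).

```python
# Hypothetical required sections; the real list comes from the YODA specification.
REQUIRED_SECTIONS = ("Dataset", "Variables", "Values")

# Hypothetical stand-in for a controlled vocabulary of units terms.
UNITS_CV = {"degC", "mg/L", "m"}


def validate(dataset):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Completeness: every required section must be present and non-empty.
    for section in REQUIRED_SECTIONS:
        if not dataset.get(section):
            problems.append(f"missing or empty section: {section}")

    # Vocabulary conformance: each variable's units must be a known term.
    for i, variable in enumerate(dataset.get("Variables") or []):
        units = variable.get("Units")
        if units not in UNITS_CV:
            problems.append(
                f"Variables[{i}]: units {units!r} not in controlled vocabulary"
            )
    return problems


good = {
    "Dataset": {"Title": "Stream temperature"},
    "Variables": [{"Code": "Temp", "Units": "degC"}],
    "Values": [{"DateTime": "2015-06-01T00:00:00", "Temp": 14.2}],
}
bad = {
    "Dataset": {"Title": "Stream temperature"},
    "Variables": [{"Code": "Temp", "Units": "celsius"}],
}

print(validate(good))
for problem in validate(bad):
    print(problem)
```

Collecting every problem in a list, rather than stopping at the first failure, lets a data manager fix a file in one pass before posting it.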

Additional tools related to YODA files include code for parsing YODA files into a Python-based object structure for loading datasets into an ODM2 database. The same object structure can be used to query a dataset out of an ODM2 database for export to a YODA file. This Python-based object model is part of ongoing development of an application programming interface (API) for ODM2 (see https://github.com/ODM2/ODM2PythonAPI). These tools may be very useful for data managers who are considering, or have decided on, using ODM2 databases to manage their sensor or sample-based data.
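The idea of a shared object structure that sits between files and a database can be illustrated with a small sketch. The classes below are hypothetical and far simpler than the ODM2 object model; they are not part of ODM2PythonAPI. The sketch only shows that one set of objects can be built from parsed file content and then converted back to plain dictionaries for export.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Variable:
    # Hypothetical, simplified stand-in for a variable record.
    code: str
    units: str


@dataclass
class Dataset:
    # Hypothetical, simplified stand-in for a dataset record.
    title: str
    variables: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        """Build objects from content already parsed into dictionaries."""
        return cls(
            title=raw["title"],
            variables=[Variable(**v) for v in raw.get("variables", [])],
        )


raw = {
    "title": "Stream temperature",
    "variables": [{"code": "Temp", "units": "degC"}],
}

dataset = Dataset.from_dict(raw)   # import direction: parsed file -> objects
exported = asdict(dataset)         # export direction: objects -> plain dicts
print(dataset.variables[0].units)
print(exported == raw)
```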

 


Additional information