ARCHIVED CONTENT: In December 2020, the CZO program was succeeded by the Critical Zone Collaborative Network (CZ Net)

Datasets

List datasets at CriticalZone.org

SUMMARY: The CriticalZone.org website is the starting point for you to share data and for the public to access it. You only need to enter metadata; your actual data files are *not* uploaded. Instead, your files remain accessible at a URL on your CZO's data server or at a data center. The metadata are flexible enough to describe any data type, file format, data quality, etc. Once your data are listed, they become discoverable at the main website, CriticalZone.org, as well as via the CZO Central Catalog at Search.CriticalZone.org.

Contact: David Lubinski



Benefits and Outcomes

Share all your data quickly

Sharing data early gives investigators and students “credit” and exposure.  CZO investigators and students should submit their data to CZO Data Managers as soon as they can, even if it is incomplete and needs to stay private. CZO Data Managers can host datasets locally and password protect datasets that need to be kept private until they are ready to be published. This provides a number of benefits:

  • CZO investigators, students, and data managers get to know each other and each other's needs.
  • Metadata are collected before that information is lost. Don't forget to include NSF and other award numbers.
  • Investigators get credit even before the data go public.

Listing datasets at CriticalZone.org helps CZO Data Managers meet CZO data sharing requirements.   All CZO investigators and collaborators who receive material or logistical support from a CZO agree to:

  • share data privately within 1 year.
  • release data to the public within 2 years.  
  • extensions are possible, but require review and approval by CZO PIs.  
  • for more details, see the CZO Data Sharing Policy.
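The two policy deadlines are simple date arithmetic. The sketch below is a minimal illustration that assumes the clock starts on the date the data are collected; the CZO Data Sharing Policy defines the actual trigger, and approved extensions are not modeled. The helper names `add_years` and `sharing_deadlines` are hypothetical.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only Feb 29 can fail here, when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def sharing_deadlines(collected: date) -> dict:
    """Return the private-sharing (1 year) and public-release (2 year) deadlines.

    Assumption: the clock starts at the collection date; extensions are ignored.
    """
    return {
        "share_privately_by": add_years(collected, 1),
        "release_publicly_by": add_years(collected, 2),
    }


deadlines = sharing_deadlines(date(2014, 6, 30))
print(deadlines["share_privately_by"])   # 2015-06-30
print(deadlines["release_publicly_by"])  # 2016-06-30
```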

Listing datasets at CriticalZone.org is relatively easy because the process is solely based on metadata, not actual data files.  A key metadata requirement is that every data file be available at a URL elsewhere: on your CZO's data server or at a recommended data center. Beyond that URL requirement, the metadata are quite flexible and updatable.  The metadata can handle datasets of any: 

  • data type (earth science, bioscience, time series, point data, GIS data, transects, etc.).
  • file format (CSV, Excel, zip file, HTML web page, etc.).
  • file size (kilobytes, megabytes, terabytes, etc.).
  • status or data quality (incomplete, private, raw, final, error-corrected, etc.).
  • author (CZO, federal, or other sources).
  • grouping of files (from a single file to tens of related files).

Referencing data files with the following traits will make your data as widely useful and distributed as possible.  Your data files should *ideally* contain:   


Make your data easy to discover and browse

CZO Dataset Listings (CriticalZone.org)
Listing your datasets automatically makes them widely discoverable and easy to access. Your CriticalZone.org datasets are readily found by anyone searching with Google or other search engines. Visitors to CriticalZone.org can search or browse datasets within a single CZO or across the full CZO network. Browsing is easier because most datasets are grouped into natural collections of multiple data files and all dataset titles follow a consistent format. Datasets can also be browsed by Title, Field Area, Topic, or Discipline Tag.
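Browsing by Field Area, Topic, or Discipline Tag amounts to a facet index: each tag value maps to the titles of the datasets carrying it, and one dataset can appear under several values. The following is a minimal sketch of that idea, not the website's actual implementation; the record layout and the "Site A"/"Site B" listings are hypothetical.

```python
from collections import defaultdict


def build_facet_index(listings, facet):
    """Map each value of `facet` (e.g. "field_areas") to the sorted titles tagged with it."""
    index = defaultdict(list)
    for listing in listings:
        # A listing may carry several values for one facet, e.g. multiple field areas.
        for value in listing.get(facet, []):
            index[value].append(listing["title"])
    return {value: sorted(titles) for value, titles in sorted(index.items())}


# Hypothetical listings; titles follow the "Location - Topic (Start-End)" pattern.
listings = [
    {"title": "Site A - Meteorology (2009-2015)",
     "field_areas": ["Site A"], "topics": ["Meteorology"]},
    {"title": "Site A, Site B - Soil Geochemistry (2001-2013)",
     "field_areas": ["Site A", "Site B"], "topics": ["Soil Geochemistry"]},
]

by_area = build_facet_index(listings, "field_areas")
print(by_area["Site A"])
# ['Site A - Meteorology (2009-2015)', 'Site A, Site B - Soil Geochemistry (2001-2013)']
```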

CZO Data Search Portal (Search.CriticalZone.org)
If website visitors need more powerful search than is available at CriticalZone.org, they can visit the “CZO Data Search Portal” at Search.CriticalZone.org. CZO Dataset Listings are synced daily with the portal, and the portal links those listings back to their associated pages at CriticalZone.org. The Search Portal also includes additional CZO data files from the CZO Central Data Catalog (data previously harvested via CZO Display Files and data to be harvested via YODA files). The search portal may expand to include additional relevant data not authored by CZO. The Search Portal is built on an ESRI Geoportal Server, which further promotes discoverability by publishing the metadata in standardized geographic formats such as ISO-19115. Such formats enable interoperability with other cyberinfrastructure systems, so the CZO dataset listings can be federated with other national, non-CZO data catalogs and data repositories. Such federation makes your data even more widely discoverable.

 


Instructions

 

1. Decide how you want to lump/split your data files.

Data file listings at CriticalZone.org are organized into “CZO Datasets”, which are aggregates of data: collections naturally grouped by field areas, variables, time periods, and other topics. Each CZO Dataset consists of one or more “Dataset Components”, each of which references a data file at a specific URL. It is up to each CZO data manager to decide how to group their data. Established CZOs have about 30-85 Datasets, which is a reasonable length for a list to browse, and some of those browsable lists already encompass more than 60 million data values. After initial input, the number of datasets is expected to grow slowly for each CZO as new instrumentation, analyses, and sites are added. The number of data values, however, is expected to grow quickly. All CZOs should attempt to list the entirety of their data, including geospatial data.

An example CZO Dataset is  “B2 Desert Site - Meteorology (2009-2015)” from the Catalina-Jemez CZO.  This meteorological dataset currently consists of nine main components, seven of which are annual collections of data in csv format hosted on the Catalina-Jemez data server.   Another component will be added to the dataset each year.  Two additional components are used to point to (1) a web interface that enables filtering by date, data type, and variable and (2) a web page that connects to multiple files discussing methods.  
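The lump/split structure of the B2 example can be modeled as one dataset holding a list of components, each carrying the component-matrix columns listed under Required Fields (Location, Topic, URL, Private or Public, Data Level, optional DOI). This is a minimal sketch, not the CMS's data model; the server root, file names, topic labels, and "Level 1" data level below are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One Dataset Component: a single URL plus the component-matrix columns."""
    location: str
    topic: str
    url: str
    public: bool          # the "Private or Public" column
    data_level: str
    doi: Optional[str] = None


@dataclass
class CZODataset:
    """A CZO Dataset: one title grouping one or more components."""
    title: str
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components.append(component)


# Hypothetical server root and data level; the real URLs are not given here.
SERVER = "https://data.example.org/b2"

b2 = CZODataset("B2 Desert Site - Meteorology (2009-2015)")
for year in range(2009, 2016):  # seven annual CSV files
    b2.add(Component("B2 Desert Site", f"Meteorology {year}",
                     f"{SERVER}/met_{year}.csv", public=True, data_level="Level 1"))
# Two non-file components: a filtering web interface and a methods page.
b2.add(Component("B2 Desert Site", "Meteorology filter interface",
                 f"{SERVER}/filter/", public=True, data_level="Level 1"))
b2.add(Component("B2 Desert Site", "Methods",
                 f"{SERVER}/methods/", public=True, data_level="Level 1"))

print(len(b2.components))  # 9
```

Adding next year's file is one more `add` call, which matches the expectation that datasets grow by new components rather than new listings.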

A similar example is “Betasso (BT_Met) - Meteorology (2009-2015)” from the Boulder Creek CZO.  It too is based on meteorological data collection.  This CZO Dataset, however, consists of a single component: a link to a web interface on the Boulder CZO data server that enables downloading the entire dataset or filtering it by date, data type, output format, and variable.  This approach allows minimal metadata to be entered into the CZO CMS while still connecting to one or more complex and large data files.   

A very different example is “Shale Hills, Boulder, Luquillo, JRB-SCM - Soil Geochemistry (2001-2013)”. This geochemical dataset was created by members of the Shale Hills CZO using samples and data from four CZOs. It consists of a single but complex component hosted on the Shale Hills data server: a downloadable zip file that includes a Microsoft Access file.

A final example is “Critical Zone Tree 2 - Soil Moisture, Soil Temperature, Electrical Conductivity, Matric Potential, Sap Flow (2010-2012)” from the Southern Sierra CZO. This dataset is derived from a series of sensors arrayed around a single tree. Its 10 dataset components are CSV files containing data from different kinds of sensors, different years of collection, and different levels of data quality. Some of the data are private. All are hosted on the Southern Sierra CZO data server.

Although all of the above examples show data values stored at individual CZOs, the ultimate goal is for the data to be stored with a Data DOI at a relevant archival data center.  Once the data are housed at the data center, the Dataset URL at CriticalZone.org should be updated to point to the data center's version.

 

2. Enter Datasets in the CZO Content Management System (CMS)

Dataset entry requires a login to the CMS. If you need one, contact CZO webmaster David Lubinski, who will provide access and video screencasts for training.

The CMS was designed to fit the differing data management workflows of each CZO by making it possible to simply enter metadata manually. This isn’t as onerous as it sounds. CZO data managers are encouraged to reuse existing metadata by copying and pasting. Also, most CZOs lump multiple components into a single dataset or reference a single web page that provides filtered access to larger, more complex data files. Established CZOs currently have about 30-60 Datasets, which is a reasonable number to maintain and to browse. In many cases, maintenance is minimal and limited to adding new components to existing datasets. If you have many datasets with many components to enter, contact David Lubinski. He may be able to assist with manual entry and/or with importing your metadata from a structured format (e.g., CSV).
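If you plan to hand over metadata in CSV form, checking it before sending saves a round trip. The sketch below groups rows into datasets by title and checks the required component columns; the column names (`dataset_title`, `access`, etc.) and the sample rows are hypothetical, since the actual import format would be agreed with the webmaster.

```python
import csv
import io

# Hypothetical column names and rows; agree on the real format before importing.
SAMPLE = """dataset_title,location,topic,url,access,data_level,doi
Site A - Meteorology (2009-2010),Site A,Meteorology 2009,https://data.example.org/a/2009.csv,Public,Level 1,
Site A - Meteorology (2009-2010),Site A,Meteorology 2010,https://data.example.org/a/2010.csv,Private,Level 0,
"""

REQUIRED = ("location", "topic", "url", "access", "data_level")


def load_components(text):
    """Group CSV rows into {dataset title: [component dicts]}, checking required columns."""
    datasets = {}  # dicts keep insertion order, so datasets stay in file order
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            raise ValueError(f"row {line_no}: missing {', '.join(missing)}")
        if row["access"] not in ("Public", "Private"):
            raise ValueError(f"row {line_no}: access must be Public or Private")
        component = {col: row[col] for col in REQUIRED}
        component["doi"] = row.get("doi") or None  # DOI is optional
        datasets.setdefault(row["dataset_title"], []).append(component)
    return datasets


for title, components in load_components(SAMPLE).items():
    print(title, len(components))  # Site A - Meteorology (2009-2010) 2
```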

The metadata for CZO Datasets are fairly rich.  They were designed to handle the required fields of archival data centers/repositories (see Data DOI) as well as standardized geographic metadata formats like ISO-19115.  Detailed “in situ” instructions are given in the CMS for each field on the multi-tabbed entry form.  
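To illustrate how the bounding coordinates map onto ISO-19115, the sketch below encodes them as a `gmd:EX_GeographicBoundingBox` in the ISO 19139 XML encoding, using only Python's standard library. It is not the CMS's or the Geoportal's export code, and the coordinates in the usage line are hypothetical. Longitudes are not required to satisfy west <= east, because a box crossing the 180° meridian legitimately has east < west.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def bounding_box_xml(north, south, east, west):
    """Encode a bounding box as an ISO 19139 gmd:EX_GeographicBoundingBox string."""
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    box = ET.Element(f"{{{GMD}}}EX_GeographicBoundingBox")
    # ISO 19139 element order: west, east, south, north.
    for name, value in (("westBoundLongitude", west), ("eastBoundLongitude", east),
                        ("southBoundLatitude", south), ("northBoundLatitude", north)):
        wrapper = ET.SubElement(box, f"{{{GMD}}}{name}")
        ET.SubElement(wrapper, f"{{{GCO}}}Decimal").text = str(value)
    return ET.tostring(box, encoding="unicode")


# Hypothetical field-area extent.
print(bounding_box_xml(north=32.5, south=32.3, east=-110.6, west=-110.9))
```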

Required Fields

  • Title composite - automatically created from Location, Topic, optional Subtopic, Start Date, End Date.
  • Component Data matrix - One or more rows with the following required columns: Location, Topic, URL, Private or Public, Data Level; plus an optional DOI column.
  • Description/Abstract
  • Dataset Creators/Authors
  • Citation for this dataset
  • Contact Person & Info
  • Keywords
  • Variables
  • National Discipline Tag
  • CZO Field Areas
  • North and South Bounding Latitudes; West and East Bounding Longitudes
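The Title composite can be approximated from the example titles on this page, which follow a "Location - Topic (StartYear-EndYear)" pattern. This is a minimal sketch under that assumption; where the CMS places the optional Subtopic is not documented here, so appending it to the Topic with " - " is a hypothetical choice, as is the function name `title_composite`.

```python
from datetime import date
from typing import Optional


def title_composite(location: str, topic: str, start: date, end: date,
                    subtopic: Optional[str] = None) -> str:
    """Build a 'Location - Topic (StartYear-EndYear)' dataset title."""
    if end < start:
        raise ValueError("End Date precedes Start Date")
    # Assumption: an optional Subtopic is appended to the Topic with " - ".
    subject = f"{topic} - {subtopic}" if subtopic else topic
    # Only the years of the start and end dates appear in the title.
    return f"{location} - {subject} ({start.year}-{end.year})"


print(title_composite("B2 Desert Site", "Meteorology", date(2009, 1, 1), date(2015, 12, 31)))
# B2 Desert Site - Meteorology (2009-2015)
```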

Optional Fields

Basic fields: Subtitle, CZO(s), CZO Dataset Creators/Authors.

Other fields: Dataset DOI, External Link(s), Award/Grant Number(s), Comments, Related Datasets, Primary Publications, Publications that use this Dataset, Local Discipline/Research Group, Local Research Foci.

Map fields: General Map image upload(s), General Map layer upload(s), Centroid Lat and Long, Point Markers, KML/KMZ file(s), Map freeform text & code.

 

3. Check the associated web pages at CriticalZone.org

After entering datasets, check the corresponding public web pages to ensure all the information is correct.  Pages include summaries of multiple datasets, such as 

/sierra/data/datasets/ and /sierra/data/datasets/by-field-area/

as well as individual datasets like /sierra/data/dataset/2642/ and /sierra/data/dataset/3641/
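A short checklist of pages to open after entering or editing datasets can be generated from the path patterns in the Sierra examples. This sketch assumes the same patterns apply to other CZOs and that the site root is `https://criticalzone.org`; both are assumptions, and the function name `pages_to_check` is hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://criticalzone.org"  # assumption: site root and scheme


def pages_to_check(czo, dataset_ids):
    """List the summary pages and each individual dataset page for one CZO."""
    paths = [f"/{czo}/data/datasets/", f"/{czo}/data/datasets/by-field-area/"]
    paths += [f"/{czo}/data/dataset/{dataset_id}/" for dataset_id in dataset_ids]
    # An absolute path replaces any path on BASE, giving full page URLs.
    return [urljoin(BASE, path) for path in paths]


for url in pages_to_check("sierra", [2642, 3641]):
    print(url)
```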

Note that the layout of these pages is likely to change a bit to better display geographic extent and optional maps. If you have additional suggestions for improvement, contact David Lubinski.

 


Additional information