Automation of Ecological Data Management Using Structured Metadata

Interim Status Report

Metadata, or data describing data, represent information essential to the long-term preservation and usability of ecological data. They are also essential for data sharing among ecological researchers and with other scientific disciplines. Metadata should therefore be considered an indispensable and integral part of the data itself, and the results of this project are expected to contribute significantly to making this a reality.

Several phases of this project have been completed. We have developed a Document Type Definition (DTD) for Ecological Metadata Language (EML), which is a formalized representation of the ecological metadata content standard described in Michener et al. 1997. Although a DTD is not meant to be read directly by people, we have linked to it for those interested in the technical details of the specification.

This DTD, and others still to be created, are intended as input to a general-purpose metadata editor also under development at NCEAS. This metadata editor will give users easy forms-based entry and modification of metadata in accordance with any particular metadata content standard. The DTD provides the editor with complete information on the expected and permitted content of the metadata, in compliance with the selected standard (in the above case, Michener et al. 1997). An example metadata file (most useful when viewed in the metadata editor as one or more forms) and an example data set have also been generated to demonstrate potential use of EML. These versions of EML have also been used in the quality assurance tool described below. The metadata editor will fill a major need of researchers at NCEAS who must integrate large numbers of arbitrarily structured and often poorly documented data sets. It is also expected to be useful as a practical data management tool in the wider ecological research community.
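To illustrate how a DTD can tell an editor which fields a form needs, here is a minimal Python sketch. It is not the NCEAS editor and it is not a real DTD parser: it reads only simple comma-separated content models, and the DTD fragment, element names, and the form_fields function are hypothetical, invented for this illustration rather than taken from the EML DTD.

```python
import re

# Hypothetical DTD fragment. The element names are illustrative only
# and are NOT copied from the EML DTD.
DTD = """
<!ELEMENT dataset (title, originator+, abstract?, keyword*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT originator (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
"""

# Matches declarations of the form <!ELEMENT name (content)>.
# Assumption: no nested groups and no choice ("|") content models.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def form_fields(dtd_text, parent):
    """Return one form-field description per child element of `parent`."""
    for name, content in ELEMENT_DECL.findall(dtd_text):
        if name != parent:
            continue
        fields = []
        for part in content.split(","):
            part = part.strip()
            if not part:
                continue
            # DTD occurrence indicators: ? = optional, * = zero or more,
            # + = one or more, none = exactly one.
            suffix = part[-1] if part[-1] in "?*+" else ""
            fields.append({
                "name": part.rstrip("?*+"),
                "required": suffix in ("", "+"),
                "repeatable": suffix in ("*", "+"),
            })
        return fields
    raise KeyError(parent)


if __name__ == "__main__":
    for field in form_fields(DTD, "dataset"):
        print(field)
```

A forms-based editor built along these lines could mark "title" as a required single entry and "keyword" as an optional repeating entry without any knowledge of the content standard being hard-coded into the editor itself.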

A structured set of metadata allows some aspects of data management to be automated. For example, we have developed a proof-of-concept prototype application that performs quality assurance analysis on ecological data based solely upon the EML metadata provided to describe the data set. Information on the prototype can be found in the "Products" list in the right-hand column of this page.
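The idea of metadata-driven quality assurance can be sketched in a few lines of Python. This is not the prototype described above, and the metadata vocabulary here (the "variable" element and its "type", "min", and "max" attributes) is an assumption made for illustration, not actual EML. The point is only that the checks are derived from the metadata rather than written by hand for each data set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment. Element and attribute names are
# illustrative and are NOT taken from the EML DTD.
METADATA = """
<dataset>
  <variable name="site" type="string"/>
  <variable name="temp_c" type="float" min="-40" max="60"/>
  <variable name="count" type="integer" min="0"/>
</dataset>
"""

# Hypothetical data set described by the metadata above.
DATA = """site,temp_c,count
A,12.5,3
B,75.0,4
C,abc,-1
"""

CASTS = {"string": str, "float": float, "integer": int}


def load_rules(metadata_xml):
    """Build per-variable checking rules from the metadata alone."""
    rules = {}
    for var in ET.fromstring(metadata_xml).iter("variable"):
        low, high = var.get("min"), var.get("max")
        rules[var.get("name")] = {
            "type": var.get("type"),
            "min": float(low) if low is not None else None,
            "max": float(high) if high is not None else None,
        }
    return rules


def check(data_csv, rules):
    """Return (row number, variable, problem) for every failed check."""
    problems = []
    reader = csv.DictReader(io.StringIO(data_csv))
    for row_number, row in enumerate(reader, start=1):
        for name, rule in rules.items():
            raw = row.get(name)
            if raw is None:
                problems.append((row_number, name, "missing value"))
                continue
            try:
                value = CASTS[rule["type"]](raw)
            except ValueError:
                problems.append((row_number, name, "not a valid " + rule["type"]))
                continue
            if rule["type"] == "string":
                continue  # no range check for text values
            if rule["min"] is not None and value < rule["min"]:
                problems.append((row_number, name, "below minimum"))
            if rule["max"] is not None and value > rule["max"]:
                problems.append((row_number, name, "above maximum"))
    return problems


if __name__ == "__main__":
    for problem in check(DATA, load_rules(METADATA)):
        print(problem)
```

Because the rules come entirely from the metadata, the same check function could be applied to a differently structured data set simply by supplying a different metadata document.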

Having worked with the structured metadata in our quality assurance prototype application, we have discovered several important areas for improvement in the EML metadata standard. Consequently, we are actively working on a revision of the standard which incorporates these improvements. In addition, we are participating in the National Biological Information Infrastructure's (NBII) work on a metadata content standard for environmental biology that is compatible with the geospatial data community's principal metadata standard, the FGDC Content Standard for Digital Geospatial Metadata. See the paper referenced above and the link to NBII below for more information.

See also

  • Michener et al. 1997. Ecological Applications. February.
  • ESA FLED Committee. Report Volume I.
  • National Biological Information Infrastructure Metadata (NBII). Draft available from the Biological Working Group of the FGDC.

Software Products

Publications

The work was written up and presented at the Third IEEE Computer Society Metadata Conference. The citations are:

Nottrott, R., M. B. Jones, and M. Schildhauer. 1999. Using XML-structured metadata to automate quality assurance processing for ecological data. Proceedings of the Third IEEE Computer Society Metadata Conference. Bethesda, MD. April 6-7, 1999.

Frondorf, A., M. B. Jones, and S. Stitt. 1999. Linking the FGDC Geospatial Metadata Content Standard to the Biological/Ecological Sciences. Proceedings of the Third IEEE Computer Society Metadata Conference. Bethesda, MD. April 6-7, 1999.