Research at NCEAS relies on access to existing data on a broad variety of topics, and these data are often very difficult to locate, access, interpret, and analyze. Our approach to informatics uses extensive documentation about data (i.e., metadata) to overcome many of these data management challenges. Accordingly, most of our tools use metadata to handle the complexity of ecological data sources.
Our emphasis on metadata-driven systems comes from an ongoing analysis of the capabilities of existing metadata standards that are relevant to ecology. The standards we have examined include the Content Standard for Digital Geospatial Metadata (FGDC), the NBII Biological Data Profile (key sections of which we helped to author), the Dublin Core Metadata Element Set, and the Directory Interchange Format (DIF). Although several of these standards are relevant, none comprehensively addresses the needs of ecology, particularly the need for automated processing of heterogeneous ecological data sources. We therefore concluded that a new metadata specification was needed, one with the following properties:
- Extensible at run-time
- Highly structured for machine parsing
- Supports automated data processing
- Supports non-geospatial data
- Incorporates important biological metadata (e.g., taxonomy)
- Practical for adoption by the ecological community
We designed and developed Ecological Metadata Language (EML) to satisfy these requirements and to be compatible with metadata recommendations originally developed by the Ecological Society of America's Future of Long-term Ecological Data (FLED) committee and associated efforts (Michener et al., 1997, Ecological Applications). EML is a flexible, highly structured metadata markup language that allows scientists to describe their data fully, while remaining lightweight for scientists not yet ready to invest the time needed to preserve their data more comprehensively. An EML description provides all of the information found in roughly comparable standards (e.g., the Biological Data Profile), and it also details both the logical and the physical structure of the data so that automated systems can process them.
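To illustrate how a description of a table's physical and logical structure can drive automated processing, the following Python sketch reads a small metadata fragment and uses it to parse a delimited data file. The element names (`dataTable`, `physical`, `textFormat`, `numHeaderLines`, `fieldDelimiter`, `attributeList`, `attributeName`, `measurementScale`) loosely follow EML 2.x, but the fragment is heavily simplified and is not a schema-valid EML document. The table name `plot_counts`, its columns, and the helper functions `read_table_spec` and `load_rows` are hypothetical and exist only for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified EML-style fragment (NOT schema-valid EML).
METADATA = """
<dataTable>
  <entityName>plot_counts</entityName>
  <physical>
    <dataFormat>
      <textFormat>
        <numHeaderLines>1</numHeaderLines>
        <simpleDelimited>
          <fieldDelimiter>,</fieldDelimiter>
        </simpleDelimited>
      </textFormat>
    </dataFormat>
  </physical>
  <attributeList>
    <attribute>
      <attributeName>plot</attributeName>
      <measurementScale><nominal/></measurementScale>
    </attribute>
    <attribute>
      <attributeName>count</attributeName>
      <measurementScale><ratio/></measurementScale>
    </attribute>
  </attributeList>
</dataTable>
"""

# Measurement scales whose values this sketch converts to numbers.
NUMERIC_SCALES = {"interval", "ratio"}


def read_table_spec(xml_text):
    """Extract header count, delimiter, and (name, scale) pairs from the metadata."""
    root = ET.fromstring(xml_text.strip())
    text_format = root.find("physical/dataFormat/textFormat")
    header_lines = int(text_format.findtext("numHeaderLines", default="0"))
    delimiter = text_format.findtext("simpleDelimited/fieldDelimiter", default=",")
    columns = []
    for attr in root.findall("attributeList/attribute"):
        name = attr.findtext("attributeName")
        scale = attr.find("measurementScale")
        # The first child element names the measurement scale (e.g. nominal, ratio).
        kind = scale[0].tag if scale is not None and len(scale) else "nominal"
        columns.append((name, kind))
    return header_lines, delimiter, columns


def load_rows(xml_text, data_text):
    """Parse delimited data using only what the metadata says about its structure."""
    header_lines, delimiter, columns = read_table_spec(xml_text)
    reader = csv.reader(io.StringIO(data_text), delimiter=delimiter)
    records = []
    for row in list(reader)[header_lines:]:
        if not row:  # skip blank lines
            continue
        record = {}
        for (name, kind), value in zip(columns, row):
            record[name] = float(value) if kind in NUMERIC_SCALES else value
        records.append(record)
    return records


if __name__ == "__main__":
    data = "plot,count\nA1,12\nB2,7\n"
    print(load_rows(METADATA, data))
    # [{'plot': 'A1', 'count': 12.0}, {'plot': 'B2', 'count': 7.0}]
```

Because the delimiter, the number of header lines, and the column types come from the metadata rather than from the code, the same routine could in principle read a differently formatted table if only its description changed.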
Building these automated systems has been the focus of many of our informatics research projects. Our projects span the whole scientific process, including automated data collection in the field using metadata-driven form generation (Jalama), desktop data and metadata editing and network data discovery (Morpho), schema-independent data and metadata storage, search, and preservation (Metacat), and metadata-enabled analytical tools for capturing and executing ecological analyses and models as scientific workflows (Monarch, Kepler). Each of these tools relies on metadata and moves us closer to our goal: an information management solution that fosters the synthetic reuse of data.
Advances in ecological informatics will enhance our ability to discover, access, integrate, and appropriately apply the growing body of ecological and other data needed to inform integrative environmental research. The informatics projects undertaken by NCEAS staff and collaborators address key challenges in the storage and management of environmental data registries and repositories; the discovery and preparation of these data for further analysis and synthesis; advanced automated machine processing of information and models through knowledge representation and semantic mediation; and powerful visualization and intuitive access to these capabilities for practicing research scientists.