Ecoinformatics Research Projects

Projects undertaken by NCEAS staff and collaborators address key challenges including:

  • Storage and management of environmental data
  • Discovery and preparation of data for further analysis and synthesis
  • Advanced automated machine processing of information and models
  • Making these capabilities available to practicing scientists

Semantic Tools for Ecological Data Management (Semtools)
The Semantic Tools for Ecological Data Management (Semtools) project is tackling critical issues in the management and use of heterogeneous scientific data. Existing approaches to managing data and associated metadata fail to adequately capture the semantics of the scientific process, which limits the usefulness of those data for addressing important scientific questions. Semtools will provide new capabilities for data access, discovery, integration, and visualization by developing software tools that utilize semantically annotated data and metadata. This project will create semantic enhancements to the EML structured metadata format, the Morpho metadata editing software, and the Metacat metadata and data management system. Semtools is a collaboration between NCEAS, UC Davis, and the Santa Barbara Coastal LTER. Funded in 2008 by the National Science Foundation - DBI-0743429.

Scientific Observations Network (SONet)
Advances in environmental science increasingly depend on information from multiple disciplines to tackle broader and more complex questions about the natural world. Such advances, however, are hindered by data heterogeneity, which impedes the ability of researchers to discover, interpret, and integrate relevant data that have been collected by others. The Scientific Observations Network (SONet) will initiate a multi-disciplinary, community-driven effort to define and develop the necessary specifications and technologies to facilitate semantic interpretation and integration of observational data. The technological approaches will derive from recent advances in knowledge representation that have demonstrated great utility in enhancing scientific communication and data interoperability within the genomics community. This effort will bring together a community of experts, consisting of environmental science researchers, computer scientists, and information managers, to develop open-source, standards-based approaches to the semantic modeling of observational data. Funded in 2008 by the National Science Foundation - OCI-0753144.

Virtual Data Center (VDC)
The scientific community needs reliable infrastructure that enables open, stable, persistent, robust, and secure access to well-described and logically organized biodiversity, ecological, and environmental data. What is needed is a virtual distributed network of data centers that seamlessly supports discovery and user-friendly access to a broad array of data, metadata, and other digital products that are archived securely and permanently in multiple locations. Under this proposal we will design a Virtual Data Center (VDC) for biodiversity, ecological, and environmental data, founded on open standards and protocols for interoperability among existing and new data centers. The VDC project is a collaboration among the Long Term Ecological Research Network Office, NCEAS, the National Evolutionary Synthesis Center, the National Biological Information Infrastructure, Oak Ridge National Laboratory, and the University of Kansas Biodiversity Research Center. Funded in 2008 by the National Science Foundation - OCI-0753138.

Kepler: The Kepler Project's overall goal is to produce an open-source scientific workflow system that allows scientists to design scientific workflows and execute them efficiently using emerging Grid-based approaches to distributed computation. Kepler work at NCEAS was originally funded as part of the Science Environment for Ecological Knowledge (SEEK). The transition from a research prototype to a reliable software system has been funded under the Kepler/CORE project, a multi-institutional collaboration with UC Davis, NCEAS, and UC San Diego. The Kepler project has grown to become a cross-project collaboration with contributing members from Kepler/CORE, SEEK, SDM Center, Ptolemy, GEON, and many others. Funded, in part, in 2002 by the National Science Foundation - Information Technology Research Program DBI-0225676 and in 2007 by the National Science Foundation - Office of Cyberinfrastructure OCI-0722079.

Knowledge Network for Biocomplexity (KNB)
KNB is a network for data sharing that facilitates ecological and environmental research. KNB is a collaborative effort including ecologists and technologists. Partners include NCEAS, Long Term Ecological Research Network (LTER), San Diego Supercomputer Center (SDSC), and Texas Tech University (TTU). The goal of KNB is to enable efficient discovery, access, interpretation, integration, and analysis of complex ecological data from a highly distributed set of field stations, laboratories, research sites, and individual researchers. KNB software products include applications to describe, store, and query ecological data from a common framework. KNB produced a structured metadata format for ecological data (EML), software to generate this format (Morpho), and a robust metadata and data management system (Metacat) that enables researchers to participate in a distributed global network of data repositories. Funded in 1999 by the National Science Foundation - Knowledge & Distributed Intelligence Program DEB-0072909.

Production Implementation of the Knowledge Network for Biocomplexity
In this project, our goal is to refine the software tools and technology frameworks developed as part of the KNB research effort, so that these are highly usable by research scientists on a practical basis. Dedicated software engineers and metadata coordinators will optimize and assist in the use of KNB technologies, with the specific aim of promoting the use of the KNB as a rich information source for the ecological community. Populating the KNB has two components: 1) identifying and locating appropriate data, and 2) facilitating their inclusion in the KNB system. Raising the awareness of these tools within the ecological community will both make the system more useful and imbue the community with the interest and skills needed to enhance the long-term value of data. This, in turn, should stimulate new ways of conducting ecological research. KNB technologies facilitate the development and support of data registries being used by a growing number of organizations. For example, there is currently a prototype operating on behalf of the Ecological Society of America to allow authors to register data associated with their journal articles. Funded in 2003 by the Andrew W. Mellon Foundation.

Science Environment for Ecological Knowledge (SEEK)
SEEK is a multi-institutional collaboration of ecologists, systematists, and computer scientists researching scientific workflow modeling with advanced semantics. The goals of SEEK are to make fundamental improvements in how researchers can 1) gain broad access to ecological data and information, 2) rapidly locate and utilize distributed computational services, and 3) employ powerful new methods for capturing, reproducing, and extending the analysis process itself. The SEEK approaches to data are compatible with the KNB technologies, but significantly extend these to incorporate data resources from the natural history museum and biodiversity science communities, as well as the geosciences and remote-sensing communities. Products include the scientific workflow application, Kepler, and the EcoGrid, a network of networks of ecologically relevant data and analytical components. This project incorporates cutting-edge advances in semantic mediation and knowledge representation. SEEK is a collaborative project of LTER, NCEAS, SDSC, the University of Kansas (Biodiversity Research Center), and the University of California, Davis. Funded in 2002 by the National Science Foundation - Information Technology Research Program DBI-0225676.

Ecological Metadata Language (EML): EML is a metadata specification that can be used to comprehensively describe ecological data in terms of content, structure, and research context. EML is a formalization and extension of prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is defined and revised through an on-going community effort, particularly involving the participation of ecological research station information managers and other interested parties. A number of prominent research organizations, such as NCEAS, LTER, the UC Natural Reserve System, Kruger National Park, and OBFS, are expressing interest in or actively using EML as their interchange language and cataloguing standard for ecological metadata. EML was generated largely as a product of the Knowledge Network for Biocomplexity.

Real-time Environment for Analytical Processing (REAP): The REAP project's goal is to extend the Kepler scientific workflow system to fully integrate access to sensor networks. New capabilities will include the ability to incorporate sensor data in workflows; to monitor, inspect, and control sensor networks; and to simulate the design of new sensor networks. REAP is a collaboration among NCEAS, San Diego Supercomputer Center, UC Davis, OPeNDAP, UC Los Angeles, and Oregon State University. Funded in 2006 by the National Science Foundation - Cyberinfrastructure for Environmental Observatories (CEOP) Program.

FIRST Project: The Faculty Institutes for Reforming Science Teaching (FIRST) project is developing new metadata standards for assessment in ecological education to facilitate the exchange of educational assessment data. Participants will develop means for semantically describing assessment instruments to allow comparison of different assessment techniques. NCEAS is participating as a subcontractor on this project, which is led by researchers at Michigan State University. Funded in 2006 by the National Science Foundation.

Jalama: Capturing Data in the Field: Jalama, developed jointly with scientists at the Marine Science Institute at UCSB, investigated how rich metadata can be used to develop flexible, easy-to-use forms for data entry in lab and field environments for ecology. Research focused on clarifying how to automate the creation of effective user interfaces for data collection. Software products from the project target both desktop and handheld computers. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program DBI-0131178.

UC Natural Reserve System Data Registry (NRS): The University of California Natural Reserve System contributes to the understanding and management of the Earth and its natural systems by supporting university-level teaching, research, and public service at protected natural areas throughout California. NCEAS collaborates with the NRS in building an information management system that facilitates research and education in the UC NRS. One of the major projects is the UC NRS Data Registry, which is based on the KNB technologies. Funded in 1999 by the University of California.

VegBank - US National Vegetation Classification: The VegBank online data repository is being developed to store vegetation data in support of the US National Vegetation Classification. The system comprises three components used to archive vegetation plot data, plant taxonomic data, and vegetation community data. Major programming efforts and technology infrastructure are located at NCEAS, in partnership with investigators at the University of North Carolina and the Panel on Vegetation Classification, Ecological Society of America. Funded in 2000 and 2002 to the University of North Carolina by the National Science Foundation - Biological Databases & Informatics Program DBI-0213794, DBI-9905838.


Resource Discovery Initiative for Field Stations (RDIFS): RDIFS Research Coordination Network (RCN) activities focus principally on enhancing the ecological informatics infrastructure for field biology and developing mechanisms for discovery of data and information resources that can facilitate research and education at North American biological field stations. These objectives are being accomplished through two integrated networking activities: (1) research that encompasses five inter-related resource discovery activities and (2) an intensive training component that provides field station personnel with a solid foundation in the computational and informatics skills that are critical for developing, archiving, managing, and communicating data and information resources. As part of the LTER-led RDIFS effort, NCEAS has adapted tools from the KNB to create the Organization of Biological Field Stations Data Registry. LTER Network Office - funded in 2001 by the National Science Foundation - Research Collaboration Networks Program.
 
NCEAS Data Repository is a standards-based documentation of metadata and data from synthesis projects arising at NCEAS, based on KNB technologies.

LTER Data Catalog is a collaboration with LTER Data Managers and the LTER Network Office to develop metadata standards and promote search capabilities for data and metadata. It is based largely on KNB technologies.

Interaction Web Database provides web-based access and submissions of data concerning ecological interactions, particularly pollination/pollinator relationships.

Global Population Dynamics Database is an extensive collection of time series data from plant and animal populations, hosted by the Center for Population Biology at Silwood Park and co-developed with NCEAS and the Department of Ecology and Evolution at the University of Tennessee.

Kruger National Park, South Africa: NCEAS collaborates with information managers and scientists at Kruger National Park to develop effective informatics solutions for data collected within the park, as well as for research purposes, decision-making, and public edification. The collaboration has a special focus on the confederation of data, and the development and deployment of Kepler workflow solutions for conservation management analyses. The project is based on KNB and SEEK technologies. This approach is also being adopted at each of the 22 South African national parks (SANParks). SANParks Data Repository. Funded by the Andrew W. Mellon Foundation.

Paleobiology Database is a web-based resource of fossil information that includes 52,000 collection records and 511,889 taxonomic occurrences from 13,962 published references.  The project is led by research scientist Dr. John Alroy, who is a former NCEAS Postdoctoral Associate.


Webs on the Web (WOW): This project will develop the information technology needed to increase the quality, sophistication, and pedagogical accessibility of analyses and visualizations of ecological network data. The complexity of ecological network data is immense and therefore represents a challenging opportunity for software development targeting the ecological sciences. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program.

Metadata Editor: This project was an early effort designed to test the effectiveness of the then-emerging Extensible Markup Language (XML) for representing ecological metadata. A prototype metadata editor was built, and the experience informed plans for the current Morpho and Metacat system developed under the KNB project. Funded by a 1997 National Science Foundation supplement to NCEAS.

Postdoctoral Training in the Management of Environmental Information: NCEAS is involved with collaborators in developing new techniques for managing environmental information. For this project we have recruited postdoctoral researchers to work in three fundamental areas of informatics: Knowledge Representation, Taxonomic Nomenclature and Classification, and Informatics Training. The informatics training position has been developed in conjunction with the LTER Network Office, where the position is located. Through this project NCEAS is playing a pivotal role in the training of young scientists in the management and analysis of environmental information. Funded in 2003 by the Andrew W. Mellon Foundation.