Scientific research at NCEAS relies on access to existing data on a broad variety of topics, but these data are usually difficult to locate, access, interpret, and analyze. NCEAS' Informatics Research facilitates synthetic research by developing and supporting products and tools that are broadly useful to the ecological research community.
Our work is conducted with a number of partner collaborators and impacts the way ecological research is conducted, especially relative to synthesis and collaboration, which depend so heavily on extending access to relevant data.
Projects undertaken by NCEAS staff and collaborators address key challenges including:
- Storage and management of environmental data
- Discovery and preparation of data for further analysis and synthesis
- Advanced automated machine processing of information and models
- Making these capabilities available to practicing scientists
NCEAS and its partner collaborators are developing a strategic plan for a nationally supported Institute for Sustainable Earth and Environmental Software (ISEES) that would coordinate the development and sustainable support of innovative and interoperable scientific software tools that can transform science at the intersection of earth, environmental, and life sciences. The ISEES steering committee is facilitating a series of design workshops with the broader research community to create a vision and criteria for the Institute. ISEES will advance the state of science software by engaging earth and environmental research communities to address the software barriers that most impede grand challenge earth science. The envisioned institute will enable researchers to collaboratively address the entire software lifecycle, from product conceptualization to requirements analysis, design, development, testing, deployment, long-term support, and decommissioning.
DataONE (Data Observation Network for Earth) is building cyberinfrastructure for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Supported by the U.S. National Science Foundation, DataONE ensures preservation of and access to multi-scale, multi-discipline, and multi-national science data. DataONE makes biological data available from the genome to the ecosystem; makes environmental data available from atmospheric, ecological, hydrological, and oceanographic sources; provides secure and long-term preservation and access; and engages scientists, land managers, policy makers, students, educators, and the public. DataONE is a collaboration between NCEAS/UCSB, the University of New Mexico, the Oak Ridge National Laboratory, the California Digital Library, NESCent, and a number of other organizations. Funded in 2009 by the National Science Foundation - OCI-0830944.
Community Dynamics Toolbox: analysis of long-term ecological dynamics using the Kepler Workflow System
As ecologists continue to gather long-term data at site, regional, continental, and global scales, there will be an increasing need for tools to measure the pattern and rate of change in plant and animal communities in response to multiple environmental drivers. The National Science Foundation has funded the NCEAS Informatics team and collaborators from the University of New Mexico and the University of Wisconsin’s Center for Limnology to gather multiple metrics of ecological dynamics into one toolbox that will provide ecologists with a new set of tools for quantifying how communities change over time. Our approach builds upon many recent informatics developments (EML, DataONE, LTER NIS, PASTA, Kepler) to advance ecological research. The toolbox will make community analysis more accessible, expose a variety of indices to wider use, and, combined with existing workflows, help reduce data preparation efforts and foster collaboration.
The Exxon Valdez Oil Spill Trustee Council (EVOSTC) and state and federal agencies are supporting a five-year, $12 million long-term monitoring program in the Gulf of Alaska region affected by the 1989 Exxon Valdez oil spill. The monitoring program, called Gulf Watch Alaska, includes 25 principal scientists and seeks to provide data to identify and help understand the impacts of multiple ecosystem factors on the recovery of injured resources. It builds upon the past 23 years of restoration research and monitoring by the EVOSTC and federal and state agencies. Monitoring efforts will span a range of species and marine conditions, organized in three components, with integrated data management and synthesis of science information provided across the components. The program includes sites in Prince William Sound, lower Cook Inlet, and the outer Kenai Peninsula coast. The program is expected to run for 20 years in total, but is planned and funded in five-year increments. To facilitate a thorough understanding of the effects of the oil spill, NCEAS has focused on collating and documenting 25 years of historical data in preparation for synthesis and has made these data available for use by a wide array of technical and non-technical users. NCEAS will also convene two cross-cutting synthesis working groups to do a full-systems analysis of the effects of the 1989 oil spill on Prince William Sound and the state of recovery of the affected ecosystems.
The Semantic Tools for Data Management (Semtools) project is tackling critical issues in the management and use of heterogeneous scientific data. Existing approaches to managing data and associated metadata fail to adequately capture the semantics of the scientific process, thereby impeding the utility of those data for important scientific issues. Semtools will provide new capabilities for data access, discovery, integration, and visualization by developing software tools that utilize semantically annotated data and metadata. This project will create semantic enhancements to the EML structured metadata format, the Morpho metadata editing software, and the Metacat metadata and data management system. Semtools is a collaboration between NCEAS, UC Davis, and the Santa Barbara Coastal LTER. Funded in 2008 by the National Science Foundation - DBI-0743429.
Advances in environmental science increasingly depend on information from multiple disciplines to tackle broader and more complex questions about the natural world. Such advances, however, are hindered by data heterogeneity, which impedes the ability of researchers to discover, interpret, and integrate relevant data that have been collected by others. The Scientific Observations Network (SONet) will initiate a multi-disciplinary, community-driven effort to define and develop the necessary specifications and technologies to facilitate semantic interpretation and integration of observational data. The technological approaches will derive from recent advances in knowledge representation that have demonstrated great utility in enhancing scientific communication and data interoperability within the genomics community. A community of experts, consisting of environmental science researchers, computer scientists, and information managers, will come together to develop open-source, standards-based approaches to the semantic modeling of observational data. Funded in 2008 by the National Science Foundation - OCI-0753144.
Completed Informatics Projects
Management and Analysis of Environmental Observatory Data Using the Kepler Scientific Workflow System
National initiatives such as the National Ecological Observatory Network (NEON) and the Ocean Observatories Initiative (OOI) have highlighted the need for improvements in cyberinfrastructure supporting environmental observatories. Although previous initiatives have focused on data acquisition and archiving, scientists also need cyberinfrastructure that supports integration of data acquired from different instruments, and modeling and analysis of archived and real-time data sources. Thus, this project produced extensions to the Kepler scientific workflow system that provide access to observatory data through systems such as OPeNDAP and sensor networks and that expose these data in workflows for analysis and modeling.
Digital Resource Discovery and Dynamic Learning Communities for a Changing Biology
The Ecological Society of America (ESA) requested that NCEAS participate as a sub-awardee on its NSF NSDL project entitled “Digital Resource Discovery and Dynamic Learning Communities for a Changing Biology.” The overall goal of the project was to enhance discovery and use of digital library resources from the EcoEd Digital Library and other digital libraries under the BEN (BioSciEdNet) umbrella. ESA collaborated with the Cornell Lab of Ornithology Science Pipes project (NSF DUE-0734857) to achieve this goal in an undergraduate education context. Science Pipes now provides access to biodiversity data for students and teachers to create and share analyses and visualizations. NCEAS collaborated with the Cornell Lab of Ornithology, specifically Paul Allen, and with ESA to extend Science Pipes to provide access to exemplar ecology datasets, data templates, and models that illustrate core ecological concepts. In addition, components were added to Science Pipes to allow students to use these datasets and models in analyses and visualizations.
Science Environment for Ecological Knowledge (SEEK)
SEEK was a multi-institutional collaboration of ecologists, systematists, and computer scientists researching scientific workflow modeling with advanced semantics. The goals of SEEK were to make fundamental improvements in how researchers can 1) gain broad access to ecological data and information, 2) rapidly locate and utilize distributed computational services, and 3) employ powerful new methods for capturing, reproducing, and extending the analysis process itself. The SEEK approaches to data were compatible with the KNB technologies, but significantly extended these to incorporate data resources from the natural history museum and biodiversity science communities, as well as the geosciences and remote-sensing communities. Products include the scientific workflow application, Kepler, and the EcoGrid, a network of networks of ecologically relevant data and analytical components. This project incorporated cutting-edge advances in semantic mediation and knowledge representation. SEEK was a collaborative project of LTER, NCEAS, SDSC, the University of Kansas (Biodiversity Research Center), and the University of California, Davis. Funded in 2002 by the National Science Foundation - Information Technology Research program DBI-0225676.
Virtual Data Center (VDC)
The scientific community needs reliable infrastructure that enables open, stable, persistent, robust, and secure access to well-described and logically organized biodiversity, ecological and environmental data. What is needed is a virtual distributed network of data centers that seamlessly supports discovery and user-friendly access to a broad array of data, metadata, and other digital products that are archived securely and permanently in multiple locations. We designed a Virtual Data Center (VDC) for biodiversity, ecological and environmental data—all founded on open standards and protocols for interoperability among existing and new data centers. The VDC project was a collaboration among the Long Term Ecological Research Network Office, NCEAS, the National Evolutionary Synthesis Center, the National Biological Information Infrastructure, Oak Ridge National Laboratory, and the University of Kansas Biodiversity Research Center. Funded in 2008 by the National Science Foundation - OCI-0753138.
The Faculty Institutes for Reforming Science Teaching (FIRST) project developed new metadata standards for assessment in ecological education to facilitate the exchange of educational assessment data. Participants developed means for semantically describing assessment instruments to allow comparison of different assessment techniques. NCEAS participated as a subcontractor on this project, which was led by researchers at Michigan State University. Funded in 2006 by the National Science Foundation.
UC Natural Reserve System Data Registry (NRS)
The University of California Natural Reserve System contributes to the understanding and management of the Earth and its natural systems by supporting university-level teaching, research, and public service at protected natural areas throughout California. NCEAS collaborated with the NRS in building an information management system that facilitates research and education in the UC NRS. One of the major projects was the UC NRS Data Registry, which is based on the KNB technologies. Funded in 1999 by the University of California.
Resource Discovery Initiative for Field Stations (RDIFS)
The RDIFS Research Coordination Network (RCN) focused principally on enhancing the ecological informatics infrastructure for field biology and developed mechanisms for discovery of data and information resources that help facilitate research and education at North American biological field stations. As part of the LTER-led RDIFS effort, NCEAS adapted tools from the KNB to create the Organization of Biological Field Stations Data Registry. LTER Network Office - funded in 2001 by the National Science Foundation - Research Coordination Networks Program.
Kruger National Park, South Africa
NCEAS collaborated with information managers and scientists at Kruger National Park to develop effective informatics solutions for data collected within the park, as well as for research purposes, decision-making, and public edification. The collaboration focused especially on the confederation of data, and the development and deployment of Kepler workflow solutions for conservation management analyses. The project was based on KNB and SEEK technologies. This approach was adopted at each of the twenty-two South African national parks (SANParks). SANParks Data Repository. Funded by the Andrew W. Mellon Foundation.
Webs on the Web (WOW)
This project developed the information technology needed to increase the quality, sophistication, and pedagogical accessibility of analyses and visualizations of ecological network data. The complexity of ecological network data is immense and therefore represents a challenging opportunity for software development targeting the ecological sciences. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program.
Knowledge Network for Biocomplexity (KNB)
KNB is a network for data sharing that facilitates ecological and environmental research. KNB is a collaborative effort including ecologists and technologists. Partners include NCEAS, the Long Term Ecological Research Network (LTER), the San Diego Supercomputer Center (SDSC), and Texas Tech University (TTU). The goal of KNB is to enable efficient discovery, access, interpretation, integration, and analysis of complex ecological data from a highly distributed set of field stations, laboratories, research sites, and individual researchers. KNB software products include applications to describe, store, and query ecological data from a common framework. KNB produced a structured metadata format for ecological data (EML), software to generate this format (Morpho), and a robust metadata and data management system (Metacat) that enables researchers to participate in a distributed global network of data repositories. Funded in 1999 by the National Science Foundation - Knowledge & Distributed Intelligence Program DEB-0072909.
LTER Data Catalog was a collaboration with LTER Data Managers and the LTER Network Office to develop metadata standards and promote search capabilities for data and metadata. It is based largely on KNB technologies.