Completed Data Science Research Projects

Digital Resource Discovery and Dynamic Learning Communities for a Changing Biology

The Ecological Society of America (ESA) requested that NCEAS participate as a sub-awardee on its NSF NSDL program entitled “Digital Resource Discovery and Dynamic Learning Communities for a Changing Biology.” The overall goal of the project was to enhance discovery and use of digital library resources from the EcoEd Digital Library and other digital libraries under the BEN (BioSciEdNet) umbrella. ESA collaborated with the Cornell Lab of Ornithology Science Pipes project (NSF DUE-0734857) to achieve this goal in an undergraduate education context. Science Pipes now provides access to biodiversity data for students and teachers to create and share analyses and visualizations. NCEAS collaborated with the Cornell Lab of Ornithology, specifically Paul Allen, and with ESA to extend Science Pipes to provide access to exemplar ecology datasets, data templates, and models that illustrate core ecological concepts. In addition, components were added to Science Pipes to allow students to use these datasets and models in analyses and visualizations.

FIRST Project

The Faculty Institutes for Reforming Science Teaching (FIRST) project developed new metadata standards for assessment in ecological education to facilitate the exchange of educational assessment data. Participants developed means for semantically describing assessment instruments to allow comparison of different assessment techniques. NCEAS participated as a subcontractor on this project led by researchers at Michigan State University. Funded in 2006 by the National Science Foundation.

Gulf Watch Alaska

The Exxon Valdez Oil Spill Trustee Council (EVOSTC) and state and federal agencies are supporting a five-year, $12 million long-term monitoring program in the Gulf of Alaska region affected by the 1989 Exxon Valdez oil spill. The monitoring program, called Gulf Watch Alaska, includes 25 principal scientists and seeks to provide data to identify and help understand the impacts of multiple ecosystem factors on the recovery of injured resources. It builds upon the past 23 years of restoration research and monitoring by the EVOSTC and federal and state agencies. Monitoring efforts will span a range of species and marine conditions, organized in three components, with integrated data management and synthesis of science information provided across the components. The program includes sites in Prince William Sound, lower Cook Inlet, and the outer Kenai Peninsula coast. The program is expected to run for 20 years in total, but is planned and funded in five-year increments.

NCEAS helped facilitate a thorough understanding of the oil spill's effects by collating and documenting 25 years of historical data in preparation for synthesis and made these data available for use by a wide array of technical and non-technical users. NCEAS also convened two synthesis working groups to do a full-systems analysis of the effects of the 1989 oil spill on Prince William Sound and the state of recovery of the affected ecosystems.

Institute for Sustainable Earth and Environmental Software (ISEES)

NCEAS and its partner collaborators developed a strategic plan for a nationally supported Institute that would coordinate the development and sustainable support of innovative and interoperable scientific software tools that can transform science at the intersection of earth, environmental, and life sciences. The ISEES steering committee facilitated a series of design workshops with the broader research community to create a vision and criteria for an Institute. Through ISEES, the collaborators sought to advance scientific software and enable researchers to address the entire software lifecycle collaboratively, from product conceptualization through requirements analysis, design, development, testing, deployment, and long-term support to decommissioning.

Knowledge Network for Biocomplexity (KNB)

KNB is a network for data sharing that facilitates ecological and environmental research. KNB is a collaborative effort including ecologists and technologists. Partners include NCEAS, Long Term Ecological Research Network (LTER), San Diego Supercomputer Center (SDSC), and Texas Tech University (TTU). The goal of KNB is to enable efficient discovery, access, interpretation, integration, and analysis of complex ecological data from a highly distributed set of field stations, laboratories, research sites, and individual researchers. KNB software products include applications to describe, store, and query ecological data from a common framework. KNB produced a structured metadata format for ecological data (EML), software to generate this format (Morpho), and a robust metadata and data management system (Metacat) that enables researchers to participate in a distributed global network of data repositories. Funded in 1999 by the National Science Foundation - Knowledge & Distributed Intelligence Program DEB-0072909.

LTER Data Catalog

This was a collaboration with LTER Data Managers and the LTER Network Office to develop metadata standards and promote search capabilities for data and metadata. It is based largely on KNB technologies.

Management and Analysis of Environmental Observatory Data Using the Kepler Scientific Workflow System

National initiatives such as the National Ecological Observatory Network (NEON) and the Ocean Observatories Initiative (OOI) have highlighted the need for improvements in cyberinfrastructure supporting environmental observatories. Although previous initiatives have focused on data acquisition and archiving, scientists also need cyberinfrastructure that supports integration of data acquired from different instruments, and modeling and analysis of archived and real-time data sources. Thus, this project produced extensions to the Kepler scientific workflow system that provide access to observatory data through systems such as OPeNDAP and sensor networks and that expose these data in workflows for analysis and modeling.

NCEAS Data Repository

This repository provides standards-based documentation of data and metadata from synthesis projects arising at NCEAS. It is based on KNB technologies.

Resource Discovery Initiative for Field Stations (RDIFS)

The RDIFS Research Coordination Network (RCN) focused principally on enhancing the ecological informatics infrastructure for field biology and developed mechanisms for discovery of data and information resources that help facilitate research and education at North American biological field stations. As part of the LTER-led RDIFS effort, NCEAS adapted tools from the KNB to create the Organization of Biological Field Stations Data Registry. LTER Network Office - funded in 2001 by the National Science Foundation - Research Collaboration Networks Program.

SANParks Data Repository

NCEAS collaborated with information managers and scientists at Kruger National Park to develop effective informatics solutions for data collected within the park, as well as for research purposes, decision-making, and public edification. The collaboration focused especially on the confederation of data, and the development and deployment of Kepler workflow solutions for conservation management analyses. The project is based on KNB and SEEK technologies. This approach was adopted at each of the twenty-two South African national parks (SANParks). Funded by the Andrew W. Mellon Foundation.

Science Environment for Ecological Knowledge (SEEK)

SEEK was a multi-institutional collaboration of ecologists, systematists, and computer scientists researching scientific workflow modeling with advanced semantics. The goals of SEEK were to make fundamental improvements in how researchers can 1) gain broad access to ecological data and information, 2) rapidly locate and utilize distributed computational services, and 3) employ powerful new methods for capturing, reproducing, and extending the analysis process itself. The SEEK approaches to data were compatible with the KNB technologies, but significantly extended these to incorporate data resources from the natural history museum and biodiversity science communities, as well as the geosciences and remote-sensing communities. Products include the scientific workflow application, Kepler, and the EcoGrid, a network of networks of ecologically relevant data and analytical components. This project incorporated cutting-edge advances in semantic mediation and knowledge representation. SEEK was a collaborative project of LTER, NCEAS, SDSC, the University of Kansas (Biodiversity Research Center), and the University of California, Davis. Funded in 2002 by the National Science Foundation - Information Technology Research program DBI-0225676.

Scientific Observations Network (SONet)

Advances in environmental science increasingly depend on information from multiple disciplines to tackle broader and more complex questions about the natural world. Such advances, however, are hindered by data heterogeneity, which impedes the ability of researchers to discover, interpret, and integrate relevant data that have been collected by others. The Scientific Observations Network (SONet) initiated a multi-disciplinary, community-driven effort to define and develop the necessary specifications and technologies to facilitate semantic interpretation and integration of observational data. The technological approaches were derived from recent advances in knowledge representation that have demonstrated great utility in enhancing scientific communication and data interoperability within the genomics community. A community of experts, consisting of environmental science researchers, computer scientists, and information managers, came together to develop open-source, standards-based approaches to the semantic modeling of observational data. Funded in 2008 by the National Science Foundation - OCI-0753144.

Semantic Tools for Ecological Data Management (Semtools)

The Semantic Tools for Ecological Data Management (Semtools) project tackled critical issues in the management and use of heterogeneous scientific data. Approaches to managing data and associated metadata often fail to adequately capture the semantics of the scientific process, thereby impeding the utility of those data for important scientific issues. Semtools provided new capabilities for data access, discovery, integration, and visualization by developing software tools that utilize semantically annotated data and metadata. This project created semantic enhancements to the EML structured metadata format, the Morpho metadata editing software, and the Metacat metadata and data management system. Semtools was a collaboration between NCEAS, UC Davis, and the Santa Barbara Coastal LTER. Funded in 2008 by the National Science Foundation - DBI-0743429.

Virtual Data Center (VDC)

The scientific community needs reliable infrastructure that enables open, stable, persistent, robust, and secure access to well-described and logically organized biodiversity, ecological and environmental data. What is needed is a virtual distributed network of data centers that seamlessly supports discovery and user-friendly access to a broad array of data, metadata, and other digital products that are archived securely and permanently in multiple locations. This project addressed these needs through the creation of the Virtual Data Center (VDC) for biodiversity, ecological and environmental data—all founded on open standards and protocols for interoperability among existing and new data centers. The VDC project was a collaboration among the Long Term Ecological Research Network Office, NCEAS, the National Evolutionary Synthesis Center, the National Biological Information Infrastructure, Oak Ridge National Laboratory, and the University of Kansas Biodiversity Research Center. Funded in 2008 by the National Science Foundation - OCI-0753138.

UC Natural Reserve System Data Registry

The University of California Natural Reserve System contributes to the understanding and management of the Earth and its natural systems by supporting university-level teaching, research, and public service at protected natural areas throughout California. NCEAS collaborated with the NRS in building an information management system that facilitates research and education in the UC NRS. One of the major projects was the UC NRS Data Registry, which is based on the KNB technologies. Funded in 1999 by the University of California.

Webs on the Web (WOW)

This project developed the information technology needed to increase the quality, sophistication, and pedagogical accessibility of analyses and visualizations of ecological network data. The complexity of ecological network data is immense and therefore represents a challenging opportunity for software development targeting the ecological sciences. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program.