Projects undertaken by NCEAS staff and collaborators address key challenges including:
- Storage and management of environmental data
- Discovery and preparation of data for further analysis and synthesis
- Advanced automated machine processing of information and models
- Making these capabilities available to practicing scientists
DataONE (Data Observation Network for Earth) is building cyberinfrastructure for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Supported by the U.S. National Science Foundation, DataONE will ensure preservation and access to multi-scale, multi-discipline, and multi-national science data. DataONE will transcend domain boundaries and make biological data available from the genome to the ecosystem; make environmental data available from atmospheric, ecological, hydrological, and oceanographic sources; provide secure and long-term preservation and access; and engage scientists, land managers, policy makers, students, educators, and the public. DataONE is a collaboration between NCEAS/UCSB, the University of New Mexico, the Oak Ridge National Laboratory, the California Digital Library, NESCent, and a number of other organizations. Funded in 2009 by the National Science Foundation - OCI-0830944.
Semantic Tools for Ecological Data Management (Semtools) 
The Semantic Tools for Ecological Data Management (Semtools) project is tackling critical issues in the management and use of heterogeneous scientific data. Existing approaches to managing data and associated metadata fail to adequately capture the semantics of the scientific process, thereby impeding the utility of those data for important scientific issues. Semtools will provide new capabilities for data access, discovery, integration, and visualization by developing software tools that utilize semantically annotated data and metadata. This project will create semantic enhancements to the EML structured metadata format, the Morpho metadata editing software, and the Metacat metadata and data management system. Semtools is a collaboration between NCEAS, UC Davis, and the Santa Barbara Coastal LTER. Funded in 2008 by the National Science Foundation - DBI-0743429.
Scientific Observations Network (SONet) 
Advances in environmental science increasingly depend on information from multiple disciplines to tackle broader and more complex questions about the natural world. Such advances, however, are hindered by data heterogeneity, which impedes the ability of researchers to discover, interpret, and integrate relevant data that have been collected by others. The Scientific Observations Network (SONet) will initiate a multi-disciplinary, community-driven effort to define and develop the necessary specifications and technologies to facilitate semantic interpretation and integration of observational data. The technological approaches will derive from recent advances in knowledge representation that have demonstrated great utility in enhancing scientific communication and data interoperability within the genomics community. This effort will convene a community of environmental science researchers, computer scientists, and information managers to develop open-source, standards-based approaches to the semantic modeling of observational data. Funded in 2008 by the National Science Foundation - OCI-0753144.
Virtual Data Center (VDC) 
The scientific community needs reliable infrastructure that enables open, stable, persistent, robust, and secure access to well-described and logically organized biodiversity, ecological and environmental data. What is needed is a virtual distributed network of data centers that seamlessly supports discovery and user-friendly access to a broad array of data, metadata, and other digital products that are archived securely and permanently in multiple locations. Under this proposal we will design a Virtual Data Center (VDC) for biodiversity, ecological and environmental data, all founded on open standards and protocols for interoperability among existing and new data centers. The VDC project is a collaboration among the Long Term Ecological Research Network Office, NCEAS, the National Evolutionary Synthesis Center, the National Biological Information Infrastructure, Oak Ridge National Laboratory, and the University of Kansas Biodiversity Research Center. Funded in 2008 by the National Science Foundation - OCI-0753138.
Kepler: The Kepler Project's overall goal is to produce an open-source scientific workflow system that allows scientists to design scientific workflows and execute them efficiently using emerging Grid-based approaches to distributed computation. Kepler work at NCEAS was originally funded as part of the Science Environment for Ecological Knowledge (SEEK). The transition from a research prototype to a reliable software system has been funded under the Kepler/CORE project, a multi-institutional collaboration with UC Davis, NCEAS, and UC San Diego. The Kepler project has grown to become a cross-project collaboration with contributing members from Kepler/CORE, SEEK, SDM Center, Ptolemy, GEON, and many others. Funded, in part, in 2002 by the National Science Foundation - Information Technology Research Program DBI-0225676 and in 2007 by the National Science Foundation - Office of Cyberinfrastructure OCI-0722079.
Knowledge Network for Biocomplexity (KNB) 
KNB is a network for data sharing that facilitates ecological and environmental research. KNB is a collaborative effort including ecologists and technologists. Partners include NCEAS, Long Term Ecological Research Network (LTER), San Diego Supercomputer Center (SDSC), and Texas Tech University (TTU). The goal of KNB is to enable efficient discovery, access, interpretation, integration, and analysis of complex ecological data from a highly distributed set of field stations, laboratories, research sites, and individual researchers. KNB software products include applications to describe, store, and query ecological data from a common framework. KNB produced a structured metadata format for ecological data (EML), software to generate this format (Morpho), and a robust metadata and data management system (Metacat) that enables researchers to participate in a distributed global network of Data Repositories. Funded in 1999 by the National Science Foundation - Knowledge & Distributed Intelligence Program DEB-0072909.
Production Implementation of the Knowledge Network for Biocomplexity 
In this project, our goal is to refine the software tools and technology frameworks developed as part of the KNB research effort, so that these are highly usable by research scientists on a practical basis. Dedicated software engineers and metadata coordinators will optimize and assist in the use of KNB technologies, with the specific aim of promoting the use of the KNB as a rich information source for the ecological community. Populating the KNB has two components: 1) identifying and locating appropriate data, and 2) facilitating their inclusion in the KNB system. Raising the awareness of these tools within the ecological community will both make the system more useful and imbue the community with the interest and skills needed to enhance the long-term value of data. This, in turn, should stimulate new ways of conducting ecological research. KNB technologies facilitate the development and support of data registries being used by a growing number of organizations. For example, there is currently a prototype operating on behalf of the Ecological Society of America to allow authors to register data associated with their journal articles. Funded in 2003 by the Andrew W. Mellon Foundation.
The Science Environment for Ecological Knowledge (SEEK) is a multi-institutional collaboration of ecologists, systematists, and computer scientists researching scientific workflow modeling with advanced semantics. The goals of SEEK are to make fundamental improvements in how researchers can 1) gain broad access to ecological data and information, 2) rapidly locate and utilize distributed computational services, and 3) employ powerful new methods for capturing, reproducing, and extending the analysis process itself. The SEEK approaches to data are compatible with the KNB technologies, but significantly extend these to incorporate data resources from the natural history museum and biodiversity science communities, as well as the geosciences and remote-sensing communities. Products include the scientific workflow application, Kepler, and the EcoGrid, a network of networks of ecologically relevant data and analytical components. This project incorporates cutting-edge advances in semantic mediation and knowledge representation. SEEK is a collaborative project of LTER, NCEAS, SDSC, the University of Kansas (Biodiversity Research Center) and the University of California, Davis. Funded in 2002 by the National Science Foundation - Information Technology Research program DBI-0225676.
Ecological Metadata Language (EML): EML is a metadata specification that can be used to comprehensively describe ecological data in terms of content, structure, and research context. EML is a formalization and extension of prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is defined and revised through an on-going community effort, particularly involving the participation of ecological research station information managers and other interested parties. A number of prominent research organizations, such as NCEAS, LTER, the UC Natural Reserve System, Kruger National Park, and OBFS, are expressing interest in or actively using EML as their interchange language and cataloguing standard for ecological metadata. EML was generated largely as a product of the Knowledge Network for Biocomplexity.
Real-time Environment for Analytical Processing (REAP): The REAP project's goal is to extend the Kepler scientific workflow system to fully integrate access to sensor networks. New capabilities will include the ability to include sensor data in workflows; to monitor, inspect, and control sensor networks; and to simulate the design of new sensor networks. REAP is a collaboration among NCEAS, San Diego Supercomputer Center, UC Davis, OPeNDAP, UC Los Angeles, and Oregon State University. Funded in 2006 by the National Science Foundation - Cyberinfrastructure for Environmental Observatories (CEOP) Program.
FIRST Project: The Faculty Institutes for Reforming Science Teaching (FIRST) project is developing new metadata standards for assessment in ecological education to facilitate the exchange of educational assessment data. Participants will be developing means for semantically describing assessment instruments to allow comparison of different assessment techniques. NCEAS is participating as a subcontractor on this project led by researchers at Michigan State University. Funded in 2006 by the National Science Foundation.
Jalama: Capturing Data in the Field: Jalama, developed jointly with scientists at the Marine Science Institute at UCSB, investigated how rich metadata can be used to develop flexible, easy-to-use forms for data entry in lab and field environments for ecology. Research focused on clarifying how to automate the creation of effective user interfaces for data collection. Software products from the project target both desktop and handheld computers. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics program DBI-0131178.
UC Natural Reserve System Data Registry (NRS): The University of California Natural Reserve System contributes to the understanding and management of the Earth and its natural systems by supporting university-level teaching, research and public service at protected natural areas throughout California. NCEAS collaborates with the NRS in building an information management system that facilitates research and education in the UC NRS. One of the major projects is the UC NRS Data Registry, which is based on the KNB technologies. Funded in 1999 by the University of California.
VegBank - US National Vegetation Classification: The VegBank online data repository is being developed to store vegetation data in support of the US National Vegetation Classification. The system comprises three components used to archive vegetation plot data, plant taxonomic data, and vegetation community data. Major programming efforts and technology infrastructure are located at NCEAS, in partnership with investigators at the University of North Carolina, and the Panel on Vegetation Classification, Ecological Society of America. Funded in 2000 and 2002 to the University of North Carolina by the National Science Foundation Biological Databases and Informatics Program DBI-0213794, DBI-9905838.
Resource Discovery Initiative for Field Stations (RDIFS): RDIFS Research Coordination Network (RCN) activities focus principally on enhancing the ecological informatics infrastructure for field biology and developing mechanisms for discovery of data and information resources that can facilitate research and education at North American biological field stations. These objectives are being accomplished through two integrated networking activities: (1) research that encompasses five inter-related resource discovery activities and (2) an intensive training component that provides field station personnel with a solid foundation in the computational and informatics skills that are critical for developing, archiving, managing, and communicating data and information resources. As part of the LTER-led RDIFS effort, NCEAS has adapted tools from the KNB to create the Organization of Biological Field Stations Data Registry. LTER Network Office - funded in 2001 by the National Science Foundation - Research Coordination Networks Program.
NCEAS Data Repository is a standards-based documentation of metadata and data from synthesis projects arising at NCEAS, based on KNB technologies.
LTER Data Catalog is a collaboration with LTER Data Managers and the LTER Network Office to develop metadata standards and promote search capabilities for data and metadata. It is based largely on KNB technologies.
Interaction Web Database provides web-based access to, and submission of, data concerning ecological interactions, particularly pollination/pollinator relationships.
Global Population Dynamics Database is an extensive collection of time series data from plant and animal populations, hosted by the Center for Population Biology at Silwood Park and co-developed with NCEAS and the Department of Ecology and Evolution at the University of Tennessee.
Kruger National Park, South Africa: NCEAS collaborates with information managers and scientists at Kruger National Park to develop effective informatics solutions for data collected within the park, as well as for research purposes, decision-making, and public edification. The collaboration has a special focus on the confederation of data, and the development and deployment of Kepler workflow solutions for conservation management analyses. The project is based on KNB and SEEK technologies. This approach is also being adopted at each of the 22 South African national parks (SANParks); see the SANParks Data Repository. Funded by the Andrew W. Mellon Foundation.
Paleobiology Database is a web-based resource of fossil information that includes 52,000 collection records and 511,889 taxonomic occurrences from 13,962 published references. The project is led by research scientist Dr. John Alroy, who is a former NCEAS Postdoctoral Associate.
Webs on the Web (WOW): This project will develop the information technology needed to increase the quality, sophistication, and pedagogical accessibility of analyses and visualizations of ecological network data. The complexity of ecological network data is immense and therefore represents a challenging opportunity for software development targeting the ecological sciences. Funded in 2002 by the National Science Foundation - Biological Databases & Informatics Program.
Metadata Editor: This project was an early effort designed to test the effectiveness of the then-emerging Extensible Markup Language (XML) for representing ecological metadata. A prototype metadata editor was developed that allowed us to develop plans for the current Morpho and Metacat systems developed under the KNB project. Funded by a 1997 National Science Foundation supplement to NCEAS.
Postdoctoral Training in the Management of Environmental Information: NCEAS is involved with collaborators in developing new techniques for managing environmental information. For this project we have recruited postdoctoral researchers to work in three fundamental areas of informatics: Knowledge Representation, Taxonomic Nomenclature and Classification, and Informatics Training. The informatics training position has been developed in conjunction with the LTER Network Office, where the position is located. Through this project NCEAS is playing a pivotal role in the training of young scientists in the management and analysis of environmental information. Funded in 2003 by the Andrew W. Mellon Foundation.