Creating powerful new scientific computing tools and methods for openly sharing data and code and enabling robust, reproducible science.
Engaging with resident scientists and visiting scientific research teams to help solve scientific analysis and computing challenges.
Providing visiting scientists with the computing and collaboration tools, network infrastructure, and onsite support required to access, analyze, visualize, and synthesize large scientific datasets.
Helping scientists acquire advanced skills in data analysis and collaboration to effectively engage in synthesis science.
NCEAS’ Informatics Program has been a leader in creating new informatics research collaborations, bringing together the best science and technology minds to create new tools to enhance our ability to discover, access, integrate, and appropriately apply the growing body of ecological and other data. NCEAS team members have published widely on data management and informatics. Ultimately, these advances in informatics will have a large impact on our knowledge and understanding of ecosystems and our ability to apply that understanding to the world’s most pressing conservation and resource management issues.
Data sharing and access to technology are crucial elements in arriving at new frontiers in ecology and new solutions to environmental problems. The NCEAS Data Policy requires scientists to document and publish their datasets and code for robust, reproducible science. Many of NCEAS’ Informatics projects and tools are focused on facilitating data sharing, and all of the tools developed at NCEAS are freely available and open source. In addition, we maintain the public KNB Data Repository, which houses thousands of freely accessible datasets, generated at NCEAS and elsewhere.
Research and Development
Aiming to improve scientists' understanding of scientific data, scripting, and workflows.
Describing data and how it was measured
The NCEAS Informatics program is facilitating data discovery and understanding by supporting the standardization of metadata and methods across software products. The goal is to create software tools that scientists can use to annotate data with semantic terms from ontologies. With these annotations, scientists will be better able to find the data they need for large-scale data analysis and integration, because they can determine precisely what each piece of data represents.
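As a rough illustration of why semantic annotation helps discovery, the sketch below maps each dataset's own column names to ontology term URIs and then searches by term instead of by column name. The URIs, dataset names, and the `find_datasets` helper are all hypothetical; real annotations would point at terms in a published ontology.

```python
# Hypothetical ontology term URIs; real annotations would reference terms
# from a published ontology rather than example.org.
HEIGHT = "http://example.org/ontology#PlantHeight"
BIOMASS = "http://example.org/ontology#AbovegroundBiomass"

# Each (hypothetical) dataset maps its own column names to a term, or None
# when a column has not been annotated.
datasets = {
    "site_a_survey": {"ht_cm": HEIGHT, "site": None},
    "site_b_census": {"height": HEIGHT, "agb": BIOMASS},
    "site_c_soil": {"ph": None},
}

def find_datasets(term):
    """Return {dataset: [columns]} for every column annotated with `term`."""
    hits = {}
    for name, columns in datasets.items():
        matched = [col for col, uri in columns.items() if uri == term]
        if matched:
            hits[name] = matched
    return hits

print(find_datasets(HEIGHT))
```

The columns `ht_cm` and `height` are found by the same query, which a keyword search on column names would likely miss.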
Tracking how data was created, and by whom or what process
Provenance is information about the activities involved in producing scientific data and other products. The NCEAS Informatics team is building cyberinfrastructure to collect and produce provenance for scientific data products, both to facilitate reproducible research and to meet scientific journals' growing requirements for open data sharing. The team is building a data model and software tools that can describe in detail the software, people, and input data that were used to create derived data at some point in the past ("retrospective provenance"), as well as describe a scientific workflow that has yet to be executed ("prospective provenance").
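The distinction between the two kinds of provenance can be sketched in a few lines. This is not the NCEAS data model; it is a minimal illustration, assuming a plan is a list of `Step` records (prospective) and a run produces `Execution` records (retrospective). The field names `used` and `generated` loosely echo W3C PROV vocabulary; the file names and agent are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """Prospective provenance: a planned step and the data it expects."""
    name: str
    inputs: list[str]
    outputs: list[str]

@dataclass
class Execution:
    """Retrospective provenance: what ran, who ran it, what it used and made."""
    step: str
    agent: str
    used: list[str]
    generated: list[str]

def run(plan, agent, available):
    """Walk the plan in order, recording an Execution for each step that can run."""
    available = set(available)
    trace = []
    for step in plan:
        if all(name in available for name in step.inputs):
            available.update(step.outputs)
            trace.append(Execution(step.name, agent, list(step.inputs), list(step.outputs)))
    return trace

def lineage(trace, product):
    """Walk the trace backwards to find every input a product depends on."""
    sources, frontier = set(), [product]
    while frontier:
        item = frontier.pop()
        for ex in trace:
            if item in ex.generated:
                for used in ex.used:
                    if used not in sources:
                        sources.add(used)
                        frontier.append(used)
    return sources

plan = [
    Step("clean", ["raw.csv"], ["clean.csv"]),
    Step("fit", ["clean.csv"], ["model.txt"]),
    Step("plot", ["model.txt"], ["figure.png"]),
]
trace = run(plan, "analyst@example.org", {"raw.csv"})
print(sorted(lineage(trace, "figure.png")))
```

The same `plan` exists before anything runs; `trace` only exists afterward and is what lets a reader answer "where did this figure come from?"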
Software and Tools
An R package that provides read/write access to data and metadata from the DataONE network of data repositories, including the KNB Data Repository, Dryad, LTER, and others. Member Nodes in DataONE are independent data repositories that have adopted DataONE services for interoperability, making each of the repositories accessible to client tools such as the DataONE R Client using a standard interface. The DataONE R Client can be used to access data files and to write new data and metadata files to repositories in the DataONE network.
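The DataONE R Client is an R package, and the sketch below does not reproduce its functions. Instead, it shows in Python the style of REST request that clients like it make against a Member Node. It assumes the DataONE v2 Member Node paths `/object/{pid}` (object bytes) and `/meta/{pid}` (system metadata) and the KNB base URL shown; confirm both against current DataONE documentation. The identifier is hypothetical.

```python
from urllib.parse import quote

# Assumed KNB Member Node base URL; verify against the DataONE node registry.
KNB_BASE = "https://knb.ecoinformatics.org/knb/d1/mn"

def _endpoint(base_url, resource, pid, api_version):
    # Identifiers such as DOIs contain ':' and '/', so percent-encode everything.
    return f"{base_url.rstrip('/')}/{api_version}/{resource}/{quote(pid, safe='')}"

def object_url(base_url, pid, api_version="v2"):
    """URL that would return the bytes of a data or metadata object."""
    return _endpoint(base_url, "object", pid, api_version)

def system_metadata_url(base_url, pid, api_version="v2"):
    """URL that would return the object's system metadata (checksum, size, rights holder)."""
    return _endpoint(base_url, "meta", pid, api_version)

pid = "doi:10.0000/EXAMPLE"  # hypothetical identifier
print(object_url(KNB_BASE, pid))
```

Fetching the object would then be an ordinary HTTP GET (for example with `urllib.request.urlopen`); private objects typically also require an authentication token, which this sketch ignores.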
A reliable, open-source scientific workflow system that enables scientists to design workflows and execute them efficiently. By using a workflow system, researchers can mix together analysis and modeling steps that use a wide variety of computing engines such as R, Matlab, and Python. Kepler facilitates access to a broad range of ecologically relevant data that are housed in the KNB (Knowledge Network for Biocomplexity), while also providing a basis for sharing analyses through a growing library of executable components and workflows. Kepler also provides a demonstration of fully integrated access to sensor networks, with the ability to include sensor data in workflows; monitor, inspect, and control sensor networks; and simulate the design of new sensor networks.
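The core idea behind a workflow system, running heterogeneous steps in dependency order, can be sketched with the standard library's `graphlib`. This is not Kepler's API: Kepler composes workflows from actors coordinated by a director. Here the step names are hypothetical and the "engine" field is only a label; every step actually runs as a small Python function.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step workflow. "engine" only records which tool would run
# the step in a real system; each step here is a stand-in Python function.
def load(data):
    return {**data, "rows": [3, 1, 2]}

def sort_rows(data):
    return {**data, "rows": sorted(data["rows"])}

def total(data):
    return {**data, "total": sum(data["rows"])}

workflow = {
    "load": {"engine": "Python", "after": [], "fn": load},
    "sort": {"engine": "R", "after": ["load"], "fn": sort_rows},
    "total": {"engine": "Matlab", "after": ["sort"], "fn": total},
}

def execute(workflow):
    """Run steps in dependency order, threading results through a dict."""
    # TopologicalSorter expects {node: predecessors}.
    graph = {name: spec["after"] for name, spec in workflow.items()}
    data, log = {}, []
    for name in TopologicalSorter(graph).static_order():
        data = workflow[name]["fn"](data)
        log.append((name, workflow[name]["engine"]))
    return data, log

result, log = execute(workflow)
print(result)  # {'rows': [1, 2, 3], 'total': 6}
print(log)
```

Declaring dependencies rather than a fixed order is what lets a workflow engine reorder, parallelize, or re-run only the affected steps.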
Enables users to create and manage EML metadata and to share those metadata, and their associated data, with others. It provides an easy-to-use, cross-platform application for accessing and manipulating metadata and data both locally and on the network through powerful connections with Metacat.
A flexible data and metadata repository that can store and version both data and its associated metadata using a wide variety of standards. Metacat has specialized features suitable for guaranteeing local autonomy and access control, while also affording the possibility of broad-scale replication and information sharing as a Member Node in the DataONE network. Metacat servers are used as the basis of the KNB and DataONE networks, as well as many other repositories around the world (PISCO, Gulf Watch Alaska, Taiwan Ecological Research Network, and others).
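One way to version data without overwriting it is to give every revision its own identifier and link it to the revision it replaces. The toy store below is a minimal sketch of that pattern, loosely modeled on the `obsoletes`/`obsoletedBy` links in DataONE system metadata; it is not Metacat's implementation, and the identifiers are hypothetical.

```python
class VersionedStore:
    """Toy store: each revision gets a new identifier and points back to the
    version it replaces, so older versions stay retrievable."""

    def __init__(self):
        self.objects = {}       # pid -> content
        self.obsoletes = {}     # newer pid -> older pid
        self.obsoleted_by = {}  # older pid -> newer pid

    def create(self, pid, content):
        if pid in self.objects:
            raise ValueError(f"identifier already used: {pid}")
        self.objects[pid] = content

    def update(self, old_pid, new_pid, content):
        if old_pid not in self.objects:
            raise KeyError(old_pid)
        if old_pid in self.obsoleted_by:
            raise ValueError(f"{old_pid} already has a newer version")
        self.create(new_pid, content)
        self.obsoletes[new_pid] = old_pid
        self.obsoleted_by[old_pid] = new_pid

    def latest(self, pid):
        """Follow the forward links to the newest revision."""
        while pid in self.obsoleted_by:
            pid = self.obsoleted_by[pid]
        return pid

    def history(self, pid):
        """Newest-first list of every revision in the chain containing pid."""
        chain = [self.latest(pid)]
        while chain[-1] in self.obsoletes:
            chain.append(self.obsoletes[chain[-1]])
        return chain

store = VersionedStore()
store.create("data.1", "site,count\nA,3\n")
store.update("data.1", "data.2", "site,count\nA,3\nB,5\n")
print(store.latest("data.1"), store.history("data.2"))
```

Because identifiers are never reused, a citation of `data.1` keeps pointing at exactly the bytes that were cited, while `latest` still leads readers to the current version.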
A metadata specification for describing tabular (relational) data sets that are common in ecology and earth and environmental science. EML can be used in a modular and extensible manner to document ecological data, including a description of the purpose and contents of a data set, methods used to collect it, people responsible for the data, and details of how to interpret data tables properly.
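To give a feel for how EML is organized, the sketch below builds a skeletal document with Python's `xml.etree.ElementTree`: a dataset with a title, creator, contact, and one data table whose columns are described in an attribute list. It assumes the EML 2.2.0 namespace URI shown. It deliberately omits elements the schema requires (for example `physical` and each attribute's `measurementScale`), so the output will not validate; the package ID, `system` value, and content are hypothetical.

```python
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"  # assumed EML 2.2.0 namespace

def minimal_eml(package_id, title, surname, table, columns):
    """Build a skeletal (not schema-valid) EML document as a string."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "example"})
    # Child elements are unqualified, as in EML instance documents.
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    for role in ("creator", "contact"):
        person = ET.SubElement(ET.SubElement(dataset, role), "individualName")
        ET.SubElement(person, "surName").text = surname
    data_table = ET.SubElement(dataset, "dataTable")
    ET.SubElement(data_table, "entityName").text = table
    attribute_list = ET.SubElement(data_table, "attributeList")
    for name, definition in columns.items():
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = name
        ET.SubElement(attribute, "attributeDefinition").text = definition
    return ET.tostring(root, encoding="unicode")

xml_text = minimal_eml(
    "example.1.1", "Hypothetical plant survey", "Doe", "survey.csv",
    {"site": "Site code", "count": "Number of plants counted"},
)
print(xml_text)
```

In practice, tools such as those described on this page generate complete EML and validate it against the schema rather than assembling it by hand.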
A semantic model designed to accurately describe observational data in sufficient detail to enable logic-based machine reasoning to help scientists with common research tasks such as finding and merging data sets.
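A semantic model lets software decide whether two datasets can be merged by checking what was measured rather than what the columns are called. The sketch below is a plain-Python stand-in for that kind of logic-based reasoning, not the model itself: each measurement records an entity, a characteristic, and a standard (unit), loosely following the vocabulary common to observational-data models, and a single hand-written rule decides compatibility. The conversion table and measurements are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical conversion factors from each length standard to metres.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}

@dataclass(frozen=True)
class Measurement:
    entity: str          # the kind of thing observed, e.g. "Tree"
    characteristic: str  # the property measured, e.g. "Height"
    standard: str        # the unit the value is expressed in, e.g. "cm"
    value: float

def compatible(a, b):
    """Rule: same entity, same characteristic, and both standards convertible."""
    return (a.entity == b.entity
            and a.characteristic == b.characteristic
            and a.standard in TO_METRES
            and b.standard in TO_METRES)

def merge(*datasets):
    """Combine datasets into one list expressed in metres, or refuse."""
    rows = [m for ds in datasets for m in ds]
    if not rows:
        return []
    if not all(compatible(rows[0], m) for m in rows):
        raise ValueError("measurements are not semantically compatible")
    return [Measurement(m.entity, m.characteristic, "m", m.value * TO_METRES[m.standard])
            for m in rows]

survey_a = [Measurement("Tree", "Height", "cm", 250.0)]
survey_b = [Measurement("Tree", "Height", "m", 3.0)]
print([m.value for m in merge(survey_a, survey_b)])
```

A full semantic model expresses such rules in an ontology language so that a general-purpose reasoner, rather than custom code, draws the conclusions.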
DataONE (Data Observation Network for Earth) is building cyberinfrastructure for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Supported by the U.S. National Science Foundation, DataONE ensures the preservation of, and access to, multi-scale, multi-discipline, and multi-national science data. DataONE makes biological data available from the genome to the ecosystem; makes environmental data available from atmospheric, ecological, hydrological, and oceanographic sources; provides secure and long-term preservation and access; and engages scientists, land-managers, policy makers, students, educators, and the public. DataONE is a collaboration between NCEAS/UCSB, the University of New Mexico, the Oak Ridge National Laboratory, the California Digital Library, NESCent, and a number of other organizations. Funded in 2009 by the National Science Foundation (award OCI-0830944).
It is increasingly difficult for researchers in the geosciences to locate relevant data for integrative analysis, due to the rapidly growing volume, variety, and complexity of available data. Yet it is necessary to discover, access, and integrate data from multiple sources to generate robust, large-scale scientific insights. GeoLink will help to meet the challenges of geoscience research in an age of Big Data. GeoLink will advance the use of Linked Open Data and Semantic Web techniques to help federate disparate earth science data resources, focused initially on oceanographic information archived in several major national data repositories.
GeoLink’s methodology will be flexible and easily extendable to new repositories and topics, while respecting and preserving the heterogeneous landscape of existing providers. GeoLink will enable standardized discovery of information resources across NSF-supported repositories such as the Integrated Earth Data Applications (IEDA), the Long-Term Ecological Research Network (LTER) via DataONE, Biological and Chemical Oceanography Data (BCO-DMO), Rolling Decks to Repositories (R2R), and the International Ocean Discovery Program (IODP). To achieve this goal, GeoLink will develop formal, generalized semantic descriptions of content from the above repositories, grounded in typical scientific use cases and based on community-defined Ontology Design Patterns (ODPs). Use of ODPs should provide advantages over traditional data discovery and integration methodologies by exposing data in simple but consistent ways over the Web, using W3C-sanctioned languages (RDF/OWL). A Web portal will be developed to demonstrate integrated discovery functionality, while also serving to collect user feedback on performance and desired features.
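Linked Open Data works because independent repositories describe things with shared identifiers, so statements published in one place can be joined with statements published elsewhere. The sketch below imitates that with plain Python tuples standing in for RDF triples and a nested pattern match standing in for a SPARQL query; a real deployment would use an RDF store and SPARQL. All URIs, predicates, and records are hypothetical.

```python
# Two hypothetical repositories publish (subject, predicate, object) triples.
# The shared dataset URI is what lets their records link up.
repo_a = [
    ("ex:cruise/C001", "ex:usedVessel", "ex:vessel/V1"),
    ("ex:cruise/C001", "ex:hasDataset", "ex:dataset/D42"),
]
repo_b = [
    ("ex:dataset/D42", "ex:measures", "ex:param/Temperature"),
    ("ex:dataset/D42", "ex:title", "CTD casts, leg 1"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def datasets_from_vessel(triples, vessel):
    """Join: cruises on a vessel -> their datasets -> dataset titles."""
    results = []
    for cruise, _, _ in match(triples, p="ex:usedVessel", o=vessel):
        for _, _, dataset in match(triples, s=cruise, p="ex:hasDataset"):
            for _, _, title in match(triples, s=dataset, p="ex:title"):
                results.append((dataset, title))
    return results

print(datasets_from_vessel(repo_a, "ex:vessel/V1"))           # no titles in repo_a alone
print(datasets_from_vessel(repo_a + repo_b, "ex:vessel/V1"))  # joined across both
```

Neither repository alone can answer "which datasets came from this vessel, and what are they called?"; the union of their triples can, without either one changing its internal storage.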
Community Dynamics Toolbox: analysis of long-term ecological dynamics using the Kepler Workflow System
As ecologists continue to gather long-term data at site, regional, continental, and global scales, they will increasingly need tools to measure the pattern and rate of change in plant and animal communities in response to multiple environmental drivers. The National Science Foundation has funded the NCEAS Informatics team and collaborators from the University of New Mexico and the University of Wisconsin’s Center for Limnology to gather multiple metrics of ecological dynamics into one toolbox, which will provide ecologists with a new set of tools for quantifying how communities change over time. Our approach builds upon many recent informatics developments (EML, DataONE, LTER NIS, PASTA, Kepler) to advance ecological research. The toolbox will make community analysis more accessible, expose a variety of indices to wider use, and, combined with existing workflows, help reduce data preparation effort and foster unprecedented potential for collaboration.
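As one concrete example of a community-change metric, the sketch below computes species turnover from presence/absence data between consecutive censuses: species gained plus species lost, divided by the number of species present in either census. This is one widely used formulation, offered as an illustration; the toolbox's own indices and definitions may differ, and the species codes and years are hypothetical.

```python
def turnover(before, after, metric="total"):
    """Proportion of species that changed between two censuses.

    total:         (gained + lost) / species present in either census
    appearance:    gained / species present in either census
    disappearance: lost / species present in either census
    """
    before, after = set(before), set(after)
    union = before | after
    if not union:
        raise ValueError("both censuses are empty")
    gained = len(after - before)
    lost = len(before - after)
    numerators = {"total": gained + lost, "appearance": gained, "disappearance": lost}
    return numerators[metric] / len(union)

def turnover_series(censuses, metric="total"):
    """Turnover between each pair of consecutive years, keyed by the later year."""
    years = sorted(censuses)
    return {y2: turnover(censuses[y1], censuses[y2], metric)
            for y1, y2 in zip(years, years[1:])}

censuses = {
    2001: {"A", "B", "C"},
    2002: {"A", "B", "D"},
    2003: {"A", "B", "D"},
}
print(turnover_series(censuses))  # {2002: 0.5, 2003: 0.0}
```

Between 2001 and 2002, species D appears and C disappears out of four species seen in total, giving 2/4 = 0.5; the 2002 and 2003 communities are identical, so turnover is 0.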
The Exxon Valdez Oil Spill Trustee Council (EVOSTC) and state and federal agencies are supporting a five-year, $12 million long-term monitoring program in the Gulf of Alaska region affected by the 1989 Exxon Valdez oil spill. The monitoring program, called Gulf Watch Alaska, includes 25 principal scientists and seeks to provide data to identify and help understand the impacts of multiple ecosystem factors on the recovery of injured resources. It builds upon the past 23 years of restoration research and monitoring by the EVOSTC and federal and state agencies. Monitoring efforts will span a range of species and marine conditions, organized in three components, with integrated data management and synthesis of science information provided across the components. The program includes sites in Prince William Sound, lower Cook Inlet, and the outer Kenai Peninsula coast. The program is expected to run 20 years in total, planned and funded in five-year increments. To facilitate a thorough understanding of the effects of the oil spill, NCEAS has focused on collating and documenting 25 years of historical data in preparation for synthesis, and has made these data available for use by a wide array of technical and non-technical users. NCEAS will also convene two cross-cutting synthesis working groups to do a full-systems analysis of the effects of the 1989 oil spill on Prince William Sound and the state of recovery of the affected ecosystems.
The National Science Foundation has made a 5-year, $5.9 million award to a national partnership between NCEAS, DataONE, and NOAA's National Centers for Environmental Information (NCEI) to develop and curate the NSF Arctic Data Center, a new archive for Arctic scientific data as well as other related research documents. The NSF Arctic Data Center will provide the data storage, curation, and discovery features to support NSF's Arctic science community. It will also be able to archive research products such as software, workflows, and provenance information about the entire research process. The new NSF Arctic Data Center interface will allow users to search the extensive Arctic data collection using filters such as the name of the data creator, year, identifier, taxa, location, and keywords. The discovery interface will also provide a map-based overview of the spatial distribution of data sets and allow users to zoom and pan to specific locations of interest, which will be helpful in locating historical data in specific regions. Authors will be able to seamlessly upload and share their data from their desktop, contributing associated metadata and assigning a Digital Object Identifier so that their data are easily citable. The NSF Arctic Data Center team will also continue to support data-management planning and access to Arctic data publications, in addition to user-support services.
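The combination of attribute filters and a map view described above amounts to intersecting several simple predicates. The sketch below shows that logic over an in-memory list of hypothetical records; it is not the Arctic Data Center's search service or API, and a production catalog would run such queries against a search index instead.

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    creator: str
    year: int
    keywords: frozenset
    lat: float
    lon: float

def search(records, keyword=None, creator=None, years=None, bbox=None):
    """Return records passing every filter that is set.

    years is an inclusive (start, end) pair; bbox is (min_lon, min_lat,
    max_lon, max_lat). Boxes that cross the antimeridian are not handled.
    """
    hits = []
    for r in records:
        if keyword and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        if creator and creator.lower() not in r.creator.lower():
            continue
        if years and not years[0] <= r.year <= years[1]:
            continue
        if bbox:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat):
                continue
        hits.append(r)
    return hits

records = [
    Record("Sea ice thickness, site 1", "Smith", 1995, frozenset({"sea ice"}), 72.0, -140.0),
    Record("Permafrost temperature, site 2", "Jones", 2010, frozenset({"permafrost"}), 78.2, 15.6),
    Record("Sea ice extent, site 3", "Lee", 2012, frozenset({"sea ice"}), 75.0, 40.0),
]
western = (-180.0, 60.0, -100.0, 90.0)  # the "zoomed" map view
print([r.title for r in search(records, keyword="Sea Ice", bbox=western)])
```

Zooming or panning the map simply replaces the `bbox` argument, so the same filters can be re-run as the view changes.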
Director of Informatics, Research & Development
I'm driven to create open science solutions to data and software challenges that have historically impeded large-scale scientific synthesis. At NCEAS, I have focused on building a global data sharing infrastructure that led to the KNB Repository and the DataONE federation. Along the way I've built metadata standards, data management tools, and data analysis software. I have the pleasure of coordinating a great team focused on informatics research and development at NCEAS.
Director of Computing
I have been the Director of Computing at NCEAS since its inception in 1995. I oversee the creation and maintenance of NCEAS’ cyberinfrastructure and technology staff with a focus on the scientific computing needs of researchers at NCEAS. My technology research interests are primarily in the areas of informatics, the semantic web, scientific workflows, computer-supported collaborative research, and Open Science, all in the context of facilitating integrative environmental and conservation science.
I provide support in data processing, analysis, and modeling to working groups of the Science for Nature And People Partnership (SNAPP). I am interested in crunching data to find novel understanding of our environment, with a particular interest in geospatial and temporal analysis. My background is in ecohydrology and Earth observation techniques (remote sensing and GIS).
Projects Data Coordinator
As a Projects Data Coordinator, I provide support in data management and data processing to the working groups of the State of Alaska Salmon and People (SASAP). I am interested in using open data techniques to facilitate synthesis science among researchers who are asking questions that will inform environmental policy making. I was first introduced to data processing and data analysis through my academic background in physical oceanography, and am enjoying applying this foundation to more interdisciplinary ecology research.
As a data manager with a background in field ecology, I archive and curate data and metadata for the NSF Arctic Data Center and other repositories, ensuring their public availability and usability for decades to come. I educate the Arctic research community on how and why to practice open science.
As part of the scientific computing support staff, I provide technical support in the broadest sense for our residents and working groups: desktops, networking, email, printing, audio/video equipment, remote collaboration, and electrical pencil sharpeners, to name some key areas. As the first line of defense, I make sure everybody, from our short-term visitors to our long-term residents, has the tools they need to be productive in their work. If I do not have the answer to a specific request, chances are I know who on our team does or is equipped to address it.
I have worked on informatics projects for the last fifteen years, focusing on generic solutions to common data management needs in the earth and ecological sciences. Over the years I have built systems to document and archive data for regional and international consortia, stream data in real time from arrays of oceanographic sensors, and have been involved in standards development efforts. I am currently focusing on the software that underlies the DataONE federation. I try to handle computer systems in stride, despite their frequent tantrums, and do so from Colorado.
Science Software Engineer
I am interested in conducting, teaching, and developing software for open science. As a scientific software engineer at NCEAS, I am working to make open scientific data discoverable. My background is in computer science and biology and this combination offers me an applied perspective on the scientific software I develop. My primary biological interests are fisheries ecology and stock assessment, particularly statistical modeling and data analysis. My primary computing interests tend to support my biological interests and that means I do a lot of automation, testing, data mining, and data visualization. Making software for other people to use makes me happy.
I design and build software tools that aid scientists in managing and sharing data, data provenance, and research. As the volume of data, derived data products, and analysis results grows, the challenge of data sharing, collaboration, and reproducibility grows as well. I'm committed to helping build the cyberinfrastructure to meet these challenges, so investigators can spend more time answering important scientific questions and less time "sed", "grep" and "awking".
As a software engineer at NCEAS, I have worked on different environmental software projects, including Knowledge Network for Biocomplexity (KNB), Science Environment for Ecological Knowledge (SEEK), Kepler and Data Observation Network for Earth (DataONE). My focus is to develop robust, efficient and scalable software products from desktop applications to server repositories.
As a software designer and developer with an education in ecology and conservation, my mission is to create useful web applications that support and advance the research of environmental scientists.