Environmental Data Science Research Projects

Our research in data science and informatics makes environmental data easier to locate, access, interpret, and analyze.

Often done in partnership with other scientific institutions, our data science research generates solutions for the following:

  • Storage and management of environmental data
  • Discovery and preparation of data for further analysis and synthesis
  • Automated machine processing of information and models
  • Making these capabilities available to practicing scientists

Current Projects


DataONE

The Data Observation Network for Earth (DataONE) is a community-driven project that provides open, secure access to environmental data held by data centers, networks, and organizations around the world, known as member nodes. The data it preserves span many disciplines, scales, and nations, and are freely available to scientists, decision-makers, educators, and the public.

DataONE is a collaboration among NCEAS, the University of New Mexico, the Oak Ridge National Laboratory, the California Digital Library, NESCent, and a number of other organizations.

Visit this project's website.

Arctic Data Center

The Arctic Data Center is an archive for scientific data and other research documents on the Arctic generated by projects funded by the National Science Foundation. It allows Arctic researchers to store and discover information about the entire research process, including software, workflows, and data provenance.

The Arctic Data Center also supports its users with data management tools, community support, and training. It is a collaboration among NCEAS, DataONE, and NOAA's National Centers for Environmental Information (NCEI), and is funded by the National Science Foundation.

Visit this project's website.

ABC Tracker

ABC Tracker is a tool that enables scientists to track and study animal behavior in video recordings. NCEAS is working with the project leads, who are based at the University of North Carolina, to archive the data generated by the tool and make those data accessible to other scientists.

Visit this project's website.


CodeMeta

CodeMeta is an effort to standardize the exchange of software metadata across repositories and organizations through a common vocabulary and schema that will connect code and data services such as GitHub, figshare, and Zenodo. This work supports shareable and reproducible data and methods.

Visit this project's website.
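To illustrate what a shared software-metadata vocabulary looks like in practice, here is a minimal Python sketch that writes a `codemeta.json`-style JSON-LD record. It assumes the CodeMeta 2.0 context URL and a handful of schema.org-derived properties (`name`, `version`, `codeRepository`, `license`, `author`); the package, author, and repository it describes are hypothetical, and a real record would usually carry more fields.

```python
import json

# Context URL for the CodeMeta 2.0 vocabulary (assumed here; check the
# CodeMeta documentation for the current version).
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def build_codemeta(name, version, authors, repo_url, license_url):
    """Return a dict describing a software package with CodeMeta terms."""
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": repo_url,
        "license": license_url,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# Hypothetical package metadata, for illustration only.
record = build_codemeta(
    name="example-analysis-tool",
    version="0.1.0",
    authors=[("Ada", "Example")],
    repo_url="https://github.com/example-org/example-analysis-tool",
    license_url="https://spdx.org/licenses/MIT",
)

# Serialize as JSON so any service that understands the shared schema can read it.
print(json.dumps(record, indent=2))
```

Because the record is plain JSON-LD, a repository that understands the same vocabulary can read it without a custom parser.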

Community Dynamics Toolbox

Ecologists gather long-term data at multiple scales and need tools that measure patterns and rates of change in plant and animal communities in response to the many factors that affect them. This toolbox makes analyses of ecological communities more accessible and usable, and is intended to minimize data preparation efforts and foster collaboration.

This project is gathering metrics of ecological dynamics into one toolbox that will allow ecologists to quantify how communities change over time. It is funded by the National Science Foundation and includes collaborators from the University of New Mexico and the University of Wisconsin-Madison’s Center for Limnology.

Visit this project's website.
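As an illustration of the kind of metric such a toolbox gathers, the Python sketch below computes species turnover between consecutive surveys, using one common definition: species gained plus species lost, divided by the number of species observed at either time. It is a sketch of the general metric, not the toolbox's own code or API; the function names and species lists are hypothetical.

```python
def turnover(before, after, metric="total"):
    """Proportion of species turnover between two sampling times.

    total:         (species gained + species lost) / species present at either time
    appearance:    species gained / species present at either time
    disappearance: species lost / species present at either time
    """
    before, after = set(before), set(after)
    pooled = before | after
    if not pooled:
        raise ValueError("no species recorded at either time point")
    gained = len(after - before)
    lost = len(before - after)
    counts = {"total": gained + lost, "appearance": gained, "disappearance": lost}
    return counts[metric] / len(pooled)


def turnover_series(surveys, metric="total"):
    """Turnover between each pair of consecutive years in {year: species}."""
    years = sorted(surveys)
    return {
        later: turnover(surveys[earlier], surveys[later], metric)
        for earlier, later in zip(years, years[1:])
    }


# Hypothetical species lists from a single plot.
surveys = {
    2001: {"A", "B", "C"},
    2002: {"B", "C", "D", "E"},
    2003: {"B", "C", "D", "E"},
}
print(turnover_series(surveys))  # {2002: 0.6, 2003: 0.0}
```

Splitting total turnover into appearances and disappearances shows whether change comes mostly from colonization or from loss.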

Data Provenance

Data provenance involves clarifying where data came from and how scientists have previously used them, which is critical for scientific reproducibility and data reuse.

Our research team is building cyberinfrastructure that collects and produces information about data provenance, improving researchers’ capacity to share both their data and the processes used to create them, known as scientific workflows. The models and software this team is building will allow detailed descriptions of the journeys of environmental data, including their “prospective” provenance (the workflow steps planned to produce them) and their “retrospective” provenance (the steps that were actually executed when they were produced).
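In workflow-provenance terms, prospective provenance usually means the planned steps of a workflow and retrospective provenance the record of a particular run. The minimal Python sketch below models both: `plan` lists the steps a workflow is designed to run, while `trace` records what one run actually read and wrote, pinned to content checksums. All class names, file names, and the checksum scheme are hypothetical illustrations, not the team's models or software. `lineage` walks the trace backwards to answer "where did this file come from?", and `conforms` checks whether a run followed its plan.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A planned workflow step: part of the prospective provenance."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class Execution:
    """A recorded run of one step: part of the retrospective provenance."""
    step: str
    inputs: dict   # file name -> checksum of the content actually read
    outputs: dict  # file name -> checksum of the content actually written


def checksum(content: bytes) -> str:
    """Short content fingerprint so a trace pins down exact file versions."""
    return hashlib.sha256(content).hexdigest()[:12]


def conforms(plan, trace):
    """True if every recorded execution matches the inputs/outputs of its planned step."""
    planned = {step.name: step for step in plan}
    return all(
        ex.step in planned
        and set(ex.inputs) == set(planned[ex.step].inputs)
        and set(ex.outputs) == set(planned[ex.step].outputs)
        for ex in trace
    )


def lineage(artifact, trace):
    """All files an artifact was derived from, following the recorded executions."""
    producers = {out: ex for ex in trace for out in ex.outputs}
    upstream, pending = set(), [artifact]
    while pending:
        ex = producers.get(pending.pop())
        if ex is None:
            continue  # a primary input: no recorded step produced it
        for source in ex.inputs:
            if source not in upstream:
                upstream.add(source)
                pending.append(source)
    return upstream


plan = (
    Step("clean", ("raw_counts.csv",), ("clean_counts.csv",)),
    Step("summarize", ("clean_counts.csv", "sites.csv"), ("summary.csv",)),
)
trace = [
    Execution("clean",
              {"raw_counts.csv": checksum(b"raw v1")},
              {"clean_counts.csv": checksum(b"clean v1")}),
    Execution("summarize",
              {"clean_counts.csv": checksum(b"clean v1"), "sites.csv": checksum(b"sites v3")},
              {"summary.csv": checksum(b"summary v1")}),
]

print(conforms(plan, trace))                  # True
print(sorted(lineage("summary.csv", trace)))  # ['clean_counts.csv', 'raw_counts.csv', 'sites.csv']
```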

Data Task Force

The Data Task Force is a team of data scientists at NCEAS focused on gathering and aligning data about Alaska’s salmon for the State of Alaska's Salmon and People project. Through in-depth data search and rescue efforts, the Task Force locates data from a wide variety of sources, converts them to a common format, organizes them into datasets, and makes them easily findable for future use. The overall goal is to improve the efficiency and productivity of the research process.

This project is funded by the Gordon and Betty Moore Foundation. Learn more.
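Aligning data from many sources to one format often comes down to a per-source mapping of column names and units onto a common schema. The Python sketch below shows that idea with two made-up CSV layouts; the source names, column names, unit conversion, and values are all hypothetical, and the Task Force's actual tools and schema may differ.

```python
import csv
import io

# Hypothetical source layouts. Each entry maps a field of the common schema to
# (column name in that source, function that converts the raw string value).
SOURCE_MAPPINGS = {
    "source_a": {
        "year": ("Year", int),
        "site": ("River", str.strip),
        "fish_count": ("Escapement", int),
    },
    "source_b": {
        "year": ("yr", int),
        "site": ("stream_name", str.strip),
        # This source reports counts in thousands of fish.
        "fish_count": ("count_thousands", lambda value: round(float(value) * 1000)),
    },
}


def harmonize(source, csv_text):
    """Read one source's CSV text and return rows in the common schema."""
    mapping = SOURCE_MAPPINGS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {field: convert(raw[column]) for field, (column, convert) in mapping.items()}
        row["source"] = source  # keep track of where each record came from
        rows.append(row)
    return rows


# Made-up example inputs with different column names and units.
source_a = "Year,River,Escapement\n2015,Creek A ,4200\n2016,Creek A,3900\n"
source_b = "yr,stream_name,count_thousands\n2015,Creek B,1.25\n"

combined = sorted(
    harmonize("source_a", source_a) + harmonize("source_b", source_b),
    key=lambda r: (r["site"], r["year"]),
)
for row in combined:
    print(row)
```

Keeping a `source` field on every record preserves where each value came from, which matters when harmonized datasets are reused.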

EarthCube GeoLink

A growing collection of standard protocols, formats, and vocabularies, often characterized as the Semantic Web ("Web of Data"), offers a powerful approach for publishing research data online. The GeoLink project brings together experts from geoscience, computer science, and library science in an effort to develop Semantic Web components that support discovery and reuse of geoscience data and knowledge.

Participating repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards from many disciplines, ranging from marine geology to paleoclimate.

Visit this project's website.
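The Semantic Web approach rests on publishing facts as subject-predicate-object triples that use shared identifiers (URIs). The Python sketch below imitates this with plain tuples: two hypothetical repositories describe the same expedition and scientist with the same URIs, so their statements can be merged and queried together. Production systems typically use RDF serializations and SPARQL queries rather than Python tuples; every URI and predicate here is hypothetical.

```python
# Hypothetical identifiers; the point is that both repositories use the SAME URI
# for the same person and the same expedition.
EX = "http://example.org/"
PERSON = EX + "person/p1"
CRUISE = EX + "cruise/c1"

# Triples published by a (hypothetical) expedition repository.
expedition_repo = {
    (CRUISE, EX + "vocab/chiefScientist", PERSON),
    (EX + "cruise/c2", EX + "vocab/chiefScientist", EX + "person/p2"),
}

# Triples published by a (hypothetical) publications repository.
publication_repo = {
    (EX + "paper/x1", EX + "vocab/usesDataFrom", CRUISE),
    (EX + "paper/x2", EX + "vocab/usesDataFrom", EX + "cruise/c2"),
    (EX + "paper/x3", EX + "vocab/author", PERSON),
}


def match(triples, s=None, p=None, o=None):
    """Triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Merging is just a set union, because shared URIs already line the data up.
graph = expedition_repo | publication_repo

# Which papers use data from expeditions led by PERSON?
cruises = {s for s, _, _ in match(graph, p=EX + "vocab/chiefScientist", o=PERSON)}
papers = sorted(
    s for c in cruises for s, _, _ in match(graph, p=EX + "vocab/usesDataFrom", o=c)
)
print(papers)  # ['http://example.org/paper/x1']
```

The query joins facts that no single repository holds on its own, which is the kind of cross-repository discovery the shared vocabularies are meant to enable.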


ESS-DIVE

The U.S. Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a repository for Earth and environmental science data. Specifically, it holds data from observational, experimental, and modeling research funded by DOE’s Office of Science under its Subsurface Biogeochemical Research and Terrestrial Ecosystem Science programs within the Environmental Systems Science activity.

NCEAS data scientists helped build ESS-DIVE in collaboration with scientists from Lawrence Berkeley National Laboratory and the National Energy Research Scientific Computing Center (NERSC). The project is funded by the Data Management program within the Climate and Environmental Sciences Division of DOE’s Office of Biological and Environmental Research, and is maintained by Lawrence Berkeley National Laboratory.

Visit this project's website.

Make Data Count

Recognizing that research impact includes the data researchers generate, Make Data Count is an effort to collect usage and citation metrics for data objects and to develop a service that collates these metrics and shares them with the scientific community.

This project is working with the research community to develop a clear set of guidelines for defining data usage and create a central hub for data metrics, including the number of data views, downloads, citations, saves, and social media mentions.

Make Data Count is a partnership among NCEAS, DataONE, the California Digital Library, and DataCite, and is funded by the Alfred P. Sloan Foundation.

Visit this project's website.
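Counting usage sounds simple, but raw server logs overcount when a user requests the same item repeatedly in quick succession. The Python sketch below tallies views and downloads per dataset and collapses rapid repeats from the same session, in the spirit of the "double-click" filtering described in the COUNTER Code of Practice for Research Data. The 30-second window, the event format, and the identifiers are assumptions for illustration, not the project's actual processing rules.

```python
from collections import defaultdict

# Assumed window for treating repeated requests as one (see lead-in).
DOUBLE_CLICK_WINDOW_SECONDS = 30


def count_usage(events, window=DOUBLE_CLICK_WINDOW_SECONDS):
    """Tally views/downloads per dataset, collapsing rapid repeats.

    events: iterable of (dataset_id, session_id, action, timestamp_seconds).
    A request is skipped if the same session made the same request for the
    same dataset no more than `window` seconds earlier.
    """
    last_seen = {}
    totals = defaultdict(lambda: defaultdict(int))
    for dataset, session, action, ts in sorted(events, key=lambda e: e[3]):
        key = (dataset, session, action)
        previous = last_seen.get(key)
        last_seen[key] = ts
        if previous is not None and ts - previous <= window:
            continue  # rapid repeat: treat as part of the same request
        totals[dataset][action] += 1
    return {dataset: dict(actions) for dataset, actions in totals.items()}


# Hypothetical log entries.
events = [
    ("dataset-1", "s1", "view", 0),
    ("dataset-1", "s1", "view", 10),   # within 30 s of the previous view: skipped
    ("dataset-1", "s1", "view", 100),
    ("dataset-1", "s2", "view", 5),
    ("dataset-1", "s2", "download", 7),
    ("dataset-2", "s1", "download", 50),
]
print(count_usage(events))
# {'dataset-1': {'view': 3, 'download': 1}, 'dataset-2': {'download': 1}}
```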


MetaDIG

MetaDIG provides quality analysis tools for researchers to assess metadata and data records against community recommendations.

MetaDIG's computational engine allows researchers to write discrete metadata checks in multiple languages, including R, Python, and Java. These checks can also operate on data available from the DataONE federation of data repositories, and they return results in a standard format.

MetaDIG supports multiple metadata dialects, including Ecological Metadata Language (EML), ISO 19115, and the Biological Data Profile. This project is supported by the National Science Foundation.

Visit this project's website.
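To make the idea of a discrete metadata check concrete, the Python sketch below runs two checks against a pared-down EML-style record and returns every result in the same shape. The check IDs, the five-word title threshold, the `SUCCESS`/`FAILURE` labels, and the `CheckResult` structure are hypothetical, chosen for illustration; they are not MetaDIG's check suite or result schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CheckResult:
    check_id: str
    status: str   # "SUCCESS" or "FAILURE" (illustrative labels)
    message: str


def check_title_length(root, min_words=5):
    """Recommend descriptive titles of at least `min_words` words."""
    title = root.findtext("dataset/title", default="").strip()
    n_words = len(title.split())
    if n_words >= min_words:
        return CheckResult("title.length", "SUCCESS", f"Title has {n_words} words.")
    return CheckResult("title.length", "FAILURE",
                       f"Title has {n_words} words; at least {min_words} recommended.")


def check_license_present(root):
    """Recommend that a usage license or rights statement be provided."""
    rights = root.find("dataset/intellectualRights")
    text = "".join(rights.itertext()).strip() if rights is not None else ""
    if text:
        return CheckResult("license.present", "SUCCESS", "Rights statement found.")
    return CheckResult("license.present", "FAILURE", "No rights statement found.")


CHECKS = (check_title_length, check_license_present)


def run_checks(xml_text):
    """Run every check against one metadata document and collect uniform results."""
    root = ET.fromstring(xml_text)
    return [check(root) for check in CHECKS]


# A pared-down, hypothetical EML-style record (not a complete, valid EML document).
record = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0">
  <dataset>
    <title>Daily fish counts at a hypothetical weir, 2010-2015</title>
  </dataset>
</eml:eml>"""

for result in run_checks(record):
    print(result.check_id, result.status, "-", result.message)
```

Because each check returns the same structure, results from many checks and many records can be aggregated into a single quality report.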

Semantics for Scientific Measurements

Improving scientists’ ability to find, understand, and integrate data is necessary for synthesis and other large-scale analyses. As more data become shareable via Web-based platforms, these exchanges increasingly rely on aligned terms for describing scientific measurements and metadata.

This research team is creating tools and approaches for describing scientific measurements in standardized ways to optimize information exchange over the Web. This includes building controlled vocabularies, or semantics, of measurements that have been vetted by the ecological and environmental community. This work will facilitate more efficient discovery, interpretation, and reuse of environmental data by promoting greater clarity and consistency in descriptions of scientific measurements.
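One way to picture standardized measurement descriptions is to annotate each data column with controlled-vocabulary terms for what was observed (the entity), which property was measured (the characteristic), and the unit. The Python sketch below assumes that three-part structure and uses hypothetical `ex:` term IDs; it shows how such annotations let a search find the same kind of measurement across datasets even when column names differ.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """What a column measures, expressed with controlled-vocabulary term IDs."""
    entity: str          # the thing observed
    characteristic: str  # the property of it that was measured
    unit: str


# Hypothetical term IDs standing in for entries from a community vocabulary.
STREAM_WATER, AIR = "ex:StreamWater", "ex:Air"
TEMPERATURE, DISSOLVED_O2 = "ex:Temperature", "ex:DissolvedOxygenConcentration"
DEG_C, MG_PER_L = "ex:DegreeCelsius", "ex:MilligramPerLiter"

# Three hypothetical datasets whose authors chose different column names.
datasets = {
    "dataset-1": {"wtr_temp": Annotation(STREAM_WATER, TEMPERATURE, DEG_C)},
    "dataset-2": {
        "TempC": Annotation(STREAM_WATER, TEMPERATURE, DEG_C),
        "DO_mgL": Annotation(STREAM_WATER, DISSOLVED_O2, MG_PER_L),
    },
    "dataset-3": {"air_t": Annotation(AIR, TEMPERATURE, DEG_C)},
}


def find_measurements(datasets, entity, characteristic):
    """(dataset, column) pairs that measure `characteristic` of `entity`."""
    return sorted(
        (name, column)
        for name, columns in datasets.items()
        for column, ann in columns.items()
        if ann.entity == entity and ann.characteristic == characteristic
    )


print(find_measurements(datasets, STREAM_WATER, TEMPERATURE))
# [('dataset-1', 'wtr_temp'), ('dataset-2', 'TempC')]
```

Column names like `wtr_temp` and `TempC` never need to match; the search relies only on the shared terms, and the air-temperature column is excluded because its entity differs.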

Whole Tale

This multi-institutional collaborative effort is helping researchers improve the reproducibility of their research. By offering tools and guidance, Whole Tale enables researchers to develop and share “living publications” that integrate data, code, and scholarly articles.

Our collaborators include the University of Illinois at Urbana-Champaign, University of Chicago, University of Texas at Austin, and the University of Notre Dame.

Visit this project's website.

View Completed Projects