
National Center for Ecological Analysis and Synthesis

Project Description

We believe that environmental science, ecology, and conservation biology would be greatly enriched by expanding the ecologist's analytical toolbox to include machine learning (ML) approaches to data analysis. We use the term ML loosely, to distinguish a variety of newer computational methods for recognizing and analyzing patterns in data from parametric statistics. Parametric methods generally make highly restrictive assumptions about data, such as additivity, linearity, independence, and a specified distribution (e.g., normality). Ecological data, by contrast, arise from highly complex systems and commonly violate these assumptions [1-3]. Unfortunately, failure to appreciate these subtleties of ecological data often leads to misguided analyses and incomplete or incorrect conclusions.

In recent years, ML researchers have developed techniques for analyzing data that are not suited to parametric statistics. Older ML algorithms include neural networks and decision trees. Newer techniques, such as boosting and kernel methods (e.g., support vector machines), provide new opportunities for extracting subtle patterns from complex data, while hybrid methods integrate parametric models and ML to exploit computation and hard-won biological understanding simultaneously. Despite successes in other fields (e.g., bioinformatics, astrophysics), ML has not been widely adopted by ecologists.

Complex problems that might be addressed with ML include identifying optimal policies for managing ecological systems under uncertainty, forecasting, nonlinear modeling, and scientific inference with non-independent data. Accommodating these scientific and statistical difficulties within parametric statistics ranges from cumbersome to impossible. We therefore propose a working group to identify obstacles, scope out promising research, produce case studies, and develop a book-length tutorial for ecologists on the practical application of ML.

Working Group Participants

Principal Investigator(s)

John M. Drake, William T. Langford

Project Dates

Start: June 1, 2006

Status: Completed

Participants

Peter M. Buston
Consejo Superior de Investigaciones Científicas (CSIC)
Rich Caruana
Cornell University
Jonathan M. Chase
Washington University in St. Louis
T. Jonathan Davies
University of California, Santa Barbara
Thomas G. Dietterich
Oregon State University
Andrew P. Dobson
Princeton University
John M. Drake
University of Georgia
Saso Dzeroski
Jozef Stefan Institute
Jane Elith
University of Melbourne
Cesare Furlanello
Istituto Trentino di Cultura
Trevor Hastie
Unknown
Reuben P. Keller
University of Notre Dame
Andreas Krause
Carnegie Mellon University
William T. Langford
RMIT University
Dragos Margineantu
Unknown
Julian D. Olden
University of Washington
Gill Ward
Stanford University
Matt White
Arthur Rylah Institute for Environmental Research
Bianca Zadrozny
Universidade Federal Fluminense

Products

  1. Journal Article / 2011

    Determinants of reproductive success in dominant pairs of clownfish: A boosted regression tree analysis

  2. Journal Article / 2012

    Trait-based risk assessment for invasive species: High performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools

  3. Journal Article / 2011

    Scavenging: How carnivores and carrion structure communities
