NCEAS Project 10921

Machine learning for the environment

  • John M. Drake
  • William T. Langford

Activity        Dates                      Further Information
Working Group   2nd-6th October 2006       Participant List
Working Group   2nd-10th June 2007         Participant List
Working Group   20th-24th October 2008     Participant List

Abstract
We believe that environmental science, ecology, and conservation biology would be greatly enriched by expanding the ecologist's analytical toolbox to include machine learning (ML) approaches to data analysis. We use the term ML loosely, to distinguish a variety of new, computational methods for recognizing and analyzing patterns in data from parametric statistics. Generally, parametric methods assume highly restrictive theoretical properties of data, such as additivity, linearity, independence, and a specified distribution (e.g., normality). Ecological data, by contrast, represent highly complex systems and commonly violate these assumptions [1-3]. Unfortunately, failure to appreciate these subtleties of ecological data often results in misguided analysis and incomplete or incorrect conclusions. In recent years, ML researchers have developed techniques for analyzing data not suited to parametric statistics. Older machine learning algorithms include neural networks and decision trees. Newer techniques such as boosting and kernel methods (e.g., support vector machines) provide new opportunities for extracting subtle patterns from complex data, while hybrid methods integrate parametric models and ML to exploit computation and hard-won biological understanding simultaneously. Despite successes elsewhere (e.g., bioinformatics, astrophysics), ML has not been widely adopted by ecologists. Complex situations that might be addressed with ML include identifying optimal policies for managing ecological systems under uncertainty, forecasting, nonlinear modeling, and scientific inference with non-independent data. Accommodating these scientific and statistical difficulties within parametric statistics ranges from cumbersome to impossible. Therefore, we propose a working group to identify obstacles, scope out promising research, produce case studies, and develop a book-length tutorial for ecologists on the practical application of ML.

Type             Products of NCEAS Research
Journal Article  Buston, Peter M.; Elith, Jane. 2011. Determinants of reproductive success in dominant pairs of clownfish: A boosted regression tree analysis. Journal of Animal Ecology. Vol: 80. Pages 528-538. (Online version)
Journal Article  Keller, Reuben P.; Kocev, Dragi; Dzeroski, Saso. 2012. Trait-based risk assessment for invasive species: High performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools. Diversity and Distributions. Vol: 17(3). Pages 451-461. (Online version)
Journal Article  Wilson, Erin E.; Wolkovich, Elizabeth M. 2011. Scavenging: How carnivores and carrion structure communities. Trends in Ecology and Evolution. Vol: 26(3). Pages 129-135. (Online version)