Machine learning for the environment
- Drake, John
- Langford, William
| Activity | Dates | Further Information |
|---|---|---|
| Working Group | 2nd–6th October 2006 | Participant List |
| Working Group | 2nd–10th June 2007 | Participant List |
| Working Group | 20th–24th October 2008 | Participant List |
Abstract
We believe that environmental science, ecology, and conservation biology would be greatly enriched by expanding the ecologist's analytical toolbox to include machine learning (ML) approaches to data analysis. We use the term ML loosely, to distinguish a variety of new computational methods for recognizing and analyzing patterns in data from parametric statistics. Generally, parametric methods assume highly restrictive theoretical properties of data, such as additivity, linearity, independence, and a particular distribution (e.g., normality). Ecological data, by contrast, represent highly complex systems and commonly violate these assumptions [1-3]. Unfortunately, failure to appreciate these subtleties of ecological data often leads to misguided analyses and incomplete or incorrect conclusions. In recent years, ML researchers have developed techniques for analyzing data not suited to parametric statistics. Older ML algorithms include neural networks and decision trees. Newer techniques, such as boosting and kernel methods (e.g., support vector machines), provide new opportunities for extracting subtle patterns from complex data, while hybrid methods integrate parametric models and ML to exploit computation and hard-won biological understanding simultaneously. Despite successes elsewhere (e.g., bioinformatics, astrophysics), ML has not been widely adopted by ecologists. Complex problems that might be addressed with ML include identifying optimal policies for managing ecological systems under uncertainty, forecasting, nonlinear modeling, and scientific inference with non-independent data. Accommodating these scientific and statistical difficulties within parametric statistics ranges from cumbersome to impossible. Therefore, we propose a working group to identify obstacles to adoption, scope out promising research, produce case studies, and develop a book-length tutorial for ecologists on the practical application of ML.
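To make the linearity point concrete, here is a minimal sketch on hypothetical, synthetic data (not data from the working group): a response that jumps abruptly at a threshold, as a species' abundance might across an environmental gradient. It compares an ordinary least-squares line, a parametric model that assumes linearity, with a depth-one regression tree (a "stump"), the simplest form of the decision trees mentioned in the abstract. The helper names `fit_linear`, `fit_stump`, and `total_squared_error`, and the threshold at x = 10, are illustrative assumptions.

```python
# Toy illustration on HYPOTHETICAL data: a step-shaped "threshold" response,
# the kind of nonlinearity that a straight-line model cannot capture.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split minimizing squared error.

    Returns (threshold, left_mean, right_mean); predict left_mean if x < threshold.
    """
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between tied x values
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        thr = (pts[i - 1][0] + pts[i][0]) / 2  # midpoint between neighbours
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return thr, ml, mr

def total_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical gradient: response is 0 below x = 10 and 5 at or above it.
xs = list(range(20))
ys = [0.0 if x < 10 else 5.0 for x in xs]

a, b = fit_linear(xs, ys)
thr, left_mean, right_mean = fit_stump(xs, ys)

linear_err = total_squared_error(lambda x: a + b * x, xs, ys)
tree_err = total_squared_error(lambda x: left_mean if x < thr else right_mean, xs, ys)
print(f"linear SSE = {linear_err:.2f}, tree SSE = {tree_err:.2f}, split at x = {thr}")
```

On this toy response the stump places its split between x = 9 and x = 10 and fits the data exactly, while the best straight line leaves a large residual error. The contrast is deliberately favorable to the tree: on smooth responses a single split is only a crude step approximation, which is one motivation for ensembles such as boosting.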



