About Scientific Computing at NCEAS

At NCEAS, we regard modern ecology as a synthetic, integrative, and collaborative science. This means much more than simply communicating results via publications and presentations. The underlying data and analyses are themselves valuable products of scientific investigation, and their re-use and synthesis frequently open novel avenues of research. We recognize that this new paradigm places several unfamiliar demands on ecologists, many of whom have previously worked only with their own datasets. First, our scientists increasingly require informatics and analytical approaches that can effectively handle larger-scale, longer-term, and thematically diverse information as inputs. Second, scientists must be prepared to use computer applications that generate reusable workflows (e.g., analytical procedures) and well-documented, replicable results. This is very different from the antiquated model of describing methods only with terse text in publications, and archiving quantitative output only in the form of summary tables and figures. NCEAS provides both tools and training to facilitate this transition, helping ecologists not only to use existing technologies more effectively, but also to adopt new technologies where relevant. Our experience has shown that exposure to these methods can increase the productivity of individual ecologists who use them, allow NCEAS working groups to collaborate far more productively, and ultimately preserve the scientific value of all analytical and data products.

Our scientific computing services can be roughly broken into two major categories, Analytics and Informatics. Our expert analysts are always available for consultation on these and related topics, and will work with you to solve quantitative problems and overcome computational challenges you encounter while conducting research at NCEAS.

Analytics

Analytics includes statistical procedures, computational algorithms, numerical models, and other methods for framing and solving quantitative problems. Whenever possible, we actively seek and strongly advocate computational approaches that (1) can be easily shared among, and repeated by, collaborating scientists who may have different operating systems and limited access to software; (2) are transparent, reliable, and verifiable; (3) have reusable components that future researchers can implement to solve similar computational challenges; and (4) can be easily scaled to handle arbitrarily large problems of similar design. The analytical software options available at NCEAS follow directly from these considerations. Although we occasionally provide specialty programs (upon request) that do not meet all of these criteria, we have otherwise carefully assembled a powerful lineup of scripted, cross-platform, scalable applications that are well-supported, generate robust numerical results, and permit batch processing. These packages require an initial learning investment, and may seem intimidating to scientists familiar with only "point-and-click" software, but we strongly argue that the long-term payoff is significant. We also favor applications that are open-source, allowing researchers to inspect, customize, and ultimately better understand the underlying algorithms and procedures. Finally, although we do support several large, proprietary software applications, we seek solutions that are low-cost and lightweight whenever this can be done without compromising the analysis; this approach maximizes sharing among working group collaborators who may have limited computing resources at their home institutions.

  • General quantitative analysis: For virtually any type of statistical analysis, we recommend and support R, which is open-source, free of charge, highly flexible, widely used by academic scientists and statisticians, and supported by a remarkably extensive library of community-developed functions. However, SAS may be a useful alternative for researchers who have legacy code or who need to run procedures specific to the SAS system. We also support Matlab, which is often the preferred programming environment for ecologists developing simulation models and implementing other kinds of numerical analysis. All three of these applications are installed by default on NCEAS workstations and servers. Although the vast majority of analytical needs of NCEAS scientists are met by R, Matlab, or SAS, we can help you to develop custom solutions in other cases. If your computational demands exceed the capabilities of these applications and require finer-grained control of processing tasks and memory addressing, we can assist you with programming in lower-level languages such as C, C++, and Fortran. Similarly, we can help guide implementation of analyses and other processing tasks using general high-level languages such as Python and Perl as needed, especially where the facilities of the language and its libraries are well-suited to the job.

  • Spatial analysis and mapping: For visualization of multi-layer maps and advanced GIS functionality, we offer the open-source applications Quantum GIS, PostGIS, and GRASS GIS; we also maintain licenses for the proprietary ESRI ArcGIS. That said, we have found that ecologists often need to process and analyze spatial data in ways that do not actually require a full GIS environment. In such cases, it may be preferable to use the relevant spatial functions available in more general analytical environments like R, or to use lower-level tools such as the GDAL/OGR utilities. This "non-GIS" approach to geospatial processing can often deliver results much more rapidly and with fewer data manipulations. If your project involves any mapping, spatial analysis, or geographic data manipulation, check in with the scientific computing staff first to identify optimal approaches that fit easily into your overall workflow.

  • High Performance Computing: Although many of our scientists find that their desktop and laptop computers are powerful enough to meet their needs, we also provide access to high performance computing resources. Our fastest compute server at present (eos) has 32 processing cores that share 256 GB of memory; this server provides an option for running parallelized code, and can also be very useful for any computational task that is especially processor-limited or memory-limited. Our computing experts can work with you to determine which solution is most appropriate, and help to get you up and running.
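As one illustration of what "parallelized code" on a multi-core server can look like, here is a minimal sketch in Python using only the standard library. It is not an NCEAS-provided tool: the function names (bootstrap_mean, run_replicates), the toy data values, and the worker count are all assumptions chosen for the example. Each replicate is independent, so the work can be spread across several processor cores.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def bootstrap_mean(seed, values=(2.0, 4.0, 6.0, 8.0), n_draws=1000):
    """Average of n_draws bootstrap-resample means (toy data, hypothetical)."""
    rng = random.Random(seed)  # a separate seeded generator keeps each replicate reproducible
    n = len(values)
    total = 0.0
    for _ in range(n_draws):
        resample = [rng.choice(values) for _ in range(n)]
        total += sum(resample) / n
    return total / n_draws


def run_replicates(seeds, max_workers=4):
    """Run one independent replicate per seed, spread across worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input seeds
        return list(pool.map(bootstrap_mean, seeds))


if __name__ == "__main__":
    # max_workers=4 is an arbitrary choice; on a larger server it could be raised
    results = run_replicates(range(8), max_workers=4)
    for seed, value in enumerate(results):
        print(seed, round(value, 3))
```

Because every replicate uses its own seed, the parallel run should give the same numbers as running the replicates one after another, which makes the speed-up easy to verify.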

Ecoinformatics

Informatics provides both a strategic framework and specific tools for acquiring, handling, interpreting, and storing data in a useful and efficient manner. Because of the extremely heterogeneous nature of ecological data, NCEAS scientists often face special informatics challenges. In addition to providing advice and ready-to-use tools for managing data, NCEAS is also a leader of novel research in this area. For more details, read about our Ecoinformatics Program.

  • Data management and sharing: In most cases, our visiting and resident scientists know best where to find relevant data. However, we can often provide pointers to important base layers and other "generic" datasets that can augment analysis. More importantly, we can advise you on appropriate strategies for preparing and manipulating your data. We are especially attuned to the needs of working groups, who can benefit from data management solutions that support remote access, multiple users, and version control. Although simple file-based solutions may be sufficient to meet the needs of many NCEAS scientists, we also offer access to PostgreSQL and MySQL, two powerful open-source database management systems installed on our servers.

  • Data documentation and archiving: Thoroughly describing your datasets and storing them in a safe, accessible repository will preserve their long-term scientific value to you, your collaborators, and the research community as a whole. In order to achieve this end, our data policy requires all visiting and resident scientists to document and register their NCEAS-related data products with the Knowledge Network for Biocomplexity. We provide a simple web interface that allows you to create and submit essential metadata, and also offer a free desktop application (Morpho) that features advanced documentation, data management, and querying capabilities. Additionally, our scientific computing staff can provide advice and training on the use of these tools.
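To make the contrast between file-based storage and a database management system more concrete, the following minimal sketch stores a few observations in a table and summarizes them with a query. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it does not connect to the PostgreSQL or MySQL servers mentioned above, and the table name, column names, and rows are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        "site TEXT NOT NULL, species TEXT NOT NULL, abundance INTEGER NOT NULL)"
    )
    rows = [  # hypothetical example data, not real survey results
        ("A", "Quercus agrifolia", 12),
        ("A", "Salvia mellifera", 30),
        ("B", "Quercus agrifolia", 7),
    ]
    # Parameterized inserts avoid building SQL strings by hand
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def total_abundance_by_site(conn):
    """Return (site, summed abundance) pairs, ordered by site."""
    cur = conn.execute(
        "SELECT site, SUM(abundance) FROM observation GROUP BY site ORDER BY site"
    )
    return cur.fetchall()


conn = build_demo_db()
print(total_abundance_by_site(conn))  # [('A', 42), ('B', 7)]
```

The same SQL statements would carry over to a server-based system with little change, which is where shared, multi-user access of the kind useful to working groups becomes possible.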