Open Science for Synthesis - 2014

Software Skills Training for Early Career Scientists

July 21 - August 8, 2014
Santa Barbara, CA and Chapel Hill, NC

Sponsored by:
Institute for Sustainable Earth and Environmental Software (ISEES)
Water Science Software Institute (WSSI)
Open Science for Synthesis is a unique bi-coastal training offered for early career scientists who want to learn new software and technology skills needed for open, collaborative, and reproducible synthesis research.
UC Santa Barbara’s National Center for Ecological Analysis and Synthesis (NCEAS) and University of North Carolina’s Renaissance Computing Institute (RENCI) co-led Open Science for Synthesis (#OSS2014) as a three-week intensive training workshop with participants in both Santa Barbara, CA and Chapel Hill, NC from July 21 - August 8, 2014. The training was sponsored by the Institute for Sustainable Earth and Environmental Software (ISEES) and the Water Science Software Institute (WSSI), both of which are conceptualizing an institute for sustainable scientific software.
Participants received hands-on guided experience using best practices in the technical aspects that underlie successful open science and synthesis – from data discovery and integration to analysis and visualization, and special techniques for collaborative scientific research, including virtual collaboration over the Internet. A dynamic group of instructors provided a mixture of instructive lectures, discussion forums, exercises, and real-world application of skills to synthesis projects.


This Open Science training revolves around scientific computing and scientific software for reproducible science. Integrating statistical analysis into well-documented workflows is emphasized with the use of open-source, community-supported programming languages. Participants learn skills for rapid and robust implementation of open source scientific software. These approaches are explored and applied to ecological, environmental, evolutionary, Earth, and marine science synthesis.
The course weaves together several core themes which are reinforced – and injected into the real-time synthetic scientific research process – through daily work on group synthesis projects. Core training themes address:
  • Collaboration modes and technologies, virtual collaboration
  • Data management, preservation, and sharing
  • Data manipulation, integration, and exploration
  • Scientific workflows and reproducible research
  • Agile and sustainable software practices
  • Data analysis and modeling
  • Communicating results to broad communities
Throughout the course, participants will receive a solid foundation in computing fundamentals for doing synthetic research in today’s computation- and data-intensive era. This includes:
  • Instruction on many aspects of R for data manipulation, analysis, and visualization
  • Survey of general programming constructs, paradigms, and best practices
  • Exposure to the Linux/UNIX command line environment and useful tools
  • Demystification of aspects of modern computers that have bearing on effective science
  • Discussion of cyberinfrastructure trends supporting open, networked, reproducible science

Group Synthesis Projects

Participants will form small synthesis teams that focus on applying the software skills they learn each day to cross-cutting science research projects. Using an open community engagement process, participants can maximize their success in collaborative research, potentially leading to publishable results.



Stanley C. Ahalt is director of the Renaissance Computing Institute (RENCI), professor of computer science at the University of North Carolina at Chapel Hill, and the head of the Biomedical Informatics Core for the North Carolina Translational and Clinical Sciences Institute. He is principal investigator for the Water Science Software Institute project, which seeks to build a cyberinfrastructure for managing, sharing and using water science data.

Nancy Baron is the Director of Science Outreach for COMPASS, an organization focused on communicating science. She is also the lead communication trainer for the Leopold Leadership Program based at Stanford University, USA. Nancy leads workshops for academic, government, and NGO scientists, helping them develop core competencies as scientist communicators who want to make their work more accessible and relevant to journalists, policy makers, and the public. For her work at the intersection of science and journalism, Nancy was awarded the 2013 Peter Benchley Ocean Award for Excellence in the Media.

Ben Bolker is a professor in the departments of Mathematics & Statistics and of Biology at McMaster University. His interests range widely in spatial, theoretical, mathematical, computational and statistical ecology, evolution and epidemiology; plant community, ecosystem, and epidemic dynamics.

Stephanie Hampton is the Director for the Center for Environmental Research, Education and Outreach (CEREO) and Professor at Washington State University. She is a freshwater ecologist whose research focuses primarily on discerning lake ecosystem dynamics through analysis of long-term ecological data. She is actively engaged in fostering skills for a vibrant community around data-intensive research as a co-PI on DataONE and in her previous role as Deputy Director at NCEAS.

Jefferson Heard is an expert in data mining, visualization, and mapping with strong roots in the Open Source and entrepreneurial communities. He is a Senior Research Software Developer at RENCI and the founder & CEO of TerraHub LLC, a startup incubated at the Carolina LaunchPad and Launch Chapel Hill focused on platforms as a service (PaaS) for mapping and geospatial data mining. He is currently the lead software architect of Hydroshare at RENCI, a multi-university collaboration on Big Data for hydrology.

Matthew B. Jones is the Director of Informatics Research and Development at NCEAS and PI for the Institute for Sustainable Earth and Environmental Software (ISEES) project. His environmental informatics research focuses on the management, integration, analysis, and modeling of heterogeneous data. He co-founded the DataONE federated data repository network, the Kepler open source scientific workflow system, and the Ecological Metadata Language project.

Chris Lenhardt is RENCI’s Domain Scientist for Environmental Data Sciences and Systems. His work ranges from helping to create knowledge management frameworks for science data and information to studying the implications of emerging technologies. Lenhardt is active in the Federation for Earth Science Information Partners (ESIP), having served in various leadership capacities; he also contributes to the digital preservation committee and the physical samples and digital data cluster. He holds an M.A. in Political Science and an M.Sc. in International Relations.

Karthik Ram is a quantitative ecologist at the Berkeley Initiative for Global Change Biology (BigCB) at UC Berkeley. He is broadly interested in the structure and dynamics of food webs in terrestrial systems, from subterranean insect food webs on the California coast to large mammal systems in the Rockies. Karthik currently works full time on the rOpenSci project, a collaborative effort aimed at improving access to scientific data.

Stacy Rebich Hespanha is a research associate at NCEAS. She earned a PhD in Geography with an emphasis in Cognitive Science at UC Santa Barbara. Her research interests span the fields of environmental communication, education, and 'big data' for social science. Stacy specializes in data visualization, computational text analysis, news media analysis, community engagement, and evaluation and assessment.

Mark Schildhauer is Director of Computing at NCEAS. His research interests include informatics, the semantic web, and scientific workflows, with a focus on environmental science. Schildhauer and colleagues developed the extensible observation ontology, OBOE, and a semantic annotation architecture that improves data discovery and re-use. He helped develop Ecological Metadata Language, is a co-founder of the Kepler scientific workflow project, and led the SEEK Knowledge Representation group.

Michael Stealey is a Senior Research Software Developer at the Renaissance Computing Institute (RENCI). While at RENCI his work has covered a range of topics including engagement with CUAHSI-HIS on environmental database modeling, ODMTools, and others. Prior to joining RENCI Michael was with the Center for Embedded Networked Sensing (CENS) at UCLA working on sensor platforms and autonomous sensing techniques.

Greg Wilson is the creator of Software Carpentry, a crash course in computing skills for scientists and engineers. He has worked for 25 years in high-performance computing, data visualization, computer security, and academia, and is the author or editor of several books on computing (including the 2008 Jolt Award winner "Beautiful Code") and two for children. Greg received a Ph.D. in Computer Science from the University of Edinburgh in 1993, and presently runs the Software Carpentry project for the Mozilla Foundation.



Both NCEAS and RENCI are recipients of NSF Software Infrastructure for Sustained Innovation awards. Their respective programs - the Institute for Sustainable Earth and Environmental Software and the Water Science Software Institute - are both designing an institute for sustainable scientific software that has a major emphasis on training in research software and computing. In addition, Software Carpentry was founded in 1998 with the goal of teaching science and engineering lab skills for scientific computing, and brings a wealth of experience in teaching these skills to researchers from a variety of disciplines. The Open Science for Synthesis workshop collaboration was born out of this shared mission and derived from NCEAS’ successful Summer Institute 2013.