National Center for Ecological Analysis and Synthesis

An idealistic young graduate student reads Aaron Ellison's ecoessay, and decides to document, electronically archive, and make publicly available the excellent data from her doctoral research. She reads Michener et al. (1997, Ecological Applications 7:330-342), determines which metadata are needed for her data, writes two detailed data publications, publishes her data on the web, and makes sure that there are links to it from all of the relevant places. Her third year of fieldwork suffers (or perhaps she elects to spend an extra year and a half in graduate school). She only has time to write one conventional publication; she will get to the others later. By the time she gets settled into her postdoc, however, a prominent ecologist has analyzed her data, repeated most of the work she has not yet written up, and published the work under his own name (with a suitable acknowledgment, of course) in a major journal. Our young idealist applies for jobs, but the Neanderthals on the search committees place no value on her data publications: "Why, she only has one real publication! REJECT!" Disillusioned, our young idealist leaves science forever.

 

THE COSTS OF DATA SHARING?

The above scenario, while unlikely, is certainly plausible in the current environment. But a number of institutional changes that we see as nearly inevitable (for all the reasons described by Ellison) will make the archiving of data almost cost-free to the individual scientist.

  1. Some organization, such as ESA, will set up and maintain a data archive site, and bear the costs of indexing, advertising, and controlling access. The individual researcher need only submit the data in the appropriate format.
  2. Following up on Michener et al. (1997), an ESA committee will develop a set of "metadata forms" that lay out the information required for almost any study. Some fields will be simple (longitude); others will resemble the methods section of a paper. This form-based approach will replace the notion of a "data paper."
  3. We will start doing "good science." If we document the metadata (using the metadata forms) as we are developing and executing the study, it will require little extra effort and will provide us with a valuable record of what we have done when we have to revise the paper a year later.
  4. To prevent "scooping," only the metadata will be publicly searchable in the archive. The data themselves (for a fixed time after initial deposition or until released by the author) will only be released with approval from the author, and for a specific use.
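To make the proposal above concrete, here is a minimal sketch of what one archive entry might look like: a metadata form that anyone can search, attached to data that stay closed until the author approves a specific use or the embargo ends. Every name in it (`ArchiveEntry`, `search_metadata`, the field names) is hypothetical, and the two-year embargo is an assumption of ours; the scheme above says only "a fixed time."

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ArchiveEntry:
    """Hypothetical archive record: public metadata form plus access-controlled data."""
    title: str
    author: str
    longitude: float          # a "simple" metadata field
    methods: str              # a field resembling a methods section
    deposited: date
    embargo_days: int = 730   # assumed embargo length; not specified in the essay
    released_by_author: bool = False
    approved_uses: set = field(default_factory=set)  # (requester, purpose) pairs

    def metadata(self) -> dict:
        """Return only the publicly searchable fields, never the data themselves."""
        return {
            "title": self.title,
            "author": self.author,
            "longitude": self.longitude,
            "methods": self.methods,
        }

    def embargo_over(self, today: date) -> bool:
        """The embargo ends after the fixed period or when the author releases the data."""
        expiry = self.deposited + timedelta(days=self.embargo_days)
        return self.released_by_author or today >= expiry

    def approve(self, requester: str, purpose: str) -> None:
        """The author grants one requester access for one specific use."""
        self.approved_uses.add((requester, purpose))

    def may_access_data(self, requester: str, purpose: str, today: date) -> bool:
        """Data are available once the embargo is over, or for an approved use."""
        return self.embargo_over(today) or (requester, purpose) in self.approved_uses


def search_metadata(entries: list, keyword: str) -> list:
    """Case-insensitive keyword search over the metadata fields only."""
    kw = keyword.lower()
    return [
        e.metadata()
        for e in entries
        if any(kw in str(value).lower() for value in e.metadata().values())
    ]
```

In this sketch a search returns metadata only, so a would-be collaborator can discover that a data set exists but must ask the author before using it, which is the step that turns an acknowledgment into a coauthorship.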

Thus, archiving a new study will be almost cost-free for the individual, and will require little extra time. We will be able to write just as many conventional publications as before (so the Neanderthals don't matter). Archiving old data will be costly, however (imagine writing the metadata for your master's thesis!), so we will want to back-archive only our very best data.

 

THE BENEFITS OF DATA SHARING?

In his essay, Ellison focuses on the many benefits of data sharing to the science as a whole. In some sense, it is our duty to share our data with the community of ecologists. But as ecologists, we know that pure altruism is rare. How might sharing data benefit the individual scientist directly?

  1. The metadata forms will encourage us to do good science. Not only will we better document our research, but also the greater structure in the planning phase of the project will lead to a better overall study.
  2. Knowing that other people will see our data and repeat our analyses will make us more careful in proofing our data and analysis results.
  3. When we return to our data with a new idea three years later, it will take us only hours, instead of days or weeks, to reconstruct what we did.
  4. By advertising data in a searchable format, the metadata archive will lead to more innovative collaborations. The fact that access to data is controlled will result in the original researcher appearing as an author on many new papers, rather than just in the acknowledgments.

 

DATA SHARING WILL HAPPEN!

Once the appropriate institutional structures are in place, the net benefits to the individual will make data sharing inevitable, even in the absence of carrots and sticks from NSF. Only one major cost will remain. If her graduate advisor counts himself among the Neanderthals, our idealistic young student will have to endure his scorn and disapprobation . . . but grad students have been doing that for years!

 

The Postdocs at NCEAS are (in December '97) Bruce Kendall, Eric Seabloom, Fiorenza Micheli, Wendy Gram, Patty Debenham, Becky Burton, Kathy Cottingham, Ross Gerrard, Camille Parmesan, Gareth Russell, Mark Jeffries, and Ottar Bjornstad. They can be reached collectively at postdocs@nceas.ucsb.edu.