NCEAS scientists present on data visualization and other topics at AGU 2013

NCEAS will be represented at the American Geophysical Union’s (AGU) annual fall meeting in San Francisco, California, December 9-13, by Mark Schildhauer, Director of Computing, and Stacy Rebich Hespanha, NCEAS Postdoctoral Associate. They will participate in numerous sessions examining ways to enhance and accelerate science by improving the process of data discovery through text mining and visualizations, cultivating innovation in scientific software, and new data management training for scientists. More information on the AGU annual meeting.

Tuesday, December 10 - IN22A. Enabling Better Science Through Improving Science Software Development Culture I
10:20 AM - 12:20 PM; 2020 (Moscone West)  

10:35 - 10:50 AM - IN22A-02. ISEES: An Institute for Sustainable Software to Accelerate Environmental Science
Matthew B. Jones; Mark Schildhauer; Peter A. Fox

The Institute for Sustainable Earth and Environmental Software (ISEES) is being envisioned as a community-driven activity that can facilitate and galvanize activities around scientific software, much as synthesis centers such as NCEAS and NESCent have stimulated massive advances in ecology and evolution. We will describe the results of six workshops (Science Drivers, Software Lifecycles, Software Components, Workforce Development and Training, Sustainability and Governance, and Community Engagement) held in 2013 to envision such an institute. We will present community recommendations from these workshops and our strategic vision for how ISEES will address the technical issues in the software lifecycle, sustainability of the whole software ecosystem, and the critical issue of computational training for the scientific community.

Wednesday, December 11 - IN31C. Search, Discovery and Visual Representation of Scientific Data I Posters
8:00 AM - 12:20 PM; Hall A-C (Moscone South)

8:00 AM Poster Session - IN31C-1515. Visual Browsing of Earth and Environmental Science Topics to Enhance Data Discovery
Stacy Rebich Hespanha; Benjamin Adams; Mark Schildhauer

We will describe a method that couples text mining (topic modeling using Latent Dirichlet Allocation, or LDA) of repository metadata records with Self-Organizing Map (SOM) visualizations to enable potential data users to browse thematically across a number of data repositories. Visual thematic browsing capabilities help minimize the chances that someone searching for data will miss relevant data sets due to a mismatch between the user’s query and terms that appear in the object’s metadata. To augment the value of this visual, metadata-driven browsing approach, we also performed these text mining and visualization procedures on a corpus of over 121,000 abstracts from top Earth and environmental science journals. We then use the resulting broader and richer ‘map’ of Earth and environmental science drawn from these abstracts to better contextualize the repository metadata for visual browsing purposes. Finally, we demonstrate how visualization of this corpus of primary literature can serve as an entry point for discovery of ‘dark’ or ‘long-tail’ data that have not yet been made accessible through data repositories or other data-sharing means.
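
As a rough illustration of the second half of this pipeline, the sketch below places records on a small Self-Organizing Map according to their topic proportions. It is a generic, minimal SOM written for this example and not the presenters' implementation; the record names, the topic proportions (which stand in for LDA output), the grid size, and the training schedule are all assumptions.

```python
import math
import random

# Hypothetical topic proportions per metadata record (each row sums to 1),
# standing in for the output of an LDA run over repository metadata.
records = {
    "dataset_a": [0.8, 0.1, 0.1],
    "dataset_b": [0.7, 0.2, 0.1],
    "dataset_c": [0.1, 0.8, 0.1],
    "dataset_d": [0.1, 0.1, 0.8],
}


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def best_matching_unit(grid, vector):
    """Return the grid cell whose weight vector is closest to `vector`."""
    return min(grid, key=lambda cell: distance(grid[cell], vector))


def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        rate = 0.5 * frac                    # learning rate
        radius = 1.0 + max(rows, cols) / 2 * frac
        for vector in vectors:
            br, bc = best_matching_unit(grid, vector)
            for (r, c), weights in grid.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-d2 / (2 * radius ** 2))
                # Pull this cell's weights toward the input vector.
                for i in range(dim):
                    weights[i] += rate * influence * (vector[i] - weights[i])
    return grid


som = train_som(list(records.values()))
# Map each record to its grid cell; nearby cells suggest related themes.
placement = {name: best_matching_unit(som, vec) for name, vec in records.items()}
```

In a browsing interface, the grid cells would be drawn as a map and each record shown at its cell, so that thematically similar records tend to appear near one another.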

Friday, December 13 - IN52B. Semantically Enabling Annotation, Discovery, Access, and Integration of Scientific Data I
10:20 AM - 12:20 PM; 2020 (Moscone West)

11:35 - 11:50 AM - IN52B-06. Text Mining to Inform Construction of Earth and Environmental Science Ontologies
Mark Schildhauer; Benjamin Adams; Stacy Rebich Hespanha

We will discuss methods we have developed that apply statistical topic modeling to a large corpus of Earth and environmental science articles in order to expand the coverage of ontologies and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics.
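
The matching step described above can be sketched in a few lines. This is a simplified illustration rather than the presenters' workflow: the topic terms and ontology labels are made-up toy data, and exact matching after light normalization is an assumption that a real system would likely replace with stemming or synonym handling.

```python
# Hypothetical top terms per topic, as an LDA run might produce them.
topics = {
    "topic_0": ["sediment", "erosion", "runoff", "watershed"],
    "topic_1": ["aerosol", "albedo", "radiative forcing", "cloud"],
}

# Hypothetical concept labels drawn from an existing ontology.
ontology_labels = {"Sediment", "Erosion", "Aerosol", "Cloud", "Watershed"}


def normalize(label):
    """Lowercase and collapse whitespace so labels compare loosely."""
    return " ".join(label.lower().split())


def find_gaps(topics, ontology_labels):
    """Return, per topic, the terms that have no matching ontology label."""
    known = {normalize(label) for label in ontology_labels}
    return {
        topic: [term for term in terms if normalize(term) not in known]
        for topic, terms in topics.items()
    }


def candidate_relations(topics, ontology_labels):
    """Pair matched terms that share a topic as candidate relationships."""
    known = {normalize(label) for label in ontology_labels}
    pairs = set()
    for terms in topics.values():
        matched = sorted({normalize(term) for term in terms} & known)
        for i, first in enumerate(matched):
            for second in matched[i + 1:]:
                pairs.add((first, second))
    return pairs


gaps = find_gaps(topics, ontology_labels)
relations = candidate_relations(topics, ontology_labels)
```

Terms listed in `gaps` are candidates for new ontology concepts, and the pairs in `relations` are candidates for relationships to model. Both would still need the human interpretation that the abstract describes.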

Friday, December 13 - ED53B. Managing Ecological Data for Effective Use and Reuse II Posters
1:40 PM - 6:00 PM; Hall A-C (Moscone South)

1:40 PM Poster Session - ED53B-0637. In Their Own Words: Researchers’ Stories of Challenges and Triumphs in Data Management and Sharing
Stacy Rebich Hespanha; Sarah M. Menz

We will report on the DataONE Data Stories project, which is focused on collecting researchers’ stories about conflicts and successes that they have encountered when managing data and making efforts to share or re-use data. We highlight the types of events and situations that commonly lead to conflict or obstacles for the researchers we have interviewed. We emphasize in particular those difficulties for which no adequate technical solutions currently exist, or for which technical solutions do not seem to be appropriate, in the hope that our analysis of these stories will stimulate dialog about the kinds of technical, social, and cultural solutions most needed to accelerate growth in better data management and sharing.

Posted on December 4, 2013