Data Science Fellowship Application Information


NCEAS is seeking applications for the 2020 Data Science Fellows program. Our next session will begin as early as November 2019. This practicum-style program gives fellows the opportunity to gain practical knowledge and skills needed to manage national-scale data repositories.

Fellows will gain experience and mentorship in activities directly related to the research project undertaken. Example research and development opportunities are listed below; however, we also encourage the development of new projects and collaborations within this fellowship.


Example project opportunities

Python data and metadata processing library

We have a comprehensive suite of R packages that allow us to process data and metadata. We can make these tools accessible to a broader community of scientists by converting useful functions from our current R packages (arcticdatautils, datamgmt, metajam, recordr) to Python. This library, or set of libraries, should be built using the existing DataONE Python library as a dependency. This project will require familiarity with DataONE infrastructure and the current R packages before advancing to the development phase of the Python package.

Quality Assessment of Arctic Data Center metadata and data

The goal of this project is to incorporate the support team’s data and metadata quality standards into the automated Quality Reports accessed from specific landing pages in the UI. The existing metadata checks need to be extended with checks for consistency between data objects, system metadata, and metadata. This includes renovating existing checks of EML objects, reading and checking data objects against the EML, and evaluating system metadata across all objects. A sophisticated approach is required so that the UI display renders quickly in all browsers and operating systems and shows a clear breakdown of how scores are calculated. This project entails both front-end design and back-end development and provides ample opportunities to collaborate with the development and support teams.

Harmonization of data publishing and documentation R packages

We currently have several R packages that have grown organically through several rounds of development. The goal of this project is to consolidate our R packages into a coherent way for our users to interact with the DataONE API. We will use a hierarchical approach, with low-level packages designed for advanced users and public-facing packages for high-level interactions. The packages within the scope of the project are: dataone, datapack (helpers, public facing), arcticdatautils (helpers, non-public facing), datamgmt (helpers, non-public facing), recordr (provenance), and metajam (download helper, public facing). There will also be a need to organize interactions with external packages such as EML, Assembly line from EDI, rdryad, and zenodo. We need to determine how to take the best parts of each of these, remove redundancy, and combine them into a coherent set of packages on CRAN.

Education modules for undergraduate instruction

The Arctic Data Center supports the preservation and curation of research data from within the pan-Arctic region. The Center also provides documentation and training on best practices for data management. In this project, the fellow will create instructional materials designed for university-level students that can be taught as a component within established curricula. The materials will explore discovery, integration, and use of Arctic research data and provide information on using the Arctic Data Center as part of the hands-on lesson. Other modules will explore the importance and creation of metadata as well as policies surrounding data preservation, use, and reuse. Intended for use in classes focused on geography, environmental science, archeology, etc., the modules will be augmented with thematic case studies drawn from data within the Arctic Data Center. Finally, the fellow will support the publication of these materials to the website as part of the Skillbuilding Hub and promote them across appropriate research institutions and networks.

Data citation and reuse

The Arctic Data Center supports the preservation and curation of research data from within the pan-Arctic region. All current NSF-funded Arctic research metadata are deposited to the Arctic Data Center, and the Center also includes a large number of historic and previously funded data sets. Many of these data have been cited in publications by the data authors and by others conducting related or synthesis research. The database of these publications needs to be updated and the citations linked to the data within the Center. This project will use a literature and text search approach to identify relevant publications in peer-reviewed journals. In addition, the Arctic Data Center and the KNB Data Repository need a mechanism to track these data set citations and record them in a structured database as annotations that are accessible to the Metacat data management server. Another component of this project would be to work with the software development team at NCEAS to design and implement such an extension, and use it to trigger updates to our DOI metadata that are sent to DataCite for managed data packages.

Data Science support of LTER synthesis working groups (data harmonization and analysis)

The LTER Network Office (LNO) fosters enhanced communication, collaboration, synthesis, training, and engagement across the LTER Network. To promote analysis and synthesis of LTER data, the LNO funds and supports synthesis working groups. The NCEAS computing team provides data science support to these scientists and helps them with any data challenges they might have. This is a unique opportunity to get hands-on training in data science, work with synthesis projects, and interact with researchers. The tasks will include: helping with data acquisition, processing, and analysis challenges; wrangling heterogeneous ecological and climate datasets; testing and setting up web-based tools to assist with scientific collaboration; and preserving scientific products by documenting and archiving scientific findings.

The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching and service.

The University of California is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.