This practicum-style fellowship program gives early-career researchers the opportunity to gain the practical knowledge and skills needed to manage national-scale data repositories.
Through the program, fellows gain experience in one or more of the following activities:
- Solving data and software issues relating to environmental science, working closely with our data and informatics team
- Undertaking research related to open data infrastructure and practice
- Conducting outreach and creating learning materials designed to enhance awareness and understanding of reproducible research
Data Science Fellows are in residence at NCEAS for 8-12 months.
Fellows gain experience and mentorship in activities directly related to their research project, and we also encourage them to develop new projects and collaborations. Fellows can expect to gain:
- Programming skills, including skills in new data management tools and languages
- Exposure to the day-to-day activities of managing national-scale data repositories
- Data science research, development, and outreach experience with the NCEAS informatics team
- A deep understanding of data management and software for data systems
- Experience working with a team passionate about environmental data science
Example Project Opportunities
We have a comprehensive suite of R packages for processing data and metadata. We can make these tools accessible to a broader community of scientists by porting useful functions from our current R packages (arcticdatautils, datamgmt, metajam, recordr) to Python. The resulting library or libraries should be built on the existing DataONE Python library (https://github.com/DataONEorg/d1_python) as a dependency. The project requires familiarity with DataONE infrastructure and the current packages before development of the Python package begins.
The goal of this project is to incorporate the Arctic Data Center support team's data and metadata quality standards into the automated Quality Reports accessed from specific landing pages in the UI. In addition to the existing metadata checks, new checks are needed for consistency between data objects, system metadata, and metadata. This includes revising the existing checks of EML objects, reading data objects and checking them against the EML, and evaluating system metadata across all objects. The UI display must render quickly in all browsers and operating systems and must show a clear breakdown of how scores are calculated. The project entails both front-end design and back-end development and offers ample opportunities to collaborate with the development and support teams.
We currently have several R packages that have grown organically through several rounds of development. The goal of this project is to consolidate them into a coherent set of packages through which our users can interact with the DataONE API. We will use a hierarchical approach, with low-level packages designed for advanced users and public-facing packages for high-level interactions. The packages within the scope of the project are: dataone (R), datapack (helpers, public facing), arcticdatautils (helpers, non-public facing), datamgmt (helpers, non-public facing), recordr (provenance), and metajam (download helper, public facing). There will also be a need to organize interactions with external packages such as EML, Assembly Line from EDI, rdryad, and zenodo. The project will determine how to keep the best parts of each package, remove redundancy, and combine them into a coherent set of packages on CRAN.
The Arctic Data Center supports the preservation and curation of research data from within the pan-Arctic region. The Center also provides documentation and training on best practices for data management. In this project, the fellow will create instructional materials designed for university-level students that can be taught as a component of established curricula. The materials will explore the discovery, integration, and use of Arctic research data and will include a hands-on lesson on using the Arctic Data Center. Other modules will explore the importance and creation of metadata, as well as policies surrounding data preservation, use, and reuse. Intended for classes in geography, environmental science, archaeology, and related fields, the modules will be augmented with thematic case studies drawn from data within the Arctic Data Center. Finally, the fellow will support publishing these materials on the website as part of the Skillbuilding Hub and will promote them across appropriate research institutions and networks.
The Arctic Data Center supports the preservation and curation of research data from within the pan-Arctic region. Metadata from all current NSF-funded Arctic research are deposited in the Arctic Data Center, and the Center also holds a large number of historic and previously funded data sets. Many of these data have been cited in publications, both by the data authors and by others conducting related or synthesis research. The database of these publications needs to be updated, and the citations need to be linked to the corresponding data within the Center. This project will use literature and text searches to identify relevant publications in peer-reviewed journals. In addition, the Arctic Data Center and the KNB Data Repository need a mechanism to track these data set citations and record them in a structured database as annotations accessible to the Metacat data management server. Another component of the project is to work with the NCEAS software development team to design and implement such an extension, and to use it to trigger updates to the DOI metadata sent to DataCite for managed data packages.
The LTER Network Office (LNO) fosters enhanced communication, collaboration, synthesis, training, and engagement across the LTER Network. To promote analysis and synthesis of LTER data, the LNO funds and supports synthesis working groups. The NCEAS computing team provides data science support to these scientists and helps them with any data challenges they encounter. This is a unique opportunity to get hands-on training in data science, work on synthesis projects, and interact with researchers. The tasks will include:
- Helping with data acquisition, processing, and analysis challenges
- Wrangling heterogeneous ecological and climate datasets
- Testing and setting up web-based tools to support scientific collaboration
- Preserving scientific products by documenting and archiving scientific findings
The Next Generation of Environmental Scientists are Data Scientists
Hear from four of our previous data science fellows about what they found valuable about their experiences.
The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching and service.
The University of California is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.