A crisis is what led Chris Beltz to data science, at least in part. Like many ecologists, he had gone into science with the hope of creating information that could effect change, but midway through his PhD, when he realized proficiency with data management was going to be key to finishing his degree, he had a moment.
“I was really struggling with how to make my data reproducible in a meaningful and concrete way,” said Beltz, who is now in the last year of his PhD at Yale School of the Environment.
Fortunately, the opportunity to learn and apply data management skills outside of his own PhD research through the NCEAS Data Science Fellowship Program has since helped him see order – and possibility – in the complexity.
Beltz is among the latest crop of eight data science fellows who are wrapping up at NCEAS, the third cohort since the fellowship program’s inception in 2018. As part of the NCEAS Learning Hub, the fellowship is designed to be a skills accelerator that helps early-career researchers get their feet wet in skills that are less commonly taught in graduate programs in the life sciences: project management, collaboration, and, of course, data science.
Perhaps a less obvious benefit of the fellowship has been its ability to open fellows’ eyes to the career opportunities available to them, whether inside or outside of academia.
“No matter where you go – academia, the corporate world, or a non-profit – data science is useful,” said Erin McLean, who helps run the fellowship as the Community Engagement and Outreach Coordinator for the Arctic Data Center, an NCEAS partnership.
Since the first cohort, the program has evolved in response to alumni feedback: it now runs eight to twelve months (rather than six), fellows pick among pre-determined projects rather than designing their own from scratch, and there is a “warm-up” training period to allow fellows to get up to speed on the data and coding skills they will need. What hasn’t changed is the cohort model, which has proven key for camaraderie and network building, or the practice of matching each fellow with a mentor from NCEAS’ data science team.
McLean says the ability to work alongside professionals who aren’t their professors is important, as it opens fellows’ eyes to what is possible with data science and ecology.
“They learn there are alternative careers, and they can start to envision themselves in those careers, because now they know somebody,” said McLean.
Regardless of the role one ends up in, Beltz sees a huge market for data science skills, “even if that’s just on the receiving end of understanding complex systems, which is what PhDs in the life sciences do anyway,” he said.
Beltz took a break from his PhD program to do the fellowship. The big draws for him were the opportunity to use his predominantly self-taught R skills on data that were not his own and the chance to work more collaboratively, alongside established data scientists.
His project allowed him to dive into the weeds of reproducibility and unpack its meaning to science and the world. He focused on developing metrics to assess how well data meet the FAIR principles – Findability, Accessibility, Interoperability, and Reusability – in order to improve the reusability of data packages, an aspect of reproducibility he hadn’t thought about before the fellowship.
“We’ve spent all of this money and time – and, by we, I mean scientists at large – collecting data, trying to answer questions. If we can make this data accessible to more people after the fact, it leverages everything that we have done,” said Beltz. “And maybe the questions that get answered on the second or third order are equally or more important than the things we tried to do ourselves.”
A big benefit of the fellowship that Beltz emphasizes is increased confidence. He now sees how applicable his skills are and he has a bigger network he can tap for advice. The fellowship also helped Beltz realize he wants to be much more oriented towards computation and data science than he ever expected when he started his graduate studies.
“I had really expected to be in the field, collecting my own data, running my own experiments. I wanted to be a field ecologist,” he said. “It turns out that my interest in terms of questions is much broader than I knew.”
The fellowship is a doorway for more than just ecologists coming at it from the academic world. Early career professionals on quite different paths have funneled in, with the common denominator of a curiosity about how data can make positive change.
Maya Samet came at the fellowship from nearly the opposite direction from Beltz. For Samet, a statistics major who was fairly fresh out of college, the fellowship opened up a whole new world – ecology – and one that spoke to something she had not yet considered making part of her career: her passion for nature.
“The fellowship took me to a place where I have a meaningful connection with my work and feel like I have the potential to do good in the world,” said Samet.
Samet was able to apply her math and computation chops to her project in ecological informatics, which focused on developing a process for collecting data citation metrics. Unlike literature citation, data citation is currently a bit of a wild west. Her work will contribute to the open science effort to enable researchers to get proper credit for the datasets they make accessible to other researchers.
Samet says the fellowship opened her eyes to how relevant her skills are to the sciences and has given her a foot in the door for a science-oriented career, if she ends up going that route. She also now knows how important it is to her to choose a path that intersects with her passions.
“The fellowship changed my vision in that [an emotional connection to my work] is now something I will also consider in future job opportunities,” she said.
Like Samet, Sarah Erickson came to the fellowship after college and before graduate school, but her pathway in was quite different. She had been working in science communication, interning for WorldFish in Malaysia and, before that, at the Australian Research Council’s Centre of Excellence for Coral Reef Studies. What drove her to the fellowship was a desire for better data literacy.
“As I tapped my way into academia and what a PhD or Master’s might look like, everyone was telling me that I was going to have to learn quantitative methods and analysis, and it was a big realization that I would need to learn R or Python,” said Erickson, who was in the same fellowship cohort as Beltz and Samet.
She was hesitant at first to apply to the fellowship, since her work behind the computer thus far had little to do with numbers and data points. But she decided to take on the challenge because of her interest in data visualizations and a desire to one day be comfortable working with other people’s data, in addition to her own.
“I want to be able to work with scientists from a new perspective and be able to help them turn their data into something that is more than a piece of writing,” said Erickson.
For her fellowship project, Erickson was able to apply her science communication skills to develop educational modules and materials that teach undergraduate students how to use data repositories, a skill that is not typically taught at the college level.
The fellowship made Erickson feel well prepared and more marketable for her next step. She is looking to get more research and R experience and eventually apply to graduate school, likely in a program focused on the marine social sciences.
As she moves forward, Erickson says a big takeaway from the fellowship is the purpose-driven team spirit she experienced, despite the fact that COVID forced this cohort-oriented experience to go virtual.
“One thing I am actively thinking about is continuing the legacy of the lessons in team values that I picked up on,” said Erickson. “I feel really lucky to have tapped into this world.”
This latest cohort of data science fellows was funded primarily by the NSF Polar Program through the Arctic Data Center with additional funding from the Long Term Ecological Research (LTER) program.