By Kathryn Meyer
Retrieving information is basic to environmental data science, but it can also be among the trickiest of tasks, especially when one researcher has labeled their data as carbon dioxide and another as CO2, for example. Fixing these issues of semantics is central to what Steven Chong has embarked on in his fellowship.
Chong came to NCEAS from a place that is at once unlikely and logical: the School of Information at the University of Arizona. The fellowship has allowed him to combine his knowledge of information science with his passion for biodiversity and interdisciplinary research.
“My professional goal is to build a career that makes biological information more accessible and user-friendly,” said Chong, whose work has focused on controlling data semantics for the Arctic Data Center.
Chong has specifically been working on building a carbon cycling ontology, or a controlled vocabulary that specifies relationships between terms used to label data. This will help to organize data about the same things that researchers have given different names and also identify how concepts relate to each other.
In other words, he is making it easier for researchers to retrieve relevant data for carbon measurements regardless of what shorthand term another researcher has used to describe them. This work will help improve the accuracy and efficiency of searching for data in the Arctic Data Center’s catalog. Fortunately for everyone, Chong was recently hired full-time to bring his ontology project to completion.
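As a rough illustration of the idea, an ontology can be thought of as two lookups: one that maps the different labels researchers use onto a single concept, and one that records how concepts relate to each other. The short Python sketch below is a toy under stated assumptions, not Chong's actual ontology or the Arctic Data Center's search code; every label, concept name, file name, and function in it is hypothetical.

```python
# Hypothetical mini-vocabulary: labels and concept names are illustrative only,
# not taken from the Arctic Data Center's actual ontology.
SYNONYMS = {
    "carbon dioxide": "carbon_dioxide",
    "co2": "carbon_dioxide",
    "methane": "methane",
    "ch4": "methane",
}

# Each concept points to its broader concept (None at the top of the hierarchy).
BROADER = {
    "carbon_dioxide": "carbon_compound",
    "methane": "carbon_compound",
    "carbon_compound": None,
}


def canonical(label):
    """Map a free-text label to its concept, or None if the label is unknown."""
    return SYNONYMS.get(label.strip().lower())


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or sits somewhere beneath it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def search(datasets, query):
    """Return names of datasets whose label falls under the queried concept."""
    # Accept either a known label ("CO2") or a concept name ("carbon_compound").
    target = canonical(query) or query
    return [name for name, label in datasets.items()
            if is_a(canonical(label), target)]


# Hypothetical datasets labeled inconsistently by different researchers.
datasets = {
    "site_a.csv": "CO2",
    "site_b.csv": "carbon dioxide",
    "site_c.csv": "CH4",
}

print(search(datasets, "carbon dioxide"))
print(search(datasets, "carbon_compound"))
```

In this sketch, a search for "carbon dioxide" finds both the dataset labeled "CO2" and the one labeled "carbon dioxide," while a search for the broader concept also picks up the methane dataset.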
What are the most valuable things you learned from the fellowship?
SC: Proper documentation of your procedures and programming scripts is important. Not only is it important when working in a team environment but also for reminding your future self about work you did. There have been a couple of times, when reviewing past work, where I wished my notes were more thorough and I had to spend some time to remember what I did. From a reproducible science perspective, proper documentation will enable others to reuse and replicate the results you came up with.
How do you hope to apply what you learned during the fellowship in your career?
SC: I have become much more proficient with using R and also gained a stronger background in organizing information so that it is understandable to computers. Research organizations, such as natural history museums and academic libraries, are dealing with increasing amounts of data. It is not always feasible to manually process big datasets in a time-efficient manner. Being able to automate some of these tasks should be advantageous during a job hunt.
Why do you think the data science work you've done through the fellowship is valuable for science, policy and/or management?
SC: Data semantics should make the Arctic Data Center’s information easier to find and will benefit Arctic research. For example, researchers could find data that are currently not well-described with adequate metadata or not standardized to a common format. Policy makers and managers should also make better informed decisions because the information they retrieve from the Arctic Data Center would be more accurate and complete.
What’s your favorite or most frequently used emoji?
SC: My favorite emoji is the "dancing Zoidberg." I’m a big fan of the show Futurama. Who doesn’t like dancing crabs?