Susan Davidson discusses "Why Data Citation Is a Computational Problem," a Contributed Article in the September 2016 Communications of the ACM.
00:00 You've traveled the road to knowledge. Can you find that road again?
00:07 References show the way -- but they traditionally only point to other academic papers. What about when understanding comes from a database instead?
00:17 Join us as Susan Davidson shows us how new kinds of references can forge clearer paths to wisdom, and honor those who gave it to us, in "Why Data Citation Is a Computational Problem."
00:32 [Intro graphics/music]
00:40 The days of wandering dusty stacks are waning as more scientific information is found in digital databases. This change affects how we receive and communicate references.
00:55 DR. DAVIDSON: There's a large number of parts of a database, and there's an infinite number of queries that could be asked to get the information out of the database.
01:05 Some data comes not from researchers, but from devices like satellites.
01:10 DR. DAVIDSON: In this case, the scientist wants to be able to give credit to, or acknowledge, the tiles, the sets of information that they used out of this entire set of information during a particular scan of the Earth.
01:25 Then there's the problem of citations failing to credit the original researchers.
01:31 DR. DAVIDSON: The content of these databases is being generated by people! ... And to some extent people are doing it for free because they're members of the scientific community and they see a benefit to sharing that information. But unless they start getting credit, they may be less willing to contribute.
01:52 Database owners have tried to solve the problem in the past -- but with mixed results.
01:58 DR. DAVIDSON: There's a button here that says, "cite this resource". So I clicked it ... and what it comes back with is an "eagle ID" -- it's essentially the http address of this page. ... This is an example of wanting to do it, and getting part-way there, but not going the distance.
02:17 What's the solution? In the paper, Dr. Davidson and her colleagues show how data citations can be accomplished in three steps, using the IUPHAR Guide to Pharmacology as an example.
02:28 First, database owners define different types of citation. Then they create templates for each type. Finally, researchers create their citations.
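The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the formalism from the paper or the actual schema of the IUPHAR Guide to Pharmacology: the citation types, field names, template wording, and sample record are all hypothetical.

```python
from dataclasses import dataclass
from string import Template


# Step 1: the database owner declares which kinds of things can be cited.
# The type names and field lists are hypothetical, chosen for illustration.
@dataclass(frozen=True)
class CitationType:
    name: str
    fields: tuple  # metadata fields a citation of this type needs


TYPES = {
    "family": CitationType(
        "family", ("contributors", "family_name", "database", "accessed")
    ),
    "database": CitationType("database", ("database", "version", "accessed")),
}

# Step 2: the owner writes one template per citation type.
TEMPLATES = {
    "family": Template("$contributors. $family_name. $database, accessed $accessed."),
    "database": Template("$database, version $version, accessed $accessed."),
}


# Step 3: a researcher fills the template from the record they actually used.
def cite(type_name, record):
    ctype = TYPES[type_name]
    missing = [f for f in ctype.fields if f not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    values = {f: record[f] for f in ctype.fields}
    return TEMPLATES[type_name].substitute(values)


# Placeholder record; none of these values come from a real database.
record = {
    "contributors": "A. Author, B. Author",
    "family_name": "Example receptor family",
    "database": "Example Pharmacology Database",
    "accessed": "2016-09-01",
}
print(cite("family", record))
```

Keeping types and templates with the database owner means the people who contributed a given part of the database are credited the same way no matter which researcher cites it.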
02:37 The paper's authors believe that this system could also expand the kinds of citations available to us.
02:43 DR. DAVIDSON: I think that data citation of the future is going to look very different from what it looks like now. ... So we could think more dramatically about citations that have hundreds of authors, or perhaps hundreds of citations, or perhaps the closure of citations -- that is, all of the influences that led up to a particular piece of work.
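One way to read "the closure of citations" is as a transitive closure over a citation graph: everything a work cites, plus everything those sources cite, and so on. The sketch below assumes that reading. The graph, its item names, and the function name are hypothetical, not taken from the paper.

```python
from collections import deque


def citation_closure(cites, start):
    """Return every item reachable from `start` by following citations."""
    seen = set()
    queue = deque(cites.get(start, ()))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # already visited; also guards against citation cycles
        seen.add(item)
        queue.extend(cites.get(item, ()))
    seen.discard(start)  # a work is not counted as its own influence
    return seen


# Hypothetical graph: each key cites the items in its list.
cites = {
    "paper": ["dataset_a", "paper_b"],
    "paper_b": ["dataset_c"],
    "dataset_c": ["paper"],  # a cycle, to show that traversal still terminates
}
print(sorted(citation_closure(cites, "paper")))
```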
03:07 Whatever the future of data citation, the paper's authors say the time to settle the question is now.
03:14 DR. DAVIDSON: To many people the idea is pedantic. It's not a sexy topic, data citation: It's just what you put in the back of a paper, right? ... But I think given the landscape, in particular of the major federal funding agencies requiring people to be publishing their data products, that this is going to become an increasingly urgent problem.
03:41 Get all the details in the contributed article, "Why Data Citation Is a Computational Problem," in the September 2016 issue of Communications of the ACM.
03:52 [Outro and credits]