February 9, 2016
Libraries Support Data-Sharing Across the Research Lifecycle
OPEN DATA AS A “FIRST CLASS OBJECT OF SCHOLARLY COMMUNICATION”
Writing in the introduction to a 2015 research data-focused issue of the Journal of Librarianship and Scholarly Communication, Gail Clement and Lisa Schiff note “a growing consensus that the basic building blocks of knowledge…warrant the same degree of attention as the research papers that synthesize and interpret those raw artifacts.” This recognition is reflected in many areas. Since the early 2000s, federal funding agencies such as the National Institutes of Health and National Science Foundation have had explicit open data and data sharing policies. Recent updates to those policies in 2011 and 2012 impose delays to, or loss of, funding for noncompliance, creating substantial momentum that is driving engagement.
Private funding agencies have followed suit: major players such as the Gates, Ford, and Sloan Foundations require open data and data sharing. Scholarly journals across the spectrum have also addressed this issue, from mission-driven open venues like PLoS One, which promises to reject any articles that do not make their data publicly available, to traditional commercial journals like Nature, which makes data-sharing a condition of publication.
One of the chief drivers has been the 2013 Office of Science and Technology Policy memorandum, which directed federal agencies with more than $100 million in research and development spending to create plans that include “a strategy for improving the public’s ability to locate and access digital data resulting from federally funded scientific research.” In subsequent years, covered agencies have developed individual policies tailored to distinct agencies’ practices. Common themes across funding agencies include deposit in an established repository, open licensing using models like Creative Commons, and development of a data management plan (DMP) as part of the proposal to be sure that open data is baked into the research from the start. These mandates have created an environment where data sharing is both a powerful tool for advancing scientific progress and an urgent pressure felt by individual researchers.
LIBRARIES AS PARTNERS IN RESEARCH
This urgency is felt across most campuses; libraries have responded with a variety of scholar-facing services. Roughly half of all libraries at institutions considered universities under The Carnegie Classification of Institutions of Higher Education have some form of data support program, and my own at North Carolina State University (NCSU) is no exception. Our support was galvanized by the NIH’s 2012 public access mandate and our program has continued to evolve to address larger research data management issues. Like almost two-thirds of Carnegie university libraries, even those with a named expert, we do not have a full-time data librarian (as documented in a recent study by Kristin Briney, Abigail Goben, and Lisa Zilinski) so we manage our data services through a committee. As Hilary Davis, NCSU Libraries’ Head of Collection Management and Director of Research Data Services, told LJ, “with already-strong relationships established by subject liaison librarians, we hoped we could bring campus partners together to bridge decentralized approaches to data management.”
To help our researchers understand research data and data-sharing, our committee has developed a series of informational resources. We offer regular workshops and host the DMPTool, a resource that provides a template for designing a data management plan (DMP) as required by most funders. As Davis notes, the Libraries’ focus was initially on concrete outcomes that “provide a tangible service to help NC State’s researchers develop practical ways to manage, store, and share their research.”
NCSU also offers a DMP review service that helps individual faculty researchers meet their funders’ mandates through review of the researcher’s proposed DMP. In an illustrative case, we were approached by an NSF-funded faculty member, Dr. Tom Shriver. While his research on the use of propaganda to delegitimize protesters in authoritarian states was excellent, his data management plan needed some work. He shared his initial DMP with our committee, which reviewed the document and suggested changes, identified resources, and discussed strategies for storage and accessibility. With our support, Dr. Shriver revised his DMP and was awarded his NSF grant. Based on this experience, he has become a champion for the libraries, which, he said, have been “instrumental in helping me put together my data management plan” and thus his grant.
This collaborative approach to service, which connects diverse library expertise with researchers in their moment of need, has created opportunities to build networks within and beyond the library, to integrate library support into the research process, and to support open access to research data. Davis describes the way the libraries have also benefited from the “training ground” created by this review process, which “helped us to build RDM literacy into the existing skillsets of librarians through hands-on exposure to the very real needs of researchers.”
This training must be ongoing for librarians and researchers since technical resources and best practices are constantly in flux. Noting the challenges of supporting open data as new services emerge and funders refine their expectations, Davis concludes that “mandates from federal funding agencies are beginning to surface some standards for storage and public access to the results of research, but we are still operating in a shifting landscape.”
LIBRARIES FOSTER AN OPEN DATA ECOSYSTEM
As this landscape continues to evolve, librarians are also doing exciting work to help guide this evolution based on principles of openness and interoperability. Scholars such as Christine Borgman have argued that this practice is necessary to transition the traditional idea of a “data infrastructure” centered around scholarly journals or institutional repositories into a more robust “ecology” of research data. Creating this ecology will require evolution in many areas including developing new standards around validation of research data, processes for documenting provenance of data sets, and new types of governance and ownership of research data.
Librarians have been active in all of these spaces, making research data the focus of coordinated efforts. Libraries are minting digital object identifiers (DOIs) that identify specific datasets using services like DataCite and EZID. They are also doing rich work with metadata that facilitates discovery and reuse through individual consultations and the development of schema.
One of the major hurdles for open data that libraries are engaging with is storage. Many institutions, including my own, have an institutional repository designed to host scholarly articles, but there is not yet a mature ecosystem that supports the diverse forms of research data created by our researchers. As a result, the current state of data storage is a patchwork of institutional, subject-based, journal-based, and freestanding repositories. Large open source projects like Dataverse and commercial products like Figshare power many efforts, but researchers often prefer to use a repository tailored to their own discipline such as Dryad for the biosciences, arXiv for physics, or the Social Science Research Network.
The library community is also committed to developing the Sloan- and IMLS-funded Shared Access Research Ecosystem (SHARE) initiative that federates deposited scholarly materials, including data, to make them more discoverable. Created as an alternative to the publisher-focused Clearinghouse for the Open Research of the United States (CHORUS) model, SHARE is designed to support the data-sharing mandated by the White House Office of Science and Technology Policy by designing a system for openness and interoperability.
With global programs like Dataverse and SHARE, discipline-specific repositories, and a host of institutional solutions, librarians are sowing seeds for the data ecosystem described by Borgman, but each of these projects is at a different stage of maturity. As a result, while each of these initiatives presents exciting opportunities to manage and share research data, so far researchers remain most likely to find data sets through references in a specific journal article. Major questions remain around data ownership and the storage of sensitive data. As the Scholarly Publishing and Academic Resources Coalition (SPARC) notes in its own overview of open data, “despite its tremendous importance, today, research data remains largely fragmented—isolated across millions of individual computers, blocked by disparate technical, legal and financial restrictions.”
USING OPEN DATA
Managing research data to make it discoverable as a discrete unit of scholarship is also a developing effort for libraries, which are using data to create powerful visualizations. At NCSU these efforts have flowered in specific projects around GIS data, such as Dr. Helena Mitasova’s GRASS GIS project, around library data, and in partnership with our Triangle Research Libraries colleagues in the Duke Libraries and UNC-Chapel Hill Libraries. This is also an evolving space, but user-friendly tools like Tableau and Lyra can give librarians a chance to get their feet wet before graduating to more sophisticated tools like the R project.
Truly robust data sharing can also drive new discoveries for disciplines such as statistics that rely on large data sets for their own scholarship.
The work of making data truly open is just beginning and there are many opportunities for new libraries to engage. While there is no one-size-fits-all approach to research data, tools such as Amanda Whitmire’s Data management plans as A Research Tool (DART) rubric can help a library program get up to speed. A coalition of the major library organizations is also in the process of creating a set of competencies that can be used to audit and improve existing services. The 2013 Association of Research Libraries (ARL) Spec Kit 334: “Research Data Management” can tide eager librarians over until their release.
Interested librarians can also look forward to exciting support from the Association of College & Research Libraries (ACRL). Building on the popular Scholarly Communication: From Understanding to Engagement Roadshow, a new daylong Research Data Roadshow is in development, with plans to launch in the summer of 2016. One of the designers, Megan Sapp Nelson, an Associate Professor of Library Sciences in the Purdue University Libraries, described the project: “As a subject liaison who has been working with data over the past five years, I’ve experienced the learning curve for taking on data services,” she said. “I’m excited by the challenge of presenting an admittedly dense topic in an engaging and approachable way for other liaisons who are beginning to consider their own engagement in data services.”
Yasmeen Shorish, a member of the ACRL’s Research and Scholarly Environment Committee planning the Roadshow, added that ACRL “recognized that data management is an emerging area of need across all academic libraries” and is designing the Roadshow to help librarians prepare to engage at all levels. “As libraries endeavor to support the entire lifecycle of scholarship, across myriad media, professional development in this area is vital to the success of that mission.”
Back at North Carolina State, Hilary Davis agrees: “We knew that providing a robust program to support research data management and discovery of NC State’s research assets could position the NCSU Libraries more firmly as a part of the research process.”
Author: William M. Cross