Georgina Araceli Torres Vargas

El manejo de datos. Aproximación desde los estudios de la información

All rights reserved © Universidad Nacional Autónoma de México, Instituto de Investigaciones Bibliotecológicas y de la Información

ISBN: 978-607-30-2710-6

First edition, 2020

Series: Tecnologías de la Información

Peer-reviewed publication

This work is licensed under a Creative Commons BY-NC-SA 4.0 license

Contents

Minería de Texto y Minería de Datos

Sistematización de datos y servicios de información

Research Data Management and Libraries: Opportunities and Challenges

Presentación

Identificación de los temas de investigación en los documentos científicos del Colegio de Postgraduados

Minería de texto aplicada a un diagnóstico de usuarios en Ciencia y Tecnología: aprendizajes para fortalecer la investigación bibliotecológica

Minería de Datos, el caso de estudio de la Biblioteca Dr. Valentín Gómez Farías de la Facultad de Medicina de la UNAM.

Integración de los principios de linked data en el registro bibliográfico

Plan para el desarrollo de la Ciencia de Datos y Big Data (PDCDBD) en la UNAM con fines académicos y administrativos

SISTEMATIZACIÓN DE DATOS Y SERVICIOS DE INFORMACIÓN

Research Data Management and Libraries: Opportunities and Challenges

KRYSTYNA K. MATUSIAK

University of Denver

INTRODUCTION

Research Data Management (RDM) is a new area of service and infrastructure development at universities and research centers worldwide. The increasing volume and complexity of digital data, as well as the challenges associated with the organization, preservation, and reuse of data, have contributed to the emergence of RDM as a priority in recent years. Modern science has become increasingly data-intensive, with researchers using new methodologies and instruments and producing an unprecedented amount of data (Borgman 2012). Digital technology has accelerated this process by providing new tools for collecting scientific evidence, and it has also enabled the building of technical infrastructure for storing and sharing data. Researchers studying the growth of science found that global scientific output doubles every 9 years. Most of the scientific expansion has taken place in the modern era, with a growth rate of 8 to 9% (Bornmann & Mutz 2015).
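The two growth figures cited above can be related through a simple compound-growth calculation. The following is a minimal sketch, assuming that the 8 to 9% rate is annual and constant; that assumption, and the function name `doubling_time`, are illustrative and not taken from the cited study.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double under constant compound growth.

    Assumes the rate is annual and constant (an illustrative assumption).
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    # Solve (1 + r) ** t == 2 for t.
    return math.log(2) / math.log(1 + annual_growth_rate)


for rate in (0.08, 0.09):
    years = doubling_time(rate)
    print(f"{rate:.0%} annual growth -> doubles in about {years:.1f} years")
```

Under these assumptions, growth of 8% corresponds to a doubling time of about 9.0 years and growth of 9% to about 8.0 years, which is consistent with the reported doubling period.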

The motivations for deploying RDM services are diverse, often emerging from a pragmatic need to comply with requests from funding agencies for data management planning, but also responding to the policy environment and calls for openness in science (Ayris et al. 2016; Fearon et al. 2013; Pryor et al. 2013). National funding agencies in several countries now require researchers to prepare data management plans and to provide open access to data (NSF; UK Research and Innovation). The European Research Council (ERC) supports the principle of open access to research data and scholarly publications. It conducted a Pilot on Open Research Data for research projects funded through the Horizon 2020 program. As of 2017, the Pilot on Open Research Data was extended, and open access became the default for research data generated with Horizon 2020 funding, although researchers can still opt out in some circumstances (ERC 2018). In addition to funder requirements, journal editors and publishers are increasingly asking authors to provide open access to the source data underpinning publications.

This paper provides an overview of RDM services and their importance in the context of Open Science. It summarizes the findings from the Data Curation project sponsored by the International Federation of Library Associations (IFLA) Library Theory and Research (LTR) Section. The IFLA study focused on the roles and responsibilities of RDM professionals in international and interdisciplinary contexts. This paper discusses the opportunities and challenges in providing RDM services in light of the findings from the IFLA Data Curation project.

OPEN DATA AND THE OPEN SCIENCE MOVEMENT

In the traditional scholarly communication model, scholars disseminated the results of their research through conference presentations, books, and articles published in peer-reviewed, subscription-based journals. The Open Access (OA) movement has changed the model of scholarly publishing by encouraging scholars to share their papers through open access publishing or by depositing published articles in institutional or disciplinary repositories (Swan 2012). The emphasis of OA, however, has been almost exclusively on opening access to journal articles rather than to research data. As Borgman (2015) notes, open data is “substantially distinct from open access to scholarly literature” (p. 44). Researchers would sometimes share data sets with colleagues in the scholarly community but rarely provided open access to them as part of traditional scholarly communication practice.

Data is a valuable output of scholarly work, and the calls for providing open access to research data come not only from funding agencies but also from members of the scholarly community. Opening access to data is believed to contribute to the transparency and reproducibility of research and to a more efficient scientific process (Kraker et al. 2011; Molloy 2011; Nosek et al. 2015). Open research data can be freely accessed, reused, and redistributed for scholarly purposes. The principles of FAIR data (findable, accessible, interoperable, and reusable) provide a foundation for access and reuse of research data across disciplines and borders (Wilkinson et al. 2016). Open Data is a key component of the Open Science movement.

The Open Science movement advocates for opening all phases of the research cycle and sharing all outcomes of scientific work (Foster 2018). It emphasizes a more open, inclusive, and collaborative research process and encourages new ways of diffusing knowledge by using digital technology. The term “Open Science” often serves as an umbrella term encompassing scholarly outputs, practices, and collaborative digital tools. In its broad understanding, it includes open data, open publications, open educational resources (OER), open source software, open peer review, and citizen science (Bezjak et al. 2018). Fecher and Friesike (2014) note the diversity and even ambiguity of the discourse on Open Science and identify several perspectives or “schools of thought,” ranging from making knowledge freely available for everyone to developing an alternative system for evaluating quality and measuring impact.

Vicente-Sáez and Martínez-Fuentes (2018) acknowledge the diversity of perspectives and concepts of Open Science in their systematic review of the scholarly literature. The authors provide an integrated definition to stimulate a debate about the social, economic, and human added value of Open Science. As a result of their analysis, Open Science is defined as

the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods. In a nutshell, Open Science is transparent and accessible knowledge that is shared and developed through collaborative networks (Vicente-Sáez & Martínez-Fuentes 2018).

The concept of Open Science and the FAIR data principles have been embraced by the European Commission and incorporated into the European Open Science Cloud roadmap (European Commission 2018). A recent report examines the range of data skills needed to support the implementation of the FAIR principles and distinguishes between research community skills, data science, and data stewardship (Hodson et al. 2018). The proponents of Open Data recognize that not all data can be open and acknowledge the need to balance openness with the protection of sensitive data (European Commission 2016). Qualitative and personal data in the social and health sciences pose many challenges for sharing. Some data can be anonymized and released, while other data sets will need to remain closed. The European Commission promotes the principle that data should be “as open as possible, as closed as necessary” (European Commission 2016, p. 4). Research data management is a critical component of opening and sharing data and of determining the appropriate levels of openness.

ACADEMIC LIBRARIES AND RDM

The data-intensive research environment and the movement towards Open Science present new opportunities for library professionals. University libraries in many countries have been assuming leadership roles in promoting open access and offering RDM services. Traditionally, libraries provided data services for their users by acquiring datasets and ensuring their discovery and access. The new environment challenges libraries to move beyond the traditional service roles of facilitating the discovery and delivery of information resources (Fearon et al. 2013). It encourages a more participatory role in the research process and the development of new services to actively support scholars in managing and preserving research data. The concept of the data life cycle plays a central role in developing and organizing RDM consultative and technical services (Carlson 2014). Librarians offer unique expertise in metadata and archiving and add value at different points of the data life cycle.

Over the last decade, academic libraries have begun to provide a broader range of data management services to support researchers in meeting the requirements of funders and publishers. Academic librarians with expertise in RDM, who help researchers comply with funder requirements and prepare data for release, are a vital part of these services. The development of RDM services and the roles of academic libraries in data stewardship have been the subject of extensive survey research (Cox & Pinfield 2014; Tenopir, Birch, & Allard 2012; Tenopir et al. 2015). This research focused on the types of services offered by academic librarians, their maturity levels, and plans for future development. The findings indicate that academic libraries mostly offer consultative services and training, especially for data management planning. Technical services, such as maintaining a data repository and supporting data archiving, were limited. Many researchers see RDM services as an extension of traditional academic library roles in outreach and training.

Most of this research, however, focused on academic libraries in the United States and the United Kingdom. More recently, Tenopir et al. (2017) conducted a survey of research data services in European academic libraries. The study indicates that more European libraries currently offer consultative services than technical ones, but that they also manage infrastructure for data storage and collaborate with other units on campus. Cox et al. (2017) expanded the coverage to seven countries and provided an international comparison of several aspects of RDM development, including policy and governance, types of services, and staff deployment and skills. The IFLA Data Curation project built upon this prior research and expanded it by providing an international and interdisciplinary perspective. The design of the study and the findings are reported in a forthcoming paper (Tammaro et al. forthcoming). The preliminary findings about the types and structure of RDM services were presented at the Association for Information Science and Technology conference (Matusiak & Sposito 2017).

IFLA DATA CURATION PROJECT

The primary objective of the IFLA LTR project was to identify the roles and responsibilities of RDM practitioners working in multiple countries. The study also focused on the terminology used to describe the emerging practices and new professional roles. The study was designed using a mixed-method approach and consisted of three phases:

Comprehensive literature review and data mining to analyze the terminology used to describe the emerging practices and new professional roles

Quantitative content analysis of job announcements for data curators and RDM librarians

Semi-structured interviews with professionals working as data librarians, data curators, or research data managers.

The quantitative phase of the study concentrated on the content analysis of job announcements drawn from a variety of library and information science job posting sites, including those of the International Association for Social Science Information Services and Technology (IASSIST) and Code4Lib. The goal of the content analysis was to examine the titles, roles, responsibilities, qualifications, and competencies listed in the advertised positions. The data set included 441 job advertisements. Most of the analyzed positions (73.6%) were based in the United States. However, the data set also had some international coverage. The widest distribution came from Europe, with 17 European countries in the sample.

The findings from the quantitative analysis of job announcements indicate a wide variation in titles used to identify positions. There was no single title standing out as a standard for the discipline. The most common titles included librarianship in some form, such as Data Services Librarians, Digital Scholarship Librarians, or Research Data Management Librarians. The positions were frequently advertised under a wide variety of titles often with additional data-related responsibilities, such as data science or data reference services. In the analyzed data set, RDM services were located primarily (84.2%) in universities and academic libraries. The range of responsibilities also reflects the influence of librarianship with the top responsibilities in public services including instruction, reference, and outreach. However, a degree in librarianship was required in only 27% of the job advertisements.
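The kind of tally that underlies a quantitative content analysis of job announcements can be sketched briefly. The records below are hypothetical and hand-coded purely for illustration; they are not the study's data, and the field names (`title`, `country`, `mlis_required`) and the helper `share` are assumptions introduced for this example.

```python
from collections import Counter

# Hypothetical coded records; the study's 441 advertisements are not reproduced here.
ads = [
    {"title": "Data Services Librarian", "country": "United States", "mlis_required": True},
    {"title": "Digital Scholarship Librarian", "country": "United States", "mlis_required": False},
    {"title": "Research Data Management Librarian", "country": "United States", "mlis_required": False},
    {"title": "Data Curator", "country": "Germany", "mlis_required": False},
]


def share(records, predicate):
    """Percentage of records that satisfy the predicate, rounded to one decimal."""
    if not records:
        return 0.0
    matches = sum(1 for record in records if predicate(record))
    return round(100 * matches / len(records), 1)


# Frequency of each coded value, as in a basic content-analysis codebook.
title_counts = Counter(ad["title"] for ad in ads)
country_counts = Counter(ad["country"] for ad in ads)

us_share = share(ads, lambda ad: ad["country"] == "United States")
mlis_share = share(ads, lambda ad: ad["mlis_required"])

print(country_counts.most_common())
print(f"Based in the United States: {us_share}%")
print(f"Library degree required: {mlis_share}%")
```

Percentages such as those reported above would result from applying the same counting to the full coded data set.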

In the qualitative phase, semi-structured interviews were conducted with professionals working as data librarians, data experts, data curators, or research data managers. The goal of the interviews was to gain insight into the practice of research data management and to examine the services from the perspective of the professionals working in the field. The interviews were conducted with 26 professionals from Australia, Canada, the United States, and six countries in Western Europe. The study participants were employed at 24 organizations, including:

Academic libraries (19)

Campus-wide research data service centers (3)

University departments (2)

Data archive (1)

Research center (1).

All participants held master’s degrees; 15 of them held a Master of Library and Information Science (MLIS). Ten participants had PhDs in a variety of disciplines, including biology, environmental science, history, information science, medical informatics, and philosophy. The participants held different position titles, although many of their responsibilities and job functions overlapped. Several participants, working mostly in Europe, did not have an MLIS but had advanced disciplinary degrees and prior research experience. The variety of titles confirmed the findings from the quantitative phase of the study.

Despite the differences in position titles and terminology, the study found a sense of shared purpose, or even mission, among the participants. Professionals across institutional and national settings emphasized that their primary roles and responsibilities involved assisting researchers in meeting funder requirements, improving data management practices, and ultimately contributing to a more efficient research process and better-quality data. Several participants mentioned the end goal of “making data more usable” (P-L, Interview) and efforts to advocate for the FAIR data principles. The participants emphasized that although assisting researchers with meeting funders’ requirements was one of the immediate goals, they also wanted to improve research practices, as stated by Participant V: “that’s really what we want to be leading to, it’s not just about compliance but actually trying to change research culture and get people to think it’s good research practice” (P-V, Interview).

The types of RDM services identified in this study encompassed both consultative and technical services. The concept of the research data lifecycle played a central role in organizing and structuring services. All professionals participating in this study were engaged in consultative services, outreach, and open access advocacy. The consultative, informational services were typically offered at the beginning of the research cycle in the form of one-on-one consultations, workshops and seminars for faculty and graduate students, or online tutorials and guidelines. The consultative services focused on offering guidance and support in:

Meeting compliance with funders’ requirements

Developing data management plans (DMP)

Following data management best practices

Adhering to data citation standards

Promoting open access and data sharing

A smaller number of participants assisted researchers with technical aspects of depositing data in repositories and archival storage. Technical services were usually offered at the end of the research data life cycle. Technical infrastructure and the level of support depended on institutional settings. Technical services involved offering support in:

Data management

Data formats and file naming conventions

Data cleaning and verification

Data conversion

Data description and documentation

Metadata creation using standardized schemas

Data deposit/publishing

Ingest into repository systems

Assigning identifiers

Data anonymization

Data security

Archiving and preservation
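Two of the technical tasks listed above, applying a file naming convention and checking a metadata record before deposit, can be illustrated with a short sketch. The naming pattern and the selection of required fields are assumptions made for illustration only. The field names follow Dublin Core element names, but the study does not say which schemas or conventions the participants used.

```python
import re
from datetime import date

# Illustrative selection of Dublin Core element names; a real repository
# would define its own required fields.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")


def missing_fields(record: dict) -> list:
    """Return the required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS
            if not str(record.get(field, "")).strip()]


def standard_file_name(project: str, description: str, created: date,
                       version: int, extension: str) -> str:
    """Build a file name of the form project_description_YYYYMMDD_vNN.ext.

    The pattern is a hypothetical convention, not one prescribed by the study.
    """
    def slug(text: str) -> str:
        # Lowercase and replace runs of non-alphanumeric characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{created:%Y%m%d}_v{version:02d}.{ext}"


record = {"title": "Soil survey readings", "creator": "Example Lab", "date": "2019-05-03"}
print(missing_fields(record))
print(standard_file_name("Soil Survey", "Raw readings", date(2019, 5, 3), 1, "CSV"))
```

A check of this kind could run before ingest into a repository system, so that records lacking an identifier or a rights statement are flagged for follow-up with the researcher.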

The participating information professionals often acted as mediators between different stakeholders, building networks of expertise and community around good research practices. Their work required some technical skills and knowledge of new technological solutions, since they often made recommendations to researchers and led RDM initiatives at their institutions. The new and evolving character of the positions required expertise in multiple areas and the ability to adapt to a changing environment. Specific technical expertise and the level of required skills depended on institutional settings. The study participants emphasized that it is often impossible for one person to possess all the skills and competences listed in job descriptions. A lack of technical skills and hands-on experience with databases and scripting was mentioned as a gap among professionals with library backgrounds.

RDM services were primarily located in academic libraries as part of research and consultation departments or digital scholarship units. University libraries represented the largest group in the sample, but the types of services, their stage of development, and the level of support for researchers varied greatly between the sites. In the early stage of RDM development, academic libraries usually focused on needs assessment, outreach, training, and open access advocacy and provided consulting services on developing DMPs, metadata, and data curation practices. Academic libraries with more advanced RDM services offered assistance not only with DMPs and metadata but also with data citation, data sharing, and the technical aspects of depositing data in repositories.

The study, however, demonstrated that academic libraries are not the only centers of RDM services on university campuses. It identified new organizational strategies, including embedded services, distributed networks of RDM expertise, and multi-purpose research data service centers. In the embedded model, librarians worked on faculty-led research projects and in research labs throughout the university. They provided support not only at the beginning and end of the research cycle but also shared expertise and advice on best data management practices throughout the research projects. Distributed networks often had formal structures and comprised professionals with expertise in RDM, IT, copyright, research ethics, and scholarly communication. Academic librarians often served as coordinators and referred researchers to the relevant “pockets of expertise” in the campus network. Distributed networks represented efforts in community building around improving data management practices and opening data.

Campus-wide research data service centers represent a new model that reflects an evolution of services and a recognition that a more comprehensive suite of skills and expertise is necessary to support data management. Three cases were identified in the sample: one in the United States and two in Europe. Both European data service centers evolved from RDM services originally located in academic libraries. These new interdisciplinary initiatives involved cross-campus collaboration and the cooperation of several units, including the university library, IT department, legal services, and office for research. Research data service centers tended to be multi-purpose: they provided university research communities with the expertise, tools, and infrastructure necessary to manage research data and also offered support for other forms of scholarly activity. Academic librarians were employed there alongside IT specialists and legal experts.

The findings of the study indicate that RDM is an evolving sociotechnical practice that involves not only technical systems and services structured around the research data life cycle but also a range of social activities. The work of RDM professionals in improving data management practices and advocating open access occurs on multiple levels, starting with individual researchers and their teams, continuing with networks built at their institutions, and then expanding to regional, national, and international communities. The theme of shared values and changing research culture was discussed by participants from multiple countries, pointing to the emerging international character of the RDM profession. Community building emerged as an essential requirement for research data management and involved a shared understanding of the benefits of managed data and the impact of open data on scholarship and society.