Open Research Data
Do you have any doubts? Do you want to consult your Data Management Plan, get advice, choose a repository, talk about FAIR or opening data?
Please contact us:
Department of Digital Access to Collections – cyfrowa@umlub.pl,
tel: +48 81 448 58 13,
Main Library building, 18 Szkolna Street, second floor, room 213.
We answer questions, solve problems, and dispel doubts.
We remind you that the National Science Centre (NCN) requires a Research Data Management Plan to be attached to the project application form, and also obliges grantees to make research data available in open access – unless there are exceptional circumstances.
Repositories
We recommend the general-purpose repository RepOD (parent collection RepOD), or a repository that matches the scope of your research (search engine: https://www.re3data.org/).
Recommended materials
Guidelines for NCN applicants on completing the Data Management Plan in a research project
Natalia Galica's presentation “Open data research in the policy and practice of the National Science Centre” (CC BY license)
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18.
Huerta, E. A., Blaiszik, B., Brinson, L. C. et al. FAIR for AI: An interdisciplinary and international community building perspective. Sci Data 10, 487 (2023). https://doi.org/10.1038/s41597-023-02298-6.
FAQ
- What types of data do we consider to be research data that can be collected and made available in research data repositories?
A. The types of data collected and shared are very diverse. They depend on the field of science and the research methodology adopted. These include:
- Text documents, notes
- Numerical data
- Questionnaires, surveys, survey results
- Audio and video recordings, photos
- Database content (video, audio, text, images)
- Mathematical models, algorithms
- Software (scripts, input files…)
- Results of computer simulations
- Laboratory protocols, methodological descriptions
- Samples, artifacts, objects
- How should the principle “as open as possible – as closed as necessary” be interpreted with regard to research data?
A. According to the principle "as open as possible – as closed as necessary", research data should be made available immediately after the project is completed or with the first dissemination of research results, e.g. at a conference, in an article or in another form of publication. Opening research data means that the data are deposited in repositories and made available free of charge to anyone interested. Sometimes, however, some research data cannot be made available in an open model (e.g. because of copyright law, the commercialization of research results, or patent proceedings). In such cases an embargo is imposed on the data, i.e. they are temporarily excluded from open access, or so-called "on demand" access is possible, i.e. access granted on request with the researcher's consent.
- What exactly does “raw research data” mean?
Answer: The term "raw research data" refers to data generated directly by research instruments. Such data are also called unprocessed or primary data, i.e. data that have not been processed by the researcher or by tools for analyzing the research material. Once processed, raw data become processed data.
- In what formats should research data be saved?
A. Research data can be saved in any format, but to ensure universal access and openness it is best to use formats that do not require commercial software to read the data. When planning how research data will be saved, also consider whether open or closed (proprietary) formats are available:
open formats include:
- for text, spreadsheet and presentation files – csv, odt, ods, odp, rtf, txt, html, xml
- for graphic files – png
- for audio files – flac
closed formats include:
- for text files – doc, docx
- for text and graphic files – pdf
- for graphic files – tiff
- for spreadsheets and databases – xls
- for audio files – mp3
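As a minimal sketch of saving tabular results in an open format, the Python snippet below uses only the standard csv module, so the output can be read in any text editor or spreadsheet program without commercial software. The function names and example columns are hypothetical.

```python
import csv
import io
from pathlib import Path


def rows_to_csv_text(header, rows):
    """Serialize a header and rows to CSV text (comma-separated, minimal quoting)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default "excel" dialect: commas, \r\n line endings
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_as_csv(header, rows, path):
    """Write the CSV text to a UTF-8 file."""
    text = rows_to_csv_text(header, rows)
    # newline="" keeps the \r\n line endings produced by the csv module unchanged
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(text)
    return Path(path)


if __name__ == "__main__":
    # Hypothetical example: sample identifiers, measured values and comments
    header = ["sample_id", "concentration_mg_l", "comment"]
    rows = [["S01", 12.5, "baseline"], ["S02", 9.8, "after treatment, day 7"]]
    print(rows_to_csv_text(header, rows))
```

A value that contains a comma, such as the second comment, is automatically wrapped in quotes, so the file stays unambiguous for other software.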
- How detailed should research data metadata be?
A. The level of detail in the description of research data depends primarily on the research project manager and the research team, and on their needs and expectations regarding how fully the data should be characterized. It is worth consulting the editors and managers of the research data repository (data steward, data librarian, data curator) about the level of detail in the metadata. Basic metadata includes: author, title, keywords, funding institution, scientific discipline, license, etc. Metadata is a tool that facilitates the identification, use and management of data, so all types of metadata should be included.
- What is metadata?
A. Metadata is data about data. It is essential for organizing access to research data, for understanding the data and characterizing their content and form, and for enabling their reuse. There are three main types of metadata:
– Descriptive metadata – provides the information needed to find and identify a dataset. It may include the title, data author, abstract and keywords.
– Structural metadata – describes the relationships and dependencies between individual datasets and between the elements of these sets, e.g. to facilitate navigation.
– Administrative metadata – helps in managing a specific data resource. It contains information about how and when (i.e. the date) the data was created, the file type, and access conditions. Two subtypes of administrative metadata are commonly distinguished:
- rights management metadata, relating to intellectual property rights,
- preservation metadata, containing the information needed to archive and maintain the resource.
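To make the three types concrete, here is a hypothetical Python sketch that groups the fields of one dataset description by type. The class and field names are illustrative assumptions, not any repository's schema or a metadata standard.

```python
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DescriptiveMetadata:
    """Helps users find and identify the dataset."""
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]


@dataclass
class StructuralMetadata:
    """Describes how the parts of the dataset relate to each other."""
    parts: dict[str, str]  # file name -> role of the file within the dataset


@dataclass
class AdministrativeMetadata:
    """Supports managing the resource."""
    created: date
    file_types: list[str]
    access: str
    license: str  # rights management metadata
    checksums: dict[str, str] = field(default_factory=dict)  # preservation metadata


@dataclass
class DatasetMetadata:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


def missing_descriptive_fields(meta):
    """Return the names of descriptive fields that were left empty."""
    return [name for name, value in asdict(meta.descriptive).items() if not value]


def as_record(meta):
    """Convert to plain dictionaries, e.g. before saving as JSON."""
    record = asdict(meta)  # nested dataclasses become nested dicts (deep copy)
    record["administrative"]["created"] = meta.administrative.created.isoformat()
    return record


if __name__ == "__main__":
    meta = DatasetMetadata(  # hypothetical values
        DescriptiveMetadata("Example dataset", ["Kowalski, Jan"], "", ["blood pressure"]),
        StructuralMetadata({"data.csv": "measurements", "README.txt": "describes data.csv"}),
        AdministrativeMetadata(date(2024, 5, 1), ["csv", "txt"], "open", "CC BY 4.0"),
    )
    print(missing_descriptive_fields(meta))  # the abstract is still empty
    print(as_record(meta))
```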
- Who decides which metadata format to use?
A. There are no top-down requirements, e.g. from the NCN, to use a specific format for describing research data. Among others, the NCN recommends popular, widely used formats such as Dublin Core, DataCite and DDI, but the final decision is made by the research project manager together with the research team and an advisor (repository editor, data steward, data librarian).
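As an illustration of what a Dublin Core description can look like, the sketch below uses Python's xml.etree.ElementTree to build a small record from the 15 elements of the Dublin Core Metadata Element Set (namespace http://purl.org/dc/elements/1.1/). The wrapping `record` element and all values are hypothetical; each repository defines its own container and deposit forms, so this is not a deposit format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core Metadata Element Set namespace
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix

# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build an XML string from (element, value) pairs; elements may repeat."""
    root = ET.Element("record")  # hypothetical container element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record([
        ("title", "Example dataset (hypothetical)"),
        ("creator", "Kowalski, Jan"),
        ("creator", "Nowak, Anna"),
        ("subject", "blood pressure"),
        ("type", "Dataset"),
        ("format", "text/csv"),
        ("rights", "CC BY 4.0"),
    ]))
```

Pairs are used instead of a dictionary because Dublin Core elements such as creator may appear more than once.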
- Do repositories charge for depositing and sharing data?
A. Most repositories worldwide do not charge users any fees. Access is based on an account set up in the system or on agreements signed with the institutions that employ the depositing researchers.
- What is the research data deposit process?
A. Depositing is the process of placing a set of files containing data in a repository. The files are related: they belong to a single publication, scientific project or experiment. This relationship is recorded in the metadata.
- How to ensure long-term archiving of research data?
Answer: Long-term archiving means storing research data over an extended period. This process should be planned and described, among others, in the data management plan (DMP), specifying when and where the data will be stored. When choosing an external institution that provides a research data repository, consider, among others: whether it has a plan for long-term data storage, whether the files in which the data is saved can be described with metadata, who is responsible for access to the data (e.g. for 10 or 15 years), who finances the repository, and what the storage conditions are.
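One common way to confirm that archived files have not changed over many years is fixity checking: recording a checksum of each file when it is deposited and recomputing it later. The Python sketch below, using only the standard library, builds a SHA-256 manifest for a folder and reports files that are missing or altered. The function names are hypothetical, and this does not describe any particular repository's procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to the folder) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder, manifest):
    """Return the files that are missing or whose checksum no longer matches."""
    folder = Path(folder)
    return [
        rel for rel, expected in manifest.items()
        if not (folder / rel).is_file() or sha256_of(folder / rel) != expected
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "data.csv").write_text("id,value\nS01,12.5\n", encoding="utf-8")
        manifest = build_manifest(d)         # checksums recorded at deposit time
        print(verify_manifest(d, manifest))  # nothing has changed yet
        Path(d, "data.csv").write_text("id,value\nS01,99\n", encoding="utf-8")
        print(verify_manifest(d, manifest))  # the modified file is reported
```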
- What are the FAIR principles?
A. FAIR Data stands for:
- Findable – easily found and searched
- Accessible – available to everyone
- Interoperable – compatible, so that it can be combined with other data
- Reusable – suitable for repeated use.
This means that research data should be:
(a) possible and easy to find – through metadata, persistent identifiers, indexing;
(b) accessible from open repositories, also based on metadata, unique identifiers and open communication protocols;
(c) interoperable, i.e. processable, subject to exchange, connection and linking processes with data from other studies, deposited in other computer systems, programs and databases; the format of data and metadata should enable their trouble-free reading and lead to related resources/objects via links;
(d) accessible and reusable under a specific, published license; the content of the metadata should enable researchers to assess the extent to which data from other authors are useful in the context of their own research.
The FAIR Data principles serve as guidelines for enabling the reuse of scientific data under clearly described conditions, by both humans and machines. A detailed characterization of FAIR Data is available at: .
- What is opening research data?
Answer: Opening research data means sharing datasets and other results of scientific work in repositories or on other platforms so that they can be reused free of charge and without technical or legal barriers, while respecting intellectual property rights. Institutions that fund research require research data to be opened. Open data also supports replicating studies and verifying research results.
- What are datasets?
A. Datasets are packages of research data and metadata; they contain the broadest possible spectrum of research data and information about research data. They present research data in the context of the research conducted, experiments, conclusions derived from them, reports and publications.
- What is a Data Management Plan?
A. A Data Management Plan (DMP) is a document that provides information about the research data that is planned to be generated and how it will be managed throughout its life cycle. A research data management plan includes the following topics:
(a) what data will be created or collected (file format and type, amount of data),
(b) how the data will be organized and described (methodology, standards, metadata),
(c) ethical and legal issues (intellectual property, copyright, classified data),
(d) how the data will be shared (how, when, to whom),
(e) which data will be preserved long term, and how the data will be stored and protected.
The DMP document is developed in connection with a specific research project. It indicates the person responsible for managing and sharing data. The DMP is a requirement of granting institutions, including the NCN.
- Which institutions require a Data Management Plan?
A. The development of a Data Management Plan is required by organizations, institutions and agencies funding scientific research, including:
- National Science Centre (NCN)
- Ministry of Education and Science (MEiN)
- Medical Research Agency (ABM)
- European Commission (EC)
For explanations and examples of DMP documents, please refer to the Horizon Europe program websites, DMPTool, DMPonline, Digital Curation Centre.
- What is anonymization in the context of research data?
Answer: Anonymization is the permanent and irreversible processing (transformation) of personal data so that the information can no longer be attributed to a specific person and the links between the data and the person they relate to are removed.
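Removing names alone is usually not enough, because combinations of attributes such as age and sex can still single a person out. The Python sketch below illustrates two common steps, dropping direct identifiers and generalizing exact ages into bands, followed by a k-anonymity check (every combination of the chosen quasi-identifiers must occur at least k times). The column names, the 10-year bands and the value of k are hypothetical. k-anonymity is one technique, named here only for illustration; passing the check does not by itself guarantee that the data is anonymous in the legal sense.

```python
from collections import Counter

# Hypothetical column names of direct identifiers to remove
DIRECT_IDENTIFIERS = {"name", "pesel", "email"}


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def prepare_release(records):
    """Drop direct identifiers and replace exact ages with age bands."""
    released = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["age"] = age_band(row["age"])
        released.append(row)
    return released


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in counts.values())


if __name__ == "__main__":
    participants = [  # invented records for illustration
        {"name": "A", "age": 34, "sex": "F", "result": 1.2},
        {"name": "B", "age": 38, "sex": "F", "result": 0.9},
        {"name": "C", "age": 52, "sex": "M", "result": 1.5},
    ]
    release = prepare_release(participants)
    print(release)
    # False: the only man aged 50-59 could still be singled out
    print(is_k_anonymous(release, ["age", "sex"], k=2))
```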
- What is DOI?
Answer: A DOI (digital object identifier) is an identifier from a system of globally unique identifiers for digital objects available on the Internet (publications, data, websites). A DOI is assigned to an individual digital object deposited in a repository. It is a permanent designation that is independent of the object's location on the network, i.e. of the URL at which the object is currently available. DOIs should be distinguished from URLs: URLs change, while a DOI stays with its digital object permanently.
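Because a DOI does not depend on the object's current location, links are usually written through the doi.org resolver, which redirects to wherever the object is currently hosted. The Python sketch below normalizes common ways of writing a DOI and builds such a link. The regular expression is a simplified assumption covering the common form "10.<registrant code>/<suffix>", not a complete DOI validator.

```python
import re
from urllib.parse import quote

# Simplified pattern for the common DOI form: 10.<4-9 digit registrant>/<suffix>
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes often found in front of a DOI in citations and links
KNOWN_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text):
    """Strip resolver URLs or a 'doi:' prefix and check the basic DOI shape."""
    s = text.strip()
    for prefix in KNOWN_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    if not DOI_PATTERN.match(s):
        raise ValueError(f"does not look like a DOI: {text!r}")
    return s


def doi_to_url(doi):
    """Build a resolver link; unusual characters are percent-encoded."""
    return "https://doi.org/" + quote(normalize_doi(doi), safe="/")


if __name__ == "__main__":
    # DOIs taken from the recommended materials above
    print(doi_to_url("10.1038/sdata.2016.18"))
    print(doi_to_url("doi:10.1038/s41597-023-02298-6"))
```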
- Is assigning a DOI free of charge?
A. Assigning a DOI is free of charge for the depositor (the researcher who submits data to the repository); the repository, as an organization, purchases a pool of DOI identifiers for the digital objects it holds.
- Are there requirements for citing research data?
A. Usually, repository editors suggest the format for citing data and provide information about it on each dataset/repository page. For example, the RepOD repository allows you to generate bibliographic data for a dataset in EndNote XML, RIS, and BibTeX formats. It is recommended that, in the case of datasets with several versions, you indicate in the citation which version of the dataset is being cited.
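Citations generated by the repository itself, such as RepOD's EndNote XML, RIS or BibTeX exports, should be preferred. The hypothetical Python sketch below only shows what a BibTeX entry for a specific version of a dataset might contain. It uses the generic @misc entry type; the version field is understood by biblatex but may be ignored by classic BibTeX styles, and all example values are invented.

```python
def bibtex_dataset(key, authors, title, year, publisher, doi, version=None):
    """Return a BibTeX @misc entry describing a dataset, optionally a specific version."""
    fields = [
        ("author", " and ".join(authors)),  # BibTeX separates authors with "and"
        ("title", "{" + title + "}"),        # extra braces preserve capitalization
        ("year", str(year)),
        ("publisher", publisher),
        ("doi", doi),
    ]
    if version is not None:
        fields.append(("version", str(version)))  # cite the exact version used
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


if __name__ == "__main__":
    print(bibtex_dataset(
        key="kowalski2024",
        authors=["Kowalski, Jan", "Nowak, Anna"],
        title="Example blood pressure measurements",
        year=2024,
        publisher="RepOD",
        doi="10.0000/hypothetical-example",  # placeholder, not a real DOI
        version="2",
    ))
```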
- Under what legal principles are research data shared?
A. In accordance with the principle that research data should be "as open as possible – as closed as necessary", it is recommended to choose one of the open Creative Commons licenses. The type of license is decided by the author/owner of the research data. The National Science Centre obliges researchers to share data associated with scientific articles under the Creative Commons CC0 public domain dedication or the Creative Commons Attribution license (CC BY).
- What is open science?
Answer: “The concept of Open Science can be defined as a series of changes in broadly understood science leading to better communication between researchers and openness in the dissemination of research results.”*
The European Commission defines three main pillars on which open science is based: open scientific communication, open research data and open access to publications.
*Kokot-Kanikuła, Kamila; Wałek, Anna (2021). Open educational resources – a review of initiatives in Poland and around the world. E-mentor, no. 4 (91). https://www.e-mentor.edu.pl/artykul/index/numer/91/id/1531