An authority file supports the identification of individuals or associations by differentiating similarly named individuals with the aid of dates, professions, etc. However, the authority file also collates the individual/association’s name variants to ensure ‘access’ to the same person/organisation’s work regardless of the form in which their name appears. The authority file also organises the database by linking a person’s various ‘public identities’, such as real name and pen name. In the same way, new and old names resulting from personal or organisational name changes are also linked. National libraries create name authority files for works published in their country. Search terms in accordance with national regulations are stored in a target’s authority file along with any relevant IDs (ISNI, ORCID) and name variants. The information is collated in international databases (such as the ISNI and ORCID databases and the Virtual International Authority File VIAF), which are freely available online. The current public administration recommendation is to also publish this information as open linked data.


Availability determines whether information is available in accordance with its purpose, in principle both technically and in accordance with other operational requirements.


A citation is used to refer to a source. It can be placed as an in-line citation within the text, in the footer as a footnote, as an endnote at the end of a publication or section thereof, or as a reference in a bibliography, for example. Sources have typically been research publications, but research data stored in a repository can also be a source. Repositories usually contain both the actual item (for example, a data file(s), code-book or questionnaire) as well as the metadata associated with the item. All these can be cited. When citing research data, the important differentiating elements are the author, the item’s title, the version number, the repository ID (usually the item number), the collection date, the collector, producer and distributor, and (if available) a PID.
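As an illustration only, the differentiating elements listed above can be held in a small record and joined into a reference string. This is a minimal sketch: the class name, the field names, the layout of the string and the example values are all assumptions made for this illustration and do not follow any particular citation style. The collector and producer are omitted to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Elements that distinguish one research data item from another."""
    author: str
    title: str
    version: str
    repository_id: str      # usually the item number in the repository
    collection_date: str
    distributor: str
    pid: Optional[str] = None  # persistent identifier, if one is available


def format_citation(c: DataCitation) -> str:
    """Join the elements into one reference string (illustrative layout only)."""
    parts = [
        f"{c.author} ({c.collection_date}).",
        f"{c.title} [dataset].",
        f"Version {c.version}.",
        f"{c.distributor} [distributor].",
        f"Item {c.repository_id}.",
    ]
    if c.pid:
        parts.append(c.pid)
    return " ".join(parts)


# Hypothetical example values, not a real dataset or identifier.
example = DataCitation(
    author="Doe, Jane",
    title="Example Survey 2020",
    version="1.0",
    repository_id="EX1234",
    collection_date="2020",
    distributor="Example Data Archive",
    pid="urn:nbn:example-0000",
)
print(format_citation(example))
```

Keeping the version and the PID as separate fields makes it harder to cite a dataset without saying which release was used.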

According to the HTK guidelines: ‘Researchers should acknowledge the work and achievements of other researchers in an appropriate manner, showing respect for others’ work by citing other researchers’ publications in an appropriate manner and giving others’ achievements due value and significance both in their research and the publication of their results.’


The aspect of intellectual property that grants creators the right to permit (or not permit) the reproduction of their creations. It is distinct from trademark rights and moral rights.


A suite of standardised licences that allow copyright holders to grant some rights to users by default. CC licences are widely used, simple to use, machine readable, and have been created by legal experts. There are a variety of CC licences, each of which uses one or more clauses. Some licences are compatible with Open Access in the Budapest sense (CC0 or those carrying the BY, SA, and ND clauses), and some are not (those carrying the NC clause).


Data in the sense used here are all digitally available objects (simple or complex) that emerge from, or are the result of, the research process.


Data integrity means ‘1. (Data or a data system that is) genuine, authentic, free from internal conflicts, comprehensive, up-to-date, legal, and usable. 2. A characteristic by which information or a message has not been altered without authority, and any potential changes can be traced with an audit trail.’ (Glossary of Government Information Security, 1/2000).
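One common way to check the second sense of the definition above (that data have not been altered) is to record a checksum when the data are deposited and compare it later. The sketch below assumes SHA-256 from the Python standard library; the function names and the sample data are hypothetical. A checksum only detects that something changed. It does not say who changed it or whether they had authority, so it does not replace an audit trail.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, expected_digest: str) -> bool:
    """Compare the digest of the data with a digest recorded earlier."""
    return sha256_of(data) == expected_digest


# Hypothetical sample data; the digest would be stored alongside the file.
original = b"id,value\n1,42\n"
recorded = sha256_of(original)

print(is_unaltered(original, recorded))              # True
print(is_unaltered(b"id,value\n1,43\n", recorded))   # False
```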


An analytic process designed to explore data in search of consistent patterns or systematic relationships between variables, transforming data into information for future use.

Digital Object Identifier (DOI)

A unique text string that is used to identify digital objects such as journal articles, data sets or open source software releases. A DOI is one type of Persistent Identifier (PID).
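As a rough sketch, a DOI can be recognised by its shape ("10.", a numeric prefix, a slash, then a suffix) and turned into a link through the doi.org resolver. The regular expression below is a loose heuristic assumed for this illustration, not the formal DOI syntax, and the DOI in the example is hypothetical. Suffixes containing characters that need URL encoding are not handled.

```python
import re

# Loose heuristic: "10.", a numeric registrant prefix, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text: str) -> bool:
    """Return True if the text has the general shape of a DOI."""
    return DOI_PATTERN.match(text) is not None


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI, rejecting strings that do not look like one."""
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical DOI used purely for illustration.
print(doi_to_url("10.1234/example.5678"))
```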


Documentation is detailed information about the data or code, including background and the methodological approach (e.g., a description of the project, variables, and measuring instruments).


FAIR Data (according to FORCE11 principles and published in Nature Scientific Data) are Findable, Accessible, Interoperable, and Re-usable, in order to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows.


The General Data Protection Regulation (GDPR) seeks to create a harmonised data protection law framework across the EU. It aims to return control of personal data to citizens, whilst imposing strict rules on those hosting and ‘processing’ these data, anywhere in the world. The Regulation also introduces rules relating to the free movement of personal data within and outside the EU.


A numerical measure that indicates the average number of citations received by articles published in a journal over the previous two years. It is frequently used as a proxy for a journal’s relative importance. Using it as a measure of the impact of individual articles published in a journal is considered problematic.
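The arithmetic behind the definition above can be sketched as a simple ratio. All figures below are hypothetical, and the function and variable names are assumptions made for this illustration. The second part suggests why a journal-level average says little about any single article: one heavily cited paper can raise the mean far above what a typical article receives.

```python
from statistics import median


def impact_factor(citations: int, articles: int) -> float:
    """Average citations per article for the articles of the previous two years."""
    if articles <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations / articles


# Hypothetical journal: 150 articles in the previous two years, cited 450 times.
print(impact_factor(450, 150))  # 3.0

# Hypothetical per-article citation counts for a very small journal.
citation_counts = [0, 0, 1, 1, 28]
print(impact_factor(sum(citation_counts), len(citation_counts)))  # 6.0
print(median(citation_counts))                                    # 1
```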


A legal term that refers to creations of the mind. Examples of intellectual property include music, literature, paintings, sculpture, video and other artistic works; discoveries and inventions; and phrases, symbols, and designs.


A series of published research articles. Historically divided into volumes and issues.


A license allows a third party to perform certain actions with a work or data. The license informs about the usage rights of a resource (e.g. text, data, source code).


Metadata provide a basic description of the data, often including authorship, dates, title, abstract, keywords, and license information. They serve first and foremost the findability of data (e.g. creator, time period, geographic location).
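A basic metadata record of the kind described above might be sketched as a plain dictionary serialised to JSON. The field names, the list of required fields and the values are assumptions for this illustration; real repositories follow their own metadata schemas.

```python
import json

# Fields that this sketch treats as the minimum for findability (an assumption).
REQUIRED_FIELDS = ("creator", "title", "date", "abstract", "keywords", "license")


def missing_fields(record: dict) -> list:
    """List the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record, not a real dataset.
record = {
    "creator": "Doe, Jane",
    "title": "Example Survey 2020",
    "date": "2020-05-01",
    "abstract": "Responses to a hypothetical survey.",
    "keywords": ["survey", "example"],
    "license": "CC BY 4.0",
}

print(missing_fields(record))           # []
print(json.dumps(record, indent=2))
```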


In its simplest form, open access publishing (articles, reports, monographs) means uploading a research publication to a data network and granting rights to read, copy, print and link to entire scientific publications. Open access publishing means free dissemination of scientific information. A scientific publication is openly available when both the scientific community and the general public have unrestricted access via the Internet without charge.

In simple terms, Golden OA (the Gold Road) means open journals, while Green OA (the Green Road) means self-archiving. More detailed information about the alternatives for open access publishing is available from, for example, Ilva & Lilja: Kotimaiset tieteelliset lehdet ja avoin julkaiseminen (Finnish scientific journals and open access publishing). 2014. URN:NBN:fi-fe2014050725729.


Open data refers to unprocessed information accumulated by research organisations, researchers, public administration, companies or private persons that is made freely accessible to third parties for use without charge.


An open interface refers to well-documented, free-to-use means of transferring data between software. For example, a database will provide software developers with an interface for queries.


Open knowledge refers to unrestricted access to digital content and data that users may use, amend and distribute without charge. To meet the criteria for open knowledge, items must be available in full, in a usable and amendable format, via the Internet. Items must also be licensed for unrestricted use, amendment and distribution.


Research data and publications are usually protected by copyright. However, agreements can be signed to enable open use of these materials. Creative Commons licences can be used to grant selected rights and freedoms to users, readers or experiencers. By combining different terms and conditions, you can manage your rights in a way that suits both you and the situation. The OSR Initiative (ATT) recommends CC0, CC BY 4.0 or, if necessary, other generally recognised licences.


Confidentiality is ‘1. Maintaining the confidentiality of data, and protecting the rights associated with data, data processing and communications from violation. 2. The extent to which confidentiality is considered important.’ (Glossary of Government Information Security, 1/2000).


A research environment or infrastructure comprises the tools, equipment, materials and services that enable research to be carried out. Research infrastructures can be used to strengthen research communities and increase capacity. Research infrastructures can be located in one place, be decentralised, or be virtual. An open research infrastructure provides access to a comprehensive package — the research process via which the results will be produced. For a research infrastructure to be considered open, results, publications and background materials must be freely available to the research community.


Open science means the promotion of an open operating model in scientific research. The key objective is to publish research results, along with the data and methods used, so they can be examined and used by any interested party.

Open science includes practices such as promoting open access publishing, open access publishing itself, harnessing open-source software and open standards, and the public documentation of research processes with ‘memoing’.


Open source is a way to develop and share software. The software’s source code is freely available to be used, copied, altered, and shared. In the open source development world, both ideas and finished products are available for all to see and use. No single company manages development; it is carried out by a global community consisting of private persons and companies. Everyone can participate in development work, and bugs can be quickly found and fixed. This often leads to high software quality, good data security, and software interoperability.


Open standard means adherence to established, commonly agreed standards, so developers can create replica or compatible software. Open standards are available to all, so anyone can find out about and adhere to them.


Peer review, or ‘refereeing’, comes from the custom in which scientific articles sent to journals or other publications are evaluated by the editorial team and selected external experts. Peer reviewers examine the content and scientific significance of the submitted article, along with its linguistic form and textual structures, ensuring that each article adheres to principles of scientific writing (such as succinctness, clarity in diagrams and tables, source citations).


Identifiers for publications and research data are used to, for example, search for, identify and link materials. Identifiers are also mandatory for long-term preservation. Depending on the type of publication involved, the identifier may be an ISBN (monographs) or a variety of persistent identifiers (PIDs). A Handle ID is used in repositories, a DOI in commercial publishers’ systems, and a URN in national libraries’ digital collections. A PID is almost exclusively used for research data in national and international projects, while the National Research Data Project (TTA) and OSR Initiative (ATT) use a URN. IDs are also required for researchers and other juridical bodies participating in the research process (universities and other institutions of higher education, research institutes, scientific communities and their institutions, research teams). These identifiers are always separately allocated in Finland.


Non-proprietary formats that follow documented international standards, are commonly used by the research community, use standard character encoding (e.g. ASCII, UTF-8), and where compression, if used at all, is lossless.
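Two of the properties named above, standard character encoding and lossless compression, can be demonstrated with a short round trip. This sketch assumes UTF-8 and gzip from the Python standard library; the sample text is hypothetical. Lossless means that decompressing returns exactly the bytes that were compressed.

```python
import gzip

# Hypothetical CSV content with non-ASCII characters.
text = "Temperature (°C),Site\n21.5,Jyväskylä\n"

encoded = text.encode("utf-8")       # standard character encoding
compressed = gzip.compress(encoded)  # lossless compression
restored = gzip.decompress(compressed).decode("utf-8")

print(restored == text)  # True
```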


A manuscript draft that has not yet been subject to formal peer review, distributed to receive early feedback on research from peers.


A quality system consists of procedures and processes for assuring the quality of training, research, social dialogue and impact, human resources, services, and management.


A repository is defined as the infrastructure and corresponding service that allows for the persistent, efficient and sustainable storage of digital objects (such as documents, data and code).


Reproducibility is a spectrum, and instructors should choose the definition most used by their audience. Generally speaking, research is reproducible when similar results can be obtained for a study or experiment independently, using the same methods but under different conditions (i.e., reproducibility pertains to results). Some break the definition into levels of reproducibility, including computationally reproducible (also called “reproducible”), where code and data can be analysed in a similar manner as in the original research to achieve the same results, and empirically reproducible (also called “replicable”), where an independent researcher can repeat a study using the same methods but creating new data.


Research impact can involve academic, economic and societal aspects, or some combination of all three. Impact is the demonstrable contribution that research makes in shifting understanding and advancing science, method, theory and application across and within disciplines, and the broader role that this plays outside of the research system.


An institute, corporation or government body that provides financial assistance for research.


The joint use of a resource or space. A fundamental aspect of collaborative research. As most research is digitally authored and digitally published, the resulting digital content is non-rivalrous and can be shared without any loss to the original creator.


A form of business model whereby a fee is paid in order to gain access to a product or service – in this case, the outputs of scholarly research.


According to the Glossary of Government Information Security (1/2000), usability means ‘1) a characteristic of data, an information system or service, by which it is available to those with the right to use it, and can be used at the desired time and in the required manner, and 2) ease of use’.


Version control is the management of changes to documents, computer programs, large web sites, and other collections of information in a logical and persistent manner, making it possible both to track changes and to revert a piece of information to a previous revision.
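The two abilities named above, tracking changes and reverting to a previous revision, can be illustrated with a toy in-memory history. This is only a sketch: the class and method names are hypothetical, and real version control systems store and compare revisions very differently.

```python
class VersionedDocument:
    """Keep every revision of a text so that changes can be tracked and undone."""

    def __init__(self, text: str = "") -> None:
        self._revisions = [text]  # revision 0 is the initial text

    @property
    def current(self) -> str:
        """Return the latest revision."""
        return self._revisions[-1]

    def commit(self, new_text: str) -> int:
        """Record a new revision and return its number."""
        self._revisions.append(new_text)
        return len(self._revisions) - 1

    def history(self) -> list:
        """Return a copy of all revisions, oldest first."""
        return list(self._revisions)

    def revert(self, revision: int) -> int:
        """Restore an earlier revision by recording it again, so no history is lost."""
        self._revisions.append(self._revisions[revision])
        return len(self._revisions) - 1


doc = VersionedDocument("draft")
doc.commit("draft with methods")
doc.commit("draft with a mistake")
doc.revert(1)
print(doc.current)         # draft with methods
print(len(doc.history()))  # 4
```

Recording a revert as a new revision, rather than deleting later ones, keeps the full trail of changes available.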

Based on:

– glossary provided by the FOSTER Open Science and Research activities.

– Open Research Glossary