Dataset Metadata in the Flemish Research Landscape

NEYENS, Evy; Dhollander, Evelien; Bloemen, Dieuwertje; Leonard, Kevin; BREBELS, Werner; Wuyts, Tom; Dengis, Pascale; Portier, Marc

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/44961

Title:	Dataset Metadata in the Flemish Research Landscape
Authors:	NEYENS, Evy; Dhollander, Evelien; Bloemen, Dieuwertje; Leonard, Kevin; BREBELS, Werner; Wuyts, Tom; Dengis, Pascale; Portier, Marc
Issue Date:	2024
Publisher:	Zenodo
Abstract:	The scope of this project group (PG) was to optimise process flows and systems for gathering and publishing research (meta)data at Flemish research performing organisations (RPOs). The PG aimed to reduce the monetary and labour costs of managing shared research data for all partners involved, while improving the discoverability and quality of research metadata. A questionnaire was distributed and followed up by structured interviews with research support staff from various Flemish RPOs to understand the current ecosystem and identify areas for improvement.
The scope of Key Deliverable 1 was to visualise the current flow of research dataset metadata within the Flemish research landscape using a business process model (BPM), and to produce a list of recommendations to improve this flow. After the current flow had been mapped on the basis of part 1 of the questionnaire, part 2 consulted the institutions on their business needs, the areas for improvement they considered necessary, and what the ideal flow would look like. A thorough analysis of the interview responses yielded a list of 10 recommendations. In a prioritisation exercise, these 10 recommendations were scored for ease of implementation and impact, after which the following 4 recommendations were selected for further elaboration:
1. RPOs should make greater use of the regional research data portal (Flanders Research Information Space, FRIS; https://researchportal.be/nl) to pull in, validate and enrich metadata previously provided by partner institutions (for example in the case of collaborations and researchers with multiple affiliations), and then feed it back to FRIS. This would greatly reduce the administrative investment in registering and publishing metadata.
2. RPOs should explore opportunities to build out integrations with external repositories and aggregators to ingest dataset metadata from affiliated researchers (this is worked out as a proof of concept in Key Deliverable 2, pp. 25).
3. RPOs should explore the opportunities of premium ORCID integration. When correct ORCID usage is promoted among researchers, this system ensures that RPOs know where and when to retrieve dataset metadata.
4. The FOSB application profile for research datasets contains more mandatory metadata fields than commonly used standards such as DataCite and Dublin Core, which creates extra work because missing information must be enriched manually. Changing the status of some of its metadata fields from ‘Mandatory’ to ‘Recommended’ or ‘Optional’ would significantly reduce this workload. As a consequence, more dataset metadata records could also be submitted to FRIS. It remains important to stimulate RPOs to provide as much metadata as possible, for instance by flagging records that do not contain all the metadata fields required for inclusion in the Open Science KPI calculation.
The interviews showed that RPOs encounter many challenges when trying to retrieve and capture dataset metadata, which often relies on manual entry by researchers or research staff. RPOs generally have a very limited overview of their own research data outputs, so a necessary first step is to find their affiliated datasets. The scope of Key Deliverable 2 was to develop a method for finding and ingesting dataset metadata associated with specific knowledge institutions that can be implemented by all RPOs. To define this method, business needs were collected and prioritised, limitations of the available technology were identified and a proof of concept (POC) was implemented. In this POC, two new methods were explored and compared to the current flow. The first candidate is ‘The Modular Approach’, in which the RPOs iteratively query various sources (aggregators, repositories, ...) to find the metadata related to their organisation. The second candidate, ‘The Registration Approach’, is already being implemented by several RPOs (e.g., UGent, KU Leuven) and involves researchers entering DOIs (or other PIDs) into a system which then automatically harvests the relevant metadata from the appropriate repository.
The main output of Key Deliverable 2 is a description of a two-pronged approach for registering more datasets within the Flemish ecosystem: a combination of a modular and a registration approach. The modular approach is based on an analysis of the coverage of existing dataset aggregator services and an examination of which sources could be consulted to find the greatest number of datasets affiliated with Flemish institutions. The analyses showed that the best way to find the highest number of datasets was to use the APIs of various data repositories to search directly for affiliated datasets. The proof of concept demonstrates that the modular approach was able to find over 800 datasets published in 2022, significantly more than the 320 currently registered in FRIS. We describe the methods used to search several of the main data repositories used by Flemish researchers, as well as a strategy for maintaining and extending the modular approach. We augment this with a discussion of the registration approach, whereby researchers could quickly and easily register their published datasets with their institutional CRIS and FRIS, using many of the same structural components as the modular approach. The registration method offers precision and immediate manual enrichment by the researcher, while the harvesting (modular) approach only pulls in whatever metadata is available via the APIs of the aggregators and repositories and will likely always include some false positives. The two methods are complementary: the registration method can fill the gaps in coverage and precision of the harvesting method.
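The interplay between the two approaches can be illustrated with a minimal Python sketch. It is not the project group's code: the record layout, the function names (`normalise_doi`, `harvest`, `combine`), the stub sources and the placeholder DOIs are all assumptions made for illustration, and the stubs stand in for real repository or aggregator API calls. The sketch queries each source in turn for one affiliation (modular approach) and then merges the result with researcher-registered records by DOI, letting the registered, manually enriched record take precedence.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]                       # hypothetical minimal record: doi, title, source
Source = Callable[[str], Iterable[Record]]    # takes an affiliation, yields records


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip a resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def harvest(sources: Dict[str, Source], affiliation: str) -> List[Record]:
    """Modular approach: query every source in turn for one affiliation."""
    found: List[Record] = []
    for name, query in sources.items():
        for record in query(affiliation):
            found.append({**record, "source": name})
    return found


def combine(harvested: Iterable[Record], registered: Iterable[Record]) -> Dict[str, Record]:
    """Merge both approaches by DOI; researcher-registered records win."""
    merged: Dict[str, Record] = {}
    for record in harvested:
        # Keep the first harvested copy of each DOI (removes cross-source duplicates).
        merged.setdefault(normalise_doi(record["doi"]), record)
    for record in registered:
        # A registered record replaces or supplements the harvested one.
        merged[normalise_doi(record["doi"])] = {**record, "source": "registration"}
    return merged


# Stub sources standing in for real API queries (assumption: placeholder data only).
def repository_a(affiliation: str) -> List[Record]:
    return [{"doi": "10.5555/EXAMPLE-A", "title": "Dataset A"}]


def repository_b(affiliation: str) -> List[Record]:
    return [
        {"doi": "https://doi.org/10.5555/example-a", "title": "Dataset A (copy)"},
        {"doi": "10.5555/example-b", "title": "Dataset B"},
    ]


EXAMPLE_SOURCES: Dict[str, Source] = {"repository_a": repository_a, "repository_b": repository_b}
EXAMPLE_REGISTERED: List[Record] = [
    {"doi": "doi:10.5555/Example-B", "title": "Dataset B (enriched by researcher)"},
    {"doi": "10.5555/example-c", "title": "Dataset C"},
]
```

Calling `combine(harvest(EXAMPLE_SOURCES, "Example University"), EXAMPLE_REGISTERED)` yields one record per DOI. Adding a new source only requires another function with the same signature, which is the sense in which such a setup is modular.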
Based on the POC, this project group advises establishing a formal group or community within the FRDN, consisting of research data staff (RDM support staff and technical staff), to develop and maintain a platform for exchanging knowledge on metadata harvesting procedures (code, queries, best practices, etc.), building on the POC that resulted from this project. This will benefit both dataset metadata registration and harvesting. Starting from the scripts for the most common repositories and aggregators, new scripts for other sources can be developed, shared and used by all RPOs. Each RPO can develop its own way of importing the output of the scripts into its CRIS system. This collaborative approach will ease the burden for all RPOs. However, for this approach to work, long-term investments of time and (human) resources are needed.
Keywords:	Dataset Metadata;Metadata enrichment;Metadata harvesting
Document URI:	http://hdl.handle.net/1942/44961
DOI:	10.5281/zenodo.10634663
Category:	R2
Type:	Research Report
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
Enriching Metadata in the Flemish Research Landscape.pdf	Published version	943.43 kB	Adobe PDF
