Metadata quality
Metadata good practices
This section presents concrete examples and good practices for creating expressive, high-quality metadata. Metadata guides users to relevant datasets while also providing important context. Without high-quality metadata, even the most comprehensive dataset will be difficult to find and to identify as relevant. Ensuring that the metadata follows quality standards is therefore essential.
Data and metadata quality training on the data.europa academy
Follow DCAT-AP
In general, we strongly recommend that you familiarise yourself with the DCAT-AP specification and follow it as closely as possible. As a minimal requirement, you need to set all mandatory properties. Additionally, we strongly recommend providing all recommended properties as well. Applying the specification correctly ensures that datasets are described in a standardized way, promoting consistency and interoperability across Open Data repositories as well as for the data in data.europa.eu.
The latest version and documentation of DCAT-AP is always accessible here: https://joinup.ec.europa.eu/collection/semic-support-centre/solution/dcat-application-profile-data-portals-europe
A minimal DCAT-AP example in the Turtle format with all mandatory properties:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://opendatarepo.org/mydataset>
    a dcat:Dataset ;
    dct:description "This is an example Dataset"@en ;
    dct:title "Example Dataset"@en ;
    dcat:distribution <https://opendatarepo.org/mydistribution> .

<https://opendatarepo.org/mydistribution>
    a dcat:Distribution ;
    dcat:accessURL <http://myaccessurl.org> .
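A quick pre-publication check can catch missing mandatory properties before a harvester does. The following is a minimal sketch in Python, assuming the metadata is held as plain dictionaries keyed by prefixed property names. The `MANDATORY` table and the `missing_mandatory` helper are hypothetical names, and the table mirrors only the properties shown in the example above; the DCAT-AP specification remains the authoritative list.

```python
# Mandatory properties per class, taken from the minimal example above.
# Assumption: check the DCAT-AP release you target for the authoritative list.
MANDATORY = {
    "dcat:Dataset": ("dct:title", "dct:description"),
    "dcat:Distribution": ("dcat:accessURL",),
}


def missing_mandatory(resource_type, properties):
    """Return the mandatory properties that are absent or empty."""
    required = MANDATORY.get(resource_type, ())
    return [name for name in required if not properties.get(name)]


dataset = {"dct:title": "Example Dataset"}
print(missing_mandatory("dcat:Dataset", dataset))  # prints ['dct:description']
```

A check like this only tests for presence; it does not replace validation against the DCAT-AP SHACL shapes.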
Only use controlled vocabularies
DCAT-AP specifies a set of controlled vocabularies that must be used for certain properties, e.g. for media types, data formats, languages, and themes. This ensures interoperability and correct rendering in the user interface, and guarantees that datasets are uniformly comprehensible. In addition, it supports the creation of precise queries for datasets. Please refer to the DCAT-AP documentation for details. The Publications Office maintains an authority table of EU controlled vocabularies.
The following statement would link the dataset to the data theme ‘Agriculture, fisheries, forestry and food’.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://opendatarepo.org/mydataset>
    a dcat:Dataset ;
    dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI> .
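Free-text or misspelled theme values are a common reason for metadata not matching a controlled vocabulary. The sketch below, in Python, checks that a theme value is a URI from the data-theme namespace used above. The helper name `is_known_theme` is hypothetical, and the set of codes is a hard-coded snapshot that may be incomplete or outdated; the authority table of the Publications Office is the source of truth.

```python
THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"

# Assumption: a hard-coded snapshot of data-theme codes, possibly incomplete.
# Load the current authority table in real use.
THEME_CODES = {
    "AGRI", "ECON", "EDUC", "ENER", "ENVI", "GOVE", "HEAL",
    "INTR", "JUST", "REGI", "SOCI", "TECH", "TRAN",
}


def is_known_theme(uri):
    """True if the URI is in the data-theme namespace and uses a known code."""
    if not uri.startswith(THEME_BASE):
        return False
    return uri[len(THEME_BASE):] in THEME_CODES


print(is_known_theme(THEME_BASE + "AGRI"))  # prints True
print(is_known_theme("Agriculture"))        # prints False
```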
Apply the correct data ranges
For its properties, DCAT-AP indicates concrete data ranges that must be used. This ensures that any provided values are consistent with the expected format. Using the correct data ranges or RDF data types is essential for data integrity, validation, and UI representation. Furthermore, it supports more accurate querying, processing, and interpretation of datasets. Examples for such ranges are rdfs:Literal, xsd:dateTime, and foaf:Agent. Most of these ranges are basic RDF types, with many public resources explaining how these should be applied correctly. Some ranges are further elaborated in the DCAT-AP standard, such as foaf:Agent, where guidelines for sub-properties are provided.
As an example, we add a publisher and an issue date to a dataset:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://opendatarepo.org/mydataset>
    a dcat:Dataset ;
    dct:issued "2015-08-28T00:00:00"^^xsd:dateTime ;
    dct:publisher [
        a foaf:Person ;
        foaf:name "John Doe" ;
    ] .
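A typed literal is only useful if its value actually matches the declared data type. The following Python sketch checks whether a string has the shape of an xsd:dateTime value before it is written into the metadata. It is an approximation under stated assumptions: the helper name `is_xsd_datetime` is hypothetical, only four-digit years are accepted, and edge cases such as 24:00:00 are rejected.

```python
import re
from datetime import datetime

# Approximate lexical form of xsd:dateTime.
# Assumption: four-digit years only; negative and longer years are rejected.
_XSD_DATETIME = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"  # date, 'T', time
    r"(\.\d+)?"                                 # optional fractional seconds
    r"(Z|[+-](0\d|1[0-4]):[0-5]\d)?"            # optional time zone
)


def is_xsd_datetime(value):
    """Check the shape with a regex, then calendar validity with strptime."""
    match = _XSD_DATETIME.fullmatch(value)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


print(is_xsd_datetime("2015-08-28T00:00:00"))  # prints True
print(is_xsd_datetime("2015-08-28"))           # prints False
```

A date without a time component, as in the second call, would need a different data type such as xsd:date rather than xsd:dateTime.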
Make your dataset identifiable
It is paramount to make your dataset uniquely identifiable within your data catalogue and, if possible, to give it a globally unique identifier as well. Distinct identifiers keep datasets distinguishable, make harvesting, updating, and referencing more efficient, and help prevent overlaps or duplications. For identification within a data catalogue, the property dct:identifier must be used and assigned a literal that is unique within your catalogue. For global persistent identifiers, the property adms:identifier must be used, which allows the specific type of the identifier (e.g. a DOI) to be stated. More information about the concepts of persistent identifiers is available.
Here is an example for a dct:identifier and adms:identifier:
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://opendatarepo.org/mydataset>
    a dcat:Dataset ;
    dct:identifier "b17be550-40b2-11ee-be56-0242ac120002" ;
    adms:identifier [
        a adms:Identifier ;
        skos:notation "http://www.doi.org/123456789"^^<http://purl.org/spar/datacite/doi> ;
    ] .
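Catalogue-level identifiers are easiest to keep unique when they are generated rather than typed by hand. The Python sketch below shows one possible approach: deriving a name-based UUID from the dataset URI, so that repeated exports yield the same identifier, and checking a list of identifiers for duplicates. Both helper names are hypothetical, and the choice of a version 5 UUID is an illustration, not a DCAT-AP requirement.

```python
import uuid


def catalogue_identifier(dataset_uri):
    """Derive a stable, name-based UUID (version 5) from the dataset URI."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, dataset_uri))


def find_duplicates(identifiers):
    """Return identifiers that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for identifier in identifiers:
        if identifier in seen and identifier not in duplicates:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates


first = catalogue_identifier("https://opendatarepo.org/mydataset")
second = catalogue_identifier("https://opendatarepo.org/mydataset")
print(first == second)                      # prints True
print(find_duplicates(["a", "b", "a"]))     # prints ['a']
```

The resulting string would be used as the literal for dct:identifier; a persistent identifier such as a DOI is still assigned by its registration agency and recorded separately via adms:identifier.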
General remarks
Besides the concrete examples above, it is good practice to follow some general guidelines for high-quality metadata:
- Within your organisation, push for a uniform application of individual metadata standards across all your datasets. Such consistency is the basis for a systematic structure and presentation of metadata, which promotes familiarity and trust. Organisations that maintain such uniformity emphasise their commitment to quality and reliability.
- Comprehensive descriptions are paramount for open data. Therefore, make use of the property dct:description to convey the context of the dataset, the methods employed, and its content. By offering a thorough overview, data providers ensure users approach the data with an informed perspective, enhancing the accuracy of its reuse.
- Datasets, like all information resources, evolve over time. Therefore, it is important to ensure that metadata accurately reflects any changes to the datasets it represents. Periodic reviews and subsequent updates ensure that metadata remains relevant and accurate.
- The world of data standards is not static. Stay informed about any updates or changes to the DCAT-AP standard to ensure that your metadata remains compliant and to enable you to leverage new features.
Metadata quality dashboard
The Metadata Quality Assessment dashboard (MQA) is a tool developed by data.europa.eu to check the quality of metadata of published datasets on the portal and provide recommendations to data providers and data portals for improvements. The results are presented via the MQA and are also available to download. The functionality of the MQA and the methodology it uses are detailed in the data.europa.eu MQA service Metadata quality methodology.
SHACL validation
data.europa.eu offers a stand-alone, user-friendly DCAT-AP SHACL validator and a RESTful API that can also be used. The API is described via OpenAPI under the mentioned URL. If you don’t know how to call a RESTful API, you can use a tool like Postman and follow these step-by-step instructions.
1. Create a new ‘request’.
2. Give the request a name and put it in any collection that is convenient for you.
3. Choose HTTP POST and enter https://data.europa.eu/api/mqa/shacl/validation/report/ as the request URL.
4. Set the ‘Content-Type’ HTTP header according to the type of DCAT-AP representation you will use, typically ‘application/rdf+xml’.
5. Copy and paste your DCAT-AP into the ‘body’ you are going to send, making sure to specify that you are using ‘raw’ input.
6. Click on the ‘Send’ button.
7. Examine the results in the bottom pane. If your DCAT-AP is valid, you will get an empty report in JSON-LD that looks like this.
   Otherwise, if there are mistakes, you will see a list of them. Every error has an entry in the JSON-LD file that looks like this.
Finally, amend the DCAT-AP to address the errors and go back to step 5 until all are solved.
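The same request can be sent from a script instead of Postman. The following is a minimal sketch in Python using only the standard library. It assumes the endpoint and the ‘Content-Type’ header described in the steps above; the function names, the timeout, and the file name in the usage note are hypothetical, and the OpenAPI description of the service remains the authoritative reference for parameters and response formats.

```python
import urllib.request

# Endpoint as given in the step-by-step instructions above.
VALIDATION_URL = "https://data.europa.eu/api/mqa/shacl/validation/report/"


def build_validation_request(rdf_text, content_type="application/rdf+xml"):
    """Build the HTTP POST request without sending it."""
    return urllib.request.Request(
        VALIDATION_URL,
        data=rdf_text.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def fetch_validation_report(rdf_text, content_type="application/rdf+xml"):
    """Send the request and return the raw report body as text."""
    request = build_validation_request(rdf_text, content_type)
    # Assumption: a 30-second timeout is an arbitrary illustrative value.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

To use it, read your DCAT-AP file into a string, for example from a hypothetical `catalogue.rdf`, and pass it to `fetch_validation_report`; the returned text is the report described in step 7. Building the request separately from sending it makes it possible to inspect the method, URL, and headers without contacting the service.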