Reporting guidelines for HVDs
This document illustrates the European Commission’s proposed common approach to reporting high-value datasets (HVDs) using DCAT-AP HVD and data.europa.eu.
The process consists of three main stages:
- Collection at national level: During this stage, Member States coordinate internally to identify high-value datasets. Subsequently, they make the HVD metadata available in their national open data portals or geoportals, tagging them with relevant properties such as the applicable legislation and the HVD categories. It is assumed that each MS has an entity that performs this internal coordination for HVDs. In this document we refer to this entity as the MS HVD contact.
Challenges & Risks
There is no MS HVD contact or national coordination is lacking. The absence of a MS HVD contact or national coordination has substantial impact on the correct implementation of the HVD Implementing Regulation (IR) in the MS. This risk is beyond the scope of this text.
Other national catalogue(s) than the ones intended for HVD reporting also use the DCAT-AP HVD. In such a case the DEU may show to the public a view of HVDs different from the one that a MS wishes to report to the Commission. To detect these anomalies the collection stage will perform an assessment of the catalogues that are outside the catalogues identified by a MS HVD contact.
- Harvesting by data.europa.eu: Based on endpoints communicated by Member States through an EU Survey (dating from September 2024, with an update by the end of 2024), datasets annotated as HVDs in national open data portals and geoportals will be harvested by data.europa.eu, and their metadata will be automatically updated to include HVD properties as part of established automatic harvesting processes. This harvesting occurs daily and is based on a common metadata standard, DCAT-AP for HVDs (see the Data provider Manual). Data.europa.eu (DEU) will maintain communication with national HVD coordinators to ensure full alignment of catalogues in terms of HVDs. The DEU harvests many catalogues from individual MS. Each of these harvested catalogues is uniquely available in the RDF store of data.europa.eu. The MS HVD contact needs to inform the Commission which of the catalogues from a given MS are to be used for HVD reporting by that MS.
- Reporting by Member States: If the initial steps have been followed, data.europa.eu will have a comprehensive overview of HVDs across Member States (see General queries below). In addition, the European Commission has made available specific queries on data.europa.eu to support the reporting by Member States. These reporting queries (see Reporting queries 1 to 7 below) include all metadata fields required by the HVD Implementing Regulation and Reporting requirements. They allow for assessment of Member States' implementation of and conformity with the regulation.
Member States can report their implementation status to the Commission by using these queries or submitting the exports of relevant metadata.
General queries
Because data.europa.eu actively harvests the MS endpoints, it shows the most recent state of affairs. The following query makes it possible to download a snapshot of the data from the DEU SPARQL endpoint.
Query 1:
The construct query below creates a snapshot of a MS HVD catalogue. To execute the query, the user must replace the parameter <?MSCat?> with the MS HVD catalogue URI in the DEU. As the amount of data returned may exceed the number of results allowed by the SPARQL endpoint, pagination must be applied to download the whole snapshot. Pagination is done by the query elements:
- Limit: the size of a page. Max 50000, but to avoid other
- Offset: the start point of the page.
Users must incrementally increase the offset value until the result is empty. The concatenation of all the downloaded files is the snapshot.
construct {?s ?p ?o.
?dist ?distp ?disto.
?distapi ?distapip ?distapio.
?API ?APIp ?APIo.
} where {
<?MSCat?> ?cp ?s.
?s <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{ ?s ?p ?o. }
union {
?s <http://www.w3.org/ns/dcat#distribution> ?dist.
?dist ?distp ?disto.
?dist <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?s <http://www.w3.org/ns/dcat#distribution> ?dist.
?dist <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist <http://www.w3.org/ns/dcat#accessService> ?distapi.
?distapi ?distapip ?distapio.
?distapi <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?API <http://www.w3.org/ns/dcat#servesDataset> ?s.
?API ?APIp ?APIo.
?API <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
} limit 10
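The pagination loop described for Query 1 can be scripted. The following Python sketch is illustrative only and is not part of the DEU tooling: `fetch_page` is a hypothetical caller-supplied function that sends one query to the SPARQL endpoint and returns the results as a list, and the helper names and the default page size are assumptions. The query passed in is expected without its own trailing `limit` clause.

```python
from typing import Callable, List


def add_paging(query: str, limit: int, offset: int) -> str:
    """Append limit/offset clauses to a query that has none of its own."""
    return f"{query.rstrip()}\nlimit {limit} offset {offset}"


def download_snapshot(
    query: str,
    fetch_page: Callable[[str], List[str]],  # hypothetical: runs one query, returns its results
    page_size: int = 10000,                  # assumed value; must not exceed the endpoint maximum
) -> List[str]:
    """Fetch page after page until an empty page is returned, then concatenate."""
    snapshot: List[str] = []
    offset = 0
    while True:
        page = fetch_page(add_paging(query, page_size, offset))
        if not page:          # an empty result marks the end of the data
            break
        snapshot.extend(page)
        offset += page_size   # move the start point to the next page
    return snapshot
```

Note that SPARQL only guarantees stable pages when the solutions are ordered, so adding an `order by` clause to the query may be advisable if pages appear to overlap.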
Query 2:
If the MS catalogue URI needed to fill in the parameter <?MSCat?> is not known, the following query can be used to look it up. It returns all catalogues that contain a resource indicated as published according to the HVD IR.
select distinct ?c where {
?s <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?c a <http://www.w3.org/ns/dcat#Catalog>.
?c ?p ?s.
} group by ?c
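Once the catalogue URI is known, it has to be substituted into the query templates. A minimal Python sketch of that substitution is given here; the function name is an assumption, the example URI in the usage note is hypothetical, and the character check follows the SPARQL grammar rule for IRI references.

```python
import re

# The placeholder as written in the query templates (matched case-insensitively).
_PLACEHOLDER = re.compile(re.escape("<?MSCat?>"), re.IGNORECASE)

# Characters that the SPARQL grammar does not allow inside an IRI reference <...>.
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def fill_catalogue(template: str, catalogue_uri: str) -> str:
    """Replace every <?MSCat?> placeholder in a query template with <catalogue_uri>."""
    if _FORBIDDEN.search(catalogue_uri):
        raise ValueError(f"not usable as a SPARQL IRI: {catalogue_uri!r}")
    # A function is used as replacement so the URI is inserted literally.
    query, count = _PLACEHOLDER.subn(lambda _m: f"<{catalogue_uri}>", template)
    if count == 0:
        raise ValueError("template does not contain the <?MSCat?> placeholder")
    return query
```

For example, `fill_catalogue("select ?s where { <?MSCat?> ?cp ?s. }", "http://example.org/catalogue/xx")` yields a query in which the placeholder is replaced by `<http://example.org/catalogue/xx>`.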
Reporting queries
Article 5 of the HVD IR states the following requirements for the report:
(a) a list of specific datasets at Member State level (and, where relevant, subnational level) corresponding to the description of each high-value dataset in the Annex to this Regulation and with online reference to metadata that follow existing standards, such as a single register or open data catalogue;
(b) persistent link to the licensing conditions applicable to the re-use of high-value datasets listed in the Annex to this Regulation, per dataset referred to in point a);
(c) persistent link to the APIs ensuring access to the high-value datasets listed in the Annex to this Regulation, per dataset referred to in point a);
(d) where available, guidance documents issued by the Member State on publishing and reusing their high-value datasets;
(e) where available, the existence of data protection impact assessments carried out in accordance with Article 35 of Regulation (EU) 2016/679;
(f) the number of public sector bodies exempted in accordance with Article 14(5) of Directive (EU) 2019/1024.
Only the first three points, a) to c), can be supported via the reporting process described here. The assessment and delivery of the evidence required under points d) to f) are beyond the scope of this document.
The Commission has developed several queries to help retrieve the HVDs. These are:
Reporting query 1 – reported High-value datasets
This query returns all the high-value datasets harvested from a given MS. This is done by replacing the parameter <?MSCat?> with the URI of the MS catalogue in the DEU.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?s where {
<?MSCat?> ?cp ?s.
?s r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?s a dcat:Dataset.
}
For its own purposes, the harvesting by the DEU performs a harmonisation step in which the source identifiers of datasets are replaced with DEU-specific identifiers. The original identifiers provided by the harvested catalogues are maintained in the catalogue records of the DEU (as a result of the harvesting process). The following query retrieves the original identifier for each high-value dataset so that MS can perform an internal cross-check.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select distinct ?s ?originalId where {
?cat ?cp ?s.
?s r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?s a dcat:Dataset.
?record foaf:primaryTopic ?s.
?record a dcat:CatalogRecord.
?record dct:identifier ?originalId.
}
Reporting query 2 – reported high-value datasets with key metadata
For any high-value dataset this query provides the title, description and HVD category. These are the mandatory DCAT-AP HVD key metadata. Note: The query returns only the English texts. These can be the result of a machine translation service embedded in the DEU harvesting.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?title ?desc ?Category where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
optional { ?d dct:title ?title.
FILTER ( langMatches( lang(?title), "en" ))
}
optional { ?d dct:description ?desc.
FILTER ( langMatches( lang(?desc), "en" ))
}
optional { ?d r5r:hvdCategory ?Category. }
}
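When a select query such as the one above is executed with JSON output, the response follows the W3C SPARQL 1.1 Query Results JSON Format. The Python sketch below flattens such a response into plain rows; the function name is an assumption. Variables that are bound only inside an optional block (title, description, category) may be absent from a row and are returned as `None`.

```python
import json
from typing import Dict, List, Optional


def rows_from_sparql_json(text: str) -> List[Dict[str, Optional[str]]]:
    """Turn a SPARQL JSON result document into a list of plain dictionaries."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows: List[Dict[str, Optional[str]]] = []
    for binding in doc["results"]["bindings"]:
        # A variable is missing from a binding when its optional block did not match.
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows
```

Rows with `None` for the title or description indicate datasets for which no English text is available in the DEU.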
Reporting query 3 – reported Distributions for High-value datasets
High-value datasets are usually subject to the obligation to be provided as bulk download. This query makes it possible to assess that aspect. Note:
- There could be multiple Distributions for one Dataset. This multiplicity is why this is a separate query and cannot be part of a simple table of Datasets.
- There could be Distributions for a high-value dataset that are not meant to be reported in the context of the HVD IR. It may be assumed that the collection phase has removed those, but to guarantee that incorrect values are not returned, the identification condition is included.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?dist ?title ?accessURL where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
optional { ?dist dct:title ?title.
FILTER ( langMatches( lang(?title), "en" ))
}
optional { ?dist dcat:accessURL ?accessURL. }
}
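Because one Dataset can have several Distributions, the result of this query contains one row per Dataset and Distribution pair. The following Python sketch regroups such rows per Dataset; it assumes rows are dictionaries keyed by the select variables `d` and `dist`, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def distributions_per_dataset(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[str, List[str]]:
    """Group Distribution URIs by the Dataset URI they belong to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        dataset, dist = row["d"], row["dist"]
        # The same pair can occur in several rows (e.g. one per title); keep it once.
        if dist not in grouped[dataset]:
            grouped[dataset].append(dist)
    return dict(grouped)
```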
Reporting query 4 – reported APIs for high-value datasets
APIs are one of the main obligations imposed on the HVDs by the Implementing Regulation. In DCAT-AP HVD, APIs are represented as Data Services. A Data Service can be associated with a Dataset in two distinct ways: via dcat:accessService on a Distribution of the Dataset, or via dcat:servesDataset on the Data Service itself. This query explores both.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?api where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
}
Reporting query 5 – reported APIs for high-value datasets with key information
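APIs must be described with sufficient information. For each reported API this query returns the title, description, HVD category, endpoint URL and endpoint description.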
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?api ?title ?desc ?category ?endpointURL ?endpointDesc where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
optional { ?api dct:title ?title.
FILTER ( langMatches( lang(?title), "en" ))
}
optional { ?api dct:description ?desc.
FILTER ( langMatches( lang(?desc), "en" ))
}
optional { ?api r5r:hvdCategory ?category. }
optional { ?api dcat:endpointDescription ?endpointDesc. }
optional { ?api dcat:endpointURL ?endpointURL. }
}
Reporting query 6 – reported legal information
High-value datasets must be made available under a permissive licence, such as Creative Commons BY 4.0. In DCAT-AP the legal information is associated with the Distributions and Data Services of a Dataset. Because legal information is an important aspect of the HVD IR, a specific reporting query is provided. Note: Legal information in DCAT-AP is a combination of three properties: the access rights, the licences and the rights. ‘Access rights’ provides a condensed view of the limitations that restrict access to data. ‘Licences’ and ‘rights’ are the legal conditions on the use or reuse of the data.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?api ?title ?lic ?rights where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
optional { ?api dct:title ?title.
FILTER ( langMatches( lang(?title), "en" ))
}
OPTIONAL { ?api dct:license ?lic. }
OPTIONAL { ?api dct:rights ?rights. }
}
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?d ?dist ?title ?lic ?rights where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
optional { ?dist dct:title ?title.
FILTER ( langMatches( lang(?title), "en" ))
}
OPTIONAL { ?dist dct:license ?lic. }
OPTIONAL { ?dist dct:rights ?rights. }
}
Reporting query 7 – reported legal information on provided licences
According to the HVD IR, the licences provided must satisfy a number of quality requirements:
- A licence must be provided in human and machine-readable format.
- A licence must be provided with a persistent URI.
- A licence must be at least as permissive as CC-BY 4.0.
This reporting query assists in assessing whether the provided licences satisfy the last requirement, i.e. being at least as permissive as CC-BY 4.0. In DCAT-AP HVD it is recommended that, for the reporting, the MS HVD contact provides a mapping from all reported licences to the EU Vocabularies NAL (Name Authority List) licences. This query takes that mapping into account, since it allows for a quick assessment of permissiveness compared to CC-BY 4.0. If no licence is provided, the provided rights will be investigated for those quality requirements. Since rights usually express a single aspect of reuse, this investigation is more complicated. In particular, there is no consolidated controlled vocabulary of rights available to which one could match the specific rights provided in a MS. For that reason, no specific query for rights has been provided in this version of the document. The previous query 6 checks the presence of legal information (licence and/or rights). The first query below covers licences on Distributions, the second licences on Data Services.
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?lic ?skos ?mapped where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dct:license ?lic.
Optional {
?lic ?skos ?mapped.
FILTER ( ?skos IN ( <http://www.w3.org/2004/02/skos/core#exactMatch>,
<http://www.w3.org/2004/02/skos/core#narrowMatch>,
<http://www.w3.org/2004/02/skos/core#broadMatch>
) )
}
}
prefix dct: <http://purl.org/dc/terms/>
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat: <http://www.w3.org/ns/dcat#>
select distinct ?lic ?skos ?mapped where {
<?MSCat?> ?cp ?d.
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?api a dcat:DataService.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
?api dct:license ?lic.
Optional {
?lic ?skos ?mapped.
FILTER ( ?skos IN ( <http://www.w3.org/2004/02/skos/core#exactMatch>,
<http://www.w3.org/2004/02/skos/core#narrowMatch>,
<http://www.w3.org/2004/02/skos/core#broadMatch>
) )
}
}
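The results of the two licence queries can be post-processed to list the licences for which no mapping was reported. The Python sketch below assumes rows are dictionaries keyed by the select variables `lic`, `skos` and `mapped`, with `None` for unbound variables; the function name is hypothetical. It only checks that some mapping exists and does not verify that the target is a NAL licence or that the licence is sufficiently permissive.

```python
from typing import Iterable, List, Mapping, Optional


def unmapped_licences(rows: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Return the licence URIs that have no exactMatch/narrowMatch/broadMatch mapping."""
    all_licences = set()
    mapped_licences = set()
    for row in rows:
        all_licences.add(row["lic"])
        if row.get("mapped"):   # bound only when the optional mapping block matched
            mapped_licences.add(row["lic"])
    return sorted(all_licences - mapped_licences)
```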
Reporting completeness validation
It cannot be excluded that the collected HVD metadata is only partially complete. This can have several reasons:
- The MS HVD contact did not ensure that all metadata is available in the DEU.
- The MS has not fulfilled its obligations according to the HVD IR and therefore information is missing.
- The HVD IR has distinct requirements per kind of high-value dataset that might result in collected metadata appearing to be incomplete, while still being in compliance with the HVD IR.
To detect these cases the collected HVD metadata must be assessed for completeness.
As it is challenging to detect metadata incompleteness via SPARQL queries, an alternative method is used: SHACL validation. The following SHACL template is an extract of the SHACL shapes; it finds Datasets that have no Data Service associated with them. Since high-value datasets should in general have a Data Service associated with them, the absence of a Data Service might indicate an issue.
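Independently of the SHACL validation, the same condition can be approximated client-side by comparing the results of Reporting query 1 (variable `s`) with those of Reporting query 4 (variables `d` and `api`). The Python sketch below is such an approximation and not the SHACL check itself; the function name and the row representation (dictionaries keyed by the select variables) are assumptions.

```python
from typing import Iterable, List, Mapping, Optional


def datasets_without_api(
    dataset_rows: Iterable[Mapping[str, Optional[str]]],  # rows of Reporting query 1
    api_rows: Iterable[Mapping[str, Optional[str]]],      # rows of Reporting query 4
) -> List[str]:
    """Return the reported Datasets for which no Data Service was found."""
    reported = {row["s"] for row in dataset_rows}
    served = {row["d"] for row in api_rows if row.get("api")}
    return sorted(reported - served)
```

A non-empty result is only an indication: as noted above, the HVD IR has distinct requirements per kind of high-value dataset, so a missing Data Service is not necessarily a compliance issue.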
Acronyms
MS: EU member state
MS HVD contact: the person responsible for the HVD reporting for a MS.
DEU: data.europa.eu