Reporting guidelines for HVDs

The chart below illustrates the European Commission’s proposed common approach to reporting high-value datasets (HVDs) using the data catalogue vocabulary application profile for high-value datasets (DCAT-AP HVD) and data.europa.eu.

[Figure: HVD reporting process]

The process consists of three main stages:

  1. Collection at the national level. During this stage, Member States coordinate internally to identify HVDs. Subsequently, they make the HVD metadata available in their national open data portals or geoportals, tagging it with relevant properties such as applicable legislation and HVD categories. It is assumed that each Member State has an entity that performs internal coordination for HVDs. In this document we refer to this entity as the 'Member State HVD contact'.

    Challenges and risks

    There is no Member State HVD contact, or national coordination is lacking. The absence of a Member State HVD contact or national coordination has a substantial impact on the correct application of Commission Implementing Regulation (EU) 2023/138 on HVDs (HVD IR) in the Member State. This risk is beyond the scope of this text.

    National catalogues other than the ones intended for HVD reporting also use DCAT-AP HVD. In such a case, data.europa.eu may publish a view of HVDs that differs from the one that the Member State wishes to report to the Commission. To detect these anomalies, the collection stage will include an assessment of the catalogues other than those identified by the Member State HVD contact.

  2. Harvesting by data.europa.eu. Based on endpoints communicated by Member States through an EU survey (dating from September 2024, with an update by the end of 2024), datasets annotated as HVDs in national open data portals and geoportals will be harvested by data.europa.eu, and their metadata will be automatically updated to include HVD properties under the established automatic harvesting processes. This harvesting occurs daily and is based upon a common metadata standard, DCAT-AP for HVDs (see the Data provider Manual). data.europa.eu will maintain communication with national HVD coordinators to ensure the full alignment of catalogues with respect to HVDs. data.europa.eu harvests many catalogues from individual Member States. Each of these harvested catalogues is only available in the RDF store of data.europa.eu. The Member State HVD contact needs to inform the Commission which of the catalogues from a given Member State are to be used for HVD reporting by that Member State.

  3. Reporting by Member States. If the initial steps have been followed, data.europa.eu will have a comprehensive overview of HVDs across Member States (see 'general queries' below). In addition, the European Commission has made available specific queries on data.europa.eu to support the reporting by Member States. These reporting queries (see 'Reporting queries' sections numbered 1 to 7 below) include all metadata fields required by the HVD IR and the reporting requirements. They also allow Member States' implementation of and conformity with the regulation to be assessed.

Member States can report their implementation status to the Commission by using these queries or submitting the exports of relevant metadata.

General queries

Because data.europa.eu actively harvests the Member State endpoints, it shows the most recent state of affairs. The following query allows a snapshot of the data to be downloaded from the data.europa.eu SPARQL endpoint.

Query 1 – High-value datasets catalogues per country

The construct query below creates a snapshot of a Member State HVD catalogue. To execute the query, the user must replace the parameter <?MScat?> with the Member State HVD catalogue uniform resource identifier (URI) in data.europa.eu. As the amount of data returned may exceed the number of results allowed by the SPARQL endpoint, pagination must be applied to download the whole snapshot. Pagination is controlled by two query elements:

- Limit: the page size. The maximum is 50 000, but to avoid possible side effects from the system configuration, a smaller value such as 10 000 is recommended.
- Offset: the starting point of the page.

Users must increase the offset value by the limit for each new page until the result is empty. The concatenation of all downloaded files is the snapshot.

construct {?s ?p ?o.
           ?dist ?distp ?disto.
           ?distapi ?distapip ?distapio.
           ?API ?APIp ?APIo.
}  where {
<?MScat?> ?cp ?s.
?s <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{ ?s ?p ?o. }
union {
?s <http://www.w3.org/ns/dcat#distribution> ?dist.
?dist ?distp ?disto.
?dist <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?s <http://www.w3.org/ns/dcat#distribution> ?dist.
?dist <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist <http://www.w3.org/ns/dcat#accessService> ?distapi.
?distapi ?distapip ?distapio.
?distapi <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
union {
?API <http://www.w3.org/ns/dcat#servesDataset> ?s.
?API ?APIp ?APIo.
?API <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}}
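
The pagination procedure described for Query 1 can be sketched in Python as follows. This is a minimal illustration, not part of the official tooling: `paged_query` and `download_snapshot` are hypothetical helper names, and `fetch` is an assumed caller-supplied function that sends a query string to the SPARQL endpoint and returns the page as a list (for example, a list of triples). Note also that, without an ORDER BY clause, the SPARQL specification does not guarantee that consecutive pages are stable.

```python
from typing import Callable, Iterator, List


def paged_query(query: str, limit: int, offset: int) -> str:
    """Append LIMIT and OFFSET clauses to a SPARQL query string."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return f"{query.rstrip()}\nlimit {limit} offset {offset}"


def download_snapshot(
    query: str,
    fetch: Callable[[str], List[str]],  # assumption: returns one page of results
    limit: int = 10000,                 # reduced page size, as recommended above
) -> Iterator[List[str]]:
    """Yield pages until the endpoint returns an empty result."""
    offset = 0
    while True:
        page = fetch(paged_query(query, limit, offset))
        if not page:
            break
        yield page
        offset += limit  # move the starting point to the next page
```

Concatenating the yielded pages gives the snapshot; how each page is stored (one file per page, or a single appended file) is left to the caller.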

Query 2 – High-value datasets catalogues’ URIs per country

To look up the Member State HVD catalogue URI needed for the parameter <?MSCat?>, the following query can be used. It returns all catalogues that contain a resource indicated as published according to the HVD IR.

select distinct ?c  where { 

?s <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?c a <http://www.w3.org/ns/dcat#Catalog>.
?c ?p ?s.
} group by ?c
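
Once a catalogue URI has been looked up, it has to replace the placeholder in the other queries. A minimal Python sketch of that substitution is given below. The function name `fill_catalogue` and the example URI are hypothetical; the placeholder is matched case-insensitively because this document spells it both `<?MScat?>` and `<?MSCat?>`, and the URI check is a deliberately rough guard against characters that cannot appear in a SPARQL IRI.

```python
import re

# Matches <?MSCat?> and <?MScat?> as used in the queries of this document.
PLACEHOLDER = re.compile(r"<\?MSCat\?>", re.IGNORECASE)

# Rough check: http(s) scheme and no characters forbidden inside <...> in SPARQL.
URI_PATTERN = re.compile(r'https?://[^\s<>"{}|\\^`]+')


def fill_catalogue(query: str, catalogue_uri: str) -> str:
    """Return the query with the catalogue placeholder replaced by <catalogue_uri>."""
    if not URI_PATTERN.fullmatch(catalogue_uri):
        raise ValueError(f"not a usable catalogue URI: {catalogue_uri!r}")
    if not PLACEHOLDER.search(query):
        raise ValueError("query contains no <?MSCat?> placeholder")
    # A function is used as replacement so the URI is inserted literally.
    return PLACEHOLDER.sub(lambda _m: f"<{catalogue_uri}>", query)


if __name__ == "__main__":
    template = "select distinct ?s where { <?MSCat?> ?cp ?s. }"
    # Made-up URI for illustration only.
    print(fill_catalogue(template, "http://example.org/catalogue/xx"))
```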

Reporting queries

Article 5 of the HVD IR lays down the following requirements for the report:
(a) a list of specific datasets at Member State level (and, where relevant, subnational level) corresponding to the description of each high-value dataset in the Annex to this Regulation and with online reference to metadata that follow existing standards, such as a single register or open data catalogue;
(b) persistent link to the licensing conditions applicable to the re-use of high-value datasets listed in the Annex to this Regulation, per dataset referred to in point a);
(c) persistent link to the APIs ensuring access to the high-value datasets listed in the Annex to this Regulation, per dataset referred to in point a);
(d) where available, guidance documents issued by the Member State on publishing and reusing their high-value datasets;
(e) where available, the existence of data protection impact assessments carried out in accordance with Article 35 of Regulation (EU) 2016/679;
(f) the number of public sector bodies exempted in accordance with Article 14(5) of Directive (EU) 2019/1024.

Only the first three points, (a) to (c), can be supported via the reporting process described here. The assessment and delivery of the evidence expressed under points (d) to (f) are beyond its scope.

The Commission has developed several queries to help retrieve the HVDs, as shown below. Additionally, you can utilise our High-Value Datasets Reporting Tool for comprehensive information and analysis, designed to simplify the reporting process.

Query 3 – High-value datasets per catalogue

This query returns all the HVDs harvested from a given Member State catalogue. To run it, replace the parameter <?MSCat?> with the URI of the Member State catalogue in data.europa.eu.

For its own purposes, the data.europa.eu harvesting performs a harmonisation step in which the source identifiers of datasets are replaced with data.europa.eu-specific identifiers. The original identifiers provided by the harvested catalogues are kept in the catalogue records of data.europa.eu. Query 4.1 below retrieves the original identifiers for each HVD so that the Member State can perform an internal cross-check.

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?s  where { 
<?MSCat?>  ?cp ?s. 
?s  r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?s  a dcat:Dataset.
} 

Query 4 – High-value datasets with key metadata

For any HVD, Query 4.2 provides the title, description and HVD category. This is the mandatory DCAT-AP HVD key metadata.

NB: Query 4.2 returns only the English texts, which may be the result of a machine translation service embedded in the data.europa.eu harvesting.

Query 4.1 - High-value datasets with source data portal links

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>
prefix foaf: <http://xmlns.com/foaf/0.1/> 

select distinct ?s ?originalId where { 
<?MSCat?>  ?cp ?s. 
?s <http://data.europa.eu/r5r/applicableLegislation> <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?s a <http://www.w3.org/ns/dcat#Dataset>.

?record foaf:primaryTopic ?s.
 ?record a dcat:CatalogRecord.
 ?record dct:identifier ?originalId.
}

Query 4.2 - High-value datasets key metadata

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?d ?title ?desc ?Category where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
optional { ?d dct:title ?title.
     FILTER ( langMatches( lang(?title),  "en" ))
 } 
optional { ?d dct:description ?desc.
       FILTER ( langMatches( lang(?desc), "en" ))
 } 
optional { ?d r5r:hvdCategory ?Category. } 
}

Query 5 – High-value datasets distributions

HVDs are usually subject to the obligation to be provided as a bulk download. This assessment query allows this aspect to be checked.

NB:

- There could be multiple distributions for one dataset. This multiplicity is the reason that this is a separate query and that it cannot be part of a simple table with datasets.
- There could be distributions for an HVD that are not meant to be reported in accordance with the HVD IR. It may be assumed that the collection phase has removed them; however, the identification condition is included to guarantee that incorrect values are not returned.

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?d ?dist ?title ?accessURL where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
optional { ?dist dct:title ?title.
     FILTER ( langMatches( lang(?title),  "en" ))
 } 
optional { ?dist dcat:accessURL ?accessURL. } 

}
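
Because one dataset can have several distributions, the flat result table of Query 5 is often easier to review once it is grouped per dataset. The sketch below assumes the result was downloaded in the standard SPARQL 1.1 Query Results JSON format and uses the variable names `d` and `dist` from Query 5; the function name `distributions_per_dataset` is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def distributions_per_dataset(results_json: str) -> Dict[str, List[str]]:
    """Group distribution URIs by dataset URI from SPARQL JSON results."""
    bindings = json.loads(results_json)["results"]["bindings"]
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in bindings:
        dataset = row["d"]["value"]
        dist = row["dist"]["value"]
        # The same distribution can occur in several rows (e.g. several access URLs).
        if dist not in grouped[dataset]:
            grouped[dataset].append(dist)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    sample = json.dumps({
        "head": {"vars": ["d", "dist"]},
        "results": {"bindings": [
            {"d": {"type": "uri", "value": "http://example.org/d1"},
             "dist": {"type": "uri", "value": "http://example.org/d1/csv"}},
            {"d": {"type": "uri", "value": "http://example.org/d1"},
             "dist": {"type": "uri", "value": "http://example.org/d1/json"}},
        ]},
    })
    print(distributions_per_dataset(sample))
```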

Query 6 – High-value datasets APIs

Providing application programming interfaces (APIs) is one of the main obligations imposed on HVDs by the HVD IR. In DCAT-AP HVD, APIs are represented as data services. A data service can be associated with a dataset in two distinct ways: through one of the dataset's distributions (dcat:accessService) or directly (dcat:servesDataset). This query explores both.

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?d ?api where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
} 
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}
} 
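
One possible use of the Query 6 result is to compare it with the Query 3 result in order to list the HVDs for which no API was found. The sketch below is only an illustration: `datasets_without_api` is a hypothetical helper, and its inputs are assumed to be the dataset URIs from Query 3 and the (dataset, API) pairs from Query 6, already extracted from the query results. A dataset appearing in the output is not necessarily non-compliant; it only marks a candidate for a closer look.

```python
from typing import Iterable, List, Tuple


def datasets_without_api(
    all_datasets: Iterable[str],
    api_rows: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the dataset URIs that never occur in a (dataset, api) row."""
    served = {dataset for dataset, _api in api_rows}
    return sorted(set(all_datasets) - served)


if __name__ == "__main__":
    # Made-up URIs, for illustration only.
    datasets = ["http://example.org/d1", "http://example.org/d2"]
    rows = [("http://example.org/d1", "http://example.org/api1")]
    print(datasets_without_api(datasets, rows))
```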

Query 7 – High-value datasets APIs with key information

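APIs must be provided with sufficient information. For each API, this query returns the title, description, HVD category, endpoint URL and endpoint description.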

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?d ?api ?title ?desc ?category ?endpointURL ?endpointDesc where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
} 
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}

optional { ?api dct:title ?title.
     FILTER ( langMatches( lang(?title),  "en" ))
 } 
optional { ?api dct:description ?desc.
     FILTER ( langMatches( lang(?desc),  "en" ))
 } 
optional { ?api r5r:hvdCategory ?category. } 
optional { ?api dcat:endpointDescription ?endpointDesc. }
optional { ?api dcat:endpointURL ?endpointURL. }
} 

Query 8 – High-value datasets legal information

HVDs must be made available under a permissive licence, such as Creative Commons (CC) BY 4.0. In DCAT-AP the legal information is associated with the distributions and data services associated with the datasets. Because legal information is an important aspect of the HVD IR, a specific reporting query is provided.

NB: Legal information in DCAT-AP is a combination of three properties: the access rights, the licences and the rights. 'Access rights' provides a condensed view of the limitations that restrict access to data, while 'licences' and 'rights' are the legal conditions on the use or reuse of that data.

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?d ?api ?title ?lic ?rights where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
} 
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}

optional { ?api dct:title ?title.
     FILTER ( langMatches( lang(?title),  "en" ))
 } 
OPTIONAL { ?api dct:license ?lic. } 
OPTIONAL { ?api dct:rights  ?rights. }


} 

Query 9 – High-value datasets licences

In accordance with the HVD IR, the licences provided must satisfy a number of quality requirements:

- a licence must be provided in human- and machine-readable format;
- a licence must be provided with a persistent URI;
- a licence must be at least as permissive as CC-BY 4.0.

Query 9.1 - High-value datasets API licences

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?api ?lic ?skos ?mapped where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?api a dcat:DataService.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
} 
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}

?api dct:license ?lic.  
Optional {
  ?lic ?skos ?mapped.
  FILTER ( ?skos IN ( <http://www.w3.org/2004/02/skos/core#exactMatch>,
                   <http://www.w3.org/2004/02/skos/core#narrowMatch>,

                    <http://www.w3.org/2004/02/skos/core#broadMatch>
) )
}

}

This reporting query will help assess whether the licences provided are in line with the third requirement. In DCAT-AP HVD, it is recommended that the Member State HVD contact provides a mapping from all reported licences to the EU Vocabularies name authority list licences, and this query takes that knowledge into account. This recommendation allows for a quick assessment of a licence's permissiveness compared to CC-BY 4.0. If no licence is provided, the rights provided will be investigated for those quality requirements. Since rights usually express a single aspect of reuse, this investigation is more complicated. In particular, there is no consolidated controlled vocabulary of rights available to which the specific rights provided in a Member State can be matched. For this reason, no specific query for rights has been provided in this version of the document. Query 8 above will check the presence of legal information (i.e. licence and/or rights).
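
To act on the mapping recommendation, it can help to list the licences for which the query returned no SKOS mapping. The following Python sketch assumes that the rows of Query 9.1 (or Query 9.2) have been loaded as tuples of (resource, licence, SKOS relation, mapped licence), with `None` where the optional part of the query did not match. The function name `unmapped_licences` and the sample values are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, str, Optional[str], Optional[str]]


def unmapped_licences(rows: Iterable[Row]) -> List[str]:
    """Return the licences that have no SKOS mapping in any row."""
    rows = list(rows)
    seen = {licence for _res, licence, _rel, _target in rows}
    mapped = {licence for _res, licence, _rel, target in rows if target}
    return sorted(seen - mapped)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = [
        ("http://example.org/api1", "http://example.org/lic/a",
         "http://www.w3.org/2004/02/skos/core#exactMatch", "http://example.org/nal/x"),
        ("http://example.org/api2", "http://example.org/lic/b", None, None),
    ]
    print(unmapped_licences(sample))
```

The licences in the output are those whose permissiveness cannot be assessed through a mapping and therefore need manual inspection.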

Query 9.2 - High-value datasets distribution licences (bulk downloads)

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>


select distinct ?dist ?lic ?skos ?mapped where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?dist dct:license ?lic.  
Optional {
  ?lic ?skos ?mapped.
  FILTER ( ?skos IN ( <http://www.w3.org/2004/02/skos/core#exactMatch>,
                    <http://www.w3.org/2004/02/skos/core#narrowMatch>,

                    <http://www.w3.org/2004/02/skos/core#broadMatch>
) )
}


}

Query 9.3 - The licences used by catalogue

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>

select distinct ?lic ?skos ?mapped where { 
<?MSCat?> ?cp ?d. 
?d r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
?d a dcat:Dataset.
?api a dcat:DataService.
{
?d dcat:distribution ?dist.
?dist r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.

?dist dcat:accessService ?api.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
} 
union {
?api dcat:servesDataset ?d.
?api r5r:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj>.
}

?api dct:license ?lic.  
Optional {
  ?lic ?skos ?mapped.
  FILTER ( ?skos IN ( <http://www.w3.org/2004/02/skos/core#exactMatch>,
                   <http://www.w3.org/2004/02/skos/core#narrowMatch>,

                    <http://www.w3.org/2004/02/skos/core#broadMatch>
) )
}

}

Reporting completeness validation

It cannot be excluded that the HVD metadata collected is only partially complete. This can be for several reasons:

- the Member State HVD contact did not ensure that all metadata is available in data.europa.eu;
- the Member State has not fulfilled its obligations in accordance with the HVD IR, and therefore information is missing;
- the HVD IR has distinct requirements for each kind of HVD that might result in the collected metadata appearing to be incomplete, while still being in compliance with the HVD IR.

To detect these cases, the collected HVD metadata must be assessed for completeness.

As it is difficult to detect metadata incompleteness via SPARQL queries, an alternative method is used: SHACL validation. The following shape is an extract of the SHACL shapes; it finds datasets that have no data service associated with them. Since HVDs should, in general, have a data service associated with them, the absence of a data service could indicate an issue. This shape and others can be validated against the downloaded catalogue snapshot (see Query 1). For this, one can use the SHACL validator at https://www.itb.ec.europa.eu/shacl/dcat-ap/upload, selecting the validation option "DCAT-AP HVD 3.0.0 Usage Notes".

prefix dct: <http://purl.org/dc/terms/> 
prefix r5r: <http://data.europa.eu/r5r/>
prefix dcat:  <http://www.w3.org/ns/dcat#>
prefix sh: <http://www.w3.org/ns/shacl#>


_:Dataset_Shape
    a sh:NodeShape ;
    sh:property [
       sh:path r5r:applicableLegislation;
       sh:nodeKind sh:IRI;
       sh:severity sh:Violation;
       sh:minCount 1
      ], [
        sh:path  [ sh:alternativePath ( [ sh:inversePath dcat:servesDataset ]  ( dcat:distribution dcat:accessService ) )];
        sh:nodeKind sh:IRI ; 
        sh:minCount 1 ;
        sh:severity sh:Violation
        ] ;
    sh:targetClass dcat:Dataset .

Abbreviations

API: application programming interface

DCAT-AP HVD: the data catalogue vocabulary application profile for high-value datasets

HVD: high-value dataset

HVD IR: Commission Implementing Regulation (EU) 2023/138 of 21 December 2022 laying down a list of specific high-value datasets and the arrangements for their publication and re-use

SHACL: shapes constraint language

SKOS: simple knowledge organisation system

SPARQL: SPARQL protocol and RDF query language

References

data.europa.eu SPARQL endpoint

DCAT-AP HVD

HVD IR

SHACL

SHACL validator Testbed

SKOS

SPARQL