Our metadata model
It is important to distinguish between metadata and data in data.europa.eu. Most information you discover on the portal is metadata, i.e. information about data (title, description, publisher, etc.). The metadata then links to the actual data, in most cases a downloadable file. Metadata and data together are often called a dataset. The metadata is stored in the databases of data.europa.eu, while the data usually remains with the original data publisher. However, data.europa.eu is capable of storing both metadata and data. With the data provider interface (DPI), both can be inserted: metadata by filling in the relevant fields, and the actual data by uploading the data files (up to 10 gigabytes each).
The (meta)data model used in data.europa.eu is DCAT-AP, the application profile for data portals in Europe. The DCAT-AP specification was a joint initiative of the Directorate-General for Communications Networks, Content and Technology, the Publications Office of the European Union and the Interoperable Europe programme. It was elaborated by a multidisciplinary working group with representatives from 16 EU Member States, some European institutions, and the United States.
DCAT defines a dataset as ‘a collection of data, published or curated by a single agent, and available for access or download in one or more representations’ (here, ‘data’ is implicitly understood to cover both the metadata and the actual data).
The portal collects the metadata from the data providers even when it deviates from the common data model. To improve the quality of the metadata and data, the portal continuously assesses metadata quality and reports the errors it finds back to the owners of the datasets. The metadata quality evaluation is visible for each dataset, and for entire collections as a dashboard. More information is provided in the Metadata Quality section.
The portal collects all datasets from the portals it harvests. The actual data is collected and made available in the file format provided by the data provider and shown under the distribution(s) of the dataset.
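The relationship between a dataset's metadata and its distributions can be pictured with a small data structure. The following is only an illustrative sketch: the class and field names are assumptions chosen for readability (the comments note the DCAT properties they loosely correspond to), and the example values and URLs are made up. It is not the portal's internal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One representation of the data (illustrative, not the portal's schema)."""
    access_url: str   # loosely corresponds to dcat:accessURL
    file_format: str  # loosely corresponds to dct:format


@dataclass
class Dataset:
    """Metadata record that links to the actual data via its distributions."""
    title: str        # dct:title
    description: str  # dct:description
    publisher: str    # dct:publisher
    distributions: List[Distribution] = field(default_factory=list)  # dcat:distribution


# Made-up example values; the files themselves stay with the publisher.
dataset = Dataset(
    title="Air quality measurements",
    description="Hourly readings from monitoring stations.",
    publisher="Example Environment Agency",
    distributions=[
        Distribution("https://example.org/data/air.csv", "CSV"),
        Distribution("https://example.org/data/air.json", "JSON"),
    ],
)

# List the file formats in which the actual data is offered.
formats = [d.file_format for d in dataset.distributions]
print(formats)
```

The point of the sketch is that the metadata record only carries links and descriptions; each distribution points to the data in the file format supplied by the data provider.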
The DCAT-AP version used
The current version of DCAT-AP in the portal is version 2.1.1. This version brings the following improvements:
- improved Unified Modelling Language (UML) diagram in accordance with the agreed profile reading;
- improved coherence between the UML diagram and the specification text;
- a usage guide on the relationships between dataset, distribution and data service, and on the consequences of this clarification for the model;
- various editorial fixes;
- consolidation of the SHACL shapes;
- minor specification updates:
  - introduction of the named authority lists (NALs) planned-availability, access-right and dataset-type,
  - lift of the max-cardinality for dataset dct:type,
  - lift of the max-cardinality for property dct:creator,
  - allowance of checksum algorithms other than SHA-1,
  - enlargement of the range for temporal properties to contain any temporal XML Schema Definition (XSD) datatype,
  - alignment of the usage notes for property adms:status with W3C DCAT,
  - addition of max-cardinality 1 for dcat:temporalResolution and dcat:spatialResolutionInMeters to align with the usage note.
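One of the minor updates listed above concerns checksum algorithms other than SHA-1. As a minimal sketch of what this means for a data provider, the snippet below computes a checksum with a selectable algorithm using Python's standard hashlib module. The function name file_checksum and the sample payload are assumptions made for illustration; how the resulting value is recorded in the metadata is defined by the DCAT-AP specification and is not shown here.

```python
import hashlib


def file_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest()


# Made-up file content standing in for an uploaded data file.
payload = b"id,value\n1,42\n"

sha1_digest = file_checksum(payload, "sha1")      # 40 hex characters
sha256_digest = file_checksum(payload, "sha256")  # 64 hex characters
print(len(sha1_digest), len(sha256_digest))
```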
A complete list of the issues and their resolutions can be found in the issue tracker of the DCAT-AP GitHub repository.
How to export the metadata of the dataset
The metadata of a dataset can be exported via the details page of the requested dataset. The Linked Data tab allows the user to download the metadata in various representations: RDF/XML, Turtle, Notation3, N-Triples and JSON-LD. Alternatively, the user can append the intended file extension to the dataset ID in the URL: in the same order, .rdf, .ttl, .n3, .nt and .jsonld.
The metadata of a dataset's distributions is part of the metadata of the dataset itself. To obtain the metadata of a distribution, the user can export the metadata of the dataset and extract the corresponding distribution from it. Analogously, the user can export the metadata of a catalogue from the catalogue details page: go to the datasets page and select catalogues, then select the requested catalogue and export the metadata via the Catalogues Metadata as Linked Data dropdown.
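The URL-based export can be sketched in a few lines of Python. This is a hedged illustration only: the function name export_url is hypothetical and the base URL is a placeholder, so substitute the actual URL of a dataset (ending in its dataset ID) as shown in the browser. Only the mapping from the five serialisations to their file extensions is taken from the text above.

```python
# Serialisation name -> file extension, in the order given in the text.
EXTENSIONS = {
    "RDF/XML": ".rdf",
    "Turtle": ".ttl",
    "Notation3": ".n3",
    "N-Triples": ".nt",
    "JSON-LD": ".jsonld",
}


def export_url(dataset_url: str, serialisation: str) -> str:
    """Append the extension for `serialisation` to a dataset URL."""
    try:
        extension = EXTENSIONS[serialisation]
    except KeyError:
        raise ValueError(f"unknown serialisation: {serialisation}") from None
    # Drop a trailing slash so the extension attaches to the dataset ID.
    return dataset_url.rstrip("/") + extension


# Hypothetical URL ending in a dataset ID; replace with a real one from the portal.
base = "https://example.org/datasets/my-dataset-id"
for name in EXTENSIONS:
    print(name, "->", export_url(base, name))
```

Keeping the mapping in one dictionary makes it easy to reject unsupported serialisations before a request is sent.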
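As a rough sketch of how distributions can be located inside an exported dataset, the snippet below scans an N-Triples export for statements that use the DCAT property dcat:distribution. The function name distribution_iris and the sample triples are made up for illustration, and the line-based matching is a simplification that only handles statements whose subject, predicate and object are all IRIs. A proper RDF parsing library would be the more robust choice for real exports.

```python
import re

# IRI of the DCAT property that links a dataset to its distributions.
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"

# Matches simple N-Triples statements made of three IRIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def distribution_iris(ntriples: str, dataset_iri: str) -> list:
    """Return the IRIs linked from `dataset_iri` via dcat:distribution."""
    found = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # literals, blank nodes, comments: not needed here
        subject, predicate, obj = match.groups()
        if subject == dataset_iri and predicate == DCAT_DISTRIBUTION:
            found.append(obj)
    return found


# Made-up export for illustration; real exports contain many more triples.
sample = """\
<https://example.org/dataset/1> <http://purl.org/dc/terms/title> "Air quality"@en .
<https://example.org/dataset/1> <http://www.w3.org/ns/dcat#distribution> <https://example.org/distribution/a> .
<https://example.org/dataset/1> <http://www.w3.org/ns/dcat#distribution> <https://example.org/distribution/b> .
<https://example.org/distribution/a> <http://www.w3.org/ns/dcat#accessURL> <https://example.org/data/air.csv> .
"""

result = distribution_iris(sample, "https://example.org/dataset/1")
print(result)
```

Once the distribution IRIs are known, the remaining triples with those IRIs as subject make up the metadata of each distribution.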
The available formats can be consulted with this query. The list below is non-exhaustive, as the set of formats offered is updated on a daily basis:
- Tabular/Text data:
  - CSV
  - Excel
  - HTML (HyperText Markup Language)
  - TSV
- Application/Script data:
  - ATOM
  - JSON
  - JSON-LD
  - OCTET STREAM
  - RDF-N3
  - RDF-Turtle
  - RDF-XML
  - RSS
  - XML (eXtensible Markup Language)
- Geospatial data:
  - GML
  - KML (Keyhole Markup Language)
  - SHP
  - WFS
  - WMS
- Images/Graphics:
  - JPEG
  - GIF (Graphics Interchange Format)
  - PNG
  - SVG (Scalable Vector Graphics)