Data quality

Our team is aware that demand for high-quality data continues to grow, particularly for data that is publicly available and can easily be reused for different purposes. Poor data quality is a major barrier to reuse. Some data cannot be interpreted because it contains ill-defined or inaccurate elements, such as missing values, mismatches or missing data types, because its structure is not documented, or because it is only available in formats such as HTML, GIF or PDF. Users find poor-quality data harder to understand and may use it less often, and the data provider may even appear less reliable as a result.
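As a rough illustration of two of the defects mentioned above (missing values and values that do not match the expected data type), the following Python sketch scans CSV text for empty cells and non-integer values in columns declared as integers. The schema `EXPECTED_TYPES`, the function `check_csv` and the sample data are hypothetical; the Data Quality Guidelines do not prescribe this particular check.

```python
import csv
import io

# Hypothetical schema: column names and expected types are illustrative
# assumptions, not taken from the Data Quality Guidelines.
EXPECTED_TYPES = {"country": str, "year": int, "population": int}

# Hypothetical sample with one missing value and one type mismatch.
SAMPLE = """country,year,population
BE,2020,11500000
FR,,67000000
DE,2020,about 83 million
"""


def check_csv(text, expected_types):
    """Return a list of (row_number, column, problem) tuples for CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Columns declared in the schema but absent from the header row.
    missing_columns = [c for c in expected_types if c not in header]
    for column in missing_columns:
        problems.append((0, column, "missing column"))
    # Row numbers start at 2 because line 1 is the header
    # (assuming one record per line, i.e. no multi-line quoted fields).
    for row_number, row in enumerate(reader, start=2):
        for column, expected in expected_types.items():
            if column in missing_columns:
                continue
            # Short rows yield None for absent fields; treat that as empty.
            value = (row.get(column) or "").strip()
            if value == "":
                problems.append((row_number, column, "missing value"))
                continue
            if expected is int:
                try:
                    int(value)
                except ValueError:
                    problems.append((row_number, column, f"not an integer: {value!r}"))
    return problems


if __name__ == "__main__":
    for problem in check_csv(SAMPLE, EXPECTED_TYPES):
        print(problem)
```

On the sample, the sketch reports the empty `year` cell in the FR row and the free-text `population` value in the DE row. A real validation step would more likely rely on a documented schema published alongside the data than on a hard-coded dictionary.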

For these reasons, our team is involved in several initiatives on data quality. One of them was the publication of the Data Quality Guidelines, a set of recommendations for delivering high-quality data. The recommendations are addressed to data providers to help them prepare their data, develop their data strategies and ensure data quality.

The document is composed of the following four parts.

  1. Recommendations for providing high-quality data. The recommendations cover general aspects of quality issues regarding the findability, accessibility, interoperability and reusability of data (including specific recommendations for common file formats like CSV, JSON, RDF and XML).

  2. Recommendations for data standardisation (with EU controlled vocabularies) and data enrichment.

  3. Recommendations for documenting data.

  4. Recommendations for improving the 'openness level'.

Useful links

  - Data Quality Guidelines

  - Training on data and metadata quality on data.europa academy

In the following subsections you will find tips and quick-reference material for providing high-quality data, standardisation and data enrichment, documenting data and improving the 'openness level'.