API documentation
General introduction to APIs
Machines can exchange data in digital form. To make this exchange seamless, the developers of such a machine define APIs (application programming interfaces). These APIs tell humans and machines how to interact with the machine and how to interpret the exchanged data. For each API call, the requesting service is called the client, and the service providing the data is called the server. When a human makes the call, the browser or a dedicated application for API calls acts as the client on the human's behalf. Beyond these basic principles, it is important to understand the concepts of HTTP, URLs, data representation formats and predefined communication methods.
HTTP
The ‘Hypertext Transfer Protocol’ (HTTP) allows machines to exchange hypertext. Unlike standard text, hypertext can be non-linear and can contain links to other text. Nevertheless, hypertext can easily be read and understood by humans. HTTP is not only indispensable for browsing the Web but has also become established as a standard for modern APIs. The payload sent via HTTP can be information of any kind, e.g. executable code or data. Beyond the payload, HTTP carries meta information in its headers, e.g. information about the communication session. To secure the communication, HTTP is combined with ‘Transport Layer Security’ (TLS). Together they form HTTPS, a secure version of HTTP. HTTPS is crucial for keeping a conversation private from external entities.
URL
A ‘Uniform Resource Locator’ (URL) is a form of address on the Web. In the context of APIs, a URL can address entry points, functions or resources provided by a server. How heavily URLs are used depends on the API type. Some APIs use URLs only to address an entry point, others use path and query parameters to pass information to the server.
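As a minimal sketch of how a URL breaks down into these parts, the following Python snippet splits one of the Search API URLs used later in this documentation into its scheme, host, path and query parameters. It relies only on the Python standard library; the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# URL taken from the Search API examples in this documentation.
url = "https://data.europa.eu/api/hub/search/search?page=0&limit=10"

parts = urlsplit(url)
query = parse_qs(parts.query)  # query parameters as a dict of lists

print("scheme:", parts.scheme)   # the protocol in use
print("host:  ", parts.netloc)   # the server that is addressed
print("path:  ", parts.path)     # the entry point or resource on the server
print("query: ", query)          # extra information passed to the server
```

The path identifies what is being addressed, while the query parameters carry additional information for the server.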
Data representation format
In addition to the communication protocol and the location of the API, an API can define how the data payload is represented. Data may be represented as plain text, structured data or encoded data. In practice, data representation formats are used for structured data that is machine-readable to a certain degree, e.g. XML, JSON or YAML.
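To illustrate what a structured, machine-readable representation means in practice, here is a small sketch that serialises a record to JSON text and reads it back. The record and its field names are invented for this example and do not reflect any data.europa.eu schema.

```python
import json

# Hypothetical record, invented purely for illustration.
record = {
    "title": "Example Dataset",
    "keywords": ["water", "environment"],
    "distributions": 1,
}

# Serialise to JSON text, the form in which it would travel as a payload.
text = json.dumps(record, indent=2, sort_keys=True)

# Parse the text back into a data structure on the receiving side.
restored = json.loads(text)

print(text)
```

Because the format is structured, the receiving side recovers the same fields and values without any guesswork.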
Predefined methods
When it comes to managing resources, APIs typically provide create, read, update and delete (CRUD) operations. For this, HTTP has predefined methods that can be used to define an API:
- POST, to create a resource
- GET, to read a resource
- PUT, to create or update a resource
- PATCH, to update a resource
- DELETE, to delete a resource
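The mapping above can be sketched in Python as follows. The snippet only builds request objects and does not send anything; the resource URL and the operation names used as dictionary keys are hypothetical and exist only for this illustration.

```python
from typing import Optional
from urllib.request import Request

# Hypothetical resource URL used only for illustration.
RESOURCE_URL = "https://example.eu/api/items/42"

# CRUD operations and the HTTP methods listed above.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "create_or_update": "PUT",
    "update": "PATCH",
    "delete": "DELETE",
}


def build_request(operation: str, body: Optional[bytes] = None) -> Request:
    """Return an unsent request whose HTTP method matches the operation."""
    return Request(RESOURCE_URL, data=body, method=CRUD_TO_HTTP[operation])


for operation in CRUD_TO_HTTP:
    print(operation, "->", build_request(operation).get_method())
```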
What is OpenAPI and why is it relevant?
All APIs on data.europa.eu are documented via OpenAPI. OpenAPI provides a human- and machine-readable, language-agnostic interface description that enables everyone to use the APIs without having to investigate the source code. Because the descriptions follow the OpenAPI syntax, tools can automatically create accessible and human-readable documentation of our APIs. The screenshot below showcases human-readable documentation of the MQA API.
It explains how to retrieve global quality measurements from the MQA. It gives the user a short description of the context of the provided features and concrete explanations of all required and optional parameters that can be used when calling the API. In the case at hand, the parameter ‘filter’ can be used, and it is explained that the value of the parameter must be one of the following: ‘findability’, ‘accessibility’, ‘interoperability’, ‘reusability’, ‘contextuality’, ‘score’.
The right-hand side gives a concrete URL to request for testing the API. Please note that the example URL is not using any parameters. These must be added manually. Below the example URL an example response is given.
data.europa.eu APIs
data.europa.eu provides the following APIs to read our metadata:
- Search: https://data.europa.eu/api/hub/search/
- SPARQL: https://data.europa.eu/sparql
- Registry: https://data.europa.eu/api/hub/repo/
- Use cases: https://data.europa.eu/en/export-use-cases
- MQA: https://data.europa.eu/api/mqa/cache/
- SHACL metadata validation: https://data.europa.eu/api/mqa/shacl/
Manage datasets via API
You can manage the metadata in data.europa.eu via the Registry API. To illustrate the basic concepts and the access control methodology, this section guides you through the process of creating, updating and deleting a DCAT-AP dataset. You can apply the same flow to the other endpoints of the API. Please refer to the OpenAPI documentation. The API is mainly designed for programmatic use from your applications, and we recommend you use suitable third-party software libraries for the interaction:
- OpenID Connect is used for access control.
- The API follows the RESTful paradigm.
General
This section gives an overview of how to manage (create, update and delete) datasets in data.europa.eu via its API.
You will need write access to at least one catalogue that you are responsible for. Please contact the data.europa.eu team for further information.
The entire API for dataset management is documented with OpenAPI.
The following DCAT-AP dataset will be used as an example for this guide. It is serialised in Turtle. You can provide the dataset in any other common RDF format, such as RDF/XML, JSON-LD, N-Triples, TriG or N3.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://example.eu/set/data/test-dataset>
    a dcat:Dataset ;
    dct:title "DCAT-AP 2.1.0 Example Dataset"@en ;
    dct:description "This is an example Dataset"@en ;
    dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
    dcat:distribution <https://example.eu/set/distribution/1> .

<https://example.eu/set/distribution/1>
    a dcat:Distribution ;
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:accessURL <https://github.com/ec-jrc/COVID-19/blob/master/data-by-country/jrc-covid-19-countries-latest.csv> .
Access Control and Tokens
data.europa.eu uses Keycloak to implement OpenID Connect as the backbone for access control. Hence, every interaction with a restricted API endpoint first requires an interaction with Keycloak to obtain an access token (Party Token). Obtaining this token takes two API calls.
Prerequisites
- You need valid credentials (username and password) for Keycloak.
- There is no self-registration. Please contact the Publications Office of the European Union data.europa.eu team for further information on how to obtain credentials.
- You need a tool to interact with an HTTP API, such as Postman or curl for the command line.
Step 1 - Request User Token
First you need to request a User Token by performing an x-www-form-urlencoded POST request to the following endpoint:
https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token
The following form values need to be set:
Key | Value |
---|---|
username | [yourusername] |
password | [yourpassword] |
grant_type | password |
client_id | piveau-hub-ui |
Example with curl:
$ curl --location --request POST "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token" \
--header "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=password" \
--data-urlencode "client_id=piveau-hub-ui" \
--data-urlencode "username=[yourusername]" \
--data-urlencode "password=[yourpassword]"
If successful, you will receive a JSON response like this:
{
"access_token": "[yourusertoken]",
"expires_in": 300,
"refresh_expires_in": 1800,
"refresh_token": "[yourrefreshtoken]",
"token_type": "Bearer",
"not-before-policy": 0,
"session_state": "694350c7-38b9-4051-bc2e-e15e34320133",
"scope": "email profile"
}
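For programmatic use, Step 1 could be sketched in Python with the standard library as shown below. The endpoint and form values are the ones given above; the function names are chosen for this illustration. The snippet builds the request and parses a response body, but does not send anything itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token"


def build_user_token_request(username: str, password: str) -> Request:
    """Build the form-encoded POST request for a User Token (not sent here)."""
    form = {
        "grant_type": "password",
        "client_id": "piveau-hub-ui",
        "username": username,
        "password": password,
    }
    return Request(
        TOKEN_URL,
        data=urlencode(form).encode("ascii"),  # percent-encodes special characters
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Read the access_token field from the JSON response body."""
    return json.loads(response_body)["access_token"]
```

To actually send the request, pass it to `urllib.request.urlopen` and hand the bytes returned by `response.read()` to `extract_access_token`.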
Step 2 - Request Party Token
Now you need to request a Party Token by again performing an x-www-form-urlencoded POST request to the following endpoint:
https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token
The following form values need to be set:
Key | Value |
---|---|
grant_type | urn:ietf:params:oauth:grant-type:uma-ticket |
audience | piveau-hub-repo |
In addition, place the User Token from Step 1 into the header field Authorization with the leading string Bearer.
Example with curl:
$ curl --location --request POST "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token" \
--header "Content-Type: application/x-www-form-urlencoded" \
--header "Authorization: Bearer [yourusertoken]" \
--data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:uma-ticket" \
--data-urlencode "audience=piveau-hub-repo"
If successful, you will get a JSON response like this:
{
"upgraded": false,
"access_token": "[yourpartytoken]",
"expires_in": 300,
"refresh_expires_in": 1800,
"refresh_token": "[yourrefreshtoken]",
"token_type": "Bearer",
"not-before-policy": 0
}
[yourpartytoken] can now be used to manage datasets in data.europa.eu.
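Step 2 can be sketched in Python in the same way. The endpoint, form values and Authorization header follow the description above; the function name is chosen for this illustration, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token"


def build_party_token_request(user_token: str) -> Request:
    """Build the POST request that exchanges a User Token for a Party Token."""
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:uma-ticket",
        "audience": "piveau-hub-repo",
    }
    return Request(
        TOKEN_URL,
        data=urlencode(form).encode("ascii"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # User Token from Step 1, prefixed with "Bearer".
            "Authorization": f"Bearer {user_token}",
        },
        method="POST",
    )
```

The Party Token is then found in the `access_token` field of the JSON response, just as in Step 1.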
Token expiry
For security reasons, the tokens expire quickly. You can obtain fresh tokens by simply performing Steps 1 and 2 again.
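One possible way to handle expiry in an application is to cache the token and repeat the two steps shortly before the `expires_in` period runs out. The following is a sketch under stated assumptions, not a prescribed approach: the `TokenCache` class, the `fetch` callable (expected to perform Steps 1 and 2 and return the parsed JSON response) and the 30-second safety margin are all invented for this illustration. The demonstration at the end uses a fake fetch function and a fake clock, so no network access is involved.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Keep a token and fetch a new one shortly before it expires."""

    def __init__(self, fetch: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 30.0) -> None:
        self._fetch = fetch      # callable that runs Step 1 and Step 2
        self._clock = clock      # injectable so the logic can be tested
        self._margin = margin    # refresh this many seconds early
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            response = self._fetch()
            self._token = response["access_token"]
            self._expires_at = now + response["expires_in"] - self._margin
        return self._token


# Demonstration with a fake fetch function and a fake clock (no network).
fake_now = [0.0]
issued = []


def fake_fetch() -> dict:
    issued.append(f"token-{len(issued) + 1}")
    return {"access_token": issued[-1], "expires_in": 300}


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()      # fetched at t=0, treated as valid until t=270
fake_now[0] = 100.0
second = cache.get()     # still within the validity window, served from cache
fake_now[0] = 270.0
third = cache.get()      # safety margin reached, fetched again
```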
Create or Update a Dataset
Both creating and updating a dataset are performed using the same endpoint:
https://data.europa.eu/api/hub/repo/catalogues/[catalogueId]/datasets/origin?originalId=[dataset_id]
You can freely choose the [dataset_id]; the [catalogueId] determines the catalogue to which the dataset is added. The dataset ID is scoped within the catalogue. If the combination of dataset ID and catalogue ID already exists, the dataset is updated. Otherwise, a new dataset is created.
Pre-creation check
If you intend to create a new dataset, it is highly recommended to first check that the dataset ID is not yet present within the catalogue, to avoid accidentally updating an existing dataset. To do so, perform a GET request for the dataset ID:
https://data.europa.eu/api/hub/repo/catalogues/[catalogueId]/datasets/origin?originalId=[dataset_id]
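As a sketch, the endpoint URL for the check can be assembled in Python as follows. The function names are chosen for this illustration. The snippet only builds the GET request; this section does not state whether the check needs a token or how a missing dataset is reported, so the response handling is left out and should be taken from the OpenAPI documentation.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

REPO_BASE = "https://data.europa.eu/api/hub/repo"


def dataset_origin_url(catalogue_id: str, dataset_id: str) -> str:
    """Fill the catalogue ID and dataset ID into the endpoint template."""
    path = f"{REPO_BASE}/catalogues/{quote(catalogue_id, safe='')}/datasets/origin"
    return f"{path}?{urlencode({'originalId': dataset_id})}"


def build_check_request(catalogue_id: str, dataset_id: str) -> Request:
    """Build the GET request for the pre-creation check (not sent here)."""
    return Request(dataset_origin_url(catalogue_id, dataset_id), method="GET")


print(dataset_origin_url("test-catalog", "example-dataset"))
```

Encoding the IDs with `quote` and `urlencode` keeps the URL valid even if an ID contains spaces or other reserved characters.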
Create/Update
To create or update a dataset, perform a PUT request to the endpoint, providing the Party Token and the RDF format in the headers. The actual dataset is provided in the body of the request.
Example with curl:
$ curl --location --request PUT "https://data.europa.eu/api/hub/repo/catalogues/test-catalog/datasets/origin?originalId=example-dataset" \
--header "Content-Type: text/turtle" \
--header "Authorization: Bearer [yourpartytoken]" \
--data-raw "@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
<https://example.eu/set/data/test-dataset>
a dcat:Dataset ;
dct:title \"DCAT-AP 2.1.0 Example Dataset\"@en ;
dct:description \"This is an example Dataset\"@en ;
dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
dcat:distribution <https://example.eu/set/distribution/1> .
<https://example.eu/set/distribution/1>
a dcat:Distribution ;
dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
dcat:accessURL <https://github.com/ec-jrc/COVID-19/blob/master/data-by-country/jrc-covid-19-countries-latest.csv> ."
The following Content-Type values are valid:
Format | Value |
---|---|
RDF/XML | application/rdf+xml |
Turtle | text/turtle |
JSON-LD | application/ld+json |
N3 | text/n3 |
TriG | application/trig |
N-Triples | application/n-triples |
If the request was successful, you will receive a 201 response for a newly created dataset or a 204 for an updated dataset.
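The create/update call could be sketched in Python as shown below, using the endpoint, Content-Type values and status codes described above. The function names and the wording of the result strings are chosen for this illustration; the request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

REPO_BASE = "https://data.europa.eu/api/hub/repo"

# Content-Type values from the table above.
CONTENT_TYPES = {
    "RDF/XML": "application/rdf+xml",
    "Turtle": "text/turtle",
    "JSON-LD": "application/ld+json",
    "N3": "text/n3",
    "TriG": "application/trig",
    "N-Triples": "application/n-triples",
}


def build_put_request(catalogue_id: str, dataset_id: str, rdf_text: str,
                      rdf_format: str, party_token: str) -> Request:
    """Build the PUT request that creates or updates a dataset."""
    url = (f"{REPO_BASE}/catalogues/{quote(catalogue_id, safe='')}/datasets/origin"
           f"?{urlencode({'originalId': dataset_id})}")
    return Request(
        url,
        data=rdf_text.encode("utf-8"),  # the dataset itself goes into the body
        headers={
            "Content-Type": CONTENT_TYPES[rdf_format],
            "Authorization": f"Bearer {party_token}",
        },
        method="PUT",
    )


def describe_status(status: int) -> str:
    """Translate the documented success codes into a readable result."""
    if status == 201:
        return "created"
    if status == 204:
        return "updated"
    return f"unexpected status {status}"
```

Looking up the format in `CONTENT_TYPES` raises a `KeyError` for an unsupported format before any request is made.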
Delete a Dataset
You can delete a dataset by performing a DELETE request to the same endpoint.
$ curl --location --request DELETE "https://data.europa.eu/api/hub/repo/catalogues/test-catalog/datasets/origin?originalId=example-dataset" \
--header "Authorization: Bearer [yourpartytoken]"
If the request was successful, you will receive a 204 response.
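A corresponding Python sketch for the delete call follows. As before, the function names are chosen for this illustration and the request is only built, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

REPO_BASE = "https://data.europa.eu/api/hub/repo"


def build_delete_request(catalogue_id: str, dataset_id: str,
                         party_token: str) -> Request:
    """Build the DELETE request for a dataset in the given catalogue."""
    url = (f"{REPO_BASE}/catalogues/{quote(catalogue_id, safe='')}/datasets/origin"
           f"?{urlencode({'originalId': dataset_id})}")
    return Request(
        url,
        headers={"Authorization": f"Bearer {party_token}"},
        method="DELETE",
    )


def was_deleted(status: int) -> bool:
    """A 204 response indicates that the dataset was deleted."""
    return status == 204
```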
Querying metadata via data.europa.eu APIs
There are several APIs for querying the metadata of data.europa.eu programmatically. Depending on your needs and requirements you can choose a suitable API:
The SPARQL API gives you access to all features of the underlying RDF data structure of data.europa.eu and allows you to create complex and specific queries. Its full-text search capabilities may be limited.
The Registry API offers a convenient and easy way to directly retrieve the RDF representation of the metadata.
The Search API offers a high-performance full-text search with filtering capabilities over the metadata. Further examples of Search API usage in Java, JavaScript, Python and Ruby are available in the GitLab repository.
Example of the SPARQL API usage
A simple example that retrieves the first 100 datasets and their titles with the SPARQL API looks like this:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE {
?s a dcat:Dataset .
?s dct:title ?title
}
LIMIT 100
This query gives you a simple table with two columns, where the first column represents the dataset URL and the second one the title.
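The query can also be sent from code. The sketch below assumes that the endpoint accepts the query as a `query` URL parameter and can return results in the SPARQL JSON results format when asked via the Accept header; both are defined by the SPARQL 1.1 standards, but their support on this endpoint is an assumption here. The function names and the sample result are invented for this illustration, and no request is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://data.europa.eu/sparql"

QUERY = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE {
  ?s a dcat:Dataset .
  ?s dct:title ?title
}
LIMIT 100
"""


def sparql_url(query: str) -> str:
    """Encode the query text into the 'query' URL parameter."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


def build_sparql_request(query: str) -> Request:
    """Build a GET request asking for JSON results (assumed to be supported)."""
    return Request(sparql_url(query),
                   headers={"Accept": "application/sparql-results+json"})


def rows(result: dict) -> list:
    """Turn a SPARQL JSON result into (dataset URL, title) pairs."""
    return [(b["s"]["value"], b["title"]["value"])
            for b in result["results"]["bindings"]]


# Invented sample in the SPARQL JSON results format, for illustration only.
sample = {
    "head": {"vars": ["s", "title"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "https://example.eu/set/data/test-dataset"},
         "title": {"type": "literal", "value": "Example Dataset"}},
    ]},
}
print(rows(sample))
```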
data.europa.eu provides an introduction to SPARQL usage together with relevant links.
Example for the Registry API usage
The Registry API gives you direct access to RDF representations, hence DCAT-AP, of the metadata.
You can get a list of all datasets by executing the following query. It returns a list of 50 entries containing the URIs of the datasets.
GET https://data.europa.eu/api/hub/repo/datasets?limit=50&valueType=identifiers
Based on this result you can query the entire metadata of a specific dataset, by passing one of the IDs to the following call:
GET https://data.europa.eu/api/hub/repo/datasets/91f2aec3-1aaf-42d3-8730-c567a46c0116
By default, this returns the JSON-LD representation of the dataset. You can request a different serialisation format by appending a file-type suffix. The following example gives you the same dataset as Turtle:
GET https://data.europa.eu/api/hub/repo/datasets/91f2aec3-1aaf-42d3-8730-c567a46c0116.ttl
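The Registry API URLs shown above can be assembled with a couple of small helpers. This is a sketch with function names chosen for illustration; only the `ttl` suffix is confirmed by this documentation, so any other suffix should be checked against the OpenAPI documentation first.

```python
from typing import Optional
from urllib.parse import quote, urlencode

REPO_BASE = "https://data.europa.eu/api/hub/repo"


def dataset_list_url(limit: int = 50) -> str:
    """URL that lists dataset identifiers, limited to 'limit' entries."""
    params = {"limit": limit, "valueType": "identifiers"}
    return f"{REPO_BASE}/datasets?{urlencode(params)}"


def dataset_url(dataset_id: str, suffix: Optional[str] = None) -> str:
    """URL of a single dataset, optionally with a file-type suffix such as 'ttl'."""
    url = f"{REPO_BASE}/datasets/{quote(dataset_id, safe='')}"
    return f"{url}.{suffix}" if suffix else url


print(dataset_list_url())
print(dataset_url("91f2aec3-1aaf-42d3-8730-c567a46c0116"))
print(dataset_url("91f2aec3-1aaf-42d3-8730-c567a46c0116", "ttl"))
```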
Please refer to the OpenAPI documentation for further details on how to use the API.
Example of the Search API usage
The simplest possible use of the Search API is to request all metadata without any parameters that would narrow down the search results. According to the documentation of the Search API, this can be achieved by just adding the path ‘search’ to the base URL of the API. Our first example API call looks like this:
GET https://data.europa.eu/api/hub/search/search
It returns a limited list, in JSON, of the metadata stored in data.europa.eu. The list is limited because data.europa.eu stores too much metadata to transmit in a single request. If you would like to get all available metadata, multiple requests are necessary. Here, a method called paging comes into play. With paging you can split the results into ‘pages’ and simply go through them. The examples below showcase how to retrieve three pages of metadata via the Search API:
GET https://data.europa.eu/api/hub/search/search?page=0&limit=10
GET https://data.europa.eu/api/hub/search/search?page=1&limit=10
GET https://data.europa.eu/api/hub/search/search?page=2&limit=10
Note that these examples introduce the two parameters page and limit. The API calls ask for pages 0, 1 and 2 of the search results, with a maximum of 10 results per page. The page parameter selects the requested page and the limit parameter sets the maximum number of results to be retrieved per page.
As a last example, we add a search term to the request. The following request returns page 0 with a maximum of 100 results that match the search term ‘water’:
GET https://data.europa.eu/api/hub/search/search?page=0&limit=100&q=water
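The paging and search-term requests above can be generated programmatically. The following sketch builds the URLs only and sends nothing; the function names are chosen for this illustration, while the parameters `page`, `limit` and `q` are the ones introduced above.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode

SEARCH_URL = "https://data.europa.eu/api/hub/search/search"


def search_url(page: int, limit: int = 10, q: Optional[str] = None) -> str:
    """Build a Search API URL for one page, optionally with a search term."""
    params = {"page": page, "limit": limit}
    if q is not None:
        params["q"] = q
    return f"{SEARCH_URL}?{urlencode(params)}"


def page_urls(pages: int, limit: int = 10,
              q: Optional[str] = None) -> Iterator[str]:
    """Yield the URLs for pages 0 .. pages-1."""
    for page in range(pages):
        yield search_url(page, limit, q)


for url in page_urls(3):
    print(url)
print(search_url(0, limit=100, q="water"))
```

An application would request these URLs one after another and stop once a page comes back with fewer results than the limit or with none at all.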
Please consult the OpenAPI documentation for more information on how to use the APIs.