Provide your data manually

data.europa.eu is the official portal for European data. The portal provides access to open data from international, EU, national, regional, local and geodata portals. The data is organised in more than 170 individual data catalogues. Most data gets automatically and periodically harvested from the original data publishers. In addition, data.europa.eu supports the direct provision and storage of data via the portal and/or its API.

The following standards and specifications are closely related to the data provision process:

Key concepts

The provision of data is based on a few key concepts, which are explained below.

Metadata vs. data

It is important to distinguish between metadata and data in data.europa.eu. Most information you discover on the portal constitutes metadata, i.e. information about data (title, description, publisher, etc.). The metadata then links to the actual data, in most cases a downloadable file. The entirety of metadata and data is often called a dataset. The metadata is stored in the databases of data.europa.eu, whereas the data usually remains with the original data publisher. However, data.europa.eu is capable of storing both metadata and data. With the data provider interface (DPI) you have to provide the metadata and you can also provide the actual data.

DCAT-AP data model

The data provision process builds on the core data model of data.europa.eu, DCAT-AP. Essentially, DCAT-AP consists of three principal data classes: catalogues, datasets and distributions. Each data provider is represented by a catalogue. Each catalogue consists of datasets that constitute the general metadata of the data, and each dataset can have multiple distributions, where each distribution describes the actual data of the dataset in detail. All this data is serialised in the RDF format. Therefore, the DPI converts all user input into RDF. As a data provider you are concerned with the creation of datasets and distributions. The catalogues are managed by the administrators of data.europa.eu.

Access control model

The provision of datasets follows a straightforward and simple access control model. Let us assume a data provider organisation consists of multiple users (i.e. data providers), who want to manage datasets on behalf of the organisation. Each user is granted write access to one or more catalogue(s) that belong to the data provider organisation. This access allows the user to create, update, delete, and execute any available function on any dataset in that catalogue. All users of one data provider organisation have the same view on the datasets and their state. There are no individual user access rights. It is up to the internal processes of a data provider organisation to manage the detailed publication process and individual responsibilities. The write access to catalogues is set by administrators of data.europa.eu.

State of a dataset

Datasets can have two states: draft or public. A draft dataset is not publicly available via the frontend, API or SPARQL interface of the data section. It is only visible to permitted data providers. A public dataset is available like any other dataset on the data section. Datasets can be directly created as draft or public. It is possible to toggle the state of a dataset at any time.

Registration and login

This section describes the prerequisites and login process for providing data.

Registration

There is no self-registration to use the DPI. Please contact the OP for further information. After successful registration you will receive access credentials consisting of a username and password. The password you receive is temporary and you will need to change it at your first login.

Login

You will find the link to the login page at the bottom right of the data section. The data provider features are currently only available in English, so make sure you switch to English in the language selection dropdown menu at the top of the page.

After clicking, the page redirects to the login form.

Enter your username and password here and click on 'Sign In'. If this is your first login, you will be redirected to another form to change your initial password. Upon success, you are automatically redirected to the data section.

Data provider interface menu

When you are logged in, the DPI menu is rendered at the bottom of the data section. The DPI menu is the central access point to all functionalities of data provision. More details will be presented in the next section.

Logout

You can log out by clicking on 'Logout' in the DPI menu. You will be redirected to the data section.

Structure and functions

This section provides an overview of the structure and individual functionalities of the DPI as accessible via the DPI menu.

The menu gives you access to high-level pages for data providers and dataset-specific functions. Some functions are context sensitive, so they are only available on specific pages, such as a dataset details page. Functions that are not available are greyed out in the menu.

High-level menu

Function Description
My Draft Datasets Gives you access to the list of draft datasets of the current user. Further functions regarding the draft datasets are available on that page.
My Catalogues Gives you access to the list of catalogues that are assigned to the current user. An assignment implies the right to create, edit and delete datasets in the respective catalogues.
User Profile Gives you access to the profile of the current user.
Logout Logs the current user out.

Dataset sub-menu

Function Description
Create Dataset Navigates to the form for creating a new dataset.
Delete Dataset Deletes the current dataset. *
Edit Dataset Navigates to the form for editing the current dataset. *
Set to draft Sets the current dataset as draft, so it is not publicly visible anymore. *
Register DOI Registers a DOI for the current dataset. *

* Only available on a dataset details page.

Create a dataset

You can create a dataset with a wizard-like form that guides you through the provision of the metadata and data. Just click on Dataset > Create Dataset in the DPI menu.

Structure and general remarks

The form is divided into four main steps: create dataset, define properties, create distributions and the dataset overview. The creation of distributions is divided into several sub-steps. You can always switch between the steps by clicking on the step titles or using the 'Previous Step' or 'Next Step' buttons. In some cases direct access to a step is not possible, for example when a mandatory field is missing.

In order to prevent accidental data loss, your input is constantly stored in the local storage of the browser.

Even after a reload of the page, your data will be there. You can clear the entire form by clicking on 'Clear'.

If you need information and help about the input fields, you can always click on the 'i' icon next to each field.

Additional information is then displayed in a grey box on the right side.

Special input fields

The form consists of specialised input fields, supporting the various properties of DCAT-AP.

Multi-lingual fields

Some properties can be provided in multiple languages. This is supported with the following kind of input field:

You can just add more languages by clicking on the blue button and remove them by clicking on the small minus sign.

Vocabulary fields

Many properties depend on controlled vocabularies. You can select the fitting value(s) from these vocabularies with a search-based dropdown field. Just type in some characters to find a suitable match. Below is an example for the language property.

For properties where multiple values can be selected from a vocabulary, you can easily repeat the process for each value and your selection is displayed under the input form.

Filling the form

By stepping through the wizard you are able to provide all DCAT-AP properties to describe your dataset. However, only a few properties are mandatory, such as the title and description. You will get a clear warning if a mandatory property is missing.

In the following, some important details about the form are presented. However, not every property is discussed. Please consult the DCAT-AP documentation for detailed information about every property.

Create dataset

In the first step, you provide the very basic metadata about the data. An important property is the dataset ID, which will be used in the URL to resolve the dataset after publication (http://data.europa.eu/88u/dataset/[dataset-id]). You can enter it yourself or it will be automatically generated based on the provided title. It can only contain lowercase letters, numbers and dashes. Its uniqueness is checked on-the-fly to avoid any clashes with existing datasets.
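The ID rules above can be sketched in shell. This is an illustration only: both function names are hypothetical, and the portal's own title-to-ID generation may differ in details, for example in how it treats non-ASCII characters.

```shell
# Hypothetical helpers; not part of the portal or its API.

# Derive a candidate dataset ID from a title: lowercase ASCII letters and
# collapse each run of other characters into a single dash.
make_dataset_id() {
  printf '%s' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Succeed only if the ID is non-empty and consists of lowercase letters,
# digits and dashes.
is_valid_dataset_id() {
  case "$1" in
    '' | *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `make_dataset_id "DCAT-AP 2.1.0 Example Dataset"` prints `dcat-ap-2-1-0-example-dataset`. Uniqueness still has to be checked against the portal.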

You have to select a catalogue, which the dataset will be part of. You can only select catalogues that you have access to.

By clicking on 'Next Step' you will be directed to the second step.

Define dataset properties

In the second step, the remaining properties besides the distribution information can be provided. From here it is also possible to skip the distribution steps by clicking on 'Skip Distribution Step'. This will redirect you directly to the final page of the wizard and allows you to create datasets without any distribution.

To add one or more distributions, click on 'Next Step'.

Create distributions

You can provide all possible distribution data in four steps. You can repeat these steps for each distribution you want to add. To navigate through the steps, use the buttons 'Previous Step' and 'Next Step' again, or click directly on the step names.

A central property is the access URL, which gives you access to the actual data of the dataset. Each distribution can have one access URL. If your data is already hosted and publicly available, you can just provide the URL by selecting the type 'Provide a URL'.

You can also upload your data directly here, by selecting the type 'Upload a file'.

If you do not provide a separate download URL, the download URL is automatically set to the access URL after saving the dataset. On the last page of the distribution wizard you will find an overview of your created distributions. You can delete or edit them, or add another distribution by clicking on 'Add another Distribution'.

When you click on 'Next Step' you will be redirected to the final overview of your dataset.

Dataset overview and storing

The final step provides you with an overview of your dataset. Note that the layout here is different than the final dataset detail page. You can still go back to previous steps and make changes to your data. If you want to finish the process you have two options. By clicking on 'Publish Dataset', your dataset will be published immediately and publicly visible to all users of the portal. You will be redirected to the public dataset details page. By clicking on 'Save as Draft', the dataset will be stored separately and will not be publicly available.

You will be redirected to the draft overview page. You can later edit or publish the draft dataset.

Since the access control is catalogue-based, all users that have access to the catalogue of your dataset can view and edit your draft datasets.

Managing datasets

You can edit, publish and delete datasets. Access to these functions differs depending on whether the dataset is public or a draft.

Managing public datasets

You can manage all datasets that are part of any catalogue you have access to. You can check and access these catalogues and their datasets by clicking on 'My Catalogues'.

On a dataset details page, use the dataset sub-menu to access the options.

'Delete Dataset' allows you to delete the current dataset. A final confirmation is required. 'Edit Dataset' will redirect you to the dataset wizard with already prefilled form fields. You can apply any changes, as if it were a new dataset. 'Set to draft' will un-publish the dataset and add it to the draft dataset pool.

Managing draft datasets

You can manage all draft datasets of your catalogues by clicking on 'My Draft Datasets'. Note that you will also see datasets here that were not created by you, because other users may have access to the same catalogues.

'Delete' allows you to delete the draft dataset. A final confirmation is required and this action cannot be undone. 'Edit' will redirect you to the dataset wizard with already prefilled form fields. 'Publish' will make the draft dataset publicly available. After that, it will not appear in the draft list anymore. The 'JSON-LD' button gives you access to the raw RDF representation of the dataset.

DOI registration

You can register a DOI for your dataset. We use the registration agency of the OP to issue DOIs with the prefix 10.2906. Therefore, your dataset will be available under 'https://doi.org/10.2906/[id]', where [id] is a randomly assigned number.

The registration of a DOI is permanent and should only be considered for finalised datasets. You can only register one DOI for a single dataset.

Requirements

Since the dataset needs to be public, a DOI can only be registered for published datasets and not for drafts. In addition to the mandatory DCAT-AP properties, you must set the following fields in your dataset: publisher, creator and the issue date and time. Without this information, the registration process will fail.
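Because a missing field makes the registration fail, a quick local check can help if you also keep the dataset as a Turtle file. The sketch below is a hypothetical helper, not a portal feature. It assumes that the publisher, creator and issue date fields correspond to the DCAT-AP properties dct:publisher, dct:creator and dct:issued, and that the file uses the dct: prefix. It is a plain text search, not an RDF parser.

```shell
# Hypothetical pre-check: print each DOI-relevant property that does not
# occur in the given Turtle file. Assumes the "dct:" prefix is used.
missing_doi_properties() {
  for property in dct:publisher dct:creator dct:issued; do
    grep -qF "$property" "$1" || echo "$property"
  done
}
```

An empty output means all three properties were found in the file.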

Register a DOI

You can register a DOI for all datasets you have access to. Just browse to a dataset details page, open the dataset sub-menu and click on 'Register DOI'.

You will need to acknowledge the registration again.

After a successful registration you can reload the details page and find your DOI under the additional information section.

You can repeat the process after you have updated the metadata of the dataset, such as the title. In that case, no new ID is generated; the existing DOI is updated accordingly.

Access Control and Tokens

data.europa.eu uses Keycloak as the backbone for access control. Hence, every interaction with a restricted API endpoint requires an access token (Party Token) obtained from Keycloak. Getting the token takes two API calls.

Prerequisites

  • You require valid credentials (username and password) for Keycloak.
  • You need a tool to interact with an HTTP API, such as Postman or curl for the command line.

Step 1 - Request User Token

First you need to request a User Token by performing an x-www-form-urlencoded POST request to the following endpoint:

https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token

The following form values need to be set:

Key Value
username [yourusername]
password [yourpassword]
grant_type password
client_id piveau-hub-ui

Example with curl:

$ curl --location --request POST "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token" \
--header "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=password" \
--data-urlencode "client_id=piveau-hub-ui" \
--data-urlencode "username=[yourusername]" \
--data-urlencode "password=[yourpassword]"

If successful you will get a JSON response like this:

{
    "access_token": "[yourusertoken]",
    "expires_in": 300,
    "refresh_expires_in": 1800,
    "refresh_token": "[yourrefreshtoken]",
    "token_type": "Bearer",
    "not-before-policy": 0,
    "session_state": "694350c7-38b9-4051-bc2e-e15e34320133",
    "scope": "email profile"
}

For the second step you will need [yourusertoken].

Step 2 - Request Party Token

Now you need to request a Party Token by again performing an x-www-form-urlencoded POST request to the following endpoint:

https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token

The following form values need to be set:

Key Value
grant_type urn:ietf:params:oauth:grant-type:uma-ticket
audience piveau-hub-repo

In addition, place the User Token from Step 1 into the header field Authorization with the leading string Bearer.

Example with curl:

$ curl --location --request POST "https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token" \
--header "Content-Type: application/x-www-form-urlencoded" \
--header "Authorization: Bearer [yourusertoken]" \
--data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:uma-ticket" \
--data-urlencode "audience=piveau-hub-repo"

If successful you will get a JSON response like this:

{
    "upgraded": false,
    "access_token": "[yourpartytoken]",
    "expires_in": 300,
    "refresh_expires_in": 1800,
    "refresh_token": "[yourrefreshtoken]",
    "token_type": "Bearer",
    "not-before-policy": 0
}

[yourpartytoken] can now be used to manage datasets in data.europa.eu. Please see Manage Datasets for details.

Remarks

  • For security reasons the tokens expire quickly (expires_in is 300 seconds in the example responses above). You can obtain fresh tokens by performing Step 1 and Step 2 again.
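Since both steps have to be repeated whenever a token expires, it can be convenient to wrap them in shell functions. The following is a minimal sketch: the function names are hypothetical, while the endpoint and form values are the ones from Step 1 and Step 2. The token is extracted with sed, which is enough for the flat responses shown above; a JSON-aware tool is the more robust choice.

```shell
# Sketch only; function names are hypothetical.
TOKEN_URL="https://data.europa.eu/auth/realms/DEU/protocol/openid-connect/token"

# Print the value of "access_token" from a JSON response read on stdin.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Step 1: exchange username ($1) and password ($2) for a User Token.
get_user_token() {
  curl --silent --fail --request POST "$TOKEN_URL" \
    --data-urlencode "grant_type=password" \
    --data-urlencode "client_id=piveau-hub-ui" \
    --data-urlencode "username=$1" \
    --data-urlencode "password=$2" \
    | extract_access_token
}

# Step 2: exchange the User Token ($1) for a Party Token.
get_party_token() {
  curl --silent --fail --request POST "$TOKEN_URL" \
    --header "Authorization: Bearer $1" \
    --data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:uma-ticket" \
    --data-urlencode "audience=piveau-hub-repo" \
    | extract_access_token
}

# Usage (credentials taken from hypothetical environment variables):
#   USER_TOKEN=$(get_user_token "$DEU_USERNAME" "$DEU_PASSWORD")
#   PARTY_TOKEN=$(get_party_token "$USER_TOKEN")
```

With --data-urlencode, curl sends the body as application/x-www-form-urlencoded by default, so the Content-Type header does not need to be set explicitly here.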

Manage Datasets

This guide gives an overview on how to manage (create, update and delete) datasets in data.europa.eu via its API.

General Remarks

  • You will need a token to perform the following operations. See Access Control and Tokens for details.
  • The entire API for dataset management is documented with OpenAPI here.
  • You will require at least write access to one catalogue. Please contact the data.europa.eu administrators for further information.

Example Dataset

The following DCAT-AP dataset will be used as an example for this guide. It is serialised in Turtle. You can provide the dataset in any other common RDF format, such as RDF/XML, JSON-LD, N-Triples, TriG or N3.

@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .

<https://example.eu/set/data/test-dataset>
    a                               dcat:Dataset ;
    dct:title                       "DCAT-AP 2.1.0 Example Dataset"@en ;
    dct:description                 "This is an example Dataset"@en ;
    dcat:theme                      <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
    dcat:distribution               <https://example.eu/set/distribution/1> .

<https://example.eu/set/distribution/1>
    a                               dcat:Distribution ;
    dct:format                      <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:accessURL                  <https://github.com/ec-jrc/COVID-19/blob/master/data-by-country/jrc-covid-19-countries-latest.csv> .

Create or Update a Dataset

Creating or updating a dataset is performed with the same endpoint:

https://data.europa.eu/api/hub/repo/datasets/[dataset_id]?catalogue=[catalog_id]

The [dataset_id] can be freely chosen by you and the [catalog_id] determines the catalogue to which the dataset is added. The dataset ID is scoped within the catalogue. If the combination of dataset ID and catalogue ID already exists, the dataset is updated. Otherwise, a new dataset is created.

Danger

If you create a new dataset, it is highly recommended to first check whether the dataset ID is already taken within the scope of the catalogue, because a PUT request to an existing ID updates that dataset. Just perform a GET request to https://data.europa.eu/api/hub/repo/datasets/[dataset_id]?catalogue=[catalog_id]
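The check can be scripted along the following lines. This is a sketch under assumptions: the function names are hypothetical, and it assumes the API answers with status 404 when the ID is not in use, which should be confirmed against the OpenAPI documentation. Draft datasets are not publicly visible via the API, so an unauthenticated request may not detect them.

```shell
# Hypothetical helpers for checking a dataset ID before creating it.
REPO_URL="https://data.europa.eu/api/hub/repo"

# Print the HTTP status code of a GET on dataset $1 in catalogue $2.
dataset_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --get --data-urlencode "catalogue=$2" \
    "$REPO_URL/datasets/$1"
}

# Succeed if the ID appears to be free (assumption: 404 means "not found").
dataset_id_is_free() {
  [ "$(dataset_status "$1" "$2")" = "404" ]
}

# Usage:
#   dataset_id_is_free example-dataset test-catalog || echo "ID already taken"
```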

Now perform a PUT request to the endpoint, providing the Party Token and the RDF format in the headers. The actual dataset is provided in the body of the request.

Example with curl:

$ curl --location --request PUT "https://data.europa.eu/api/hub/repo/datasets/example-dataset?catalogue=test-catalog" \
--header "Content-Type: text/turtle" \
--header "Authorization: Bearer [yourpartytoken]" \
--data-raw "@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .

<https://example.eu/set/data/test-dataset>
    a                               dcat:Dataset ;
    dct:title                       \"DCAT-AP 2.1.0 Example Dataset\"@en ;
    dct:description                 \"This is an example Dataset\"@en ;
    dcat:theme                      <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
    dcat:distribution               <https://example.eu/set/distribution/1> .

<https://example.eu/set/distribution/1>
    a                               dcat:Distribution ;
    dct:format                      <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:accessURL                  <https://github.com/ec-jrc/COVID-19/blob/master/data-by-country/jrc-covid-19-countries-latest.csv> ."

The following Content-Type values are valid:

Format Value
RDF/XML application/rdf+xml
Turtle text/turtle
JSON-LD application/ld+json
N3 text/n3
TriG application/trig
N-Triples application/n-triples
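When datasets are kept as local files, the Content-Type value can be derived from the file name. The helper below is a hypothetical sketch; the file extensions are the conventional ones for each format and are not prescribed by the portal.

```shell
# Hypothetical helper: map a file name to a Content-Type value from the
# table above, based on the conventional file extension.
rdf_content_type() {
  case "$1" in
    *.rdf)    echo "application/rdf+xml" ;;
    *.ttl)    echo "text/turtle" ;;
    *.jsonld) echo "application/ld+json" ;;
    *.n3)     echo "text/n3" ;;
    *.trig)   echo "application/trig" ;;
    *.nt)     echo "application/n-triples" ;;
    *)        echo "unsupported RDF file: $1" >&2; return 1 ;;
  esac
}
```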

If the request was successful you will receive a 201 response for a newly created dataset or a 204 for an updated dataset.
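For larger datasets it is usually easier to send the RDF from a file than to inline it in the command. The following sketch wraps the PUT request and translates the two success codes into a short message. The function name and argument order are hypothetical. It uses --data-binary because curl's plain --data option strips line breaks when reading from a file, which can break line-based formats such as Turtle.

```shell
# Hypothetical helper; assumes the dataset is stored in a local file.
REPO_URL="https://data.europa.eu/api/hub/repo"

# usage: put_dataset DATASET_ID CATALOGUE_ID FILE CONTENT_TYPE PARTY_TOKEN
put_dataset() {
  http_status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    --request PUT "$REPO_URL/datasets/$1?catalogue=$2" \
    --header "Content-Type: $4" \
    --header "Authorization: Bearer $5" \
    --data-binary "@$3") || return 1
  case "$http_status" in
    201) echo "created" ;;
    204) echo "updated" ;;
    *)   echo "unexpected HTTP status: $http_status" >&2; return 1 ;;
  esac
}

# Usage:
#   put_dataset example-dataset test-catalog dataset.ttl text/turtle "$PARTY_TOKEN"
```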

Delete a Dataset

You can delete a dataset by performing a DELETE request to the same endpoint.

$ curl --location --request DELETE "https://data.europa.eu/api/hub/repo/datasets/example-dataset?catalogue=test-catalog" \
--header "Authorization: Bearer [yourpartytoken]"

If the request was successful you will receive a 204 response.