Data Provider Interface
Most data in data.europa.eu gets automatically and periodically harvested from the original data publishers. In addition, data.europa.eu supports the direct provision and storage of data via the portal and/or its API. This Section provides guidance for registered data providers to manually upload datasets via the Data Provider Interface (DPI). This process makes the datasets findable on the portal.
This Section is aimed at providers of data originating from various EU bodies and at technical staff involved in the maintenance and development of data.europa.eu. It also targets interested stakeholders and Open Data enthusiasts who want to learn about the inner workings of data.europa.eu.
Further reading and links
The following standards and specifications are closely related to the data provision process:
Key concepts
The provision of data is based on a few key concepts, which are explained below.
Metadata vs. data
It is important to distinguish between metadata and data in data.europa.eu. Most information you discover on the portal is metadata, i.e. information about data (title, description, publisher, etc.). The metadata then links to the actual data, in most cases a downloadable file. Metadata and data together are often called a dataset. The metadata is stored in the databases of data.europa.eu, whereas the data usually remains with the original data publisher. However, data.europa.eu is capable of storing both metadata and data. With the Data Provider Interface (DPI) you have to provide the metadata, and you can also provide the actual data.
DCAT-AP data model
The data provision process builds on the core data model of data.europa.eu, DCAT-AP. Essentially, DCAT-AP consists of three principal data classes: catalogues, datasets and distributions. Each data provider is represented by a catalogue. Each catalogue consists of datasets that hold the general metadata of the data, and each dataset can have multiple distributions, where each distribution describes the actual data of the dataset in detail. All this data is serialised as RDF. Therefore, the DPI converts all user input into RDF. As a data provider you are concerned with the creation of datasets and distributions. The catalogues are managed by the administrators of data.europa.eu.
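The hierarchy described above can be illustrated with a minimal sketch. It is not the DPI's actual conversion code: the class names, the `to_triples` helper and the example URIs are assumptions made for illustration, and only a handful of properties from the DCAT and Dublin Core Terms vocabularies are used.

```python
from dataclasses import dataclass, field

# Namespaces of the DCAT and Dublin Core Terms vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Distribution:
    uri: str
    access_url: str


@dataclass
class Dataset:
    uri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)


@dataclass
class Catalogue:
    uri: str
    datasets: list[Dataset] = field(default_factory=list)


def to_triples(catalogue: Catalogue) -> list[tuple[str, str, str]]:
    """Flatten catalogue -> datasets -> distributions into (subject, predicate, object) triples."""
    triples = [(catalogue.uri, RDF_TYPE, DCAT + "Catalog")]
    for ds in catalogue.datasets:
        triples.append((catalogue.uri, DCAT + "dataset", ds.uri))
        triples.append((ds.uri, RDF_TYPE, DCAT + "Dataset"))
        triples.append((ds.uri, DCT + "title", ds.title))
        for dist in ds.distributions:
            triples.append((ds.uri, DCAT + "distribution", dist.uri))
            triples.append((dist.uri, RDF_TYPE, DCAT + "Distribution"))
            triples.append((dist.uri, DCAT + "accessURL", dist.access_url))
    return triples


# Hypothetical example data.
example = Catalogue(
    uri="https://example.org/catalogue/demo",
    datasets=[
        Dataset(
            uri="https://example.org/dataset/air-quality",
            title="Air quality",
            distributions=[
                Distribution(
                    uri="https://example.org/distribution/1",
                    access_url="https://example.org/files/air-quality.csv",
                )
            ],
        )
    ],
)
```

Calling `to_triples(example)` yields one type triple per resource plus the linking triples, which mirrors how a dataset and its distributions hang off a catalogue.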
Access control model
The provision of datasets follows a straightforward and simple access control model. Let us assume a data provider organisation consists of multiple users (i.e. data providers), who want to manage datasets on behalf of the organisation. Each user is granted write access to one or more catalogue(s) that belong to the data provider organisation. This access allows the user to create, update, delete, and execute any available function on any dataset in that catalogue. All users of one data provider organisation have the same view on the datasets and their state. There are no individual user access rights. It is up to the internal processes of a data provider organisation to manage the detailed publication process and individual responsibilities. The write access to catalogues is set by administrators of data.europa.eu.
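As a rough sketch of this model, the access decision depends only on the catalogue a dataset belongs to. The `User` class, the `can_manage` function and the catalogue identifiers below are hypothetical names chosen for illustration, not part of the portal's API.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Catalogue IDs the administrators have granted write access to.
    catalogues: set[str] = field(default_factory=set)


def can_manage(user: User, dataset_catalogue: str) -> bool:
    """Access is decided per catalogue; there are no per-dataset or per-user rights."""
    return dataset_catalogue in user.catalogues


alice = User("alice", {"catalogue-a"})
bob = User("bob", {"catalogue-a", "catalogue-b"})
```

Here `alice` and `bob` can both create, update and delete any dataset in `catalogue-a`, regardless of who created it, while only `bob` can manage datasets in `catalogue-b`.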
State of a dataset
Datasets can have two states: draft or public. A draft dataset is not publicly available via the frontend, API or SPARQL interface of the data section. It is only visible to permitted data providers. A public dataset is available like any other dataset on the data section. Datasets can be directly created as draft or public. It is possible to toggle the state of a dataset at any time.
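The two states and their effect on visibility can be sketched as follows. The names `State`, `toggle` and `is_visible` are illustrative assumptions and do not correspond to a documented interface.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PUBLIC = "public"


def toggle(state: State) -> State:
    """Switch between draft and public, which is allowed at any time."""
    return State.PUBLIC if state is State.DRAFT else State.DRAFT


def is_visible(state: State, viewer_has_catalogue_access: bool) -> bool:
    """Public datasets are visible to everyone; drafts only to permitted data providers."""
    return state is State.PUBLIC or viewer_has_catalogue_access
```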
Registration and login
This section describes the prerequisites and login process for providing data.
Registration
There is no self-registration for the DPI. Please contact the Publications Office of the European Union (OP) for further information. After successful registration you will receive access credentials consisting of a username and password. The password you receive is temporary, and you will need to change it when you log in for the first time.
Login
You will find the link to the login page at the very bottom right of the data section. The data provider features are currently only available in English, so make sure you switch to English in the language selection dropdown menu at the top of the page.
After clicking, the page redirects to the login form.
Enter your username and password here and click on 'Sign In'. If this is your first login, you will be redirected to another form to change your initial password. Upon success, you are automatically redirected to the data section.
Data provider interface menu
When you are logged in, the DPI menu is rendered at the bottom of the data section. The DPI menu is the central access point to all functionalities of data provision. More details will be presented in the next Section.
Logout
You can log out by clicking on 'Logout' in the DPI menu. You will be redirected to the data section.
Structure and functions
This section provides an overview of the structure and individual functionalities of the DPI as accessible via the DPI menu.
The menu gives you access to high-level pages for data providers and to dataset-specific functions. Some functions are context-sensitive, so they are only available on specific pages, such as a dataset details page. Functions that are not available are greyed out in the menu.
High-level menu
Function | Description |
---|---|
Draft Datasets | Gives you access to the list of draft datasets of the current user. Further functions regarding the draft datasets are available on that page. |
My Catalogues | Gives you access to the list of catalogues that are assigned to the current user. An assignment implies the right to create, edit and delete datasets in the respective catalogues. |
User Profile | Gives you access to the profile of the current user. |
Logout | Logs the current user out. |
Dataset sub-menu
Function | Description |
---|---|
Create Dataset | Navigates to the form for creating a new dataset. |
Delete Dataset | Deletes the current dataset. * |
Edit Dataset | Navigates to the form for editing the current dataset. * |
Set to draft | Sets the current dataset as draft, so it is not publicly visible anymore. * |
Register DOI | Registers a DOI for the current dataset. * |
* Only available on a dataset details page.
Create a dataset
You can create a dataset with a wizard-like form that guides you through the provision of the metadata and data. Just click on Dataset > Create Dataset in the DPI menu.
Structure and general remarks
The form is divided into five main steps:
- Essential Properties
- Advised Properties
- Additional Properties
- Distributions
- Dataset Overview
The creation of distributions is divided into similar sub-steps. You can always switch between the steps by clicking on the step titles or using the 'Previous Step' or 'Next Step' buttons. In some cases direct access to a step is not possible, for example when a mandatory field is missing.
To prevent accidental data loss, your input is continuously stored in the local storage of the browser. Even after a reload of the page, your data will still be there. You can clear the entire form by clicking on 'Clear'. If you need information or help about an input field, you can click on the 'i' icon after the field. Additional information is then displayed in a pop-over.
Input Fields
The form consists of specialised input fields, supporting the various properties of DCAT-AP.
Multi-lingual fields
Some properties can be provided in multiple languages. This is supported with the following kind of input field:
You can just add more languages by clicking on the blue button and remove them by clicking on the small minus sign.
Vocabulary fields
Many properties depend on controlled vocabularies. You can select the fitting value(s) from these vocabularies with a search-based dropdown field. Just type in some characters to find a suitable match. Below is an example for the language property.
For properties where multiple values can be selected from a vocabulary, you can easily repeat the process for each value and your selection is displayed under the input form.
Filling the form
By stepping through the wizard you are able to provide all DCAT-AP properties to describe your dataset. However, only a few properties are mandatory, such as the title and description. You will get a clear warning if a mandatory property is missing.
In the following, some important details about the form are presented. However, not every property is discussed. Please consult the DCAT-AP documentation for detailed information about every property.
Create dataset
In the first step, you provide the essential metadata about the data.
An important property is the dataset ID, which will be used in the URL to resolve the dataset after publication (http://data.europa.eu/88u/dataset/[dataset-id]). You can enter it yourself or it will be automatically generated based on the provided title. It can only contain lowercase letters, numbers and dashes. Its uniqueness is checked on-the-fly to avoid any clashes with existing datasets.
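The ID rules above can be sketched in a few lines. This is only an illustration of the stated character constraint: the portal's real generation algorithm is not documented here, so `suggest_dataset_id` is an assumed approach, and the uniqueness check performed by the form is not covered.

```python
import re
import unicodedata

# Only lowercase letters, numbers and dashes are allowed.
ID_PATTERN = re.compile(r"[a-z0-9-]+")
BASE_URL = "http://data.europa.eu/88u/dataset/"


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Check the character rule only; uniqueness must be checked against the portal."""
    return ID_PATTERN.fullmatch(dataset_id) is not None


def suggest_dataset_id(title: str) -> str:
    """Derive a candidate ID from a title (assumed scheme, not the portal's own)."""
    # Strip accents, lowercase, then collapse every other character run into one dash.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def dataset_url(dataset_id: str) -> str:
    """Build the URL under which the dataset resolves after publication."""
    if not is_valid_dataset_id(dataset_id):
        raise ValueError(f"invalid dataset ID: {dataset_id!r}")
    return BASE_URL + dataset_id
```

For example, `suggest_dataset_id("Air Quality 2023 (Café)")` gives `air-quality-2023-cafe`, which passes `is_valid_dataset_id`.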
You have to select a catalogue, which the dataset will be part of. You can only select catalogues that you have access to.
By clicking on 'Next Step' you will be directed to the second step.
Define dataset properties
In the second and third step, the remaining properties (advised and additional) of the dataset can be provided. In order to save space the properties of these pages are collapsed.
Create distributions
In the fourth step you see an overview of all distributions of your dataset. When you start with a fresh dataset, this view will be empty. You can create a distribution by clicking on 'Add Distribution'.
In the next view, you can provide all possible distribution data in four steps. You can repeat these steps for each distribution you want to add. To navigate through the steps, use the buttons 'Previous Step' and 'Next Step' again, or click directly on the step names.
A central property is the access URL, which gives you access to the actual data of the dataset. If your data is already hosted and publicly available, you can just provide the URL by selecting the type 'Provide a URL'.
You can also upload your data directly here, by selecting the type 'Upload a file'.
If you do not provide a separate download URL, the download URL is automatically set to the access URL, after saving the dataset. On the last page of the distribution wizard you will find an overview of your created distributions. You can delete or edit them, or add another distribution by clicking on 'Add Distribution'.
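The fallback rule for the download URL amounts to a simple default. The function name and the example URLs below are hypothetical; the sketch only restates the behaviour described above.

```python
from typing import Optional


def effective_download_url(access_url: str, download_url: Optional[str] = None) -> str:
    """Return the download URL, falling back to the access URL when none was provided."""
    return download_url or access_url
```

With only an access URL such as `https://example.org/data.csv`, the same value is used for the download URL; a separately provided download URL takes precedence.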
When you click on 'Next Step' you will be redirected to the final overview of your dataset.
Dataset overview and storing
The final step provides you with an overview of your dataset. Note that the layout here differs from the final dataset details page. You can switch between languages if you have provided literals in multiple languages. You can still go back to previous steps and change your data. To finish the process, you have two options. By clicking on 'Publish Dataset', your dataset is published immediately and becomes publicly visible to all users of the portal; you are then redirected to the public dataset details page. By clicking on 'Save as Draft', the dataset is stored separately and is not publicly available; you are then redirected to the draft overview page, where you can later edit or publish the draft dataset. You can also store a dataset as a draft at any time in the process.
Since the access control is catalogue-based, all users that have access to the catalogue of your dataset can view and edit your draft datasets.
Managing datasets
You can edit, publish and delete datasets. Access to these functions differs depending on whether the dataset is public or a draft.
Managing public datasets
You can manage all datasets that are part of any catalogue you have access to. You can check and access these catalogues and their datasets by clicking on 'My Catalogues'.
If you are on a dataset details page, you can use the dataset sub-menu to access the options.
'Delete Dataset' allows you to delete the current dataset. A final confirmation is required. 'Edit Dataset' will redirect you to the dataset wizard with already prefilled form fields. You can apply any changes, as if it were a new dataset. 'Set to draft' will un-publish the dataset and add it to the draft dataset pool.
Managing draft datasets
You can manage all draft datasets of your catalogues by clicking on 'My Draft Datasets'. Note that you will also see datasets here that were not created by you personally, because other users may have access to the same catalogues.
'Delete' allows you to delete the draft dataset. A final confirmation is required and this action cannot be undone. 'Edit' will redirect you to the dataset wizard with already prefilled form fields. 'Publish' will make the draft dataset publicly available. After that, it will not appear in the draft list anymore. The 'Linked data' button gives you access to the raw RDF representation of the dataset.
DOI registration
You can easily register a DOI for your dataset. We use the registration agency of the OP to issue DOIs with the prefix 10.2906. Therefore, your dataset will be available under 'https://doi.org/10.2906/[id]', where [id] is a randomly assigned number.
The registration of a DOI is permanent and should only be considered for finalised datasets. You can only register one DOI for a single dataset.
Requirements
Since the dataset needs to be public, a DOI can only be registered for published datasets and not for drafts. In addition to the mandatory DCAT-AP properties, you must set the following fields in your dataset: publisher, creator and the issue date and time. Without this information, the registration process will fail.
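Before starting a registration, it can help to check these preconditions yourself. The sketch below assumes the metadata is available as a plain dictionary; the keys `publisher`, `creator` and `issued` and the function `doi_blockers` are hypothetical names, not the DPI's internal field identifiers.

```python
# Hypothetical keys for the publisher, creator and issue date/time fields.
REQUIRED_FOR_DOI = ("publisher", "creator", "issued")


def doi_blockers(metadata: dict, is_public: bool) -> list[str]:
    """Return the reasons a DOI registration would be expected to fail (empty list = ready)."""
    blockers = []
    if not is_public:
        blockers.append("dataset is a draft; publish it first")
    for key in REQUIRED_FOR_DOI:
        # Missing keys and empty values are both treated as "not set".
        if not metadata.get(key):
            blockers.append(f"missing field: {key}")
    return blockers
```

An empty result means the dataset is published and all three additional fields are set; the mandatory DCAT-AP properties are assumed to be already enforced by the form.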
Register a DOI
You can register a DOI for all datasets you have access to. Just browse to a dataset details page, open the dataset sub-menu and click on 'Register DOI'.
You will need to acknowledge the registration again.
After a successful registration you can reload the details page and find your DOI in the right sidebar.
You can repeat the process after you have updated the metadata of the dataset, such as the title. In that case, no new ID is generated; the existing registration is updated accordingly.