How to request harvesting
Data.europa.eu harvests metadata from available national open data portals. To initiate the onboarding of your portal onto data.europa.eu, you will need to undertake two sequential steps: check that your portal is suitable for harvesting and issue a harvesting request via the data.europa.eu contact form.
Required information
The first step is to go through the following checklist, which summarises the main requirements for successful harvesting. Before contacting data.europa.eu, please make sure that you can answer all listed questions. If anything is unclear, you can always reach out to us via the contact form.
Please remember that the preferred harvesting interface is OAI-PMH.
No | Requirement | Value |
---|---|---|
1 | Which country does your portal cover? | Free text |
2 | Is your portal already being harvested by another portal? | Free text |
3 | What is the Uniform Resource Locator (URL) to your portal’s interface / endpoint? | URL |
4 | What is the default language of the metadata from your portal? | Free text |
5 | Which metadata standard is supported by your portal? | E.g. DCAT-AP, CKAN JSON, or INSPIRE |
6 | Which representation of the metadata can be used? | XML, JSON, or any RDF representation |
7 | Which API/Protocol is used to retrieve the data? | OAI-PMH (recommended), RDF dump file, CKAN API, or SPARQL endpoint |
8 | Is authentication required for you to access your API? | yes/no |
9 | Does your data use standard date/time formats as specified by ISO 8601? | Using the ISO 8601 standard is mandatory. |
10 | How often can/should the site be harvested? | E.g. daily/weekly/monthly/etc. |
11 | Are there any times when the site should not be harvested (e.g. scheduled maintenance)? | Free text |
12 | Who is the publisher of the portal (name and email address)? | Free text. e.g. Federal Open Data Agency; info@open-data.gv.eu |
13 | What is the URL to the homepage of the portal? | URL |
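Checklist item 9 can be verified before submitting a request. The following is a minimal sketch, assuming Python's standard `datetime.fromisoformat`, which accepts only the common subset of ISO 8601 (for example `2022-03-17` or `2011-10-20T00:00:00Z`). The function name `is_iso8601` is hypothetical and not part of any data.europa.eu tooling.

```python
from datetime import datetime


def is_iso8601(value: str) -> bool:
    """Return True if value parses as a common ISO 8601 date or date-time."""
    candidate = value.strip()
    # Older Python versions reject a trailing "Z", so normalise it
    # to an explicit UTC offset before parsing.
    if candidate.endswith("Z"):
        candidate = candidate[:-1] + "+00:00"
    try:
        datetime.fromisoformat(candidate)
    except ValueError:
        return False
    return True


for sample in ("2011-10-20T00:00:00Z", "2022-03-17", "20/10/2011"):
    print(sample, is_iso8601(sample))
```

Values that fail this check are not necessarily invalid ISO 8601 (the standard allows more forms than this parser accepts), but they are worth reviewing by hand.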
Harvesting request via contact form
Once you have gathered all answers to the checklist, the second step is to contact data.europa.eu to initiate the onboarding of your portal. Please submit a request via the form, selecting 'Get harvested by data.europa.eu' as the issue type. Once we receive your request, we will assess it and keep you informed on progress.
In the contact request, please provide information on all questions listed in the checklist.
Technical requirements/constraints
The harvester accesses the endpoints of all catalogues on a daily basis by default. Depending on the total size of the data provided by each portal and on other factors such as available resources, other harvesting intervals can be negotiated on an individual basis. The harvester is configured individually for each harvested portal. Metadata is usually processed overnight. Incoming metadata in any format other than DCAT-AP is transformed to the most recent version of DCAT-AP.
Access to harvested sites
Authentication
Some source sites require authentication, meaning data.europa.eu needs a login name and password before it can access the data. If this applies to your portal, please state this in your message when using our contact form.
API access to harvested site
For harvesting to take place, the source site needs to have in place one of the interfaces described in detail in the Supported formats and protocols section.
FTP access to harvested site
Data.europa.eu does not support FTP for downloading datasets from a source site.
Operational requirements
Harvesting frequency
Due to the high volume of metadata that will be harvested from a growing list of data suppliers and the required runtime for the harvesting processes, data supplier sites are harvested daily by default. Furthermore, the harvesting processes must be clustered and scheduled on a fixed time schedule (e.g. during the night) in order to avoid any load impacts on the harvested sites during regular business hours usage. Other factors and circumstances permitting, harvesting intervals that are more or less frequent can be agreed individually.
Data source site API / endpoint
The data source endpoint should accept queries with, for example, offset / limit parameters for resumption, partitioning, and pagination of the datasets to be harvested.
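The offset / limit pattern mentioned above can be sketched as follows. This is an illustration only: the helper name `page_windows` is hypothetical, and the actual parameter names depend on the interface your portal exposes.

```python
from typing import Iterator, Tuple


def page_windows(total: int, limit: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover `total` records."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for offset in range(0, total, limit):
        # The last window may be shorter than `limit`.
        yield offset, min(limit, total - offset)


print(list(page_windows(120, 50)))  # [(0, 50), (50, 50), (100, 20)]
```

A harvester that is interrupted can resume from the last offset it completed, which is why endpoints that support these parameters are easier to harvest reliably.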
Quality of the harvested datasets
Metadata quality is an important concern for data.europa.eu. This section explains the basic procedures implemented by data.europa.eu to monitor and improve metadata quality.
Ensuring uniquely identifiable datasets
Only if a dataset always keeps the same unique identifier can it be recognised as the same dataset on data.europa.eu, which prevents it from being duplicated.
Error reporting on harvested metadata
The metadata quality assessment (MQA) module provides a graphical report on the quality of the harvested datasets' metadata. Its dashboard summarises the main quality indicators, for example the availability and accessibility of distributions, the compliance of datasets with metadata formats, and the sources of violations.
The MQA dashboard can be opened directly from the portal homepage.
User feedback on datasets
Users will be able to provide feedback on a dataset directly from the dataset detail page.
The system makes it possible to gather and extract all feedback received for all datasets and group those by data supplier, so that the feedback can be sent to the data supplier.
Categorisation
The data.europa.eu categories are based on the EU controlled data theme vocabulary. The following are the categories used on data.europa.eu. When providing data, publishers should always use these terms to thematically categorise the datasets. If a different vocabulary is used, it should be aligned (i.e. mapped) to these categories.
Acronym | Definition |
---|---|
AGRI | Agriculture, fisheries, forestry and food |
ECON | Economy and finance |
EDUC | Education, culture and sport |
ENER | Energy |
ENVI | Environment |
GOVE | Government and public sector |
HEAL | Health |
INTR | International issues |
JUST | Justice, legal system and public safety |
REGI | Regions and cities |
SOCI | Population and society |
TECH | Science and technology |
TRAN | Transport |
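A mapping from a local vocabulary to the categories above can be as simple as a lookup table. The sketch below is an assumption-laden illustration: the local labels in `LOCAL_TO_EU` are invented, and the base URI in `THEME_BASE` is assumed to follow the same Publications Office authority pattern as the other controlled vocabularies used in DCAT-AP; please verify it against the EU data theme vocabulary before use.

```python
from typing import Optional

# Assumed base URI of the EU data theme authority table.
THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"

# The acronyms listed in the table above.
EU_THEMES = {
    "AGRI", "ECON", "EDUC", "ENER", "ENVI", "GOVE", "HEAL",
    "INTR", "JUST", "REGI", "SOCI", "TECH", "TRAN",
}

# Hypothetical local category labels mapped to EU acronyms.
LOCAL_TO_EU = {"farming": "AGRI", "public-finance": "ECON", "mobility": "TRAN"}


def to_theme_uri(local_label: str) -> Optional[str]:
    """Return the EU theme URI for a local label, or None if unmapped."""
    acronym = LOCAL_TO_EU.get(local_label.strip().lower())
    if acronym not in EU_THEMES:
        return None
    return THEME_BASE + acronym


print(to_theme_uri("Mobility"))
print(to_theme_uri("unknown"))  # None: needs a manual mapping decision
```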
Auto translation
Harvested datasets are automatically translated into all official languages of the European Union. Only the titles and descriptions of datasets and distributions are translated, using eTranslation, the machine translation service of the European Commission. The translation is applied asynchronously, so an appropriate disclaimer is displayed on the front end. The translated text is added to the corresponding properties of the DCAT-AP metadata with a language tag as specified in the DCAT-AP specification, e.g. “fr-t-en-t0-mtec”, which means “translated from English to French using the machine translation service of the EC”.
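The structure of the language tag in the example above can be illustrated with two small helpers. This is a sketch that only handles tags of exactly the shape shown (“fr-t-en-t0-mtec”) with simple language subtags; the function names are hypothetical.

```python
from typing import Optional, Tuple


def machine_translation_tag(target: str, source: str) -> str:
    """Build a tag such as 'fr-t-en-t0-mtec' (target first, then source)."""
    return f"{target}-t-{source}-t0-mtec"


def parse_machine_translation_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Return (target, source) for a machine translation tag, else None."""
    parts = tag.split("-")
    if len(parts) == 5 and parts[1] == "t" and parts[3:] == ["t0", "mtec"]:
        return parts[0], parts[2]
    return None


print(machine_translation_tag("fr", "en"))               # fr-t-en-t0-mtec
print(parse_machine_translation_tag("fr-t-en-t0-mtec"))  # ('fr', 'en')
```

A consumer of the metadata can use such a check to distinguish machine-translated literals from those supplied by the publisher.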
Supported formats and protocols
DCAT-AP via OAI-PMH is the preferred way of harvesting data. We can also accept data provided via CKAN APIs; however, we recommend that this solution is used only for legacy systems.
The following sections describe the interfaces that data suppliers (e.g. national portals, public data portals in the Member States, portals from international organisations etc.) must have in place in order to be harvested by data.europa.eu.
The main supported interfaces are the following:
- DCAT-AP;
- Comprehensive Knowledge Archive Network (CKAN) compliant sites;
- CSW/Inspire catalogue services (for geospatial datasets).
DCAT-AP
Providing data via a DCAT-AP interface is the officially recommended method and will always be preferred for harvesting.
DCAT-AP is a metadata specification for describing public sector datasets in Europe. It is based on the Data Catalog Vocabulary (DCAT). The RDF/XML, N-Triples, and Turtle serialisations are allowed.
Metadata model
For general information on the metadata model, please refer to the official documentation. The respective qualifiers (mandatory, recommended and optional) need to be adhered to.
The following is an example dataset with all the mandatory properties in rdf/xml.
<?xml version="1.0"?>
<rdf:RDF xmlns:edp="https://europeandataportal.eu/voc#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:spdx="http://spdx.org/rdf/terms#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://data.europa.eu/88u/ontology/dcatapop#"
         xmlns:adms="http://www.w3.org/ns/adms#"
         xmlns:dqv="http://www.w3.org/ns/dqv#"
         xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:schema="http://schema.org/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <dcat:Dataset rdf:about="http://data.europa.eu/88u/dataset/measures-of-solidarity-with-ukraine-eur">
    <dct:alternative xml:lang="en">EU solidarity with Ukraine</dct:alternative>
    <dct:subject rdf:resource="http://eurovoc.europa.eu/1914"/>
    <dcat:distribution>
      <dcat:Distribution rdf:about="http://data.europa.eu/88u/distribution/4608d86c-fe4e-4ae4-9fe7-a550b8d44b28">
        <dcat:accessURL rdf:resource="https://europa.eu/!jb8pvX"/>
        <dct:title xml:lang="en">List of measures</dct:title>
        <dcat:downloadURL rdf:resource="https://europa.eu/!jb8pvX"/>
        <dct:type rdf:resource="http://publications.europa.eu/resource/authority/distribution-type/DOWNLOADABLE_FILE"/>
        <dct:rights>
          <dct:RightsStatement rdf:about="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
        </dct:rights>
        <dct:license>
          <dct:LicenseDocument rdf:about="http://publications.europa.eu/resource/authority/licence/CC_BY_4_0"/>
        </dct:license>
        <dct:description xml:lang="en">List of measures in csv format</dct:description>
        <dcat:mediaType>
          <dct:MediaType rdf:about="http://www.iana.org/assignments/media-types/text/csv"/>
        </dcat:mediaType>
        <dct:format>
          <dct:MediaTypeOrExtent rdf:about="http://publications.europa.eu/resource/authority/file-type/CSV"/>
        </dct:format>
        <dct:identifier>http://data.europa.eu/88u/distribution/813268f8-95e0-46b5-97fc-bc59b093e3d0</dct:identifier>
      </dcat:Distribution>
    </dcat:distribution>
    <dcat:distribution>
      <dcat:Distribution rdf:about="http://data.europa.eu/88u/distribution/6e39a598-5732-4204-b7c3-4e3abb3ef006">
        <dct:format>
          <dct:MediaTypeOrExtent rdf:about="http://publications.europa.eu/resource/authority/file-type/HTML"/>
        </dct:format>
        <dct:title xml:lang="en">List of measures</dct:title>
        <dct:rights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
        <dcat:accessURL rdf:resource="https://europa.eu/!bCfRRj"/>
        <dcat:downloadURL rdf:resource="https://europa.eu/!bCfRRj"/>
        <dct:identifier>http://data.europa.eu/88u/distribution/6ae2d223-11d6-469e-8f88-7436f77d34fa</dct:identifier>
        <dct:type rdf:resource="http://publications.europa.eu/resource/authority/distribution-type/DOWNLOADABLE_FILE"/>
        <dct:license rdf:resource="http://publications.europa.eu/resource/authority/licence/CC_BY_4_0"/>
        <adms:status rdf:resource="http://purl.org/adms/status/Completed"/>
        <dcat:mediaType>
          <dct:MediaType rdf:about="http://www.iana.org/assignments/media-types/text/html"/>
        </dcat:mediaType>
        <dct:description xml:lang="en">List of measures in html format with actionable links</dct:description>
      </dcat:Distribution>
    </dcat:distribution>
    <dct:accessRights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
    <dcat:landingPage>
      <foaf:Document rdf:about="https://eur-lex.europa.eu/content/news/eu-measures-solidarity-ukraine.html">
        <foaf:topic rdf:resource="http://data.europa.eu/88u/dataset/measures-of-solidarity-with-ukraine-eur"/>
        <schema:url>https://eur-lex.europa.eu/content/news/eu-measures-solidarity-ukraine.html</schema:url>
      </foaf:Document>
    </dcat:landingPage>
    <dct:publisher>
      <foaf:Organization>
        <foaf:mbox rdf:resource="mailto:example@europa.eu"/>
        <foaf:name>Landesamt für Digitalisierung, Breitband und Vermessung</foaf:name>
        <foaf:homepage rdf:resource="http://www.europa.eu"/>
      </foaf:Organization>
    </dct:publisher>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2022-03-17</dct:issued>
    <dct:description xml:lang="en">This dataset contains a list of documents published on EUR-Lex that bring together the measures the EU has taken in solidarity with Ukraine. This list includes measures of assistance to Ukraine as well as related restrictive measures. Acts amended in the above mentioned context are also included. The list is updated regularly.</dct:description>
    <dct:accrualPeriodicity>
      <dct:Frequency rdf:about="http://publications.europa.eu/resource/authority/frequency/IRREG"/>
    </dct:accrualPeriodicity>
    <dct:title xml:lang="en">Measures in solidarity with Ukraine</dct:title>
    <dct:language>
      <dct:LinguisticSystem rdf:about="http://publications.europa.eu/resource/authority/language/ENG"/>
    </dct:language>
    <dcat:contactPoint>
      <vcard:Kind rdf:about="http://www.w3.org/2006/vcard/ns#Kind/be870795-ffe2-4117-885d-191faca58bb6">
        <vcard:hasEmail rdf:resource="mailto:EURLEX-HELPDESK@publications.europa.eu"/>
        <vcard:hasURL>
          <foaf:Document rdf:about="https://eur-lex.europa.eu/contact.html">
            <schema:url>https://eur-lex.europa.eu/contact.html</schema:url>
            <foaf:topic rdf:resource="file:///usr/verticles/"/>
          </foaf:Document>
        </vcard:hasURL>
      </vcard:Kind>
    </dcat:contactPoint>
  </dcat:Dataset>
</rdf:RDF>
Requests
The harvester currently supports three ways of harvesting DCAT-AP: from a source compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), from a dump file containing the RDF/XML representation of the datasets, or directly from a SPARQL endpoint. If datasets are provided as a dump file, it is recommended to split the file into pages, for example by using the Hydra Core Vocabulary.
For OAI-PMH-compliant sources, only the verb “ListRecords” is used.
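A ListRecords harvest typically proceeds page by page using the protocol's resumptionToken. The sketch below shows only how the request URLs are formed and how the token is read from a response; it performs no network access. The endpoint URL and the `dcat_ap` metadata prefix are assumptions for illustration, since the prefix actually offered depends on the source portal.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     metadata_prefix: str = "dcat_ap",  # assumed prefix
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it must
        # not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


print(list_records_url("https://portal.example/oai"))
print(list_records_url("https://portal.example/oai", resumption_token="abc123"))
```

An empty or missing resumptionToken element marks the final page, so a harvesting loop would stop as soon as `next_token` returns None.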
Responses
As indicated above, the response must be DCAT-AP-compliant to be understood by the harvesting component.
Error handling
The OAI-PMH protocol provides methods for error handling that the harvester can understand. When using this protocol, these error methods should be used.
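OAI-PMH reports errors as `error` elements with a `code` attribute inside the response document. As a minimal sketch of how a client might read them (the function name `oai_errors` is hypothetical and this is not the harvester's actual implementation):

```python
from typing import List, Tuple
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_errors(response_xml: str) -> List[Tuple[str, str]]:
    """Return (code, message) pairs for every error in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [
        (err.get("code", ""), (err.text or "").strip())
        for err in root.findall("oai:error", OAI_NS)
    ]


sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="badResumptionToken">Token expired</error>'
    "</OAI-PMH>"
)
print(oai_errors(sample))  # [('badResumptionToken', 'Token expired')]
```

Returning a proper error code such as `badResumptionToken` or `noRecordsMatch`, rather than an empty or malformed response, lets the client decide whether to retry, restart, or stop.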
Service information for integration
As stated above, a categorisation mapping should be provided. Apart from that, the URL for the OAI-PMH endpoint or the dump file is needed.
CKAN API
The open-source data portal platform CKAN is still used by various data portals. Its Remote-Procedure-Call-style API (action API) is supported as an interface for data suppliers of data.europa.eu. The following options for using that interface are available:
- The data supplier uses CKAN for providing its open data metadata. It is important that the used CKAN version supports the action API. The legacy APIs of CKAN are not supported.
- The data supplier offers a CKAN compliant API, where the necessary endpoints reproduce the exact API behaviour.
Requests and responses
Only the ‘package_search’ API endpoint is needed in order to harvest the metadata. Its specifications are described in detail in the official documentation. This endpoint is used to get the metadata in a paginated way. Therefore, it accepts query parameters in a request and returns a dictionary with datasets as a result. The high-level use of this endpoint must be offered as specified in the CKAN documentation.
Example call: GET http://www.example.com/api/3/action/package_search?rows=50
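Paginated access builds on the example call by combining `rows` with the `start` offset parameter of ‘package_search’. The following sketch only generates the request URLs; the total is assumed to come from the `count` field of a first response, and the helper name is hypothetical.

```python
from typing import Iterator
from urllib.parse import urlencode


def package_search_urls(base_url: str, total: int, rows: int = 50) -> Iterator[str]:
    """Yield one package_search URL per page of `rows` datasets."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for start in range(0, total, rows):
        query = urlencode({"rows": rows, "start": start})
        yield f"{base_url}/api/3/action/package_search?{query}"


for url in package_search_urls("http://www.example.com", total=120):
    print(url)
```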
Metadata model
Although the CKAN API can be used as is, the basic CKAN data schema was extended and modified to meet the requirements of the underlying data structure (DCAT-AP) of data.europa.eu. The response of the ‘package_search’ action exposes a ‘results’ field, which holds a list of dictised datasets. The data structure of such a dataset differs from that of a plain CKAN installation.
Please note:
- Fields marked with an asterisk (*) are CKAN standard. Further information is available in the official documentation.
- Type specifications according to official JSON standard (http://json.org/).
- Besides the mandatory fields, the field names and types are not strict, but data suppliers must make sure an obvious mapping is possible.
- For a detailed explanation of each field, refer to the DCAT-AP specifications.
Dataset schema
The following fields are mandatory.
Field | Type | DCAT-AP dataset equivalent |
---|---|---|
title * | string | dct:title |
notes * | string | dct:description |
The following fields are optional but highly recommended.
Field | Type | DCAT-AP dataset equivalent |
---|---|---|
contact_point | array of objects (allowed members: type, name, email, resource) | dcat:contactPoint |
tags * | array of objects | dcat:keyword |
publisher | object | dct:publisher |
groups * | array of objects – the name of each group needs to fit the official categorisation (see the Categorisation section) | dcat:theme |
resources * | array of objects (see distribution schema) | dcat:distribution |
The following fields are optional.
Field | Type | DCAT-AP dataset equivalent |
---|---|---|
conforms_to | array of objects (allowed members: label, resource) | dct:conformsTo |
creator | object | dct:creator |
accrual_periodicity | object | dct:accrualPeriodicity |
identifier | object | dct:identifier |
url | string | dcat:landingPage |
language | array of objects (allowed members: label, resource) | dct:language |
other_identifier | object | adms:identifier |
issued | string | dct:issued |
dcat_spatial | array of objects (allowed members: label, resource) | dct:spatial |
temporal | array of objects (allowed members: start_date, end_date) | dct:temporal |
modified | string | dct:modified |
version_info | string | owl:versionInfo |
version_notes | string | adms:versionNotes |
provenance | array of objects (allowed members: label, resource) | dct:provenance |
source | array of strings | dct:source |
access_rights | object | dct:accessRights |
has_version | array of strings | dct:hasVersion |
is_version_of | array of strings | dct:isVersionOf |
relation | array of strings | dct:relation |
page | array of strings | foaf:page |
sample | array of strings | adms:sample |
dct_type | string | dct:type |
Distribution schema
The following fields are mandatory.
Field | Type | DCAT-AP distribution equivalent |
---|---|---|
url | string | dcat:accessURL |
The following fields are optional but highly recommended.
Field | Type | DCAT-AP distribution equivalent |
---|---|---|
description | string | dct:description |
format | string | dct:format |
license | object | dct:license |
Note that the list of licences recognised by data.europa.eu’s DCAT-AP parser is available online (https://data.europa.eu/en/training/licensing-assistant). This list is also used by our metadata quality assessment (MQA) tool for assessing the data providers’ performance in using known licences.
The following fields are optional.
Field | Type | DCAT-AP distribution equivalent |
---|---|---|
checksum | object | spdx:checksum |
mimetype | string | dcat:mediaType |
download_url | array of strings | dcat:downloadURL |
issued | string | dct:issued |
status | object | adms:status |
name | string | dct:title |
modified | string | dct:modified |
rights | object | dct:rights |
page | array of strings | foaf:page |
size | number | dcat:byteSize |
language | array of objects | dct:language |
conforms_to | array of objects | dct:conformsTo |
Example
A result of the ‘package_search’ action looks like this.
{
  "help": "http://example.eu/data/api/3/action/help_show?name=package_search",
  "success": true,
  "result": {
    "count": 113948,
    "sort": "score desc, metadata_modified desc",
    "facets": {},
    "results": [
      {
        "issued": "2011-10-20T00:00:00Z",
        "id": "525abe30-ef60-4bf9-824e-916368c1fad8",
        "metadata_created": "2015-09-15T12:08:54.860742",
        "metadata_modified": "2015-09-15T13:17:51.405474",
        "temporal": [
          {
            "start_date": "2011-10-19T22:00:00Z",
            "end_date": "2011-10-19T22:00:00Z"
          }
        ],
        "state": "active",
        "type": "dataset",
        "resources": [
          {
            "package_id": "525abe30-ef60-4bf9-824e-916368c1fad8",
            "id": "7166a1fa-d994-4d88-8e76-3378930b1e16",
            "state": "active",
            "format": "XHTML",
            "mimetype": "application/xhtml+xml",
            "name": "Example",
            "created": "2015-09-15T14:39:43.865240",
            "url": "http://example.com"
          }
        ],
        "tags": [
          {
            "vocabulary_id": null,
            "state": "active",
            "display_name": "Example Tag",
            "id": "06993102-a2ee-4e40-b9e4-ed3e4b86e943",
            "name": "example-tag"
          }
        ],
        "groups": [
          {
            "display_name": "Economy and finance",
            "description": "",
            "title": "Economy and finance",
            "id": "128d0956-4526-440e-a951-f153c190d890",
            "name": "economy-and-finance"
          }
        ],
        "creator_user_id": "0ab3c2ec-c2a2-4eef-b70f-ed093e028063",
        "publisher": {
          "resource": "http://example.com"
        },
        "organization": {
          "description": "Example Organization",
          "created": "2015-09-15T13:56:32.985936",
          "title": "Example Organization",
          "name": "example-orag",
          "is_organization": true,
          "state": "active",
          "image_url": "",
          "revision_id": "ea70fb1f-29a8-4e7b-8527-809e4792a75b",
          "type": "organization",
          "id": "0897b420-3c3d-4a19-9c2c-a9815e2db2be",
          "approval_status": "approved"
        },
        "name": "example-dataset",
        "notes": "Example",
        "owner_org": "0897b420-3c3d-4a19-9c2c-a9815e2db2be",
        "modified": "2011-10-20T00:00:00Z",
        "url": "",
        "title": "Example Dataset",
        "identifier": [
          "http://example-ident.com"
        ]
      }
    ],
    "search_facets": {}
  }
}
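Before requesting harvesting, a supplier may want to confirm that every dataset returned by ‘package_search’ carries the mandatory fields from the dataset and distribution schemas (`title`, `notes`, and `url` on each resource). The following is a minimal sketch of such a check; the function name and the shape of the report are hypothetical.

```python
from typing import Dict, List

MANDATORY_DATASET_FIELDS = ("title", "notes")
MANDATORY_DISTRIBUTION_FIELDS = ("url",)


def missing_fields(dataset: Dict) -> List[str]:
    """List mandatory fields that are absent or empty in one dataset."""
    missing = [f for f in MANDATORY_DATASET_FIELDS if not dataset.get(f)]
    for index, resource in enumerate(dataset.get("resources", [])):
        for field in MANDATORY_DISTRIBUTION_FIELDS:
            if not resource.get(field):
                missing.append(f"resources[{index}].{field}")
    return missing


# A cut-down dataset in the shape of the 'results' entries shown above.
dataset = {
    "title": "Example Dataset",
    "notes": "Example",
    "resources": [{"name": "Example", "url": "http://example.com"}],
}
print(missing_fields(dataset))  # []
```

Running the check over `result["results"]` for each page gives a quick overview of which datasets would need attention.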
CSW/INSPIRE catalogue services (for geospatial metadata)
General remarks
This interface represents an INSPIRE compliant catalogue (discovery) service. It is defined as a slightly extended version of the OGC CSW AP ISO.
The GetCapabilities operation (mandatory for all OGC services) is not needed for running the harvesting. However, it can be helpful when registering the catalogue service with data.europa.eu, because the service’s response provides additional information that would otherwise have to be found out during registration (e.g. the supported protocol bindings or support for the ‘modified’ queryable for selective harvesting).
For the harvesting process, only the GetRecords operation is called. The GetRecordById operation is not needed.
Operation | Operation description | data.europa.eu usage |
---|---|---|
GetCapabilities | Retrieve catalogue service metadata | Only for gathering service information upon registration |
GetRecords | Retrieval of a set of metadata items | Yes |
GetRecordById | Retrieval of a single metadata item | No |
Metadata model
The metadata model considered is as defined in the INSPIRE Technical Guidance on Discovery Services and on Metadata.
Within a GetRecords query (constraint) just the following metadata model elements (queryables) are used.
Request parameter | Definition (a) | Used values in data.europa.eu | XPath (b) |
---|---|---|---|
Type | Provides the desired information resources. | The following fixed values are always used: ‘dataset’, ‘datasetcollection’, ‘series’ and ‘service’ | /gmd:MD_Metadata/gmd:hierarchyLevel/gmd:MD_ScopeCode/@codeListValue |
Modified | The metadata date stamp in case of selective harvesting (if supported), see below. | Date | /gmd:MD_Metadata/gmd:dateStamp/gco:Date |
a: ‘Definition’ represents the semantic meaning of the element in data.europa.eu; it is slightly different from the generic meaning in OGC CSW.
b: Element’s XML path in GetRecords request.
Example query (constraint).
<Constraint version="1.1.0">
  <ogc:Filter>
    <ogc:Or>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>Type</ogc:PropertyName>
        <ogc:Literal>dataset</ogc:Literal>
      </ogc:PropertyIsEqualTo>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>Type</ogc:PropertyName>
        <ogc:Literal>datasetcollection</ogc:Literal>
      </ogc:PropertyIsEqualTo>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>Type</ogc:PropertyName>
        <ogc:Literal>series</ogc:Literal>
      </ogc:PropertyIsEqualTo>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>Type</ogc:PropertyName>
        <ogc:Literal>service</ogc:Literal>
      </ogc:PropertyIsEqualTo>
    </ogc:Or>
  </ogc:Filter>
</Constraint>
As defined in the INSPIRE Technical Guidance on Discovery Services the operation must be able to return ISO19139 metadata aligned with the INSPIRE Technical Guidance on Metadata.
Requests
The mandatory GetRecords operation is the primary means of metadata item discovery with HTTP protocol binding. It executes an inventory search and returns the metadata items. Only OGC Filter XML encoding is supported. A few additional requirements apply to GetRecords requests; these are explained below.
Bindings
One or more of HTTP POST/XML, POST/XML/SOAP1.1 and POST/XML/SOAP1.2 have to be supported as bindings.
Operation parameters
The following parameters (not the queryables) and parameter values are used in data.europa.eu for the GetRecords requests.
Request parameter | Definition (a) | Used values in data.europa.eu | XPath (b) |
---|---|---|---|
service | Indicates that this is a CSW service. | Always fixed value: ‘CSW’ | /GetRecords/@service |
version | Indicates which version of the CSW service is requested. | Always fixed value: ‘2.0.2’ | /GetRecords/@version |
resultType | Specifies the type of result. | Always fixed value: ‘results’ | /GetRecords/@resultType |
outputFormat | Specifies the output format of the document returned by GetRecords. | Always fixed value: ‘application/xml’ | /GetRecords/@outputFormat |
outputSchema | Specifies the schema of the document returned by GetRecords. | Always fixed value (namespace): ‘http://www.isotc211.org/2005/gmd’ | /GetRecords/@outputSchema |
startPosition | Specifies the sequence number of the first returned record. | Integer between 1 and the returned number of records. Default value: 1 | /GetRecords/@startPosition |
maxRecords | Specifies the number of returned records. | Positive integer between 1 and N. Default value: 50 | /GetRecords/@maxRecords |
typeNames | Specifies the query and ElementSetName type. | Always fixed value: ‘gmd:MD_Metadata’, where ‘gmd’ is a valid namespace prefix for ‘http://www.isotc211.org/2005/gmd’ | /GetRecords/Query/@typeNames and /GetRecords/Query/ElementSetName/@typeNames |
ElementSetName | Specifies the type of document returned by GetRecords. | As only full metadata sets are requested by the harvester, this parameter is always set to ‘full’. | /GetRecords/Query/ElementSetName |
a: ‘Definition’ represents the semantic meaning of the element in data.europa.eu; it is slightly different from the generic meaning in OGC CSW.
b: Element’s XML path in GetRecords request.
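Putting the fixed parameter values from the table together with the example constraint, a GetRecords request body might be assembled as in the sketch below. It is an illustration under assumptions, not the harvester's actual code: the function name is hypothetical, the `csw` and `ogc` namespace URIs are the standard OGC ones rather than values stated in this document, and the element layout follows CSW 2.0.2 as commonly used.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"  # assumed standard CSW namespace
OGC = "http://www.opengis.net/ogc"            # assumed standard OGC filter namespace
GMD = "http://www.isotc211.org/2005/gmd"
RESOURCE_TYPES = ("dataset", "datasetcollection", "series", "service")

ET.register_namespace("csw", CSW)
ET.register_namespace("ogc", OGC)


def build_get_records(start_position: int = 1, max_records: int = 50) -> str:
    """Return a GetRecords request document as a string."""
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "outputFormat": "application/xml",
        "outputSchema": GMD,
        "startPosition": str(start_position),
        "maxRecords": str(max_records),
        # Declare the gmd prefix explicitly because it is only used
        # inside the typeNames attribute value below.
        "xmlns:gmd": GMD,
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "gmd:MD_Metadata"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "full"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    any_of = ET.SubElement(ET.SubElement(constraint, f"{{{OGC}}}Filter"), f"{{{OGC}}}Or")
    for value in RESOURCE_TYPES:
        equals = ET.SubElement(any_of, f"{{{OGC}}}PropertyIsEqualTo")
        ET.SubElement(equals, f"{{{OGC}}}PropertyName").text = "Type"
        ET.SubElement(equals, f"{{{OGC}}}Literal").text = value
    return ET.tostring(root, encoding="unicode")


print(build_get_records(start_position=51))
```

The resulting document would be sent with one of the POST bindings listed above; how it is wrapped for the SOAP bindings is outside the scope of this sketch.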
Partitioning
For partitioning (pagination) the following parameters are used (see table on GetRecords):
- startPosition;
- maxRecords.
Selective harvesting
Selective harvesting allows harvesters to limit harvest requests to just those portions of the metadata available from a repository which have been changed within a specified time frame.
Selective harvesting often makes sense because usually only a few metadata records change within a day, so only those few records need to be harvested daily.
For selective harvesting, the predefined queryable of the GetRecords operation (usually ‘modified’) is used.
Responses
As defined in the INSPIRE Technical Guidance on Discovery Services the operation must be able to return ISO19139 metadata aligned with the INSPIRE Technical Guidance on Metadata.
Partitioning
For partitioning (pagination) as part of the search response, it is mandatory to return the total count of matching metadata items, even if the metadata for these items is not contained in the search response. This count, coupled with the ability to specify the startPosition and the number of desired records (maxRecords) from the catalogue, allows data.europa.eu to implement results paging and reduce the load on both the data.europa.eu system and the data partners.
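The role of the total count in paging can be sketched as follows. The example assumes a CSW 2.0.2 response whose `SearchResults` element carries the `numberOfRecordsMatched` and `numberOfRecordsReturned` attributes; the function name is hypothetical and the namespace URI is the standard CSW one, not a value taken from this document.

```python
from typing import Optional
import xml.etree.ElementTree as ET

CSW_NS = {"csw": "http://www.opengis.net/cat/csw/2.0.2"}


def next_start_position(response_xml: str, start_position: int) -> Optional[int]:
    """Return the startPosition of the next page, or None when finished."""
    root = ET.fromstring(response_xml)
    results = root.find("csw:SearchResults", CSW_NS)
    if results is None:
        raise ValueError("response contains no SearchResults element")
    matched = int(results.get("numberOfRecordsMatched"))
    returned = int(results.get("numberOfRecordsReturned"))
    following = start_position + returned
    # Stop when nothing came back or all matching records were covered.
    if returned == 0 or following > matched:
        return None
    return following


response = (
    '<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">'
    '<csw:SearchResults numberOfRecordsMatched="120" numberOfRecordsReturned="50"/>'
    "</csw:GetRecordsResponse>"
)
print(next_start_position(response, start_position=1))  # 51
```

Without the total count, a client cannot tell a genuinely final page from a truncated one, which is why the count is mandatory.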
Error handling
Useful status and error messages help data.europa.eu manage client sessions effectively. Any limitations on submitted search requests to the inventory systems should be noted in the response (e.g. ‘too many records requested’, ‘search timed out’) so that predictable error handling can be managed by data.europa.eu.
Service information for integration
To be able to integrate an INSPIRE Discovery Service/CSW, the following information needs to be provided by the data supplier.
Service information | Definition | Obligation (M=Mandatory, O=Optional, C=Conditional) | Datatype |
---|---|---|---|
GetRecords URL | URL of the CSW GetRecords operation | M | URL |
GetRecords Binding | Protocol binding of the CSW GetRecords operation | M | Codelist (one of): ‘POST/XML’, ‘POST/XML/SOAP1.1’, ‘POST/XML/SOAP1.2’ |
Modified (a) | Name of the queryable (if supported) for filtering on the metadata date stamp (for selective harvesting) | Possibly for future use | String: [Namespace“:”]QueryableName |
MaxRecordsMax | Specifies the maximum number of returned records | Possibly for future use (currently always set to ‘50’) | Integer |
a: The value in the CSW filter will be formatted as ‘MM-DD-YYYY’. The operators ‘>=’ and ‘<=’ will be used.
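A selective harvesting filter on the ‘Modified’ queryable could look like the sketch below. It assumes the ‘>=’ operator corresponds to the OGC filter element `PropertyIsGreaterThanOrEqualTo` and uses the ‘MM-DD-YYYY’ date format from note (a); the function name and the queryable name passed in are illustrative, since the actual queryable is supplied by the data supplier.

```python
from datetime import date
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"  # assumed standard OGC filter namespace
ET.register_namespace("ogc", OGC)


def modified_since_filter(queryable: str, since: date) -> str:
    """Build an OGC filter selecting records modified on or after `since`."""
    flt = ET.Element(f"{{{OGC}}}Filter")
    comparison = ET.SubElement(flt, f"{{{OGC}}}PropertyIsGreaterThanOrEqualTo")
    ET.SubElement(comparison, f"{{{OGC}}}PropertyName").text = queryable
    # Date formatted as 'MM-DD-YYYY', following note (a) above.
    ET.SubElement(comparison, f"{{{OGC}}}Literal").text = since.strftime("%m-%d-%Y")
    return ET.tostring(flt, encoding="unicode")


print(modified_since_filter("Modified", date(2022, 3, 17)))
```

Combining this comparison with the resource type constraint inside an `ogc:And` element would restrict a daily harvest to recently changed records of the relevant types.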