How to request harvesting

Data.europa.eu harvests metadata from available national open data portals. To initiate the onboarding of your portal onto data.europa.eu, you will need to undertake two sequential steps: check that your portal is suitable for harvesting and issue a harvesting request via the data.europa.eu contact form.

Required information

The first step is to go through the following checklist, which guides you in gathering and summarising the information required for successful harvesting. Before contacting data.europa.eu, please make sure that you can answer all the listed questions. If anything is unclear, you can always reach out to us via the contact form.

Please remember that the preferred harvesting interface is OAI-PMH.

Requirement Value
1 Which country does your portal cover? Free text
2 Is your portal already being harvested by another portal? Free text
3 What is the Uniform Resource Locator (URL) to your portal’s interface / endpoint? URL
4 What is the default language of the metadata from your portal? Free text
5 Which metadata standard is supported by your portal? E.g. DCAT-AP, CKAN JSON, or INSPIRE
6 Which representation of the metadata can be used? XML, JSON, or any RDF representation
7 Which API/Protocol is used to retrieve the data? OAI-PMH (recommended), RDF dump file, CKAN API, or SPARQL endpoint
8 Is authentication required for you to access your API? yes/no
9 Does your data use standard date/time formats as specified by ISO 8601? Using the ISO 8601 standard is mandatory.
10 How often can/should the site be harvested? E.g. daily/weekly/monthly/etc.
11 Are there any times when the site should not be harvested (e.g. scheduled maintenance)? Free text
12 Who is the publisher of the portal (name and email address)? Free text. e.g. Federal Open Data Agency; info@open-data.gv.eu
13 What is the URL to the homepage of the portal? URL
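
Question 9 can be checked before you submit a request. The following is a minimal sketch in Python, assuming the date values have already been extracted from your metadata as strings. The function name is illustrative and not part of any data.europa.eu tooling, and the check covers only the common subset of ISO 8601 that Python's standard library parses.

```python
from datetime import datetime


def is_iso8601(value: str) -> bool:
    """Return True if value parses as an ISO 8601 date or date-time."""
    # Older Python versions do not accept a trailing "Z", so normalise it
    # to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True
```

Running such a check over a sample of your dataset dates (for example the values of dct:issued and dct:modified) gives a quick indication of whether the answer to question 9 is yes.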

Harvesting request via contact form

Once you have gathered all answers to the checklist, the second step is to contact data.europa.eu to initiate the onboarding of your portal. Please submit a request via the form, selecting 'Get harvested by data.europa.eu' as the issue type. Once we receive your request, we will assess it and keep you informed of progress.

In the contact request, please provide information on all questions listed in the checklist.

Technical requirements/constraints

The harvester accesses the endpoints of all catalogues daily by default. Depending on the total size of the data provided by each portal and on other factors such as available resources, other harvesting intervals can be negotiated on an individual basis. The harvester is configured individually for each harvested portal. Metadata is usually processed overnight. Any incoming metadata that is not in DCAT-AP format is transformed to the most recent version of DCAT-AP.

Access to harvested sites

Authentication

Some source sites require authentication, meaning that data.europa.eu needs a login name and password before it can access the data. If this applies to your portal, please state this in your message when using our contact form.

API access to harvested site

For harvesting to take place, the source site needs to have in place one of the interfaces described in detail in the Supported formats and protocols section.

FTP access to harvested site

Data.europa.eu does not support FTP for downloading datasets from a source site.

Operational requirements

Harvesting frequency

Due to the high volume of metadata harvested from a growing list of data suppliers and the runtime required by the harvesting processes, data supplier sites are harvested daily by default. Furthermore, the harvesting processes must be clustered and run on a fixed schedule (e.g. during the night) to avoid placing load on the harvested sites during regular business hours. Where other factors and circumstances permit, more or less frequent harvesting intervals can be agreed individually.

Data source site API / endpoint

The data source endpoint should accept queries with, for example, offset / limit parameters for resumption, partitioning, and pagination of the datasets to be harvested.
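
To illustrate how a harvester might walk through such an endpoint, here is a minimal Python sketch of offset / limit pagination. The fetch_page callable is a hypothetical stand-in for an HTTP call to your endpoint; it is assumed to return the list of datasets in the requested window and an empty list once the catalogue is exhausted. This is not the actual implementation of the data.europa.eu harvester.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[dict]],
                  limit: int = 100) -> Iterator[dict]:
    """Yield every dataset by requesting consecutive offset/limit windows."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break  # an empty page means the catalogue is exhausted
        yield from page
        if len(page) < limit:
            break  # a short page is the last one
        offset += limit
```

An endpoint that supports this pattern lets an interrupted harvest resume at a known offset instead of starting again from the first dataset.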

Quality of the harvested datasets

Metadata quality is an important concern for data.europa.eu. This section explains the basic procedures implemented by data.europa.eu to monitor and improve metadata quality.

Ensuring uniquely identifiable datasets

Only if a dataset always keeps the same unique identifier can data.europa.eu recognise it as the same dataset across harvesting runs and avoid duplicating it.
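
The effect of a stable identifier can be sketched in a few lines of Python. The example assumes a simplified in-memory catalogue keyed by identifier; the function and field names are illustrative and do not describe the internal data model of data.europa.eu.

```python
def upsert(catalogue: dict, dataset: dict) -> str:
    """Insert or update a dataset keyed by its stable identifier."""
    key = dataset["identifier"]
    # A known identifier means the same dataset was harvested before,
    # so the existing entry is replaced instead of a duplicate being added.
    action = "updated" if key in catalogue else "created"
    catalogue[key] = dataset
    return action
```

If the identifier of a dataset changes between two harvesting runs, a store like this has no way to relate the new record to the old one, and the dataset appears twice.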

Error reporting on harvested metadata

The metadata quality assessment (MQA) module provides a graphical report on the quality of the harvested datasets' metadata through a dashboard that summarises the main quality indicators, for example, availability and accessibility of distributions, compliance of datasets with metadata formats, and sources of violations.

The MQA dashboard can be opened directly from the portal homepage.

User feedback on datasets

Users will be able to provide feedback on a dataset directly from the dataset detail page.

The system makes it possible to gather and extract all feedback received for all datasets and group those by data supplier, so that the feedback can be sent to the data supplier.

Categorisation

The data.europa.eu categories are based on the EU controlled data theme vocabulary. The following are the categories used on data.europa.eu. When providing data, publishers should always use these terms to thematically categorise the datasets. If a different vocabulary is used, it should be aligned (i.e. mapped) to these categories.

Acronym Definition
AGRI Agriculture, fisheries, forestry and food
ECON Economy and finance
EDUC Education, culture and sport
ENER Energy
ENVI Environment
GOVE Government and public sector
HEAL Health
INTR International issues
JUST Justice, legal system and public safety
REGI Regions and cities
SOCI Population and society
TECH Science and technology
TRAN Transport
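
If your portal uses its own categories, the alignment can be expressed as a simple lookup table. The following Python sketch assumes three hypothetical local category names on the left-hand side; the URI prefix is the one used by the EU data theme authority table, with the acronyms listed above appended.

```python
from typing import Optional

THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"

# Hypothetical local categories (left) mapped to EU data theme acronyms (right).
LOCAL_TO_EU_THEME = {
    "farming": "AGRI",
    "public-finance": "ECON",
    "mobility": "TRAN",
}


def theme_uri(local_category: str) -> Optional[str]:
    """Return the EU data theme URI for a local category, or None if unmapped."""
    code = LOCAL_TO_EU_THEME.get(local_category)
    return THEME_BASE + code if code else None
```

Returning None for unmapped categories makes gaps in the mapping visible, so they can be resolved before the portal is harvested.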

Auto translation

Harvested datasets are automatically translated into all official languages of the European Union. Only the title and description of datasets and distributions are translated. The European Commission's machine translation service eTranslation is used. Translations are applied asynchronously, so an appropriate disclaimer is displayed on the front end. The translated text is added to the corresponding properties of the DCAT-AP metadata with a language tag as specified in the DCAT-AP specification, e.g. “fr-t-en-t0-mtec”, which means “translated from English to French using machine translation service by EC”.
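
For consumers of the metadata, it can be useful to tell machine-translated literals apart from original ones. The following Python sketch builds and splits language tags of the form shown above. It assumes that tags always follow the pattern target-t-source-t0-mechanism; the function names are illustrative only.

```python
def machine_translation_tag(target: str, source: str) -> str:
    """Build a tag such as 'fr-t-en-t0-mtec' for a machine-translated literal."""
    return f"{target}-t-{source}-t0-mtec"


def parse_translation_tag(tag: str) -> dict:
    """Split a language tag into target language, source language and mechanism."""
    target, separator, rest = tag.partition("-t-")
    if not separator:
        # A plain tag such as 'de' marks an original, untranslated literal.
        return {"language": tag, "source": None, "mechanism": None}
    source, _, mechanism = rest.partition("-t0-")
    return {"language": target, "source": source, "mechanism": mechanism or None}
```

For example, parse_translation_tag("fr-t-en-t0-mtec") yields the target language fr, the source language en and the mechanism mtec.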

Supported formats and protocols

DCAT-AP via OAI-PMH is the preferred way of data harvesting. We can also accept data provided via CKAN APIs. However, we recommend that this solution is only used for legacy systems. The following sections describe the interfaces that data suppliers (e.g. national portals, public data portals in the Member States, portals from international organisations etc.) must have in place in order to be harvested by data.europa.eu.

The main supported interfaces are the following:

  • DCAT-AP;
  • Comprehensive Knowledge Archive Network (CKAN) compliant sites;
  • CSW/Inspire catalogue services (for geospatial datasets).

DCAT-AP

Providing data via a DCAT-AP interface is the officially recommended method and will always be preferred for harvesting.

DCAT-AP is a metadata specification for describing public sector datasets in Europe. It is based on the Data Catalogue Vocabulary (DCAT). Serialisation as RDF/XML, N-Triples or Turtle is allowed.

Metadata model

For general information on the metadata model, please refer to the official documentation. The respective qualifiers (mandatory, recommended and optional) need to be adhered to.

The following is an example dataset with all the mandatory properties in RDF/XML.

<?xml version="1.0"?>
<rdf:RDF xmlns:edp="https://europeandataportal.eu/voc#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:spdx="http://spdx.org/rdf/terms#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://data.europa.eu/88u/ontology/dcatapop#"
         xmlns:adms="http://www.w3.org/ns/adms#"
         xmlns:dqv="http://www.w3.org/ns/dqv#"
         xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:schema="http://schema.org/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <dcat:Dataset rdf:about="http://data.europa.eu/88u/dataset/measures-of-solidarity-with-ukraine-eur">
        <dct:alternative xml:lang="en">EU solidarity with Ukraine</dct:alternative>
        <dct:subject rdf:resource="http://eurovoc.europa.eu/1914"/>
        <dcat:distribution>
            <dcat:Distribution rdf:about="http://data.europa.eu/88u/distribution/4608d86c-fe4e-4ae4-9fe7-a550b8d44b28">
                <dcat:accessURL rdf:resource="https://europa.eu/!jb8pvX"/>
                <dct:title xml:lang="en">List of measures</dct:title>
                <dcat:downloadURL rdf:resource="https://europa.eu/!jb8pvX"/>
                <dct:type rdf:resource="http://publications.europa.eu/resource/authority/distribution-type/DOWNLOADABLE_FILE"/>
                <dct:rights>
                    <dct:RightsStatement rdf:about="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
                </dct:rights>
                <dct:license>
                    <dct:LicenseDocument rdf:about="http://publications.europa.eu/resource/authority/licence/CC_BY_4_0"/>
                </dct:license>
                <dct:description xml:lang="en">List of measures in csv format</dct:description>
                <dcat:mediaType>
                    <dct:MediaType rdf:about="http://www.iana.org/assignments/media-types/text/csv"/>
                </dcat:mediaType>
                <dct:format>
                    <dct:MediaTypeOrExtent rdf:about="http://publications.europa.eu/resource/authority/file-type/CSV"/>
                </dct:format>
                <dct:identifier>http://data.europa.eu/88u/distribution/813268f8-95e0-46b5-97fc-bc59b093e3d0</dct:identifier>
            </dcat:Distribution>
        </dcat:distribution>
        <dcat:distribution>
            <dcat:Distribution rdf:about="http://data.europa.eu/88u/distribution/6e39a598-5732-4204-b7c3-4e3abb3ef006">
                <dct:format>
                    <dct:MediaTypeOrExtent rdf:about="http://publications.europa.eu/resource/authority/file-type/HTML"/>
                </dct:format>
                <dct:title xml:lang="en">List of measures</dct:title>
                <dct:rights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
                <dcat:accessURL rdf:resource="https://europa.eu/!bCfRRj"/>
                <dcat:downloadURL rdf:resource="https://europa.eu/!bCfRRj"/>
                <dct:identifier>http://data.europa.eu/88u/distribution/6ae2d223-11d6-469e-8f88-7436f77d34fa</dct:identifier>
                <dct:type rdf:resource="http://publications.europa.eu/resource/authority/distribution-type/DOWNLOADABLE_FILE"/>
                <dct:license rdf:resource="http://publications.europa.eu/resource/authority/licence/CC_BY_4_0"/>
                <adms:status rdf:resource="http://purl.org/adms/status/Completed"/>
                <dcat:mediaType>
                    <dct:MediaType rdf:about="http://www.iana.org/assignments/media-types/text/html"/>
                </dcat:mediaType>
                <dct:description xml:lang="en">List of measures in html format with actionable links</dct:description>
            </dcat:Distribution>
        </dcat:distribution>
        <dct:accessRights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
        <dcat:landingPage>
            <foaf:Document rdf:about="https://eur-lex.europa.eu/content/news/eu-measures-solidarity-ukraine.html">
                <foaf:topic rdf:resource="http://data.europa.eu/88u/dataset/measures-of-solidarity-with-ukraine-eur"/>
                <schema:url>https://eur-lex.europa.eu/content/news/eu-measures-solidarity-ukraine.html</schema:url>
            </foaf:Document>
        </dcat:landingPage>
        <dct:publisher>
            <foaf:Organization>
                <foaf:mbox rdf:resource="mailto:example@europa.eu"/>
                <foaf:name>
                    Landesamt für Digitalisierung, Breitband und Vermessung
                </foaf:name>
                <foaf:homepage rdf:resource="http://www.europa.eu"/>
            </foaf:Organization>
        </dct:publisher>
        <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2022-03-17</dct:issued>
        <dct:description xml:lang="en">This dataset contains a list of documents published on EUR-Lex that bring together the measures the EU has taken in solidarity with Ukraine. This list includes measures of assistance to Ukraine as well as related restrictive measures. Acts amended in the above mentioned context are also included. The list is updated regularly.</dct:description>
        <dct:accrualPeriodicity>
            <dct:Frequency rdf:about="http://publications.europa.eu/resource/authority/frequency/IRREG"/>
        </dct:accrualPeriodicity>
        <dct:title xml:lang="en">Measures in solidarity with Ukraine</dct:title>
        <dct:language>
            <dct:LinguisticSystem rdf:about="http://publications.europa.eu/resource/authority/language/ENG"/>
        </dct:language>
        <dcat:contactPoint>
            <vcard:Kind rdf:about="http://www.w3.org/2006/vcard/ns#Kind/be870795-ffe2-4117-885d-191faca58bb6">
                <vcard:hasEmail rdf:resource="mailto:EURLEX-HELPDESK@publications.europa.eu"/>
                <vcard:hasURL>
                    <foaf:Document rdf:about="https://eur-lex.europa.eu/contact.html">
                        <schema:url>https://eur-lex.europa.eu/contact.html</schema:url>
                        <foaf:topic rdf:resource="file:///usr/verticles/"/>
                    </foaf:Document>
                </vcard:hasURL>
            </vcard:Kind>
        </dcat:contactPoint>
    </dcat:Dataset>
</rdf:RDF>

Requests

The harvester currently supports three ways of retrieving DCAT-AP: harvesting from a source that is compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), reading a dump file containing the RDF/XML representation of the datasets, or reading DCAT-AP directly from a SPARQL endpoint. If datasets are provided as a dump file, it is recommended to split the file into pages, for example, by using the Hydra Core Vocabulary.

For OAI-PMH-compliant sources, only the verb “ListRecords” is used.
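
A ListRecords harvest is a sequence of requests linked by resumption tokens. The following Python sketch shows how the request URLs can be built and how the token for the next page can be read from a response. The base URL and the metadataPrefix value "dcat_ap" are assumptions for illustration; use the prefix your endpoint actually advertises. This is not the code of the data.europa.eu harvester.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for the first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # together with the verb only, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()
```

A harvest loop requests the first URL, passes each response to next_token, and continues with the returned token until it receives None.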

Responses

As indicated above, the response must be DCAT-AP-compliant to be understood by the harvesting component.

Error handling

The OAI-PMH protocol provides methods for error handling that the harvester can understand. When using this protocol, these error methods should be used.
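
OAI-PMH reports errors as error elements with a code attribute, such as badArgument, badResumptionToken, cannotDisseminateFormat or noRecordsMatch. As a sketch of how a client can read them, the following Python function extracts all error codes and messages from a response. The function name is illustrative and the example does not describe how the data.europa.eu harvester reacts to individual codes.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_errors(response_xml: str) -> List[Tuple[str, str]]:
    """Return (code, message) pairs for every error element in a response."""
    root = ET.fromstring(response_xml)
    return [
        (error.get("code", ""), (error.text or "").strip())
        for error in root.findall(f"{OAI_NS}error")
    ]
```

An empty list means the response carries no protocol-level error and can be processed as a regular ListRecords result.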

Service information for integration

As stated above, a categorisation mapping should be provided. Apart from that, the URL for the OAI-PMH endpoint or the dump file is needed.

CKAN API

The open-source data portal platform CKAN is still used by various data portals. Its Remote-Procedure-Call-style API (action API) is supported as an interface for data suppliers of data.europa.eu. The following options for using that interface are available:

  • The data supplier uses CKAN for providing its open data metadata. It is important that the used CKAN version supports the action API. The legacy APIs of CKAN are not supported.
  • The data supplier offers a CKAN compliant API, where the necessary endpoints reproduce the exact API behaviour.

Requests and responses

Only the ‘package_search’ API endpoint is needed in order to harvest the metadata. Its specifications are described in detail in the official documentation. This endpoint is used to get the metadata in a paginated way. Therefore, it accepts query parameters in a request and returns a dictionary with datasets as a result. The high-level use of this endpoint must be offered as specified in the CKAN documentation.

Example call: GET http://www.example.com/api/3/action/package_search?rows=50
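
Pagination with package_search relies on the rows and start query parameters together with the count field of the result. The following Python sketch computes the request URLs needed to retrieve a whole catalogue. The base URL is a placeholder, and the total is assumed to have been read from the count field of a first response.

```python
from typing import List
from urllib.parse import urlencode


def package_search_urls(base_url: str, total: int, rows: int = 50) -> List[str]:
    """Return one package_search URL per page of `rows` datasets."""
    return [
        f"{base_url}/api/3/action/package_search?"
        + urlencode({"rows": rows, "start": start})
        for start in range(0, total, rows)
    ]
```

For a catalogue of 120 datasets and 50 rows per page, this yields three requests with start values 0, 50 and 100.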

Metadata model

Although the CKAN API can be used as is, the basic CKAN data schema was extended and modified to meet the requirements of the underlying data structure (DCAT-AP) of data.europa.eu. The response of the ‘package_search’ action exposes a ‘results’ field, which holds a list of dictised datasets. The data structure of such a dataset differs from that of a plain CKAN installation.

Please note:

  • Fields marked with an asterisk (*) are CKAN standard. Further information in the official documentation.
  • Type specifications according to official JSON standard (http://json.org/).
  • Besides the mandatory fields, the field names and types are not strict, but data suppliers must make sure an obvious mapping is possible.
  • For a detailed explanation of each field, refer to the DCAT-AP specifications.

Dataset schema

The following fields are mandatory.

Field Type DCAT-AP dataset equivalent
title * string dct:title
notes * string dct:description

The following fields are optional but highly recommended.

Field Type DCAT-AP dataset equivalent
contact_point array of objects (allowed members: type, name, email, resource) dcat:contactPoint
tags * array of objects dcat:keyword
publisher object dct:publisher
groups * array of objects – the name of each group needs to fit the official categorisation (see Categorisation) dcat:theme
resources * array of objects (see distribution schema)

The following fields are optional.

Field Type DCAT-AP dataset equivalent
conforms_to array of objects (allowed members: label, resource) dct:conformsTo
creator object dct:creator
accrual_periodicity object dct:accrualPeriodicity
identifier object dct:identifier
url string dcat:landingPage
language array of objects (allowed members: label, resource) dct:language
other_identifier object adms:identifier
issued string dct:issued
dcat_spatial array of objects (allowed members: label, resource) dct:spatial
temporal array of objects (allowed members: start_date, end_date) dct:temporal
modified string dct:modified
version_info string owl:versionInfo
version_notes string adms:versionNotes
provenance array of objects (allowed members: label, resource) dct:provenance
source array of strings dct:source
access_rights object dct:accessRights
has_version array of strings dct:hasVersion
is_version_of array of strings dct:isVersionOf
relation array of strings dct:relation
page array of strings foaf:page
sample array of strings adms:sample
dct_type string dct:type

Distribution schema

The following fields are mandatory.

Field Type DCAT-AP distribution equivalent
url string dcat:accessURL

The following fields are optional but highly recommended.

Field Type DCAT-AP distribution equivalent
description string dct:description
format string dct:format
license object dct:license

Note that the list of licences recognised by data.europa.eu’s DCAT-AP parser is available online (https://data.europa.eu/en/training/licensing-assistant). This list is also used by our metadata quality assessment (MQA) tool for assessing the data providers’ performance in using known licences.

The following fields are optional.

Field Type DCAT-AP distribution equivalent
checksum object spdx:checksum
mimetype string dcat:mediaType
download_url array of strings dcat:downloadURL
issued string dct:issued
status object adms:status
name string dct:title
modified string dct:modified
rights object dct:rights
page array of strings foaf:page
size number dcat:byteSize
language array of objects dct:language
conforms_to array of objects dct:conformsTo

Example

A result of the ‘package_search’ action looks like this.

{
  "help": "http://example.eu/data/api/3/action/help_show?name=package_search",
  "success": true,
  "result": {
    "count": 113948,
    "sort": "score desc, metadata_modified desc",
    "facets": {},
    "results": [
      {
        "issued": "2011-10-20T00:00:00Z",
        "id": "525abe30-ef60-4bf9-824e-916368c1fad8",
        "metadata_created": "2015-09-15T12:08:54.860742",
        "metadata_modified": "2015-09-15T13:17:51.405474",
        "temporal": [
          {
            "start_date": "2011-10-19T22:00:00Z",
            "end_date": "2011-10-19T22:00:00Z"
          }
        ],
        "state": "active",
        "type": "dataset",
        "resources": [
          {
            "package_id": "525abe30-ef60-4bf9-824e-916368c1fad8",
            "id": "7166a1fa-d994-4d88-8e76-3378930b1e16",
            "state": "active",
            "format": "XHTML",
            "mimetype": "application/xhtml+xml",
            "name": "Example",
            "created": "2015-09-15T14:39:43.865240",
            "url": "http://example.com"
          }
        ],
        "tags": [
          {
            "vocabulary_id": null,
            "state": "active",
            "display_name": "Example Tag",
            "id": "06993102-a2ee-4e40-b9e4-ed3e4b86e943",
            "name": "example-tag"
          }
        ],
        "groups": [
          {
            "display_name": "Economy and finance",
            "description": "",
            "title": "Economy and finance",
            "id": "128d0956-4526-440e-a951-f153c190d890",
            "name": "economy-and-finance"
          }
        ],
        "creator_user_id": "0ab3c2ec-c2a2-4eef-b70f-ed093e028063",
        "publisher": {
          "resource": "http://example.com "
        },
        "organization": {
          "description": "Example Organization",
          "created": "2015-09-15T13:56:32.985936",
          "title": "Example Organization",
          "name": "example-orag",
          "is_organization": true,
          "state": "active",
          "image_url": "",
          "revision_id": "ea70fb1f-29a8-4e7b-8527-809e4792a75b",
          "type": "organization",
          "id": "0897b420-3c3d-4a19-9c2c-a9815e2db2be",
          "approval_status": "approved"
        },
        "name": "example-dataset",
        "notes": "Example",
        "owner_org": "0897b420-3c3d-4a19-9c2c-a9815e2db2be",
        "modified": "2011-10-20T00:00:00Z",
        "url": "",
        "title": "Example Dataset",
        "identifier": [
          "http://example-ident.com"
        ]
      }
    ],
    "search_facets": {}
  }
}
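
Before requesting harvesting, data suppliers may want to check that every dataset carries the mandatory fields from the dataset and distribution schemas above. The following Python sketch performs that check on one entry of the ‘results’ list. The function name and the format of the reported paths are illustrative; the check covers the mandatory fields only.

```python
from typing import List

MANDATORY_DATASET_FIELDS = ("title", "notes")
MANDATORY_DISTRIBUTION_FIELDS = ("url",)


def missing_mandatory_fields(dataset: dict) -> List[str]:
    """List the mandatory fields that are absent or empty in a dataset."""
    missing = [name for name in MANDATORY_DATASET_FIELDS if not dataset.get(name)]
    # Each entry of "resources" is a distribution and needs its own check.
    for index, resource in enumerate(dataset.get("resources", [])):
        for name in MANDATORY_DISTRIBUTION_FIELDS:
            if not resource.get(name):
                missing.append(f"resources[{index}].{name}")
    return missing
```

Applied to the dataset in the example above, the function returns an empty list, since title, notes and the resource url are all present.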

CSW/INSPIRE catalogue services (for geospatial metadata)

General remarks

This interface represents an INSPIRE compliant catalogue (discovery) service. It is defined as a slightly extended version of the OGC CSW AP ISO.

The GetCapabilities operation (mandatory for all OGC services) is not needed for running the harvesting. However, it can be helpful when the catalogue service is registered with data.europa.eu, as its response provides information that would otherwise have to be established during registration (e.g. the supported protocol bindings or the support of the ‘modified’ queryable for selective harvesting).

For the harvesting process only the GetRecords operation will be called. The GetRecordById is not needed.

Operation Operation description data.europa.eu usage
GetCapabilities Retrieve catalogue service metadata Only for gathering service information upon registration
GetRecords Retrieval of a set of metadata items Yes
GetRecordById Retrieval of a single metadata item No

Metadata model

The metadata model considered is as defined in the INSPIRE Technical Guidance on Discovery Services and on Metadata.

Within a GetRecords query (constraint) just the following metadata model elements (queryables) are used.

Request parameter Definition a Used values in data.europa.eu XPath b
Type Provides the desired information resources. The following fixed values are always used: ‘dataset’, ‘datasetcollection’, ‘series’ and ‘service’ /gmd:MD_Metadata/gmd:hierarchyLevel/gmd:MD_ScopeCode/@codeListValue
Modified The metadata date stamp in case of selective harvesting (if supported), see below. Date /gmd:MD_Metadata/gmd:dateStamp/gco:Date

a: ‘Definition’ represents the semantic meaning of the element in data.europa.eu; it is slightly different from the generic meaning in OGC CSW.

b: Element’s XML path in GetRecords request.

Example query (constraint).

<Constraint version="1.1.0">
    <ogc:Filter>
        <ogc:Or>
            <ogc:PropertyIsEqualTo>
                <ogc:PropertyName>Type</ogc:PropertyName>
                <ogc:Literal>dataset</ogc:Literal>
            </ogc:PropertyIsEqualTo>
            <ogc:PropertyIsEqualTo>
                <ogc:PropertyName>Type</ogc:PropertyName>
                <ogc:Literal>datasetcollection</ogc:Literal>
            </ogc:PropertyIsEqualTo>
            <ogc:PropertyIsEqualTo>
                <ogc:PropertyName>Type</ogc:PropertyName>
                <ogc:Literal>series</ogc:Literal>
            </ogc:PropertyIsEqualTo>
            <ogc:PropertyIsEqualTo>
                <ogc:PropertyName>Type</ogc:PropertyName>
                <ogc:Literal>service</ogc:Literal>
            </ogc:PropertyIsEqualTo>
        </ogc:Or>
    </ogc:Filter>
</Constraint>

As defined in the INSPIRE Technical Guidance on Discovery Services the operation must be able to return ISO19139 metadata aligned with the INSPIRE Technical Guidance on Metadata.

Requests

The mandatory GetRecords operation works as the primary means of metadata item discovery with HTTP protocol binding. It executes an inventory search and returns the metadata items. Only OGC Filter XML encoding is supported. For the GetRecords requests a few additional requirements exist. These will be explained in the following.

Bindings

One or more of HTTP POST/XML, POST/XML/SOAP1.1 and POST/XML/SOAP1.2 have to be supported as bindings.

Operation parameters

The following parameters (not the queryables) and parameter values are used in data.europa.eu for the GetRecords requests.

Request parameter Definition a Used values in data.europa.eu XPath b
service Tells this is a CSW service. Always fixed value: CSW /GetRecords@service
version Tells which version of CSW service is requested. Always fixed value: 2.0.2 /GetRecords@version
resultType Specifies the type of result Always fixed value: ‘results’ /GetRecords@resultType
outputFormat Specifies the output format of GetRecords returned document Always fixed value: ‘application/xml’ /GetRecords@outputFormat
outputSchema Specifies the schema of GetRecords returned document Always fixed value (namespace): ‘http://www.isotc211.org/2005/gmd’ /GetRecords@outputSchema
startPosition Specifies the sequence number of first returned record Used: integer between 1 and the returned number of records. Default value is: 1 /GetRecords@startPosition
maxRecords Specifies number of returned records Supported: positive integer between 1 and N. Default value is: 50 /GetRecords@maxRecords
typeNames Specifies the query- and elementSetName type Always fixed value: ‘gmd:MD_Metadata’ (‘gmd’ is a valid namespace prefix for ‘http://www.isotc211.org/2005/gmd’) /GetRecords/Query@typeNames and /GetRecords/Query/ElementSetName@typeNames
ElementSetName Specifies the type of GetRecords returned document As only full metadata sets will be requested by the harvester this parameter will always be set to ‘full’. /GetRecords/Query/ElementSetName

a: ‘Definition’ represents the semantic meaning of the element in data.europa.eu; it is slightly different from the generic meaning in OGC CSW.

b: Element’s XML path in GetRecords request.

Partitioning

For partitioning (pagination) the following parameters are used (see table on GetRecords):

  • startPosition;
  • maxRecords.
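
Putting the parameter table and the example constraint together, a GetRecords request body for one page can be sketched as follows in Python. The fixed values are taken from the tables above; the function name is illustrative, and the exact request sent by the data.europa.eu harvester may differ in detail.

```python
RESOURCE_TYPES = ("dataset", "datasetcollection", "series", "service")


def get_records_body(start_position: int = 1, max_records: int = 50) -> str:
    """Return a CSW 2.0.2 GetRecords request body for one page of records."""
    # One PropertyIsEqualTo clause per resource type, combined with ogc:Or.
    type_filters = "".join(
        "<ogc:PropertyIsEqualTo>"
        "<ogc:PropertyName>Type</ogc:PropertyName>"
        f"<ogc:Literal>{value}</ogc:Literal>"
        "</ogc:PropertyIsEqualTo>"
        for value in RESOURCE_TYPES
    )
    return (
        '<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
        'xmlns:ogc="http://www.opengis.net/ogc" '
        'xmlns:gmd="http://www.isotc211.org/2005/gmd" '
        'service="CSW" version="2.0.2" resultType="results" '
        'outputFormat="application/xml" '
        'outputSchema="http://www.isotc211.org/2005/gmd" '
        f'startPosition="{start_position}" maxRecords="{max_records}">'
        '<csw:Query typeNames="gmd:MD_Metadata">'
        '<csw:ElementSetName typeNames="gmd:MD_Metadata">full</csw:ElementSetName>'
        '<csw:Constraint version="1.1.0">'
        f"<ogc:Filter><ogc:Or>{type_filters}</ogc:Or></ogc:Filter>"
        "</csw:Constraint></csw:Query></csw:GetRecords>"
    )
```

Requesting the second page of 50 records then only requires get_records_body(start_position=51); all other values stay fixed.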

Selective harvesting

Selective harvesting allows harvesters to limit harvest requests to just those portions of the metadata available from a repository which have been changed within a specified time frame.

Selective harvesting often makes sense because usually only a few metadata records change within a day, so only those few records need to be harvested daily.

For selective harvesting, the predefined queryable of the GetRecords operation (usually ‘modified’) is used.
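
A filter clause for selective harvesting can be sketched as follows in Python. The example assumes that the queryable is called ‘modified’ and formats the date as ‘MM-DD-YYYY’ with the ‘>=’ operator, as stated in the note on service information at the end of this document. Like the example constraint above, the result is a fragment that relies on the ogc prefix being declared by the enclosing request.

```python
from datetime import date


def modified_since_filter(since: date, queryable: str = "modified") -> str:
    """Return an OGC filter clause matching records changed on or after `since`."""
    literal = since.strftime("%m-%d-%Y")  # MM-DD-YYYY
    return (
        "<ogc:PropertyIsGreaterThanOrEqualTo>"
        f"<ogc:PropertyName>{queryable}</ogc:PropertyName>"
        f"<ogc:Literal>{literal}</ogc:Literal>"
        "</ogc:PropertyIsGreaterThanOrEqualTo>"
    )
```

Combined with the resource type constraint through an ogc:And element, such a clause restricts a daily harvest to the records changed since the previous run.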

Responses

As defined in the INSPIRE Technical Guidance on Discovery Services the operation must be able to return ISO19139 metadata aligned with the INSPIRE Technical Guidance on Metadata.

Partitioning

For partitioning (pagination), it is mandatory that the search response returns the total count of matching metadata items, even if the metadata for these items is not contained in the search response. This count, coupled with the ability to specify the start position (startPosition) and the number of desired records (maxRecords), allows data.europa.eu to implement results paging and to reduce the load on both the data.europa.eu system and the data partners.
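
In CSW 2.0.2, the total count and the paging position are carried as attributes of the SearchResults element (numberOfRecordsMatched, numberOfRecordsReturned and nextRecord). As a sketch of how a client can use them, the following Python function derives the startPosition of the next request. The function name is illustrative and the logic is a simplified assumption, not the data.europa.eu implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CSW_NS = "{http://www.opengis.net/cat/csw/2.0.2}"


def next_start_position(response_xml: str) -> Optional[int]:
    """Return the startPosition for the next page, or None when finished."""
    results = ET.fromstring(response_xml).find(f"{CSW_NS}SearchResults")
    if results is None:
        return None
    matched = int(results.get("numberOfRecordsMatched", "0"))
    next_record = int(results.get("nextRecord", "0"))
    # nextRecord is 0 once all records have been returned; some services
    # report a position beyond the total instead, so both cases end paging.
    if next_record == 0 or next_record > matched:
        return None
    return next_record
```

Without the total count in the response, a client cannot tell whether a short or empty page marks the end of the catalogue or a server-side limit.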

Error handling

Useful status and error messages help data.europa.eu manage client sessions effectively. Any limitations on submitted search requests to the inventory systems should be noted in the response (e.g. ‘too many records requested’, ‘search timed out’) so that predictable error handling can be managed by data.europa.eu.

Service information for integration

To be able to integrate an INSPIRE Discovery Service / CSW, the following information needs to be provided by the data supplier.

Service information Definition a Obligation (M=Mandatory, O=Optional, C=Conditional) Datatype
GetRecords URL URL of the CSW GetRecords operation M URL
GetRecords Binding Protocol binding of the CSW GetRecords operation M Codelist (one of): ‘POST/XML’, ‘POST/XML/SOAP1.1’, ‘POST/XML/SOAP1.2’
Modified (a) Name of the queryable (if supported) for filtering on the metadata date stamp (for selective harvesting) Possibly for future use String: [Namespace”:”]QueryableName
MaxRecordsMax Specifies the maximum number of returned records Possibly for future use (currently always set to ‘50’) Integer

a = Value in CSW filter will be formatted as ‘MM-DD-YYYY’. Operators: ‘>=’, ‘<=’ will be used.