How to search for datasets
The metadata catalogue of datasets can be explored through a search engine (the Data tab), a SPARQL endpoint, or an API endpoint.
An up-to-date list of all catalogues can always be retrieved by this SPARQL query.
Search manually
Basic Search
You can search for datasets containing one or more keywords by typing the keywords into the search field and then clicking on the search button. The search process will reduce your keywords to their root form, to ensure variants of your keyword also match. For example, "Europa" can also match "Europe", or "walking" can also match "walk" and "walked".
Advanced Search
When entering more than one search term, you can also use the logical operators (connectors) AND and OR, as well as parentheses (). Wildcard searches and exact-phrase searches are also supported.
Logical Operator OR
The OR operator can be used to:
- connect two or more similar concepts (synonyms)
- broaden your results, telling the database that ANY of your search terms can be present in the resulting dataset
Example: population OR education OR science
In a Venn diagram of the three keywords, all three circles together represent the result set for this search. The set is large because, with the OR operator, a dataset matching any one of the words qualifies.
Logical Operator AND
The AND operator can be used to:
- find sources containing two or more ideas
- narrow the search
The database will only retrieve items containing both keywords. The AND operator can be used multiple times in one query.
Example: population AND education AND science
Parentheses ()
Parentheses are used to:
- perform a more complex search using both AND and OR by placing parentheses around synonyms
- save time by searching multiple synonyms at once
Example: (environment OR nature) AND (refurbish OR reuse)
This avoids the need to perform multiple searches for combinations of keywords.
Basic rules for using AND, OR operators and parentheses ()
- The OR is implicit: the search function automatically puts an OR between your search terms. A search for population OR education OR science therefore returns the same matches as a search for population education science.
- Always enter AND and OR operators in uppercase letters.
- Never translate the AND and OR operators into any other language. Whatever the language of your keywords, the operators must always be AND and OR.
- Always close parentheses (). Combining logical operators with unbalanced parentheses leads to incorrect search results.
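The rules above can be illustrated with a minimal Python sketch that evaluates a query against the set of words in one dataset. It is not the portal's implementation: the function names `balanced` and `matches` are hypothetical, keywords are compared without reduction to their root form, and the sketch assumes that AND binds more tightly than OR, which the rules above do not state.

```python
import re

# A token is "(", ")", or a run of characters without spaces or parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")

def balanced(query):
    """Return True if every "(" in the query has a matching ")"."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def matches(query, words):
    """Return True if the set `words` satisfies the Boolean `query`."""
    if not balanced(query):
        raise ValueError("unbalanced parentheses")
    tokens = TOKEN.findall(query)
    words = {w.lower() for w in words}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        pos += 1
        if tok == "(":
            value = or_expr()
            pos += 1  # skip the closing ")"
            return value
        return tok.lower() in words

    def and_expr():
        nonlocal pos
        value = atom()
        while peek() == "AND":
            pos += 1
            rhs = atom()
            value = value and rhs
        return value

    def or_expr():
        nonlocal pos
        value = and_expr()
        while peek() is not None and peek() != ")":
            if peek() == "OR":  # only uppercase OR is an operator
                pos += 1        # otherwise the OR is implicit
            rhs = and_expr()
            value = value or rhs
        return value

    return or_expr()
```

For example, `matches("population education science", {"population"})` and `matches("population OR education OR science", {"population"})` give the same answer, because adjacent terms are joined by an implicit OR. A lowercase `and` or `or` is treated as an ordinary keyword.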
Wildcard
Use a wildcard search when you do not know the entire keyword: ? (question mark) replaces a single character and * (asterisk) replaces zero or more characters.
Example: ois?au dat*
The example above returns datasets that contain, for instance, oisiau data, oisoau dataset, or oiseau datasets. Unlike basic search, wildcard search does not reduce a keyword to its root form, since doing so would be wrong for a term in which some letters are unknown.
Please note that the more candidate keywords have to be checked for a match (e.g. a*, b*, or c*), the slower the wildcard search becomes. To avoid such an expensive process, a wildcard at the beginning of a keyword (e.g. *ing or ?iseau) is ignored and the search term is treated literally. Invalid wildcard search terms, e.g. ois?au OR (data* AND organization (unbalanced parentheses), are also treated literally.
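The wildcard rules can be sketched in Python by translating a term into a regular expression. This is only an illustration under stated assumptions, not the search engine's actual code: the helper names `wildcard_to_regex` and `term_matches` are hypothetical, and case-insensitive matching is assumed.

```python
import re

def wildcard_to_regex(term):
    """Translate ? and * into a compiled, case-insensitive regular expression."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def term_matches(term, word):
    """Return True if `word` matches the wildcard `term`."""
    # A leading wildcard is not expanded: the term is compared literally.
    if term[:1] in ("?", "*"):
        return term.lower() == word.lower()
    return wildcard_to_regex(term).fullmatch(word) is not None
```

With this sketch, `term_matches("ois?au", "oiseau")` and `term_matches("dat*", "dataset")` both succeed, while `term_matches("*ing", "walking")` does not, because the leading asterisk is taken as an ordinary character.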
Exact Phrase
When you place your keywords in double quotes, they are treated as a phrase: matching is not case sensitive, and all characters are taken literally. The search returns datasets containing the keywords in exactly the same order.
Example: "Manual public space"
The example above returns datasets containing exactly “manual public space”. It will not return datasets containing “manual space public”, or “space public manual”, or any other results in which the search terms appear in a different sequence than “manual public space”. If you search for "public spa*", the results will only list datasets containing “public spa*”, because inside quotes the asterisk is taken literally rather than as a wildcard.
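A minimal Python sketch of exact-phrase matching is given here for illustration. The function `phrase_matches` is hypothetical, and splitting text on whitespace is an assumption about tokenization that the portal may handle differently.

```python
import re

def phrase_matches(phrase, text):
    """True if the words of `phrase` occur in `text` contiguously, in order, ignoring case."""
    wanted = phrase.lower().split()
    words = re.findall(r"\S+", text.lower())
    n = len(wanted)
    # Slide a window of n words over the text and compare it with the phrase.
    return n > 0 and any(words[i:i + n] == wanted for i in range(len(words) - n + 1))
```

Because characters are compared literally, `phrase_matches("public spa*", "public space")` is false in this sketch, while a text that contains the literal string `public spa*` matches.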
SPARQL search (graph search)
SPARQL search enables more advanced users to find datasets using SPARQL, the query language for the Resource Description Framework (RDF). SPARQL can help to find specific information in a large amount of RDF data, even if it is organized in a complex way. For more information, see the data.europa.eu SPARQL and SPARQL search pages.
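As a rough illustration, the Python sketch below builds a GET request URL for a SPARQL query that lists ten datasets with their titles. Several things are assumptions rather than facts taken from this page: the endpoint URL, the use of the DCAT and Dublin Core vocabularies to describe datasets, and the helper name `build_request_url`. Check the portal's SPARQL documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; verify it on the portal's SPARQL page.
ENDPOINT = "https://data.europa.eu/sparql"

# Assumes datasets are typed as dcat:Dataset and titled with dct:title.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL can be fetched with any HTTP client. Under the same assumptions, replacing `dcat:Dataset` with `dcat:Catalog` would list catalogues instead of datasets.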
Terms of reuse
Most of the data accessible via data.europa.eu is released by the respective data providers using an open licence. Data can be used for free for commercial and non-commercial purposes, provided the source is acknowledged. Specific conditions for reuse, relating mostly to the protection of data privacy and intellectual property, apply to a small amount of data. A link to these conditions can be found for each dataset.
The terms of use can be found in the data.europa.eu copyright notice. As of September 2021, the most common open licences were the Creative Commons ‘CC‑BY‑4.0’ licence, the ‘Data licence Germany – attribution’ licence, and Etalab’s Open Licence (used by the French government).