Introduction to Wikidata

What is Wikidata?

Wikidata is an international Wikimedia project that aims to be the largest free database, just as Wikipedia has become the world’s most popular source of knowledge.

Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

Wikidata also provides support to many other sites and services beyond just Wikimedia projects! The content of Wikidata is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web.

Wikidata/linked data explained

Wikidata is a central storage repository, consisting mainly of items. An item is a thing, an entity, a concept. It can be an object, a person, an event, a place, an artwork, but also more abstract concepts such as love or socialism. Items are uniquely identified by a Q followed by a number.

For example, item Q17738 represents the 1977 film “Star Wars”. Every item also has a label, which is the main name given to it in a particular language. This enables the basic information required to identify the topic the item covers to be translated without favouring any language.

Item labels need not be unique. For example, Star Wars (Q462) represents the Star Wars film series and media franchise, and Star Wars (Q54317) represents the 1983 video game. The description on a Wikidata entry is a short phrase designed to disambiguate items with the same or similar labels. A description does not need to be unique either: multiple items can have the same description. However, no two items can have both the same label and the same description.

The structure of Wikidata

Tabular vs. Linked Data 

To learn how to use the Wikidata Query Service, you will first need to understand the structure of Wikidata, that is, what a database of linked data looks like.

In this tutorial, many examples will be based on the data presented in the following table:

Item ID      | Title                                        | Director         | Duration (min) | Box office ($)
wd:Q17738    | Star Wars Episode IV: A New Hope             | George Lucas     | 121            | 775398007
wd:Q181795   | Star Wars Episode V: The Empire Strikes Back | Irvin Kershner   | 124            | 538400000
wd:Q181803   | Star Wars Episode VI: Return of the Jedi     | Richard Marquand | 134            | 475100000
wd:Q165713   | Star Wars: Episode I – The Phantom Menace    | George Lucas     | 136            | 1027044677
wd:Q181069   | Star Wars Episode II: Attack of the Clones   | George Lucas     | 142            | 649398328
wd:Q42051    | Star Wars: Episode III – Revenge of the Sith | George Lucas     | 140            | 848800000
wd:Q6074     | Star Wars: The Force Awakens                 | J. J. Abrams     | 135            | 2068223624
wd:Q18486021 | Star Wars: The Last Jedi                     | Rian Johnson     | 152            | 1332539889
wd:Q20977110 | Star Wars: The Rise of Skywalker             | J. J. Abrams     | 141            | 851058441

This is a small dataset that details some information about films in the Star Wars series. For each film, a few attributes or properties are shown: the title of the film, its director, its duration (in minutes), and the box office takings accumulated by the film (in dollars). If you are familiar with Excel or SQL, this way of presenting data should look familiar to you. However, Wikidata is not a database based on tables, like the one above, but rather has a “Linked Data” format. What does that mean?

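In a linked data model, the data in the first row of the table above would be represented as a graph:

[Figure: the first row of the table drawn as a graph, with the item Q17738 linked to each of its property values]

In a linked data (or “graph”) view, each property (a black line) links the item (shown in blue) to the corresponding property value (shown in green).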

Wikidata, which uses the linked data format, stores information in the form of statements. Statements, formally known as “subject, predicate, object” triples, have an Item-Property-Value structure.

For instance, the statement “The sky has the color blue” consists of:
(1) a subject (“the sky”)
(2) a predicate (“has the color”)
(3) an object (“blue”).
Likewise, the statement “Star Wars Episode IV: A New Hope was directed by George Lucas” consists of (1) a subject/Item (“Star Wars Episode IV: A New Hope”), (2) a predicate/Property (“was directed by”), and (3) an object/Value (“George Lucas”).

You can think of each row in the data table above as an Item, the column headers as Property names, and the data cells as property Values. 

So another way of describing this data is through statements. For example, for the item in the first row of the table, the data can be described with the statements:

Item   | Property   | Value
Q17738 | title      | Star Wars Episode IV: A New Hope
Q17738 | director   | George Lucas
Q17738 | duration   | 121 minutes
Q17738 | box office | 775398007

Statements describe detailed characteristics of an item and consist of property-value pairs, such as “director: George Lucas” or “duration: 121 minutes”.
Properties in Wikidata are identified by a P followed by a number. For example, the property “director” is P57.
The value of this property for the item Q17738 (Star Wars Episode IV: A New Hope) is George Lucas, which is itself an item: Q38222.

Not all values are items. For example, the value of the property “duration” (P2047) for the item Q17738 is “121 minutes”, which is a quantity. The value of “publication date” (P577) in the United States is “25 May 1977”, a date. Other frequently used data types are strings (sequences of characters, such as texts or codes), globe coordinates, and monolingual texts (strings that are not translated into other languages). Wikidata currently has 27 different data types; more information about them is available in the Wikidata documentation.

Basic SPARQL query

Films in the Star Wars Series

Here is your first SPARQL query. This simple query displays the films that are part of the Star Wars series.

#Star Wars Films
SELECT ?item  
WHERE 
{
  ?item wdt:P179 wd:Q22092344.
}

WDQS color coding

WDQS color-codes certain SPARQL words, which makes the code easier to read and explain:

  • Grey: the first line of the query has a hash sign (#) followed by grey text. This indicates a comment: something we write that does not affect the query itself. Everything after a ‘#’ is a comment and is ignored by WDQS, except for certain patterns, which we will see in other sections.
  • Red: the words SELECT and WHERE on lines 2 and 3 are shown in capital red letters. This color is reserved for SPARQL syntax words, a kind of “command”:
    • SELECT: here we state the names of the variables that will be returned and displayed by the query.
    • WHERE: here we provide the pattern to match against the data, i.e., we define the conditions for finding the values of the variables in the SELECT clause, usually in the form of triples that reflect the Item-Property-Value structure of the database.
  • Green: on lines 2 and 5, “?item” appears in green. In SPARQL, variable names are preceded by a question mark and shown in green. Almost any string can be a variable name (except reserved words), such as ?film, ?x, or ?any_variable_name. Variables hold values for later use.
  • Blue: on line 5, we see some terms shown in blue. These begin with prefixes. Prefixes are abbreviations that allow queries to be written in a more compact and readable manner. Without the prefixes we would need to specify the full URL for each element of the query.
    • wdt: the prefix for Wikidata properties, followed by the property (P) number.
      Without the prefix we would need to refer to this property as: <http://www.wikidata.org/prop/direct/P179>

    • wd: the prefix for Wikidata items, followed by the item (Q) number.
      Without the prefix we would need to refer to this item as: <http://www.wikidata.org/entity/Q22092344>
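
To see what the prefixes save, here is a sketch of the same Star Wars query with each prefixed name expanded by hand into its full URL in angle brackets. It assumes that wd: stands for http://www.wikidata.org/entity/ and wdt: for http://www.wikidata.org/prop/direct/, and it should return the same results as the prefixed version.

```sparql
# Star Wars films, written without the wd: and wdt: prefixes
SELECT ?item
WHERE
{
  # Same triple pattern as before, with each prefixed name spelled out in full
  ?item <http://www.wikidata.org/prop/direct/P179> <http://www.wikidata.org/entity/Q22092344>.
}
```

The long form is rarely written by hand, but it shows that a prefixed name is only shorthand for a complete identifier.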

Query explanation

Now we turn to look at how the query works.

A SPARQL query is usually composed of two or more clauses, each of which has a different function. Our basic SPARQL query has two clauses:

The SELECT clause

The SELECT clause lists variables that you want returned – in this case the variable ?item. Essentially this line says “We’re going to find some values to store in ?item, and then we’re going to display those values.”

The WHERE clause

The WHERE clause specifies the conditions for the variables, mostly in the form of triples. SPARQL works on the logic of pattern matching. The SPARQL processor matches patterns you declare within the WHERE clause against the three-part statements stored on Wikidata. The WHERE clause tells the processor which property values to fill into the variables you are SELECTing and how to pull the data out.

This basic query pattern of SPARQL reflects the Item-Property-Value structure of the linked data database. The pattern for matching in our simple query is given as a statement:
?item – has property:part of a series – with value:Star Wars (film series)

In other words, this query selects Wikidata items that (1) have the property “part of a series” (P179), and (2) have “Star Wars (film series)” (Q22092344) as the value of that property.
Note that if you hover with your cursor over the prefix or its associated element, the WDQS interface will show the label and description of that property or item.
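
A WHERE clause can hold more than one triple pattern, in which case an item must match all of them. As an illustration that goes beyond the original query, the following sketch adds a second pattern using the property director (P57) and the item George Lucas (Q38222) introduced earlier. The results depend on the statements currently stored on Wikidata.

```sparql
# Star Wars films directed by George Lucas (illustrative sketch)
SELECT ?item
WHERE
{
  # Pattern 1: the item is part of the Star Wars film series
  ?item wdt:P179 wd:Q22092344.
  # Pattern 2: the same item has George Lucas as its director
  ?item wdt:P57 wd:Q38222.
}
```

Because both patterns use the same variable ?item, only items that satisfy both statements are returned.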

Press the “Play” button to run the query. Scroll within the frame to see the results.

The query retrieves the Q number of each of the films in the Star Wars series, but not the title of the film.
In the next part you will learn how to retrieve the names of the films as well.

Adding labels

Label service

Items on Wikidata are identified by a unique Q number. Every item also has a label, which is the main name given to it in a particular language. The following query retrieves the films of the Star Wars series and their label.

#Star Wars films

SELECT ?item  ?itemLabel 
WHERE 
{ 
  ?item wdt:P179 wd:Q22092344.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Query explanation

In the SELECT clause, two variables are stated: ?item and ?itemLabel.
Label is a special string: appended to a variable name, it refers to the label of the item held in that variable. If we named our variable ?x, the variable for the label would be ?xLabel; if we named it ?film, the variable for the label would be ?filmLabel, and so on.

In the WHERE clause we see the same pattern for selection as before:
?item wdt:P179 wd:Q22092344.
In addition, we see the SERVICE command in red, followed by this text (in blue):
wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

This code snippet is a kind of shortcut to a service which retrieves labels of items. In other words, if a variable in SELECT is named ?NAMELabel, WDQS retrieves the label for the item in variable ?NAME. This specialized service is very helpful, as it reduces the complexity of SPARQL queries that would otherwise be needed to achieve the same effect.
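
For comparison, here is a rough sketch of fetching labels without the label service. It assumes that labels are exposed through the standard rdfs:label predicate, whose prefix WDQS predefines, and it fixes the language to English (en) by hand.

```sparql
# Star Wars films with English labels, without the label service (sketch)
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.
  # Fetch every label of the item, in all languages
  ?item rdfs:label ?itemLabel.
  # Keep only the labels tagged as English
  FILTER(LANG(?itemLabel) = "en")
}
```

In this version ?itemLabel is an ordinary variable that the query binds itself, and any item without an English label drops out of the results.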

What is the AUTO_LANGUAGE?

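[AUTO_LANGUAGE] is a placeholder that WDQS replaces with the language currently selected in its interface. The default language for WDQS is English, as indicated in the upper right-hand corner. This setting affects both the language of the WDQS interface and the labels retrieved for items.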

Run the query. Then change the auto language and run the query again.

Specifying label language

Choosing one or more label languages

Another way to specify the language of the retrieved labels is to state it in the SERVICE code snippet in place of “[AUTO_LANGUAGE]”.

For example, in the following query, Dutch (nl) labels will be retrieved and presented:

#Star Wars films

SELECT ?item  ?itemLabel 
WHERE 
{ 
  ?item wdt:P179 wd:Q22092344.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl". }
}

One can specify several languages and the order in which labels are retrieved. For example, the following query first retrieves the French (fr) label, but if there is no label in French then the German (de) label will be retrieved, and if that one is missing the Italian (it) label will be shown, and so on.

#Star Wars films

SELECT ?item  ?itemLabel 
WHERE 
{ 
  ?item wdt:P179 wd:Q22092344.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,de,it,es,no,pt". }
}
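
The placeholder and explicit language codes can also be combined in one list. As a sketch, the following variant asks for the interface language first and falls back to English (en) when no label exists in that language. The choice of English as the fallback is only an example.

```sparql
# Star Wars films, labels in the interface language with English as fallback
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.
  # Languages are tried from left to right until a label is found
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```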


Autocompletion

WDQS built-in search function

The code snippet for the Wikidata Label Service is not easy to remember. Fortunately, WDQS offers a great solution to this: autocompletion. In the WDQS window, whenever you press Ctrl+Space (or Alt+Enter) at any point in the query, a drop-down menu will offer suggestions for code that might be appropriate; select the right suggestion with the up/down arrow keys, and press Enter to select it.

For example, instead of writing out 

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

you can just type SERV, hit Ctrl+Space, and the first suggestion will be that complete label service code. Just hit Enter to accept it. You may want to change the code to retrieve labels in specific languages.

Autocompletion can also search for the Q numbers of items and the P numbers of properties. If you type one of the Wikidata prefixes, such as wd: (for items) or wdt: (for properties), and then write some text after it, Ctrl+Space will search for that text on Wikidata and suggest matching items or properties.

Adding description

Retrieving an item’s description

As explained in the introduction to Wikidata, each item on Wikidata has a unique Q number. Items also have a label in one or more languages. In addition, an item may have a description: a short phrase in a particular language. Because item labels are not necessarily unique, the description on a Wikidata item is designed to disambiguate items with the same or similar labels. A description does not need to be unique either; however, no two items can have both the same label and the same description.

In WDQS, the string “Description”, like the string “Label”, is a special reserved string for retrieving the description of the item that is specified by the variable name. In other words, if a variable in the SELECT clause is named ?NAMEDescription, then WDQS produces the description for the item in variable ?NAME.

The following query retrieves the films of the Star Wars series, their label and their description.
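
A query matching this description can be sketched as follows. It is a reconstruction that reuses the earlier label query and adds ?itemDescription to the SELECT clause.

```sparql
# Star Wars films with label and description
SELECT ?item ?itemLabel ?itemDescription
WHERE
{
  ?item wdt:P179 wd:Q22092344.
  # The label service fills in both ?itemLabel and ?itemDescription
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```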


Exercise: simple query

Exercise: list the items, including their labels and descriptions, that are part of the Harry Potter film series (Q216930).

Write your SPARQL query in the window below using SELECT and WHERE.

Press the “Play” button to run your query.

Show solution
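
One possible solution is sketched below. It assumes that the Harry Potter films are linked to their series through the same property used for Star Wars, part of a series (P179).

```sparql
# Harry Potter films with label and description (one possible solution)
SELECT ?item ?itemLabel ?itemDescription
WHERE
{
  # Items that are part of the Harry Potter film series
  ?item wdt:P179 wd:Q216930.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```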

Your solution may not look exactly like ours. You may have used different variable names or chosen a different label language. These small differences are okay; it’s the results that matter.

Press the “Play” button to run the query.

Retrieving a statement value

Queries retrieving a statement value 

As noted in the introduction, Wikidata stores information in the form of statements that have an Item-Property-Value structure.

In our first example, our query returns Wikidata items that match our query pattern: they match a specific combination of Property and Value, namely, they have a P179 property (part of a series) with a value of Q22092344 (Star Wars film series). In other words, the variable with the name “item” collects those Wikidata items that have a statement which matches the pattern:

?item – has property:part of a series – with value:Star Wars (film series)

In this case, the item part of a matching statement is retrieved. But queries do not necessarily have to return the first part (Item) of statements.

In the following query the ?actor variable collects values of the property cast member (P161) of the item Star Wars Episode IV: A New Hope (Q17738). That is to say, the variable collects the third part (the object) of the statements:
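
A sketch of such a query is shown below, reconstructed from the description above. The variable now sits in the third position of the triple pattern, while the item and the property are fixed. The ?actorLabel variable and the label service are additions so that names appear alongside the Q numbers.

```sparql
# Cast members of Star Wars Episode IV: A New Hope
SELECT ?actor ?actorLabel
WHERE
{
  # Item and property are fixed; the value is what we want to retrieve
  wd:Q17738 wdt:P161 ?actor.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```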
