Property paths

The query construction that allows us to select items that belong to the same class makes use of property paths. Property paths are shorthand for writing down a path of properties between two items. 

To understand how this construction works, take a look at the graphic view of some information about the item Star Wars: Episode I – The Phantom Menace (Q165713):

Item Q165713 has a P31 (instance of) statement with “feature film” (Q24869) as its value. So the Item-Property-Value statement would be:

Q165713 – P31 – Q24869

The path between Q165713 and Q24869 is the simplest path: a single property.

Item Q24869 (feature film) has the property P279 (subclass of) with the value Q11424 (film). So the Item-Property-Value statement would be:

Q24869 – P279 – Q11424

The path between Q24869 and Q11424 is also just a single property.

Path elements can be put together with a forward slash (/). So a query statement that uses the construction wdt:P31/wdt:P279 denotes a property path between two items consisting of P31 (instance of) and P279 (subclass of).

However, if our matching pattern were:
?item wdt:P31/wdt:P279 wd:Q11424.
the query would match only items that are an instance of a subclass of film, that is, only items connected to the item film (Q11424) by a path consisting of exactly one P31 followed by exactly one P279. Items whose P31 property has the value Q11424 itself would not be retrieved, because they do not match the pattern.

The construction wdt:P31/wdt:P279* on line 6 is shorthand for saying that there’s an “instance of” property and then any number of “subclass of” properties between ?item and the item “film” (Q11424).

If you remove the asterisk (*) on line 6 of the query above and run the query again, you will see that the query no longer retrieves items that are themselves direct instances of film (Q11424).
The asterisk (*) after the path element means “zero or more of this element”. Thus the matching pattern
?item wdt:P31/wdt:P279* wd:Q11424
could match:
?item wdt:P31 wd:Q11424.
or
?item wdt:P31/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 wd:Q11424.
and so on.
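As a rough illustration, the pattern can be placed in a complete query. This sketch is not necessarily the query the text refers to: it assumes the items of interest are linked to the Star Wars film series (Q22092344) through P179 (part of the series), and it uses the WDQS label service to fill ?itemLabel.

```sparql
#Star Wars films: instances of film or of any subclass of film
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.         # assumption: restrict to the Star Wars film series
  ?item wdt:P31/wdt:P279* wd:Q11424.   # instance of, then zero or more subclass of, ending at film
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Deleting the asterisk in this sketch should drop every item whose P31 value is film (Q11424) itself and keep only instances of its subclasses.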

DISTINCT

DISTINCT – showing unique results

Let’s have a look again at the query that lists all the works of art (Q838948) that take place in the Star Wars Universe.

If you run the query you will see that there are some duplicates in the results: certain items, like Q19590955 (Rogue One) and Q6074 (Star Wars Episode VII: The Force Awakens) appear more than once.
Query patterns often return duplicates – this can happen if, for example, you use the pattern “?item wdt:P31/wdt:P279* ?class”, and there are multiple paths from ?item to ?class: you will get a new result for each of those paths. For example, item Q19590955 (Rogue One) has both “film” and “3D film” as the values of P31, and each has a path to the “work of art” class, so the item shows up twice in the results.

To eliminate duplicates, we add the modifier DISTINCT after SELECT:
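A minimal sketch of the modifier in use. The identifier for the Star Wars Universe is not given in the text, so this example assumes a stand-in restriction instead: items that are part of the Star Wars film series (Q22092344) via P179. The class pattern and the works of art item (Q838948) are the ones discussed above.

```sparql
#Works of art in the Star Wars film series, without duplicates
SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.          # stand-in restriction (assumption, see lead-in)
  ?item wdt:P31/wdt:P279* wd:Q838948.   # several paths may lead to "work of art"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

DISTINCT compares whole result rows, so a row is dropped only if every selected variable has the same value as in another row.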

Retrieving data linked to values

So far the values we retrieved in our queries were directly related to the item we were selecting. Suppose we wanted to show the birthplace of the director of each of the Star Wars films.

#Star Wars films

SELECT ?item ?itemLabel ?directorLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344. # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.     # the film’s director is bound to the ?director variable
  ?director wdt:P19 ?pob.      # the director’s place of birth is bound to the ?pob variable
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

The query selects items that are part of the Star Wars series and retrieves each film’s director, along with the value of the director’s P19 (place of birth) property.

Importantly, P19 and its value are linked to the director – not to the film! We are retrieving the value (the place of birth) of a property (P19) of a value (the director) of a property (P57) of the item that we are selecting.

Here is a graph view of the data for the film Star Wars Episode V: The Empire Strikes Back (Q181795):

A graph view showing Q119348 (Irvin Kershner) as both the object (Value) of the statement regarding the director property of Q181795, and the subject (Item) of the statement regarding the place of birth property.

Item Q181795 is the subject (shown in blue) in the Item-Property-Value statement:
Q181795 – P57 (director) – Q119348 (Irvin Kershner).
Item Q119348 is the value or object (shown in green) of the director property (shown in black).

Item Q119348 is also the subject of the statement:
Q119348 – P19 (place of birth) – Q1345 (Philadelphia)

Now run the query:

# A little bit of syntax

A shorter way of formulating the above query is using square brackets to join the two match patterns on lines 7 and 8. So instead of:
?item wdt:P57 ?director.
?director wdt:P19 ?pob.

We write:
?item wdt:P57 [wdt:P19 ?pob].

What has changed in the results? Why?

This syntax omits the ?director variable, so ?directorLabel is empty.
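For reference, here is a sketch of the whole query in the abbreviated form. It reuses the identifiers from the query above; ?directorLabel is left out of the SELECT clause because no ?director variable exists any more.

```sparql
#Star Wars films and the birthplace of each film's director (abbreviated syntax)

SELECT ?item ?itemLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.      # item is part of the series Star Wars (film series)
  ?item wdt:P57 [ wdt:P19 ?pob ].   # the director is an unnamed node; only its place of birth is bound
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

The same match can also be written with the property path syntax shown earlier, as ?item wdt:P57/wdt:P19 ?pob. Either form is worth using only when the intermediate value is not needed in the results.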

Exercise

Exercise: Which films were filmed on location in New Zealand (Q664)?

Hint: Make sure that your query retrieves films for which filming location (P915) is in the administrative territorial entity (P131) of New Zealand.


The following solution shows the filming locations:
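One possible solution, sketched from the hint rather than copied from the original page. It assumes that films are found with the class pattern for film (Q11424) introduced earlier, and that a filming location may be nested any number of P131 levels below New Zealand, hence the asterisk.

```sparql
#Films with a filming location in New Zealand
SELECT DISTINCT ?item ?itemLabel ?locationLabel
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q11424.   # item is a film or an instance of a subclass of film
  ?item wdt:P915 ?location.            # filming location
  ?location wdt:P131* wd:Q664.         # the location is New Zealand or lies within it
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```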

Or, using abbreviated syntax (filming locations not listed):
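A sketch of the abbreviated form, under the same assumptions as any full solution built from the hint: films are matched through the class pattern for film (Q11424), and the location may sit any number of P131 levels below New Zealand (Q664). The filming location becomes an unnamed node, so it cannot be selected or labelled.

```sparql
#Films with a filming location in New Zealand (abbreviated syntax)
SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q11424.         # item is a film or an instance of a subclass of film
  ?item wdt:P915 [ wdt:P131* wd:Q664 ].      # some filming location is New Zealand or lies within it
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```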

Note that duplicates are removed from the results using the DISTINCT modifier.

Statements with Qualifiers

Query statements with Qualifiers

Many statements in Wikidata are not just simple triples in the form Item-Property-Value, but also include qualifiers. Qualifiers expand on, annotate, or contextualize a statement beyond what a simple property-value pair can express; they further describe or refine the value of a property given in the statement. Note that a statement should still provide useful data even without a qualifier; the qualifier is just there to provide additional information.

Take for example the property “publication date” (P577). The following query retrieves the publication date for each of the films in the Star Wars series:
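A sketch of such a query, built only from identifiers that appear in this tutorial: the series pattern with P179 and Q22092344, plus the direct property wdt:P577. The variable name ?pubdate is chosen here for illustration.

```sparql
#Publication dates of the Star Wars films
SELECT ?item ?itemLabel ?pubdate
WHERE
{
  ?item wdt:P179 wd:Q22092344.   # item is part of the series Star Wars (film series)
  ?item wdt:P577 ?pubdate.       # one result row per publication date value
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```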

Although there are 9 films in the series, we get 32 results. Why?

As we have seen before, a property can have multiple values. In this case, there are several values for “publication date” per film. Why? Because there are different release dates for different countries. For each value there is a qualifier indicating the property “place of publication” (P291) and its value. For example, for the item Star Wars Episode IV: A New Hope (Q17738):

Publication date (P577) has the property place of publication (P291) as qualifier

How can our query retrieve only the publication date in the USA?

To do this we need to understand how this data is represented.

Qualifiers explained

Until now we had statements such as:

This link between the item and its value is a “direct property”. In our queries we referred to it with the prefix wdt.

In the Wikidata data model, for every direct property linking an item and a value, there is also a simple property (p) that connects the item to a statement node. That statement node is then linked to the value of the direct property by a property statement (ps), as shown in this graphic view:

Another way of describing this data is through statements:

| Item | Property | Value |
| --- | --- | --- |
| Q17738 | wdt:P577 | 1977-05-25 |
| Q17738 | p:P577 | Q17738-statement node |
| Q17738-statement node | ps:P577 | 1977-05-25 |

In the last row, the statement node serves as the “Item” in an Item-Property-Value statement.

Statements that have a qualifier have an additional link from the statement node – a property qualifier (pq) – as shown in the following diagram:

We can now formulate a query to get the publication date (P577) of the films in the Star Wars series if the place of publication (P291) is the USA:
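A sketch of how the p, ps and pq links might be combined. The item for the USA is not named in the text; wd:Q30 (United States of America) is supplied here, and the variable names ?statement and ?pubdate are illustrative.

```sparql
#Publication dates of the Star Wars films in the USA
SELECT ?item ?itemLabel ?pubdate
WHERE
{
  ?item wdt:P179 wd:Q22092344.     # item is part of the series Star Wars (film series)
  ?item p:P577 ?statement.         # from the item to the statement node
  ?statement ps:P577 ?pubdate.     # from the statement node to the publication date
  ?statement pq:P291 wd:Q30.       # qualifier: place of publication is the USA (Q30, see lead-in)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Because the date and the qualifier are both read from the same ?statement node, a date is returned only when that particular statement carries the USA qualifier.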

Another exercise

Exercise: For each of the movies in the original Star Wars trilogy (Q25540859) list the cast members (P161) and the character role (P453) they played
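One possible solution, sketched under two assumptions: the films are linked to the trilogy item through P179 (part of the series), and the character role (P453) is recorded as a qualifier on each cast member (P161) statement. If the films are linked to Q25540859 by a different property, the first pattern needs to change accordingly.

```sparql
#Cast members and character roles in the original Star Wars trilogy
SELECT ?item ?itemLabel ?castLabel ?roleLabel
WHERE
{
  ?item wdt:P179 wd:Q25540859.   # assumption: films are part of the series "original trilogy"
  ?item p:P161 ?statement.       # from the film to a cast member statement node
  ?statement ps:P161 ?cast.      # the cast member
  ?statement pq:P453 ?role.      # qualifier: the character role played
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```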


If you’ve learned to remove duplicates using the DISTINCT modifier, edit the query to list the unique character roles in the original Star Wars trilogy and the actors that played them.


Coordinates

Where queries: coordinates

“Where” queries retrieve geographical locations that can be expressed as coordinates. The results can then be presented on a map.

We recommend going over the introduction to Wikidata and learning about the structure of Wikidata, as well as the section regarding the simplest query before proceeding.

Let’s start with a simple example: Where did aviation accidents happen?
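A sketch of what such a query might look like. The item for aviation accident is not given in the text; wd:Q744913 is assumed here and should be checked in Wikidata before relying on it. P625 (coordinate location) holds the coordinates.

```sparql
#Locations of aviation accidents
SELECT ?item ?itemLabel ?coords
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q744913.   # assumption: Q744913 is the item for aviation accident
  ?item wdt:P625 ?coords.               # coordinate location of the accident
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```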

After running the query, scroll to the results. Above the table, click on the arrow next to the eye. A drop-down menu will pop up. Choose “Map” from the menu, and WDQS will display the locations on a map. You can click on each point to see the item’s label and the associated coordinates.

Another way to display the results in the Map view is to specify it in the query itself. The code on line 2, after the hash sign (#), tells WDQS to show the query results as a map rather than as a table.

You don’t need to remember the exact code for the map view: thanks to the autocompletion function of WDQS, once you type a hash sign (#) in the query window, a drop-down menu will suggest the different display options.


Linked coordinates

Retrieving coordinates linked to values

In the previous query, coordinates were directly linked to the items we selected. But coordinates are not always directly linked to the items of interest to us. The following query shows the filming locations for each film in the Star Wars series. 

The query gives back the value of P915 property (filming location) for each film. Once you have learned how to retrieve data linked to values, it is quite a straightforward task to retrieve the coordinates for each location:
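A sketch of such a query, assembled from identifiers used in this tutorial: P179 with Q22092344 for the series, P915 for the filming location and P625 for its coordinates. The second line asks WDQS for the Map view by default.

```sparql
#Filming locations of the Star Wars films
#defaultView:Map
SELECT ?item ?itemLabel ?locationLabel ?coords
WHERE
{
  ?item wdt:P179 wd:Q22092344.   # item is part of the series Star Wars (film series)
  ?item wdt:P915 ?location.      # filming location of the film
  ?location wdt:P625 ?coords.    # coordinates of the location, not of the film
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Locations that have no P625 statement are silently dropped from the results, since all three patterns must match.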

# A little bit of syntax

A shorter way of formulating the above query is using square brackets to join the two statements on lines 7 and 8. So instead of:
?item wdt:P915 ?location. 
?location wdt:P625 ?coords.

We write:
?item wdt:P915 [wdt:P625 ?coords]. 

What has changed in the results? Why?

The map shows the same locations, but because this syntax omits the ?location variable, ?locationLabel is empty and the location’s label does not appear.
