Let’s have a look again at the query that lists all the works of art (Q838948) that take place in the Star Wars Universe.
If you run the query you will see that there are some duplicates in the results: certain items, like Q19590955 (Rogue One) and Q6074 (Star Wars Episode VII: The Force Awakens) appear more than once. Query patterns often return duplicates – this can happen if, for example, you use the pattern “?item wdt:P31/wdt:P279* ?class”, and there are multiple paths from ?item to ?class: you will get a new result for each of those paths. For example, item Q19590955 (Rogue One) has both “film” and “3D film” as the values of P31, and each has a path to the “work of art” class, so the item shows up twice in the results.
To eliminate duplicates, we add the modifier DISTINCT after SELECT:
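As a minimal sketch, the DISTINCT version of the query could look like this on the Wikidata Query Service, which predefines the wd:, wdt:, wikibase: and bd: prefixes. P1434 (takes place in fictional universe) and Q19786052 as the identifier of the Star Wars universe are assumptions not given in the text above, so check them before relying on the query.

```sparql
# Works of art that take place in the Star Wars universe, without duplicates
SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.          # assumed: takes place in fictional universe = Star Wars universe
  ?item wdt:P31/wdt:P279* wd:Q838948.    # item is an instance of any subclass of "work of art"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

DISTINCT collapses rows that are identical in every selected variable, so an item reached through several class paths is listed only once.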
So far the values we retrieved in our queries were directly related to the item we were selecting. Suppose we wanted to show the birthplace of the director of each of the Star Wars films.
#Star Wars films
SELECT ?item ?itemLabel ?directorLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.   # the item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.       # the item's director (P57) is bound to ?director
  ?director wdt:P19 ?pob.        # the director's place of birth (P19) is bound to ?pob
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
The query selects the items that are part of the Star Wars series and retrieves each film’s director, together with the value of the director’s P19 (place of birth) property.
Importantly, P19 and its value are linked to the director – not to the film! We are retrieving the value (the place of birth) of a property (P19) of a value (the director) of a property (P57) of the item that we are selecting.
Here is a graph view of the data for the film Star Wars Episode V: The Empire Strikes Back (Q181795):
Item Q181795 is the subject (shown in blue) in the Item-Property-Value statement: Q181795 – P57 (director) – Q119348 (Irvin Kershner). Item Q119348 is the value or object (shown in green) of the director property (shown in black).
Item Q119348 is also the subject of the statement: Q119348 – P19 (place of birth) – Q1345 (Philadelphia)
Now run the query:
# A little bit of syntax
A shorter way of formulating the above query is to use square brackets to join the two triple patterns that share the ?director variable. So instead of: ?item wdt:P57 ?director. ?director wdt:P19 ?pob.
We write: ?item wdt:P57 [wdt:P19 ?pob].
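For reference, here is a sketch of the whole query rewritten with the square-bracket form. It assumes the Wikidata Query Service and its predefined prefixes, and uses ?pob for the place of birth, as in the full query above. The SELECT clause is deliberately left unchanged so the effect on the results is visible.

```sparql
# Star Wars films, with the director node left anonymous
SELECT ?item ?itemLabel ?directorLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.       # item is part of the series Star Wars (film series)
  ?item wdt:P57 [ wdt:P19 ?pob ].    # some director of the item has place of birth ?pob
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```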
What has changed in the results? Why?
This syntax omits the ?director variable, so ?directorLabel is empty.
Many statements in Wikidata are not just simple triples of the form Item-Property-Value; they may also include qualifiers. A qualifier expands on, annotates, or contextualizes a statement beyond what a simple property-value pair can express, typically by further describing or refining the value given in the statement. Note that a statement should still provide useful data even without a qualifier; the qualifier is just there to provide additional information.
Take for example the property “publication date” (P577). The following query retrieves the publication date for each of the films in the Star Wars series:
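A sketch of such a query, assuming the Wikidata Query Service and reusing the series identifier from the earlier example; the variable name ?pubdate is chosen here for illustration:

```sparql
# Publication dates of the films in the Star Wars series
SELECT ?item ?itemLabel ?pubdate
WHERE
{
  ?item wdt:P179 wd:Q22092344.   # item is part of the series Star Wars (film series)
  ?item wdt:P577 ?pubdate.       # publication date of the item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```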
Although there are 9 films in the series, we get 32 results. Why?
As we have seen before, a property can have multiple values. In this case, there are several values for “publication date” per film. Why? Because there are different release dates for different countries. For each value there is a qualifier indicating the property “place of publication” (P291) and its value. For example, for the item Star Wars Episode IV: A New Hope (Q17738):
How can our query retrieve only the publication date in the USA?
To do this we need to understand how this data is represented.
Qualifiers explained
Until now we had statements such as:
This link between the item and its value is a “direct property”. In our queries we referred to it with the prefix wdt:.
In the Wikidata data model, for every direct property linking an item and a value, there is also a simple property (prefix p:) that connects the item to a statement node. That statement node is then linked to the value of the direct property by a property statement (prefix ps:), as shown in this graphic view:
Another way of describing this data is through statements:
| Item | Property | Value |
| --- | --- | --- |
| Q17738 | wdt:P577 | 1977-05-25 |
| Q17738 | p:P577 | Q17738-statement node |
| Q17738-statement node | ps:P577 | 1977-05-25 |
In the last row, the statement node serves as the “Item” in an Item-Property-Value statement.
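The path through the statement node can be followed explicitly in a query. The sketch below, which assumes the Wikidata Query Service prefixes p: and ps:, lists each statement node for the publication date of Q17738 together with the value attached to it. The variable names are illustrative.

```sparql
# Publication-date statements of Star Wars Episode IV: A New Hope
SELECT ?statement ?pubdate
WHERE
{
  wd:Q17738 p:P577 ?statement.    # item -> statement node
  ?statement ps:P577 ?pubdate.    # statement node -> value
}
```

Each row corresponds to one statement, which is why a property with several values yields several rows.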
Statements that have a qualifier have an additional link from the statement node – a property qualifier (pq) – as shown in the following diagram:
We can now formulate a query to get the publication date (P577) of the films in the Star Wars series if the place of publication (P291) is the USA:
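One way to write this, as a sketch assuming the Wikidata Query Service prefixes p:, ps: and pq:. Q30 (United States of America) does not appear in the text above and is used here as the place of publication:

```sparql
# Publication dates in the USA of the films in the Star Wars series
SELECT ?item ?itemLabel ?pubdate
WHERE
{
  ?item wdt:P179 wd:Q22092344.    # item is part of the series Star Wars (film series)
  ?item p:P577 ?statement.        # item -> publication date statement node
  ?statement ps:P577 ?pubdate.    # statement node -> publication date value
  ?statement pq:P291 wd:Q30.      # qualifier: place of publication is the USA
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Because the qualifier pattern hangs off the same ?statement variable as the value, only dates whose own statement carries the USA qualifier are returned.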
Exercise: For each of the movies in the original Star Wars trilogy (Q25540859), list the cast members (P161) and the character role (P453) they played.
If you’ve learned to remove duplicates using the DISTINCT modifier, edit the query to list the unique character roles in the original Star Wars trilogy and the actors that played them.