DISTINCT

DISTINCT – showing unique results

Let’s have a look again at the query that lists all the works of art (Q838948) that take place in the Star Wars Universe.

If you run the query, you will see that there are some duplicates in the results: certain items, like Q19590955 (Rogue One) and Q6074 (Star Wars Episode VII: The Force Awakens), appear more than once.
Query patterns often return duplicates. This can happen if, for example, you use the pattern “?item wdt:P31/wdt:P279* ?class” and there are multiple paths from ?item to ?class: you get a separate result for each of those paths. For example, item Q19590955 (Rogue One) has both “film” and “3D film” as values of P31, and each has a path to the “work of art” class, so the item shows up twice in the results.

To eliminate duplicates, we add the modifier DISTINCT after SELECT:
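The original query is not reproduced here, so the following is only a sketch of where DISTINCT goes. As an assumption, it selects Star Wars items through the film series pattern (P179, Q22092344) used later in this lesson instead of the Star Wars universe pattern, whose ID is not shown on this page. It runs on the Wikidata Query Service, which predefines the wd:, wdt:, wikibase: and bd: prefixes.

```sparql
SELECT DISTINCT ?item ?itemLabel   # DISTINCT collapses identical result rows
WHERE
{
  ?item wdt:P179 wd:Q22092344.          # item is part of the series Star Wars (film series)
  ?item wdt:P31/wdt:P279* wd:Q838948.   # instance of work of art or of one of its subclasses; several paths may match
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Removing DISTINCT from the first line should bring back one row per matching path.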

Retrieving data linked to values

So far the values we retrieved in our queries were directly related to the item we were selecting. Suppose we wanted to show the birthplace of the director of each of the Star Wars films.

#Star Wars films

SELECT ?item ?itemLabel ?directorLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.  # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.      # the value of the item’s director property (P57) is collected in the ?director variable
  ?director wdt:P19 ?pob.       # the director’s place of birth (P19) is collected in the ?pob variable
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

The query selects the items that are part of the Star Wars film series, retrieves each film’s director, and then retrieves the director’s place of birth (P19).

Importantly, P19 and its value are linked to the director – not to the film! We are retrieving the value (the place of birth) of a property (P19) of a value (the director) of a property (P57) of the item that we are selecting.

Here is a graph view of the data for the film Star Wars Episode V: The Empire Strikes Back (Q181795):

A graph view showing Q119348 (Irvin Kershner) as both the object (Value) of the director statement of Q181795 and the subject (Item) of the place of birth statement.

Item Q181795 is the subject (shown in blue) in the Item-Property-Value statement:
Q181795 – P57 (director) – Q119348 (Irvin Kershner).
Item Q119348 is the value or object (shown in green) of the director property (shown in black).

Item Q119348 is also the subject of the statement:
Q119348 – P19 (place of birth) – Q1345 (Philadelphia).

Now run the query:

A little bit of syntax

A shorter way of formulating the above query is to use square brackets to join the two triple patterns that retrieve the director and the director’s place of birth. So instead of:
?item wdt:P57 ?director.
?director wdt:P19 ?pob.

We write:
?item wdt:P57 [wdt:P19 ?pob].

What has changed in the results? Why?

This syntax omits the ?director variable, so ?directorLabel is empty.
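For comparison, here is a sketch of the whole query rewritten with the square-bracket syntax. Everything except the director pattern is unchanged, so ?directorLabel is still requested but never bound.

```sparql
#Star Wars films, abbreviated syntax
SELECT ?item ?itemLabel ?directorLabel ?pobLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.   # item is part of the series Star Wars (film series)
  ?item wdt:P57 [wdt:P19 ?pob].  # the director is an anonymous node; only its place of birth is kept in ?pob
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```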

Exercise

Exercise: Which films were filmed on location in New Zealand (Q664)?

Hint: Make sure that your query retrieves films whose filming location (P915) is located in the administrative territorial entity (P131) New Zealand.

The following solution shows the filming locations:
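The solution query itself is missing from this page; the following is one possible sketch, not necessarily the tutorial’s own. It assumes that films are items with instance of (P31) film (Q11424), and it follows P131 with the * path operator, so locations nested several levels below New Zealand, or New Zealand itself, also match.

```sparql
SELECT DISTINCT ?film ?filmLabel ?locationLabel
WHERE
{
  ?film wdt:P31 wd:Q11424.       # instance of film (assumed class)
  ?film wdt:P915 ?location.      # filming location
  ?location wdt:P131* wd:Q664.   # zero or more P131 steps up to New Zealand
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```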

Or, using abbreviated syntax (filming locations not listed):
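A sketch of the abbreviated version, under the same assumptions (films as instances of Q11424, P131 followed with *). The filming location becomes an anonymous node inside square brackets, so it cannot be listed.

```sparql
SELECT DISTINCT ?film ?filmLabel
WHERE
{
  ?film wdt:P31 wd:Q11424.               # instance of film (assumed class)
  ?film wdt:P915 [wdt:P131* wd:Q664].    # some filming location that lies in New Zealand
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```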

Note that duplicates are removed from the results using the DISTINCT modifier.

Statements with Qualifiers

Query statements with Qualifiers

Many statements in Wikidata are not just simple triples in the form Item-Property-Value; they may also include qualifiers. Qualifiers expand on, annotate, or contextualize a statement beyond what a simple property-value pair can express, further describing or refining the value given in the statement. Note that a statement should still provide useful data without its qualifiers; the qualifiers only add information.

Take for example the property “publication date” (P577). The following query retrieves the publication date for each of the films in the Star Wars series:
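The query is not reproduced on this page; a minimal sketch that matches the description reuses the film series pattern and adds one triple pattern for P577. The variable name ?publicationDate is chosen here for illustration.

```sparql
SELECT ?item ?itemLabel ?publicationDate
WHERE
{
  ?item wdt:P179 wd:Q22092344.      # item is part of the series Star Wars (film series)
  ?item wdt:P577 ?publicationDate.  # one result row per publication date value
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```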

Although there are 9 films in the series, we get 32 results. Why?

As we have seen before, a property can have multiple values. In this case, there are several values for “publication date” per film, because the films were released on different dates in different countries. Each of these values has a qualifier, the property “place of publication” (P291), whose value is the country. For example, for the item Star Wars Episode IV: A New Hope (Q17738):

Publication date (P577) has the property place of publication (P291) as qualifier

How can our query retrieve only the publication date in the USA?

To do this we need to understand how this data is represented.

Qualifiers explained

Until now we had statements such as:

This link between the item and its value is a “direct property”. In our queries we referred to it with the prefix wdt.

In the Wikidata data model, for every direct property linking an item and a value, there is also a property with the prefix p that connects the item to a statement node. That statement node is then linked to the value by a property statement (prefix ps), as shown in this graphic view:

Another way of describing this data is through statements:

Item                    Property   Value
Q17738                  wdt:P577   1977-05-25
Q17738                  p:P577     Q17738-statement node
Q17738-statement node   ps:P577    1977-05-25

In the last row, the statement node serves as the “Item” in an Item-Property-Value statement.
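In a query, the last two rows become two triple patterns that pass through the statement node. The sketch below retrieves the publication dates by that route; ?statement is an illustrative variable name. One difference worth knowing: wdt: only returns best-rank values, while p: reaches every statement, so the two forms can return different rows when ranks are set.

```sparql
SELECT ?item ?itemLabel ?publicationDate
WHERE
{
  ?item wdt:P179 wd:Q22092344.           # item is part of the series Star Wars (film series)
  ?item p:P577 ?statement.               # item -> statement node (p:)
  ?statement ps:P577 ?publicationDate.   # statement node -> value (ps:)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```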

Statements that have a qualifier have an additional link from the statement node, a property qualifier (prefix pq), as shown in the following diagram:

We can now formulate a query to get the publication date (P577) of the films in the Star Wars series if the place of publication (P291) is the USA:
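The query is missing from this page; a sketch of it adds a pq: pattern on the statement node. The value wd:Q30 (United States of America) does not appear in the text above and is supplied here for the USA.

```sparql
SELECT ?item ?itemLabel ?publicationDate
WHERE
{
  ?item wdt:P179 wd:Q22092344.           # item is part of the series Star Wars (film series)
  ?item p:P577 ?statement.               # item -> publication date statement node
  ?statement ps:P577 ?publicationDate.   # statement node -> publication date
  ?statement pq:P291 wd:Q30.             # qualifier: place of publication is the USA
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```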

Another exercise

Exercise: For each of the movies in the original Star Wars trilogy (Q25540859), list the cast members (P161) and the character roles (P453) they played.
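The solution is not shown on this page; here is one possible sketch. It assumes that the three films are linked to the trilogy with part of the series (P179), as the film series is linked in the earlier queries. If Wikidata links them differently, for example with part of (P361), the first triple pattern has to change. The character role is read as a qualifier on each cast member statement.

```sparql
SELECT ?film ?filmLabel ?actorLabel ?roleLabel
WHERE
{
  ?film wdt:P179 wd:Q25540859.       # assumption: film is part of the series original Star Wars trilogy
  ?film p:P161 ?castStatement.       # film -> cast member statement node
  ?castStatement ps:P161 ?actor.     # statement node -> cast member
  ?castStatement pq:P453 ?role.      # qualifier: character role
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```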

Now that you have learned to remove duplicates with the DISTINCT modifier, edit the query to list the unique character roles in the original Star Wars trilogy and the actors who played them.
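A sketch of one way to do this, under the same assumption that the films are linked to the trilogy with P179. Dropping ?film from the SELECT line leaves only role and actor, so DISTINCT can merge the rows that differed only by film.

```sparql
SELECT DISTINCT ?role ?roleLabel ?actor ?actorLabel
WHERE
{
  ?film wdt:P179 wd:Q25540859.       # assumption: film is part of the series original Star Wars trilogy
  ?film p:P161 ?castStatement.       # film -> cast member statement node
  ?castStatement ps:P161 ?actor.     # statement node -> cast member
  ?castStatement pq:P453 ?role.      # qualifier: character role
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```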