Query with more than one variable

Queries with more than one variable

Until now our queries essentially had just one variable, even if additional variables were declared for the label and/or description of the item. Now we will look at queries with more variables.
The following query retrieves items that are part of the Star Wars film series, and the director of each film.

#Star Wars films

SELECT ?item  ?itemLabel ?director
WHERE 
{ 
  ?item wdt:P179 wd:Q22092344. # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.     # item’s director property’s value is collected by the director variable
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Query explanation

In the SELECT section we have the variables ?item and ?director, as well as ?itemLabel which retrieves the label for ?item.

In the WHERE section, on lines 6 and 7, we see the pattern matching conditions:
?item wdt:P179 wd:Q22092344.
?item wdt:P57 ?director.

SPARQL seeks Wikidata items with statements that match the pattern defined in the WHERE section. So, as before, our WHERE section’s first line says, “Find me items that have a statement with a P179 property (part of a series) with the value Q22092344 (Star Wars (film series))”.
The second line says, “Then for each of those items, find me their P57 property (director) and put its value in the variable ?director.”
As we’ve seen, there is an implicit AND between the statements in the WHERE section, i.e. only items that match every statement will be returned by this query.

Now let’s run the query:

What needs to be added to see the names of the directors? Add it and run the query again.

Show solution
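A minimal sketch of one possible solution, assuming the query is run in WDQS (where the wd:, wdt:, wikibase: and bd: prefixes are predefined): adding ?directorLabel to the SELECT section asks the label service for the name of each director.

```sparql
#Star Wars films with director names

SELECT ?item ?itemLabel ?director ?directorLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344. # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.     # item's director is collected by the director variable
  # the label service fills in every variable whose name ends in "Label"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```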

A little bit of syntax

If we wanted the query to show the name of each director, but not the Q number, we could omit the ?director from the SELECT section and only declare ?directorLabel. This implies there is a ?director variable (which we see in the WHERE section) but it isn’t presented in the query results.

Exercise: more than one variable

Exercise: Write a query to show the director and screenwriter (P58) of each film in the Star Wars film series.

show solution
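One way this exercise could be solved is sketched below. It assumes WDQS with its predefined prefixes; the variable name ?screenwriter is a choice made here for illustration.

```sparql
#Star Wars films with director and screenwriter

SELECT ?item ?itemLabel ?directorLabel ?screenwriterLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.  # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.      # item's director
  ?item wdt:P58 ?screenwriter.  # item's screenwriter
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```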

or, using a different syntax (see the explanation here):
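The alternative query is not reproduced here. One alternative syntax that SPARQL does offer is the semicolon shorthand, in which several statements about the same subject are chained without repeating the subject. Whether this is the variant originally intended is an assumption; the sketch below should return the same results as a query that writes out three separate ?item statements.

```sparql
#Star Wars films with director and screenwriter (semicolon shorthand)

SELECT ?item ?itemLabel ?directorLabel ?screenwriterLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344;   # item is part of the series Star Wars (film series)
        wdt:P57 ?director;       # same subject (?item): its director
        wdt:P58 ?screenwriter.   # same subject (?item): its screenwriter
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

A semicolon ends one property-value pair and keeps the subject for the next one; the final full stop closes the whole group.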

The previous query which listed the director of each Star Wars film returned 9 results. This query returns 16 results. Why?

Because some of the films have more than one screenwriter, so we get one line for each screenwriter of a film.

Multiple values

Properties with multiple values

Multiple values for a property are not a problem for a database like Wikidata. There is nothing wrong with having several values for the same property. For certain statements – such as the children of a person or the official languages of a country – it is perfectly reasonable to have multiple values. Essentially these are additional property-value pairs.

For example, for the item Star Wars Episode V: The Empire Strikes Back (Q181795), the screenwriter property has these values:
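The values themselves are not listed here, but a query along the following lines should retrieve them. This is a sketch that assumes WDQS and its predefined prefixes; it puts the item itself in the subject position instead of a variable.

```sparql
#Screenwriters of The Empire Strikes Back

SELECT ?screenwriter ?screenwriterLabel
WHERE
{
  wd:Q181795 wdt:P58 ?screenwriter.  # one result line per screenwriter value
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```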

A graphic representation would be:

A graphic representation of the director and screenwriter properties and their values for Wikidata item Q181795

Another exercise

Exercise: Show the director and cost (P2130) of each film in the Star Wars series.

show solution

or using a different syntax (see the explanation here):

The previous query, which retrieved each Star Wars film and its director, returned 9 results. The current query returns only 8 results. Why?

Because some items don’t have a statement with the property cost (P2130), and are therefore ignored by the query.

The OPTIONAL clause

Missing values and the OPTIONAL clause

As we’ve seen, when there is more than one pattern to match in the WHERE clause, there is an implicit AND between the statements, such that only items that match every statement will be returned by the query. For example, in the last exercise, there were three patterns to match:

#Star Wars films

SELECT ?item  ?itemLabel ?directorLabel ?cost
WHERE 
{ 
  ?item wdt:P179 wd:Q22092344.       # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.           # item’s director property’s value is collected by the director variable
  ?item wdt:P2130 ?cost.             # item’s cost property’s value is collected by the cost variable
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

The statements on lines 6, 7, and 8 have an AND relation between them. Therefore, an item will be considered a match only if it has a P179 property (part of a series) with a value of Q22092344 (Star Wars film series), as well as a P57 (director) property and a P2130 (cost) property. If an item doesn’t match one of these statements (e.g., it has no cost statement), it is ignored by the query.

SPARQL is a pattern-matching query language. A SPARQL query only returns data when the data you’re querying matches every pattern in the WHERE clause. But many datasets have missing values, and an item is only a match for the query if it has data for each statement declared within WHERE. This means SPARQL will not return an item that is missing a property or value requested in WHERE.

The OPTIONAL keyword within the WHERE clause denotes optional patterns you’d like to match in the data. OPTIONAL allows searching for data that may or may not be there.

Run the query again with cost as an optional pattern.
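A sketch of how the query might look with the cost pattern made optional (assuming WDQS with its predefined prefixes). Films without a cost statement should now appear in the results, with an empty ?cost column.

```sparql
#Star Wars films, cost optional

SELECT ?item ?itemLabel ?directorLabel ?cost
WHERE
{
  ?item wdt:P179 wd:Q22092344.         # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.             # required: item must have a director
  OPTIONAL { ?item wdt:P2130 ?cost. }  # optional: cost is returned only if present
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

The optional pattern goes inside its own pair of curly braces after the OPTIONAL keyword; the statements outside the braces remain mandatory.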

Image Grid View

WDQS can display the results of our queries in different formats. So far we have always seen the results in table format. Let’s look again at the previous exercise:

After running the query, scroll to the results. Above the table, click on the arrow next to the eye. A drop-down menu will pop up. Choose “Image grid” from the menu, and WDQS will display each item’s logo, with each item’s results underneath.

Another way to display the results in the Image grid view is to specify it in the query itself. The code on line 1, after the hash sign (#), tells WDQS that the query results should be shown not as a table, but as a grid with images.
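The query referred to is not reproduced here, so the following is only a sketch of what such a query could look like. The #defaultView:ImageGrid directive on line 1 is the WDQS display option; the ?logo variable and the use of the logo image property (P154) are assumptions made for this illustration.

```sparql
#defaultView:ImageGrid
SELECT ?item ?itemLabel ?directorLabel ?logo
WHERE
{
  ?item wdt:P179 wd:Q22092344.  # item is part of the series Star Wars (film series)
  ?item wdt:P57 ?director.      # item's director
  ?item wdt:P154 ?logo.         # assumed: logo image; films without one are dropped
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

The image grid needs at least one variable that holds an image, which is why the image pattern is kept mandatory in this sketch.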

You don’t need to remember the exact code for the image grid: thanks to the autocompletion function of WDQS, once you type a hashtag in the query window, a drop-down menu will suggest the different display options.

Instances and Classes

Property P31 – “instance of”

Most Wikidata properties describe features that an item has – the item for Star Wars Episode IV: A New Hope (Q17738) has the property director (P57), it has a certain duration (P2047), it has the property cost (P2130), and so on. But often we are interested in what something is. The majority of Wikidata items have at least one statement with the property P31 (instance of), which tells us the class of which the item is a particular example and member:

  • Star Wars Episode IV: A New Hope (Q17738) is an instance of a film (Q11424).
  • Star Wars (Q22092344) is an instance of a film series (Q24856).
  • Star Wars (Q462) is an instance of a media franchise (Q196600).

Note that an item is not limited to one P31 statement. For example, Star Wars: Episode VIII – The Last Jedi (Q18486021) is an instance of a film (Q11424) and also an instance of a 3D film (Q229390).

Also note that P31 statements aim to make the most general distinctions and relegate other data to other properties: 
George Lucas (Q38222) is an instance of human (Q5).
We could also make a statement that George Lucas is an instance of film director (Q2526255), because Lucas is obviously an example and member of the class of film directors. However, the classification strategy is to set the “instance of” statement to the most general value, and include more specific information with other properties. For example, that Lucas is a film director is given with a statement using the occupation (P106) property.

Property P279 – “subclass of”

So, while Q17738 (Star Wars Episode IV: A New Hope) represents a particular film – it has a particular director (George Lucas), a specific duration (121 minutes), a list of cast members (Carrie Fisher, Harrison Ford, …), and so on – the item film (Q11424) is a general concept. Films can have directors, durations, and cast members, but the general concept “film” does not have any particular director, duration, or cast members. 

General concepts have statements with the property subclass of (P279), and can have more than one. For example:

  • Film (Q11424) is a subclass of visual artwork (Q4502142), but also of audiovisual work (Q2431196).
  • Film series (Q24856) is a subclass of series of creative works (Q7725310), work of art (Q838948), audiovisual work (Q2431196), and media franchise (Q196600).

The significance of the instance/subclass distinction

Suppose we wanted a list of all the films that take place in the fictional Star Wars universe. We could run the following query:

The query returns only 10 films. Clearly, some films are missing in the results, such as Star Wars: Episode I – The Phantom Menace (Q165713). Why? 

Because some items have “feature film” (Q24869), rather than “film” (Q11424), as the value of their P31 statement. “Feature film” is a subclass of “film”, but the query matches patterns literally: an item whose P31 value is “feature film” does not match a pattern that asks for a P31 statement with the value “film”, and is therefore not retrieved.

We could use the UNION construction to select films that are either an instance of “film” or an instance of “feature film”:
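The UNION construction itself can be sketched as follows. Because the identifiers of the Star Wars universe query are not reproduced here, this sketch is an assumption-laden stand-in that restricts the items to the Star Wars film series (Q22092344) used in the earlier queries; only the two bracketed alternatives joined by UNION are the point of the example.

```sparql
#Star Wars series items that are a film or a feature film

SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.    # stand-in restriction: part of the Star Wars film series
  { ?item wdt:P31 wd:Q11424. }    # either: instance of film
  UNION
  { ?item wdt:P31 wd:Q24869. }    # or: instance of feature film
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

DISTINCT is used because an item that matches both alternatives would otherwise be listed twice.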

This query retrieves more results, but it is still possible that there are relevant items (i.e., films taking place in the Star Wars universe) that have an “instance of” property with a value which is some other subclass of film – action film, 3D film, epic film… Listing all the different subclasses of film in UNION statements is not a very good strategy. A more general solution is shown in the next section.

Property paths

The query construction that allows us to select items that belong to the same class makes use of property paths. Property paths are shorthand for writing down a path of properties between two items. 

To understand how this construction works, take a look at the graphic view of some information about the item Star Wars: Episode I – The Phantom Menace (Q165713):

Item Q165713 has a P31 (instance of) statement with “feature film” (Q24869) as its value. So the Item-Property-Value statement would be:

Q165713 – P31 – Q24869

The path between Q165713 and Q24869 is the simplest path: a single property.

Item Q24869 (feature film) has the property P279 (subclass of) with the value Q11424 (film). So the Item-Property-Value statement would be:

Q24869 – P279 – Q11424

The path between Q24869 and Q11424 is also just a single property.

Path elements can be put together with a forward slash (/). So a query statement that uses the construction wdt:P31/wdt:P279 denotes a property path between two items consisting of P31 (instance of) and P279 (subclass of).

However, if our matching pattern were:
?item wdt:P31/wdt:P279 wd:Q11424.
the query would match only items that are an instance of a direct subclass of film, meaning only items that have a path consisting of exactly one P31 and one P279 to the item film (Q11424). Items whose P31 property has the value Q11424 would not be retrieved, because they do not match the construction pattern.

The construction wdt:P31/wdt:P279* on line 6 is shorthand for saying that there’s an “instance of” property and then any number of “subclass of” properties between ?item and the item “film” (Q11424).

If you remove the asterisk (*) on line 6 of the query above and run the query again you will see that the query does not retrieve those items that are themselves an instance of film (Q11424).
The asterisk (*) after the path element means “zero or more of this element”. Thus the matching pattern
?item wdt:P31/wdt:P279* wd:Q11424
could match:
?item wdt:P31 wd:Q11424.
or
?item wdt:P31/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 wd:Q11424.
and so on.
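As a sketch of the whole construction in a runnable query (assuming WDQS and its predefined prefixes), the pattern can be combined with the Star Wars film series restriction used in the earlier queries. The choice of that restriction is made here for illustration and is not necessarily the query the text above refers to.

```sparql
#Star Wars series items that are films, via any chain of subclasses

SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P179 wd:Q22092344.        # part of the series Star Wars (film series)
  ?item wdt:P31/wdt:P279* wd:Q11424.  # instance of film, or of any subclass of film
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

DISTINCT guards against an item being listed more than once when it reaches “film” through several P31 values.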
