Instances and Classes

Property P31 – “instance of”

Most Wikidata properties describe features that an item has: the item for Star Wars Episode IV: A New Hope (Q17738) has a director (P57), a duration (P2047), a cost (P2130), and so on. But often we are interested in what something is. The majority of Wikidata items have at least one statement with the property P31 (instance of), which tells us the class of which the item is a particular example and member:

  • Star Wars Episode IV: A New Hope (Q17738) is an instance of a film (Q11424).
  • Star Wars (Q22092344) is an instance of a film series (Q24856).
  • Star Wars (Q462) is an instance of a media franchise (Q196600).

Note that an item is not limited to one P31 statement. For example, Star Wars: Episode VIII – The Last Jedi (Q18486021) is an instance of a film (Q11424) and also an instance of a 3D film (Q229390).

Also note that P31 statements aim to make the most general distinction and leave more specific information to other properties. For example, George Lucas (Q38222) is an instance of human (Q5). We could also state that George Lucas is an instance of film director (Q2526255), because Lucas is obviously an example and member of the class of film directors. However, the classification strategy is to set the “instance of” statement to the most general value and to record more specific information with other properties: the fact that Lucas is a film director is given by a statement using the occupation (P106) property.

Property P279 – “subclass of”

So, while Q17738 (Star Wars Episode IV: A New Hope) represents a particular film – it has a particular director (George Lucas), a specific duration (121 minutes), a list of cast members (Carrie Fisher, Harrison Ford, …), and so on – the item film (Q11424) is a general concept. Films can have directors, durations, and cast members, but the general concept “film” does not have any particular director, duration, or cast members. 

General concepts are described with the property subclass of (P279), and a concept can have more than one such statement. For example:

  • Film (Q11424) is a subclass of visual artwork (Q4502142), but also of audiovisual work (Q2431196).
  • Film series (Q24856) is a subclass of series of creative works (Q7725310), work of art (Q838948), audiovisual work (Q2431196), and media franchise (Q196600).

The significance of the instance/subclass distinction

Suppose we wanted a list of all the films that take place in the fictional Star Wars universe. We could run the following query:
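The query itself is not reproduced in this text. A minimal sketch of what it might look like on the Wikidata Query Service follows. It assumes that P1434 (takes place in fictional universe) and Q19786052 (Star Wars universe) are the relevant identifiers; neither appears in the text, so verify them before relying on the results.

```sparql
# Films set in the Star Wars universe (direct instances of "film" only)
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.   # assumed identifiers: takes place in fictional universe = Star Wars universe
  ?item wdt:P31 wd:Q11424.        # instance of film
  # Label service specific to the Wikidata Query Service: fills ?itemLabel
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The prefixes wd: and wdt: are predefined on the Wikidata Query Service, so no PREFIX declarations are needed there.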

The query returns only 10 films. Clearly, some films are missing in the results, such as Star Wars: Episode I – The Phantom Menace (Q165713). Why? 

Because some items have “feature film” (Q24869) as the value of their P31 statement. “Feature film” is a subclass of “film” (Q11424), but the pattern in the WHERE part of the query only matches items whose P31 value is exactly “film”. Items that are an instance of “feature film”, and not directly of “film”, do not match the pattern and are not retrieved.

We could use the UNION construction to select films that are either an instance of “film” or an instance of “feature film”:
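The UNION query is not reproduced in this text. One way it might be written is sketched below, again assuming P1434 (takes place in fictional universe) and Q19786052 (Star Wars universe) as the identifiers for the Star Wars setting; they are not given in the text.

```sparql
# Films set in the Star Wars universe that are an instance of
# either "film" or "feature film"
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.      # assumed identifiers, verify before use
  { ?item wdt:P31 wd:Q11424. }       # instance of film
  UNION
  { ?item wdt:P31 wd:Q24869. }       # instance of feature film
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Each alternative sits in its own group of braces, and an item is returned if it matches at least one of them.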

This query retrieves more results, but it is still possible that there are relevant items (i.e., films taking place in the Star Wars universe) that have an “instance of” property with a value which is some other subclass of film – action film, 3D film, epic film… Listing all the different subclasses of film in UNION statements is not a very good strategy. A more general solution is shown in the next section.

Property paths

The query construction that allows us to select items that belong to the same class makes use of property paths. Property paths are shorthand for writing down a path of properties between two items. 

To understand how this construction works, take a look at the graphic view of some information about the item Star Wars: Episode I – The Phantom Menace (Q165713):

Item Q165713 has a P31 (instance of) statement with “feature film” (Q24869) as its value. So the Item-Property-Value statement would be:

Q165713 – P31 – Q24869

The path between Q165713 and Q24869 is the simplest path: a single property.

Item Q24869 (feature film) has the property P279 (subclass of) with the value Q11424 (film). So the Item-Property-Value statement would be:

Q24869 – P279 – Q11424

The path between Q24869 and Q11424 is also just a single property.

Path elements can be put together with a forward slash (/). So a query statement that uses the construction wdt:P31/wdt:P279 denotes a property path between two items consisting of P31 (instance of) and P279 (subclass of).

However, if our matching pattern were:
?item wdt:P31/wdt:P279 wd:Q11424.
the query would match only items that are an instance of a subclass of film, that is, items with a path consisting of exactly one P31 and one P279 to the item film (Q11424). Items whose P31 property has the value Q11424 would not be retrieved, because they do not match the pattern.
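It may help to see what the forward slash stands for. A sequence path such as wdt:P31/wdt:P279 can be read as shorthand for two triple patterns joined through an intermediate variable. In the sketch below, ?class is a hypothetical variable name chosen for illustration:

```sparql
# Equivalent to: ?item wdt:P31/wdt:P279 wd:Q11424.
SELECT ?item
WHERE
{
  ?item  wdt:P31  ?class.       # the item is an instance of some class ...
  ?class wdt:P279 wd:Q11424.    # ... and that class is a direct subclass of film
}
```

The path form simply avoids naming the intermediate class when we have no use for it in the results.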

The construction wdt:P31/wdt:P279* on line 6 is shorthand for saying that there’s an “instance of” property and then any number of “subclass of” properties between ?item and the item “film” (Q11424).
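The query this sentence refers to is not reproduced in the text. A sketch of what it might look like follows, laid out so that the property path falls on line 6. P1434 (takes place in fictional universe) and Q19786052 (Star Wars universe) are assumed identifiers that do not appear in the text.

```sparql
# Films set in the Star Wars universe, including instances of any subclass of film
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.
  ?item wdt:P31/wdt:P279* wd:Q11424.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```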

If you remove the asterisk (*) on line 6 of the query above and run the query again, you will see that the query no longer retrieves items that are only a direct instance of film (Q11424).

The asterisk (*) after the path element means “zero or more of this element”. Thus the matching pattern
?item wdt:P31/wdt:P279* wd:Q11424.
could match:
?item wdt:P31 wd:Q11424.
or
?item wdt:P31/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279 wd:Q11424.
or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 wd:Q11424.
and so on.

DISTINCT – showing unique results

Let’s have a look again at the query that lists all the works of art (Q838948) that take place in the Star Wars Universe.
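That query is not reproduced in this text. A sketch of what it might look like is given below, assuming the same identifiers for the Star Wars setting as elsewhere (P1434, takes place in fictional universe, and Q19786052, Star Wars universe), which are not stated in the text.

```sparql
# Works of art set in the Star Wars universe
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.          # assumed identifiers, verify before use
  ?item wdt:P31/wdt:P279* wd:Q838948.    # instance of work of art, or of any of its subclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```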

If you run the query, you will see that there are some duplicates in the results: certain items, such as Q19590955 (Rogue One) and Q6074 (Star Wars Episode VII: The Force Awakens), appear more than once.

Query patterns often return duplicates. This can happen if, for example, you use the pattern “?item wdt:P31/wdt:P279* ?class” and there are multiple paths from ?item to ?class: you get a separate result for each of those paths. For example, item Q19590955 (Rogue One) has both “film” and “3D film” as values of P31, and each has a path to the “work of art” class, so the item shows up twice in the results.

To eliminate duplicates, we add the modifier DISTINCT after SELECT:
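The modified query is not reproduced in this text. A minimal sketch, under the same assumption that P1434 (takes place in fictional universe) and Q19786052 (Star Wars universe) identify the Star Wars setting:

```sparql
# Works of art set in the Star Wars universe, each listed once
SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P1434 wd:Q19786052.          # assumed identifiers, verify before use
  ?item wdt:P31/wdt:P279* wd:Q838948.    # instance of work of art, or of any of its subclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

DISTINCT applies to whole result rows, so two rows are merged only if they agree on every selected variable (here ?item and ?itemLabel).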
