Introduction to Wikidata

What is Wikidata?

Wikidata is an international Wikimedia project that aims to become the largest free database, much as Wikipedia has become the world’s most popular source of knowledge.

Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

Wikidata also provides support to many other sites and services beyond just Wikimedia projects! The content of Wikidata is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web.

Wikidata/linked data explained

Wikidata is a central storage repository, consisting mainly of items. An item is a thing, an entity, a concept. It can be an object, a person, an event, a place, an artwork, but also more abstract concepts such as love or socialism. Items are uniquely identified by a Q followed by a number.

For example, item Q17738 represents the 1977 film “Star Wars”. Every item also has a label, which is the main name given to it in a particular language. This enables the basic information required to identify the topic the item covers to be translated without favouring any language.

Item labels need not be unique. For example, Star Wars (Q462) represents the Star Wars film series and media franchise, and Star Wars (Q54317) represents the 1983 video game. The description on a Wikidata entry is a short phrase designed to disambiguate items with the same or similar labels. A description does not need to be unique either; multiple items can have the same description. However, no two items can have both the same label and the same description.
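The rule about labels and descriptions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function name find_conflicts are hypothetical, and the descriptions are paraphrased from the text above rather than copied from Wikidata.

```python
from collections import defaultdict

# Two items from the text that share the label "Star Wars".
# The descriptions are illustrative paraphrases, not Wikidata's own wording.
items = {
    "Q462": {"label": "Star Wars", "description": "film series and media franchise"},
    "Q54317": {"label": "Star Wars", "description": "1983 video game"},
}


def find_conflicts(items):
    """Return groups of item IDs that share both label and description."""
    seen = defaultdict(list)
    for item_id, entry in items.items():
        seen[(entry["label"], entry["description"])].append(item_id)
    return [ids for ids in seen.values() if len(ids) > 1]


# Same label, different descriptions: no conflict is reported.
print(find_conflicts(items))
```

Sharing a label alone is allowed; only an identical label and description pair is flagged.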

The structure of Wikidata

Tabular vs. Linked Data 

To learn how to use the Wikidata Query Service, you will first need to understand the structure of Wikidata, that is, what a database of linked data looks like.

In this tutorial, many examples will be based on the data presented in the following table:

Item ID | Title | Director | Duration (min) | Box office ($)
wd:Q17738 | Star Wars Episode IV: A New Hope | George Lucas | 121 | 775398007
wd:Q181795 | Star Wars Episode V: The Empire Strikes Back | Irvin Kershner | 124 | 538400000
wd:Q181803 | Star Wars Episode VI: Return of the Jedi | Richard Marquand | 134 | 475100000
wd:Q165713 | Star Wars: Episode I – The Phantom Menace | George Lucas | 136 | 1027044677
wd:Q181069 | Star Wars Episode II: Attack of the Clones | George Lucas | 142 | 649398328
wd:Q42051 | Star Wars: Episode III – Revenge of the Sith | George Lucas | 140 | 848800000
wd:Q6074 | Star Wars: The Force Awakens | J. J. Abrams | 135 | 2068223624
wd:Q18486021 | Star Wars: The Last Jedi | Rian Johnson | 152 | 1332539889
wd:Q20977110 | Star Wars: The Rise of Skywalker | J. J. Abrams | 141 | 851058441

This is a small dataset that details some information about films in the Star Wars series. For each film, a few attributes or properties are shown: the title of the film, its director, its duration (in minutes), and the box office takings accumulated by the film (in dollars). If you are familiar with Excel or SQL, this way of presenting data should look familiar to you. However, Wikidata is not a database based on tables, like the one above, but rather has a “Linked Data” format. What does that mean?

In a linked data model, the data in the first row of the table above would be represented as:

[Figure: graph representation of the first row of the table]

In a linked data (or “graph”) view, the properties (black lines) link the item (shown in blue) to the corresponding property values (shown in green).

Wikidata, which uses the linked data format, stores information in the form of statements. Statements, formally known as “subject, predicate, object” triples, have an Item-Property-Value structure.

For instance, the statement “The sky has the color blue” consists of:
(1) a subject (“the sky”)
(2) a predicate (“has the color”)
(3) an object (“blue”).
Likewise, the statement “Star Wars Episode IV: A New Hope was directed by George Lucas” consists of (1) a subject/Item “Star Wars Episode IV: A New Hope”, (2) a predicate/Property “was directed by”, and (3) an object/Value “George Lucas”.

You can think of each row in the data table above as an Item, the column headers as Property names, and the data cells as property Values. 

So another way of describing this data is through statements. For example, for the item in the first row of the table, the data can be described with the statements:

Q17738 | title | Star Wars Episode IV: A New Hope
Q17738 | director | George Lucas
Q17738 | duration | 121 minutes
Q17738 | box office | 775398007
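The mapping from a table row to statements can be sketched in Python as follows. This is a minimal illustration, not how Wikidata stores data internally; the function name row_to_triples and the "item_id" key are hypothetical.

```python
# The first row of the table, written as a plain dictionary.
row = {
    "item_id": "Q17738",
    "title": "Star Wars Episode IV: A New Hope",
    "director": "George Lucas",
    "duration": 121,
    "box office": 775398007,
}


def row_to_triples(row, subject_key="item_id"):
    """Turn one table row into (item, property, value) triples."""
    subject = row[subject_key]
    return [
        (subject, prop, value)
        for prop, value in row.items()
        if prop != subject_key
    ]


triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Each column header becomes a property name and each cell becomes a value, while the row's identifier is repeated as the subject of every triple.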

Statements describe detailed characteristics of an Item, and consist of property-value pairs, such as “director: George Lucas”, or “duration: 121 minutes”.
Properties in Wikidata are identified by a P followed by a number. For example, the property “director” is P57.
The value of this property for the item Q17738 (Star Wars Episode IV: A New Hope) is George Lucas, which is also an item – Q38222.
Not all values are also items. For example, the value for the property “duration” (P2047) for the item Q17738 is 121 minutes.
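Written with identifiers instead of names, the “director” statement becomes a triple of three short codes. The sketch below expands such codes into full URIs. It assumes the wd: prefix used in the table above and the wdt: prefix commonly used for properties in the Wikidata Query Service; the helper names kind and expand are hypothetical.

```python
# Prefixes as commonly used with the Wikidata Query Service.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def kind(identifier):
    """Classify an identifier by its leading letter: Q for items, P for properties."""
    if identifier.startswith("Q") and identifier[1:].isdigit():
        return "item"
    if identifier.startswith("P") and identifier[1:].isdigit():
        return "property"
    raise ValueError(f"not a Q or P identifier: {identifier!r}")


def expand(prefixed):
    """Expand a prefixed name such as 'wd:Q17738' into a full URI."""
    prefix, local = prefixed.split(":", 1)
    return PREFIXES[prefix] + local


# "Star Wars Episode IV: A New Hope" (Q17738) has director (P57) George Lucas (Q38222).
statement = ("wd:Q17738", "wdt:P57", "wd:Q38222")
expanded = tuple(expand(term) for term in statement)
print(expanded)
print(kind("Q17738"), kind("P57"))
```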

As noted above, some properties have values that are not items. The value of the property “duration” (P2047) for the item Q17738 is “121 minutes”, which is a quantity. The value of “publication date” (P577) in the United States is “25 May 1977”, a date. Other frequently used data types are strings (a chain of characters, such as texts or codes), globe coordinates, and monolingual texts (a string that isn’t translated to other languages). Wikidata currently has 27 different data types; more information about them is available in Wikidata’s help pages on data types.
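One way to picture the different value types is to give each its own Python type. This is only a sketch: the class names ItemRef and Quantity and the function datatype_name are hypothetical, and the type names it returns are the informal ones used in this tutorial rather than Wikidata’s official data type names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ItemRef:
    """A value that is itself an item, referenced by its Q identifier."""
    qid: str


@dataclass(frozen=True)
class Quantity:
    """A numeric value with a unit."""
    amount: float
    unit: str


# Three statements about Q17738, each with a different kind of value.
statements = [
    ("Q17738", "P57", ItemRef("Q38222")),          # director: George Lucas
    ("Q17738", "P2047", Quantity(121, "minute")),  # duration: 121 minutes
    ("Q17738", "P577", date(1977, 5, 25)),         # publication date: 25 May 1977
]


def datatype_name(value):
    """Return an informal name for the kind of value a statement holds."""
    if isinstance(value, ItemRef):
        return "item"
    if isinstance(value, Quantity):
        return "quantity"
    if isinstance(value, date):
        return "date"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


for item, prop, value in statements:
    print(item, prop, datatype_name(value))
```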
