Here at Coventry University we are fortunate to have a designated ‘Wikimedian in Residence’ working with us part-time, which makes us one of only a handful of UK universities to have such a position. The vision of the Wikimedia Foundation (the US-based not-for-profit that hosts Wikipedia and its sister projects) is to ‘imagine a world in which every single human being can freely share in the sum of all knowledge’. A noble but vast undertaking, and one which rightly appeals to library and information professionals.
Not one Wikipedia, but many
Whilst many people will be familiar with Wikipedia, the vast online encyclopaedia which anyone can edit, the breadth of the other services offered by the Wikimedia Foundation is less well known. Thanks to Andy Mabbett, our Wikimedian in Residence, we have been learning more about Wikidata, the knowledge base which underpins Wikipedia and which has big ambitions and the potential to change how academic publication data is recorded and its relationships mapped out.
The first thing to mention is that there is not just one version of Wikipedia: there are around 300, differentiated by language, of which the English-language version is the largest by number of entries. Coventry University has an entry in 17 different iterations of Wikipedia. These entries are not necessarily translated directly from a central authority and can instead vary across the different Wikipedia platforms. Our neighbours at the University of Warwick, by contrast, have an entry in 35 separate Wikipedias. (Not that we are competitive about this!)
Linked Open Data
Prior to the development of Wikidata, the data underpinning Wikipedia articles was stored in separate places; with Wikidata it is now centralised. However, Wikidata is not just the administrative ‘backend’ of Wikipedia: it also contains a good deal of additional data, such as information about academic publications. In the long term, Wikidata aims to include all academic publications indexed by CrossRef with Digital Object Identifiers (DOIs), and it is in the process of harvesting publicly available content from the ORCiD system. The number of entries in Wikidata therefore dwarfs the number in Wikipedia. The overarching purpose of Wikidata is to provide linked open data under a CC0 licence, which means the data carries no copyright restrictions. That data is cross-referenced against a vast number of authority databases, from PubMed to the Internet Movie Database.
Each data item is assigned a unique ‘Q’ number identifier and each property a ‘P’ number identifier. Items represent all of the ‘things’ in human knowledge: a place, a person, a concept, an event and so on. Properties describe the kind of data value a statement holds, for instance ‘occupation’ or ‘child of’. While new items can be created directly, new properties are subject to community approval.
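To make the identifier scheme concrete, here is a minimal Python sketch of how a single statement might be modelled. The `Statement` class and the `entity_url` helper are hypothetical names made up for illustration and are not part of any Wikidata library. The identifiers in the example are real ones: Q42 is Douglas Adams, P106 is ‘occupation’ and Q36180 is ‘writer’.

```python
import re
from dataclasses import dataclass

# Items are "Q" plus a number, properties are "P" plus a number.
ID_PATTERN = re.compile(r"[QP][1-9][0-9]*")


def entity_url(entity_id: str) -> str:
    """Return the wikidata.org page address for an item or property ID."""
    if not ID_PATTERN.fullmatch(entity_id):
        raise ValueError(f"not a Wikidata identifier: {entity_id!r}")
    if entity_id.startswith("P"):
        # Property pages live in their own "Property:" namespace.
        return f"https://www.wikidata.org/wiki/Property:{entity_id}"
    return f"https://www.wikidata.org/wiki/{entity_id}"


@dataclass(frozen=True)
class Statement:
    """Hypothetical model of one statement: item, property, value."""
    item: str   # the thing being described, e.g. "Q42"
    prop: str   # the property, e.g. "P106" (occupation)
    value: str  # here another item, e.g. "Q36180" (writer)


# "Douglas Adams has the occupation writer", expressed with identifiers only.
example = Statement(item="Q42", prop="P106", value="Q36180")
print(entity_url(example.item))
print(entity_url(example.prop))
```

Because a statement is built from identifiers and not from words, the same fact can be displayed in any language for which labels exist.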
Utilising the power of Wikidata through SPARQL
SPARQL is a query language which extracts information from Wikidata by matching ‘data triples’, each consisting of a ‘subject’, a ‘predicate’ and an ‘object’ (the value).
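The idea of matching triples can be illustrated without SPARQL at all. The following toy sketch in Python is not how the Wikidata query service is implemented; the `match` function and the two-row data set are assumptions made purely for illustration. It shows how leaving one part of a triple blank turns it into a question.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# A tiny hand-made data set using real Wikidata identifiers:
# Q42 = Douglas Adams, P31 = instance of, Q5 = human,
# P106 = occupation, Q36180 = writer.
TRIPLES: List[Triple] = [
    ("Q42", "P31", "Q5"),
    ("Q42", "P106", "Q36180"),
]


def match(pattern: Pattern, triples: List[Triple]) -> List[Triple]:
    """Return the triples that agree with the pattern; None is a wildcard."""
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "What is the occupation of Q42?" The unknown third part is left as None.
print(match(("Q42", "P106", None), TRIPLES))
# → [('Q42', 'P106', 'Q36180')]
```

A real SPARQL query does the same kind of matching, using named variables such as `?occupation` in place of `None` and combining several patterns at once.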
The information in Wikidata can be used to gather together disparate pieces of information. For instance, the Wikidata query service using SPARQL can identify cities in the world which currently have a female mayor, with those cities listed by population size.
This is information which would be very hard to obtain from conventional sources. However, the ability of Wikidata to link information together and cross-reference it means queries of this nature can be answered within a matter of seconds. The accuracy of the results of course depends on the accuracy of the data within Wikidata, which is where all of us can play a role (see ‘Practical things you can do’ below).
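For readers curious about what such a query looks like, here is a sketch of the female mayor example, wrapped in a little Python so the query text and the request address can be built programmatically. The function names are our own invention. The identifiers are real Wikidata ones (Q515 ‘city’, P6 ‘head of government’, P21 ‘sex or gender’, Q6581072 ‘female’, P582 ‘end time’, P1082 ‘population’), and the query assumes that a mayor is recorded as the city's head of government.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"


def female_mayor_query(limit: int = 10) -> str:
    """Build a SPARQL query for the largest cities currently led by a woman."""
    return f"""
SELECT DISTINCT ?city ?cityLabel ?mayorLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 .   # instance of city, or of a subclass of city
  ?city p:P6 ?statement .             # has a "head of government" statement
  ?statement ps:P6 ?mayor .           # whose value is ?mayor
  ?mayor wdt:P21 wd:Q6581072 .        # ?mayor's sex or gender is female
  FILTER NOT EXISTS {{ ?statement pq:P582 ?end }}  # no end date: still in office
  ?city wdt:P1082 ?population .       # population of the city
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY DESC(?population)
LIMIT {int(limit)}
"""


def request_url(query: str) -> str:
    """Return an address that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(female_mayor_query(limit=5))
```

The printed query can be pasted straight into the query service at query.wikidata.org. Anyone fetching `request_url(...)` from a script should send a descriptive User-Agent header, as the service asks automated clients to identify themselves.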
Visualising data with Scholia
Scholia is a more accessible way to interpret data from Wikidata and see it visualised.
For instance, do you want to know which Coventry University academic has the most Wikipedia profiles in different languages?
It is as easy as typing ‘Coventry University’ in the search box and then filtering the results.
Congratulations to Professor Kevin Warwick, our Deputy Vice Chancellor for Research, who has an entry on 21 different Wikipedias!
The academic publication data in Scholia comes from sources such as ORCiD. If you are a researcher with publications recorded in your ORCiD profile, you can help get these added to Wikidata by ensuring your publication record is up to date and your profile is publicly visible.
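A similar question can also be put to the Wikidata query service directly. The sketch below is our own approximation and not necessarily the query Scholia runs. It assumes staff are linked to their institution through P108 (‘employer’), and it uses `wikibase:sitelinks`, which counts links to all Wikimedia sites and not only to Wikipedias, so the numbers may come out slightly higher. The function name is hypothetical, and the organisation's Q number has to be supplied by the reader.

```python
import re


def most_sitelinks_query(org_qid: str, limit: int = 10) -> str:
    """Build a SPARQL query ranking an organisation's staff by sitelink count."""
    if not re.fullmatch(r"Q[1-9][0-9]*", org_qid):
        raise ValueError(f"not a Wikidata item identifier: {org_qid!r}")
    return f"""
SELECT ?person ?personLabel ?sitelinks WHERE {{
  ?person wdt:P108 wd:{org_qid} ;          # employer is the given organisation
          wikibase:sitelinks ?sitelinks .  # number of linked Wikimedia pages
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY DESC(?sitelinks)
LIMIT {int(limit)}
"""
```

To try it, look up the Q number shown on your institution's Wikidata page, pass it to `most_sitelinks_query`, and paste the resulting text into the query service.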
Practical things you can do to support Wikidata’s development