During a conversation with friends, the question came up of which US cities are sister cities with which Japanese cities. How would you know? How would you find out?
Since I'm a programmer, it's tempting to whip out the Python and scrape Wikipedia. However, there's a much better option: using Wikidata.
Wikidata has Wikipedia's data, but structured in a triple store.
Unlike a relational database, a triple store saves data as triples, each representing a relationship: subject - predicate - object. And like a relational database, a triple store has a query language, in this case SPARQL.
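To make the subject - predicate - object idea concrete, here is a minimal Python sketch of a toy in-memory triple store. The `match` helper, the plain-English predicate names, and the sample data are made up for illustration; a real triple store is queried with its own query language rather than a Python function.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# The predicate names are illustrative labels, not Wikidata identifiers.
TRIPLES = [
    ("New York City", "country", "United States"),
    ("Tokyo", "country", "Japan"),
    ("New York City", "twinned with", "Tokyo"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "What is New York City twinned with?" leaves only the object open.
print(match(TRIPLES, subject="New York City", predicate="twinned with"))
```

A query is just a pattern with some positions left open, which is roughly what the `?variables` in a triple store query do.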
Unfortunately, for the sake of namespacing and localization, predicates are assigned alphanumeric codes. It helps to have a Wikidata tab open to look up predicate names.
Wikidata lets you write and run queries at query.wikidata.org. The web interface also comes with helpful visualization options: for example, in this exercise, adding `#defaultView:Map` plots geographic data on a world map.
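Queries do not have to go through the web interface. As a rough sketch, assuming the public endpoint at `https://query.wikidata.org/sparql` and its `query` and `format` URL parameters, a request URL can be built from Python like this. `build_query_url` is a hypothetical helper name, and the sketch stops short of actually sending the request.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(sparql: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# A small query: ten items whose country (wdt:P17) is Japan (wd:Q17).
url = build_query_url("SELECT ?item WHERE { ?item wdt:P17 wd:Q17 } LIMIT 10")
print(url)
```

The resulting URL can be fetched with any HTTP client. The map view is a feature of the web interface, so a script gets the raw rows back instead.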
When writing these queries, it becomes painfully obvious how messy the real world is. In this example, finding "US cities with Japanese sister cities", the only well-defined terms are "United States" and "Japan". There is no predicate for sister cities, only the more general `wdt:P190` for twinned administrative body. Similarly, the terminology of city is vague too. New York City is a sister city of Tokyo, but under the Japanese definition, Tokyo is a metropolis (都), similar to a prefecture, not a city (市). This is the result of differing government definitions of city and of how the data was entered in Wikidata.
This is what I came up with, learning SPARQL as I went along:
```SPARQL
#defaultView:Map
SELECT DISTINCT ?usaPlace ?usaPlaceLabel ?geo ?jpnPlace ?jpnPlaceLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  {
    ?usaPlace (wdt:P17/(wdt:P279*)) wd:Q30;
              wdt:P190 ?jpnPlace.
    ?usaPlace wdt:P1082 ?population.
    ?usaPlace wdt:P625 ?geo.
    ?jpnPlace (wdt:P17/(wdt:P279*)) wd:Q17.
  }
}
ORDER BY DESC(?population)
```
- `#defaultView:Map` draws the `?geo` lat-long points onto an interactive world map.
- `SELECT DISTINCT ...` is just like SQL.
- `?usaPlace (wdt:P17/(wdt:P279*)) wd:Q30`: `?usaPlace` is defined as...
  - `(wdt:P17/(wdt:P279*))`: something in a country, or something in something in a country... (any number of nestings is valid). Strictly speaking, `wdt:P17` is "country" and `wdt:P279` is "subclass of"; because `*` also matches zero hops, the path matches anything whose country is the target directly.
  - `wd:Q30`: and that country is the United States...
- `wdt:P190 ?jpnPlace.`: and is a "twinned administrative body" with `?jpnPlace`.
- `?jpnPlace (wdt:P17/(wdt:P279*)) wd:Q17.`: the same pattern as for `?usaPlace`, this time looking for something whose country path ends at Japan (`wd:Q17`).
- `?population` is the population (`wdt:P1082`) of `?usaPlace`.
- `?geo` is the latitude and longitude (`wdt:P625`) of `?usaPlace`.
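
The property path is the least SQL-like part of the query, so here is a small Python sketch of what `wdt:P17/(wdt:P279*)` asks for: follow one `P17` edge, then zero or more `P279` edges. The toy graph, the `Q_...` identifiers, and the `path_targets` helper are invented for illustration; only the traversal logic mirrors the path.

```python
# Toy graph: EDGES[predicate][node] -> set of nodes it points to.
# All identifiers here are invented placeholders, not real Wikidata items.
EDGES = {
    "P17": {"Q_city": {"Q_country"}},
    "P279": {"Q_country": {"Q_parent"}, "Q_parent": {"Q_grandparent"}},
}


def path_targets(start):
    """Nodes reachable from `start` via P17/(P279*)."""
    # One mandatory P17 hop.
    frontier = set(EDGES["P17"].get(start, set()))
    # P279* matches zero or more hops, so the P17 targets already count.
    reached = set(frontier)
    while frontier:
        node = frontier.pop()
        for nxt in EDGES["P279"].get(node, set()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.add(nxt)
    return reached


print(sorted(path_targets("Q_city")))
```

In the real query the end of the path is pinned to `wd:Q30` or `wd:Q17`, so a place matches if that item appears anywhere in the reachable set.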
I ended up with the broader query of location (rather than strictly city) because it gave interesting results.