by Karen Li-Hun Hwang
In late September 2018, cultural heritage organizations and project teams from around the country, including METRO, sat together for a multi-day Wikibase Summit at the New Museum in New York City. The summit was part of a highly organized series of events put together by Rhizome and Wikimedia Deutschland (Germany) to build a Wikibase community and understand how people want to use Wikibase.
What is Wikibase?
Wikibase is one of many powerful tools developed under the Wikimedia umbrella, whose most popular project is currently Wikipedia. Another Wikimedia resource that is gaining popularity, especially within the LIS community and among others who work with data, is Wikidata, an open knowledge base of structured data where anyone can sign up for an account and begin contributing data.
Wikibase is the software that has quietly powered Wikidata since Wikidata launched in 2012. The ability to use the same software to organize and model local data the way Wikidata does is prompting many to take a closer look at whether Wikibase is the right data management choice for them.
In 2014, Rhizome was one of the first organizations to inquire about the database software behind Wikidata, and it began exploring Wikibase as a replacement for ArtBase, its catalog of web art. Dragan Espenschied, Rhizome’s Preservation Director and one of the main organizers of the summit, explained the reason for this interest:
Wikibase had always been open source, but until then, few had attempted to put the software to use beyond Wikidata.
Fast forward to October 2017: not only was Rhizome continuing to use Wikibase and integrate it further into its work, but, through the efforts of Wikimedia Deutschland and the dedication of volunteers, Wikibase had become available as a Docker image, allowing anyone to download and install the software themselves.
Because the Wikibase Docker image was downloaded so widely after its release, a series of events was organized in Europe and North America. Each location had a different area of focus: data federation, data modeling, grants, and GLAM (galleries, libraries, archives, and museums). The New Museum summit here in New York focused on Wikibase in GLAM, particularly in relation to linked open data (LOD). Day 1 was devoted almost entirely to introductions and to finding common threads among us as a group of practitioners. It also included case studies of current and ongoing Wikibase implementations, as well as reports from attendees who were not yet using Wikibase but had already concluded that Wikibase and/or Wikidata offered a possible path forward for their work.
What came out of the Linked Data/Wikibase conference in NYC? What exactly does Wikibase offer?
During introductions, some common reasons surfaced for exploring the use of Wikibase:
- better search capability across data;
- working with local data as linked data;
- flexibility in content-specific data modeling;
- more robust and granular versioning of changes and edits;
- built-in SPARQL endpoint to provide access to project data;
- potential to align project data with Wikidata and even with other projects using Wikibase, including the possibility of federated SPARQL queries.
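To make the SPARQL endpoint and federation points above a little more concrete, here is a minimal Python sketch that builds (but does not send) a query for a local Wikibase's query service. The query selects local items that record a Wikidata QID and then uses a SPARQL `SERVICE` clause to fetch the matching English labels from Wikidata's public query service. The local endpoint URL, the property namespace, and the local property `P1` holding Wikidata QIDs as strings are all hypothetical assumptions, and whether a given query service permits `SERVICE` calls to Wikidata depends on how that service is configured.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: substitute your own Wikibase's query service URL
# and the namespace its query service uses for "direct" properties.
LOCAL_ENDPOINT = "https://wikibase.example.org/query/sparql"
LOCAL_PROP_NS = "https://wikibase.example.org/prop/direct/"
# Public Wikidata Query Service endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(qid_property: str, limit: int = 10) -> str:
    """Return a SPARQL query meant to run on the local Wikibase that
    fetches English labels for linked entities from Wikidata via SERVICE."""
    return f"""PREFIX ldt: <{LOCAL_PROP_NS}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?wikidataItem ?label WHERE {{
  # Local items that record a Wikidata QID (assumed stored as a string such as "Q42")
  ?item ldt:{qid_property} ?qid .
  BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?qid)) AS ?wikidataItem)
  # Federated part: this block is evaluated by Wikidata's query service
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }}
}}
LIMIT {limit}"""


def build_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) an HTTP GET request to a SPARQL endpoint,
    asking for results in the standard SPARQL JSON results format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_request(LOCAL_ENDPOINT, build_federated_query("P1"))
    print(req.full_url[:80])
```

Passing the resulting request to `urllib.request.urlopen` would execute it against a live endpoint. The design point is that the local Wikibase stays the home for project-specific data, while shared Wikidata entities act as the join points.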
Catalogers and metadata librarians may have noticed that, in recent years, Wikidata URIs for terms and people have been replacing Wikipedia URIs on platforms such as VIAF that cross-reference other linked open data vocabularies. And among this group at the Wikibase Summit, there was clear consensus that organizing data with Wikibase holds great potential to boost the performance of local data (in terms of discoverability, interoperability, and enrichability), because Wikibase automatically represents the data it stores as a graph and makes it easy to publish as linked data.
So, is Wikibase for me?
That depends. First, installing Wikibase so that it works for you currently requires a fair amount of technical skill. Matt Miller of the research group Linked Jazz has documented his process for installing Wikibase from the Docker image, but the general consensus, even among the organizers, was that installation needs to be simplified further. Since understanding what could make installation easier was one of the goals of the summit, an effort will presumably be made over time, informed by the feedback gathered there, to improve the installation process and provide better documentation.
A second technical consideration depends on your ambitions for data modeling: will you model your items, properties, and qualifiers on Wikidata’s conventions, or design something more homegrown? Unless you are very familiar with how Wikidata works, the former could add considerable lead time to designing your system. On the other hand, if you already see alignment with Wikidata as an advantage, you will probably be willing to put in that time.
And then there is the question of whether going the extra step of installing and maintaining your own Wikibase makes sense, or — if, for example, the end goal is to put the data on Wikidata anyway — whether it is better to contribute your data directly to Wikidata from the start.
At the summit, Matt Miller of Linked Jazz offered an example of Wikibase emerging as the better option. Linked Jazz began by investigating how to contribute its project data to the linked open data cloud via Wikidata, but quickly realized that only a small subset of that data, such as musician and venue names, would be of interest for general public use and reuse and would fit Wikidata’s modeling conventions. Other, more esoteric project data, such as passages from interviews, would be better housed, modeled, and made available in Linked Jazz’s own Wikibase installation, where the team could make local decisions about how to graph the data. Shared Wikidata URIs for more common entities, such as the musician names mentioned above, could still serve as interlinking points to their Wikibase data.
Andra Waagmeester, a veteran editor of biomedical knowledge on both Wikipedia and Wikidata who later led break-out groups on the summit’s data modeling track, offered a converse example, in which Wikibase becomes an extra, possibly unnecessary step. For his purposes, contributing data directly to Wikidata is still more streamlined than duplicating the work of modeling and entering or importing data into Wikibase and then adding it again to Wikidata. And for the type of data he works with, putting it on Wikidata is an advantage, since Wikidata operates as a centralized system whose conventions are known to the community and can be extended. At one point in his presentation, he summarized: “Wikibase is good for data that can’t go into Wikidata yet.”
At the close of the Wikibase Summit at the New Museum, after three days of presentations, workshops, and tool talk, one attendee called out to the group, “So, hey, will we have our own conference next year?” This was a reference to the many conferences associated with different branches of the Wikimedia movement (Wikimania, WikiConference, WikidataCon, and WikiCite, to name a few). But the question also carried an implicit one: “Are we now our own community?”
The question was met with a brief silence, until everyone seemed to agree: “No, not yet. But maybe soon.”
- Fauconnier, Sandra. “Many faces of Wikibase: Rhizome’s archive of born-digital art and digital preservation”, Wikimedia Foundation, September 6, 2018. (Interview with Dragan Espenschied of Rhizome) https://wikimediafoundation.org/2018/09/06/rhizome-wikibase/
Further reading and links
Miller, Matt. “Wikibase for Research Infrastructure — Part 1”, Medium, March 9, 2018. https://medium.com/@thisismattmiller/wikibase-for-research-infrastructure-part-1-d3f640dfad34
Wikibase site: http://wikiba.se/
Wikibase Docker Image on Github: https://github.com/wmde/wikibase-docker/
Wikibase User Group: https://lists.wikimedia.org/mailman/listinfo/wikibaseug
Registry of Wikibase instances: https://wikibase-registry.wmflabs.org/wiki/Main_Page