Help:Introduction to vocabulary management based on Semantic MediaWiki
Managing ontology vocabularies involves several tasks, mainly the definition, annotation, discussion, translation and export of terms or concepts. A suitable platform should also be able to use externally defined ontology vocabularies and enable a community to reuse them. The solution chosen by ViBRANT is based on the MediaWiki software (Figure 1).
A major advantage of using MediaWiki as core software is that it stores data in a simple text format. Internally, these base documents are then re-harvested and provided through powerful analysis and reporting interfaces. The main advantage of simple storage plus harvesting is the flexibility it provides when implementing new features or adapting old strategies. All harvested information can be discarded during updates and recollected into new data structures, making MediaWiki a long-term stable platform in the face of continuous change and evolution.
An important aspect in choosing MediaWiki as core software is that it is maintained by a large open source community. Because MediaWiki is required for running Wikipedia and other Wikimedia Foundation projects (e.g., Wikisource, Wikispecies, Wikidata), development and maintenance are unlikely to cease for a long time to come.
Collaboration features of MediaWiki Core
MediaWiki is designed to let authors do their work in a collaborative manner. Important features are:
- All versions of a page are saved and they can be reviewed by comparing page versions side by side (Figure 2).
- For discussion each content page has a linked talk page (Figure 3).
- All changes made in the GBIF-Terms-Wiki can be listed with the “Recent changes” feature, and the “Contributions” feature lets authors see their own contributions.
- A "watch list" of pages helps authors keep track of changes to pages they are interested in (Figure 4), and they can be notified by email when those pages change. A watch list is the equivalent of the global recent changes of the entire platform, but limited to the set of pages an author has decided to be involved with or interested in. Pages are added to the watch list simply by clicking the star symbol shown at the top of each page.
- All authors have their own talk pages through which they can be contacted. They are notified of changes on their talk page when they log in. If they have enabled email notification, they also receive notification emails.
- User group management allows restricting management tasks or editing rights to certain user groups.
- Unlike on Wikipedia, editing is enabled only for registered users, to avoid spam edits.
The features listed above are provided by the MediaWiki core and make it possible to work together as a collaborative community. Content changes are always traceable, and it can be seen who changed which content. To make changes to page content easier to follow, the system also lets authors add a short summary of each change, e.g. why the change was deemed necessary.
MediaWiki provides a template system that allows defining a template, with or without parameters, that is reused on many content pages. Wherever a template is used on a page, the content of the template definition is displayed in its place, so repetitive text can be kept on a single template page. If the template is adjusted, the changes automatically affect all pages where it is used. This allows managing page content more flexibly when text or structured elements need to be the same on many pages.
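The template mechanism can be illustrated outside the wiki with a minimal Python sketch. It is not MediaWiki's parser: the `TEMPLATES` store, the `expand` function and the `ConceptNote` template are hypothetical, and only the `{{Name|key=value}}` call syntax and the `{{{key}}}` parameter placeholders mirror real wikitext.

```python
import re

# Hypothetical template store: one definition, reused on many pages.
TEMPLATES = {
    "ConceptNote": "Term '{{{label}}}' is maintained by {{{group}}}.",
}

# {{Name|key=value|...}} template calls (no nesting supported in this sketch).
CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")
# {{{key}}} parameter placeholders inside a template definition.
PARAM = re.compile(r"\{\{\{([^{}]+)\}\}\}")


def expand(page_text: str) -> str:
    """Replace each template call with the body of the template definition."""

    def replace_call(match):
        name = match.group(1).strip()
        body = TEMPLATES.get(name)
        if body is None:
            return match.group(0)  # unknown template: leave the call untouched
        args = {}
        for part in match.group(2).split("|")[1:]:
            key, _, value = part.partition("=")
            args[key.strip()] = value.strip()
        # Fill in parameters; missing parameters become empty strings.
        return PARAM.sub(lambda p: args.get(p.group(1).strip(), ""), body)

    return CALL.sub(replace_call, page_text)


page = "{{ConceptNote|label=catalogNumber|group=TDWG}}"
print(expand(page))
```

Editing the single entry in `TEMPLATES` changes the expanded text of every page that calls it, which is the property the paragraph above describes.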
Vocabularies defined in the GBIF-Terms-Wiki are organized in pages that use templates as well. For managing ontology vocabularies it was decided to follow the generic concepts defined by the Simple Knowledge Organization System (SKOS). Pages were organized as follows:
- Concept pages are pages that define a concept of an ontology (they correspond to SKOS Concept).
- Concept collection pages reflect SKOS Collection and list the member concepts of the collection. The semantic definition of a SKOS Collection is very wide, so it can serve a large number of purposes, e.g. subheadings or thematic groupings of concepts.
- Concept schemes list all collections and concepts defined in a vocabulary scheme. A concept scheme reflects that an expert group is authoring, curating and versioning a set of concepts. The Concept scheme pages model SKOS ConceptScheme.
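The relationship between the three page types can be sketched as a small data model. This is only an illustration of the SKOS-inspired structure, not the wiki's actual storage: the class and field names are assumptions, and the sample values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str              # page name of the concept
    pref_label: str        # cf. skos:prefLabel
    definition: str = ""   # cf. skos:definition


@dataclass
class ConceptCollection:
    name: str
    members: list = field(default_factory=list)  # cf. skos:member


@dataclass
class ConceptScheme:
    name: str
    collections: list = field(default_factory=list)
    concepts: list = field(default_factory=list)

    def all_concepts(self):
        """Concepts listed directly or via a collection, without duplicates,
        sorted by label."""
        seen = {}
        for concept in self.concepts:
            seen.setdefault(concept.name, concept)
        for collection in self.collections:
            for concept in collection.members:
                seen.setdefault(concept.name, concept)
        return sorted(seen.values(), key=lambda c: c.pref_label.lower())


# Illustrative sample data (definitions are placeholders).
catalog_number = Concept("catalogNumber", "Catalog Number", "Placeholder definition.")
scientific_name = Concept("scientificName", "Scientific Name", "Placeholder definition.")
grouping = ConceptCollection("Example grouping", [scientific_name])
scheme = ConceptScheme("Darwin Core", [grouping], [catalog_number, scientific_name])

print([c.name for c in scheme.all_concepts()])
```

A collection only groups concepts, while the scheme is the unit that an expert group authors and versions, which is why the scheme owns both lists here.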
Semantic MediaWiki enhancements
Importantly, the ViBRANT/GBIF-Terms platform enables several extensions for the semantic web (Semantic MediaWiki, SMW) that are not yet enabled on some other MediaWiki installations (esp. not on the Wikipedias). SMW provides improved, user-friendly means to create, edit and discuss the contents of pages, and it exposes content in the form of RDF.
The SMW extensions are designed to search, organize, tag, browse, evaluate, display and share the Wiki's content data. While traditional Wikis contain only text which computers can neither understand nor evaluate, Semantic MediaWiki adds semantic annotations that allow the ViBRANT terms platform to function as a collaborative database.
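In Semantic MediaWiki, an annotation is written in the page text as `[[property name::value]]`. The following Python sketch shows how such annotations turn free text into property/value pairs that a program can evaluate. The function name and the property names `label` and `concept scheme` are hypothetical and not necessarily those used on the GBIF-Terms-Wiki.

```python
import re

# [[property::value]] or [[property::value|display text]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext: str) -> dict:
    """Collect property -> list of values from SMW-style inline annotations."""
    facts = {}
    for prop, value in ANNOTATION.findall(wikitext):
        facts.setdefault(prop.strip(), []).append(value.strip())
    return facts


text = (
    "The [[label::catalog number]] belongs to "
    "[[concept scheme::Darwin Core|DwC]]; see also [[Help:Editing]]."
)
print(extract_annotations(text))
```

Ordinary links such as `[[Help:Editing]]` contain no `::` separator and are therefore ignored by the pattern.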
Semantic MediaWiki functionality includes:
- semantic data annotation (SMW core)
- import/reuse vocabulary (SMW core, credit resources, reuse concepts in new concept schemes)
- exchange data using RDF/XML and other formats (SMW core, other extensions e.g. Data Transfer)
- visually display and browse information (SMW core, Semantic Result Formats, Semantic Drilldown)
- simplified editing by using forms (Semantic Forms)
Form based editing support
Semantic annotation of page contents in the GBIF-Terms-Wiki is achieved by editing pages through forms (Figure 5), which allow authors to contribute without knowing the template structure or the wiki code behind it. With this approach, an author’s contribution can be validated by the form so that wrong input is restricted, and the form can assist the author by auto-completing suitable values in form fields. Form elements can show information about the input format, e.g. through placeholder text that displays a short hint in a still-empty form field, or through small question mark icons that provide further pop-up information for a form field if necessary.
Browsing information in a Semantic Wiki
All concept pages and vocabularies defined in or contributed to the GBIF-Terms-Wiki can be browsed by categories or concept scheme pages, and concepts can be found using MediaWiki’s search box. However, the extension "Semantic DrillDown" provides a more flexible and convenient way of browsing pages (Figure 6). It allows browsing by semantic properties and filters. Once a semantic filter has been defined and set up, this special page presents semantic data as a tag cloud, in which the frequency of a semantic value corresponds to its font size: the larger a link, the more frequently its value is used among all concept pages. Besides Semantic DrillDown, the extension Semantic Result Formats can generate many different output formats, e.g. word clouds, graphs, charts, maps, tree lists and more.
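The tag cloud idea, frequency mapped to font size, can be sketched in a few lines of Python. How Semantic DrillDown actually scales its links is not stated here, so the linear scaling, the function name and the point sizes below are assumptions.

```python
from collections import Counter


def tag_cloud_sizes(values, min_pt=10, max_pt=28):
    """Map each distinct value to a font size that grows linearly with its frequency."""
    counts = Counter(values)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for value, count in counts.items():
        # With a single frequency level, every value gets the minimum size.
        scale = (count - lowest) / span if span else 0.0
        sizes[value] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Illustrative input: the scheme named on each of nine concept pages.
schemes = ["Darwin Core"] * 5 + ["Audubon Core"] * 3 + ["Dublin Core"]
print(tag_cloud_sizes(schemes))
```

The most frequently used value receives `max_pt`, the least frequent `min_pt`, and everything else falls in between.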
Specific needs for ViBRANT, GBIF and TDWG Vocabulary
With all its features, a Semantic MediaWiki platform can manage a vocabulary web site, because it supports collaboration among authors and can organize ontology vocabularies in a flexible manner. Existing standard vocabularies (e.g. Dublin Core, Dublin Core Metadata Terms, Darwin Core) were imported into the GBIF-Terms-Wiki for this purpose. These imports allow concepts from other ontology vocabularies to be reused in a different context, e.g. in the concept scheme Audubon Core. Selected standard concepts of external vocabularies can be reused and mixed with new concept definitions to design a concept scheme that meets needs the standard vocabulary scheme was not meant to cover, which avoids reinventing the wheel.
For the GBIF-Terms-Wiki it was decided to use the Simple Knowledge Organization System (SKOS) as base layer enhanced with some advanced RDF/OWL oriented properties to ensure a specification of ontology concepts that is not too specific.
Translation of concepts, which is needed for GBIF, was implemented through form editing (Figure 7). A translation can be appended to a concept page by adding a new sub-form and filling in a translated label, example, definition, etc.
Semantic properties are then defined on a concept page using the form/template mechanism. Unlike core MediaWiki, SMW supports retrieving this information in another context, for instance to gather all concepts of a concept collection (Figure 8) or a concept scheme (Figure 9). The retrieved data can then be passed to templates again to generate any format necessary, from simple lists to complex table layouts as done for Audubon Core (Figure 10). An advantage of this semantic approach is that it reduces redundant information to a minimum, because the information is stored on one page (e.g. a concept page) and other pages can extract data from it. Changes made on a concept page then take effect in the templates that depend on it, so automatically generated lists and table layouts are updated just by managing the content of a single concept page.
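The "store once, retrieve elsewhere" principle can be sketched as follows. In SMW itself this retrieval is done with inline queries; the Python below only imitates the idea. The `PAGES` store, the `ask` and `render_table` functions, the property names and the sample values are all hypothetical.

```python
# Hypothetical store: page name -> semantic properties set on that concept page.
PAGES = {
    "dwc:scientificName": {"label": "Scientific Name", "scheme": "Darwin Core"},
    "dwc:catalogNumber": {"label": "Catalog Number", "scheme": "Darwin Core"},
    "ac:caption": {"label": "Caption", "scheme": "Audubon Core"},
}


def ask(store, **conditions):
    """Return (page, properties) pairs matching all conditions, sorted by page name."""
    matches = [
        (name, props)
        for name, props in store.items()
        if all(props.get(key) == value for key, value in conditions.items())
    ]
    return sorted(matches, key=lambda item: item[0])


def render_table(rows, columns):
    """Render query results as a plain-text table, one line per concept page."""
    lines = ["page | " + " | ".join(columns)]
    for name, props in rows:
        lines.append(name + " | " + " | ".join(props.get(c, "") for c in columns))
    return "\n".join(lines)


print(render_table(ask(PAGES, scheme="Darwin Core"), ["label"]))

# Editing the single concept page is enough; the table is regenerated from it.
PAGES["dwc:catalogNumber"]["label"] = "Catalogue Number"
print(render_table(ask(PAGES, scheme="Darwin Core"), ["label"]))
```

The table holds no data of its own: it is only a layout applied to whatever the query returns at the time it is rendered.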
The ultimate success of managing vocabularies depends not only on convenient editing and version control of contributions but also on how well information and data can be exported to other services or project partners, e.g. ViBRANT and GBIF. Although Semantic MediaWiki’s RDF output provides a means to do this, this RDF is fixed to the specific format defined by SMW. To improve on this fixed RDF export, the extension XMLTransformation was created, which transforms an XML resource (RDF/XML, XHTML) into any output format using an Extensible Stylesheet Language Transformations (XSLT) file. The implemented mechanism is simple and generic. Stylesheets can be stored and updated directly on-wiki. It is applied to transform the generic RDF/XML semantic data output into a format closer to SKOS and more suitable for further processing by ViBRANT and GBIF. Presently, on each page that represents a concept scheme, data can be obtained by consuming RDF using properties from SKOS or using a standard RDF notation.
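The kind of mapping such a transformation performs can be sketched in Python. Note that this sketch uses the standard library's ElementTree instead of XSLT, and that the input document, the `example.org` namespace and the `Label`/`Definition` property names are invented for illustration; they do not reproduce SMW's real RDF export or the stylesheets used by XMLTransformation.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
WIKI = "http://example.org/wiki/property#"  # hypothetical wiki property namespace

# Hand-written stand-in for a generic RDF/XML export of one concept page.
GENERIC_EXPORT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:p="{WIKI}">
  <rdf:Description rdf:about="http://example.org/wiki/CatalogNumber">
    <p:Label>Catalog Number</p:Label>
    <p:Definition>Placeholder definition.</p:Definition>
  </rdf:Description>
</rdf:RDF>"""

# Generic wiki property -> SKOS property (tags in ElementTree's {namespace}name form).
MAPPING = {
    f"{{{WIKI}}}Label": f"{{{SKOS}}}prefLabel",
    f"{{{WIKI}}}Definition": f"{{{SKOS}}}definition",
}


def to_skos(rdf_xml: str) -> str:
    """Rewrite generic rdf:Description elements as skos:Concept elements."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("skos", SKOS)
    source = ET.fromstring(rdf_xml)
    target = ET.Element(f"{{{RDF}}}RDF")
    for description in source.findall(f"{{{RDF}}}Description"):
        about = description.get(f"{{{RDF}}}about")
        if about is None:
            continue  # skip resources without an identifier
        concept = ET.SubElement(target, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": about})
        for child in description:
            if child.tag in MAPPING:  # unmapped properties are dropped
                ET.SubElement(concept, MAPPING[child.tag]).text = child.text
    return ET.tostring(target, encoding="unicode")


print(to_skos(GENERIC_EXPORT))
```

An XSLT stylesheet expresses the same kind of rule declaratively, which is what allows it to be stored and updated on-wiki without changing any program code.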
Figure 9: The concept scheme page Darwin Core automatically lists all concepts that are defined for Darwin Core. The listing is sorted alphabetically by concept collection, once by label and once by concept name. A form field helps an author to contribute new concepts or edit an existing one.
Figure 10: Concepts defined for the concept scheme Audubon Core are automatically rendered as a table to list all definitions on one page. Data are gathered from semantic properties set on the concept pages. Changes made on a concept page are automatically passed on to this auto-generated table. The table represents just the layout for the data obtained from concept pages.
Summary
The ViBRANT/GBIF-Terms-Wiki is capable of:
- supporting the collaboration of a community of authors for vocabulary concepts used in biodiversity
- managing and translating ontology vocabulary (e.g., Darwin Core)
- reusing and defining vocabularies (e.g., Darwin Core or Audubon Core share many concepts with other concept schemes)
- providing convenient editing and contribution of concepts and concept schemes through forms
- exporting vocabularies online (RDF of Semantic MediaWiki or transformed output provided by the newly created MediaWiki extension XMLTransformation)