DocTypes: A light-weight approach to semantics
MediaWiki uses text articles to represent knowledge. It offers ways to assign articles to categories and supports links between articles. Templates can be used to add a set of parameters to an article.
(made with Wgraph)
Sometimes this is not sufficient. To represent knowledge in a more structured way a typing concept is needed. Instead of Articles you want to have semantically meaningful entities like Person, Trip or Location. Instead of Links you want to have semantically meaningful relationships like takes part in or starts at. A conceptual scheme for our example might look like this:
(made with Wgraph)
While the diagram above works at the type level ('class level', 'relation type level'), the actual pieces of knowledge are instances (objects) of these types, and they are related by instances (relations) of these relation types.
So on the level of individual instances ('objects', 'relations') we would see the following:
(made with Wgraph)
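The distinction between type level and instance level can be sketched in a few lines of Python. This is only an illustration, not part of DocTypes: the class names `Obj` and `Relation`, the direction of the two relation types (Person takes part in Trip, Trip starts at Location) and all instance data are assumptions made for the example.

```python
from dataclasses import dataclass

# Type level: relation types and the entity types they connect.
# The (source, target) directions are an assumption for this sketch.
RELATION_TYPES = {
    "takes part in": ("Person", "Trip"),
    "starts at": ("Trip", "Location"),
}


@dataclass(frozen=True)
class Obj:
    """An instance (object) of an entity type such as Person or Trip."""
    type_name: str
    name: str


@dataclass(frozen=True)
class Relation:
    """An instance of a relation type, linking two objects."""
    relation_type: str
    source: Obj
    target: Obj

    def is_valid(self) -> bool:
        """Check this instance against the type-level scheme."""
        expected = RELATION_TYPES.get(self.relation_type)
        return expected == (self.source.type_name, self.target.type_name)


# Instance level (hypothetical example data).
alice = Obj("Person", "Alice")
hike = Obj("Trip", "Alpine hike")
village = Obj("Location", "Mountain village")

relations = [
    Relation("takes part in", alice, hike),
    Relation("starts at", hike, village),
]
```

A relation such as `Relation("starts at", alice, hike)` would fail `is_valid()`, because the scheme only allows a Trip to start at a Location.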
How does it map to MediaWiki?
DocTypes is a simple and conservative approach to representing semantically meaningful objects and relations within the world of MediaWiki.
- Objects are defined by calling a MediaWiki template which is named after the Type. There is also a help page for the user which explains the semantics of the Type.
- Apart from that, there are some other Type-related templates which take care of XML export and reporting.
- Articles are seen as containers which store one or more objects (usually of the same class).
- As soon as an Article contains an object of some class the article will become part of a category which has the same name as the template used to define the object.
- Relations are basically links between pages, but they point directly to objects using the object ID as a link target.
(made with Wgraph)
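The mapping above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the DocTypes implementation: the regular expression only recognises plain template calls, the function names are made up for the example, and the `Page#ObjectID` link form is an assumption about how an object ID might be used as a link target.

```python
import re
from typing import Set

# Matches the template name at the start of a call such as "{{Person" (simplified).
TEMPLATE_CALL = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?=[|}\n])")


def categories_for(page_text: str, doc_types: Set[str]) -> Set[str]:
    """Return the categories a page joins: one per DocType template it calls."""
    called = {m.group(1) for m in TEMPLATE_CALL.finditer(page_text)}
    return called & doc_types


def reference_target(page_title: str, object_id: str) -> str:
    """Build a link that points at an object rather than at the whole page.

    The '#' anchor form is an assumption made for this sketch.
    """
    return f"[[{page_title}#{object_id}]]"


# Hypothetical page text containing one Person object and an unrelated template.
page = "Intro text\n{{Person\n|id=p1\n|name=Alice\n}}\n{{Infobox|x=1}}"
page_categories = categories_for(page, {"Person", "Trip", "Location"})
```

Here `page_categories` contains only `Person`, because `Infobox` is not one of the known Types.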
How about OWL, RDF, Semantic Wiki, etc.?
DocTypes is somewhat less abstract and less generic than these concepts. It does not introduce ontologies and annotations and there is no general abstract query language for traversing relations. DocTypes is based on the idea of semantic triples but it does not put them in the foreground.
Instead, DocTypes is straightforward and rather easy to use for the average MediaWiki user, as there is nothing new to learn. There is no additional syntax and no need to qualify relationships while writing documents. Instead, the author fills the text into the parameter list of a template. So the author is essentially guided by a 'form' but still has the full power of expression with rich text and embedded media.
Note that we are not talking about a traditional screen form. This would be too rigid and would put too much burden on the DocTypes designer. Rather, we are talking about creating a template, which essentially means listing the attributes that will make up an object.
In general, you should not expect the full power of semantic modelling (OWL/RDF etc.) from DocTypes, but you may be astonished how much can be done. The biggest benefit of DocTypes is probably its simplicity.
Comparison between the traditional way and DocTypes
Today a wiki author basically uses rich text when writing. To add a set of standardized descriptive attributes to the text, the author creates a template and passes the attribute values as parameters. The template inserts these values into the text, typically as a nice little table.
The core idea of DocTypes is to reverse that principle. Using DocTypes you put your whole piece of knowledge into a template call. While some of your parameters may be quite simple (a word, a number, a sentence, a link) others may consist of several text paragraphs including headlines on various levels and images.
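To make the reversal concrete, the following sketch renders a whole piece of knowledge as a single template call. It is only an illustration: the helper `render_object`, the Type name `Trip`, the parameter names and the link form are hypothetical and not taken from the actual DocTypes templates.

```python
def render_object(type_name: str, properties: dict) -> str:
    """Render one object as a template call named after its Type."""
    lines = ["{{" + type_name]
    for key, value in properties.items():
        lines.append(f"|{key}={value}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical object: simple values, a reference, and a longer rich-text part.
trip = render_object("Trip", {
    "id": "t1",
    "title": "Alpine hike",
    "starts at": "[[Locations#l1]]",
    "report": "We set out early.\n\nThe summit was reached at noon.\n[[File:Summit.jpg|thumb]]",
})
```

The short parameters carry the standardized attributes, while `report` holds free rich text with several paragraphs and an image, all inside the same template call.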
Of course this only makes sense if there is an appropriate structure which will be accepted by the authors because it is considered to be helpful for a certain knowledge domain. A typical wiki may have 70% articles in traditional form and 30% of the articles containing DocTypes.
The good thing is that it doesn't make a difference to the authors. But, of course, it makes a difference for the designer of the wiki.
The following table gives a summary:
| ||Traditional way||DocTypes|
|Paradigm 1||a collection of stories||a collection of fact sheets|
|Paradigm 2||things are somehow connected to each other by 'free association'||objects have distinct typed references between each other|
|When to use||Broad range of topics, weakly structured text, no common scheme applicable||High degree of structural similarity between certain instances of your knowledge domain. Commonly agreed 'reasonable' scheme on how to present information|
|Output / appearance||heterogeneous, totally left to the user (apart from the sporadic use of templates which produce some standardized pieces of text)||homogeneous, standardized scheme for how information is presented; there may be areas where "stories" are embedded, but they have their fixed place in the overall schema design.|
|Navigation||The author puts hyperlinks where he feels it makes sense. The kind of relationship which goes along with a link can only be derived from carefully reading the text around the link.||The system expects references at some pre-defined positions and assigns a semantic meaning to them. The reader will always find such references in the same place and can traverse them backwards specifically. Even reports are possible.|
|Burden for the average article writer||
|Burden for the wiki designer||EX POST approach:||EX ANTE approach:|
|Import / Export||The content can technically be exported as XML, but it is opaque, i.e. it is nothing more than a sequence of characters in the XML scheme.||The text can be exported as semantically structured XML or as a CSV file with named columns.|
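The difference in the Import / Export row can be illustrated with a short sketch. It assumes that objects are available as dictionaries of named properties; the functions `to_xml` and `to_csv`, the element names and the sample data are hypothetical and do not show the export templates DocTypes actually uses.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_xml(type_name, objects):
    """Export objects as XML with one element per object and per property."""
    root = ET.Element("objects")
    for obj in objects:
        node = ET.SubElement(root, type_name)
        for key, value in obj.items():
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(objects, columns):
    """Export objects as CSV text with a header row of named columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(objects)
    return buffer.getvalue()


# Hypothetical Person objects; property names must be valid XML element names.
people = [
    {"id": "p1", "name": "Alice", "takes_part_in": "t1"},
    {"id": "p2", "name": "Bob", "takes_part_in": "t1"},
]

xml_text = to_xml("Person", people)
csv_text = to_csv(people, ["id", "name", "takes_part_in"])
```

Because every property has a name, a consumer of the export can address `name` or `takes_part_in` directly instead of parsing an opaque block of article text.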
Before showing an example and giving more details, we need a short definition of the DocTypes terminology:
- Page: A page (article) in your wiki which is designed in alignment with DocTypes principles. Pages contain one or more Objects of a certain Type.
- Type: A definition of common Properties for all Objects (Instances) belonging to that Type.
- Object: A piece of knowledge contained in a Page which has a certain Type.
- Property: An attribute of an Object; it can be a plain value, a complex value (consisting of Instances of other Types) or a Reference to another Object.
- Reference: A Property which points to another Object.
- Some text which can go along with a Reference; it explains more about the kind of relationship.
DocTypes is basically a series of clever templates which use standard MediaWiki features and some existing MediaWiki extensions like DPL and Variables. DocTypes is more a certain way of using existing MediaWiki technology than a new technology.
Access DocTypes defined in this wiki: Category:DocType
Look at the template scripts which implement DocTypes: Category:DocTypeScript
If you are interested in Semantic MediaWiki, you can play around in this wiki, too. See SMW Demo.