Help talk:Mapping from SMW SKOS/RDF to “clean” normative SKOS/RDF

From Diversity Workbench

The aim is to produce SKOS/RDF as in the “normative” translation document for Darwin Core: http://rs.gbif.org/terms/dwc/dwc_translations.rdf. The translation document has header/introductory information about the scheme as a whole (Table 2) and a set of fields for each concept in the scheme (Table 1).

Mappings

Table 1: Fields for each concept in a scheme
# SMW element Normative SKOS/RDF
1 <swivt:Subject rdf:about="http://rs.tdwg.org/dwc/terms/#acceptedNameUsage"> <skos:Concept rdf:about="http://rs.tdwg.org/dwc/terms/acceptedNameUsage">
2 <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">en:Accepted Name Usage</skos:prefLabel> <skos:prefLabel xml:lang="en">Accepted Name Usage</skos:prefLabel>
3 <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">es:Nombre aceptado en uso</skos:prefLabel> <skos:prefLabel xml:lang="es">Nombre aceptado en uso</skos:prefLabel>
4 <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">zh-Hans:公认使用名称</skos:prefLabel> <skos:prefLabel xml:lang="zh-Hans">公认使用名称</skos:prefLabel>
5 <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ja:Accepted Name Usage</skos:prefLabel> <skos:prefLabel xml:lang="ja"/>
6
<skos:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
en:The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.
</skos:definition>
<skos:definition xml:lang="en">
The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.
</skos:definition>
7
<skos:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
es:El nombre completo, con autoría e información de fecha si se conoce, del taxón actualmente válido (zoológico) o aceptado (botánico).
</skos:definition>
<skos:definition xml:lang="es">
El nombre completo, con autoría e información de fecha si se conoce, del taxón actualmente válido (zoológico) o aceptado (botánico).
</skos:definition>
8 <skos:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">zh-Hans:目前有效(动物学)的或公认(植物学)的分类单元全称,如已知来源和日期信息则需注明。</skos:definition> <skos:definition xml:lang="zh-Hans">目前有效(动物学)的或公认(植物学)的分类单元全称,如已知来源和日期信息则需注明。</skos:definition>
9
<skos:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
ja:もし分かるなら著者と日付情報を付記した、現在妥当なあるいは受け入れられている(動物および植物学的な)分類群のフルネーム。
</skos:definition>
<skos:definition xml:lang="ja">
もし分かるなら著者と日付情報を付記した、現在妥当なあるいは受け入れられている(動物および植物学的な)分類群のフルネーム。
</skos:definition>
10
<skos:example rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
en:"Tamias minimus" valid name for "Eutamias minimus"
</skos:example>
<skos:example xml:lang="en">
Example: "Tamias minimus" valid name for "Eutamias minimus"
</skos:example>
11
<skos:example rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
es:"Tamias minimus" nombre válido para "Eutamias minimus"
</skos:example>
<skos:example xml:lang="es">
Ejemplo: "Tamias minimus" nombre válido para "Eutamias minimus"
</skos:example>
12 No output if example field is left empty on SMW concept page
<skos:example xml:lang="zh-Hans"/>
13
<skos:example rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
ja:例: "Tamias minimus" は"Eutamias minimus"の有効な名前である
</skos:example>
<skos:example xml:lang="ja">
例: "Tamias minimus" は"Eutamias minimus"の有効な名前である
</skos:example>
14 This differs from the skos:inScheme output by SMW:

<skos:inScheme rdf:resource="http://terms.gbif.org/wiki/Special:URIResolver/Darwin_Core"/> … and refers to the RDF document itself, which contains the translations:

<skos:inScheme rdf:resource="http://rs.gbif.org/terms/dwc/dwc_translations.rdf"/>
15 The property or class designation needs to be picked up from the concept page. <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
16 <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#string">2009-09-21</dcterms:modified>
<skos:historyNote
  rdf:value="Modified"
  dc:date="2009-09-21"/>

… based on example:

<skos:historyNote
  rdf:value="Deprecated in favour of foaf:name"
  dc:date="2008-09-25"
  dc:creator="Rob Styles"/>

See http://www.xml.com/pub/a/2005/06/22/skos.html for additional information:

“To clarify the difference between a skos:historyNote and a skos:changeNote, a history note is a piece of information intended for users of the scheme, documenting significant changes to the meaning, form, or state of a concept, whereas a change note is intended for documenting fine-grained changes to a concept for the purposes of administration and management.”
17 <rdf:type rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> http://www.w3.org/1999/02/22-rdf-syntax-ns#Property</rdf:type>
Table 2: Header information for the complete scheme
# SMW element Normative SKOS/RDF
1 <skos:conceptScheme rdf:about="http://rs.gbif.org/terms/dwc/dwc_translations.rdf">
2 <dc:description>This document contains translations of the terms in the Darwin Core glossary expressed using the Simple Knowledge Organisation System (SKOS) data model. Currently four languages are included: English, Spanish, Simplified Chinese and Japanese.</dc:description>

3 <dc:creator>GBIF</dc:creator>
4 <dc:created>2012-05-31</dc:created>
5 <dc:modified>2012-08-17</dc:modified>
6 <dc:contributor>Arturo Ariño, University of Navarra, Spain</dc:contributor>
<dc:contributor>Hiroshi Mori, Tokyo Institute of Technology, Japan</dc:contributor>
<dc:contributor>Laura Roldan, Sistema de información sobre biodiversidad de Colombia</dc:contributor>
<dc:contributor>...</dc:contributor>

Issues

1. How close can you come to the desired outputs in the right-hand column, e.g., by removing the language designations such as "en:" from labels, definitions, and examples? --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)

In short: no further. We played around a lot, trying SMW subobjects and SMW internal objects. Using a subobject or an internal object causes the language-specific notes, definitions, and examples to be enclosed in their own swivt:Subject. It will look like:
<swivt:Subject  rdf:about=".."><skos:definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">text</skos:definition><dc:language rdf:datatype="…">es</dc:language></swivt:Subject>
So we decided on the simple pseudo-solution of prefixing the values of skos:definition, skos:example, etc. with the language code. --Andreas Plank 14:03, 21 November 2012
That's OK. Once I know what the stable output from SMW is, I can work with it. --Éamonn Ó Tuama 14:47, 21 November 2012 (CET)
A question from my side: is it possible to transform the current output with XSLT into xml:lang="zh-Hans", given a definition like "zh-Hans:目前有效(动物学)的或公认(植物学)的分类单元全称,如已知来源和日期信息则需注明。"? --Andreas Plank 15:51, 21 November 2012 (CET)
Yes, it is easy to do this transformation. I have now prepared a test XSLT file, with sample input and output, available here: http://species-id.net/openmedia/File:SMWToSkos.zip. --Éamonn Ó Tuama 13:56, 22 November 2012 (CET)
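The prefix-stripping step discussed in this thread can be illustrated with a minimal Python sketch. This is not the XSLT from SMWToSkos.zip; the function name split_language_prefix and the regular expression are assumptions made for this example. It assumes the prefix is a short language tag such as "en" or "zh-Hans" followed by a colon.

```python
import re

# Assumption: a language prefix is a 2-3 letter primary tag, optionally
# followed by hyphenated subtags (e.g. "zh-Hans"), then a colon.
_LANG_PREFIX = re.compile(r"^([A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*):(.*)$", re.DOTALL)


def split_language_prefix(literal):
    """Split an SMW literal like 'en:Some text' into (language, text).

    Returns (None, text) when no language prefix is recognised.
    """
    text = literal.strip()
    match = _LANG_PREFIX.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2).strip()


lang, value = split_language_prefix("zh-Hans:公认使用名称")
# lang would become the xml:lang attribute, value the element content.
```

Note that a value that happens to start with a two- or three-letter word followed by a colon would be misread as a language tag, so a real transformation might additionally check the prefix against the list of languages actually in use.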

2. How should we handle the situation, as for Japanese, where the label is not translated (see row 5 in Table 1)? I have been defaulting to the English label for completeness, but maybe it should be left blank? --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)

Why not write <skos:prefLabel xml:lang="ja">Accepted Name Usage</skos:prefLabel>? When I read "<skos:prefLabel xml:lang="ja"/>" I'd first think: oh, is the value missing? And maybe later I'd conclude: ah, it seems the English and Japanese labels have the same value. But <skos:prefLabel xml:lang="ja">Accepted Name Usage</skos:prefLabel> would be clearer (at least to me ;-) --Andreas Plank 14:03, 21 November 2012
<skos:prefLabel xml:lang="ja">Accepted Name Usage</skos:prefLabel> seems contradictory to me - it’s saying the content language is Japanese but English is actually being used. Given the output limitations my preference now is to leave the label slot empty in SMW if a translation does not exist. It may actually encourage others to provide a translation! --Éamonn Ó Tuama 14:47, 21 November 2012 (CET)
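To make the "leave it empty" option concrete, here is a small Python sketch (not SMW or XSLT code) that emits one skos:prefLabel per expected language and leaves the element empty when no translation exists, as in row 5 of Table 1. The function name labels_for_languages and the explicit list of expected languages are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def labels_for_languages(labels, languages):
    """Build one skos:prefLabel element per language in `languages`.

    `labels` maps a language code to its translated label. A language
    without a (non-blank) translation yields an empty element instead
    of falling back to the English label.
    """
    elements = []
    for lang in languages:
        elem = ET.Element(f"{{{SKOS_NS}}}prefLabel", {f"{{{XML_NS}}}lang": lang})
        text = labels.get(lang, "").strip()
        if text:
            elem.text = text
        elements.append(elem)
    return elements


# Hypothetical input: only English and Spanish have been translated.
pref_labels = labels_for_languages(
    {"en": "Accepted Name Usage", "es": "Nombre aceptado en uso"},
    ["en", "es", "ja"],
)
```

An empty element keeps the slot visible to translators, while never asserting that English text is Japanese.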

3. Different use of skos:inScheme (row 14) - but I do not think this matters? Do you agree that the translation document incorporates a scheme in itself? --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)

4. SMW needs to output class/property designation from concept page. Should be easy to pick up these. --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)

I guess it is the concept type parameter in the template Concept that maps to rdf:type. It might not export at present, because a bug in SMW version 1.7.1 does not allow # characters in URLs/URIs; this is fixed in SMW 1.8, which is not published yet. Gregor mentioned upgrading to MediaWiki 1.20 (which was published some days ago), but we haven't done this yet. So a workaround in the meantime may be to set this SMW datatype to String instead of URL. (I have changed it to String in the meantime.)
Good - any work-around will do. --Éamonn Ó Tuama 14:47, 21 November 2012 (CET)
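With the String workaround, the concept type is exported as a string literal (row 17 of Table 1) and has to be turned back into the rdf:resource form of row 15. A hedged Python sketch of that step, using only the standard library; the function name type_literal_to_resource is hypothetical, and the real conversion would presumably live in the XSLT.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def type_literal_to_resource(type_elem):
    """Convert <rdf:type ...>URI</rdf:type> into <rdf:type rdf:resource="URI"/>."""
    uri = (type_elem.text or "").strip()
    if not uri:
        raise ValueError("rdf:type element carries no URI text")
    # The new element has no text content and no rdf:datatype attribute.
    return ET.Element(f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": uri})


# Row 17 of Table 1, as exported by SMW with the String workaround.
smw_xml = (
    '<rdf:type xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> '
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property</rdf:type>"
)
converted = type_literal_to_resource(ET.fromstring(smw_xml))
```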

5. John Wieczorek got some feedback (from the Japanese side, I think) about the need to include the version of the concept that is being translated. This is not in the current translation RDF file. SKOS provides various "notes": changeNote, historyNote, etc. I propose a simple approach: <skos:historyNote rdf:value="Modified" dc:date="2009-09-21"/>. See row 16. Do you agree? --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)
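The proposal in this item, deriving the skos:historyNote from the dcterms:modified value exported by SMW, could be sketched in Python as follows. The function name history_note_from_modified is hypothetical, and binding the dc: prefix to the Dublin Core elements 1.1 namespace is an assumption, since the page does not show the namespace declarations.

```python
import datetime
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# Assumption: dc: refers to the Dublin Core elements 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def history_note_from_modified(modified):
    """Build <skos:historyNote rdf:value="Modified" dc:date="..."/>.

    `modified` is the text of the dcterms:modified literal from SMW.
    """
    date_text = modified.strip()
    # Raises ValueError if the text is not a valid ISO date.
    datetime.date.fromisoformat(date_text)
    return ET.Element(
        f"{{{SKOS_NS}}}historyNote",
        {f"{{{RDF_NS}}}value": "Modified", f"{{{DC_NS}}}date": date_text},
    )


note = history_note_from_modified("2009-09-21")
```

Validating the date before emitting the note keeps a malformed wiki entry from silently ending up in the exported RDF.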

6. Header information: this could be mostly static; for the contributors, I imagine that we would not list every single person who made a suggestion/edit on the wiki but rather have an appointed set of "keepers" for each language. Or is that too exclusive? Note here I'm thinking of translations mainly. --Éamonn Ó Tuama 12:19, 21 November 2012 (CET)

Modified RDF export via #ask

Example: Concepts from class Darwin Core are exported to this RDF.

#ask Syntax (may be easily adapted to meet needs of GBIF) …
{{#ask: [[Category: Darwin Core]] [[Category:Concept]]
|?dcterms:identifier
|?dcterms:issued
|?dcterms:modified
|?skos:prefLabel
|?skos:altLabel
|?skos:definition
|?skos:example
|?skos:inScheme
|?skos:closeMatch
|?vann:termGroup<!-- we use it also for dwcattributes:organizedInClass -->
|?vs:term status<!-- a bug exports vs:term instead of vs:term_status-->
|?rdfs:isDefinedBy
|?rdfs:subPropertyOf
|?dcterms:replaces
|?concept type<!-- the concept type: class, property … -->
|format=rdf
}}

Using XSLT from a Wiki page

  1. Generate the RDF resource link.
  2. Use that link and an XSLT style sheet from MediaWiki:SMWToSKOS.xsl in the export tool Special:XMLTransformation:
    XSLT processed RDF export (Special:XMLTransformation)

Combined query

Using compound queries (an SMW extension), it should be possible to combine the left and right RDF exports below into a single compound query export.

RDF export resource: all concepts in class Darwin Core

RDF export resource: scheme semantics of Darwin Core

{{#ask: [[Category: Darwin Core]] [[Category:Concept]]
|?dcterms:identifier
|?dcterms:issued
|?dcterms:modified
|?skos:prefLabel
|?skos:altLabel
|?skos:definition
|?skos:example
|?skos:inScheme
|?skos:closeMatch
|?vann:termGroup<!-- we use it also for dwcattributes:organizedInClass -->
|?vs:term status<!-- a bug exports vs:term instead of vs:term_status-->
|?rdfs:isDefinedBy
|?terms-internal:isDefinedBy
|?rdfs:subPropertyOf
|?dcterms:replaces
|?concept type<!-- the concept type: class, property … -->
|?vann:usageNote<!-- Wiki page -->
|?sioc:has_discussion<!-- Wiki page of discussion -->
|format=rdf
}}
{{#ask: [[Category: Darwin Core]] [[Category:Concept scheme]]
|?Modification date
|?dc:identifier
|?dc:publisher
|?dcterms:bibliographicCitation
|?dcterms:contributor
|?dcterms:description
|?dcterms:title
|?vann:preferredNamespacePrefix
|?vann:preferredNamespaceUri
|format=rdf
}}

↘                              ↙
RDF export (Combined query of #ask is generated by the code below)
The code in template: Concept scheme/RDF resource link provides the RDF resource link.

Test #compound_query

(Not used; an #ask query using an OR for two different object classes works OK.)

In a compound query, the properties of the partial queries must match, each resulting in a single "column" (this requires the same number of fields). The above does not work.