Semantic Technologies for Historical Research: A Survey

Contents:

1. Introduction
2. Historical research and semantic technology
3. Historical Information Science
4. Semantic technologies for Historical Information Science
5. Current Historical Semantic Web
6. Conclusions
References

This paper discusses how semantic technology (the Semantic Web and linked data) could enhance online research in economic and social history (section 4.2). In the introductory sections, the authors explain the differences between structured, semi-structured and unstructured resources from an IT point of view. Structured resources are encoded, for example, as relational databases, XML files, spreadsheet workbooks or RDF triple stores. Whether Semantic Web technologies can be applied to digital historical resources depends strongly on how structured those resources are: the less structured a resource, the more complex the algorithms and workflows needed to transform or model its data in a formal language. Section 3.2.1 describes the specific semantic interoperability challenges these workflows may encounter when extracting and transforming the information contained in the sources. Section 4 starts with a brief description of the relevant semantic technologies (RDF, OWL and SPARQL), after which the authors describe how these technologies can help historians bridge semantic gaps in their datasets, for example by using historical OWL ontologies and OWL reasoners to support knowledge discovery through inference. To map the concepts and problem definitions articulated in historical information science to approaches developed by the Semantic Web community, the authors reviewed a variety of publications, projects, datasets and technologies. Section 5 describes how this desk research was conducted, in combination with eight interviews with Dutch pioneers in the area. Section 5 also presents a collection of 67 contributions that were selected, analysed and categorized because they advance some area of semantic historical computing.
The ultimate aim of this domain is to build an open, worldwide, persistent online graph of linked historical knowledge. The authors participate in the e-Research group at DANS and in the CEDAR project at the Dutch research institute KNAW. In their conclusion they claim that semantic technologies are suitable for representing the inner semantics implicitly contained in historical sources, which can be identified, formalized and linked using the cited tools. The authors hope that current developments, known as eHumanities or Digital Humanities, might pave the way for integrating semantic technologies into a new domain called eHistory.
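
To make the idea of knowledge discovery through inference more concrete, the following is a minimal sketch in plain Python, not taken from the paper. It assumes a toy set of triples about occupations in a historical source; the `ex:` names, the class hierarchy and the helper functions `infer` and `instances_of` are all hypothetical. It applies only a simple subclass rule, which is a small stand-in for what a real OWL reasoner over an RDF triple store would do.

```python
# Toy illustration of subclass inference over RDF-like triples.
# All identifiers (ex:jan, ex:Weaver, ...) are hypothetical examples,
# and this rule set is far smaller than what an OWL reasoner supports.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Each fact is a (subject, predicate, object) tuple.
triples = {
    ("ex:jan", TYPE, "ex:Weaver"),
    ("ex:piet", TYPE, "ex:Spinner"),
    ("ex:klaas", TYPE, "ex:Farmer"),
    ("ex:Weaver", SUBCLASS, "ex:TextileWorker"),
    ("ex:Spinner", SUBCLASS, "ex:TextileWorker"),
    ("ex:TextileWorker", SUBCLASS, "ex:IndustrialWorker"),
    ("ex:Farmer", SUBCLASS, "ex:AgriculturalWorker"),
}


def infer(facts):
    """Return facts plus everything derivable from the subclass rule.

    Rule: if (x, p, C) holds for p in {rdf:type, rdfs:subClassOf}
    and (C, rdfs:subClassOf, D) holds, then (x, p, D) holds.
    The rule is applied repeatedly until no new triple appears.
    """
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p not in (TYPE, SUBCLASS):
                continue
            for s2, p2, o2 in closed:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= closed:  # fixed point reached
            return closed
        closed |= new


def instances_of(cls, facts):
    """List the subjects typed as cls, in sorted order."""
    return sorted(s for s, p, o in facts if p == TYPE and o == cls)


if __name__ == "__main__":
    print(instances_of("ex:TextileWorker", triples))
    print(instances_of("ex:TextileWorker", infer(triples)))
```

Before inference, a query for textile workers finds nothing, because the source only records the specific occupations. After inference, the weaver and the spinner are both found, which is the kind of semantic gap the summary says ontologies and reasoners can help close.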

The relevance of this paper lies in its presentation of the state of the art in semantic technologies applied to historical datasets, together with its survey of recent contributions to research in semantic historical computing. Given that the technology described deals with text processing and mining, it is also relevant to online AV archives, where speech recognition, context information, ontologies and controlled vocabularies are increasingly combined to make audiovisual content more accessible.