Metadata Provenance in Europeana and the Semantic Web

Contents:

1 Introduction
2 Metadata and the Semantic Web
3 Provenance
4 Metadata Provenance
5 Metadata Provenance in Europeana
6 Discussion
References

This thesis gives an overview of approaches and best practices for handling metadata provenance in the Semantic Web. It first explains what provenance metadata is and how it can be represented in RDF. In chapter 3, the mapping between Dublin Core and PROV (a data model for specifying the activities that affect the described resource) demonstrates how much provenance information is “hidden” in Dublin Core metadata. The PROV vocabulary is recommended if the whole provenance chain of a resource has to be tracked, possibly with additional information about the underlying workflow and the lifecycle of the resource. Chapter 4 investigates the difficulties of representing the provenance of metadata and possible approaches to doing so. RDF is used to decouple metadata from the resources it describes. To this end, the author proposes an extension of the Dublin Core Abstract Model (DCAM), together with a revision that formulates DCAM in RDF. Chapter 5 works out a graph-based Europeana Data Model (EDM) in which metadata provenance can be implemented with the next RDF version. The thesis ends with a list of four cornerstones proposed for the next version of the EDM and an explanation of why Europeana needs metadata provenance.

Keeping track of provenance information about metadata becomes a necessity when existing data is enriched via automatic indexing. For Linked Data practitioners, this thesis presents a technical representation of provenance in RDF. It is also relevant reading for archivists who strive to ensure that the origins, conditions, rules and other means of production of every statement within their (audiovisual) archive are known and can be used to put that statement into the right context.
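
As a rough illustration of how provenance information can be “hidden” in Dublin Core metadata, the following Python sketch rewrites a few Dublin Core statements as PROV statements. It is not taken from the thesis: the choice of four terms is a simplified assumption, the function name extract_provenance is hypothetical, and the example.org URIs are placeholders. Triples are modelled as plain tuples so that no RDF library is required.

```python
# Illustrative sketch only: a simplified, assumed subset of Dublin Core terms
# and the PROV properties they are commonly related to. This is not the
# mapping table worked out in the thesis.
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"

DC_TO_PROV = {
    DCT + "creator": PROV + "wasAttributedTo",
    DCT + "publisher": PROV + "wasAttributedTo",
    DCT + "source": PROV + "wasDerivedFrom",
    DCT + "created": PROV + "generatedAtTime",
}


def extract_provenance(triples):
    """Return the PROV triples implied by the Dublin Core triples in `triples`.

    Each triple is a (subject, predicate, object) tuple of strings.
    Triples whose predicate is not in DC_TO_PROV are ignored.
    """
    return [(s, DC_TO_PROV[p], o) for (s, p, o) in triples if p in DC_TO_PROV]


# Hypothetical metadata record; all example.org URIs are placeholders.
record = [
    ("http://example.org/doc/1", DCT + "title", "A sample document"),
    ("http://example.org/doc/1", DCT + "creator", "http://example.org/agent/alice"),
    ("http://example.org/doc/1", DCT + "source", "http://example.org/doc/0"),
]

prov_triples = extract_provenance(record)
for triple in prov_triples:
    print(triple)
```

In this sketch the title statement carries no provenance and is dropped, while the creator and source statements are restated as attribution and derivation. Tracking a full provenance chain, as the abstract notes, would additionally require the activities and workflow that PROV can describe.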