Thomas Risse (L3S) presented LiWA and the problem of terminology evolution at the Institute for Natural Language Processing of the University of Stuttgart
Due to the central role that the World Wide Web plays in nearly all areas of life today, adequate Web archiving has become a cultural necessity in preserving knowledge. A first generation of Web archiving technology has been built by pioneers in the domain, based on existing search technology. The next generation of Web archiving technologies will overcome limitations in content capture, preservation, analysis and enrichment. It is the goal of the LiWA project (Living Web Archives, IST FP7 216267) to turn Web archives from pure Web page stores into “living Web archives”. Such living archives will be capable of handling a variety of content types and of dealing with evolution as well as with long-term archive interpretability.
One important aspect is archive interpretability. The correspondence between the terminology used for querying and the terminology used in the content objects to be retrieved is a crucial prerequisite for effective content access based on retrieval technology. However, as terminology evolves over time, a growing gap opens up between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure “semantic” accessibility of (Web) archive content in the long run.
In this talk we give an overview of the LiWA project and present the problem of terminology evolution in more detail, giving a more formal problem statement and discussing issues, first ideas and relevant technologies.