Updated September 25, 2008 — Area : Semantic Evolution.
Enabling legibility of web archives
Because the World Wide Web plays a central role in nearly all areas of today’s life, and because it grows and changes continuously, adequate Web archiving has become a cultural necessity for preserving knowledge. Archiving its content, a complex task in itself, is only the first step toward “full” content preservation. It must also be ensured that the content can be found and interpreted in the long run.
This type of semantic accessibility suffers from changes in language over time, especially over time frames beyond ten years. Language changes are triggered by various factors, including new insights, political and cultural trends, new legal requirements, and high-impact events. As an example, consider the name of the city of Saint Petersburg: this Russian city was founded in 1703 as “Sankt-Piter-Burh” and soon afterwards renamed “Saint Petersburg”. From 1914 to 1924 it was named “Petrograd” and afterwards “Leningrad”. In 1991 the name was changed back to “Saint Petersburg”. The evolution of terms is, of course, not restricted to location names, and the rate of terminology change clearly depends on the domain of discourse.
Because of this terminology development over time, searches that use standard information retrieval techniques and current terminology will not find all relevant content created in the past, when other terms were used to express the same content.
To keep Web archives semantically accessible, it is necessary to develop methods for dealing with terminology evolution automatically. This includes detecting terminology evolution as well as integrating knowledge about it into time-aware retrieval approaches, such as the one presented in. The query “Saint Petersburg” could, for example, be expanded with the right terms for the different periods when querying an archive (“Sankt-Piter-Burh” → “Saint Petersburg” → “Petrograd” → “Leningrad” → “Saint Petersburg”).
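The query expansion described above can be illustrated with a minimal Python sketch. It is not the retrieval approach referred to in the text: the names `NamePeriod`, `NAME_HISTORY`, and `expand_query` are hypothetical, the history table is written by hand rather than detected automatically, and the year in which “Sankt-Piter-Burh” gave way to “Saint Petersburg” is a placeholder, since the text only says “soon after”. Transition years are treated as belonging to both the old and the new name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NamePeriod:
    term: str
    start: int           # first year the term was in use (inclusive)
    end: Optional[int]   # last year in use (inclusive); None means still in use


# Hand-built history using the dates from the text.
# Assumption: 1703 as the end of the first period is only a placeholder.
NAME_HISTORY: List[NamePeriod] = [
    NamePeriod("Sankt-Piter-Burh", 1703, 1703),
    NamePeriod("Saint Petersburg", 1703, 1914),
    NamePeriod("Petrograd", 1914, 1924),
    NamePeriod("Leningrad", 1924, 1991),
    NamePeriod("Saint Petersburg", 1991, None),
]


def expand_query(query: str, history: List[NamePeriod],
                 from_year: int, to_year: int) -> List[str]:
    """Return the name variants in use at some point in [from_year, to_year]."""
    known = {p.term.lower() for p in history}
    if query.lower() not in known:
        return [query]  # no recorded evolution for this term; leave it unchanged

    variants: List[str] = []
    for p in history:
        # Two closed intervals overlap if each starts before the other ends.
        overlaps = p.start <= to_year and (p.end is None or p.end >= from_year)
        if overlaps and p.term not in variants:
            variants.append(p.term)
    return variants


if __name__ == "__main__":
    # Search archived content from the 1920s using today's name for the city.
    print(expand_query("Saint Petersburg", NAME_HISTORY, 1920, 1930))
```

Restricting the expansion to the queried time window keeps the query from matching unrelated uses of a term outside the period of interest. A real system would also have to handle the concept-level mapping, which this lookup table does not attempt.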
Adequately dealing with terminology evolution requires considering both the linguistic and the semantic layer: in addition to emerging and vanishing terms, it is precisely the change in the mapping between language (the terms used) and concepts (the intended meaning) that constitutes terminology evolution.