Updated September 25, 2008. Area: Temporal Coherence.
How to avoid the “lost in cyberspace syndrome” in web archives
The goal of this research area is to develop novel methods that allow archives to provide temporal coherence and so achieve a higher level of quality. This includes methods for the proper dating of Web pages, which is difficult because capturing a single Web site may take hours, during which pages can change or become unavailable. Specifically, the work package objectives are:
- Provide coherent crawls with complete and correct temporal metadata, including time-aware reference sources such as the Wikipedia history.
- Reconcile temporal information across multiple crawls and/or multiple archives.
- Optimize efficiency for time-aware Web crawls.
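
To make the notion of temporal coherence concrete, here is a minimal sketch of one possible check, not the project's actual method. It assumes each capture records the time it was fetched and, where available, a last-modified timestamp (for example from an HTTP `Last-Modified` header), and treats the captured content as valid between those two instants. A set of captures is then considered coherent if all validity intervals share at least one common instant. The names `Capture` and `coherent_window` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Capture:
    """One archived page (hypothetical record layout)."""
    url: str
    fetched_at: datetime
    last_modified: Optional[datetime] = None  # unknown if the server sent none

    def validity(self) -> Tuple[datetime, datetime]:
        # Assumption: the captured content existed unchanged from
        # last_modified until fetched_at. Without last_modified we only
        # know the content was valid at the instant it was fetched.
        start = self.last_modified if self.last_modified is not None else self.fetched_at
        return (min(start, self.fetched_at), self.fetched_at)


def coherent_window(captures: List[Capture]) -> Optional[Tuple[datetime, datetime]]:
    """Return the time window in which all captures were valid together,
    or None if no such window exists (the crawl is not provably coherent)."""
    if not captures:
        return None
    start = max(c.validity()[0] for c in captures)
    end = min(c.validity()[1] for c in captures)
    return (start, end) if start <= end else None


if __name__ == "__main__":
    day = lambda h: datetime(2008, 9, 25, h, 0)
    crawl = [
        Capture("http://example.org/a", fetched_at=day(10), last_modified=day(8)),
        Capture("http://example.org/b", fetched_at=day(12), last_modified=day(9)),
    ]
    print(coherent_window(crawl))
```

Under these assumptions, the two example pages were both valid between 09:00 and 10:00, so the crawl can be dated to that window. A page modified after another page was fetched would leave no common window, and the function would return `None`.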