Talk “Data Quality in Web Archiving”

Posted April 16, 2009
Areas : Temporal Coherence, General, Events.

A talk on “Data Quality in Web Archiving” will be given at the 3rd Workshop on Information Credibility on the Web (WICOW 2009) in Madrid (Spain) on Monday, April 20.

The paper on “Data Quality in Web Archiving” by Marc Spaniol, Dimitar Denev, Arturas Mazeika, Pierre Senellart and Gerhard Weikum will be presented at the 3rd Workshop on Information Credibility on the Web (WICOW 2009). The Workshop and paper presentation takes places on April 20 at Madrid, Spain and is organized in conjunction with the 18th International World Wide Web Conference (WWW 2009). The paper addresses the problems of capturing a large Web site that may span hours or even days, which increases the risk that contents collected so far are incoherent with the parts that are still to be crawled. The paper introduces a model for identifying coherent sections of an archive and, thus, measuring the data quality in Web archiving. Additionally, a crawling strategy is introduced that aims to ensure archive coherence by minimizing the diffusion of Web site captures.