The SOLAR System for Sharp Web Archiving

Posted October 22, 2010
Areas: Temporal Coherence, General.

The paper entitled “The SOLAR System for Sharp Web Archiving” by A. Mazeika, D. Denev, M. Spaniol and G. Weikum has been accepted for presentation at IWAW 2010.

This paper presents the SOLAR (Scheduling of Downloads for Archiving of Web Sites) system for sharp Web archiving. SOLAR crawls all pages of a Web site and then re-crawls the visited pages, so that each page gets a visit-revisit interval. If all visit-revisit intervals overlap and no page changed between its visit and its revisit, then all pages are “sharp”: the site is captured as if it had been downloaded instantaneously. SOLAR judiciously schedules visits and revisits to maximize the number of sharp pages, based on predictions of page-specific change rates. Experiments with synthetic data show that SOLAR outperforms existing techniques and captures sites as sharply as possible.
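To make the idea of sharpness and change-rate-aware scheduling more concrete, here is a minimal sketch under several assumptions that are not taken from the paper: pages change according to a Poisson process with a known rate λ (so a page stays unchanged over an interval of length d with probability exp(-λd)), the crawler performs one download per time unit, and all visits happen before all revisits. The nested ordering below (slowest-changing pages visited first and revisited last) is only an illustrative heuristic, not necessarily SOLAR's actual scheduler, and the names `Page`, `nested_schedule`, `intervals_overlap` and `expected_sharp` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    change_rate: float  # assumed Poisson rate (expected changes per time unit)


def nested_schedule(pages):
    """Hypothetical schedule: one download per time unit, all visits first,
    then revisits in reverse order, so every interval contains [n-1, n].
    Slow-changing pages get the longest intervals, fast-changing the shortest."""
    order = sorted(pages, key=lambda p: p.change_rate)
    n = len(order)
    return {p.url: (i, 2 * n - 1 - i) for i, p in enumerate(order)}


def intervals_overlap(schedule):
    """True if all [visit, revisit] intervals share at least one common time point."""
    latest_visit = max(v for v, _ in schedule.values())
    earliest_revisit = min(r for _, r in schedule.values())
    return latest_visit <= earliest_revisit


def expected_sharp(pages, schedule):
    """Expected number of sharp pages under the Poisson assumption:
    P(page unchanged over an interval of length d) = exp(-rate * d)."""
    if not intervals_overlap(schedule):
        raise ValueError("this sketch only scores schedules whose intervals all overlap")
    total = 0.0
    for p in pages:
        visit, revisit = schedule[p.url]
        total += math.exp(-p.change_rate * (revisit - visit))
    return total


if __name__ == "__main__":
    pages = [Page("a", 0.01), Page("b", 0.5), Page("c", 0.1)]
    nested = nested_schedule(pages)
    # Naive alternative: revisit pages in the same order they were visited.
    same_order = {p.url: (i, len(pages) + i) for i, p in enumerate(pages)}
    print(round(expected_sharp(pages, nested), 3))      # → 2.299
    print(round(expected_sharp(pages, same_order), 3))  # → 1.934
```

In this toy example both schedules use the same total interval length, but the nested order gives the fastest-changing page the shortest interval, which raises the expected number of sharp pages.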