IWAW Proceedings online
IWAW09 took place on the 30th of September and 1st of October 2009, in conjunction with ECDL in Corfu (Greece). The proceedings are now available online.
Around 40 participants attended IWAW2009, which took place on Sep. 30 / Oct. 1 2009, in conjunction with ECDL in Corfu (Greece). The workshop provided a comprehensive overview of active research and practice in the preservation of the Web. This year's workshop also addressed several new approaches and research directions (from the preservation of virtual worlds to the temporal dimension of Web archives), as well as practical issues faced by archiving institutions, specifically the management and storage of large volumes of digital material. In this context, a special session was devoted to the WARC storage format, which has been accepted as a new ISO standard (ISO 28500:2009), and to the emerging tools for handling these container objects. More generally, scalability and the management of large-volume crawls were discussed intensively, drawing on the growing body of experience now available at numerous institutions that run Web archiving activities in a range of different configurations.
Talk about “Turning pure Web Page Storages into Living Web Archives” at Cultural Heritage on line
The LiWA applications and the project's R&D challenges will be presented at the conference "Cultural Heritage on line. Empowering users: an active role for user communities" in Florence, Italy, on the 15th and 16th of December 2009.
Web content plays an increasingly important role in the knowledge-based society, and the preservation and long-term accessibility of Web history is highly valuable (e.g., for scholarly studies, market analyses and intellectual property disputes). Libraries and archival organizations, as well as emerging industrial services, show rapidly growing interest in its preservation. The characteristics of Web content (high dynamics, volatility, and a wide variety of contributors and formats) make adequate Web archiving a challenge.
LiWA will go beyond merely "freezing" snapshots of Web content for long-term storage, turning pure snapshot storage into a "Living" Web Archive. To create Living Web Archives, the LiWA project will address R&D challenges in three areas: Archive Fidelity, Archive Coherence and Archive Interpretability. The results of the project will be demonstrated in two application scenarios, the "Streaming Archive" and the "Social Web Archive". The Streaming Archive application will showcase the building of an audio-visual Web archive and show how Web information related to audio and video broadcasts can be preserved. The Social Web Archive application will demonstrate how Web archives can capture the dynamics and the different types of user interaction of the social Web.
Talk “From Web page storages to Living Web Archive” in London
Dr. Thomas Risse (L3S) will give a talk at the “JISC, the DPC and the UK Web Archiving Consortium Workshop” at The British Library Conference Centre in London on July 21st.
The paper “From Web page storages to Living Web Archive” will be presented by Dr. Thomas Risse at the JISC, the DPC and the UK Web Archiving Consortium Workshop, which will take place at The British Library Conference Centre in London on July 21st.
SHARC: Framework for Quality-Conscious Web Archiving
A paper on quality-conscious Web archiving has been accepted at the 35th International Conference on Very Large Data Bases (VLDB 2009).
The paper on quality-conscious Web archiving by Dimitar Denev, Arturas Mazeika, Marc Spaniol and Gerhard Weikum has been accepted for presentation at the 35th International Conference on Very Large Data Bases (VLDB 2009). The conference takes place on 24-28 August in Lyon, France. The paper presents the SHARC framework for assessing data quality in Web archives and for tuning capturing strategies towards better quality with given resources. The paper defines quality measures, characterizes their properties, and derives a suite of quality-conscious scheduling strategies for archive crawling.
Talk “Data Quality in Web Archiving”
A talk on “Data Quality in Web Archiving” will be given at the 3rd Workshop on Information Credibility on the Web (WICOW 2009) in Madrid (Spain) on Monday, April 20.
The paper “Data Quality in Web Archiving” by Marc Spaniol, Dimitar Denev, Arturas Mazeika, Pierre Senellart and Gerhard Weikum will be presented at the 3rd Workshop on Information Credibility on the Web (WICOW 2009). The workshop, including the paper presentation, takes place on April 20 in Madrid, Spain, and is organized in conjunction with the 18th International World Wide Web Conference (WWW 2009). The paper addresses the problem that capturing a large Web site may take hours or even days, which increases the risk that the content collected so far is incoherent with the parts that are still to be crawled. The paper introduces a model for identifying coherent sections of an archive and thus for measuring data quality in Web archiving. Additionally, it introduces a crawling strategy that aims to ensure archive coherence by minimizing the diffusion of Web site captures.
Lecture “Web Archiving” at Stuttgart Media University (HdM) on March 17, 2009
Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the course “Creation of an E-learning Module for Web Archiving” on March 17, 2009 at Stuttgart Media University (HdM), Germany.
On March 17, 2009, Dr. Marc Spaniol from the Max-Planck-Institute for Computer Science presented ongoing research from the LiWA project to students and scientists of Stuttgart Media University (HdM). During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence. The lecture was part of an elective course on “Creation of an E-learning Module for Web Archiving” within the Library and Information Management program. The course is organized by Prof. Markus Hennies and Prof. Heidrun Wiesenmüller M.A. The lecture was open to the public and took place at 14:15 on March 17 at HdM's site in Wolframstraße. Guests interested in Web archiving in general and LiWA in particular were welcome.
Lecture “Web Archiving” held at RWTH Aachen University
Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the “Web Science” course on November 28, 2008 at RWTH Aachen University, Germany.
The course on Web Science at RWTH Aachen University, organized by Prof. Dr. Matthias Jarke and Dr. Ralf Klamma of Lehrstuhl Informatik 5, addresses Web Science as a new and challenging field of study in computer science. The course covers a wide range of current and emerging Web concepts, technologies and Web-based software systems. To present recent approaches in Web archiving, Dr. Marc Spaniol from the Max-Planck-Institute for Computer Science was invited to present ongoing research from the LiWA project. During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence.
Half-day session on LiWA during IWAW
A dedicated session took place during the 8th International Web Archiving Workshop.
Over 70 Web archivists and researchers in this domain attended the 8th edition of IWAW, during which a full session was dedicated to presenting LiWA's research objectives and early results.
The audience asked many questions and showed a lot of interest, which is a good sign for us. Links to the presentations from this session are listed below:
Web Spam: a Survey with Vision for the Archivist
Andras Benczur, David Siklosi, Jacint Szabo, Istvan Biro, Zsolt Fekete, Miklos Kurucz, Attila Pereszlenyi, Simon Racz, Adrienn Szabo (paper, presentation)
Terminology Evolution in Web Archiving: Open Issues
Nina Tahmasebi, Tereza Iofciu, Thomas Risse, Claudia Niederée, Wolf Siberski (paper, presentation)
LiWA Architecture
Radu Pop, Wolf Siberski, Mark Williamson (presentation)
“Catch me if you can”. Temporal Coherence of Web Archives
Marc Spaniol (presentation)
The Challenge of Dynamic Links
Mark Williamson (presentation)
