The LiWA partners are pleased to announce the open-source release of the complete set of components and tools produced by the LiWA project.
They are all grouped under the “liwa-technologies” project on Google Code:
1° The Rich Media Capture Module - a plug-in dedicated to the capture of streaming video content:
2° The Temporal Coherence Analyser - a plug-in dedicated to the analysis of the temporal coherence of the archived Web content:
3° The Spam Assessment Interface - a Web service that enables the quality assessment of the archived Web content:
4° The Semantic Analyzer - a component dedicated to the detection of terminology evolution:
5° The Web Archive UI Framework - a client-side framework that helps create User Interface helpers for Web archive browsing:
To learn more about each component, see the wiki space of the Google project, which gives a brief description of each module and the steps needed to deploy it: http://code.google.com/p/liwa-technologies/w/list
You are all welcome to download and try out the LiWA components. Your feedback and comments will be greatly appreciated, helping us to improve the documentation and the usability of the technologies.
The LiWA Newsletter No. 3 is now available, summarizing the findings and results of the 36-month project. Enjoy reading it!
The paper entitled “Web spam classification: a few features worth more”, co-written by M. Erdélyi, A. Garzó, and A. A. Benczúr, has been accepted for presentation at Joint Web Quality 2011, held in conjunction with WWW2011, Hyderabad, India, ACM Press 2011.
In this paper we investigate how much various classes of Web spam features, some requiring very high computational effort, add to the classification accuracy. We find that advances in machine learning, an area that has received less attention in the adversarial IR community, yield more improvement than new features and result in low-cost yet accurate spam filters.
Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the course “Creation of an E-learning Module for Web Archiving” on March 17, 2009 at Stuttgart Media University (HdM), Germany.
On March 17, 2009, Dr. Marc Spaniol from Max-Planck-Institute for Computer Science presented ongoing research taking place within the LiWA project to students and scientists of Stuttgart Media University (HdM). During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence. The lecture was part of an elective course on “Creation of an E-learning Module for Web Archiving” in the scope of the Library and Information Management program. This course is organized by Prof. Markus Hennies and Prof. Heidrun Wiesenmüller M.A. The lecture was open to the public and took place at 14.15 on March 17 at HdM’s site in Wolframstraße. Guests interested in Web archiving, and in LiWA in particular, were welcome.
Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the “Web Science” course on November 28, 2008 at RWTH Aachen University, Germany.
The course on Web Science at RWTH Aachen University, organized by Prof. Dr. Matthias Jarke and Dr. Ralf Klamma of Lehrstuhl Informatik 5, addresses Web Science as a new and challenging field of study in computer science. This course covers the wide range of current and emerging Web concepts, technologies and Web-based software systems. In order to present recent approaches in the scope of Web archiving, Dr. Marc Spaniol from Max-Planck-Institute for Computer Science was invited to present ongoing research taking place within the LiWA project. During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence.
A dedicated session took place during the 8th International Web Archiving Workshop.
Over 70 web archivists and researchers in this domain attended the 8th edition of IWAW, during which a full session was dedicated to presenting research objectives and early results from LiWA.
There were many questions and much interest from the audience, which is a good sign for us. Links to the presentations from this session are listed below:
Web Spam: a Survey with Vision for the Archivist
Andras Benczur, David Siklosi, Jacint Szabo, Istvan Biro, Zsolt Fekete, Miklos Kurucz, Attila Pereszlenyi, Simon Racz, Adrienn Szabo (paper, presentation)
Radu Pop, Wolf Siberski, Mark Williamson (presentation)
“Catch me if you can”. Temporal Coherence of Web Archives
Marc Spaniol (presentation)
The Challenge of Dynamic Links
Mark Williamson (presentation)