LiWA technologies released as open source

Posted August 30, 2010
Areas : Archive Fidelity, Spam Cleansing, Temporal Coherence, Semantic Evolution, Social Web, Rich Media, General.

The LiWA partners are pleased to announce the open-source release of the complete set of components and tools produced by the LiWA project.

They are all grouped under the “liwa-technologies” project on Google Code:
http://code.google.com/p/liwa-technologies/.


1° The Rich Media Capture Module - a plug-in dedicated to the capture of streaming video content:
http://code.google.com/p/liwa-technologies/source/browse/rich-media-capture
http://code.google.com/p/liwa-technologies/downloads/detail?name=rich-media-capture-plugin-1.0.jar

2° The Temporal Coherence Analyser - a plug-in dedicated to the analysis of the temporal coherence of the archived Web content:
http://code.google.com/p/liwa-technologies/source/browse/temporal-coherence

3° The Spam Assessment Interface - a Web service that enables the quality assessment of the archived Web content:
http://code.google.com/p/liwa-technologies/source/browse/assessment-interface

4° The Semantic Analyser - a component dedicated to the detection of terminology evolution:
http://code.google.com/p/liwa-technologies/source/browse/SemanticAnalyser
http://code.google.com/p/liwa-technologies/downloads/detail?name=SemanticAnalyser-1.0.zip

5° The Web Archive UI Framework - a client-side framework that helps create user-interface helpers for Web archive browsing:
http://code.google.com/p/liwa-technologies/source/browse/web-archive-ui-framework



To learn more about each component, see the project's wiki on Google Code, which gives a brief description of each module and the steps needed to deploy it: http://code.google.com/p/liwa-technologies/w/list



You are all welcome to download and try out the LiWA components. Your feedback and comments will be greatly appreciated, helping us to improve the documentation and the usability of the technologies.

 

LiWA Third Newsletter published

Posted April 07, 2011
Areas : Archive Fidelity, Spam Cleansing, Temporal Coherence, Semantic Evolution, Social Web, Rich Media, General, Events.

The LiWA Newsletter No. 3 is now available, summarizing the findings and results of the 36-month project. Enjoy reading it!

 

Web spam classification: a few features worth more

Posted March 10, 2011
Areas : Spam Cleansing, General, Events.

The paper entitled “Web spam classification: a few features worth more”, co-written by M. Erdélyi, A. Garzó, and A. A. Benczúr, has been accepted for presentation at Joint Web Quality 2011, held in conjunction with WWW2011, Hyderabad, India (ACM Press, 2011).

In this paper we investigate how much various classes of Web spam features, some requiring very high computational effort, add to the classification accuracy. We find that advances in machine learning, an area that has received less attention in the adversarial IR community, yield more improvement than new features and result in low-cost yet accurate spam filters.

 

Lecture “Web Archiving” at Stuttgart Media University (HdM) on March 17, 2009

Posted February 20, 2009
Areas : Spam Cleansing, Temporal Coherence, Events.

Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the course “Creation of an E-learning Module for Web Archiving” on March 17, 2009 at Stuttgart Media University (HdM), Germany.

On March 17, 2009, Dr. Marc Spaniol from Max-Planck-Institute for Computer Science presented ongoing research taking place within the LiWA project to students and scientists of Stuttgart Media University (HdM). During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence. The lecture was part of an elective course on “Creation of an E-learning Module for Web Archiving” in the scope of the Library and Information Management program. This course is organized by Prof. Markus Hennies and Prof. Heidrun Wiesenmüller, M.A. The lecture was open to the public and took place at 14.15 on March 17 at HdM’s site in Wolframstraße. Guests interested in Web archiving and/or LiWA (in particular) were welcome.

 

Lecture “Web Archiving” held at RWTH Aachen University

Posted December 08, 2008
Areas : Spam Cleansing, Temporal Coherence, Events.

Marc Spaniol (Max-Planck-Institute for Computer Science) gave a lecture on “Web Archiving” within the scope of the “Web Science” course on November 28, 2008 at RWTH Aachen University, Germany.

The course on Web Science at RWTH Aachen University, organized by Prof. Dr. Matthias Jarke and Dr. Ralf Klamma of Lehrstuhl Informatik 5, addresses Web Science as a new and challenging field of study in computer science. The course covers a wide range of current and emerging Web concepts, technologies and Web-based software systems. To present recent approaches to Web archiving, Dr. Marc Spaniol from Max-Planck-Institute for Computer Science was invited to present ongoing research taking place within the LiWA project. During the 90-minute lecture, Dr. Spaniol introduced the main issues in Web archiving and presented examples of Web spam as well as crawling strategies that ensure temporal coherence.

 

Half-day session on LiWA during IWAW

Posted September 18, 2008
Areas : Archive Fidelity, Spam Cleansing, Temporal Coherence, Semantic Evolution, General, Events.

A dedicated session took place during the 8th International Web Archiving Workshop.

Over 70 web archivists and researchers in this domain attended the 8th edition of IWAW, during which a full session was dedicated to presenting the research objectives and early results of LiWA.
There were many questions and much interest from the audience, which is a good sign for us. Links to the presentations from this session are below:

Web Spam: a Survey with Vision for the Archivist
Andras Benczur, David Siklosi, Jacint Szabo, Istvan Biro, Zsolt Fekete, Miklos Kurucz, Attila Pereszlenyi, Simon Racz, Adrienn Szabo (paper, presentation)

Terminology Evolution in Web Archiving: Open Issues
Nina Tahmasebi, Tereza Iofciu, Thomas Risse, Claudia Niederée, Wolf Siberski (paper, presentation)

LiWA Architecture
Radu Pop, Wolf Siberski, Mark Williamson (presentation)

“Catch me if you can”. Temporal Coherence of Web Archives
Marc Spaniol (presentation)

The Challenge of Dynamic Links
Mark Williamson (presentation)