Presented at IWAW 08 by Andras Benczur, David Siklosi, Jacint Szabo, Istvan Biro, Zsolt Fekete, Miklos Kurucz, Attila Pereszlenyi, Simon Racz, Adrienn Szabo
Web archive quality is endangered by Web spam, a side effect of the high commercial value of top-ranked search-engine results; so far, however, Web spam filtering technologies are rarely used by Web archivists. In this paper we make the first attempt to disseminate existing methodology and envision a solution that lets Web archives share knowledge and unite their efforts in Web spam hunting. We survey the state of the art in Web spam filtering, illustrated by the recent Web Spam Challenge data sets and techniques, and describe the filtering solution for archives envisioned in the LiWA project.