G. Zenz, N. Tahmasebi, and T. Risse have been invited to submit an extended version of their paper “Language Evolution On The Go” to the Journal on Multimedia Tools and Applications.
The paper “On the Applicability of Word Sense Discrimination on 201 Years of Modern English”, co-written by N. Tahmasebi, K. Niklas, G. Zenz, and T. Risse, has been submitted to the Journal of Computational Linguistics.
Word sense discrimination is an important first step towards automatic detection of language evolution within large, historic document collections. By comparing the word senses found over time, we can reveal and use important information that will improve the understanding and accessibility of a digital archive. Algorithms for word sense discrimination have been developed with today’s language in mind and have thus been evaluated on well-selected, modern datasets. The quality of the word senses found in the discrimination step has a large impact on the detection of language evolution. Therefore, as a first step, we verify that word sense discrimination can successfully be applied to digitized historic documents and that the results correctly correspond to word senses. Because the accessibility of digitized historic collections is also influenced by the quality of the optical character recognition (OCR), as a second step we investigate the effects of OCR errors on word sense discrimination results. All evaluations in this paper are performed on The Times Archive, a collection of newspaper articles from 1785 to 1985.
The paper entitled “Language Evolution On The Go” by G. Zenz, N. Tahmasebi and T. Risse will be presented at SAME 2010
Knowing about the evolution of a term can significantly decrease the time needed to search for information. It can also help in quickly getting a broader overview, which is essential when one is on the move. In this paper we present a solution for providing language evolution knowledge “on the go”. At the 3rd International Workshop on Semantic Ambient Media Experience 2010, held on November 10th in conjunction with AmI-10 in Malaga, Spain, the LiWA project will present a mobile interface for easy access and visualization, as well as an overview of how this evolution was found.
The paper entitled “Terminology Evolution Module for Web Archives in the LiWA Context” by N. Tahmasebi, G. Zenz, T. Risse and T. Iofciu has been accepted for presentation at IWAW 2010
This paper presents the LiWA terminology evolution module, TeVo, which takes us one step closer to fully automatic detection of terminology evolution. TeVo consists of a pipeline, based on the UIMA framework, for finding evolution in web archives. The module has two main processing chains: the first for WARC file extraction and text processing, and the second for finding terminology evolution. The paper also presents the TeVo browser, a terminology evolution browser that aids in exploring the evolution of terms present in archives.
A paper entitled “Using Word Sense Discrimination on Historic Document Collections” has been accepted for presentation at the 10th ACM/IEEE JCDL
The paper entitled “Using Word Sense Discrimination on Historic Document Collections” by Nina Tahmasebi, Kai Niklas, Thomas Theuerkauf and Thomas Risse has been accepted at the 10th ACM/IEEE Joint Conference on Digital Libraries. The paper evaluates word sense discrimination on historic document collections to investigate whether word senses can be found automatically when modern technology is applied to historic data. The paper also investigates what impact OCR errors, present in scanned historic documents, have on finding word senses automatically. Finding word senses automatically is the first step towards detecting terminology evolution and hence an important step in our research. Nina Tahmasebi will present the paper on June 22nd, 2010 at JCDL, which is held in conjunction with ICADL in Surfers Paradise (Gold Coast, Australia).
A paper on First Results on Detecting Term Evolutions has been accepted at IWAW 2009
The paper “First Results on Detecting Term Evolutions” by Nina Tahmasebi, Sukriti Ramesh and Thomas Risse was accepted and presented at IWAW09, which took place on the 30th of September and 1st of October 2009, in conjunction with ECDL in Corfu, Greece. The paper presents first results on detecting term evolutions.
A paper on Automatic Detection on Terminology Evolution has been accepted at On The Move Academy in conjunction with On The Move Federated Conferences 2009
The paper entitled “Automatic Detection on Terminology Evolution” by Nina Tahmasebi has been accepted for presentation at the On The Move Academy 2009, in conjunction with the On The Move Federated Conferences, Vilamoura, Portugal 2009. The paper won the Best Paper Award, which was handed out during the social event on November 4. The paper presents a Ph.D. proposal on the topic of detecting term evolutions for use in information retrieval in long-term archives.
Around 40 participants attended IWAW2009, which took place on September 30 and October 1, 2009, in conjunction with ECDL in Corfu (Greece). The workshop provided a comprehensive overview of active research and practice on the preservation of the Web. This year’s workshop also addressed several new approaches and research topics (from virtual worlds preservation to the temporal dimension of Web archives) as well as practical issues faced by archiving institutions, specifically with respect to managing the storage of large volumes of digital material. In this context, a special session was devoted to the WARC storage format, which has been accepted as a new ISO standard (ISO 28500:2009), as well as to emerging tool support for handling these container objects. In general, scalability issues and the management of large-volume crawls were topics of intensive discussion, drawing on the growing body of experience now available in numerous institutions that run Web archiving activities in a range of different configurations.
A paper on dealing with terminology evolution in web archives has been accepted in the 12th International Workshop on the Web and Databases (WebDB 2009)
The paper entitled ‘Bridging the Terminology Gap in Web Archive Search’ by Klaus Berberich, Srikanta Bedathur, Mauro Sozio, and Gerhard Weikum has been accepted in the 12th International Workshop on the Web and Databases (WebDB 2009). The paper proposes a method to find query reformulations that paraphrase users’ information needs using past terminology. Such query reformulations are key to retrieving old but highly relevant documents in web archives that were written using now outdated terminology. Klaus Berberich will present the paper on June 28th, 2009 at WebDB 2009, which is held in conjunction with SIGMOD 2009 in Providence (Rhode Island, USA).
Presented at IWAW 08 by Radu Pop, Wolf Siberski, Mark Williamson
See presentation here
Presented at IWAW 08 by Nina Tahmasebi, Tereza Iofciu, Thomas Risse, Claudia Niederée, Wolf Siberski
The correspondence between the terminology used for querying and the one used in the content objects to be retrieved is a crucial prerequisite for effective retrieval technology. However, as terminology evolves over time, a growing gap opens up between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure “semantic” accessibility of (Web) archive content in the long run. As a starting point for dealing with terminology evolution, this paper formalizes the problem and discusses issues, first ideas and relevant technologies.