Archive fidelity
Updated September 24, 2008. Area: Archive Fidelity.

Enabling complete and faithful capture of web content

The objective of LiWA in this domain is to dramatically improve the fidelity of Web archives by enabling the capture of content that defeats current Web capture tools. This comprises the ability to find links to resources regardless of their encoding by using virtual browsing, the detection and capture of the structural hidden Web, and the capacity to handle streaming protocols in order to capture rich media Web sites. The specific goals are:

- Ensure that the requirements are comprehensively documented and published, together with demonstrations of the state of the art.

- Extend the current Web capture state of the art using multiple technologies and strategies, for example a virtual browser with an integrated ECMAScript/JavaScript engine, a tethered browser, and advanced Web scraping.

- Develop test data and test scripts that provide metrics for the capture of dynamic, hidden and rich content.

- Demonstrate measurable improvements in Web capture as the new technologies are deployed, achieving a statistically significant improvement over the current state of the art by the end of the LiWA project.

