Wrapping Web Pages into XML Documents with Norfolk
Anne-Marie Vercoustre
CSIRO-MIS Technologies for Electronic Documents Group
Tuesday 16 October at 11am
Abstract
The notion of wrapping a web server into XML documents is driven
by the need for structured data that can be used by a variety
of applications. The web contains vast amounts of information
that is useless to most applications since it is mainly targeting
a human audience. A solution to this would be to automate the
browsing process and then convert the extracted information into
a more suitable format, such as XML. This is called wrapping. We
have used two different tools to wrap several tourist sites into
XML. The tools we have used are Norfolk, a system developed
since 1997 by the TED group, and W4F, initially developed at the
University of Pennsylvania and now a commercial product.
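To make the idea of wrapping concrete, the following is a minimal sketch in Python using only the standard library. It is not Norfolk or W4F and does not reflect how either tool works; the sample page, the `hotel` class name, and the output element names are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical page fragment standing in for a fetched tourist page.
SAMPLE_HTML = """
<html><body>
  <h1>Hotels in Melbourne</h1>
  <ul>
    <li class="hotel">Harbour View</li>
    <li class="hotel">Garden Lodge</li>
    <li class="note">Prices may vary</li>
  </ul>
</body></html>
"""


class HotelExtractor(HTMLParser):
    """Collects the text of every <li class="hotel"> element."""

    def __init__(self):
        super().__init__()
        self.in_hotel = False
        self.hotels = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "hotel") in attrs:
            self.in_hotel = True
            self.hotels.append("")

    def handle_data(self, data):
        if self.in_hotel:
            self.hotels[-1] += data

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_hotel = False


def wrap_to_xml(html_text):
    """Extract hotel names from the HTML and return them as an XML string."""
    parser = HotelExtractor()
    parser.feed(html_text)
    parser.close()
    root = ET.Element("hotels")  # hypothetical output vocabulary
    for name in parser.hotels:
        ET.SubElement(root, "hotel").text = name.strip()
    return ET.tostring(root, encoding="unicode")


xml_text = wrap_to_xml(SAMPLE_HTML)
print(xml_text)
```

A real wrapper would also have to automate the browsing step (following links and submitting forms) and cope with changes in page layout, which this sketch leaves out.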
This presentation will introduce the general tasks of wrappers
and will present Norfolk, a system initially developed for creating
virtual documents from heterogeneous sources. It has recently
been extended to cater for the creation of XML documents for the
purpose of wrapping.
It will also discuss the limitations of current approaches and
will suggest some future research directions.
Short resume
Dr. Anne-Marie Vercoustre is a senior researcher in the
Research group for Technology for Electronic Documents (TED) at
CSIRO Mathematical and Information Sciences Division, based in Melbourne.
Her main research interests are in structured documents (SGML-like),
document workflow, corporate memory, and the reuse of information
from heterogeneous and distributed sources. Before joining CSIRO
in September 2000, Anne-Marie was a researcher for more than
twenty years at INRIA, France, where she was involved in research
on syntax-directed programming environments and structured document
tools. She has participated in several European projects around
Software factories and Digital Libraries. She is currently on the
chair board of SIGWEB and on many conference program committees.