F. Kokkoras, N. Bassiliades, I. Vlahavas, “Modeling Information Extraction Wrappers with Conceptual Graphs”, Proc. (2nd Volume) 3rd Panhellenic Conference on Artificial Intelligence (SETN'04), Zitis Publications, ISBN 960-431-910-8, 5-8 May 2004, Samos, Greece, 2004.
Keywords:
information extraction, conceptual graphs, personalization, wrapper induction, Document Object Model.
In this paper, we propose using the Conceptual Graphs knowledge representation and reasoning formalism to model information extraction wrappers (CG-Wrappers). An information extraction wrapper is a mapping that populates a data repository with the objects implicitly present in a given web page. Creating a wrapper usually involves training, during which the wrapper learns to identify the desired information based mainly on the surrounding HTML elements. We demonstrate how the generalization, specialization, and projection operations of Conceptual Graph theory naturally support both wrapper induction and wrapper evaluation. The proposed modeling approach is flexible enough to support wrapper reuse, which in turn makes it possible to build more complex wrappers.
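As a rough illustration of the idea (not the paper's actual model), the sketch below reduces a wrapper to a linear chain of concepts describing the HTML context of a target value. Under that assumption, generalizing two training examples plays the role of wrapper induction, and projecting the resulting pattern onto a new context plays the role of wrapper evaluation. The type hierarchy, the `Concept` class, and all function names are hypothetical; full conceptual graphs also carry relation nodes and arbitrary graph structure, which are omitted here.

```python
from dataclasses import dataclass

# Hypothetical concept type hierarchy (child -> parent); not taken from the paper.
SUPERTYPE = {
    "B": "InlineTag", "I": "InlineTag",
    "InlineTag": "HtmlElement", "TD": "HtmlElement",
    "Text": "HtmlElement", "HtmlElement": None,
}
GENERIC = "*"  # generic referent: matches any individual


@dataclass(frozen=True)
class Concept:
    type: str
    referent: str = GENERIC


def ancestors(t):
    """Return t followed by all of its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = SUPERTYPE[t]
    return chain


def is_subtype(t, super_t):
    return super_t in ancestors(t)


def generalize(c1, c2):
    """Least common supertype; keep the referent only if both concepts agree."""
    common = set(ancestors(c2.type))
    new_type = next(t for t in ancestors(c1.type) if t in common)
    referent = c1.referent if c1.referent == c2.referent else GENERIC
    return Concept(new_type, referent)


def induce(example1, example2):
    """Induction step: generalize two equally long training contexts."""
    if len(example1) != len(example2):
        raise ValueError("this sketch only aligns contexts of equal length")
    return [generalize(a, b) for a, b in zip(example1, example2)]


def projects(pattern, target):
    """Evaluation step: every pattern concept must subsume its target concept."""
    return len(pattern) == len(target) and all(
        is_subtype(t.type, p.type) and p.referent in (GENERIC, t.referent)
        for p, t in zip(pattern, target)
    )


# Two training contexts that differ only in the inline tag around the label.
ex1 = [Concept("TD"), Concept("B"), Concept("Text", "Price:")]
ex2 = [Concept("TD"), Concept("I"), Concept("Text", "Price:")]
wrapper = induce(ex1, ex2)

# A context with a different label should not be matched by the wrapper.
other = [Concept("TD"), Concept("B"), Concept("Text", "Name:")]
print(wrapper)
print(projects(wrapper, ex1), projects(wrapper, other))
```

In this toy setting, the induced pattern keeps the `Price:` label as a fixed referent and relaxes the inline tag to their common supertype, so it accepts both training contexts while rejecting a context with a different label. Specialization would be the inverse step: restricting a type or filling in a referent to make the pattern stricter.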