Paper Details

  • Title:

    DEiXTo: A web data extraction suite

  • Author(s):

    F. Kokkoras, K. Ntonas, Nick Bassiliades

  • Keywords: web data extraction, web scraping, pattern matching.
  • Abstract:

    Web data extraction (or web scraping) is the process of collecting unstructured or semi-structured information from the World Wide Web, at different levels of automation. It is an important, valuable and practical approach towards web reuse, while at the same time it can serve the transition of the web to the semantic web by providing the structured data required by the latter. In this paper we present DEiXTo, a web data extraction suite that provides an arsenal of features aimed at designing and deploying well-engineered extraction tasks. We focus on presenting the core pattern matching algorithm and the overall architecture, which allows programming of custom-made solutions for hard extraction tasks. DEiXTo consists of both freeware and open source components.

  • Category: Conference Papers
  • Tags: 2013 Kokkoras Ntonas Bassiliades