DEiXTo (or ΔEiXTo) is a powerful web data extraction tool that is based on the W3C Document Object Model (DOM). It allows users to create highly accurate “extraction rules” (wrappers) that describe what pieces of data to scrape from a web page. DEiXTo consists of two separate, standalone components:
- GUI DEiXTo, an MS Windows™ application implementing a graphical user interface that is used to manage extraction rules (build, test, fine-tune, save and modify), and
- DEiXTo Executor, a standalone command-line utility that automatically applies extraction rules to target HTML pages in bulk and produces structured output in a variety of formats.
DEiXTo can handle a wide range of web sites with high precision and recall, since it gives the user a rich set of features for constructing well-engineered extraction rules. Wrappers built with GUI DEiXTo can be scheduled to run automatically, providing periodic, unattended access to resources of interest and saving users a lot of time and repetitive effort.
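
To make the idea of a DOM-based extraction rule more concrete, here is a minimal Python sketch. It is not DEiXTo's rule format or implementation: the `RULE` dictionary, the `apply_rule` function, and the sample page are all hypothetical, and the standard-library `xml.etree.ElementTree` parser is used only because the sample markup is well-formed. The sketch shows the general principle of describing which DOM nodes form a record and which child nodes hold each field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real-world HTML is often malformed
# and would need a more tolerant parser than ElementTree.
PAGE = """
<html><body>
  <div class="item"><h2>First title</h2><a href="/a">more</a></div>
  <div class="item"><h2>Second title</h2><a href="/b">more</a></div>
  <div class="ad"><h2>Ignore me</h2></div>
</body></html>
"""

# A toy "extraction rule" (an assumption for illustration only):
# "record" selects the DOM nodes that represent one record each;
# "fields" maps an output name to (child path, attribute or None for text).
RULE = {
    "record": ".//div[@class='item']",
    "fields": {"title": ("h2", None), "link": ("a", "href")},
}

def apply_rule(page, rule):
    """Apply a toy rule to a page and return a list of extracted records."""
    root = ET.fromstring(page.strip())
    records = []
    for node in root.findall(rule["record"]):
        row = {}
        for name, (path, attr) in rule["fields"].items():
            child = node.find(path)
            if child is None:
                row[name] = None  # field missing in this record
            elif attr is None:
                row[name] = (child.text or "").strip()  # take the node's text
            else:
                row[name] = child.get(attr)  # take an attribute value
        records.append(row)
    return records

records = apply_rule(PAGE, RULE)
for record in records:
    print(record)
```

Keeping the rule as data, separate from the code that applies it, mirrors the split described above between building rules and executing them, although the real tools are far more capable than this sketch.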