TIES (Trainable Information Extraction System)
TIES (Trainable Information Extraction System) is an Adaptive Information Extraction (IE) system currently under development at ITC-irst within the Dot.Kom project. TIES is based on a Java reimplementation of the Boosted Wrapper Induction (BWI) algorithm devised by Dayne Freitag and Nicholas Kushmerick [Freitag & Kushmerick 00].
Boosted Wrapper Induction (BWI) is an IE technique that uses the AdaBoost algorithm to generate a general extraction procedure that combines a set of specific wrappers [Freitag & Kushmerick 00]. BWI has been shown to perform well on a variety of tasks with moderately structured and highly structured documents. [Kauchak et al 02] investigated how boosting contributes to this success and also evaluated BWI on the more challenging task of extracting information from unstructured natural language documents.
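The way boosting combines many specific detectors into one general procedure can be illustrated with a minimal sketch of discrete AdaBoost in Java. This is not the TIES or BWI code: the class `BoostSketch`, its methods, and the fixed pool of candidate detectors are assumptions made for illustration, whereas BWI learns its wrappers during boosting rather than choosing them from a predefined pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative discrete AdaBoost; hypothetical names, not the TIES/BWI implementation. */
final class BoostSketch {

    /** A weak detector paired with the confidence weight assigned by boosting. */
    static final class Weighted<T> {
        final Predicate<T> detector;
        final double alpha;

        Weighted(Predicate<T> detector, double alpha) {
            this.detector = detector;
            this.alpha = alpha;
        }
    }

    /** Runs at most {@code rounds} boosting rounds over a fixed pool of candidate detectors. */
    static <T> List<Weighted<T>> train(List<T> xs, boolean[] labels,
                                       List<Predicate<T>> pool, int rounds) {
        int n = xs.size();
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n); // start from uniform example weights
        List<Weighted<T>> ensemble = new ArrayList<>();

        for (int r = 0; r < rounds; r++) {
            // Pick the detector with the lowest weighted error.
            Predicate<T> best = null;
            double bestErr = Double.MAX_VALUE;
            for (Predicate<T> h : pool) {
                double err = 0.0;
                for (int i = 0; i < n; i++) {
                    if (h.test(xs.get(i)) != labels[i]) err += w[i];
                }
                if (err < bestErr) { bestErr = err; best = h; }
            }
            if (best == null || bestErr >= 0.5) break; // no detector better than chance

            double eps = Math.max(bestErr, 1e-10);     // avoid division by zero
            double alpha = 0.5 * Math.log((1.0 - eps) / eps);
            ensemble.add(new Weighted<>(best, alpha));
            if (bestErr == 0.0) break;                 // training set already separated

            // Increase the weight of misclassified examples, then renormalize.
            double z = 0.0;
            for (int i = 0; i < n; i++) {
                boolean correct = best.test(xs.get(i)) == labels[i];
                w[i] *= Math.exp(correct ? -alpha : alpha);
                z += w[i];
            }
            for (int i = 0; i < n; i++) w[i] /= z;
        }
        return ensemble;
    }

    /** Weighted vote of all detectors; a positive score means "extract". */
    static <T> boolean predict(List<Weighted<T>> ensemble, T x) {
        double score = 0.0;
        for (Weighted<T> wd : ensemble) {
            score += wd.alpha * (wd.detector.test(x) ? 1.0 : -1.0);
        }
        return score > 0.0;
    }
}
```

Each round reweights the training examples so that the next detector concentrates on the cases earlier detectors got wrong; the final decision is a confidence-weighted vote.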
TIES automatically marks up documents with a predefined set of XML tags, using markup rules learned automatically from a previously annotated corpus. The XML tags identify instances of entities from a set of relevant elements defined by the user. The system architecture is based on boosting and wrapper induction techniques, and it is flexible enough to allow the development of new weak learners and the addition of new validation strategies. By default, the current implementation employs only simple orthographic features, but more complex features (e.g., morpho-syntactic ones) can be added to improve performance simply by using a customized preprocessor.
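To give a concrete idea of what "simple orthographic features" might look like, the following Java sketch assigns each token a coarse shape class based only on its characters. The class `OrthographicSketch` and the feature names (`Num`, `AllCaps`, `Cap`, and so on) are hypothetical and chosen for illustration; the actual feature set and preprocessor interface used by TIES are not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative orthographic features; hypothetical names, not the TIES feature set. */
final class OrthographicSketch {

    /** Maps a token to a coarse shape class derived only from its characters. */
    static String shape(String token) {
        if (token.isEmpty()) return "Empty";

        boolean allDigits = true, allLetters = true;
        boolean allUpper = true, allLower = true, noAlnum = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isDigit(c)) allDigits = false;
            if (!Character.isLetter(c)) allLetters = false;
            if (!Character.isUpperCase(c)) allUpper = false;
            if (!Character.isLowerCase(c)) allLower = false;
            if (Character.isLetterOrDigit(c)) noAlnum = false;
        }

        if (allDigits) return "Num";                       // e.g. 2004
        if (allLetters && allUpper) return "AllCaps";      // e.g. TIES
        if (allLetters && allLower) return "Lower";        // e.g. system
        if (allLetters && Character.isUpperCase(token.charAt(0))) return "Cap"; // e.g. Trento
        if (allLetters) return "Mixed";                    // letters only, irregular casing
        if (noAlnum) return "Punct";                       // e.g. a comma
        return "Other";                                    // e.g. 2.2
    }

    /** Computes the shape class of every token in a sequence. */
    static List<String> shapes(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(shape(t));
        return out;
    }
}
```

A customized preprocessor in the sense described above would attach richer annotations, such as part-of-speech tags, alongside character-level classes of this kind.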
TIES is implemented in Java and runs on all platforms supporting Java 2.
The current release of TIES (version 2.2) was delivered in March 2004.
If you are interested in obtaining TIES for research purposes, please contact
Claudio Giuliano (email: giuliano [at] itc.it) and Alberto Lavelli (email: lavelli [at] itc.it).
For technical information about TIES, please contact Claudio Giuliano (email: giuliano [at] itc.it).