Information Extraction in Structured Documents Using Tree Automata Induction

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/701

Title:	Information Extraction in Structured Documents Using Tree Automata Induction
Authors:	Kosala, Raymond; Van den Bussche, Jan; Bruynooghe, Maurice; Blockeel, Hendrik
Issue Date:	2002
Publisher:	Springer-Verlag
Source:	Principles of Data Mining and Knowledge Discovery: 6th European Conference, PKDD 2002. p. 299-310
Series/Report:	Lecture Notes in Computer Science
Series/Report no.:	2431
Abstract:	Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work for IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.
Document URI:	http://hdl.handle.net/1942/701
ISSN:	0302-9743
Category:	A1
Type:	Journal Contribution
Appears in Collections:	Research publications

File	Description	Size	Format
datamining2.pdf		517.84 kB	Adobe PDF
