Document Structure Analysis

The logical structure of a document describes the document’s logical components. Examples of logical components are headlines, tables, footnotes, and enumerations. Several applications in the document processing domain, such as information extraction and document retrieval, benefit from knowledge of the logical structure. Unfortunately, typical document formats do not contain all the structure information desirable for automatic document processing. To make use of the document structure, the document has to be analyzed and its original structure reconstructed.
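To make the idea of a reconstructed logical structure more concrete, the following is a minimal sketch of how such a structure could be represented as a tree once it has been recovered. It is an illustration only, not the project's actual data model: the class `LogicalComponent`, the helper `find_by_kind`, the component kind names, and the sample content are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class LogicalComponent:
    """One node of a hypothetical logical structure tree (illustrative only)."""
    kind: str  # assumed labels, e.g. "headline", "table", "footnote", "enumeration"
    text: str = ""
    children: List["LogicalComponent"] = field(default_factory=list)

    def walk(self) -> Iterator["LogicalComponent"]:
        """Yield this component and all of its descendants in reading order."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_by_kind(root: LogicalComponent, kind: str) -> List[LogicalComponent]:
    """Collect every component of the given kind, e.g. for a retrieval task."""
    return [c for c in root.walk() if c.kind == kind]


# Made-up sample: the structure an analysis step might reconstruct for a letter.
letter = LogicalComponent("document", children=[
    LogicalComponent("headline", "Invoice 2024"),
    LogicalComponent("enumeration", children=[
        LogicalComponent("item", "First position"),
        LogicalComponent("item", "Second position"),
    ]),
    LogicalComponent("footnote", "Payable within 30 days."),
])

headlines = [c.text for c in find_by_kind(letter, "headline")]
```

With a representation along these lines, an information extraction or retrieval component could query specific parts of a document, for example only its headlines, instead of treating the document as undifferentiated text.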

The goal of the project is to develop a general framework for automatic structure analysis that is applicable to different document types, e.g. file cards, business letters, or papers. Combining visual and automatic methods allows the required document models to be created efficiently and effectively, according to the user’s needs.

More information about this and related work can be found in the following publications.

Publications