Semantic information extraction from images of complex documents


Author(s): Peanho, Claudio Antonio; Stagni, Henrique; Silva, Flavio Soares Correa da
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

08/08/2013

01/12/2012

Abstract

Even though the digital processing of documents is increasingly widespread in industry, printed documents are still widely used. To process the contents of printed documents electronically, information must be extracted from digital images of those documents. In complex documents, the contents of different regions and fields can be highly heterogeneous with respect to layout, printing quality, and the use of fonts and typing standards, which can make reconstructing the contents from digital images a difficult problem. In this article we present an efficient solution to this problem, in which the semantic contents of fields in a complex document are extracted from a digital image.

Opus Software

Identifier

APPLIED INTELLIGENCE, DORDRECHT, v. 37, n. 4, supl. 1, Part 1, pp. 543-557, DEC, 2012

0924-669X

http://www.producao.usp.br/handle/BDPI/32527

10.1007/s10489-012-0348-x

http://dx.doi.org/10.1007/s10489-012-0348-x

Language(s)

eng

Publisher

SPRINGER

DORDRECHT

Relation

APPLIED INTELLIGENCE

Rights

openAccess

Copyright SPRINGER

Keywords

#DOCUMENT IMAGE PROCESSING #INFORMATION EXTRACTION FROM DOCUMENTS #COMPUTATIONAL APPROACH #SYSTEM #COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Type

article

original article

publishedVersion