QUAD: quality assessment of documents


Author(s): Deepak, Kumar; Ramakrishnan, AG
Date(s)

2011

Abstract

We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke width variance, pulse width ratio, transient pixel density, entropy and variance of components to quantify the quality of a document image. The measures are intended for use with any optical character recognition (OCR) engine to estimate a priori its expected performance. The suggested measures have been evaluated on many document images in different scripts. The quality of each document image is manually annotated by users to create a ground truth. The idea is to correlate the values of the measures with the user-annotated data. If the calculated measure matches the annotated description, the metric is accepted; otherwise it is rejected. Of the proposed metrics, some are accepted and the rest are rejected. We have defined metrics that are easy to estimate. The metrics proposed in this paper are based on feedback from home-grown OCR engines for Indic (Tamil and Kannada) languages. The metrics are independent of the script and depend only on the quality and age of the paper and the printing. Experiments and results for each proposed metric are discussed. Actual recognition of the printed text is not performed to evaluate the proposed metrics. Sometimes a document image containing broken characters is rated as a good document image by the evaluated metrics; this remains an unsolved challenge. The proposed measures work on grayscale document images and fail to provide reliable information on binarized document images.
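
The accept-or-reject step described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: `gray_entropy` uses the ordinary Shannon entropy of the gray-level histogram as a stand-in for the paper's entropy measure, and `accept_metric` uses a Pearson correlation with a hypothetical threshold of 0.7 as a stand-in for "matches the annotated description".

```python
import math
from collections import Counter


def gray_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram.

    `pixels` is a flat sequence of gray values (e.g. 0..255).
    Assumption: a plain histogram entropy, not necessarily the paper's definition.
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def accept_metric(metric_values, user_scores, threshold=0.7):
    """Hypothetical acceptance rule: keep a metric when its values track the
    user annotations strongly enough, in either direction.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return abs(pearson(metric_values, user_scores)) >= threshold
```

In this sketch a metric is computed once per document image, the resulting list is compared with the per-image user scores, and the metric is kept only when the two series move together.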

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/46229/1/Cam_Doc_Ana_Rec_1_2011.pdf

Deepak, Kumar and Ramakrishnan, AG (2011) QUAD: quality assessment of documents. In: Proc. 4th International Workshop on Camera-based Document Analysis and Recognition (CBDAR 2011), 2011.

Publisher

National Association of Theatre Nurses

Relation

http://imlab.jp/cbdar2011/

http://eprints.iisc.ernet.in/46229/

Keywords #Electrical Engineering
Type

Conference Paper

PeerReviewed