Improved recognition of aged Kannada documents by effective segmentation of merged characters


Author(s): Madhavaraj, A; Ramakrishnan, AG; Kumar, Shiva HR; Bhat, Nagaraj
Date(s)

2014

Abstract

In optical character recognition of very old books, recognition accuracy drops mainly because characters merge or break. In this paper, we propose the first algorithm to segment merged Kannada characters, using a hypothesis to select the candidate cut positions. The method searches for the best positions to segment by taking into account the support vector machine classifier's recognition score and the validity of the aspect ratio (width-to-height ratio) of the segments between every pair of cut positions. The hypothesis for selecting cut positions is based on the observation that a concave region exists above and below the touching portion. These concavities are located by tracing the valleys in the top contour of the image and doing the same for the image rotated upside-down. The cut positions are then taken to be the closely matching valleys of the original and rotated images. The proposed segmentation algorithm handles different font styles, shapes, and sizes better than the existing vertical projection profile based segmentation. It has been tested on 1125 word images, each containing multiple merged characters, from an old Kannada book; it achieves 89.6% correct segmentation, and the character recognition accuracy on merged words is 91.2%. A few merge points are still missed because no matched valley exists, owing to the specific shapes of the characters meeting at those merges.
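The valley-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary list-of-rows image format, the vertical flip used in place of a full upside-down rotation (so that column indices stay aligned), and the tolerance `tol` are all assumptions made for illustration.

```python
def top_contour(img):
    """Row index of the first foreground pixel in each column.

    `img` is a list of rows of 0/1 values (assumed format). Empty columns
    are assigned the image height.
    """
    height, width = len(img), len(img[0])
    return [next((r for r in range(height) if img[r][c]), height)
            for c in range(width)]


def valleys(contour):
    """Columns where the top contour dips: local maxima of the row index.

    A flat-bottomed dip is reported once, at the middle of its plateau.
    """
    found = []
    n = len(contour)
    c = 1
    while c < n - 1:
        if contour[c] > contour[c - 1]:
            end = c
            while end + 1 < n and contour[end + 1] == contour[c]:
                end += 1
            if end + 1 < n and contour[end + 1] < contour[c]:
                found.append((c + end) // 2)
            c = end + 1
        else:
            c += 1
    return found


def candidate_cuts(img, tol=2):
    """Columns where a valley seen from above matches one seen from below.

    `tol` (hypothetical parameter) is the maximum column distance at which
    two valleys count as "closely matching".
    """
    upper = valleys(top_contour(img))
    # Flipping the rows turns the bottom contour into a top contour
    # without changing column indices (assumed stand-in for rotation).
    lower = valleys(top_contour(img[::-1]))
    cuts = []
    for u in upper:
        close = [v for v in lower if abs(v - u) <= tol]
        if close:
            best = min(close, key=lambda v: abs(v - u))
            cuts.append((u + best) // 2)
    return cuts


# Toy 7x11 image: two solid blocks joined by a one-pixel bridge at column 5.
toy = [[0 if c == 5 and r != 3 else 1 for c in range(11)] for r in range(7)]
print(candidate_cuts(toy))  # [5]
```

The sketch only proposes candidate cut columns. Choosing among them using the classifier's recognition score and the aspect ratio of the resulting segments, as the abstract describes, is not shown here.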

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/52979/1/Int_Com_Sig_Pro_Com_2014.pdf

Madhavaraj, A and Ramakrishnan, AG and Kumar, Shiva HR and Bhat, Nagaraj (2014) Improved recognition of aged Kannada documents by effective segmentation of merged characters. In: International Conference on Signal Processing and Communications (SPCOM), JUL 22-25, 2014, Bangalore, INDIA.

Publisher

IEEE

Relation

http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp?arnumber=6983951

http://eprints.iisc.ernet.in/52979/

Keywords #Electrical Engineering
Type

Conference Proceedings

NonPeerReviewed