Language models for online handwritten Tamil word recognition


Author(s): Sundaram, Suresh; Urala, Bhargava K; Ramakrishnan, AG
Date(s)

2012

Abstract

N-gram language models and lexicon-based word recognition are popular methods in the literature for improving the recognition accuracy of online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but they depend heavily on choosing the right lexicon. For comparison with lexicon and language-model-based methods, we have also explored re-evaluation techniques, which use expert classifiers to improve symbol and word recognition accuracies.
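As an illustration of the general idea of a symbol-level bigram language model, the following is a minimal sketch, not the authors' implementation. It assumes that each word is a sequence of symbol labels, that add-one smoothing is acceptable, and that the recognizer supplies a log score per candidate. The function names `train_bigram`, `log_prob`, and `rescore`, and the `lm_weight` parameter, are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus_words):
    """Count symbol bigrams over a corpus; each word is a sequence of symbols."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for word in corpus_words:
        symbols = ["<s>"] + list(word) + ["</s>"]  # word boundary markers
        vocab.update(symbols)
        for prev, cur in zip(symbols, symbols[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)

def log_prob(word, context_counts, bigram_counts, vocab_size):
    """Log probability of a symbol sequence with add-one smoothing (an assumption)."""
    symbols = ["<s>"] + list(word) + ["</s>"]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total

def rescore(candidates, model, lm_weight=1.0):
    """Pick the candidate maximizing classifier log score plus weighted LM score.

    candidates: list of (symbol_sequence, classifier_log_score) pairs.
    """
    best = max(candidates, key=lambda c: c[1] + lm_weight * log_prob(c[0], *model))
    return best[0]

# Toy usage with Latin placeholders standing in for Tamil symbol labels.
model = train_bigram(["abc", "abd", "abc"])
candidates = [("abc", -1.0), ("acb", -0.9)]  # hypothetical recognizer outputs
chosen = rescore(candidates, model)
```

In this toy case the recognizer alone slightly prefers the second candidate, while the bigram statistics favor the symbol order seen in the corpus. For real Tamil text, the symbol sequence would need to follow the recognizer's symbol set rather than raw Unicode code points.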

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/46547/1/Pro_Wor_Doc_Ana_Rec_42_2012.pdf

Sundaram, Suresh and Urala, Bhargava K and Ramakrishnan, AG (2012) Language models for online handwritten Tamil word recognition. In: Proceeding of the workshop on Document Analysis and Recognition, Dec. 16, 2012, New York, NY, USA.

Publisher

ACM, Inc

Relation

http://dx.doi.org/10.1145/2432553.2432562

http://eprints.iisc.ernet.in/46547/

Keywords

#Electrical Engineering

Type

Conference Paper

PeerReviewed