MAST: multi-script annotation toolkit for scenic text


Author(s): Kasar, T; Kumar, D; Anil Prasad, MN; Girish, D; Ramakrishnan, AG
Date(s)

2011

Abstract

This paper describes a semi-automatic tool for annotating multi-script text in natural scene images. To our knowledge, this is the first tool that deals with multi-script text or text of arbitrary orientation. The procedure involves manual seed selection followed by a region-growing process that segments each word present in the image. The user can vary the region-growing threshold to ensure pixel-accurate character segmentation. The text present in the image is tagged word by word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can be further labeled in terms of its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains the number of words in the image, the word bounding boxes, the script, and the ground-truth Unicode text. The toolkit, developed using MATLAB, can be used to generate ground truth and annotations for any generic document image. Thus, it is useful to researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolkit (MAST) is available for free download.
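The seed-and-grow step described in the abstract can be illustrated with a minimal sketch. This is not the toolkit's MATLAB implementation: the function names `region_grow` and `bounding_box`, the use of 4-connectivity, and the rule of comparing each pixel's grayscale intensity to the seed intensity are all assumptions made for illustration, since the abstract does not specify the similarity criterion.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Grow a region from `seed` (row, col) over 4-connected pixels.

    Assumption: a pixel joins the region when its intensity differs from
    the seed pixel's intensity by at most `threshold`. The actual
    criterion used by MAST is not stated in the abstract.
    """
    rows, cols = len(image), len(image[0])
    seed_row, seed_col = seed
    seed_value = image[seed_row][seed_col]

    mask = [[False] * cols for _ in range(rows)]
    mask[seed_row][seed_col] = True
    queue = deque([seed])

    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc]:
                if abs(image[nr][nc] - seed_value) <= threshold:
                    mask[nr][nc] = True
                    queue.append((nr, nc))
    return mask


def bounding_box(mask):
    """Return (top, left, bottom, right) of the True pixels in `mask`."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, flag in enumerate(row) if flag]
    row_indices = [r for r, _ in coords]
    col_indices = [c for _, c in coords]
    return min(row_indices), min(col_indices), max(row_indices), max(col_indices)


# Toy grayscale image: dark "text" pixels on a bright background.
image = [
    [200, 200, 200, 200, 200],
    [200,  20,  25, 200, 200],
    [200,  22, 120, 200, 200],
    [200, 200, 200, 200,  30],
]

tight = region_grow(image, seed=(1, 1), threshold=10)
loose = region_grow(image, seed=(1, 1), threshold=110)
print(sum(map(sum, tight)), bounding_box(tight))
print(sum(map(sum, loose)))
```

Raising the threshold admits the mid-gray pixel next to the stroke, which mirrors how a user-adjustable threshold trades off under- and over-segmentation. The bounding box computed from the mask corresponds to the kind of word-level box the ground-truth file is said to record.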

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/46228/1/Mul_OCR_Ana_Noi_Unst_Tex_Dat_1_2011.pdf

Kasar, T and Kumar, D and Anil Prasad, MN and Girish, D and Ramakrishnan, AG (2011) MAST: multi-script annotation toolkit for scenic text. In: Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data, 2011, New York, NY, USA.

Publisher

Association for Computing Machinery

Relation

http://dx.doi.org/10.1145/2034617.2034633

http://eprints.iisc.ernet.in/46228/

Keywords

#Electrical Engineering

Type

Conference Proceedings

PeerReviewed