11 resultados para Text-image

em Indian Institute of Science - Bangalore - Índia


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Scenic word images undergo degradations due to motion blur, uneven illumination, shadows and defocussing, which lead to difficulty in segmentation. As a result, the recognition results reported on the scenic word image datasets of ICDAR have been low. We introduce a novel technique, where we choose the middle row of the image as a sub-image and segment it first. Then, the labels from this segmented sub-image are used to propagate labels to other pixels in the image. This approach, which is unique and distinct from the existing methods, results in improved segmentation. Bayesian classification and Max-flow methods have been independently used for label propagation. This midline based approach limits the impact of degradations that happens to the image. The segmented text image is recognized using the trial version of Omnipage OCR. We have tested our method on ICDAR 2003 and ICDAR 2011 datasets. Our word recognition results of 64.5% and 71.6% are better than those of methods in the literature and also methods that competed in the Robust reading competition. Our method makes an implicit assumption that degradation is not present in the middle row.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes an approach based on Zernike moments and Delaunay triangulation for localization of hand-written text in machine printed text documents. The Zernike moments of the image are first evaluated and we classify the text as hand-written using the nearest neighbor classifier. These features are independent of size, slant, orientation, translation and other variations in handwritten text. We then use Delaunay triangulation to reclassify the misclassified text regions. When imposing Delaunay triangulation on the centroid points of the connected components, we extract features based on the triangles and reclassify the text. We remove the noise components in the document as part of the preprocessing step so this method works well on noisy documents. The success rate of the method is found to be 86%. Also for specific hand-written elements such as signatures or similar text the accuracy is found to be even higher at 93%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose two texture-based approaches, one involving Gabor filters and the other employing log-polar wavelets, for separating text from non-text elements in a document image. Both the proposed algorithms compute local energy at some information-rich points, which are marked by Harris' corner detector. The advantage of this approach is that the algorithm calculates the local energy at selected points and not throughout the image, thus saving a lot of computational time. The algorithm has been tested on a large set of scanned text pages and the results have been seen to be better than the results from the existing algorithms. Among the proposed schemes, the Gabor filter based scheme marginally outperforms the wavelet based scheme.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes and compares four methods of binarzing text images captured using a camera mounted on a cell phone. The advantages and disadvantages(image clarity and computational complexity) of each method over the others are demonstrated through binarized results. The images are of VGA or lower resolution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents the design of a full fledged OCR system for printed Kannada text. The machine recognition of Kannada characters is difficult due to similarity in the shapes of different characters, script complexity and non-uniqueness in the representation of diacritics. The document image is subject to line segmentation, word segmentation and zone detection. From the zonal information, base characters, vowel modifiers and consonant conjucts are separated. Knowledge based approach is employed for recognizing the base characters. Various features are employed for recognising the characters. These include the coefficients of the Discrete Cosine Transform, Discrete Wavelet Transform and Karhunen-Louve Transform. These features are fed to different classifiers. Structural features are used in the subsequent levels to discriminate confused characters. Use of structural features, increases recognition rate from 93% to 98%. Apart from the classical pattern classification technique of nearest neighbour, Artificial Neural Network (ANN) based classifiers like Back Propogation and Radial Basis Function (RBF) Networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. Highest recognition rate of 99% is obtained with RBF using second level approximation coefficients of Haar wavelets as the features on presegmented base characters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a technique for irreversible watermarking approach robust to affine transform attacks in camera, biomedical and satellite images stored in the form of monochrome bitmap images. The watermarking approach is based on image normalisation in which both watermark embedding and extraction are carried out with respect to an image normalised to meet a set of predefined moment criteria. The normalisation procedure is invariant to affine transform attacks. The result of watermarking scheme is suitable for public watermarking applications, where the original image is not available for watermark extraction. Here, direct-sequence code division multiple access approach is used to embed multibit text information in DCT and DWT transform domains. The proposed watermarking schemes are robust against various types of attacks such as Gaussian noise, shearing, scaling, rotation, flipping, affine transform, signal processing and JPEG compression. Performance analysis results are measured using image processing metrics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conventional encryption techniques are usually applicable for text data and often unsuited for encrypting multimedia objects for two reasons. Firstly, the huge sizes associated with multimedia objects make conventional encryption computationally costly. Secondly, multimedia objects come with massive redundancies which are useful in avoiding encryption of the objects in their entirety. Hence a class of encryption techniques devoted to encrypting multimedia objects like images have been developed. These techniques make use of the fact that the data comprising multimedia objects like images could in general be seggregated into two disjoint components, namely salient and non-salient. While the former component contributes to the perceptual quality of the object, the latter only adds minor details to it. In the context of images, the salient component is often much smaller in size than the non-salient component. Encryption effort is considerably reduced if only the salient component is encrypted while leaving the other component unencrypted. A key challenge is to find means to achieve a desirable seggregation so that the unencrypted component does not reveal any information about the object itself. In this study, an image encryption approach that uses fractal structures known as space-filling curves- in order to reduce the encryption overload is presented. In addition, the approach also enables a high quality lossy compression of images.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the maiden tool that deals with multi-script text or arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word-by-word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolokit (MAST) is available for free download.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Query suggestion is an important feature of the search engine with the explosive and diverse growth of web contents. Different kind of suggestions like query, image, movies, music and book etc. are used every day. Various types of data sources are used for the suggestions. If we model the data into various kinds of graphs then we can build a general method for any suggestions. In this paper, we have proposed a general method for query suggestion by combining two graphs: (1) query click graph which captures the relationship between queries frequently clicked on common URLs and (2) query text similarity graph which finds the similarity between two queries using Jaccard similarity. The proposed method provides literally as well as semantically relevant queries for users' need. Simulation results show that the proposed algorithm outperforms heat diffusion method by providing more number of relevant queries. It can be used for recommendation tasks like query, image, and product suggestion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Image inpainting is the process of filling the unwanted region in an image marked by the user. It is used for restoring old paintings and photographs, removal of red eyes from pictures, etc. In this paper, we propose an efficient inpainting algorithm which takes care of false edge propagation. We use the classical exemplar based technique to find out the priority term for each patch. To ensure that the edge content of the nearest neighbor patch found by minimizing L-2 distance between patches, we impose an additional constraint that the entropy of the patches be similar. Entropy of the patch acts as a good measure of edge content. Additionally, we fill the image by considering overlapping patches to ensure smoothness in the output. We use structural similarity index as the measure of similarity between ground truth and inpainted image. The results of the proposed approach on a number of examples on real and synthetic images show the effectiveness of our algorithm in removing objects and thin scratches or text written on image. It is also shown that the proposed approach is robust to the shape of the manually selected target. Our results compare favorably to those obtained by existing techniques