62 results for Optical character recognition devices

at Indian Institute of Science - Bangalore - India


Relevance: 100.00%

Abstract:

The effectiveness of linear matched filters for improved character discrimination in the presence of random noise and poorly defined characters has been investigated. We have found that, although the filter performs reasonably well in the presence of random noise (a 16 dB gain in signal-to-noise ratio), its performance is poor when the unknown character is distorted (linear shift and rotation).
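
As a rough illustration of the matched-filter idea in this abstract, the sketch below scores an unknown glyph against stored templates by energy-normalized correlation. The 3x3 glyphs, the names `TEMPLATES`, `classify` and `shift_right`, and the zero-shift correlation are all assumptions made for the example, not details from the paper.

```python
# Toy sketch only: tiny binary glyphs and a plain correlation score (assumed setup).
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

def correlate(image, template):
    """Matched-filter output at zero shift: inner product of image and template."""
    return sum(p * t
               for row_i, row_t in zip(image, template)
               for p, t in zip(row_i, row_t))

def classify(image, templates):
    """Pick the template with the largest energy-normalized correlation."""
    def score(name):
        t = templates[name]
        energy = sum(v * v for row in t for v in row) ** 0.5
        return correlate(image, t) / energy
    return max(templates, key=score)

def shift_right(image):
    """Shift a glyph one pixel to the right, padding with background."""
    return [[0] + row[:-1] for row in image]

print(classify(TEMPLATES["L"], TEMPLATES))               # clean input
print(classify(shift_right(TEMPLATES["I"]), TEMPLATES))  # shifted input
```

In this toy setup the clean "L" is recognized, while an "I" shifted by one pixel no longer overlaps its own template and is labeled "L", which mirrors the sensitivity to linear shift that the abstract reports.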

Relevance: 100.00%

Abstract:

This paper describes a technique for the artificial generation of learning and test sample sets suitable for character recognition research. Sample sets of English (Latin), Malayalam, Kannada and Tamil characters are generated easily from prototype specifications that give the endpoint coordinates, the nature of the segments and their connectivity.

Relevance: 100.00%

Abstract:

The machine replication of human reading has been the subject of intensive research for more than three decades. A large number of research papers and reports have already been published on this topic. Many commercial establishments have manufactured recognizers of varying capabilities. Handheld, desktop, medium-size and large systems costing as much as half a million dollars are available and are in use for various applications. However, the ultimate goal of developing a reading machine with the same reading capabilities as humans remains unachieved. A great gap still separates human and machine reading capabilities, and much further effort is required to narrow this gap, if not bridge it. This review is organized into six major sections covering a general overview (an introduction), applications of character recognition techniques, methodologies in character recognition, research work in character recognition, some practical OCRs and the conclusions.

Relevance: 100.00%

Abstract:

This paper presents a new application of two-dimensional Principal Component Analysis (2DPCA) to the problem of online character recognition in Tamil script. A novel set of features employing polynomial fits and quartiles, in combination with conventional features, is derived for each sample point of the Tamil character obtained after smoothing and resampling. These are stacked to form a matrix, from which a covariance matrix is constructed. A subset of the eigenvectors of the covariance matrix is employed to get the features in the reduced subspace. Each character is modeled as a separate subspace, and a modified form of the Mahalanobis distance is derived to classify a given test character. Results indicate that the recognition accuracy of the 2DPCA scheme is approximately 3% higher than that of the conventional PCA technique.
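
The projection step described in this abstract can be sketched as follows, assuming NumPy and the standard 2DPCA formulation in which the covariance is built directly from the per-character feature matrices. The function names, the array layout and the random data are assumptions for illustration; the per-class subspaces and the modified Mahalanobis distance from the paper are not reproduced here.

```python
import numpy as np

def fit_2dpca(samples, n_components):
    """Return the leading eigenvectors of the 2DPCA covariance matrix.

    samples has shape (num_samples, num_points, num_features): one feature
    matrix per character, rows being resampled pen points (assumed layout).
    """
    centered = samples - samples.mean(axis=0)
    # Sum of C_k^T C_k over all samples, shape (num_features, num_features).
    cov = np.einsum("kij,kil->jl", centered, centered) / len(samples)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return eigvecs[:, order]

def project(sample, basis):
    """Reduced features: each row of the sample projected onto the basis."""
    return sample @ basis

rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 8, 5))   # synthetic stand-in for real features
basis = fit_2dpca(samples, n_components=2)
reduced = project(samples[0], basis)    # shape (8, 2)
```

Unlike conventional PCA, the matrices are never flattened into long vectors, so the covariance stays small (here 5 x 5) regardless of the number of sample points.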

Relevance: 100.00%

Abstract:

This paper introduces a scheme for classification of online handwritten characters based on polynomial regression of the sampled points of the sub-strokes in a character. The segmentation is based on the velocity profile of the written character, which requires smoothing of the velocity profile. We propose a novel scheme for smoothing the velocity profile curve and identifying the critical points at which to segment the character. We also propose another method for segmentation based on human eye perception. We then extract two sets of features for recognition of handwritten characters. Each sub-stroke is a simple curve, a part of the character, and is represented by the distance of each point from the first point. This forms the first feature vector for each character. The second feature vector consists of the coefficients of the B-splines fitted to the control knots obtained from the segmentation algorithm. The feature vector is fed to an SVM classifier, which achieves an accuracy of 68% using the polynomial regression technique and 74% using the spline fitting method.
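
A minimal sketch of velocity-based segmentation and the distance-from-first-point feature might look like the following. The moving-average smoother is a stand-in assumption (the paper's own smoothing scheme is not described in the abstract), uniform sampling intervals are assumed, and all function names are hypothetical.

```python
import math

def speeds(points):
    """Pen speed between consecutive samples, assuming uniform time steps."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

def moving_average(values, window=3):
    """Simple smoother used here in place of the paper's unspecified scheme."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def critical_points(points, window=3):
    """Indices into the speed profile where the smoothed speed is a local minimum."""
    v = moving_average(speeds(points), window)
    return [i for i in range(1, len(v) - 1)
            if v[i] < v[i - 1] and v[i] <= v[i + 1]]

def distance_features(sub_stroke):
    """Distance of every point of a sub-stroke from its first point."""
    first = sub_stroke[0]
    return [math.dist(first, p) for p in sub_stroke]

# A horizontal stroke that slows down in the middle.
stroke = [(x, 0.0) for x in (0, 2, 4, 5, 5.5, 6.5, 8.5, 10.5)]
cuts = critical_points(stroke)
```

The slow-down in the middle of the stroke yields a single speed minimum, which is where this sketch would split the stroke into two sub-strokes.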

Relevance: 100.00%

Abstract:

In this paper, we compare the experimental results for Tamil online handwritten character recognition using HMM and Statistical Dynamic Time Warping (SDTW) as classifiers. HMM was used for a 156-class problem. Different feature sets and values for the HMM states and mixtures were tried, and the best combination was found to be 16 states and 14 mixtures, giving an accuracy of 85%. The features used in this combination were retained, and an SDTW model with 20 states and a single Gaussian was used as the classifier. The symbol set was also increased to include numerals, punctuation marks and special symbols such as $, & and #, taking the number of classes to 188. It was found that, with a small addition to the feature set, this simple SDTW classifier performed on par with the more complicated HMM model, giving an accuracy of 84%. Mixture density estimation computations were reduced by a factor of 11. The recognition is writer independent, as the dataset used is quite large, with a variety of handwriting styles.
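
For readers unfamiliar with time warping, the sketch below implements plain dynamic time warping on 1-D sequences. It is not the statistical variant (SDTW) used in the paper, which replaces the fixed local distance with per-state Gaussian models; the function name and the absolute-difference cost are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([1, 2, 3], [2, 3, 4])
```

Two sequences that trace the same shape at different speeds align at zero cost, which is why warping-based classifiers suit handwriting written at varying pace.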

Relevance: 100.00%

Abstract:

Information forms the basis of modern technology. To meet the ever-increasing demand for information, means have to be devised for a more efficient and better-equipped technology to process data intelligibly. Advances in photonics have made their impact on each of the four key applications in information processing, i.e., acquisition, transmission, storage and processing of information. The inherent advantages of ultrahigh bandwidth, high speed and low-loss transmission have already established fiber optics as the backbone of communication technology. However, the optics-to-electronics interconversion at the transmitter and receiver ends severely limits both the speed and the bit rate of lightwave communication systems. As the trend towards still faster and higher-capacity systems continues, it has become increasingly necessary to perform more and more signal-processing operations in the optical domain itself, i.e., with all-optical components and devices that possess a high bandwidth and can perform parallel processing functions to eliminate the electronic bottleneck.

Relevance: 100.00%

Abstract:

In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, various novel kernels are obtained that can be used with support vector machines to design classifiers capable of deciding the class of a given time series. The approach is general and is applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition, with promising results.
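
The abstract does not specify the kernels, so the following is only one plausible construction under stated assumptions: both time series are resampled onto a common grid by piecewise linear interpolation, and a Gaussian kernel is applied to the resampled vectors. The names `resample` and `interp_rbf_kernel`, the grid size `n` (at least 2) and `gamma` are all hypothetical.

```python
import math

def resample(series, n):
    """Piecewise-linear resampling of a 1-D series to n >= 2 evenly spaced points."""
    if len(series) == 1:
        return [float(series[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(series) - 1) / (n - 1)    # fractional index into series
        lo = min(int(pos), len(series) - 2)      # left end of the segment
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out

def interp_rbf_kernel(x, y, n=16, gamma=0.5):
    """Gaussian kernel between two series of possibly different lengths."""
    xs, ys = resample(x, n), resample(y, n)
    sq = sum((a - b) ** 2 for a, b in zip(xs, ys))
    return math.exp(-gamma * sq)

k_same = interp_rbf_kernel([0, 1, 2], [0, 2])    # same line, different sampling
k_diff = interp_rbf_kernel([0, 1, 2], [2, 1, 0])
```

Because the kernel is a Gaussian on fixed-length resampled vectors, it remains positive definite and could be passed to an SVM as a precomputed kernel matrix.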

Relevance: 100.00%

Abstract:

We demonstrate that etched fiber Bragg gratings (eFBGs) coated with single-walled carbon nanotubes (SWNTs) and graphene oxide (GO) are highly sensitive and accurate biochemical sensors. Here, for detecting the protein concanavalin A (Con A), mannose-functionalized poly(propyl ether imine) (PETIM) dendrimers (DMs) have been attached to the SWNTs (or GO) coated on the surface-modified eFBG. The dendrimers act as multivalent ligands, having specificity to detect the lectin Con A. The specificity of the sensor is shown by a much weaker response (by a factor of ~2500 for the SWNT-coated and ~2000 for the GO-coated eFBG) to the non-specific lectin peanut agglutinin. DM-functionalized GO-coated eFBG sensors showed excellent specificity to Con A even in the presence of an excess amount of an interfering protein, bovine serum albumin. The shift in the Bragg wavelength (Δλ_B) with respect to the λ_B values of the SWNT (or GO)-DM coated eFBG for various concentrations of lectin follows a Langmuir-type adsorption isotherm, giving an affinity constant of ~4 × 10⁷ M⁻¹ for the SWNT-coated eFBG and ~3 × 10⁸ M⁻¹ for the GO-coated eFBG. © 2014 Elsevier B.V. All rights reserved.
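
To make the Langmuir-type dependence concrete, the sketch below evaluates the standard single-site isotherm, in which the response scales as Kc / (1 + Kc). Assuming this form applies, the function name and the saturation shift `max_shift` are hypothetical; only the affinity constant is taken from the abstract.

```python
def langmuir_shift(concentration, affinity, max_shift):
    """Bragg-wavelength shift under a single-site Langmuir isotherm.

    concentration is in M and affinity in 1/M; max_shift is the (assumed)
    shift at saturation, in whatever wavelength unit the caller uses.
    """
    kc = affinity * concentration
    return max_shift * kc / (1.0 + kc)

K_SWNT = 4e7  # affinity constant reported for the SWNT-coated eFBG, in 1/M
half = langmuir_shift(1.0 / K_SWNT, K_SWNT, max_shift=1.0)
```

Under this model the response reaches half of its saturation value at a concentration of 1/K, which for the SWNT-coated sensor is about 25 nM.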

Relevance: 100.00%

Abstract:

In optical character recognition of very old books, the recognition accuracy drops mainly due to the merging or breaking of characters. In this paper, we propose the first algorithm to segment merged Kannada characters by using a hypothesis to select the positions to be cut. This method searches for the best possible positions to segment by taking into account the support vector machine classifier's recognition score and the validity of the aspect ratio (width-to-height ratio) of the segments between every pair of cut positions. The hypothesis for selecting the cut position is based on the fact that a concave surface exists above and below the touching portion. These concave surfaces are located by tracing the valleys in the top contour of the image and doing the same for the image rotated upside-down. The cut positions are then derived as closely matching valleys of the original and the rotated images. Our proposed segmentation algorithm works across different font styles, shapes and sizes, and performs better than the existing segmentation based on the vertical projection profile. The proposed algorithm has been tested on 1125 different word images from an old Kannada book, each containing multiple merged characters; it achieves 89.6% correct segmentation, and the character recognition accuracy on merged words is 91.2%. A few points of merge are still missed because the specific shapes of the characters meeting there leave no matching valley.
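
The valley-matching hypothesis can be sketched on a small binary image as follows. This is a simplified illustration: it uses a vertical flip instead of a 180-degree rotation so that column indices stay comparable, detects only strict single-column valleys, and omits the SVM score and aspect-ratio checks. All names and the `tolerance` parameter are assumptions.

```python
def top_contour(image):
    """Row of the first ink pixel in each column (image height if the column is empty)."""
    height = len(image)
    return [next((r for r in range(height) if image[r][c]), height)
            for c in range(len(image[0]))]

def valleys(contour):
    """Columns where the ink starts lower than in both neighbouring columns."""
    return [c for c in range(1, len(contour) - 1)
            if contour[c] > contour[c - 1] and contour[c] > contour[c + 1]]

def cut_candidates(image, tolerance=1):
    """Columns where a valley from above closely matches a valley from below."""
    flipped = image[::-1]  # upside-down copy: its top contour is the bottom contour
    top = valleys(top_contour(image))
    bottom = valleys(top_contour(flipped))
    return [t for t in top if any(abs(t - b) <= tolerance for b in bottom)]

# Two blobs joined by a one-pixel bridge in the middle row.
merged = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
cuts = cut_candidates(merged)
```

In the full method described above, each candidate column would then be kept or rejected according to the classifier score and the aspect ratio of the resulting segments.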

Relevance: 100.00%

Abstract:

The document images that are fed into an optical character recognition system might be skewed. This could be due to improper feeding of the document into the scanner or to a faulty scanner. In this paper, we propose a skew detection and correction method for document images. We make use of the inherent randomness in the horizontal projection profile of a text block image as the skew of the image varies. The proposed algorithm has proved to be very robust and time efficient. The entire process takes less than a second on a 2.4 GHz Pentium IV PC.
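
One way to turn the "randomness of the projection profile" into a number is Shannon entropy, which is the assumption made in the sketch below; the abstract does not say which measure the authors use. Ink pixels are given as (x, y) coordinates, and the function names and the candidate-angle search are hypothetical.

```python
import math
from collections import Counter

def profile_entropy(pixels, angle_deg):
    """Entropy of the horizontal projection profile after rotating by angle_deg."""
    theta = math.radians(angle_deg)
    # Row index of each ink pixel in the rotated coordinate frame.
    rows = Counter(round(y * math.cos(theta) - x * math.sin(theta))
                   for x, y in pixels)
    total = sum(rows.values())
    return -sum((n / total) * math.log(n / total) for n in rows.values())

def estimate_skew(pixels, candidates):
    """Candidate rotation (degrees) giving the most concentrated profile."""
    return min(candidates, key=lambda a: profile_entropy(pixels, a))

# Three synthetic text lines, skewed by 5 degrees.
slope = math.tan(math.radians(5))
page = [(x, line + x * slope) for line in (0, 10, 20) for x in range(100)]
skew = estimate_skew(page, range(-10, 11))
```

When the page is aligned, the ink collapses into a few sharp profile peaks and the entropy is lowest, so the minimizing angle serves as the correction angle in this convention.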

Relevance: 100.00%

Abstract:

We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke width variance, pulse width ratio, transient pixel density, entropy and variance of components to quantify the quality of a document image. The measures are intended to be used in any optical character recognition (OCR) engine to estimate a priori the expected performance of the OCR. The suggested measures have been evaluated on many document images in different scripts. The quality of each document image is manually annotated by users to create a ground truth. The idea is to correlate the values of the measures with the user-annotated data. If the calculated measure matches the annotated description, the metric is accepted; otherwise it is rejected. Of the proposed metrics, some are accepted and the rest are rejected. We have defined metrics that are easy to estimate. The metrics proposed in this paper are based on feedback from home-grown OCR engines for Indic (Tamil and Kannada) languages. The metrics are independent of the script and depend only on the quality and age of the paper and the printing. Experiments and results for each proposed metric are discussed. Actual recognition of the printed text is not performed to evaluate the proposed metrics. Sometimes a document image containing broken characters is rated as a good document image by the evaluated metrics, which remains an unsolved challenge. The proposed measures work on grayscale document images and fail to provide reliable information on binarized document images.

Relevance: 100.00%

Abstract:

This work describes an online handwritten character recognition system working in combination with an offline recognition system. The online input data is also converted into an offline image and recognized in parallel by both online and offline strategies. Features are proposed for offline recognition, and a disambiguation step is employed in the offline system for the samples for which the confidence level of the classifier is low. The outputs are then combined probabilistically, resulting in a classifier that outperforms both individual systems. Experiments are performed for Kannada, a South Indian language, over a database of 295 classes. The accuracy of the online recognizer improves by 11% when it is combined with the offline system.
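
The abstract does not state the combination rule, so the sketch below uses a common choice, the product rule, which multiplies the per-class probabilities of the two recognizers and renormalizes, on the assumption that their errors are roughly independent. The function name, class labels and probabilities are made up for illustration.

```python
def combine(online_probs, offline_probs):
    """Product-rule fusion of two per-class probability dictionaries."""
    joint = {c: online_probs[c] * offline_probs.get(c, 0.0) for c in online_probs}
    total = sum(joint.values())
    if total == 0:
        # The recognizers agree on no class: fall back to the online output.
        return dict(online_probs)
    return {c: p / total for c, p in joint.items()}

online = {"ka": 0.5, "ga": 0.5}     # online recognizer is undecided
offline = {"ka": 0.8, "ga": 0.2}    # offline recognizer prefers "ka"
fused = combine(online, offline)
best = max(fused, key=fused.get)
```

Here the offline evidence breaks the tie left by the online recognizer, which is the kind of complementary behaviour a combined system relies on.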