28 results for Script Identification, Wavelets and Fractals, Texture, Document Analysis, Clustering, Classification and Association Rules


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The following topics were dealt with: document analysis and recognition; multimedia document processing; character recognition; document image processing; cheque processing; form processing; music processing; document segmentation; electronic documents; character classification; handwritten character recognition; information retrieval; postal automation; font recognition; Indian language OCR; handwriting recognition; performance evaluation; graphics recognition; oriental character recognition; and word recognition.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human CGI-58 (for comparative gene identification-58) and YLR099c, encoding Ict1p in Saccharomyces cerevisiae, have recently been identified as acyl-CoA-dependent lysophosphatidic acid acyltransferases. Sequence database searches for CGI-58-like proteins in Arabidopsis (Arabidopsis thaliana) revealed 24 proteins, with At4g24160, a member of the alpha/beta-hydrolase family of proteins, being the closest homolog. At4g24160 contains three motifs that are conserved across plant species: a GXSXG lipase motif, a HX4D acyltransferase motif, and V(X)(3)HGF, a probable lipid-binding motif. Dendrogram analysis of yeast ICT1, CGI-58, and At4g24160 placed these three polypeptides in the same group. Here, we describe and characterize At4g24160 as, to our knowledge, the first soluble lysophosphatidic acid acyltransferase in plants. A lipidomics approach revealed that At4g24160 has additional triacylglycerol lipase and phosphatidylcholine-hydrolyzing enzymatic activities. These data establish At4g24160, a protein with a previously unknown function, as an enzyme that might play a pivotal role in maintaining lipid homeostasis in plants by regulating both phospholipid and neutral lipid levels.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose a novel, language-neutral approach for searching online handwritten text using the Fréchet distance. Online handwritten data, which is available as a time series (x,y,t), is treated as a parameterized curve in two dimensions, and the problem of searching online handwritten text is posed as a problem of matching two curves in a two-dimensional Euclidean space. The Fréchet distance is a natural measure for matching curves. The main contribution of this paper is the formulation of a variant of the Fréchet distance that can be used for retrieving words even when only a prefix of the word is given as the query. Extensive experiments on the UNIPEN dataset, consisting of over 16,000 words written by 7 users, show that our method outperforms the state-of-the-art DTW method. Experiments were also conducted on a multilingual dataset, generated on a PDA, with encouraging results. Our approach can be used to implement useful features such as auto-completion of handwriting on PDAs.
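
The curve-matching idea in this abstract can be illustrated with the standard discrete Fréchet distance, computed by dynamic programming over sampled pen positions. The abstract does not give the authors' prefix formulation, so the `prefix_frechet_distance` function below is only an assumed stand-in: it lets the query end anywhere along the stored word. All function names are hypothetical, and timestamps are ignored.

```python
import math


def frechet_table(p, q):
    """Dynamic-programming table for the discrete Frechet distance.

    p and q are lists of (x, y) points. Entry [i][j] holds the distance
    between the sub-curves p[:i+1] and q[:j+1].
    """
    n, m = len(p), len(q)
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(
                    table[i - 1][j], table[i - 1][j - 1], table[i][j - 1]
                )
                table[i][j] = max(best_previous, d)
    return table


def frechet_distance(p, q):
    """Discrete Frechet distance between two complete curves."""
    return frechet_table(p, q)[-1][-1]


def prefix_frechet_distance(query, word):
    """Assumed prefix variant: the whole query is matched against the
    best-fitting prefix of the stored word (minimum over the last row)."""
    return min(frechet_table(query, word)[-1])


if __name__ == "__main__":
    query = [(0, 0), (1, 0)]
    word = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(frechet_distance(query, word))         # whole-curve match
    print(prefix_frechet_distance(query, word))  # prefix match
```

With a whole-curve match, a short query is penalized for the unwritten remainder of the word; taking the minimum over the last row of the table removes that penalty, which is the behavior a prefix search needs.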

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we describe a system for the automatic recognition of isolated handwritten Devanagari characters obtained by linearizing consonant conjuncts. Owing to the large number of characters and the resulting demands on data acquisition, we use structural recognition techniques to reduce some characters to others. The residual characters are then classified using the subspace method. Finally, the results of structural recognition and feature-based matching are mapped to give the final output. The proposed system is evaluated for the writer-dependent scenario.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Zirconia-based solid electrolytes with zircon (ZrSiO4) as the auxiliary electrode have been suggested for sensing silicon concentrations in iron and steel melts. A knowledge of phase relations in the ternary system MO-SiO2-ZrO2 (M = Ca, Mg) is useful for selecting an appropriate auxiliary electrode. In this investigation, an isothermal section of the phase diagram of the system CaO-SiO2-ZrO2 at 1573 K has been established by equilibrating mixtures of component oxides in air, followed by quenching and phase identification by optical microscopy, energy-dispersive analysis of X-rays (EDAX) and X-ray diffraction analysis (XRD). The equilibrium phase relations have also been confirmed by computation using the available thermodynamic data on condensed phases in the system. The results indicate that zircon is not in thermodynamic equilibrium with calcia-stabilized zirconia or calcium zirconate. The silica-containing phase in equilibrium with stabilized zirconia is Ca3ZrSi2O9. Calcium zirconate can coexist with Ca3ZrSi2O9 and Ca2SiO4.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Competition theory predicts that local communities should consist of species that are more dissimilar than expected by chance. We find a strikingly different pattern in a multicontinent data set (55 presence-absence matrices from 24 locations) on the composition of mixed-species bird flocks, which are important sub-units of local bird communities the world over. By using null models and randomization tests followed by meta-analysis, we find the association strengths of species in flocks to be strongly related to similarity in body size and foraging behavior and higher for congeneric compared with noncongeneric species pairs. Given the local spatial scales of our individual analyses, differences in the habitat preferences of species are unlikely to have caused these association patterns; the patterns observed are most likely the outcome of species interactions. Extending group-living and social-information-use theory to a heterospecific context, we discuss potential behavioral mechanisms that lead to positive interactions among similar species in flocks, as well as ways in which competition costs are reduced. Our findings highlight the need to consider positive interactions along with competition when seeking to explain community assembly.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke width variance, pulse width ratio, transient pixel density, entropy and variance of components to quantify the quality of a document image. The measures are intended to be used in any optical character recognition (OCR) engine to estimate a priori the expected performance of the OCR. The suggested measures have been evaluated on many document images in different scripts. The quality of each document image is manually annotated by users to create a ground truth. The idea is to correlate the values of the measures with the user-annotated data. If the calculated measure matches the annotated description, then the metric is accepted; otherwise it is rejected. Of the proposed metrics, some are accepted and the rest are rejected. We have defined metrics that are easy to estimate. The metrics proposed in this paper are based on feedback from home-grown OCR engines for Indic (Tamil and Kannada) languages. The metrics are independent of the script, and depend only on the quality and age of the paper and the printing. Experiments and results for each proposed metric are discussed. Actual recognition of the printed text is not performed to evaluate the proposed metrics. Sometimes, a document image containing broken characters is rated as a good document image by the evaluated metrics, which remains an unsolved challenge. The proposed measures work on grayscale document images and fail to provide reliable information on binarized document images.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs that contain sufficient edge pixels are retained. A novel approach is presented in which the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are removed from the minimum spanning tree of the graph. The pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset, and values of precision, recall and H-mean are obtained using different approaches.
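
The graph step described in this abstract (a minimum spanning tree over component centroids, with long edges removed) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the abstract does not say how "long" is defined, so `max_edge_length` is a hypothetical fixed threshold, and the height-ratio filtering and horizontal grouping are omitted.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) edges indexing into `points`.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost per node
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def group_centroids(points, max_edge_length):
    """Drop MST edges longer than the (assumed) threshold and return the
    remaining connected groups as sorted lists of point indices."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j, length in minimum_spanning_tree(points):
        if length <= max_edge_length:
            root[find(i)] = find(j)

    groups = {}
    for index in range(len(points)):
        groups.setdefault(find(index), []).append(index)
    return sorted(groups.values())


if __name__ == "__main__":
    centroids = [(0, 0), (10, 0), (20, 0), (200, 0), (210, 0)]
    print(group_centroids(centroids, max_edge_length=50))
```

Each resulting group is a candidate text string: centroids that stay connected after the long edges are cut lie close together, as the characters of a word or line would.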

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination or occlusion. We propose a novel method to binarize grayscale images by minimizing an energy functional. The Discrete Cosine Transform and the Angular Radial Transform are used to extract features from characters after normalization for scale and translation. We have evaluated our method on the complete test set of the Chars74k dataset for English and Kannada scripts, consisting of handwritten and synthesized characters as well as characters extracted from camera-captured images. We use only the synthesized and handwritten characters from this dataset as the training set. Nearest neighbor classification is used in our experiments.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

N-gram language models and lexicon-based word-recognition are popular methods in the literature to improve recognition accuracies of online and offline handwritten data. However, there are very few works that deal with application of these techniques on online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies and while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison to lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.
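
A symbol-level bigram model of the kind mentioned in this abstract can be sketched as below. The details are assumptions for illustration only: add-one smoothing is used because the abstract does not name a smoothing scheme, the toy corpus is invented, and each word is taken to be an already segmented sequence of symbols (for Tamil, a recognizer's symbol set rather than raw code points).

```python
import math
from collections import Counter

START, END = "<s>", "</s>"


def train_bigram(corpus):
    """Count symbol unigrams (as contexts) and bigrams over a corpus of
    words, where each word is a sequence of symbols."""
    contexts = Counter()
    bigrams = Counter()
    vocabulary = set()
    for word in corpus:
        symbols = [START, *word, END]
        vocabulary.update(symbols)
        contexts.update(symbols[:-1])
        bigrams.update(zip(symbols, symbols[1:]))
    return contexts, bigrams, len(vocabulary)


def log_probability(word, model):
    """Add-one smoothed log-probability of a symbol sequence."""
    contexts, bigrams, vocabulary_size = model
    symbols = [START, *word, END]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocabulary_size))
        for a, b in zip(symbols, symbols[1:])
    )


def best_candidate(candidates, model):
    """Pick the recognizer hypothesis the language model scores highest."""
    return max(candidates, key=lambda word: log_probability(word, model))


if __name__ == "__main__":
    model = train_bigram(["abab", "abba", "baba"])  # toy corpus
    print(best_candidate(["aaaa", "abab"], model))
```

In a recognizer, this score would typically be combined with the classifier's own confidence for each hypothesis rather than used alone.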

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI) and street view images. Using a MATLAB-based tool that we developed, we have annotated more than 3600 word images from the five data sets at the pixel level. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS-binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011, respectively, and these values are higher than the best reported values in the literature of 61.1% and 41.2%, respectively. The MAPS result of 82.8% on the BDI 2011 data set matches the performance of the state-of-the-art method based on power-law transform.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cu2SnS3 films have been processed by the sol-gel route. A Differential Scanning Calorimetry (DSC) study was done to observe the phase transformations and to ascertain the deposition temperature. X-ray diffraction (XRD) confirms the phase formation of Cu2SnS3. The texture coefficient analysis shows the preferential orientation of the (112) facet. Scanning electron microscopy reveals the morphology of the film. Energy Dispersive Spectroscopy (EDS) was used for compositional studies. The Raman spectrum shows the peaks corresponding to the tetragonal phase of Cu2SnS3.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Structural characterization of electrodeposited boron was carried out by using transmission electron microscopy and Raman spectroscopy. Electron diffraction and phase contrast imaging were carried out by using transmission electron microscopy. Phase identification was done based on the analysis of electron diffraction patterns and the power spectrum calculated from the lattice images from thin regions of the sample. Raman spectroscopic examination was carried out to study the nature of bonding and the allotropic form of boron obtained after electrodeposition. The results obtained from transmission electron microscopy showed the presence of nanocrystallites embedded in an amorphous mass of boron. Raman microscopic studies showed that amorphous boron could be converted to its crystalline form at high temperatures.