974 results for Word Processing
Abstract:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems, and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has a far smaller impact on recognition accuracy. Experiments on a 1138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former method increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER from 3.2% to 3.5%.
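For readers unfamiliar with the formulation, the following is a minimal numpy sketch of how diagonal-covariance Gaussian log-likelihoods reduce to one matrix product with augmented features, together with the direct low-rank route via truncated SVD; all names and sizes are illustrative, and the indirect optimization of the likelihood matrix is not shown.

```python
import numpy as np

def gaussian_param_matrix(means, variances):
    """Stack each diagonal Gaussian's parameters into one row so that
    log-likelihoods become a single matrix product with augmented features."""
    G, D = means.shape
    const = -0.5 * (D * np.log(2 * np.pi) + np.log(variances).sum(axis=1)
                    + (means**2 / variances).sum(axis=1))
    # Row g = [-1/(2*var_g), mean_g/var_g, const_g]  ->  shape (G, 2D+1)
    return np.hstack([-0.5 / variances, means / variances, const[:, None]])

def augmented_features(X):
    """Augment each frame x as [x^2, x, 1]  ->  shape (T, 2D+1)."""
    return np.hstack([X**2, X, np.ones((len(X), 1))])

rng = np.random.default_rng(0)
T, G, D, r = 100, 512, 39, 20            # frames, Gaussians, dims, rank
A = gaussian_param_matrix(rng.random((G, D)), rng.random((G, D)) + 0.5)
Z = augmented_features(rng.standard_normal((T, D)))

exact = Z @ A.T                          # (T, G) matrix of log-likelihoods
U, s, Vt = np.linalg.svd(A, full_matrices=False)
approx = (Z @ Vt[:r].T) @ (U[:, :r] * s[:r]).T   # cheaper rank-r product
print(np.abs(exact - approx).max())      # low-rank approximation error
```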
Abstract:
We address the problem of speech enhancement in real-world noisy scenarios. We propose to solve the problem in two stages, the first comprising a generalized spectral subtraction technique, followed by a sequence of perceptually motivated post-processing algorithms. The role of the post-processing algorithms is to compensate for the effects of noise as well as to suppress any artifacts created by the first-stage processing. The key post-processing mechanisms are aimed at suppressing musical noise, enhancing the formant structure of voiced speech, and denoising the linear-prediction residual. The parameter values in the techniques are fixed optimally by experimentally evaluating the enhancement performance as a function of the parameters. We used the Carnegie Mellon University Arctic database for our experiments. We considered three real-world noise types: fan noise, car noise, and motorbike noise. The enhancement performance was evaluated by conducting listening experiments on 12 subjects. The listeners reported a clear improvement in perceived quality over the noisy signal, with a mean-opinion-score (MOS) improvement of 0.5 on average, for positive signal-to-noise ratios (SNRs). For negative SNRs, however, the improvement was found to be marginal.
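A minimal sketch of the first-stage generalized spectral subtraction, assuming the noise spectrum is estimated from leading noise-only frames; the perceptual post-processing stages (musical-noise suppression, formant enhancement, residual denoising) are not shown, and the parameter values are illustrative rather than the optimized ones from the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, fs, noise_est, alpha=2.0, beta=0.01, gamma=2.0):
    """Generalized spectral subtraction: subtract a scaled noise estimate
    in the |X|^gamma domain and floor the result at a fraction beta."""
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    clean = np.maximum(mag**gamma - alpha * noise_est[:, None]**gamma,
                       beta * mag**gamma) ** (1.0 / gamma)
    _, out = istft(clean * np.exp(1j * phase), fs=fs, nperseg=512)
    return out

fs = 16000
noisy = np.random.standard_normal(fs)        # stand-in for a noisy recording
# Noise magnitude profile from leading noise-only frames (assumption).
_, _, N = stft(noisy[:2048], fs=fs, nperseg=512)
noise_est = np.abs(N).mean(axis=1)
enhanced = spectral_subtract(noisy, fs, noise_est)
```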
Abstract:
This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the first tool that deals with multi-script text of arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word-by-word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolkit (MAST) is available for free download.
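A minimal sketch of the seed-plus-region-growing step, with the acceptance threshold exposed as the user-tunable parameter the abstract describes; the word tagging, virtual keyboard, and polygonal mask stages are not shown, and all names are illustrative.

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, thresh):
    """Grow a word mask from a user-clicked seed: accept 4-connected
    neighbors whose intensity stays within `thresh` of the seed value."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(gray[seed])
    q = deque([seed])
    mask[seed] = True
    while q:
        r, c = q.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(gray[nr, nc]) - seed_val) <= thresh):
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask

gray = np.random.randint(0, 256, (64, 64))
mask = region_grow(gray, (32, 32), thresh=25)   # user raises/lowers thresh
```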
Abstract:
In this paper, we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decomposition algorithm. The proposed technique removes harmonic instrument leakages in the percussion-enhanced outputs of the baseline algorithm. The technique uses median filtering and adaptive detection of percussive segments in subbands, followed by piecewise signal reconstruction using envelope properties, to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which, when applied to the original signal, improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that, on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
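The median filtering and binary-mask construction can be sketched as below; this is a generic harmonic/percussive median-filter scheme shown for orientation only, not the paper's exact subband detection and envelope-based reconstruction.

```python
import numpy as np
import scipy.signal
from scipy.ndimage import median_filter

def percussive_mask(x, fs, kernel=17):
    """Median-filter the magnitude spectrogram along time (harmonic) and
    along frequency (percussive), then keep bins where percussive wins."""
    _, _, X = scipy.signal.stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)
    harm = median_filter(S, size=(1, kernel))   # smooth along time
    perc = median_filter(S, size=(kernel, 1))   # smooth along frequency
    return perc > harm, X

fs = 44100
x = np.random.standard_normal(fs)               # stand-in for a music signal
mask, X = percussive_mask(x, fs)
# Apply the binary mask to the original spectrogram and resynthesize.
_, perc_signal = scipy.signal.istft(np.where(mask, X, 0), fs=fs, nperseg=1024)
```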
Abstract:
Scenic word images undergo degradations due to motion blur, uneven illumination, shadows and defocusing, which lead to difficulty in segmentation. As a result, the recognition results reported on the scenic word image datasets of ICDAR have been low. We introduce a novel technique in which we choose the middle row of the image as a sub-image and segment it first. The labels from this segmented sub-image are then propagated to the other pixels in the image. This approach, which is unique and distinct from the existing methods, results in improved segmentation. Bayesian classification and max-flow methods have been independently used for label propagation. This midline-based approach limits the impact of degradations that occur in the image. The segmented text image is recognized using the trial version of Omnipage OCR. We have tested our method on the ICDAR 2003 and ICDAR 2011 datasets. Our word recognition results of 64.5% and 71.6% are better than those of methods in the literature, as well as methods that competed in the Robust Reading competition. Our method makes an implicit assumption that degradation is not present in the middle row.
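A minimal sketch of the midline idea: threshold only the middle row, then propagate its two labels to the rest of the image. The nearest-mean propagation below is a naive stand-in for the Bayesian classification and max-flow propagation used in the paper; all values are illustrative.

```python
import numpy as np

def midline_segment(gray):
    """Label the middle row into text/background, then assign every pixel
    the label whose midline mean intensity is closer."""
    mid = gray[gray.shape[0] // 2].astype(float)
    t = mid.mean()                               # simple midline threshold
    dark, light = mid[mid <= t], mid[mid > t]
    fg, bg = (dark.mean(), light.mean()) if dark.mean() < light.mean() \
             else (light.mean(), dark.mean())    # assume text is darker
    return np.abs(gray - fg) < np.abs(gray - bg)  # True where pixel ~ text

gray = np.random.randint(0, 256, (48, 160)).astype(float)
text_mask = midline_segment(gray)
```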
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracies on online and offline handwritten data. However, there are very few works that deal with the application of these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies; while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison with the lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.
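A minimal sketch of symbol-level bigram rescoring of recognizer hypotheses; the add-one smoothing, weight, and toy symbols are illustrative assumptions, not the paper's setup.

```python
import math
from collections import defaultdict

def train_bigram(corpus_symbols, vocab_size):
    """Add-one-smoothed symbol bigram model from a symbol-level corpus."""
    uni, bi = defaultdict(int), defaultdict(int)
    for seq in corpus_symbols:
        for a, b in zip(seq, seq[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    def logp(a, b):
        return math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))
    return logp

def rescore(candidates, logp, lm_weight=0.5):
    """Combine each candidate's classifier score with its bigram LM score
    and return the best (symbol_sequence, classifier_score) pair."""
    def total(seq, clf_score):
        lm = sum(logp(a, b) for a, b in zip(seq, seq[1:]))
        return clf_score + lm_weight * lm
    return max(candidates, key=lambda c: total(c[0], c[1]))

logp = train_bigram([["ka", "na", "ta"], ["ta", "na"]], vocab_size=247)
best = rescore([(["ka", "na"], -1.2), (["ka", "ta"], -1.0)], logp)
```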
Abstract:
We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI) and street view images. Using the MATLAB-based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS-binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011, respectively; these values are higher than the best values reported in the literature, 61.1% and 41.2%, respectively. The MAPS result of 82.8% on the BDI 2011 dataset matches the performance of the state-of-the-art method based on the power-law transform.
Abstract:
Saccharomyces cerevisiae RAD50, MRE11, and XRS2 genes are essential for telomere length maintenance, cell cycle checkpoint signaling, meiotic recombination, and DNA double-stranded break (DSB) repair via nonhomologous end joining and homologous recombination. The DSB repair pathways that draw upon Mre11-Rad50-Xrs2 subunits are complex, so their mechanistic features remain poorly understood. Moreover, the molecular basis of DSB end resection in yeast mre11 nuclease-deficient mutants and Mre11 nuclease-independent activation of ATM in mammals remains unknown and adds a new dimension to many unanswered questions about the mechanism of DSB repair. Here, we demonstrate that S. cerevisiae Mre11 (ScMre11) exhibits higher binding affinity for single- over double-stranded DNA and intermediates of recombination and repair, and catalyzes robust unwinding of substrates possessing a 3' single-stranded DNA overhang but not of 5' overhangs or blunt-ended DNA fragments. Additional evidence disclosed that ScMre11 nuclease activity is dispensable for its DNA binding and unwinding activity, thus uncovering the molecular basis underlying DSB end processing in mre11 nuclease-deficient mutants. Significantly, Rad50, Xrs2, and Sae2 potentiate the DNA unwinding activity of Mre11, thus underscoring functional interaction among the components of the DSB end repair machinery. Our results also show that ScMre11 by itself binds to DSB ends, then promotes end bridging of duplex DNA, and directly interacts with Sae2. We discuss the implications of these results in the context of an alternative mechanism for DSB end processing and the generation of single-stranded DNA for DNA repair and homologous recombination.
Abstract:
While it is well known that extremely long low-density parity-check (LDPC) codes perform exceptionally well for error correction applications, short-length codes are preferable in practical applications. However, short-length LDPC codes suffer from performance degradation owing to graph-based impairments such as short cycles, trapping sets and stopping sets in the bipartite graph of the LDPC matrix. In particular, performance degradation at moderate to high Eb/N0 is caused by oscillations in bit-node a posteriori probabilities induced by short cycles and trapping sets in bipartite graphs. In this study, a computationally efficient algorithm is proposed to improve the performance of short-length LDPC codes at moderate to high Eb/N0. This algorithm makes use of the information generated by the belief propagation (BP) algorithm in previous iterations before a decoding failure occurs. Using this information, a reliability-based estimation is performed on each bit node to supplement the BP algorithm. The proposed algorithm gives an appreciable coding gain compared with BP decoding for LDPC codes of code rate 1/2 or less. The coding gains are modest to significant for regular LDPC codes optimised for bipartite graph conditioning, whereas the gains are huge for unoptimised codes. Hence, this algorithm is useful for relaxing some stringent constraints on the graphical structure of the LDPC code and for developing hardware-friendly designs.
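The abstract does not specify the estimator, so the sketch below illustrates one common reliability heuristic in this spirit: accumulate bit-node posterior LLRs across BP iterations and, on decoding failure, decide from the time-averaged values, which damps cycle-induced oscillations. The `decode_iteration` callable is a stand-in for a full BP implementation.

```python
import numpy as np

def bp_with_llr_averaging(decode_iteration, llr_init, max_iter=50):
    """Run BP while keeping a running sum of bit-node posterior LLRs; if BP
    fails to converge, make hard decisions from the averaged LLRs instead
    of the (possibly oscillating) final-iteration values."""
    llr_sum = np.zeros_like(llr_init)
    llr = llr_init.copy()
    for _ in range(max_iter):
        llr, ok = decode_iteration(llr)        # one BP iteration (user-supplied)
        llr_sum += llr
        if ok:                                 # syndrome check passed
            return (llr < 0).astype(int), True
    return (llr_sum < 0).astype(int), False    # reliability-based fallback

# Toy stand-in that never satisfies the syndrome, to exercise the fallback.
rng = np.random.default_rng(1)
true_llr = np.array([2.0, -1.5, 0.4, -0.3])
step = lambda l: (0.5 * l + 0.5 * (true_llr + rng.standard_normal(4)), False)
bits, converged = bp_with_llr_averaging(step, np.zeros(4))
```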
Abstract:
In the current study, the evolution of microstructure and texture has been studied for Ti-6Al-4V-0.1B alloy during sub-transus thermomechanical processing. This part of the work deals with the deformation response of the alloy by rolling in the (alpha + beta) phase field. The (alpha + beta) annealing behavior of the rolled specimen is communicated in part II. Rolled microstructures of the alloys exhibit either kinked or straight alpha colonies depending on their orientations with respect to the principal rolling directions. The Ti-6Al-4V-0.1B alloy shows an improved rolling response compared with the alloy Ti-6Al-4V because of smaller alpha lamellae size, coherency of alpha/beta interfaces, and multiple slip due to orientation factors. Accelerated dynamic globularization for this alloy is similarly caused by intralamellar transverse boundary formation via multiple slip and strain accumulation at TiB particles. The (0002) alpha pole figures of the rolled Ti-6Al-4V alloy show "TD splitting" at lower rolling temperatures because of the strong initial texture. Substantial beta phase mitigates the effect of the starting texture at higher temperature, so that "RD splitting" characterizes the basal pole figure. The weak starting texture and easy slip transfer of the Ti-6Al-4V-0.1B alloy produce simultaneous TD and RD splitting in basal pole figures at all rolling temperatures.
Abstract:
The first part of this study describes the evolution of microstructure and texture in Ti-6Al-4V-0.1B alloy during sub-transus rolling vis-à-vis the control alloy Ti-6Al-4V. In this second part, the static annealing response of the two alloys under identical conditions is compared and the principal micromechanisms are analyzed. Faster globularization kinetics has been observed in the Ti-6Al-4V-0.1B alloy for equivalent annealing conditions. This is primarily attributed to the alpha colonies, which lead to easy boundary splitting via multiple slip activation in this alloy. The other mechanisms facilitating lamellar-to-equiaxed morphological transformation, e.g., termination migration and cylinderization, also start early in the boron-modified alloy due to the small alpha colony size, the small aspect ratio of the alpha lamellae, and the presence of TiB particles in the microstructure. Both alloys exhibit weakening of the basal fiber (ND||<0001>) and strengthening of the prism fiber (RD||<a>) upon annealing. A close proximity between the orientations of the fully globularized primary alpha and the secondary alpha phases formed during the alpha -> beta -> alpha transformation accounts for this texture modification.
Abstract:
Procedures were developed for the purification and processing of electrodeposited enriched boron powder for control rod application in India's first commercial Prototype Fast Breeder Reactor (PFBR). A methodology for the removal of anionic (F-, Cl-, BF4-) and cationic (Fe2+, Fe3+, Ni2+) impurities was developed. Parameters for grinding the boron flakes obtained after electrodeposition were optimized to obtain boron powder with a particle size of less than 100 µm. The rate of removal of impurities was studied with respect to time and the concentration of the reagents used for purification. Process parameters for grinding and removal of impurities were optimized. A flowsheet was proposed that helps minimize both the purification time and the concentration of reagent needed for effective removal of impurities. The purification methodology developed in this work can produce boron that meets the technical specifications for control rod application in a fast reactor.
Abstract:
In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured text from the word image dataset of ICDAR Robust Reading Competition Challenge 2: reading text in scene images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane that has the maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on the ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature. As a baseline, images binarized by simple global and local thresholding techniques were also recognized. The word recognition rate obtained by our non-linear enhancement and plane selection method is 72.8% and 66.2% for the ICDAR 2011 and 2003 word datasets, respectively. We have created ground truth for each image at the pixel level to benchmark these datasets, using a toolkit developed by us. The recognition rate on the benchmarked images is 86.7% and 83.9% for the ICDAR 2011 and 2003 datasets, respectively.
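A minimal sketch of the plane-selection rule: gamma-enhance each plane, score it by Otsu's maximum between-class variance, and keep the most separable one. The gamma value and plane list are illustrative assumptions.

```python
import numpy as np

def between_class_variance(plane):
    """Otsu's maximum between-class variance, used here as the
    discrimination factor of a plane."""
    hist, _ = np.histogram(plane, bins=256, range=(0, 256))
    p = hist / hist.sum()
    w0 = np.cumsum(p)
    m = np.cumsum(p * np.arange(256))
    var = (m[-1] * w0 - m) ** 2 / (w0 * (1 - w0) + 1e-12)
    return var.max()

def select_plane(planes, gamma=1.5):
    """Power-law (gamma) enhance every plane, score each by its
    discrimination factor, and return the most separable one."""
    best, best_score = None, -1.0
    for plane in planes:
        enhanced = 255.0 * (plane / 255.0) ** gamma
        score = between_class_variance(enhanced)
        if score > best_score:
            best, best_score = enhanced, score
    return best

# planes: e.g. R, G, B, gray, and lightness arrays of one word image.
planes = [np.random.randint(0, 256, (40, 120)) for _ in range(5)]
chosen = select_plane(planes)
```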
Abstract:
The design and development of a Bottom Pressure Recorder for a Tsunami Early Warning System is described here. The special requirements it should satisfy for the specific application of deployment on the ocean bed and pressure monitoring of the water column above are dealt with; high-resolution data digitization and low circuit power consumption are typical ones. The implementation details of the data sensing and acquisition part designed to meet these requirements are also brought out. The data processing part encompasses a tsunami detection algorithm that should detect an event of significance against a background of periodic and aperiodic noise signals. Such an algorithm and its simulation are presented. Further, the results of sea trials carried out on the system off the Chennai coast are presented. The high quality and fidelity of the data prove that the system design is robust despite its low cost and, with suitable augmentations, is ready for a full-fledged deployment on the ocean bed.
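The abstract does not give the detection algorithm itself, so the following sketches a common approach of this kind (used, for example, in DART-style recorders): predict the tidal background by a low-order polynomial fit to recent samples and flag deviations beyond a threshold. Window size, polynomial degree, and threshold are illustrative.

```python
import numpy as np

def tsunami_flag(pressure, window=120, thresh=0.03):
    """Fit a quadratic to the last `window` samples, extrapolate one step
    ahead, and flag samples deviating from the prediction by > `thresh`."""
    flags = np.zeros(len(pressure), dtype=bool)
    x = np.arange(window)
    for i in range(window, len(pressure)):
        coeffs = np.polyfit(x, pressure[i - window:i], deg=2)
        predicted = np.polyval(coeffs, window)       # one step ahead
        flags[i] = abs(pressure[i] - predicted) > thresh
    return flags

t = np.arange(5000, dtype=float)
tide = 10.0 + 0.5 * np.sin(2 * np.pi * t / 720.0)    # synthetic tidal signal
tide[4000:] += 0.1                                   # small step disturbance
print(np.where(tsunami_flag(tide))[0][:5])           # first flagged samples
```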
Abstract:
We develop a communication-theoretic framework for modeling 2-D magnetic recording channels. Using the model, we define the signal-to-noise ratio (SNR) for the channel considering several physical parameters, such as the channel bit density, code rate, bit aspect ratio, and noise parameters. We analyze the problem of optimizing the bit aspect ratio for maximizing SNR. The read channel architecture comprises a novel 2-D joint self-iterating equalizer and detection system with noise prediction capability. We evaluate the system performance based on our channel model through simulations. The coded performance with the 2-D equalizer detector indicates an SNR gain of approximately 5.5 dB over uncoded data.