17 results for 280205 Text Processing

at Indian Institute of Science - Bangalore - India


Relevance: 80.00%

Abstract:

Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. Scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, and a mere 0.187 for Telugu-Kannada. There exist comparable corpora in many Indian languages with other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
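
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean reciprocal rank could be computed for a translation-induction system, and how the quoted 124% relative improvement follows from the two MRR values. The function names and the toy word lists are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """Average of 1/rank of the first correct translation per source word.

    ranked_candidates: dict mapping a source word to its candidate
        translations, best first.
    gold: dict mapping a source word to the set of correct translations.
    A word with no correct candidate contributes 0.
    """
    if not ranked_candidates:
        return 0.0
    total = 0.0
    for word, candidates in ranked_candidates.items():
        correct = gold.get(word, set())
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Toy data (hypothetical): correct answers at rank 2, rank 1, and absent.
ranked = {"w1": ["a", "b", "c"], "w2": ["x", "y"], "w3": ["p", "q"]}
gold = {"w1": {"b"}, "w2": {"x"}, "w3": {"z"}}
mrr = mean_reciprocal_rank(ranked, gold)  # (1/2 + 1 + 0) / 3

# The abstract's Telugu-Kannada figures: 0.187 rising to 0.419.
gain = relative_improvement(0.187, 0.419)  # about 1.24, i.e. 124%
```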

Relevance: 30.00%

Abstract:

We propose two texture-based approaches, one involving Gabor filters and the other employing log-polar wavelets, for separating text from non-text elements in a document image. Both the proposed algorithms compute local energy at some information-rich points, which are marked by Harris' corner detector. The advantage of this approach is that the algorithm calculates the local energy at selected points and not throughout the image, thus saving a lot of computational time. The algorithm has been tested on a large set of scanned text pages and the results have been seen to be better than the results from the existing algorithms. Among the proposed schemes, the Gabor filter based scheme marginally outperforms the wavelet based scheme.

Relevance: 30.00%

Abstract:

Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR system involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
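
As an illustration of the feature described above, here is a minimal sketch of a horizontal projection profile computed on a binarized block and resampled to a fixed length. The abstract does not say how the length normalization is done; linear-interpolation resampling is an assumption made here, and the function names are hypothetical.

```python
def horizontal_projection_profile(block):
    """Count foreground pixels (value 1) in each row of a binary block."""
    return [sum(row) for row in block]


def length_normalize(profile, target_len):
    """Resample a profile to `target_len` samples by linear interpolation.

    Assumption: "length-normalized" is taken to mean a fixed number of
    samples per block, so blocks of different heights become comparable.
    """
    n = len(profile)
    if n == 0 or target_len < 1:
        raise ValueError("profile and target length must be non-empty")
    if n == 1 or target_len == 1:
        return [float(profile[0])] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # position in the source profile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


# Toy block: rows of "text" (1s) separated by blank rows, as printed lines are.
block = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
hpp = horizontal_projection_profile(block)
feature = length_normalize(hpp, 9)
```

Printed text lines produce alternating peaks and valleys in such a profile, which is the regularity the abstract relies on.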

Relevance: 30.00%

Abstract:

This paper proposes and compares four methods of binarizing text images captured using a camera mounted on a cell phone. The advantages and disadvantages (image clarity and computational complexity) of each method over the others are demonstrated through binarized results. The images are of VGA or lower resolution.

Relevance: 30.00%

Abstract:

A new method based on a unit continuity metric (UCM) is proposed for optimal unit selection in text-to-speech (TTS) synthesis. UCM employs two features, namely, a pitch continuity metric and a spectral continuity metric. The methods have been implemented and tested on our test bed called MILE-TTS, which is available as a web demo. After verification by a self-selection test, the algorithms are evaluated on 8 paragraphs each for Kannada and Tamil by native users of the languages. The mean opinion score (MOS) shows that naturalness and comprehension are better with the UCM-based algorithm than with the non-UCM-based ones. The naturalness of the TTS output is further enhanced by a new rule-based algorithm for pause prediction for the Tamil language. The pauses between the words are predicted based on parts-of-speech information obtained from the input text.

Relevance: 30.00%

Abstract:

This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the first tool that handles multi-script text or text of arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word by word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolkit (MAST) is available for free download.
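
The seed-based region growing step can be sketched as follows. This is a generic illustration, not the toolkit's MATLAB code: it assumes a grayscale image stored as a list of rows, 4-connectivity, and a similarity test against the seed's intensity. All names are hypothetical.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Grow a region from `seed` over 4-connected pixels.

    A pixel joins the region when its intensity differs from the seed
    pixel's intensity by at most `threshold` (an assumed criterion).
    Returns the set of (row, col) coordinates in the region.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the image
            if (nr, nc) in region:
                continue  # already accepted
            if abs(image[nr][nc] - seed_value) <= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy image: a dark 2x2 patch (the "word") on a bright background, plus one
# dark pixel at (2, 2) that is similar in intensity but not connected.
image = [
    [10, 12, 200],
    [11, 13, 210],
    [205, 208, 12],
]
tight = region_grow(image, (0, 0), threshold=5)
loose = region_grow(image, (0, 0), threshold=250)
```

Raising the threshold lets the region spill into the background, which is why a user-adjustable threshold matters for pixel-accurate segmentation.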

Relevance: 20.00%

Abstract:

The development of a microstructure in 304L stainless steel during industrial hot-forming operations, including press forging (mean strain rate of 0.15 s⁻¹), rolling/extrusion (2-5 s⁻¹), and hammer forging (100 s⁻¹) at different temperatures in the range 600-1200 °C, was studied with a view to validating the predictions of the processing map. The results have shown that excellent correlation exists between the regimes exhibited by the map and the product microstructures. 304L stainless steel exhibits instability bands when hammer forged at temperatures below 1100 °C, rolled/extruded below 1000 °C, or press forged below 800 °C. All of these conditions must be avoided in mechanical processing of the material. On the other hand, ideally, the material may be rolled, extruded, or press forged at 1200 °C to obtain a defect-free microstructure.

Relevance: 20.00%

Abstract:

The hot deformation behavior of hot isostatically pressed (HIPed) P/M IN-100 superalloy has been studied in the temperature range 1000-1200 °C and strain rate range 0.0003-10 s⁻¹ using hot compression testing. A processing map has been developed on the basis of these data and using the principles of dynamic materials modelling. The map exhibited three domains: one at 1050 °C and 0.01 s⁻¹, with a peak efficiency of power dissipation of approximately 32%, the second at 1150 °C and 10 s⁻¹, with a peak efficiency of approximately 36%, and the third at 1200 °C and 0.1 s⁻¹, with a similar efficiency. On the basis of optical and electron microscopic observations, the first domain was interpreted to represent dynamic recovery of the γ phase, the second domain represents dynamic recrystallization (DRX) of γ in the presence of softer γ′, while the third domain represents DRX of the γ phase only. The γ′ phase is stable up to 1150 °C, gets deformed below this temperature, and the chunky γ′ accumulates dislocations, which at larger strains cause cracking of this phase. At temperatures lower than 1080 °C and strain rates higher than 0.1 s⁻¹, the material exhibits flow instability, manifested in the form of adiabatic shear bands. The material may be subjected to mechanical processing without cracking or instabilities at 1200 °C and 0.1 s⁻¹, which are the conditions for DRX of the γ phase.
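
In the dynamic materials model, the efficiency of power dissipation plotted on a processing map is commonly defined as η = 2m/(m + 1), where m is the strain rate sensitivity of the flow stress. The abstract does not restate this formula, so the sketch below is given as background under that assumption. The flow-stress values are hypothetical and are not data from the paper.

```python
import math


def strain_rate_sensitivity(stress_1, rate_1, stress_2, rate_2):
    """Two-point estimate of m = d(ln stress) / d(ln strain rate)."""
    return math.log(stress_2 / stress_1) / math.log(rate_2 / rate_1)


def dissipation_efficiency(m):
    """Efficiency of power dissipation, eta = 2m / (m + 1)."""
    return 2.0 * m / (m + 1.0)


def sensitivity_for_efficiency(eta):
    """Inverse relation: the m that gives a target efficiency eta."""
    return eta / (2.0 - eta)


# Hypothetical flow stresses (MPa) at two strain rates (1/s), built so that
# stress scales as strain_rate ** 0.2.
m = strain_rate_sensitivity(100.0, 0.01, 100.0 * 10 ** 0.2, 0.1)
eta = dissipation_efficiency(m)

# Strain rate sensitivity implied by a 32% peak efficiency.
m_32 = sensitivity_for_efficiency(0.32)
```

Under this definition, a peak efficiency of about 32% corresponds to m of roughly 0.19, and 36% to roughly 0.22.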

Relevance: 20.00%

Abstract:

An overview of the synthesis of materials under microwave irradiation is presented, based on recently performed work. A variety of reactions such as direct combination, carbothermal reduction, carbidation and nitridation are described. Examples of microwave preparation of glasses are also presented. The major advantages of microwave methods (they are fast and clean, and they reduce the reaction temperature) are emphasized. The example of ZrO2-CeO2 ceramics is used to show the extraordinarily fast and effective sintering which occurs under microwave irradiation.

Relevance: 20.00%

Abstract:

Power dissipation maps have been generated in the temperature range of 900 °C to 1150 °C and strain rate range of 10⁻³ to 10 s⁻¹ for a cast aluminide alloy Ti-24Al-20Nb using the dynamic materials model. The results define two distinct regimes of temperature and strain rate in which efficiency of power dissipation is maximum. The first region, centered around 975 °C/0.1 s⁻¹, is shown to correspond to dynamic recrystallization of the α₂ phase and the second, centered around 1150 °C/0.001 s⁻¹, corresponds to dynamic recovery and superplastic deformation of the β phase. Thermal activation analysis using the power law creep equation yielded apparent activation energies of 854 and 627 kJ/mol for the first and second regimes, respectively. Reanalyzing the data by alternate methods yielded activation energies in the range of 170 to 220 kJ/mol and 220 to 270 kJ/mol for the first and second regimes, respectively. Cross slip was shown to constitute the activation barrier in both cases. Two distinct regimes of processing instability (one at high strain rates and the other at low strain rates in the lower temperature regions) have been identified, within which shear bands are formed.

Relevance: 20.00%

Abstract:

Al-Li-SiCp composites were fabricated by a simple and cost-effective stir casting technique. A compound billet technique has been developed to overcome the problems encountered during hot extrusion of these composites. After successful fabrication, hardness measurements and room-temperature compressive tests were carried out on 8090 Al and its composites reinforced with 8, 12 and 18 vol.% SiC particles in as-extruded and peak-aged conditions. The addition of SiC increases the hardness. The 0.2% proof stress and compressive strength of Al-Li-8%SiC and Al-Li-12%SiC composites are higher than those of the unreinforced alloy. In the case of the Al-Li-18%SiC composite, the 0.2% proof stress and compressive strength were higher than those of the unreinforced alloy but lower than those of Al-Li-8%SiC and Al-Li-12%SiC composites. This is attributed to clustering of particles and poor interfacial bonding.

Relevance: 20.00%

Abstract:

In the present investigation, two nozzle configurations are used for spray deposition, convergent nozzle (nozzle-A), and convergent nozzle with 2 mm parallel portion attached at its end (nozzle-C) without changing the exit area. First, the conditions for subambient aspiration pressure, i.e., pressure at the tip of the melt delivery tube, are established by varying the protrusion length of the melt delivery tube at different applied gas pressures for both of the nozzles. Using these conditions, spray deposits in a reproducible manner are successfully obtained for 7075 Al alloy. The effect of applied gas pressure, flight distance, and nozzle configuration on various characteristics of spray deposition, viz., yield, melt flow rate, and gas-to-metal ratio, is examined. The over-spray powder is also characterized with respect to powder size distribution, shape, and microstructure. Some of the results are explained with the help of numerical analysis presented in an earlier article.

Relevance: 20.00%

Abstract:

A systematic study of Ar ion implantation in cupric oxide films is reported. Oriented CuO films were deposited by the pulsed excimer laser ablation technique on (1 0 0) YSZ substrates. X-ray diffraction (XRD) spectra showed the highly oriented nature of the deposited CuO films. The films were subjected to ion bombardment for studies of damage formation. Implantations were carried out using 100 keV Ar⁺ over a dose range between 5 × 10¹² and 5 × 10¹⁵ ions/cm². The as-deposited and ion-beam-processed samples were characterized by the XRD technique and resistance versus temperature (R-T) measurements. The activation energies for electrical conduction were found from ln R versus 1/T curves. Defects play an important role in the conduction mechanism in the implanted samples. The conductivity of the film increases, and the corresponding activation energy decreases, with increasing dose.
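
Extracting an activation energy from a ln R versus 1/T curve can be sketched as below. The sketch assumes thermally activated (Arrhenius-type) conduction, R = R0·exp(Ea / (kB·T)), so that the slope of ln R against 1/T equals Ea/kB. The abstract does not state the exact model used, and the resistance data here are synthetic, generated from an assumed Ea of 0.20 eV.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activation_energy_ev(temperatures_k, resistances):
    """Activation energy (eV) from the slope of ln R versus 1/T."""
    inv_t = [1.0 / t for t in temperatures_k]
    ln_r = [math.log(r) for r in resistances]
    slope, _ = fit_line(inv_t, ln_r)
    return slope * K_B_EV


# Synthetic R-T data (not from the paper): assumed Ea = 0.20 eV, R0 = 1 kOhm.
temperatures = [200.0, 225.0, 250.0, 275.0, 300.0]
assumed_ea = 0.20
resistances = [1.0e3 * math.exp(assumed_ea / (K_B_EV * t)) for t in temperatures]

ea = activation_energy_ev(temperatures, resistances)
```

With real measurements the points scatter around the line, and a change of slope between temperature ranges would indicate more than one conduction mechanism.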

Relevance: 20.00%

Abstract:

Al-5 wt pct Si alloy is processed by upset forging in the temperature range 300 K to 800 K and in the strain rate range 0.02 to 200 s⁻¹. The hardness and tensile properties of the product have been studied. A "safe" window in the strain rate-temperature field has been identified for processing of this alloy to obtain maximum tensile ductility in the product. For the above strain rate range, the temperature range of processing is 550 K to 700 K for obtaining high ductility in the product. On the basis of microstructure and the ductility of the product, the temperature-strain rate regimes of damage due to cavity formation at particles and wedge cracking have been isolated for this alloy. The tensile fracture features recorded on the product specimens are in conformity with the above damage mechanisms. A high temperature treatment above ≈600 K followed by fairly fast cooling gives solid solution strengthening in the alloy at room temperature.