7 resultados para Parallel corpora
em Cochin University of Science
Resumo:
This paper describes about an English-Malayalam Cross-Lingual Information Retrieval system. The system retrieves Malayalam documents in response to query given in English or Malayalam. Thus monolingual information retrieval is also supported in this system. Malayalam is one of the most prominent regional languages of Indian subcontinent. It is spoken by more than 37 million people and is the native language of Kerala state in India. Since we neither had any full-fledged online bilingual dictionary nor any parallel corpora to build the statistical lexicon, we used a bilingual dictionary developed in house for translation. Other language specific resources like Malayalam stemmer, Malayalam morphological root analyzer etc developed in house were used in this work
Resumo:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from the parallel corpora. However, for many language pairs (e.g. Malayalam- English), they are available only in very limited quantities. Therefore, for these language pairs a huge portion of phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated and a match is looked up for the word variants in the phrase translation table of the translation model. A Vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries and the target translations are filtered out. Structuring of the filtered phrases is done and SMT translation model is extended by adding OOV with its new phrase translations. By the results of the manual evaluation done it is observed that amount of OOV words in the input has been reduced considerably
Resumo:
During 1990's the Wavelet Transform emerged as an important signal processing tool with potential applications in time-frequency analysis and non-stationary signal processing.Wavelets have gained popularity in broad range of disciplines like signal/image compression, medical diagnostics, boundary value problems, geophysical signal processing, statistical signal processing,pattern recognition,underwater acoustics etc.In 1993, G. Evangelista introduced the Pitch- synchronous Wavelet Transform, which is particularly suited for pseudo-periodic signal processing.The work presented in this thesis mainly concentrates on two interrelated topics in signal processing,viz. the Wavelet Transform based signal compression and the computation of Discrete Wavelet Transform. A new compression scheme is described in which the Pitch-Synchronous Wavelet Transform technique is combined with the popular linear Predictive Coding method for pseudo-periodic signal processing. Subsequently,A novel Parallel Multiple Subsequence structure is presented for the efficient computation of Wavelet Transform. Case studies also presented to highlight the potential applications.
Resumo:
Dynamics of Nd:YAG laser with intracavity KTP crystal operating in two parallel polarized modes is investigated analytically and numerically. System equilibrium points were found out and the stability of each of them was checked using Routh–Hurwitz criteria and also by calculating the eigen values of the Jacobian. It is found that the system possesses three equilibrium points for (Ij, Gj), where j = 1, 2. One of these equilibrium points undergoes Hopf bifurcation in output dynamics as the control parameter is increased. The other two remain unstable throughout the entire region of the parameter space. Our numerical analysis of the Hopf bifurcation phenomena is found to be in good agreement with the analytical results. Nature of energy transfer between the two modes is also studied numerically.
Resumo:
Animportant step in the residue number system(RNS) based signal processing is the conversion of signal into residue domain. Many implementations of this conversion have been proposed for various goals, and one of the implementations is by a direct conversion from an analogue input. A novel approach for analogue-to-residue conversion is proposed in this research using the most popular Sigma–Delta analogue-to-digital converter (SD-ADC). In this approach, the front end is the same as in traditional SD-ADC that uses Sigma–Delta (SD) modulator with appropriate dynamic range, but the filtering is doneby a filter implemented usingRNSarithmetic. Hence, the natural output of the filter is an RNS representation of the input signal. The resolution, conversion speed, hardware complexity and cost of implementation of the proposed SD based analogue-to-residue converter are compared with the existing analogue-to-residue converters based on Nyquist rate ADCs
Resumo:
This paper describes JERIM-320, a new 320-bit hash function used for ensuring message integrity and details a comparison with popular hash functions of similar design. JERIM-320 and FORK -256 operate on four parallel lines of message processing while RIPEMD-320 operates on two parallel lines. Popular hash functions like MD5 and SHA-1 use serial successive iteration for designing compression functions and hence are less secure. The parallel branches help JERIM-320 to achieve higher level of security using multiple iterations and processing on the message blocks. The focus of this work is to prove the ability of JERIM 320 in ensuring the integrity of messages to a higher degree to suit the fast growing internet applications
Resumo:
This work proposes a parallel genetic algorithm for compressing scanned document images. A fitness function is designed with Hausdorff distance which determines the terminating condition. The algorithm helps to locate the text lines. A greater compression ratio has achieved with lesser distortion