39 results for Word Processing
Abstract:
Rays, belonging to the class Elasmobranchii, constitute a major fishery in many Indian states, such as Tamil Nadu, Gujarat, Andhra Pradesh, Kerala and Maharashtra. The estimated landings are 21,700 tonnes per annum. Even though the meat of rays is nutritious and free from bones and spines, there is little demand for fresh meat because of its high urea content. The landings are mainly used for salt curing, which fetches only very low prices for the producers. Urea nitrogen constituted the major component (50.8%) of the non-protein nitrogen of the meat. An attempt has been made to standardize the processing steps to reduce the urea levels in the meat before freezing, using simple techniques: dipping the fillets in stagnant chilled water, in chilled running water, and in stirred chilled running water. Dipping the meat in stirred running water for two hours reduced its urea level by 62%. The yield of the lateral fin fillets and caudal fin fillets varies with the size of the ray. Drip loss during frozen storage was higher in samples that were frozen stored after urea removal by stirring in running water, while samples treated in stagnant chilled water had the lowest drip loss. Total nitrogen was highest in samples treated in stagnant chilled water and lowest in samples treated in stirred running water. Overall acceptability was high for samples treated with stirred running water and frozen stored.
Abstract:
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to remove insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in producing better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical output of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
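As an illustration of one of the evaluation metrics named above, the following is a minimal sketch of word error rate (WER), assuming the usual definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A lower value indicates a closer match; a hypothesis identical to the reference scores 0.0.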
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating part-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
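Rule-based suffix separation of the kind mentioned above can be sketched as a longest-match lookup. The suffix list below is a small hypothetical set of transliterated endings chosen only for illustration; it is not the paper's rule set, and real Malayalam morphology also involves sandhi changes that this sketch ignores.

```python
# Hypothetical, transliterated suffixes used only for illustration.
SUFFIXES = ("kalude", "kalil", "ude", "kal", "il")


def separate_suffix(word, suffixes=SUFFIXES, min_stem=2):
    """Split a word into (stem, suffix) using the longest matching suffix.

    Returns (word, "") when no suffix matches or when the remaining stem
    would be shorter than min_stem characters.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix
    return word, ""
```

Trying the longest suffix first keeps a compound ending such as `kalude` from being split only at its shorter tail `ude`.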
Abstract:
A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
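Stop word elimination from a bilingual corpus can be sketched as a filter over sentence pairs. This example assumes that only the English side is filtered and that the corpus is a list of `(english, malayalam)` string pairs; both are assumptions for illustration, and the stop word list is a hypothetical placeholder rather than the list used in the paper.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "the", "of", "is"})


def remove_stop_words(sentence_pairs, stop_words=STOP_WORDS):
    """Drop stop words from the English side of each sentence pair.

    The Malayalam side is passed through unchanged (an assumption of
    this sketch).
    """
    cleaned = []
    for english, malayalam in sentence_pairs:
        kept = [w for w in english.split() if w.lower() not in stop_words]
        cleaned.append((" ".join(kept), malayalam))
    return cleaned
```

Fewer function words on the source side means fewer candidate word pairs for the aligner to consider in each sentence pair.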
Abstract:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities. Therefore, for these language pairs a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for these word variants is looked up in the phrase translation table of the translation model. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input has been reduced considerably.
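The variant lookup and unigram-count filter described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: inflected variants are formed by plain concatenation of root and suffix, the phrase table is assumed to be a dictionary from source word to a list of target translations, and the names `translate_oov`, `phrase_table` and `target_unigrams` are hypothetical.

```python
from collections import Counter


def translate_oov(root, suffixes, phrase_table, target_unigrams):
    """Pick a translation for an OOV root via its inflected variants.

    Each variant (root + suffix) is looked up in the phrase table, and
    the candidate with the highest target-side unigram count is returned.
    Returns None when no variant is found in the table.
    """
    candidates = []
    for suffix in suffixes:
        variant = root + suffix  # assumption: simple concatenation
        candidates.extend(phrase_table.get(variant, []))
    if not candidates:
        return None
    return max(candidates, key=lambda target: target_unigrams[target])


# Toy data, invented for this example.
phrase_table = {"kuttikal": ["children", "kids"], "kuttiyude": ["child's"]}
target_unigrams = Counter({"children": 12, "kids": 3, "child's": 5})
best = translate_oov("kutti", ["kal", "yude"], phrase_table, target_unigrams)
```

With the toy data above, `best` is `"children"`, the candidate with the largest unigram count.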
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target languages before they are used for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. Moreover, the reduction of unwanted alignments has produced better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
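One way to picture how part-of-speech information prunes alignments is a filter that keeps a candidate word link only when both words carry the same tag. The sketch below assumes both sides are tagged with a shared tagset and that tag equality is the compatibility test; the paper may use a different criterion, and the function name `filter_alignments` is hypothetical.

```python
def filter_alignments(src_tagged, tgt_tagged, candidate_links):
    """Keep only links whose source and target words share a PoS tag.

    src_tagged and tgt_tagged are lists of (word, tag) tuples;
    candidate_links is an iterable of (source_index, target_index).
    """
    kept = []
    for i, j in candidate_links:
        if src_tagged[i][1] == tgt_tagged[j][1]:
            kept.append((i, j))
    return kept


# Toy sentence pair with transliterated Malayalam, invented for this example.
src = [("child", "NOUN"), ("read", "VERB"), ("book", "NOUN")]
tgt = [("kutti", "NOUN"), ("pustakam", "NOUN"), ("vayichu", "VERB")]
all_links = [(i, j) for i in range(len(src)) for j in range(len(tgt))]
pruned = filter_alignments(src, tgt, all_links)
```

In this toy pair the nine possible links shrink to five, which leaves the alignment model fewer implausible pairings such as verb-to-noun to score.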
Abstract:
Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing, automated banking systems, etc. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system employs Mel-frequency cepstral coefficients (MFCC) as features for signal processing and a Hidden Markov Model (HMM) for recognition. The system was trained with 21 male and female voices in the age group of 20 to 40 years and achieved 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set for a continuous digit recognition task.
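MFCC features rest on the mel frequency scale. The sketch below shows one widely used convention for converting between hertz and mels, together with equally spaced mel points of the kind used to place filterbank centres. The constants 2595 and 700 belong to that common convention and are not taken from the paper, and the function names are assumptions made for this example.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Convert hertz to mels (common 2595 * log10(1 + f / 700) convention)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz: float, high_hz: float, count: int):
    """Return `count` frequencies in hertz, equally spaced on the mel scale."""
    if count < 2:
        raise ValueError("count must be at least 2")
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + k * step) for k in range(count)]
```

Because the scale is roughly logarithmic above about 1 kHz, the returned frequencies are packed more densely at the low end, where speech carries most of its distinguishing detail.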
Abstract:
The date palm Phoenix dactylifera has played an important role in the day-to-day life of people for the last 7000 years. Today the worldwide production, utilization and industrialization of dates are continuously increasing, since date fruits have earned great importance in human nutrition owing to their rich content of essential nutrients. Tons of date palm fruit wastes are discarded daily by the date processing industries, leading to environmental problems. Wastes such as date pits represent an average of 10% of the date fruits. Thus, there is an urgent need to find suitable applications for this waste. In spite of several studies on date palm cultivation, utilization and the scope for using date fruit in therapeutic applications, very few reviews are available, and they are limited to the chemistry and pharmacology of date fruits and to the phytochemical composition, nutritional significance and potential health benefits of date fruit consumption. In this context, the present review discusses the prospects for valorization of these date fruit processing by-products and wastes, employing fermentation and enzyme processing technologies towards total utilization of this valuable commodity for the production of biofuels, biopolymers, biosurfactants, organic acids, antibiotics, industrial enzymes and other possible industrial chemicals.
Abstract:
Sensitisation of natural rubber latex by the addition of a small quantity of an anionic surfactant prior to the addition of a coacervant results in quick coagulation. The natural rubber prepared by this novel coagulation method shows improved raw rubber characteristics, better cure characteristics in gum and carbon black filled compounds, and improved mechanical properties compared to conventionally coagulated natural rubber. Compounds based on dried masterbatches prepared by incorporating fluffy carbon black into different forms of soap-sensitised natural rubber latices, such as fresh latex, preserved field latex, centrifuged latex and a blend of preserved field latex and skim latex, show improved cure characteristics and vulcanizate properties compared to an equivalent conventional dry rubber-fluffy carbon black based compound. The latex masterbatch based vulcanizates show a higher level of crosslinking and better dispersion of filler. Vulcanizates based on fresh natural rubber latex-dual filler masterbatches containing a blend of carbon black and silica, prepared by the modified coagulation process, show very good mechanical and dynamic properties that could be correlated to a low rolling resistance. The carbon black/silica/nanoclay tri-filler-fresh natural rubber latex masterbatch based vulcanizates show improved mechanical properties as the proportion of nanoclay is increased up to 5 phr. The fresh natural rubber latex based carbon black-silica masterbatch/polybutadiene blend vulcanizates show superior mechanical and dynamic properties compared to the equivalent compound vulcanizates prepared from the dry natural rubber-filler (conventional dry mix)/polybutadiene blends.