989 results for Statistical methodologies
Abstract:
Some investigations of the spectral and statistical characteristics of deep water waves are available for Indian waters. However, practically no systematic investigation of shallow water wave spectral and probabilistic characteristics has been made for any part of the Indian coast, apart from a few restricted studies. Hence a comprehensive study of the shallow water wave climate and its spectral and statistical characteristics at a location (Alleppey) on the southwest coast of India is undertaken, based on recorded data. The results of the investigation are presented in this thesis. The thesis comprises seven chapters.
Abstract:
For years, choosing the right career by monitoring the trends and scope of different career paths has been a requirement for young people all over the world. In this paper we provide a scientific, data mining based method for predicting the job absorption rate and the waiting time needed for 100% placement for different engineering courses in India. This will greatly help students in India decide on the right discipline for a bright future. Information about graduated students is obtained from the NTMIS (National Technical Manpower Information System) nodal center in Kochi, India, located at Cochin University of Science and Technology.
Abstract:
This paper compares statistical techniques of paraphrase identification with a semantic technique. The statistical techniques used for comparison are word-set and word-order based methods, whereas the semantic technique is the WordNet similarity matrix method described by Stevenson and Fernando in [3].
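As a rough illustration of what word-set and word-order based measures can look like, the sketch below scores a sentence pair with Jaccard overlap (word set) and a longest-common-subsequence ratio (word order). Both formulas, the whitespace tokenizer and the function names are assumptions chosen for illustration; the abstract does not say which exact measures the paper uses.

```python
def tokenize(sentence):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return sentence.lower().split()


def word_set_similarity(a, b):
    """Jaccard overlap of the two sentences' word sets."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def lcs_length(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            if xi == yj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def word_order_similarity(a, b):
    """Share of tokens that appear in the same relative order in both sentences."""
    tok_a, tok_b = tokenize(a), tokenize(b)
    longest = max(len(tok_a), len(tok_b))
    if longest == 0:
        return 1.0
    return lcs_length(tok_a, tok_b) / longest
```

The pair "a b c" and "c b a" shows why both measures are useful: the word sets are identical, but the word order is reversed, so the order-based score drops while the set-based score stays at its maximum.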
Abstract:
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to get rid of insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical output of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
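Of the evaluation metrics named above, WER (word error rate) is the simplest to illustrate: it is the word-level edit distance between a reference translation and the system output, divided by the reference length. The sketch below is a generic implementation under that standard definition; the whitespace tokenization and the function name are assumptions, and it is not the evaluation script used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Lower is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.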
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating parts of speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved effective in training. Various handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator utilizes a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities. Therefore, for these language pairs a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated into English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for the word variants is looked up in the phrase translation table of the translation model. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before subjecting them to training. This paper deals with certain techniques that can be adopted to improve the alignment model of SMT. Methods to incorporate parts of speech information into the bilingual corpus have eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous while setting up the alignments. The presence of Malayalam words with predictable translations has also contributed to reducing the insignificant alignments. Moreover, reducing the unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before subjecting them to training. This paper deals with the techniques that can be adopted to improve the alignment model of SMT. Incorporating parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous while setting up the alignments. Moreover, reducing the unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
A potential fungal strain producing extracellular β-glucosidase enzyme was isolated from sea water and identified as Aspergillus sydowii BTMFS 55 by a molecular approach based on 28S rDNA sequence homology, which showed 93% identity with already reported sequences of Aspergillus sydowii in GenBank. A sequential optimization strategy was used to enhance the production of β-glucosidase under solid state fermentation (SSF) with wheat bran (WB) as the growth medium. The two-level Plackett-Burman (PB) design was implemented to screen medium components that influence β-glucosidase production; among the 11 variables, moisture content, inoculum, and peptone were identified as the most significant factors. The enzyme was purified by (NH4)2SO4 precipitation followed by ion exchange chromatography on DEAE sepharose. The enzyme was a monomeric protein with a molecular weight of ~95 kDa as determined by SDS-PAGE. It was optimally active at pH 5.0 and 50°C. It showed high affinity towards pNPG, with a Km and Vmax of 0.67 mM and 83.3 U/mL, respectively. The enzyme was tolerant to glucose inhibition, with a Ki of 17 mM. Low concentrations of alcohols (10%), especially ethanol, could activate the enzyme. A considerable level of ethanol could be produced from wheat bran and rice straw after 48 and 24 h, respectively, with the help of Saccharomyces cerevisiae in the presence of cellulase and the purified β-glucosidase of Aspergillus sydowii BTMFS 55.
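To make the reported kinetic constants concrete, the sketch below evaluates the Michaelis-Menten rate law v = Vmax·[S] / (Km + [S]) with the Km and Vmax quoted in the abstract. The function and constant names are illustrative. Glucose inhibition is deliberately left out, because the abstract gives a Ki but does not state the inhibition mechanism.

```python
KM_MM = 0.67          # Km for pNPG in mM, as reported in the abstract
VMAX_U_PER_ML = 83.3  # Vmax in U/mL, as reported in the abstract


def michaelis_menten_rate(substrate_mm, km=KM_MM, vmax=VMAX_U_PER_ML):
    """Reaction rate (U/mL) at a given substrate concentration (mM)."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mm / (km + substrate_mm)


# By definition of Km, the rate at [S] = Km is half of Vmax.
half_max_rate = michaelis_menten_rate(KM_MM)
```

A low Km such as 0.67 mM means the enzyme reaches half of its maximal rate at a low substrate concentration, which is what the abstract describes as high affinity towards pNPG.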
Abstract:
Low-grade and high-grade gliomas are tumors that originate in the glial cells. The main challenge in brain tumor diagnosis is determining whether a tumor is benign or malignant, primary or metastatic, and low or high grade. Based on a patient's MRI alone, a radiologist cannot differentiate a low-grade glioma from a high-grade glioma, because the two are visually almost similar; autopsy confirms the diagnosis of low grade with high-grade and infiltrative features. In this paper, textural descriptions of grade I and grade III glioma are extracted using first order statistics and the Gray Level Co-occurrence Matrix (GLCM) method. Textural features are extracted from 16×16 sub-images of the segmented region of interest (ROI). In the proposed method, first order statistical features such as contrast, intensity, entropy, kurtosis and spectral energy, together with the extracted GLCM features, showed promising results. The ranges of these first order statistics and GLCM based features are highly discriminant between grade I and grade III. This study gives statistical textural information on grade I and grade III glioma that is very useful for further classification and analysis, thus assisting radiologists to a greater extent.
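The GLCM step can be sketched as follows: count how often gray level i occurs next to gray level j at a fixed pixel offset, normalize the counts to probabilities, and derive texture features from the resulting matrix. This is a minimal pure-Python sketch. The offset, the non-symmetric counting, the number of gray levels and the three features shown are assumptions, since the abstract does not give these settings.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric co-occurrence matrix for offset (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[value / total for value in row] for row in counts]


def glcm_features(p):
    """Contrast, energy and entropy of a normalized co-occurrence matrix."""
    contrast = energy = entropy = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += (i - j) ** 2 * value
            energy += value * value
            if value > 0:
                entropy -= value * math.log2(value)
    return {"contrast": contrast, "energy": energy, "entropy": entropy}
```

In a pipeline like the one described, such a function would be applied to each 16×16 sub-image of the ROI after quantizing its intensities to the chosen number of gray levels.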
Abstract:
The characterization and grading of glioma tumors via image-derived features for diagnosis, prognosis, and treatment response has been an active research area in medical image computing. This paper presents a novel method for automatic detection and classification of glioma from conventional T2 weighted MR images. Automatic detection of the tumor was established using a newly developed method called the Adaptive Gray level Algebraic set Segmentation Algorithm (AGASA). Statistical features were extracted from the detected tumor texture using first order statistics and gray level co-occurrence matrix (GLCM) based second order statistical methods. The statistical significance of the features was determined by a t-test and its corresponding p-value. A decision system was developed for the grade detection of glioma using these selected features and their p-values. The detection performance of the decision system was validated using the receiver operating characteristic (ROC) curve. The diagnosis and grading of glioma using this non-invasive method can contribute promising results in medical image computing.
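Two of the statistical steps mentioned here can be sketched with the standard library alone: a two-sample t statistic for judging whether a feature separates two tumor grades, and the area under the ROC curve for a decision score. The choice of Welch's unequal-variance form and the function names are assumptions, since the abstract does not say which t-test variant was used. Converting the statistic to a p-value needs the t distribution and is omitted.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive case scores higher
    than a random negative case, counting ties as one half.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 1.0 means the score ranks every positive case above every negative one, while 0.5 corresponds to chance-level ranking.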