8 results for Statistical approach
at Cochin University of Science
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. The paper also discusses a technique for improving the alignment model by incorporating part-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs in this way has yielded better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in training. Various handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
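Of the three evaluation metrics named in the abstract above, WER (word error rate) is the simplest to illustrate. The following is a minimal sketch, not the authors' evaluation code: it assumes whitespace tokenization and the usual definition of WER as word-level edit distance divided by reference length. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace; no normalization is done.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical English example; a real evaluation would compare Malayalam output.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value is better; a hypothesis identical to the reference scores 0.0.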
Abstract:
A part-of-speech tagger for Malayalam that uses a stochastic approach is proposed. The tagger makes use of word frequencies and bigram statistics from a corpus. Because no annotated corpus is available for Malayalam, a morphological analyzer is used to generate the tagged corpus. Although the experiments were performed on a very small corpus, the results show that the statistical approach works well with a highly agglutinative language like Malayalam.
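The idea of tagging from word frequencies and bigram statistics can be sketched as follows. This is an illustrative toy, not the tagger described in the abstract: it assumes the bigrams are over tags, uses add-one smoothing and Viterbi decoding, and is trained on a made-up English corpus because the abstract gives no data. The names `train`, `log_prob` and `tag` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Collect per-tag word counts and tag-bigram counts from a tagged corpus."""
    emit = defaultdict(Counter)   # tag -> counts of words seen with that tag
    trans = defaultdict(Counter)  # previous tag -> counts of following tags
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag_ in sentence:
            emit[tag_][word] += 1
            trans[prev][tag_] += 1
            prev = tag_
    return emit, trans


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def tag(words, emit, trans):
    """Viterbi decoding: most probable tag sequence under the bigram model."""
    tags = list(emit)
    vocab_size = len({w for c in emit.values() for w in c}) + 1  # +1 for unseen words
    best = {"<s>": (0.0, [])}  # last tag -> (log score, tag path so far)
    for word in words:
        new_best = {}
        for t in tags:
            score, path = max(
                ((s + log_prob(trans[p], t, len(tags)), pth)
                 for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[t] = (score + log_prob(emit[t], word, vocab_size), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for an automatically tagged corpus.
    corpus = [
        [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
        [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
        [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    ]
    emit, trans = train(corpus)
    print(tag(["the", "cat", "barks"], emit, trans))
```

For an agglutinative language, the emission counts would more plausibly be collected over stems and suffixes produced by the morphological analyzer rather than over whole surface words; that choice is outside this sketch.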
Abstract:
Geochemical composition is a set of data for predicting the climatic condition existing in an ecosystem. Both surficial and core sediment geochemistry are helpful in monitoring, assessing and evaluating the marine environment. The aim of this research is to assess, through a multivariate statistical approach, the relationship between the biogeochemical constituents in the Cochin Estuarine System (CES) and their modifications after a long period of anoxia, and to identify the various processes that control the sediment composition in this region. The study of present core sediment geochemistry therefore has a critical role in unraveling the benchmark of their characterization. Sediment cores from four prominent zones of the CES were examined for various biogeochemical aspects. The results have served as rejuvenating records for the prediction of the core sediment status prevailing in the CES.
Abstract:
The preceding discussion and review of literature show that studies on gear selectivity have received great attention, while gear efficiency studies do not seem to have received equal consideration. In temperate waters, the fishing industry is well organised, relatively large and well-equipped vessels and gear are used for commercial fishing, and the number of species is smaller; whereas in the tropics, particularly in India, small-scale fishery dominates the scene and the fishery is multispecies, operated upon by multigear. Therefore many of the problems faced in India may not exist in developed countries. Perhaps this is the reason for the paucity of literature on the problems in estimation of relative efficiency. Much work has been carried out in estimating relative efficiency (Pycha, 1962; Pope, 1963; Gulland, 1967; Dickson, 1971 and Collins, 1979). The main subject of interest in the present thesis is an investigation into the problems in the comparison of fishing gears, especially in using classical test procedures with special reference to the prevailing fishing practices (that is, with reference to the catch data generated by the existing system). This has been taken up with a view to standardizing an approach for comparing the efficiency of fishing gear. Besides this, the implications of the terms ‘gear efficiency’ and ‘gear selectivity’ have been examined and, based on the commonly used selectivity model (Holt, 1963), estimation of the ratio of fishing power of two gears has been considered. An attempt to determine the size of fish for which a gear is most efficient has also been made. The work has been presented in eight chapters.
Abstract:
A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator utilizes a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs in this way has yielded better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
Abstract:
A potential fungal strain producing extracellular β-glucosidase enzyme was isolated from sea water and identified as Aspergillus sydowii BTMFS 55 by a molecular approach based on 28S rDNA sequence homology, which showed 93% identity with already reported sequences of Aspergillus sydowii in GenBank. A sequential optimization strategy was used to enhance the production of β-glucosidase under solid state fermentation (SSF) with wheat bran (WB) as the growth medium. The two-level Plackett-Burman (PB) design was implemented to screen medium components that influence β-glucosidase production; among the 11 variables, moisture content, inoculum, and peptone were identified as the most significant factors for β-glucosidase production. The enzyme was purified by (NH4)2SO4 precipitation followed by ion exchange chromatography on DEAE sepharose. The enzyme was a monomeric protein with a molecular weight of ~95 kDa as determined by SDS-PAGE. It was optimally active at pH 5.0 and 50°C. It showed high affinity towards pNPG, with a Km and Vmax of 0.67 mM and 83.3 U/mL, respectively. The enzyme was tolerant to glucose inhibition, with a Ki of 17 mM. Low concentrations of alcohols (10%), especially ethanol, could activate the enzyme. A considerable level of ethanol could be produced from wheat bran and rice straw after 48 and 24 h, respectively, with the help of Saccharomyces cerevisiae in the presence of cellulase and the purified β-glucosidase of Aspergillus sydowii BTMFS 55.
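As a worked illustration of the kinetic constants reported above, and assuming the enzyme follows simple Michaelis-Menten kinetics (the usual model when a Michaelis constant and a maximum velocity are quoted, though the abstract does not state the fitting procedure), the reaction rate at substrate concentration [S] would be as shown below. The 2 mM substrate concentration is a hypothetical value chosen only for the arithmetic.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]},
\qquad K_m = 0.67\ \text{mM},\quad V_{\max} = 83.3\ \text{U/mL}

% At [S] = K_m the rate is half-maximal:
v = \frac{83.3 \times 0.67}{0.67 + 0.67} = 41.65\ \text{U/mL}

% At a hypothetical [S] = 2\ \text{mM}:
v = \frac{83.3 \times 2}{0.67 + 2} \approx 62.4\ \text{U/mL}
```

The effect of glucose inhibition is deliberately left out, since the abstract reports an inhibition constant of 17 mM but not the type of inhibition.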
Abstract:
Post-transcriptional gene silencing by RNA interference is mediated by small interfering RNA, called siRNA. This gene silencing mechanism can be exploited therapeutically against a wide variety of disease-associated targets, especially in AIDS, neurodegenerative diseases, cholesterol and cancer in mice, with the hope of extending these approaches to treat humans. Over the recent past, a significant amount of work has been undertaken to understand the gene silencing mediated by exogenous siRNA. The design of efficient exogenous siRNA sequences is challenging because of many issues related to siRNA. While designing efficient siRNA, target mRNAs must be selected such that their corresponding siRNAs are likely to be efficient against that target and unlikely to accidentally silence other transcripts due to sequence similarity. So before doing gene silencing by siRNAs, it is essential to analyze their off-target effects in addition to their inhibition efficiency against a particular target. Hence designing exogenous siRNA with good knock-down efficiency and target specificity is an area of concern to be addressed. Some methods have already been developed that consider both the inhibition efficiency and the off-target possibility of siRNA against a gene. Out of these methods, only a few have achieved good inhibition efficiency, specificity and sensitivity. The main focus of this thesis is to develop computational methods to optimize the efficiency of siRNA in terms of “inhibition capacity and off-target possibility” against target mRNAs with improved efficacy, which may be useful in the area of gene silencing and drug design for tumor development. This study aims to investigate the currently available siRNA prediction approaches and to devise a better computational approach to tackle the problem of siRNA efficacy by inhibition capacity and off-target possibility.
The strengths and limitations of the available approaches are investigated and taken into consideration in developing an improved solution. Thus the approaches proposed in this study extend some of the well-performing state-of-the-art techniques by incorporating machine learning and statistical approaches, and thermodynamic features such as whole stacking energy, to improve the prediction accuracy, inhibition efficiency, sensitivity and specificity. Here, we propose one Support Vector Machine (SVM) model and two Artificial Neural Network (ANN) models for siRNA efficiency prediction. In the SVM model, the classification property is used to classify whether the siRNA is efficient or inefficient in silencing a target gene. The first ANN model, named siRNA Designer, is used for optimizing the inhibition efficiency of siRNA against target genes. The second ANN model, named Optimized siRNA Designer (OpsiD), produces efficient siRNAs with high inhibition efficiency to degrade target genes with improved sensitivity and specificity, and identifies the off-target knockdown possibility of siRNA against non-target genes. The models are trained and tested against a large data set of siRNA sequences. The validations are conducted using the Pearson Correlation Coefficient, the Matthews Correlation Coefficient, Receiver Operating Characteristic analysis, accuracy of prediction, sensitivity and specificity. It is found that the approach, OpsiD, is capable of predicting the inhibition capacity of siRNA against a target mRNA with improved results over the state-of-the-art techniques. We are also able to understand the influence of whole stacking energy on the efficiency of siRNA. The model is further improved by including the ability to identify the “off-target possibility” of predicted siRNA on non-target genes.
Thus the proposed model, OpsiD, can predict optimized siRNA by considering both “inhibition efficiency on target genes and off-target possibility on non-target genes”, with improved inhibition efficiency, specificity and sensitivity. Since we have taken efforts to optimize the siRNA efficacy in terms of “inhibition efficiency and off-target possibility”, we hope that the risk of “off-target effect” while doing gene silencing in various bioinformatics fields can be overcome to a great extent. These findings may provide new insights into cancer diagnosis, prognosis and therapy by gene silencing. The approach may be found useful for designing exogenous siRNA for therapeutic applications and gene silencing techniques in different areas of bioinformatics.
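Several of the validation measures named in this abstract (accuracy, sensitivity, specificity and the Matthews Correlation Coefficient) can be computed from a binary confusion matrix. The sketch below is illustrative only and is not the thesis code: it assumes labels are encoded as 1 (efficient siRNA) and 0 (inefficient siRNA), and the function name `binary_metrics` and the sample labels are hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # efficient, predicted efficient
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inefficient, predicted inefficient
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Ratios fall back to 0.0 when their denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical labels, not data from the thesis.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, predicted))
```

MCC is often preferred over plain accuracy when the two classes are unbalanced, because it uses all four cells of the confusion matrix.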