995 results for Statistical Concepts
Abstract:
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources for building this statistical machine translator. The training strategy is enhanced by PoS tagging, which helps to eliminate insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in producing better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
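Of the three metrics named above, WER is the simplest to illustrate. The following is a minimal sketch, assuming whitespace tokenization and the usual definition of WER as word-level edit distance divided by reference length; the abstract does not state how the authors computed it, and the function name and the English placeholder sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Placeholder sentences (assumption): one reference word is missing.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value is better; a hypothesis identical to the reference scores 0.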
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. The paper also discusses a technique for improving the alignment model by incorporating parts-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs in this way has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved effective in training. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. The structural difference between the English/Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
This paper discusses a methodology for translating text from English into the Dravidian language Malayalam using statistical models. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques for improving the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs in this way has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English/Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), such corpora are available only in very limited quantities. For these language pairs, therefore, a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for these word variants is looked up in the phrase translation table of the translation model. A vocabulary filter chooses the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
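The root-variant lookup and the unigram-count filter described above can be sketched as follows. This is only an illustration under stated assumptions: the tokens, suffix list, phrase table and counts are invented placeholders (not real Malayalam data), variants are formed by plain concatenation, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter

# Hypothetical toy resources; every entry here is an invented placeholder.
SUFFIXES = ["", "a", "il"]
PHRASE_TABLE = {
    "toka": ["tree", "wood"],
    "tokil": ["timber"],
}
TARGET_UNIGRAMS = Counter({"tree": 12, "wood": 7, "timber": 3})


def inflected_variants(root, suffixes=SUFFIXES):
    """Generate candidate surface forms by appending each suffix to the root."""
    return [root + suffix for suffix in suffixes]


def best_translation(root, phrase_table=PHRASE_TABLE, unigrams=TARGET_UNIGRAMS):
    """Collect translations of all variants and keep the most frequent one."""
    candidates = []
    for variant in inflected_variants(root):
        candidates.extend(phrase_table.get(variant, []))
    if not candidates:
        return None  # the word stays out-of-vocabulary
    # Vocabulary filter: prefer the candidate with the highest unigram count.
    return max(candidates, key=lambda word: unigrams[word])


print(best_translation("tok"))
```

In a real system the returned translation would be added to the phrase table as a new entry for the OOV word; that step is omitted here.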
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before they are submitted for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. The presence of Malayalam words with predictable translations has further contributed to reducing the insignificant alignments. Moreover, the reduction of unwanted alignments has led to better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
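One way to picture how parts-of-speech information prunes alignments is the sketch below. It assumes that both sides of a sentence pair carry tags from a shared coarse tagset and that a word pair is a candidate only when the tags agree; this rule, the tag names and the placeholder target tokens are assumptions for illustration, not the authors' documented procedure.

```python
from itertools import product


def candidate_alignments(src_tagged, tgt_tagged):
    """Keep only source/target word pairs whose coarse PoS tags agree."""
    return [
        (src_word, tgt_word)
        for (src_word, src_tag), (tgt_word, tgt_tag) in product(src_tagged, tgt_tagged)
        if src_tag == tgt_tag
    ]


# Placeholder sentence pair; target tokens are stand-ins, not real Malayalam.
source = [("she", "PRON"), ("reads", "VERB"), ("books", "NOUN")]
target = [("T_she", "PRON"), ("T_books", "NOUN"), ("T_reads", "VERB")]

pairs = candidate_alignments(source, target)
print(len(source) * len(target), "unfiltered pairs ->", len(pairs), "after filtering")
```

Without the filter every source word would be paired with every target word; with it, the aligner only has to score the tag-compatible pairs.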
Abstract:
While channel coding is a standard method of improving a system's energy efficiency in digital communications, its practice does not extend to high-speed links. Increasing demands on network speeds are placing a large burden on the energy efficiency of high-speed links and make the benefit of channel coding for these systems a timely subject. The low error rates of interest and the presence of residual intersymbol interference (ISI) caused by hardware constraints impede the analysis and simulation of coded high-speed links. Focusing on the residual ISI and combined noise as the dominant error mechanisms, this paper analyses error correlation through the concepts of error region, channel signature, and correlation distance. This framework provides deeper insight into joint error behaviours in high-speed links, extends the range of statistical simulation for coded high-speed links, and provides a case against the use of biased Monte Carlo methods in this setting.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before they are submitted for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating parts-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. Moreover, the reduction of unwanted alignments has led to better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
A potential fungal strain producing extracellular β-glucosidase was isolated from sea water and identified as Aspergillus sydowii BTMFS 55 by a molecular approach based on 28S rDNA sequence homology, which showed 93% identity with previously reported sequences of Aspergillus sydowii in GenBank. A sequential optimization strategy was used to enhance the production of β-glucosidase under solid state fermentation (SSF) with wheat bran (WB) as the growth medium. A two-level Plackett-Burman (PB) design was implemented to screen medium components that influence β-glucosidase production; among the 11 variables, moisture content, inoculum, and peptone were identified as the most significant factors. The enzyme was purified by (NH4)2SO4 precipitation followed by ion exchange chromatography on DEAE Sepharose. The enzyme was a monomeric protein with a molecular weight of ~95 kDa as determined by SDS-PAGE. It was optimally active at pH 5.0 and 50°C. It showed high affinity towards pNPG, with a Km and Vmax of 0.67 mM and 83.3 U/mL, respectively. The enzyme was tolerant to glucose inhibition, with a Ki of 17 mM. Low concentrations of alcohols (10%), especially ethanol, could activate the enzyme. A considerable level of ethanol could be produced from wheat bran and rice straw after 48 and 24 h, respectively, with the help of Saccharomyces cerevisiae in the presence of cellulase and the purified β-glucosidase of Aspergillus sydowii BTMFS 55.
Abstract:
Low-grade and high-grade gliomas are tumors that originate in the glial cells. The main challenge in brain tumor diagnosis is determining whether a tumor is benign or malignant, primary or metastatic, and low or high grade. From the patient's MRI alone, a radiologist cannot tell whether a glioma is low grade or high grade. Because the two are visually almost identical, the diagnosis of low grade with high-grade and infiltrative features is confirmed by autopsy. In this paper, textural descriptions of grade I and grade III glioma are extracted using first-order statistics and the Gray Level Co-occurrence Matrix (GLCM) method. Textural features are extracted from 16×16 sub-images of the segmented region of interest (ROI). In the proposed method, the extracted first-order statistical features (contrast, intensity, entropy, kurtosis and spectral energy) and GLCM features showed promising results. The ranges of these first-order and GLCM-based features are highly discriminant between grade I and grade III. The study provides statistical textural information on grade I and grade III glioma that is useful for further classification and analysis, and can thus assist radiologists to a greater extent.
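To make the two feature families concrete, here is a minimal pure-Python sketch of a symmetric, normalized GLCM for a single pixel offset, the GLCM contrast derived from it, and a first-order (histogram) entropy. The definitions are the common textbook ones and are assumptions here; the abstract does not give the offsets, gray-level quantization or exact feature formulas used in the paper.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Symmetric, normalized co-occurrence probabilities for offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions (symmetric GLCM)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """Sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def first_order_entropy(image):
    """Shannon entropy (bits) of the gray-level histogram."""
    hist = Counter(pixel for row in image for pixel in row)
    total = sum(hist.values())
    return -sum((n / total) * math.log2(n / total) for n in hist.values())


# Tiny placeholder patch with two gray levels (a real patch would be 16x16).
patch = [[0, 1],
         [0, 1]]
print(glcm_contrast(glcm(patch)), first_order_entropy(patch))
```

First-order features depend only on the histogram, whereas GLCM features also capture how gray levels are arranged relative to each other.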
Abstract:
The characterization and grading of glioma tumors via image-derived features, for diagnosis, prognosis, and treatment response, has been an active research area in medical image computing. This paper presents a novel method for the automatic detection and classification of glioma from conventional T2-weighted MR images. Automatic detection of the tumor was achieved using a newly developed method called the Adaptive Gray level Algebraic set Segmentation Algorithm (AGASA). Statistical features were extracted from the detected tumor texture using first-order statistics and gray level co-occurrence matrix (GLCM) based second-order statistical methods. The statistical significance of the features was determined by a t-test and its corresponding p-value. A decision system for grading glioma was developed using the selected features and their p-values. The detection performance of the decision system was validated using the receiver operating characteristic (ROC) curve. The diagnosis and grading of glioma using this non-invasive method can contribute promising results to medical image computing.
Abstract:
The basic concepts of digital signal processing are taught to students in engineering and science. The focus of such courses is on linear, time-invariant systems. The question of what happens when the system is governed by a quadratic or cubic equation remains unanswered in the vast majority of the signal processing literature. Light was shed on this problem when V. John Mathews and Giovanni L. Sicuranza published the book Polynomial Signal Processing, which opened up an unseen vista of polynomial systems for signal and image processing. The book presents the theory and implementations of both adaptive and non-adaptive FIR and IIR quadratic systems, which offer better performance than conventional linear systems. The theory of quadratic systems is a largely unexplored area of research that involves computationally intensive work. Once the area of research is selected, the next issue is the choice of software tool. Conventional languages like C and C++ were easily eliminated as they are not interpreted and lack good-quality plotting libraries. MATLAB proved to be very slow, as did Scilab and Octave. The search for a scientific computing language that was as fast as C but had a good-quality plotting library ended with Python, a distant relative of LISP, which proved to be ideal for scientific computing. An account of the use of Python, its scientific computing package scipy and the plotting library pylab is given in the appendix. Initially, the work focused on designing predictors that exploit the polynomial nonlinearities inherent in speech generation mechanisms. Soon, the work shifted to medical image processing, which offered more scope for quadratic methods. The major focus in this area is on quadratic edge detection methods for retinal images and fingerprints, as well as de-noising raw MRI signals.
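For readers unfamiliar with quadratic systems, the sketch below shows a non-adaptive quadratic FIR filter in its generic truncated second-order Volterra form, y[n] = Σ h1[i]·x[n−i] + Σ Σ h2[i][j]·x[n−i]·x[n−j]. It is an illustration under assumptions, not code from the thesis: the kernel values are arbitrary, samples before the start of the signal are taken as zero, and plain Python lists are used instead of scipy arrays to keep the example dependency-free.

```python
def quadratic_fir(x, h1, h2):
    """Second-order Volterra FIR filter with linear kernel h1 and quadratic kernel h2."""
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Most recent `memory` samples, newest first; zero before the signal starts.
        window = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        linear = sum(h1[i] * window[i] for i in range(memory))
        quadratic = sum(
            h2[i][j] * window[i] * window[j]
            for i in range(memory)
            for j in range(memory)
        )
        y.append(linear + quadratic)
    return y


# Arbitrary illustrative kernels (assumption): memory of two samples.
h1 = [1.0, 0.5]
h2 = [[0.1, 0.0],
      [0.0, 0.0]]
print(quadratic_fir([1.0, 2.0, 3.0], h1, h2))
```

Setting every entry of h2 to zero reduces the filter to an ordinary linear FIR filter, which makes the quadratic term easy to isolate when experimenting.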
Abstract:
The current development of international climate policy requires Germany to reduce its greenhouse gas emissions. The most important greenhouse gas is carbon dioxide, which is released into the atmosphere by the combustion of fossil fuels. In principle, the reduction targets can be met by lowering emissions as well as by creating carbon sinks. Sinks here denote the biological storage of carbon in soils and forests. The spatial dynamics of land use in a region is an important factor influencing these processes. In this work, the model system HILLS is developed and used to simulate these complex interactions in the German federal state of Hesse. The aim is to use HILLS to go beyond an analysis of the current state and also examine scenarios for the future regional development of land use and its effect on the carbon balance up to 2020. To represent the spatial and temporal dynamics of land use in Hesse, the model LUCHesse is developed. Its task is to simulate the relevant processes on a 1 km² grid, with the rates of change prescribed exogenously as area trends at the level of the Hessian districts. LUCHesse consists of submodels for the processes: (A) expansion of settlement and commercial area, (B) structural change in the agricultural sector, and (C) establishment of new forest areas (afforestation). Each submodel comprises methods for assessing the site suitability of the grid cells for different land-use classes and for allocating the trend targets to those grid cells that are best suited to a given land-use class. The submodels are validated against statistical data for the period from 1990 to 2000. The result of a simulation run is a set of digital maps of the land-use distribution in Hesse for discrete time steps.
To simulate carbon storage, a modified version of the ecosystem model Century is developed (GIS-Century). It allows a controlled simulation run in annual steps and supports the integration of the model as a component of the HILLS model system. Several application schemes are developed for GIS-Century, with which the effect on carbon storage of setting aside arable land, of afforestation, and of managing existing forests can be examined. The model and the application schemes are validated against field and literature data. HILLS implements a sequential coupling of LUCHesse with GIS-Century. The spatial coupling takes place on the 1 km² grid; the temporal coupling is achieved by introducing a land-use vector that holds the description of the land-use change of a grid cell over the simulation period. In addition, HILLS integrates both models into a Geographic Information System (GIS) via a service- and database-oriented concept. In this way, the GIS functions for spatial data storage and data processing can be used. As an application of the model system, a reference scenario for Hesse with a time horizon of 2020 is computed. In the agricultural sector, the scenario assumes implementation of the AGENDA 2000 policy, which leads to large-scale set-aside of arable land, while for settlement and commerce as well as afforestation the current trends in area expansion are extrapolated. With HILLS it is now possible to quantify the effect of these land-use changes on biological carbon storage. While the expansion of settlement areas can be identified as a carbon source (37 kt C/a), the most important sink is found in the management of existing forest areas (794 kt C/a).
Furthermore, the set-aside of arable land (26 kt C/a) and afforestation (29 kt C/a) lead to additional carbon storage. For carbon storage in soils, the simulation experiments show very clearly that this sink is of limited duration only.
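The allocation step of the land-use submodels (assigning an exogenous area trend to the grid cells best suited to a land-use class) can be pictured with the sketch below. It is a deliberately simplified illustration under assumptions: suitability is a single precomputed score per cell, the trend is a cell count, and the function name, class labels and scores are hypothetical; the actual suitability assessment in LUCHesse is not described in the abstract.

```python
def allocate_trend(land_use, suitability, target_class, n_cells, convertible):
    """Convert the n_cells most suitable convertible cells to target_class."""
    candidates = [cell for cell, cls in land_use.items() if cls in convertible]
    # Rank candidate cells from most to least suitable for the target class.
    ranked = sorted(candidates, key=lambda cell: suitability[cell], reverse=True)
    updated = dict(land_use)  # leave the input map untouched
    for cell in ranked[:n_cells]:
        updated[cell] = target_class
    return updated


# Hypothetical 2x2 grid; keys are (row, column) of 1 km² cells.
land_use = {(0, 0): "arable", (0, 1): "arable", (1, 0): "forest", (1, 1): "arable"}
settlement_suitability = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.95, (1, 1): 0.5}

new_map = allocate_trend(land_use, settlement_suitability,
                         target_class="settlement", n_cells=2,
                         convertible={"arable"})
print(new_map)
```

Repeating such a step for each district and each discrete time step would yield a sequence of land-use maps of the kind the abstract describes as simulation output.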