973 resultados para statistical speaker models
Resumo:
This paper underlines a methodology for translating text from English into the Dravidian language, Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating the parts of speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques like suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process which can be used as a guideline in implementing suffix separation in Malayalam language are also presented in this paper. The structural difference between the English Malayalam pair is resolved in the decoder by applying the order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics
Resumo:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set among the sentence pairs of the source and target language before subjecting them for training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate the parts of speech information into the bilingual corpus has resulted in eliminating many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Presence of Malayalam words with predictable translations has also contributed in reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.
Resumo:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like translation model, language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set up among the sentence pairs of the source and target language before subjecting them for training. This paper is deals with the techniques which can be adopted for improving the alignment model of SMT. Incorporating the parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics
Resumo:
In this paper, we study the relationship between the failure rate and the mean residual life of doubly truncated random variables. Accordingly, we develop characterizations for exponential, Pareto 11 and beta distributions. Further, we generalize the identities for fire Pearson and the exponential family of distributions given respectively in Nair and Sankaran (1991) and Consul (1995). Applications of these measures in file context of lengthbiased models are also explored
Resumo:
The classical methods of analysing time series by Box-Jenkins approach assume that the observed series uctuates around changing levels with constant variance. That is, the time series is assumed to be of homoscedastic nature. However, the nancial time series exhibits the presence of heteroscedasticity in the sense that, it possesses non-constant conditional variance given the past observations. So, the analysis of nancial time series, requires the modelling of such variances, which may depend on some time dependent factors or its own past values. This lead to introduction of several classes of models to study the behaviour of nancial time series. See Taylor (1986), Tsay (2005), Rachev et al. (2007). The class of models, used to describe the evolution of conditional variances is referred to as stochastic volatility modelsThe stochastic models available to analyse the conditional variances, are based on either normal or log-normal distributions. One of the objectives of the present study is to explore the possibility of employing some non-Gaussian distributions to model the volatility sequences and then study the behaviour of the resulting return series. This lead us to work on the related problem of statistical inference, which is the main contribution of the thesis
Resumo:
We present a statistical image-based shape + structure model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
Resumo:
Graphical techniques for modeling the dependencies of randomvariables have been explored in a variety of different areas includingstatistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics.Formalisms for manipulating these models have been developedrelatively independently in these research communities. In this paper weexplore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independencenetworks (PINs). The paper contains a self-contained review of the basic principles of PINs.It is shown that the well-known forward-backward (F-B) and Viterbialgorithms for HMMs are special cases of more general inference algorithms forarbitrary PINs. Furthermore, the existence of inference and estimationalgorithms for more general graphical models provides a set of analysistools for HMM practitioners who wish to explore a richer class of HMMstructures.Examples of relatively complex models to handle sensorfusion and coarticulationin speech recognitionare introduced and treated within the graphical model framework toillustrate the advantages of the general approach.
Resumo:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression methods, specially allowing for serial correlation, make them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range Generalised Linear Models as well as Generalised Additive Models. In addition, recently critical points in using statistical software for GAM were stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants
Resumo:
Observations in daily practice are sometimes registered as positive values larger then a given threshold α. The sample space is in this case the interval (α,+∞), α > 0, which can be structured as a real Euclidean space in different ways. This fact opens the door to alternative statistical models depending not only on the assumed distribution function, but also on the metric which is considered as appropriate, i.e. the way differences are measured, and thus variability
Resumo:
This paper is a first draft of the principle of statistical modelling on coordinates. Several causes —which would be long to detail—have led to this situation close to the deadline for submitting papers to CODAWORK’03. The main of them is the fast development of the approach along the last months, which let appear previous drafts as obsolete. The present paper contains the essential parts of the state of the art of this approach from my point of view. I would like to acknowledge many clarifying discussions with the group of people working in this field in Girona, Barcelona, Carrick Castle, Firenze, Berlin, G¨ottingen, and Freiberg. They have given a lot of suggestions and ideas. Nevertheless, there might be still errors or unclear aspects which are exclusively my fault. I hope this contribution serves as a basis for further discussions and new developments
Resumo:
The paper discusses maintenance challenges of organisations with a huge number of devices and proposes the use of probabilistic models to assist monitoring and maintenance planning. The proposal assumes connectivity of instruments to report relevant features for monitoring. Also, the existence of enough historical registers with diagnosed breakdowns is required to make probabilistic models reliable and useful for predictive maintenance strategies based on them. Regular Markov models based on estimated failure and repair rates are proposed to calculate the availability of the instruments and Dynamic Bayesian Networks are proposed to model cause-effect relationships to trigger predictive maintenance services based on the influence between observed features and previously documented diagnostics
Resumo:
Esta tesis está dividida en dos partes: en la primera parte se presentan y estudian los procesos telegráficos, los procesos de Poisson con compensador telegráfico y los procesos telegráficos con saltos. El estudio presentado en esta primera parte incluye el cálculo de las distribuciones de cada proceso, las medias y varianzas, así como las funciones generadoras de momentos entre otras propiedades. Utilizando estas propiedades en la segunda parte se estudian los modelos de valoración de opciones basados en procesos telegráficos con saltos. En esta parte se da una descripción de cómo calcular las medidas neutrales al riesgo, se encuentra la condición de no arbitraje en este tipo de modelos y por último se calcula el precio de las opciones Europeas de compra y venta.
Resumo:
The performance of a model-based diagnosis system could be affected by several uncertainty sources, such as,model errors,uncertainty in measurements, and disturbances. This uncertainty can be handled by mean of interval models.The aim of this thesis is to propose a methodology for fault detection, isolation and identification based on interval models. The methodology includes some algorithms to obtain in an automatic way the symbolic expression of the residual generators enhancing the structural isolability of the faults, in order to design the fault detection tests. These algorithms are based on the structural model of the system. The stages of fault detection, isolation, and identification are stated as constraint satisfaction problems in continuous domains and solved by means of interval based consistency techniques. The qualitative fault isolation is enhanced by a reasoning in which the signs of the symptoms are derived from analytical redundancy relations or bond graph models of the system. An initial and empirical analysis regarding the differences between interval-based and statistical-based techniques is presented in this thesis. The performance and efficiency of the contributions are illustrated through several application examples, covering different levels of complexity.