47 resultados para Markov chains hidden Markov models Viterbi algorithm Forward-Backward algorithm maximum likelihood

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follows a Student t and skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Robotic mapping is the process of automatically constructing an environment representation using mobile robots. We address the problem of semantic mapping, which consists of using mobile robots to create maps that represent not only metric occupancy but also other properties of the environment. Specifically, we develop techniques to build maps that represent activity and navigability of the environment. Our approach to semantic mapping is to combine machine learning techniques with standard mapping algorithms. Supervised learning methods are used to automatically associate properties of space to the desired classification patterns. We present two methods, the first based on hidden Markov models and the second on support vector machines. Both approaches have been tested and experimentally validated in two problem domains: terrain mapping and activity-based mapping.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this article is to find out the influence of the parameters of the ARIMA-GARCH models in the prediction of artificial neural networks (ANN) of the feed forward type, trained with the Levenberg-Marquardt algorithm, through Monte Carlo simulations. The paper presents a study of the relationship between ANN performance and ARIMA-GARCH model parameters, i.e. the fact that depending on the stationarity and other parameters of the time series, the ANN structure should be selected differently. Neural networks have been widely used to predict time series and their capacity for dealing with non-linearities is a normally outstanding advantage. However, the values of the parameters of the models of generalized autoregressive conditional heteroscedasticity have an influence on ANN prediction performance. The combination of the values of the GARCH parameters with the ARIMA autoregressive terms also implies in ANN performance variation. Combining the parameters of the ARIMA-GARCH models and changing the ANN`s topologies, we used the Theil inequality coefficient to measure the prediction of the feed forward ANN.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work we propose and analyze nonlinear elliptical models for longitudinal data, which represent an alternative to gaussian models in the cases of heavy tails, for instance. The elliptical distributions may help to control the influence of the observations in the parameter estimates by naturally attributing different weights for each case. We consider random effects to introduce the within-group correlation and work with the marginal model without requiring numerical integration. An iterative algorithm to obtain maximum likelihood estimates for the parameters is presented, as well as diagnostic results based on residual distances and local influence [Cook, D., 1986. Assessment of local influence. journal of the Royal Statistical Society - Series B 48 (2), 133-169; Cook D., 1987. Influence assessment. journal of Applied Statistics 14 (2),117-131; Escobar, L.A., Meeker, W.Q., 1992, Assessing influence in regression analysis with censored data, Biometrics 48, 507-528]. As numerical illustration, we apply the obtained results to a kinetics longitudinal data set presented in [Vonesh, E.F., Carter, R.L., 1992. Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics 48, 1-17], which was analyzed under the assumption of normality. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we extend partial linear models with normal errors to Student-t errors Penalized likelihood equations are applied to derive the maximum likelihood estimates which appear to be robust against outlying observations in the sense of the Mahalanobis distance In order to study the sensitivity of the penalized estimates under some usual perturbation schemes in the model or data the local influence curvatures are derived and some diagnostic graphics are proposed A motivating example preliminary analyzed under normal errors is reanalyzed under Student-t errors The local influence approach is used to compare the sensitivity of the model estimates (C) 2010 Elsevier B V All rights reserved

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scale mixtures of the skew-normal (SMSN) distribution is a class of asymmetric thick-tailed distributions that includes the skew-normal (SN) distribution as a special case. The main advantage of these classes of distributions is that they are easy to simulate and have a nice hierarchical representation facilitating easy implementation of the expectation-maximization algorithm for the maximum-likelihood estimation. In this paper, we assume an SMSN distribution for the unobserved value of the covariates and a symmetric scale mixtures of the normal distribution for the error term of the model. This provides a robust alternative to parameter estimation in multivariate measurement error models. Specific distributions examined include univariate and multivariate versions of the SN, skew-t, skew-slash and skew-contaminated normal distributions. The results and methods are applied to a real data set.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm, clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad scale phylogeny and in an attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve we conducted a series of experiments using different data partitions and heterogeneous substitution models. Usually more complex modeling schemes are favored regardless of the partitions recognized but model choice had little effect on topology or support values. In contrast there are consistent but weakly supported differences in the topologies recovered from coding and non-coding matrices. These conflicts directly correspond to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we deal with robust inference in heteroscedastic measurement error models Rather than the normal distribution we postulate a Student t distribution for the observed variables Maximum likelihood estimates are computed numerically Consistent estimation of the asymptotic covariance matrices of the maximum likelihood and generalized least squares estimators is also discussed Three test statistics are proposed for testing hypotheses of interest with the asymptotic chi-square distribution which guarantees correct asymptotic significance levels Results of simulations and an application to a real data set are also reported (C) 2009 The Korean Statistical Society Published by Elsevier B V All rights reserved

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we develop a flexible cure rate survival model by assuming the number of competing causes of the event of interest to follow the Conway-Maxwell Poisson distribution. This model includes as special cases some of the well-known cure rate models discussed in the literature. Next, we discuss the maximum likelihood estimation of the parameters of this cure rate survival model. Finally, we illustrate the usefulness of this model by applying it to a real cutaneous melanoma data. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The estimation of data transformation is very useful to yield response variables satisfying closely a normal linear model, Generalized linear models enable the fitting of models to a wide range of data types. These models are based on exponential dispersion models. We propose a new class of transformed generalized linear models to extend the Box and Cox models and the generalized linear models. We use the generalized linear model framework to fit these models and discuss maximum likelihood estimation and inference. We give a simple formula to estimate the parameter that index the transformation of the response variable for a subclass of models. We also give a simple formula to estimate the rth moment of the original dependent variable. We explore the possibility of using these models to time series data to extend the generalized autoregressive moving average models discussed by Benjamin er al. [Generalized autoregressive moving average models. J. Amer. Statist. Assoc. 98, 214-223]. The usefulness of these models is illustrated in a Simulation study and in applications to three real data sets. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For the first time, we introduce a class of transformed symmetric models to extend the Box and Cox models to more general symmetric models. The new class of models includes all symmetric continuous distributions with a possible non-linear structure for the mean and enables the fitting of a wide range of models to several data types. The proposed methods offer more flexible alternatives to Box-Cox or other existing procedures. We derive a very simple iterative process for fitting these models by maximum likelihood, whereas a direct unconditional maximization would be more difficult. We give simple formulae to estimate the parameter that indexes the transformation of the response variable and the moments of the original dependent variable which generalize previous published results. We discuss inference on the model parameters. The usefulness of the new class of models is illustrated in one application to a real dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce in this paper a new class of discrete generalized nonlinear models to extend the binomial, Poisson and negative binomial models to cope with count data. This class of models includes some important models such as log-nonlinear models, logit, probit and negative binomial nonlinear models, generalized Poisson and generalized negative binomial regression models, among other models, which enables the fitting of a wide range of models to count data. We derive an iterative process for fitting these models by maximum likelihood and discuss inference on the parameters. The usefulness of the new class of models is illustrated with an application to a real data set. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null intercept errors-in-variables regression model, where the explanatory and the response variables are subject to measurement errors, and a possible structure of dependency between the measurements taken within the same individual are incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the bayesian approach. In this article, considering the classical approach, we analyze asymptotic test statistics and present a simulation study to compare the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Also, closed form expressions for the score function and the Fisher information matrix are presented. We consider two real numerical illustrations, the odontological data set from Hadgu and Koch (1999), and a quality control data set.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this article, we discuss inferential aspects of the measurement error regression models with null intercepts when the unknown quantity x (latent variable) follows a skew normal distribution. We examine first the maximum-likelihood approach to estimation via the EM algorithm by exploring statistical properties of the model considered. Then, the marginal likelihood, the score function and the observed information matrix of the observed quantities are presented allowing direct inference implementation. In order to discuss some diagnostics techniques in this type of models, we derive the appropriate matrices to assessing the local influence on the parameter estimates under different perturbation schemes. The results and methods developed in this paper are illustrated considering part of a real data set used by Hadgu and Koch [1999, Application of generalized estimating equations to a dental randomized clinical trial. Journal of Biopharmaceutical Statistics, 9, 161-178].