949 results for MAXIMUM LIKELIHOOD ESTIMATOR


Relevance: 80.00%

Abstract:

Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977)---both for the estimation of mixture components and for coping with the missing data.
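The EM principle the abstract appeals to reduces, in the simplest fully observed case, to alternating between computing responsibilities (E-step) and weighted maximum-likelihood updates (M-step). Below is a minimal illustrative sketch for a two-component 1-D Gaussian mixture; it is not the authors' algorithm (their methods also handle missing features), and the initialisation is an arbitrary illustrative choice.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (fully observed data)."""
    # Deterministic, well-separated starting values (illustrative choice).
    pi = 0.5
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities; the 1/sqrt(2*pi) constant cancels in the ratio.
        d0 = pi * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(var[0])
        d1 = (1 - pi) * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(var[1])
        r = d0 / (d0 + d1 + 1e-300)
        # M-step: weighted maximum-likelihood updates of weight, means, variances.
        pi = r.mean()
        mu = np.array([(r * x).sum() / r.sum(),
                       ((1 - r) * x).sum() / (1 - r).sum()])
        var = np.array([(r * (x - mu[0]) ** 2).sum() / r.sum(),
                        ((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum()])
    return pi, mu, var
```

The same alternation generalises to the missing-data setting by taking expectations over the unobserved features as well as the component labels.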

Relevance: 80.00%

Abstract:

We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
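The prediction side of such an architecture is easy to sketch: a softmax gate produces mixture coefficients that blend the outputs of linear experts. The one-level forward pass below is illustrative only (parameter shapes and names are assumptions, the hierarchical case nests gates within gates, and EM training is omitted).

```python
import numpy as np

def moe_predict(x, gate_w, expert_w):
    """One-level mixture-of-experts prediction: softmax gate over linear experts.

    x: (n, d) inputs; gate_w, expert_w: (d, n_experts) parameter matrices.
    """
    g = x @ gate_w                                # gating logits, (n, n_experts)
    g = np.exp(g - g.max(axis=1, keepdims=True))  # stable softmax
    g = g / g.sum(axis=1, keepdims=True)          # mixture coefficients
    experts = x @ expert_w                        # each expert's linear output
    return (g * experts).sum(axis=1)              # gate-weighted combination
```

Because both the gate and the experts are generalized linear models, EM can fit them by iteratively reweighted fits of each piece.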

Relevance: 80.00%

Abstract:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. We propose a novel family of mixture models that explains the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence, and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.

Relevance: 80.00%

Abstract:

Support Vector Machines Regression (SVMR) is a regression technique recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive Gaussian noise in which the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean are stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
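For reference, the $\epsilon$-insensitive loss is zero inside a tube of half-width $\epsilon$ around the target and grows linearly outside it (unlike the quadratic loss, which penalizes every residual). A one-line sketch, with the parameter name `eps` an illustrative choice:

```python
import numpy as np

def eps_insensitive(residual, eps=0.1):
    """Vapnik's epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    return np.maximum(np.abs(residual) - eps, 0.0)
```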

Relevance: 80.00%

Abstract:

In this paper we focus on the problem of estimating a bounded density using a finite combination of densities from a given class. We consider the maximum likelihood estimator (MLE) and the greedy procedure described by Li and Barron. Approximation and estimation bounds are given for both methods. We extend and improve upon the estimation results of Li and Barron, and in particular prove an $O(1/\sqrt{n})$ bound on the estimation error which does not depend on the number of densities in the estimated combination.

Relevance: 80.00%

Abstract:

Our goal in this paper is to assess the reliability and validity of egocentered network data using multilevel analysis (Muthen, 1989; Hox, 1993) under the multitrait-multimethod approach. The confirmatory factor analysis model for multitrait-multimethod data (Werts & Linn, 1970; Andrews, 1984) is used for our analyses. In this study we reanalyse part of the data from another study (Kogovšek et al., 2002) carried out on a representative sample of the inhabitants of Ljubljana. The traits used in our article are the name interpreters. We consider egocentered network data as hierarchical; therefore a multilevel analysis is required. We use Muthen's partial maximum likelihood approach, called the pseudobalanced solution (Muthen, 1989, 1990, 1994), which produces estimates close to maximum likelihood for large ego sample sizes (Hox & Mass, 2001). Several analyses are done in order to compare this multilevel analysis to classic methods of analysis such as those in Kogovšek et al. (2002), who analysed the data only at the group (ego) level, considering averages over all alters within each ego. We show that some of the results obtained by classic methods are biased and that multilevel analysis provides more detailed information that greatly enriches the interpretation of the reliability and validity of hierarchical data. Within- and between-ego reliabilities and validities and other related quality measures are defined, computed and interpreted.

Relevance: 80.00%

Abstract:

This study examines the effect of health status on affiliation with the Régimen Contributivo, and the effect of public insurance (the Régimen Contributivo) and private insurance on the use of health services (outpatient consultations).

Relevance: 80.00%

Abstract:

The rate of return to education in Bogotá is estimated for 1997 and 2003 using Heckman's methodology. The returns to education and to potential experience are found to be lower in 2003. Average labor income also falls.

Relevance: 80.00%

Abstract:

We propose and estimate a financial distress model that explicitly accounts for the interactions or spill-over effects between financial institutions, through the use of a spatial continuity matrix that is built from financial network data on interbank transactions. This setup of the financial distress model allows for the empirical validation of the importance of network externalities in determining financial distress, in addition to institution-specific and macroeconomic covariates. The relevance of this specification is that it simultaneously incorporates micro-prudential factors (Basel 2) as well as macro-prudential and systemic factors (Basel 3) as determinants of financial distress. Results indicate that network externalities are an important determinant of the financial health of financial institutions. The parameter that measures the effect of network externalities is both economically and statistically significant, and its inclusion as a risk factor reduces the importance of firm-specific variables such as the size or degree of leverage of the financial institution. In addition we analyze the policy implications of the network factor model for capital requirements and deposit insurance pricing.

Relevance: 80.00%

Abstract:

Historically, it has been recognized that internal conflicts directly affect individual-level variables such as people's health, schooling levels, and the forced displacement of those affected. However, only in the last decade has academic research turned to rigorously documenting and quantifying the collateral effects of violence on individuals' living conditions. This study examines how exposure to the conflict in Colombia has affected people's labor-market decisions. The identification strategy internalizes the well-known endogeneity problems between conflict and measures of economic activity and development, and presents results that are robust to internal migration and displacement. In terms of labor participation and unemployment, heterogeneous effects by gender are found in response to the violence experienced. In particular, women's probability of labor participation increases as a consequence of exposure to the conflict, while their probability of unemployment decreases. For men, the results show a lower probability of participation, the opposite of the effect for women, and an analogous effect for unemployment. The study finds no differential effects on labor informality.

Relevance: 80.00%

Abstract:

An investigation into the phylogenetic variation of plant tolerance and the root and shoot uptake of organic contaminants was undertaken. The aim was to determine if particular families or genera were tolerant of, or accumulated organic pollutants. Data were collected from sixty-nine studies. The variation between experiments was accounted for using a residual maximum likelihood analysis to approximate means for individual taxa. A nested ANOVA was subsequently used to determine differences at a number of differing phylogenetic levels. Significant differences were observed at a number of phylogenetic levels for the tolerance to TPH, the root concentration factor and the shoot concentration factor. There was no correlation between the uptake of organic pollutants and that of heavy metals. The data indicate that plant phylogeny is an important influence on both the plant tolerance and uptake of organic pollutants. If this study can be expanded, such information can be used when designing plantings for phytoremediation or risk reduction during the restoration of contaminated sites.

Relevance: 80.00%

Abstract:

Models of the dynamics of nitrogen in soil (soil-N) can be used to aid the fertilizer management of a crop. The predictions of soil-N models can be validated by comparison with observed data. Validation generally involves calculating non-spatial statistics of the observations and predictions, such as their means, their mean squared-difference, and their correlation. However, when the model predictions are spatially distributed across a landscape the model requires validation with spatial statistics. There are three reasons for this: (i) the model may be more or less successful at reproducing the variance of the observations at different spatial scales; (ii) the correlation of the predictions with the observations may be different at different spatial scales; (iii) the spatial pattern of model error may be informative. In this study we used a model, parameterized with spatially variable input information about the soil, to predict the mineral-N content of soil in an arable field, and compared the results with observed data. We validated the performance of the N model spatially with a linear mixed model of the observations and model predictions, estimated by residual maximum likelihood. This novel approach allowed us to describe the joint variation of the observations and predictions as: (i) independent random variation that occurred at a fine spatial scale; (ii) correlated random variation that occurred at a coarse spatial scale; (iii) systematic variation associated with a spatial trend. The linear mixed model revealed that, in general, the performance of the N model changed depending on the spatial scale of interest. At the scales associated with random variation, the N model underestimated the variance of the observations, and the predictions were correlated poorly with the observations. At the scale of the trend, the predictions and observations shared a common surface. 
The spatial pattern of the error of the N model suggested that the observations were affected by the local soil condition, but this was not accounted for by the N model. In summary, the N model would be well suited to field-scale management of soil nitrogen, but poorly suited to management at finer spatial scales. This information was not apparent with a non-spatial validation.
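The non-spatial validation statistics named above (means, mean squared-difference, correlation) are straightforward to compute; the linear mixed model is what adds the scale-dependent view. A small illustrative helper for the non-spatial part, with hypothetical names:

```python
import numpy as np

def validation_stats(obs, pred):
    """Non-spatial validation statistics for model predictions vs. observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return {
        "mean_obs": obs.mean(),
        "mean_pred": pred.mean(),
        "msd": ((obs - pred) ** 2).mean(),      # mean squared-difference
        "corr": np.corrcoef(obs, pred)[0, 1],   # linear correlation
    }
```

As the abstract argues, these summaries conflate all spatial scales; a spatial validation decomposes them by scale.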

Relevance: 80.00%

Abstract:

This article assesses the extent to which sampling variation affects findings about Malmquist productivity change derived using data envelopment analysis (DEA), in the first stage by calculating productivity indices and in the second stage by investigating the farm-specific change in productivity. Confidence intervals for Malmquist indices are constructed using Simar and Wilson's (1999) bootstrapping procedure. The main contribution of this article is to account in the second stage for the information provided by the first-stage bootstrap. The DEA standard errors of the Malmquist indices given by bootstrapping are employed in an innovative heteroscedastic panel regression, using a maximum likelihood procedure. The application is to a sample of 250 Polish farms over the period 1996 to 2000. The confidence interval results suggest that the second half of the 1990s for Polish farms was characterized not so much by productivity regress as by stagnation. As for the determinants of farm productivity change, we find that the integration of the DEA standard errors in the second-stage regression is significant in explaining a proportion of the variance in the error term. Although our heteroscedastic regression results differ from those of standard OLS in terms of significance and sign, they are consistent with theory and previous research.
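Simar and Wilson's DEA bootstrap is considerably more elaborate (it resamples efficiency scores with smoothing), but the basic percentile-bootstrap idea behind such confidence intervals can be sketched as follows, with a simple mean statistic standing in for a Malmquist index; all names and defaults are illustrative.

```python
import numpy as np

def bootstrap_ci(x, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    # Resample with replacement and recompute the statistic each time.
    stats = np.array([stat(rng.choice(x, size=x.size, replace=True))
                      for _ in range(n_boot)])
    # The alpha/2 and 1 - alpha/2 quantiles bracket the interval.
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

The spread of the bootstrap replicates is also what yields the standard errors that the article feeds into its second-stage heteroscedastic regression.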

Relevance: 80.00%

Abstract:

A number of authors have proposed clinical trial designs involving the comparison of several experimental treatments with a control treatment in two or more stages. At the end of the first stage, the most promising experimental treatment is selected, and all other experimental treatments are dropped from the trial. Provided it is good enough, the selected experimental treatment is then compared with the control treatment in one or more subsequent stages. The analysis of data from such a trial is problematic because of the treatment selection and the possibility of stopping at interim analyses. These aspects lead to bias in the maximum-likelihood estimate of the advantage of the selected experimental treatment over the control and to inaccurate coverage for the associated confidence interval. In this paper, we evaluate the bias of the maximum-likelihood estimate and propose a bias-adjusted estimate. We also propose an approach to the construction of a confidence region for the vector of advantages of the experimental treatments over the control based on an ordering of the sample space. These regions are shown to have accurate coverage, although they are also shown to be necessarily unbounded. Confidence intervals for the advantage of the selected treatment are obtained from the confidence regions and are shown to have more accurate coverage than the standard confidence interval based upon the maximum-likelihood estimate and its asymptotic standard error.
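The selection bias described above is easy to demonstrate by Monte Carlo: when the larger of two noisy arm estimates is reported as the maximum-likelihood estimate for the selected arm, its expectation exceeds the arm's true mean. An illustrative simulation (all parameter values are hypothetical, and interim stopping is ignored):

```python
import numpy as np

def selection_bias_demo(n_sim=20000, n=50, mu=(0.0, 0.0), sigma=1.0, seed=0):
    """Monte Carlo: selecting the best-looking arm biases its ML estimate upward."""
    rng = np.random.default_rng(seed)
    means = np.array(mu)
    # Sampling distribution of each arm's mean estimate over n patients.
    est = rng.normal(means, sigma / np.sqrt(n), size=(n_sim, 2))
    selected = est.max(axis=1)   # naive ML estimate of the selected arm
    return selected.mean()       # exceeds the true mean of either arm
```

Both arms have true mean zero here, yet the average reported estimate is strictly positive, which is the bias the paper's adjusted estimator corrects.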

Relevance: 80.00%

Abstract:

The paper concerns the design and analysis of serial dilution assays to estimate the infectivity of a sample of tissue when it is assumed that the sample contains a finite number of indivisible infectious units such that a subsample will be infectious if it contains one or more of these units. The aim of the study is to estimate the number of infectious units in the original sample. The standard approach to the analysis of data from such a study is based on the assumption of independence of aliquots both at the same dilution level and at different dilution levels, so that the numbers of infectious units in the aliquots follow independent Poisson distributions. An alternative approach is based on calculation of the expected value of the total number of samples tested that are not infectious. We derive the likelihood for the data on the basis of the discrete number of infectious units, enabling calculation of the maximum likelihood estimate and likelihood-based confidence intervals. We use the exact probabilities that are obtained to compare the maximum likelihood estimate with those given by the other methods in terms of bias and standard error and to compare the coverage of the confidence intervals. We show that the methods have very similar properties and conclude that for practical use the method that is based on the Poisson assumption is to be recommended, since it can be implemented by using standard statistical software. Finally we consider the design of serial dilution assays, concluding that it is important that neither the dilution factor nor the number of samples that remain untested should be too large.
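Under the Poisson/independence model described, an aliquot that takes a fraction f of the original sample is infectious with probability 1 - exp(-m f), where m is the number of infectious units, and the maximum likelihood estimate can be found by maximizing the binomial log-likelihood over a grid of m. A sketch under those assumptions (the dilution fractions, grid range, and names are illustrative, and this is not the paper's exact discrete-likelihood method):

```python
import numpy as np

def dilution_mle(fractions, n_tested, n_positive, m_grid=None):
    """Grid-search MLE of the number of infectious units m in the sample.

    Assumes independent aliquots, with the count of units in an aliquot
    taking fraction f of the sample distributed Poisson(m * f), so that
    P(aliquot infectious) = 1 - exp(-m * f).
    """
    if m_grid is None:
        m_grid = np.arange(1, 5001)          # candidate values of m
    f = np.asarray(fractions, float)
    n = np.asarray(n_tested, float)
    k = np.asarray(n_positive, float)
    p = 1.0 - np.exp(-np.outer(m_grid, f))   # P(positive) per (m, dilution level)
    p = np.clip(p, 1e-12, 1 - 1e-12)         # guard the logs
    ll = (k * np.log(p) + (n - k) * np.log(1 - p)).sum(axis=1)
    return m_grid[np.argmax(ll)]
```

Profiling the same log-likelihood also yields likelihood-based confidence intervals of the kind the paper compares.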