29 resultados para Multivariate Statistics


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The classification rules of linear discriminant analysis are defined by the true mean vectors and the common covariance matrix of the populations from which the data come. Because these true parameters are generally unknown, they are commonly estimated by the sample mean vector and covariance matrix of the data in a training sample randomly drawn from each population. However, these sample statistics are notoriously susceptible to contamination by outliers, a problem compounded by the fact that the outliers may be invisible to conventional diagnostics. High-breakdown estimation is a procedure designed to remove this cause for concern by producing estimates that are immune to serious distortion by a minority of outliers, regardless of their severity. In this article we motivate and develop a high-breakdown criterion for linear discriminant analysis and give an algorithm for its implementation. The procedure is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis either by providing indications that the dataset is not seriously affected by outliers (supporting the usual analysis) or by identifying apparently aberrant points and giving resistant estimators that are not affected by them.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper use consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimation of the standard errors are considered: the standard information-based method and the computationally-intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The small sample performance of Granger causality tests under different model dimensions, degree of cointegration, direction of causality, and system stability are presented. Two tests based on maximum likelihood estimation of error-correction models (LR and WALD) are compared to a Wald test based on multivariate least squares estimation of a modified VAR (MWALD). In large samples all test statistics perform well in terms of size and power. For smaller samples, the LR and WALD tests perform better than the MWALD test. Overall, the LR test outperforms the other two in terms of size and power in small samples.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A mixture model incorporating long-term survivors has been adopted in the field of biostatistics where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survival data are obtained from a multi-centre clinical trial, it is conceived that the environ mental conditions and facilities shared within clinic affects the proportion cured as well as the failure risk for the uncured individuals. It necessitates a long-term survivor mixture model with random effects. In this paper, the long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach. The proposed model is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration. Some simulation experiments are performed to assess the applicability of the model based on the average biases of the estimates formed. Copyright (C) 2001 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The earnings gap between men and women has remained comparatively stable at an aggregate level over the 1990s in Australia. From one perspective, this is a reminder of the considerable difficulty of addressing wage differentials once the most overt forms of wage discrimination have been removed, and of the limited impact of most policy initiatives. From another, it may be seen as evidence that dire predictions about the effects of decentralisation on the earnings gap have failed to materialise. In this paper, I use Australian Bureau of Statistics data to show that a number of different trends are evident underneath the relatively static picture shown by the aggregate statistics, particularly as wage dispersion has increased. The data suggest not only that the prospects for pay equity are far from benign, but also that in the current labour market the issue of gender pay inequality cannot be effectively addressed separately from wage inequality more generally.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article develops a weighted least squares version of Levene's test of homogeneity of variance for a general design, available both for univariate and multivariate situations. When the design is balanced, the univariate and two common multivariate test statistics turn out to be proportional to the corresponding ordinary least squares test statistics obtained from an analysis of variance of the absolute values of the standardized mean-based residuals from the original analysis of the data. The constant of proportionality is simply a design-dependent multiplier (which does not necessarily tend to unity). Explicit results are presented for randomized block and Latin square designs and are illustrated for factorial treatment designs and split-plot experiments. The distribution of the univariate test statistic is close to a standard F-distribution, although it can be slightly underdispersed. For a complex design, the test assesses homogeneity of variance across blocks, treatments, or treatment factors and offers an objective interpretation of residual plot.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.