998 results for Normal multivariate
Abstract:
We study the problem of testing the error distribution in a multivariate linear regression (MLR) model. The tests are functions of appropriately standardized multivariate least squares residuals whose distribution is invariant to the unknown cross-equation error covariance matrix. Empirical multivariate skewness and kurtosis criteria are then compared to a simulation-based estimate of their expected value under the hypothesized distribution. Special cases considered include testing multivariate normal, Student t, normal mixture and stable error models. In the Gaussian case, finite-sample versions of the standard multivariate skewness and kurtosis tests are derived. To do this, we exploit simple, double and multi-stage Monte Carlo test methods. For non-Gaussian distribution families involving nuisance parameters, confidence sets are derived for the nuisance parameters and the error distribution. The procedures considered are evaluated in a small simulation experiment. Finally, the tests are applied to an asset pricing model with observable risk-free rates, using monthly returns on New York Stock Exchange (NYSE) portfolios over five-year subperiods from 1926-1995.
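The simple Monte Carlo test idea mentioned in this abstract can be sketched as follows. This is only an illustration under simplifying assumptions, not the authors' procedure: it uses a plain location-scale sample rather than MLR residuals, it uses Mardia's skewness coefficient as the criterion, and the function names `mardia_skewness` and `mc_pvalue` are hypothetical. Because the statistic is affine invariant, its null distribution under normality does not depend on the unknown mean or covariance, so it can be simulated from standard normal draws.

```python
import numpy as np

def mardia_skewness(x):
    """Mardia's multivariate skewness coefficient of an (n, p) sample."""
    n = x.shape[0]
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n                     # ML covariance estimate
    # d[i, j] = (x_i - mean)' cov^{-1} (x_j - mean)
    d = centered @ np.linalg.solve(cov, centered.T)
    return float((d ** 3).sum() / n ** 2)

def mc_pvalue(x, n_rep=99, seed=0):
    """Simple Monte Carlo p-value for H0: the rows of x are multivariate normal."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    observed = mardia_skewness(x)
    # Simulate the statistic under the null; mean and covariance are irrelevant
    # because the statistic is invariant to affine transformations.
    simulated = np.array(
        [mardia_skewness(rng.standard_normal((n, p))) for _ in range(n_rep)]
    )
    # Rank-based p-value: large skewness values count against the null.
    return (1 + np.sum(simulated >= observed)) / (n_rep + 1)
```

With `n_rep = 99` the smallest attainable p-value is 0.01, so a test at the 5% level rejects when the observed statistic falls among the five largest of the 100 values.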
Abstract:
The literature related to skew–normal distributions has grown rapidly in recent years, but so far few applications concern the description of natural phenomena with this type of probability model or the interpretation of its parameters. The skew–normal family of distributions represents an extension of the normal family to which a parameter (λ) has been added to regulate the skewness. The development of this theoretical field has followed the general tendency in Statistics towards more flexible methods to represent features of the data as adequately as possible, and to reduce unrealistic assumptions such as the normality that underlies most methods of univariate and multivariate analysis. In this paper an investigation of the shape of the frequency distribution of the logratio ln(Cl−/Na+), whose components are related to water composition for 26 wells, has been performed. Samples have been collected around the active center of Vulcano island (Aeolian archipelago, southern Italy) from 1977 up to now at time intervals of about six months. Data of the logratio have been tentatively modeled by evaluating the performance of the skew–normal model for each well. Values of the λ parameter have been compared by considering temperature and spatial position of the sampling points. Preliminary results indicate that changes in λ values can be related to the nature of environmental processes affecting the data.
Abstract:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under a multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follow a Student t and skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.
Abstract:
The skew-normal distribution is a class of distributions that includes the normal distribution as a special case. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis in a multivariate, null intercept, measurement error model [R. Aoki, H. Bolfarine, J.A. Achcar, and D. Leao Pinto Jr, Bayesian analysis of a multivariate null intercept error-in-variables regression model, J. Biopharm. Stat. 13(4) (2003b), pp. 763-771] where the unobserved value of the covariate (latent variable) follows a skew-normal distribution. The results and methods are applied to a real dental clinical trial presented in [A. Hadgu and G. Koch, Application of generalized estimating equations to a dental randomized clinical trial, J. Biopharm. Stat. 9 (1999), pp. 161-178].
Abstract:
Canalizing genes possess such broad regulatory power, and their action sweeps across such a wide swath of processes, that the full set of affected genes is not highly correlated under normal conditions. When not active, the controlling gene will not be predictable to any significant degree by its subject genes, either alone or in groups, since their behavior will be highly varied relative to the inactive controlling gene. When the controlling gene is active, its behavior is not well predicted by any one of its targets, but can be very well predicted by groups of genes under its control. To investigate this question, we introduce in this paper the concept of intrinsically multivariate predictive (IMP) genes, and present a mathematical study of IMP in the context of binary genes with respect to the coefficient of determination (CoD), which measures the predictive power of a set of genes with respect to a target gene. A set of predictor genes is said to be IMP for a target gene if all properly contained subsets of the predictor set are bad predictors of the target but the full predictor set predicts the target with great accuracy. We show that logic of prediction, predictive power, covariance between predictors, and the entropy of the joint probability distribution of the predictors jointly affect the appearance of IMP genes. In particular, we show that high predictive power, small covariance among predictors, a large entropy of the joint probability distribution of predictors, and certain logics, such as XOR in the 2-predictor case, are factors that favor the appearance of IMP. The IMP concept is applied to characterize the behavior of the gene DUSP1, which exhibits control over a central, process-integrating signaling pathway, thereby providing preliminary evidence that IMP can be used as a criterion for discovery of canalizing genes.
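The XOR case mentioned in this abstract can be made concrete with a small sketch. It assumes the usual definition of the CoD for binary variables, CoD = (ε0 − ε) / ε0, where ε0 is the error of the best constant guess of the target and ε is the error of the best predictor built from the chosen genes; the helper names `best_error` and `cod` and the uniform joint distribution are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def best_error(joint, predictor_idx):
    """Smallest misclassification probability for the binary target when only
    the predictors listed in predictor_idx are observed.
    joint maps (x1, ..., xk, y) -> probability."""
    grouped = {}
    for (*xs, y), prob in joint.items():
        key = tuple(xs[i] for i in predictor_idx)
        grouped.setdefault(key, [0.0, 0.0])[y] += prob
    # For each observed pattern, the optimal rule errs on the less likely label.
    return sum(min(p0, p1) for p0, p1 in grouped.values())

def cod(joint, predictor_idx):
    """Coefficient of determination; assumes the target is not constant."""
    eps0 = best_error(joint, ())            # best guess with no predictors
    eps = best_error(joint, predictor_idx)
    return (eps0 - eps) / eps0

# Two independent, uniformly distributed binary predictors; target = x1 XOR x2.
xor_joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

single_cods = [cod(xor_joint, (0,)), cod(xor_joint, (1,))]
pair_cod = cod(xor_joint, (0, 1))
```

Here each single predictor has a CoD of 0 while the pair has a CoD of 1, which is the pattern the abstract calls intrinsically multivariate predictive.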
Abstract:
In this article, we consider local influence analysis for the skew-normal linear mixed model (SN-LMM). As the observed data log-likelihood associated with the SN-LMM is intractable, Cook's well-known approach cannot be applied to obtain measures of local influence. Instead, we develop local influence measures following the approach of Zhu and Lee (2001). This approach is based on the use of an EM-type algorithm and is measurement invariant under reparametrizations. Four specific perturbation schemes are discussed. Results obtained for a simulated data set and a real data set are reported, illustrating the usefulness of the proposed methodology.
Abstract:
Background: Despite the recommendations to continue the regime of healthy food and physical activity (PA) postpartum for women with previous gestational diabetes mellitus (GDM), the scientific evidence reveals that these recommendations may not be complied with. This study compared lifestyle and health status in women whose pregnancy was complicated by GDM with women who had a normal pregnancy and delivery. Methods: The inclusion criteria were women with GDM (ICD-10: O24.4A and O24.4B) and women with uncomplicated pregnancy and delivery in 2005 (ICD-10: O80.0). A random sample of women fulfilling the criteria (n = 882) was identified from the Swedish Medical Birth Register. A questionnaire was sent by mail to eligible women approximately four years after the pregnancy. A total of 444 women (50.8%) agreed to participate, 111 diagnosed with GDM in their pregnancy and 333 with normal pregnancy/delivery. Results: Women with previous GDM were significantly older, and reported higher body weight and less PA before the index pregnancy. No major differences between the groups were noticed regarding lifestyle at the follow-up. Overall, few participants fulfilled the national recommendations for PA and diet. At the follow-up, 19 participants had developed diabetes, all with previous GDM. Women with previous GDM reported significantly poorer self-rated health (SRH), a higher level of sick leave and more frequent use of medication on a regular basis. However, a history of GDM or having overt diabetes mellitus showed no association with poorer SRH in the multivariate analysis. Irregular eating habits, no regular PA, overweight/obesity, and regular use of medication were associated with poorer SRH in all participants. Conclusions: Suboptimal levels of PA and of fruit and vegetable consumption were found in a sample of women with a history of GDM as well as in women with normal pregnancy approximately four years after the index pregnancy. Women with previous GDM seem to increase their PA after childbirth, but they still perform their PA at lower intensity than women with a history of normal pregnancy. Having GDM at index pregnancy or being diagnosed with overt diabetes mellitus at follow-up did not demonstrate associations with poorer SRH four years after delivery.
Abstract:
Preeclampsia is defined as an extremely serious complication of the pregnancy-puerperium cycle with delayed emergence of cardiovascular risk factors, including metabolic syndrome. The research aimed to estimate the prevalence of metabolic syndrome and associated factors in women with preeclampsia and normal pregnancy followed five years after childbirth. This is a cross-sectional observational study using a quantitative approach, conducted at a maternity school in the city of Natal in Rio Grande do Norte state. The sample was composed of 70 women with previous preeclampsia and 75 with normal pregnancy, selected by simple random probability sampling. Subjects were analyzed for sociodemographic, obstetric, clinical, anthropometric and biochemical parameters. International Diabetes Federation criteria were adopted to diagnose metabolic syndrome. The Kolmogorov-Smirnov, Mann-Whitney, Student's t, Pearson's chi-squared, and Fisher's exact tests, in addition to simple logistic regression, were used for data analysis, at a 5% significance level (p ≤ 0.05). Statistical tests demonstrated elevated body mass index (p = 0.001), predominance of family history of diabetes mellitus (p = 0.022) and significantly higher prevalence of metabolic syndrome in the preeclampsia group (37.1%) compared with the normal pregnancy group (22.7%) (p = 0.042). Intergroup comparison showed a high number of metabolic syndrome components in women with previous preeclampsia. Altered systolic and diastolic blood pressure (p < 0.001) was the most prevalent, followed by low concentrations of high-density lipoproteins (p = 0.049) and hyperglycemia (p = 0.030). There was a predominance of the metabolic syndrome in women with 0-9 years of schooling (42.4%) (p = 0.005), body mass index above 30 kg/m² (52.3%) (p < 0.001), high uric acid (62.5%) (p = 0.050) and family history of hypertension (38.5%) (p < 0.001). Multivariate analysis of the data showed that body mass index above 30 kg/m², education level of less than 10 years of study (p < 0.001) and family history of hypertension (p = 0.002) remained associated with the metabolic syndrome. Women with previous preeclampsia exhibited a high prevalence of metabolic syndrome and its individual components relative to women with normal pregnancy, especially altered systolic and diastolic blood pressure, low concentrations of high-density lipoproteins and hyperglycemia. The factors associated with this outcome were obesity, less than 10 years of schooling, and family history of hypertension. Overall, this study identified young women with a history of preeclampsia exposed to a higher cardiovascular risk than normal.
Abstract:
Linear mixed effects models have been widely used in analysis of data where responses are clustered around some random effects, so it is not reasonable to assume independence between observations in the same cluster. In most biological applications, it is assumed that the distributions of the random effects and of the residuals are Gaussian. This makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions for robust inferences are described. Specific distributions examined include univariate and multivariate versions of the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth weight data on rats in a toxicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.
Abstract:
Data visualization techniques are powerful in the handling and analysis of multivariate systems. One such technique known as parallel coordinates was used to support the diagnosis of an event, detected by a neural network-based monitoring system, in a boiler at a Brazilian Kraft pulp mill. Its attractiveness is the possibility of the visualization of several variables simultaneously. The diagnostic procedure was carried out step-by-step going through exploratory, explanatory, confirmatory, and communicative goals. This tool allowed the visualization of the boiler dynamics in an easier way, compared to commonly used univariate trend plots. In addition it facilitated analysis of other aspects, namely relationships among process variables, distinct modes of operation and discrepant data. The whole analysis revealed firstly that the period involving the detected event was associated with a transition between two distinct normal modes of operation, and secondly the presence of unusual changes in process variables at this time.
Abstract:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed as well as that in [3] in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, facilitates diagnosis of convergence, and can be very useful for fitting more complex skew IRT models.
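The stochastic representation referred to in this abstract can be illustrated with a short sketch. It assumes the standard skew-normal in the direct parameterization with shape λ, for which Z = δ·H + sqrt(1 − δ²)·U with δ = λ / sqrt(1 + λ²), H half-normal and U standard normal; the centred parameterization used in the paper would additionally standardize Z to zero mean and unit variance. The function names are hypothetical and this is not the authors' sampler.

```python
import numpy as np

def skew_normal_delta(lam):
    """delta = lambda / sqrt(1 + lambda^2), which lies strictly between -1 and 1."""
    return lam / np.sqrt(1.0 + lam ** 2)

def sample_skew_normal(lam, size, seed=0):
    """Draw from the standard skew-normal via the hierarchical representation
    H ~ half-normal(0, 1), Z | H = h ~ N(delta * h, 1 - delta^2)."""
    rng = np.random.default_rng(seed)
    delta = skew_normal_delta(lam)
    h = np.abs(rng.standard_normal(size))   # latent half-normal variable
    u = rng.standard_normal(size)           # independent standard normal
    return delta * h + np.sqrt(1.0 - delta ** 2) * u

def skew_normal_mean(lam):
    """Theoretical mean of the standard skew-normal: delta * sqrt(2 / pi)."""
    return skew_normal_delta(lam) * np.sqrt(2.0 / np.pi)

def skew_normal_var(lam):
    """Theoretical variance: 1 - 2 * delta^2 / pi."""
    return 1.0 - 2.0 * skew_normal_delta(lam) ** 2 / np.pi
```

Conditional on the latent H, Z is normal, which is the kind of structure that lets a Gibbs-type sampler update the latent variable and the remaining parameters in separate, simpler steps.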
Abstract:
BACKGROUND: Due to their molecular weight, it is possible that the adipokines adiponectin, resistin and leptin accumulate when glomerular filtration rate (GFR) is decreased. In reduced renal clearance, altered serum concentrations of these proteins might affect cardiovascular risk. The objective of the study was to investigate the relationship between adipokine concentrations and GFR. METHODS: The association between GFR, as determined by the abbreviated MDRD equation, and the concentrations of the adipokines adiponectin, resistin and leptin was assessed in a cohort of coronary patients (n=538; 363 male, 165 female). After calculation of correlations between GFR and adipokine concentrations, the association was further assessed by analysis of covariance following adjustment for age, gender, BMI, presence of type 2 diabetes, presence of hypertension, history of smoking as well as for serum lipid concentrations. RESULTS: Mean GFR in our study population was 68.74 ± 15.27 ml/min/1.73 m². 74.3% of the patients had a GFR >60 ml/min/1.73 m², 24% of the patients had a GFR between 30 and 60 ml/min/1.73 m², and 1.7% of the patients had a GFR <30 ml/min/1.73 m². There were significant inverse correlations between adiponectin (r=-0.372; p<0.001), resistin (r=-0.227; p<0.001) and leptin (r=-0.151; p=0.009) concentrations and GFR. After multivariate adjustment, the associations remained significant for adiponectin and resistin. Subgroup analysis in patients with GFR >60 ml/min/1.73 m² showed a significant correlation between GFR and adiponectin as well as leptin concentrations. However, after adjustment, these associations were no longer significant. CONCLUSIONS: There is an independent association between GFR and the serum concentrations of adiponectin and resistin. However, this association is not present at GFR >60 ml/min/1.73 m². This finding suggests that adipokine concentrations in mildly impaired and normal renal function are influenced by factors other than GFR.
Abstract:
There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazards model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich class of spatial survival models where regression coefficients have population average interpretation and the spatial dependence of survival times is conveniently modeled using the transformed variables by flexible normal random fields. We study the relationship of the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibited due to the high dimensional intractable integration of the likelihood function and the infinite dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators, and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Asthma Study and its performance is evaluated using simulations.
Abstract:
In the diagnosis of diabetic autonomic neuropathy (DAN) various autonomic tests are used. We took a novel statistical approach to find a combination of autonomic tests that best separates normal controls from patients with DAN.