27 results for Multivariate data analysis


Abstract:

This article introduces generalized beta-generated (GBG) distributions. Sub-models include all classical beta-generated, Kumaraswamy-generated and exponentiated distributions. They are maximum entropy distributions under three intuitive conditions, which show that the classical beta generator skewness parameters only control tail entropy and an additional shape parameter is needed to add entropy to the centre of the parent distribution. This parameter controls skewness without necessarily differentiating tail weights. The GBG class also has tractable properties: we present various expansions for moments, generating function and quantiles. The model parameters are estimated by maximum likelihood and the usefulness of the new class is illustrated by means of some real data sets. (c) 2011 Elsevier B.V. All rights reserved.
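
As an illustration, a GBG density for a parent cdf G with density g is commonly written as f(x) = c g(x) G(x)^(ac-1) {1 - G(x)^c}^(b-1) / B(a, b). The abstract does not state this formula, so treat it as an assumption. The Python sketch below evaluates that form with a hypothetical exponential parent and checks numerically that it integrates to one. Setting c = 1, a = 1 or b = 1 in `gbg_pdf` recovers the beta-generated, Kumaraswamy-generated and exponentiated densities, respectively. The helper names are illustrative only.

```python
import math


def beta_fn(a, b):
    """Complete beta function B(a, b), computed on the log scale for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def gbg_pdf(x, a, b, c, parent_pdf, parent_cdf):
    """Assumed GBG density: c g(x) G(x)^(ac-1) (1 - G(x)^c)^(b-1) / B(a, b)."""
    G = parent_cdf(x)
    return c / beta_fn(a, b) * parent_pdf(x) * G ** (a * c - 1) * (1.0 - G ** c) ** (b - 1)


# Hypothetical parent distribution: exponential with rate 1.
def exp_pdf(x, rate=1.0):
    return rate * math.exp(-rate * x)


def exp_cdf(x, rate=1.0):
    return 1.0 - math.exp(-rate * x)


# Midpoint rule on [0, 40]; the exponential tail beyond 40 is negligible.
n, hi = 100_000, 40.0
h = hi / n
total = h * sum(gbg_pdf((i + 0.5) * h, 2.0, 3.0, 1.5, exp_pdf, exp_cdf) for i in range(n))
print(f"integral of GBG density: {total:.6f}")  # should be close to 1
```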

Abstract:

Item response theory (IRT) comprises a set of statistical models that are useful in many fields, especially when there is interest in studying latent variables (or latent traits). Such latent traits are usually assumed to be random variables, and a convenient distribution is assigned to them. A very common choice for this distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour of the latent trait distribution. They also developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP and showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R. J. Patz and B. W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G. O. Roberts, and W. R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. We also study the sensitivity to these priors, as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items and the asymmetry level on parameter recovery. The simulation study indicated that our approach performed as well as that in [3] in terms of parameter recovery, mainly when using the Jeffreys prior. It also indicated that the asymmetry level has the largest impact on parameter recovery, even though it is relatively small. A real data analysis is presented together with the development of model-fit assessment tools, and the results are compared with those obtained by Azevedo et al. The results indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnosis and can be very useful for fitting more complex skew IRT models.
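
The hierarchical representation mentioned above can be sketched as follows. Let H = |U| with U ~ N(0, 1), and let Z | H ~ N(delta*H, 1 - delta^2) with delta = lambda/sqrt(1 + lambda^2). Then Z is standard skew-normal with shape lambda (Henze, 1986). Conditioning on H turns the skew-normal into a normal, which is the kind of structure that simplifies Gibbs steps. The centred form then rescales Z to mean mu and standard deviation sigma using E(Z) = delta*sqrt(2/pi) and Var(Z) = 1 - 2*delta^2/pi. The Python sketch below draws latent traits this way with mu = 0 and sigma = 1. Two choices here are our illustrative assumptions: the function name, and indexing skewness by delta rather than by the SNCP skewness coefficient.

```python
import math
import random


def sample_sncp_latent(n, delta, mu=0.0, sigma=1.0, rng=None):
    """Draw n skew-normal values via Henze's hierarchical representation,
    then rescale them to mean mu and standard deviation sigma (centred form).

    Assumption for illustration: skewness is indexed by delta in (-1, 1).
    """
    rng = rng or random.Random()
    mean_z = delta * math.sqrt(2.0 / math.pi)  # E(Z)
    sd_z = math.sqrt(1.0 - mean_z ** 2)        # sqrt(1 - 2 delta^2 / pi)
    draws = []
    for _ in range(n):
        h = abs(rng.gauss(0.0, 1.0))                            # H ~ half-normal
        z = rng.gauss(delta * h, math.sqrt(1.0 - delta ** 2))   # Z | H ~ N(delta*H, 1 - delta^2)
        draws.append(mu + sigma * (z - mean_z) / sd_z)
    return draws


lam = 3.0
delta = lam / math.sqrt(1.0 + lam ** 2)
theta = sample_sncp_latent(200_000, delta, rng=random.Random(2011))
m = sum(theta) / len(theta)
v = sum((t - m) ** 2 for t in theta) / len(theta)
skew = sum((t - m) ** 3 for t in theta) / len(theta) / v ** 1.5
print(f"mean={m:.3f}, var={v:.3f}, skewness={skew:.3f}")
```

With lambda = 3 the theoretical skewness is roughly 0.67, so the sample should show a clearly right-skewed latent trait with mean near 0 and variance near 1.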

Abstract:

The beta-Birnbaum-Saunders (Cordeiro and Lemonte, 2011) and Birnbaum-Saunders (Birnbaum and Saunders, 1969a) distributions have been used quite effectively to model fatigue failure times of materials and lifetime data. We define the log-beta-Birnbaum-Saunders distribution as the distribution of the logarithm of a beta-Birnbaum-Saunders random variable. Explicit expressions for its generating function and moments are derived. We propose a new log-beta-Birnbaum-Saunders regression model that can be applied to censored data and used more effectively in survival analysis. We obtain the maximum likelihood estimates of the model parameters for censored data and investigate influence diagnostics. The new location-scale regression model is modified to accommodate the possibility that long-term survivors are present in the data. Its usefulness is illustrated by means of two real data sets. (C) 2011 Elsevier B.V. All rights reserved.
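
For illustration, suppose T is Birnbaum-Saunders with shape alpha and scale beta. Then Y = log T has cdf Phi(v), where v = (2/alpha) sinh((y - mu)/2) and mu = log(beta). The abstract does not give the log-beta-Birnbaum-Saunders formula. Assuming the usual beta-generated construction with parameters a and b applied to this baseline, the density is f(y) = cosh((y - mu)/2) phi(v) Phi(v)^(a-1) {1 - Phi(v)}^(b-1) / {alpha B(a, b)}. The Python sketch below evaluates this assumed density and checks numerically that it integrates to one. The function names and parameter values are hypothetical.

```python
import math


def std_normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def log_bbs_pdf(y, alpha, mu, a, b):
    """Assumed log-beta-Birnbaum-Saunders density: the beta-G construction
    applied to the log-Birnbaum-Saunders (sinh-normal) baseline, mu = log(beta)."""
    w = (y - mu) / 2.0
    v = (2.0 / alpha) * math.sinh(w)
    G = std_normal_cdf(v)                         # baseline cdf
    g = math.cosh(w) * std_normal_pdf(v) / alpha  # baseline density dG/dy
    beta_ab = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return g * G ** (a - 1) * (1.0 - G) ** (b - 1) / beta_ab


# Hypothetical parameters; b > 1 avoids 0 ** negative when 1 - G underflows to 0.
alpha, mu, a, b = 0.5, 1.0, 2.0, 1.5
n, lo, hi = 20_000, mu - 10.0, mu + 10.0
h = (hi - lo) / n
total = h * sum(log_bbs_pdf(lo + (i + 0.5) * h, alpha, mu, a, b) for i in range(n))
print(f"integral of log-BBS density: {total:.6f}")  # should be close to 1
```

With a = b = 1 the function reduces to the log-Birnbaum-Saunders baseline density, which is a quick sanity check on the construction.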

Abstract:

Objective: To build a life table and determine the factors related to the duration of treatment of undernourished children at a nutrition rehabilitation centre (CREN), Sao Paulo, Brazil. Design: Nutritional status was assessed from weight-for-age, height-for-age and BMI-for-age Z-scores, while neuropsychomotor development was classified according to the milestones of childhood development. Life tables, Kaplan-Meier survival curves and Cox multiple regression models were employed in data analysis. Setting: CREN (Centre of Nutritional Recovery and Education), Sao Paulo, Brazil. Subjects: Undernourished children (n = 228) from the southern slums of Sao Paulo who had received treatment at CREN under a day-hospital regime between 1994 and 2009. Results: The Kaplan-Meier survival curves showed statistically significant differences in the duration of treatment at CREN between children with different degrees of neuropsychomotor development (log-rank = 6.621; P = 0.037). Estimates based on the multivariate Cox model revealed that children aged ≥ 24 months at admission had a lower probability of nutritional rehabilitation at the end of the period (hazard ratio (HR) = 0.49; P = 0.046) than infants aged up to 12 months. Children with slow development were better rehabilitated than those with adequate development (HR = 4.48; P = 0.023). No significant effects of sex, degree of undernutrition or birth weight on the probability of nutritional rehabilitation were found. Conclusions: Age and neuropsychomotor developmental status at the time of admission to CREN are critical factors in determining the duration of treatment.
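
Since the analysis relies on Kaplan-Meier curves, a minimal Python sketch of the product-limit estimator may help. The data below are hypothetical. In this setting an "event" would be nutritional rehabilitation, and a censored observation would be a child still in treatment or lost to follow-up. At tied times, children censored at t are still counted as at risk at t, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct event time.

    times  -- follow-up times
    events -- 1 if the event was observed, 0 if censored
    Returns a list of (time, survival) pairs.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        at_risk = sum(1 for s in times if s >= t)  # subjects censored at t still count
        survival *= 1.0 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical treatment durations (months); 0 marks a censored child.
durations = [1, 2, 2, 3, 4]
rehabilitated = [1, 1, 0, 1, 0]
print(kaplan_meier(durations, rehabilitated))
```

Here S(t) is the estimated probability of still being in treatment (not yet rehabilitated) after time t.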

Abstract:

In this paper, we propose a random intercept Poisson model in which the random effect is assumed to follow a generalized log-gamma (GLG) distribution. This random effect accommodates (or captures) the overdispersion in the counts and induces within-cluster correlation. We derive the first two moments for the marginal distribution as well as the intraclass correlation. Even though numerical integration methods are, in general, required for deriving the marginal models, we obtain the multivariate negative binomial model from a particular parameter setting of the hierarchical model. An iterative process is derived for obtaining the maximum likelihood estimates for the parameters in the multivariate negative binomial model. Residual analysis is proposed and two applications with real data are given for illustration. (C) 2011 Elsevier B.V. All rights reserved.
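
To make the overdispersion and intraclass correlation concrete, consider a gamma special case, which is our assumption for illustration. Let Y_ij | b_i ~ Poisson(mu_ij b_i) with b_i ~ Gamma(shape k, rate k), so E(b_i) = 1 and Var(b_i) = 1/k. Then E(Y_ij) = mu_ij, Var(Y_ij) = mu_ij + mu_ij^2/k and Cov(Y_ij, Y_il) = mu_ij mu_il / k, and each margin is negative binomial. The Python sketch below computes these moments and checks them by simulation. It does not reproduce the paper's generalized log-gamma parameterization or its estimation procedure, and the helper names are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def marginal_moments(mu, k):
    """Mean and variance of Y when Y | b ~ Poisson(mu * b) and b ~ Gamma(shape k, rate k)."""
    return mu, mu + mu ** 2 / k


def intraclass_corr(mu1, mu2, k):
    """Correlation between two counts that share the same random intercept b."""
    return (mu1 * mu2 / k) / math.sqrt(marginal_moments(mu1, k)[1] * marginal_moments(mu2, k)[1])


def simulate_clusters(n_clusters, mu1, mu2, k, rng):
    """Simulate one pair of counts per cluster sharing a gamma random effect."""
    pairs = []
    for _ in range(n_clusters):
        b = rng.gammavariate(k, 1.0 / k)  # shape k, scale 1/k, so E(b) = 1
        pairs.append((poisson_draw(mu1 * b, rng), poisson_draw(mu2 * b, rng)))
    return pairs


mu1, mu2, k = 3.0, 2.0, 2.0
pairs = simulate_clusters(100_000, mu1, mu2, k, random.Random(7))
y1 = [p[0] for p in pairs]
y2 = [p[1] for p in pairs]
m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
v1 = sum((y - m1) ** 2 for y in y1) / len(y1)
v2 = sum((y - m2) ** 2 for y in y2) / len(y2)
cov = sum((c1 - m1) * (c2 - m2) for c1, c2 in pairs) / len(pairs)
print("theory:", marginal_moments(mu1, k), round(intraclass_corr(mu1, mu2, k), 4))
print("simulated:", (round(m1, 3), round(v1, 3)), round(cov / math.sqrt(v1 * v2), 4))
```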

Abstract:

A common interest in gene expression data analysis is to identify, from a large pool of candidate genes, those that show significant changes in expression levels between a treatment and a control biological condition. This is usually done by computing a test statistic for each gene and applying a cutoff value to separate differentially from nondifferentially expressed genes. In this paper, we propose a Bayesian approach to identify differentially expressed genes by sequentially calculating credibility intervals from predictive densities. These densities are constructed using the sampled mean treatment effect from all genes in the study, excluding the treatment effects of genes previously identified with statistical evidence of a difference. We compare our Bayesian approach with standard approaches based on the t-test and modified t-tests through a simulation study, using the small sample sizes that are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially in cases with mean differences and increased treatment variance relative to control variance. We also apply the methodologies to a well-known publicly available data set on the bacterium Escherichia coli.
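
For comparison, the standard cutoff-based procedure can be sketched in Python: a statistic is computed for each gene, and genes whose absolute value exceeds a cutoff are flagged. The sketch uses Welch's t statistic and, as one example of a modified t-test, a SAM-style variant that adds a constant s0 to the denominator. The paper's exact set of modified tests is not stated here, the expression values are hypothetical, and the Bayesian sequential procedure itself is not reproduced.

```python
import math
from statistics import mean, variance


def t_statistic(treat, control, s0=0.0):
    """Welch two-sample t statistic; s0 > 0 adds a SAM-style constant to the denominator."""
    se = math.sqrt(variance(treat) / len(treat) + variance(control) / len(control))
    return (mean(treat) - mean(control)) / (se + s0)


def flag_genes(treat_by_gene, control_by_gene, cutoff, s0=0.0):
    """Genes whose |statistic| exceeds the cutoff are called differentially expressed."""
    return [
        gene
        for gene in treat_by_gene
        if abs(t_statistic(treat_by_gene[gene], control_by_gene[gene], s0)) > cutoff
    ]


# Hypothetical log-expression values, three replicates per condition.
treat = {"geneA": [8.1, 8.4, 8.2], "geneB": [5.0, 5.3, 4.9], "geneC": [7.0, 9.5, 6.1]}
control = {"geneA": [6.0, 6.2, 5.9], "geneB": [5.1, 5.0, 5.2], "geneC": [6.5, 6.9, 6.3]}
print(flag_genes(treat, control, cutoff=4.0))
print(flag_genes(treat, control, cutoff=4.0, s0=1.0))
```

With only three replicates the variance estimates are noisy. The constant s0 damps statistics that are inflated by very small variance estimates, which is the usual motivation for modified t-tests.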

Abstract:

A new method for the analysis of scattering data from lamellar bilayer systems is presented. The method employs a form-free description of the cross-section structure of the bilayer, and the fit is performed directly to the scattering data, with a structure factor introduced when required. The cross-section structure (the electron density profile, in the case of X-ray scattering) is described by a set of Gaussian functions, and the technique is termed Gaussian deconvolution. The coefficients of the Gaussians are optimized using a constrained least-squares routine that induces smoothness of the electron density profile. The optimization is coupled with the point-of-inflection method for determining the optimal weight of the smoothness. With the new approach, it is possible to optimize the form factor, the structure factor and several other model parameters simultaneously. The applicability of the method is demonstrated in a study of a multilamellar system composed of lecithin bilayers, where the form factor and structure factor are obtained simultaneously, and the results provide new insight into this well-known system.
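
The link between the Gaussian description of the profile and the scattering data can be illustrated for a centrosymmetric bilayer. Suppose the electron density profile is modelled as rho(z) = sum_k c_k [exp(-(z - z_k)^2/(2 s_k^2)) + exp(-(z + z_k)^2/(2 s_k^2))]. Its Fourier transform, the form factor, is then F(q) = sum_k 2 c_k s_k sqrt(2 pi) exp(-q^2 s_k^2/2) cos(q z_k), which is linear in the coefficients c_k. This linearity is what a least-squares fit of the coefficients can exploit. The Python sketch below checks the analytic form factor against direct numerical integration. The Gaussian positions, widths and coefficients are hypothetical, and the smoothness constraint and structure factor are not shown.

```python
import math

# Hypothetical symmetric profile: (coefficient c_k, position z_k in nm, width s_k in nm).
# The positive pair mimics headgroups; the negative term mimics the methyl trough at z = 0.
GAUSSIANS = [(1.0, 2.0, 0.3), (-0.5, 0.0, 0.4)]


def density(z):
    """Electron density contrast built from mirrored pairs of Gaussians."""
    return sum(
        c * (math.exp(-(z - z0) ** 2 / (2 * s ** 2)) + math.exp(-(z + z0) ** 2 / (2 * s ** 2)))
        for c, z0, s in GAUSSIANS
    )


def form_factor_analytic(q):
    """Fourier transform of density(); real because the profile is symmetric."""
    return sum(
        2 * c * s * math.sqrt(2 * math.pi) * math.exp(-((q * s) ** 2) / 2) * math.cos(q * z0)
        for c, z0, s in GAUSSIANS
    )


def form_factor_numeric(q, half_width=10.0, n=20_000):
    """Midpoint-rule approximation of the integral of density(z) * cos(q z) over z."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        z = -half_width + (i + 0.5) * h
        total += density(z) * math.cos(q * z)
    return total * h


for q in (0.0, 0.5, 1.0, 2.0):
    print(f"q={q}: analytic={form_factor_analytic(q):.6f}, numeric={form_factor_numeric(q):.6f}")
```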

Abstract:

In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75] and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it has an interesting and realistic interpretation of the biological mechanism behind the occurrence of the event of interest. It includes either a destructive process of tumour cells after an initial treatment, or the capacity of an individual exposed to irradiation to repair altered cells that would otherwise lead to cancer. In other words, what is recorded is only the damaged portion of the original number of altered cells, namely those not eliminated by the treatment or repaired by the individual's repair system. Markov chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. A discussion of model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are also presented.
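
The destructive mechanism is easiest to see in its simplest special case. This is an assumption for illustration only; the model above uses a compound weighted Poisson distribution instead. Suppose the initial number of altered cells is N ~ Poisson(theta), and each cell independently survives treatment or repair with probability p. The surviving count D is then Poisson(theta p). If latent activation times have cdf F, the population survival function is S_pop(t) = exp(-theta p F(t)) and the cured fraction is exp(-theta p). The Python sketch below simulates this binomial thinning and compares the results with those formulas. The exponential latent times, the helper names and all parameter values are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def population_survival(t, theta, p, latent_cdf):
    """S_pop(t) = exp(-theta * p * F(t)) for the destructive Poisson special case."""
    return math.exp(-theta * p * latent_cdf(t))


def simulate_lifetime(theta, p, rate, rng):
    """Observed time = earliest latent time among surviving cells; inf if none survive (cured)."""
    n_initial = poisson_draw(theta, rng)
    n_surviving = sum(1 for _ in range(n_initial) if rng.random() < p)  # binomial thinning
    if n_surviving == 0:
        return math.inf
    return min(rng.expovariate(rate) for _ in range(n_surviving))


theta, p, rate = 2.5, 0.4, 0.5
rng = random.Random(1993)
sample = [simulate_lifetime(theta, p, rate, rng) for _ in range(100_000)]
cured = sum(1 for t in sample if math.isinf(t)) / len(sample)
print(f"simulated cure fraction={cured:.4f}, exp(-theta*p)={math.exp(-theta * p):.4f}")
```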

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data, which exhibit properties particular to the simplex space, where the components are constrained to a constant sum. These properties are not present in regular Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly online tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
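
One widely used operation in Aitchison's framework for compositional data is the centred log-ratio (clr) transform, clr(x)_i = log(x_i / g(x)), where g(x) is the geometric mean of the parts. The Euclidean distance between clr vectors is the Aitchison distance, which does not change when a composition is rescaled, for example to a different library size. The Python sketch below is an illustration with hypothetical tag counts. It is an assumption that Simcluster computes exactly these quantities, and zero counts would need to be replaced (for instance by a small pseudocount) before taking logarithms.

```python
import math


def clr(composition):
    """Centred log-ratio transform; all parts must be positive (replace zeros first)."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed compositions."""
    return math.dist(clr(x), clr(y))


# Hypothetical tag counts for four transcripts in three libraries.
lib_a = [120, 30, 5, 845]
lib_b = [240, 60, 10, 1690]  # same proportions as lib_a, twice the sequencing depth
lib_c = [30, 120, 845, 5]
print(aitchison_distance(lib_a, lib_b))  # essentially 0: only the total differs
print(aitchison_distance(lib_a, lib_c))
```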

Abstract:

Background: Dizziness is a common complaint among older adults and has been linked to a wide range of health conditions and psychological and social characteristics in this population. However, a profile of dizziness is still uncertain, which hampers clinical decision-making. We therefore sought to explore the relationship between dizziness and a comprehensive range of demographic data, diseases, health and geriatric conditions, and geriatric syndromes in a representative sample of community-dwelling older people. Methods: This is a cross-sectional, population-based study derived from FIBRA (Network for the Study of Frailty in Brazilian Elderly Adults), with 391 elderly adults, both men and women, aged 65 years and older. Elderly participants living at home in an urban area were enrolled through random cluster sampling of census regions. The outcome variable was self-reported dizziness in the last year. Several sensations of dizziness were investigated, including vertigo, spinning, light or heavy headedness, floating, fuzziness, giddiness and instability. A multivariate logistic regression analysis was conducted to estimate the adjusted odds ratios and build the probability model for dizziness. Results: Dizziness was reported by 45% of elderly adults, of whom 71.6% were women (p = 0.004). The multivariate regression analysis revealed that dizziness is associated with depressive symptoms (OR = 2.08; 95% CI 1.29-3.35), perceived fatigue (OR = 1.93; 95% CI 1.21-3.10), recurring falls (OR = 2.01; 95% CI 1.11-3.62) and excessive drowsiness (OR = 1.91; 95% CI 1.11-3.29). The discrimination of the final model was AUC = 0.673 (95% CI 0.619-0.727) (p < 0.001). Conclusions: The prevalence of dizziness in community-dwelling elderly adults is substantial. It is associated with other common geriatric conditions that are usually neglected in elderly adults, such as fatigue and drowsiness, supporting its possible multifactorial manifestation. Our findings demonstrate the need to expand the design in future studies, aiming to estimate risk and identify possible causal relations.
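
The AUC used to describe the model's discrimination can be read as the probability that a randomly chosen participant with dizziness receives a higher predicted probability than a randomly chosen participant without it. A minimal Python sketch of this pairwise (Mann-Whitney) computation follows, with ties counted as one half. The predicted probabilities are hypothetical, not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities from a fitted logistic model.
with_dizziness = [0.9, 0.8, 0.4]
without_dizziness = [0.7, 0.3]
print(auc(with_dizziness, without_dizziness))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which puts a value such as 0.673 in context.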

Abstract:

Background: Aortic aneurysm and dissection are important causes of death in older people. Ruptured aneurysms have catastrophic fatality rates approaching 80%. Few population-based mortality studies have been published worldwide, and none in Brazil. The objective of the present study was to use multiple-cause-of-death methodology to analyze mortality trends related to aortic aneurysm and dissection in the state of Sao Paulo between 1985 and 2009. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which aortic aneurysm or dissection was listed as a cause of death. The variables sex, age, season of the year, and underlying, associated or total mentions of causes of death were studied using standardized mortality rates, proportions and historical trends. Statistical analyses were performed with chi-square goodness-of-fit tests, Kruskal-Wallis H tests and analysis of variance. The joinpoint regression model was used to evaluate changes in the trends of age-standardized rates. A p value less than 0.05 was regarded as significant. Results: Over the 25-year period, there were 42,615 deaths related to aortic aneurysm and dissection, of which 36,088 (84.7%) were identified as the underlying cause and 6,527 (15.3%) as an associated cause of death. Dissection and ruptured aneurysms were considered the underlying cause of death in 93% of the deaths. For the entire period, a significant increasing trend in age-standardized death rates was observed in men and women, while certain non-significant decreases occurred from 1996/2004 until 2009. Abdominal aortic aneurysms and aortic dissections prevailed among men, and aortic dissections and aortic aneurysms of unspecified site among women. In 1985 and 2009, the death rate ratios of men to women were 2.86 and 2.19, respectively, corresponding to a 23.4% decrease in the ratio. For aortic dissection, ruptured and non-ruptured aneurysms, the overall mean ages at death were 63.2, 68.4 and 71.6 years, respectively. When these conditions were the underlying cause, the main associated causes of death were hemorrhages (43.8%/40.5%/13.9%), hypertensive diseases (49.2%/22.43%/24.5%) and atherosclerosis (14.8%/25.5%/15.3%). When they were associated causes, the principal overall underlying causes of death were diseases of the circulatory (55.7%) and respiratory (13.8%) systems and neoplasms (7.8%). A significant seasonal variation, with the highest frequency in winter, occurred in deaths identified as underlying cause for aortic dissection, ruptured and non-ruptured aneurysms. Conclusions: This study introduces the multiple-causes-of-death methodology to enhance epidemiologic knowledge of aortic aneurysm and dissection in São Paulo, Brazil. The results shed light on the importance of mortality statistics and the need for epidemiologic studies to understand unique trends in our own population.
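
Age-standardized rates of this kind are usually obtained by direct standardization: age-specific rates are weighted by the age distribution of a standard population. The Python sketch below shows the calculation with hypothetical rates and a hypothetical standard population; the standard population actually used in the study is not stated here. It also reproduces the arithmetic behind the 23.4% figure, which equals the relative change in the men-to-women rate ratio from 2.86 to 2.19.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Weighted average of age-specific rates using standard-population weights."""
    total = sum(standard_population)
    return sum(r * n / total for r, n in zip(age_specific_rates, standard_population))


# Hypothetical rates per 100,000 in three age bands and a hypothetical standard population.
rates = [10.0, 50.0, 200.0]
standard = [5000, 3000, 2000]
print(direct_standardized_rate(rates, standard))  # 60 per 100,000

# Relative change in the men-to-women rate ratio between 1985 and 2009.
ratio_1985, ratio_2009 = 2.86, 2.19
relative_decrease = (ratio_1985 - ratio_2009) / ratio_1985
print(f"{relative_decrease:.1%}")  # 23.4%
```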

Abstract:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
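
The Granger idea can be sketched for a single pair of series with one lag. Series x is said to Granger-cause y if adding x(t-1) to an autoregression of y(t) on y(t-1) reduces the residual sum of squares by more than chance would explain, as measured by an F statistic. The Python sketch below implements this bivariate, lag-1 test with ordinary least squares on simulated data. It does not reproduce the study's analysis between and within sets of series, or the way causality intensities are turned into clusters, and all names are hypothetical.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    return sum((target - sum(c * v for c, v in zip(coef, row))) ** 2 for row, target in zip(X, y))


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag of each series and an intercept."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]              # y(t) ~ y(t-1)
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]  # ... + x(t-1)
    rss_r, rss_u = ols_rss(restricted, target), ols_rss(unrestricted, target)
    n_obs, n_params = len(target), 3
    return (rss_r - rss_u) / (rss_u / (n_obs - n_params))  # one restriction in the numerator


# Simulated expression-like series: x drives y with a one-step delay.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, 500)]
print(f"x -> y: F = {granger_f(x, y):.1f}")
print(f"y -> x: F = {granger_f(y, x):.1f}")
```

Here one would expect a large F for x -> y and an F near 1 for y -> x. In practice the statistic is compared with an F(1, n - 3) distribution.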