21 results for "Multivariate statistical methods"
Abstract:
Despite considerable research on 'Tahiti' lime [Citrus latifolia (Yu Tanaka) Tanaka] in several countries, few long-term studies have focused on rootstock effects on fruit production and quality under non-irrigated conditions. As for many other fruit crops, rootstock studies for 'Tahiti' lime often evaluate several horticultural responses simultaneously without applying multivariate statistical approaches, which may provide more comprehensive information. A trial was therefore established to evaluate the horticultural performance of non-irrigated 'Tahiti' lime trees budded onto the following 12 rootstocks: 'HRS 801' and 'HRS 827' hybrids; 'Rubidoux', 'FCAV' and 'Flying Dragon' trifoliates; 'Sun Chu Sha Kat' and 'Sunki' mandarins; 'Cravo Limeira' and 'Cravo FCAV' 'Rangpur' limes; 'Carrizo' citrange, 'Swingle' citrumelo, and 'Orlando' tangelo. The trial was planted in 2001 at 8 m x 5 m spacing with no supplementary irrigation. Yield, fruit quality for different consumer markets, canopy volume and tree drought tolerance were measured. A multivariate cluster analysis identified both 'Rangpur' lime rootstocks as inducing larger cumulative yield and a higher percentage of fruit for the domestic market, and as conferring the highest drought tolerance on the trees. Despite their high susceptibility to drought stress under non-irrigated conditions, the 'Flying Dragon' and 'FCAV' trifoliate rootstocks performed outstandingly for 'Tahiti' lime, inducing higher yield efficiency, early bearing and a larger percentage of high-quality fruit for foreign markets, and producing smaller trees more suitable for high-density plantings. (C) 2012 Elsevier B.V. All rights reserved.
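The abstract does not state which distance measure or linkage rule the cluster analysis used. As a rough illustration only, the sketch below assumes z-score standardization of each response variable, Euclidean distance and average (UPGMA) linkage; the function names and the four-rootstock data set are hypothetical placeholders, not values from the trial.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so variables on different scales carry equal weight."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average (UPGMA) linkage on Euclidean distance.

    Returns a list of clusters, each a list of row indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        return mean(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                           # j > i, so index i is unaffected
    return clusters

# Hypothetical rootstocks: [cumulative yield, canopy volume, drought-tolerance score]
EXAMPLE_ROWS = [[300, 60, 4], [290, 58, 4], [180, 15, 1], [175, 14, 2]]
print(average_linkage(standardize(EXAMPLE_ROWS), n_clusters=2))  # [[0, 1], [2, 3]]
```

Standardizing first matters because yield and canopy volume are on very different numeric scales; without it, the largest-valued variable would dominate the distances.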
Abstract:
Polychaete assemblage structure was used to investigate taxonomic sufficiency in a heavily polluted tropical bay. Species abundances were aggregated into matrices at progressively higher taxonomic levels (genus, family, order) and analyzed using univariate and multivariate techniques. Polychaete distribution in Guanabara Bay (GB) followed a pollution gradient, probably driven by organic enrichment and the resulting hypoxia and altered redox conditions, coupled with prevailing circulation patterns. Across the sectors of GB, species richness and occurrence increased along a gradient, from azoic and impoverished stations in the inner sector to a community in the outer sector that was well structured in terms of species composition and abundance. Multivariate statistical analysis gave similar results when species were aggregated into genera and families, whereas greater differences appeared at the coarser level of order. The literature on taxonomic sufficiency has shown that faunal patterns at different taxonomic levels tend to become similar with increasing pollution. In GB, an analysis carried out solely at family level is perfectly adequate to describe the environmental gradient and can be considered a useful tool for rapid environmental assessment. (C) 2011 Elsevier Ltd. All rights reserved.
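Aggregating species counts to a coarser taxonomic level is a simple group-and-sum step. The sketch below shows it, followed by a Bray-Curtis dissimilarity between two stations; the abstract does not name its similarity measure, so Bray-Curtis is an assumption (a common choice for benthic abundance data), and the taxa, stations and counts are hypothetical placeholders, not data from Guanabara Bay.

```python
from collections import defaultdict

def aggregate(abundance, taxonomy, level):
    """Collapse a station -> {species: count} table to a coarser taxonomic level.

    `taxonomy` maps each species to {"genus": ..., "family": ..., "order": ...}.
    """
    result = {}
    for station, counts in abundance.items():
        totals = defaultdict(int)
        for species, n in counts.items():
            totals[taxonomy[species][level]] += n  # sum counts within each higher taxon
        result[station] = dict(totals)
    return result

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} samples (0 = identical)."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0

# Hypothetical placeholder taxa and counts
TAXONOMY = {
    "sp1": {"genus": "G1", "family": "F1", "order": "O1"},
    "sp2": {"genus": "G2", "family": "F1", "order": "O1"},
    "sp3": {"genus": "G3", "family": "F2", "order": "O1"},
}
ABUNDANCE = {"inner": {"sp1": 5, "sp2": 3}, "outer": {"sp1": 1, "sp3": 4}}

for level in ("genus", "family", "order"):
    agg = aggregate(ABUNDANCE, TAXONOMY, level)
    print(level, agg, round(bray_curtis(agg["inner"], agg["outer"]), 3))
```

In this toy example the dissimilarity stays the same from genus to family but drops at order, because all three species fall into one order and the differences between stations are merged away.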
Abstract:
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models, in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated with measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. As a case study, the methodology is also illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far reveal no statistically significant difference in predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, the distributions of the estimated default probabilities from these two modeling techniques differ strongly, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
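Most of the measures listed above have standard definitions in terms of a 2x2 confusion matrix. The sketch below computes several of them, taking "default" as the positive class. The abstract does not say which correlation coefficient was used, so the Matthews (phi) coefficient is an assumption here. Mutual information is computed as the relative entropy between the joint distribution of actual and predicted classes and the product of its marginals. The function names and counts are hypothetical.

```python
import math

def mutual_information(tp, fn, fp, tn):
    """Mutual information (bits) between actual and predicted class.

    Equals the relative entropy (KL divergence) of the joint distribution
    from the product of its marginals.
    """
    n = tp + fn + fp + tn
    joint = [[tn / n, fp / n], [fn / n, tp / n]]  # joint[actual][predicted]; 1 = default
    p_actual = [sum(row) for row in joint]
    p_pred = [joint[0][j] + joint[1][j] for j in range(2)]
    mi = 0.0
    for a in range(2):
        for b in range(2):
            if joint[a][b] > 0:  # 0 * log 0 is taken as 0
                mi += joint[a][b] * math.log2(joint[a][b] / (p_actual[a] * p_pred[b]))
    return mi

def classification_measures(tp, fn, fp, tn):
    """Predictive-quality measures from confusion-matrix counts (positive = default)."""
    n = tp + fn + fp + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # defaulters correctly flagged
        "specificity": tn / (tn + fp),   # non-defaulters correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / n,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,  # Matthews correlation
        "mutual_information_bits": mutual_information(tp, fn, fp, tn),
    }

# Hypothetical counts for a scored hold-out sample
print(classification_measures(tp=40, fn=10, fp=20, tn=30))
```

Because mutual information is a relative entropy, it is zero exactly when predictions are statistically independent of the actual class, and positive otherwise.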
Abstract:
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic model of radiation carcinogenesis: latent time distributions and their properties. Math Biosci 1993; 113: 51-75] and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP, Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it has a realistic biological interpretation of how the event of interest occurs, as it includes either a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair the altered cells that lead to cancer induction. In other words, only the damaged portion of the original number of altered cells, that is, the cells neither eliminated by the treatment nor repaired by the individual's repair system, is recorded. Markov chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Model selection is also discussed, and the model is illustrated with a cutaneous melanoma data set previously analysed by Rodrigues et al.
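The proposed model uses a compound weighted Poisson distribution with MCMC inference, and neither is reproduced here. As a hedged illustration of the destructive mechanism alone, the sketch assumes the simplest special case. There are M ~ Poisson(theta) initial altered cells, each cell independently escapes destruction or repair with probability p, and the damaged count is D | M ~ Binomial(M, p). Thinning then gives D ~ Poisson(theta * p), so the cure fraction is exp(-theta * p). The names theta, p and all functions are our own notation. A Monte Carlo simulation checks the closed form.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_cure_fraction(theta, p, n, seed=0):
    """Monte Carlo estimate of P(no damaged cells remain) in a destructive Poisson model.

    M ~ Poisson(theta) initial altered cells; each survives treatment/repair
    independently with probability p, so D | M ~ Binomial(M, p).
    """
    rng = random.Random(seed)
    cured = 0
    for _ in range(n):
        m = poisson(rng, theta)
        d = sum(rng.random() < p for _ in range(m))  # damaged cells left after destruction
        cured += (d == 0)
    return cured / n

def cure_fraction(theta, p):
    """Closed form: thinning a Poisson(theta) count by p gives Poisson(theta * p)."""
    return math.exp(-theta * p)

def population_survival(theta, p, f_t):
    """S_pop(t) = exp(-theta * p * F(t)), with F the promotion-time CDF evaluated at t."""
    return math.exp(-theta * p * f_t)

print(cure_fraction(2.0, 0.5), simulate_cure_fraction(2.0, 0.5, 50_000, seed=1))
```

As F(t) grows to 1, the population survival function levels off at the cure fraction rather than falling to zero, which is what distinguishes a cure rate model from a standard survival model.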
Abstract:
Background: Several mathematical and statistical methods have been proposed in recent years to analyze microarray data. Most of them involve complicated formulas and software implementations that require advanced programming skills, so researchers from other areas may have difficulty applying them in their own research. Here we present a user-friendly toolbox that allows biomedical researchers with limited programming skills to carry out large-scale gene expression analysis.
Results: We introduce GEDI (Gene Expression Data Interpreter), an extensible, open-source and freely available tool that we believe will be useful to a wide range of laboratories and to researchers with no background in mathematics or computer science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al.
Conclusion: GEDI is an integrated, user-friendly viewer that combines the state-of-the-art SVR, DVAR and SVAR algorithms previously developed by us. It makes SVR, DVAR and SVAR usable beyond the mathematical formulas given in the corresponding publications, and its visualizations help users better understand the results. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, making these algorithms accessible to the broad community of researchers in molecular biology.
Abstract:
Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. To define and understand such molecular networks, some statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. First, information flow needs to be inferred, in addition to the correlation between genes. Second, we usually try to identify large networks from a large number of genes (parameters) measured in a smaller number of microarray experiments (samples). In this situation, which is common in bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most models rely on dimension reduction by clustering, so the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems.
Results: We applied the SVAR model to estimate gene regulatory networks from gene expression profiles obtained in time-series microarray experiments. Through extensive simulations applying SVAR to artificial regulatory networks, we show that SVAR can infer true positive edges even when the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage over other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method can model gene regulatory networks in the frequent situation in which the number of samples is lower than the number of genes, making it possible to infer partial Granger causalities naturally without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible with other gene regulatory network models.
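The abstract does not describe how the sparse VAR is estimated. As a hedged sketch only, the code assumes a first-order VAR fitted with an L1 (lasso) penalty, one regression per target gene, solved by coordinate descent. It also assumes centered data, so the model has no intercept. A nonzero coefficient of gene k's lagged value in gene g's equation is read as a Granger-causal edge k -> g. The false-discovery-rate test described in the Conclusion is not implemented, and all function names and the three-gene toy network are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Lasso proximal step: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # resid = y - X b, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def svar_edges(series, lam, tol=1e-8):
    """Fit a lasso-penalized VAR(1), one regression per target gene.

    series[g][t] is the expression of gene g at time t. A nonzero coefficient
    A[g][k] is read as an edge k -> g (gene k Granger-causes gene g).
    """
    genes, T = len(series), len(series[0])
    X = [[series[k][t - 1] for k in range(genes)] for t in range(1, T)]  # lagged design
    A, edges = [], set()
    for g in range(genes):
        y = [series[g][t] for t in range(1, T)]
        coefs = lasso_cd(X, y, lam)
        A.append(coefs)
        edges.update((k, g) for k, c in enumerate(coefs) if abs(c) > tol)
    return A, edges

def demo_series(T, seed=0):
    """Hypothetical 3-gene toy network in which only gene 0 drives gene 1."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1)] for _ in range(3)]
    for t in range(1, T):
        x[0].append(rng.gauss(0, 1))
        x[1].append(0.9 * x[0][t - 1] + 0.1 * rng.gauss(0, 1))
        x[2].append(rng.gauss(0, 1))
    return x

A, edges = svar_edges(demo_series(200, seed=1), lam=0.2)
print(sorted(edges))
```

The L1 penalty is what makes the fit possible when genes outnumber time points: it sets most coefficients exactly to zero, so each equation keeps only a few candidate regulators. The penalty weight `lam` here is an arbitrary illustrative value, not one taken from the paper.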