19 results for Bayesian model selection

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 100.00%

Abstract:

Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based online tool or as R language scripts at the supplemental website.
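
Illustrative note (not part of the record): the abstract contrasts pooling libraries into a single binomial "pseudo-library" with the Beta-Binomial model. The Python sketch below compares the two log-likelihoods for one tag. The library counts, the Beta parameters `a` and `b`, and all function names are assumptions made for illustration; this is not the authors' mixture model or their R scripts.

```python
from math import lgamma, log, exp

def log_binom_coeff(n, k):
    # log of "n choose k" via log-gamma, stable for large library sizes
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binomial_logpmf(k, n, p):
    return log_binom_coeff(n, k) + k * log(p) + (n - k) * log(1 - p)

def beta_binomial_logpmf(k, n, a, b):
    # Binomial whose success probability is drawn from Beta(a, b)
    return log_binom_coeff(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)

def pooled_loglik(libs):
    # "Pseudo-library": one shared proportion, within-class variability ignored
    p = sum(k for k, _ in libs) / sum(n for _, n in libs)
    return sum(binomial_logpmf(k, n, p) for k, n in libs)

def beta_binomial_loglik(libs, a, b):
    # Each library has its own proportion, drawn from a common Beta(a, b)
    return sum(beta_binomial_logpmf(k, n, a, b) for k, n in libs)

# Hypothetical libraries of one class: (count of the tag, library size)
libraries = [(12, 50000), (3, 48000), (25, 52000)]
print(pooled_loglik(libraries))
print(beta_binomial_loglik(libraries, a=2.0, b=7498.0))
```

The Beta parameters here were picked by hand so that the Beta mean roughly matches the pooled proportion; in practice they would be estimated or given a prior.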

Relevance: 100.00%

Abstract:

The purpose of this paper is to develop a Bayesian analysis for right-censored survival data when immune or cured individuals may be present in the population from which the data are taken. In our approach, the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution, which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Model selection is also discussed, and the approach is illustrated with a real data set.
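
Illustrative note (not part of the record): a minimal Python sketch of the Conway-Maxwell-Poisson probability mass function, P(N = n) proportional to lam^n / (n!)^nu, which reduces to the Poisson distribution when nu = 1. Under the usual competing-causes reading, the cured fraction is P(N = 0). The truncation length `max_terms` and the function names are assumptions for illustration, and this is not the authors' MCMC procedure.

```python
from math import lgamma, log, exp

def com_poisson_log_weights(lam, nu, max_terms=200):
    # Unnormalized log-probabilities: n*log(lam) - nu*log(n!), n = 0..max_terms-1
    return [n * log(lam) - nu * lgamma(n + 1) for n in range(max_terms)]

def com_poisson_pmf(n, lam, nu, max_terms=200):
    # Normalize with a truncated series; subtract the max for numerical stability
    logs = com_poisson_log_weights(lam, nu, max_terms)
    m = max(logs)
    z = sum(exp(v - m) for v in logs)
    return exp(logs[n] - m) / z

def cure_fraction(lam, nu, max_terms=200):
    # Probability of zero competing causes, i.e. 1 / Z(lam, nu)
    return com_poisson_pmf(0, lam, nu, max_terms)

print(cure_fraction(2.0, 1.0))   # Poisson case (nu = 1)
print(cure_fraction(2.0, 0.5))   # overdispersed relative to Poisson
print(cure_fraction(2.0, 2.0))   # underdispersed relative to Poisson
```

The truncated normalizing sum is adequate for moderate `lam` and `nu` well above zero; values of `nu` near zero with `lam` close to or above 1 would need more care.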

Relevance: 100.00%

Abstract:

In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.

Relevance: 100.00%

Abstract:

In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of lifetimes, the generalized-gamma distribution is assumed and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov Chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some model selection criteria are discussed. The methodology is illustrated on simulated and real lifetime data sets.

Relevance: 90.00%

Abstract:

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also conduct a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
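
Illustrative note (not part of the record): the abstract mentions the standard BIC and its penalty constant. The Python sketch below applies a penalized log-likelihood to a much simpler setting than the paper's: choosing the order of a fixed-order Markov chain. It is neither a variable length Markov chain (context-tree) estimator nor the smallest maximizer criterion; the function names and the default `penalty` constant are assumptions for illustration.

```python
from collections import Counter
from math import log

def markov_loglik(seq, order, start):
    # Maximized log-likelihood of a fixed-order chain, scoring symbols from
    # index `start` onward so that all candidate orders use the same data
    ctx_counts = Counter()
    trans_counts = Counter()
    for i in range(start, len(seq)):
        ctx = tuple(seq[i - order:i])
        trans_counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    return sum(c * log(c / ctx_counts[ctx])
               for (ctx, _), c in trans_counts.items())

def bic_score(seq, order, alphabet_size, start, penalty=0.5):
    # Log-likelihood minus penalty * (free parameters) * log(sample size)
    n_params = (alphabet_size ** order) * (alphabet_size - 1)
    return markov_loglik(seq, order, start) - penalty * n_params * log(len(seq))

def select_order(seq, max_order, alphabet_size, penalty=0.5):
    # Pick the order with the largest penalized log-likelihood
    return max(range(max_order + 1),
               key=lambda k: bic_score(seq, k, alphabet_size, max_order, penalty))

# Hypothetical binary encoding of a text; a strictly alternating pattern
encoded = "01" * 200
print(select_order(encoded, max_order=3, alphabet_size=2))
```

The selected order depends on `penalty`, which is the tuning problem the abstract says the smallest maximizer criterion is designed to avoid.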

Relevance: 90.00%

Abstract:

Introduction: The purpose of this ecological study was to evaluate the urban spatial and temporal distribution of tuberculosis (TB) in Ribeirão Preto, State of São Paulo, southeast Brazil, between 2006 and 2009 and to evaluate its relationship with factors of social vulnerability such as income and education level. Methods: We evaluated data from TBWeb, an electronic notification system for TB cases. Measures of social vulnerability were obtained from the SEADE Foundation, and information about the number of inhabitants, education and income of the households was obtained from the Brazilian Institute of Geography and Statistics. Statistical analyses were conducted using a Bayesian regression model assuming a Poisson distribution for the observed new cases of TB in each area. A conditional autoregressive structure was used for the spatial covariance structure. Results: The Bayesian model confirmed the spatial heterogeneity of TB distribution in Ribeirão Preto, identifying areas with elevated risk and the effects of social vulnerability on the disease. We demonstrated that the rate of TB was correlated with the measures of income, education and social vulnerability. However, we observed areas with low vulnerability and high education and income, but with high estimated TB rates. Conclusions: The study identified areas with different risks for TB, given that the public health system deals with the characteristics of each region individually and prioritizes those that present a higher propensity to risk of TB. Complex relationships may exist between TB incidence and a wide range of environmental and intrinsic factors, which need to be studied in future research.

Relevance: 90.00%

Abstract:

Background: The search for enriched (aka over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for a system-level analysis. This procedure tries to summarize the information focussing on classification designs such as Gene Ontology, KEGG pathways, and so on, instead of focussing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter approach has been used to deal with the ontology term enrichment problem. Results: BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existing pipelines; Windows, to be controlled interactively; and as a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. Conclusion: The Bayesian model accounts for the fact that possibly not all the genes from a given category are observable in microarray data, due to low intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of working only with the common significance analysis.

Relevance: 90.00%

Abstract:

INTRODUCTION: The purpose of this ecological study was to evaluate the urban spatial and temporal distribution of tuberculosis (TB) in Ribeirão Preto, State of São Paulo, southeast Brazil, between 2006 and 2009 and to evaluate its relationship with factors of social vulnerability such as income and education level. METHODS: We evaluated data from TBWeb, an electronic notification system for TB cases. Measures of social vulnerability were obtained from the SEADE Foundation, and information about the number of inhabitants, education and income of the households was obtained from the Brazilian Institute of Geography and Statistics. Statistical analyses were conducted using a Bayesian regression model assuming a Poisson distribution for the observed new cases of TB in each area. A conditional autoregressive structure was used for the spatial covariance structure. RESULTS: The Bayesian model confirmed the spatial heterogeneity of TB distribution in Ribeirão Preto, identifying areas with elevated risk and the effects of social vulnerability on the disease. We demonstrated that the rate of TB was correlated with the measures of income, education and social vulnerability. However, we observed areas with low vulnerability and high education and income, but with high estimated TB rates. CONCLUSIONS: The study identified areas with different risks for TB, given that the public health system deals with the characteristics of each region individually and prioritizes those that present a higher propensity to risk of TB. Complex relationships may exist between TB incidence and a wide range of environmental and intrinsic factors, which need to be studied in future research.

Relevance: 80.00%

Abstract:

The allometric growth of two groups of Nassarius vibex on beds of the bivalve Mytella charruana on the northern coast of the State of São Paulo was evaluated between September 2006 and February 2007 in the bed on Camaroeiro Beach, and from March 2007 to June 2007 at Cidade Beach. The shells from Camaroeiro were longer and wider and had a smaller shell aperture than those from Cidade; a principal components analysis also confirmed different morphometric patterns between the areas. The allometric growth of the two groups showed great variation in the development of individuals. The increase of shell width and height in relation to shell length did not differ between the two areas. Shell aperture showed a contrasting growth pattern, with individuals from Camaroeiro having smaller apertures. The methodology based on Kullback-Leibler information theory and multi-model inference showed, for N. vibex, that the classic linear allometric growth was not the most suitable explanation for the observed morphometric relationships. The patterns of relative growth observed in the two groups of N. vibex may be a consequence of different growth and variation rates, which modify the development of the individuals. Other factors such as food resource availability and environmental parameters, which might also differ between the two areas, should also be considered.

Relevance: 80.00%

Abstract:

The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e.g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a "fingerprint". Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the "uncertainty" of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them to (1) protein-protein interaction networks of different species and (2) networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed.
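
Illustrative note (not part of the record): the abstract names the entropy of a graph spectrum and the Kullback-Leibler and Jensen-Shannon divergences between spectra. The Python sketch below computes discrete versions of these quantities on binned spectra. The histograms are hypothetical, computing eigenvalues of an adjacency matrix is left out, and the discretization is an assumption; the paper works with spectral densities, so this is only a rough analogue.

```python
from math import log

def normalize(hist):
    # Turn non-negative bin counts into a probability vector
    total = sum(hist)
    return [h / total for h in hist]

def entropy(p):
    # Shannon entropy in nats; empty bins contribute zero
    return -sum(x * log(x) for x in p if x > 0)

def kl_divergence(p, q):
    # KL(p || q); infinite if q has an empty bin where p has mass
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return float("inf")
            total += x * log(x / y)
    return total

def js_divergence(p, q):
    # Symmetric and bounded by log(2); compares each input with their midpoint
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical eigenvalue histograms of two networks on a shared set of bins
spectrum_a = normalize([2, 10, 30, 10, 2])
spectrum_b = normalize([8, 12, 14, 12, 8])
print(entropy(spectrum_a), entropy(spectrum_b))
print(kl_divergence(spectrum_a, spectrum_b))
print(js_divergence(spectrum_a, spectrum_b))
```

Both histograms must use the same bin edges for the comparison to be meaningful.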

Relevance: 80.00%

Abstract:

Brood desertion is a life history strategy that allows parents to minimize costs related to parental care and increase their future fecundity. The harvestman Neosadocus maximus is an interesting model organism to study costs and benefits of temporary brood desertion because females abandon their clutches periodically and keep adding eggs to their clutches for some weeks. In this study, we tested if temporary brood desertion (a) imposes a cost to caring females by increasing the risk of egg predation and (b) offers a benefit to caring females by increasing fecundity as a result of increased foraging opportunities. With intensive field observations followed by a model selection approach, we showed that the proportion of consumed eggs was very low during the day and it was not influenced by the frequency of brood desertion. The proportion of consumed eggs was higher at night and it was negatively related to the frequency of brood desertion. However, frequent brood desertion did not result in higher fecundity, measured both as the number of eggs added to the current clutch and the probability of laying a second clutch over the course of the reproductive season. Considering that harvestmen are sensitive to dehydration, brood desertion during the day may attenuate the physiological stress of remaining exposed on the vegetation. Moreover, since brood desertion is higher during the day, when egg predation pressure is lower, caring females could be adjusting their maternal effort to the temporal variation in predation risk, which is regarded as the main cost of brood desertion in ectotherms.

Relevance: 80.00%

Abstract:

We examined the effects of soil mesofauna and the litter decomposition environment (above and belowground) on leaf decomposition rates in three forest types in southeastern Brazil. To estimate decomposition experimentally, we used litterbags with a standard substrate in a full-factorial experimental design. We used model selection to compare three decomposition models and also to infer the importance of forest type, decomposition environment, mesofauna, and their interactions on the decomposition process. Rather than the frequently used simple and double-exponential models, the best model to describe our dataset was the exponential deceleration model, which assumed a single organic compartment with an exponential decrease of the decomposition rate. Decomposition was higher in the wet than in the seasonal forest, and the differences between forest types were stronger aboveground. Regarding litter decomposition environment, decomposition was predominantly higher below than aboveground, but the magnitude of this effect was higher in the seasonal than in the wet forests. Mesofauna exclusion treatments had slower decomposition, except aboveground in the semi-deciduous forest, where the presence of mesofauna did not affect decomposition. Furthermore, the effect of mesofauna was stronger in the wet forests and belowground. Overall, our results suggest that, at a regional scale, both decomposer activity and the positive effect of soil mesofauna on decomposition are constrained by abiotic factors, such as moisture conditions.

Relevance: 80.00%

Abstract:

This paper proposes a general class of regression models for continuous proportions when the data contain zeros or ones. The proposed class of models assumes that the response variable has a mixed continuous-discrete distribution with probability mass at zero or one. The beta distribution is used to describe the continuous component of the model, since its density has a wide range of different shapes depending on the values of the two parameters that index the distribution. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter. The parameters of the mixture distribution are modeled as functions of regression parameters. We provide inference, diagnostic, and model selection tools for this class of models. A practical application that employs real data is presented. (C) 2011 Elsevier B.V. All rights reserved.
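
Illustrative note (not part of the record): a minimal Python sketch of the log-density of a beta distribution inflated at zero or at one, using the mean and precision parameterization the abstract describes, i.e. Beta(mu * phi, (1 - mu) * phi). The function names, the argument `c` for the inflation point, and the example values are assumptions for illustration; the regression structure on the parameters is not shown.

```python
from math import lgamma, log

def beta_logpdf(y, mu, phi):
    # Beta density with mean mu and precision phi: shape parameters
    # a = mu * phi and b = (1 - mu) * phi, so that a + b = phi
    a, b = mu * phi, (1 - mu) * phi
    return (lgamma(phi) - lgamma(a) - lgamma(b)
            + (a - 1) * log(y) + (b - 1) * log(1 - y))

def inflated_beta_logdensity(y, alpha, mu, phi, c=0):
    # Mixed distribution: point mass alpha at c (0 or 1),
    # and (1 - alpha) times a beta density on the open interval (0, 1)
    if y == c:
        return log(alpha)
    if 0 < y < 1:
        return log(1 - alpha) + beta_logpdf(y, mu, phi)
    return float("-inf")

# Hypothetical proportions containing exact zeros
observations = [0, 0.12, 0.4, 0, 0.85]
loglik = sum(inflated_beta_logdensity(y, alpha=0.3, mu=0.4, phi=5.0)
             for y in observations)
print(loglik)
```

In a regression setting, `alpha`, `mu`, and possibly `phi` would each be linked to covariates rather than held fixed as here.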

Relevance: 80.00%

Abstract:

Questions: Does the spatial association between isolated adult trees and understorey plants change along a gradient of sand dunes? Does this association depend on the life form of the understorey plant? Location: Coastal sand dunes, southeast Brazil. Methods: We recorded the occurrence of understorey plant species in 100 paired 0.25 m² plots under adult trees and in adjacent treeless sites along an environmental gradient from beach to inland. Occurrence probabilities were modelled as a function of the fixed variables of the presence of a neighbour, distance from the seashore and life form, and a random variable, the block (i.e. the pair of plots). Generalized linear mixed models (GLMM) were fitted in a backward step-wise procedure using Akaike's information criterion (AIC) for model selection. Results: The occurrence of understorey plants was affected by the presence of an adult tree neighbour, but the effect varied with the life form of the understorey species. A positive spatial association was found between isolated adult neighbours and young trees, whereas a negative association was found for shrubs. Moreover, a neutral association was found for lianas, whereas for herbs the effect of the presence of an adult neighbour ranged from neutral to negative, depending on the subgroup considered. The strength of the negative association with forbs increased with distance from the seashore. However, for the other life forms, the associational pattern with adult trees did not change along the gradient. Conclusions: For most of the understorey life forms there is no evidence that the spatial association between isolated adult trees and understorey plants changes with the distance from the seashore, as predicted by the stress gradient hypothesis, a common hypothesis in the literature about facilitation in plant communities. Furthermore, the positive spatial association between isolated adult trees and young trees identified along the entire gradient studied indicates a positive feedback that explains the transition from open vegetation to forest in subtropical coastal dune environments.
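
Illustrative note (not part of the record): the abstract uses Akaike's information criterion (AIC) to choose among candidate models. The Python sketch below computes AIC = 2k - 2 log L and ranks candidates by their difference from the best one. The candidate names, log-likelihoods, and parameter counts are invented for illustration; fitting the GLMMs and the backward step-wise search are not shown.

```python
def aic(loglik, n_params):
    # Akaike's information criterion: lower values indicate a better trade-off
    return 2 * n_params - 2 * loglik

def rank_by_aic(models):
    # models maps a name to (maximized log-likelihood, number of parameters);
    # returns (name, AIC, difference from best) sorted from best to worst
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda t: t[1])

# Hypothetical candidate models with made-up fit summaries
candidates = {
    "neighbour + distance + life_form": (-120.4, 5),
    "neighbour + life_form": (-121.0, 4),
    "neighbour": (-135.2, 2),
}
ranking = rank_by_aic(candidates)
for name, score, delta in ranking:
    print(name, round(score, 1), round(delta, 1))
```

A backward step-wise procedure would repeat this comparison after dropping one term at a time, stopping when no removal lowers the AIC.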

Relevance: 50.00%

Abstract:

A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects to estimate the breeding values, when a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete data set for this herd was composed of 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with the MTDFREML software to estimate fixed and random effects solutions using the complete data. The estimated additive effects were taken as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes in animals' rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with the TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: the first was a uniform distribution (U) and the second assumed that marker effects were normally distributed (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating a greater similarity between the animal ranking from these models and the ranking based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation retained by marker and polygenic effects assumed together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
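
Illustrative note (not part of the record): the abstract compares global fit through the DIC. The Python sketch below shows how a deviance information criterion is commonly computed from posterior draws, DIC = D(theta_bar) + 2 * pD with pD = mean deviance minus deviance at the posterior mean, for a toy normal model with known variance. The toy model, the data, and the "posterior draws" are assumptions for illustration and have nothing to do with the TM software or the breeding models themselves.

```python
from math import log, pi

def deviance(data, mu, sigma=1.0):
    # Minus twice the normal log-likelihood with mean mu and known sigma
    return sum(log(2 * pi * sigma ** 2) + ((x - mu) / sigma) ** 2 for x in data)

def dic(data, mu_samples, sigma=1.0):
    # Posterior mean deviance, deviance at the posterior mean, and their gap pD
    mean_dev = sum(deviance(data, m, sigma) for m in mu_samples) / len(mu_samples)
    mu_bar = sum(mu_samples) / len(mu_samples)
    dev_at_mean = deviance(data, mu_bar, sigma)
    p_d = mean_dev - dev_at_mean          # effective number of parameters
    return dev_at_mean + 2 * p_d, p_d

# Hypothetical data and hypothetical posterior draws of the mean
data = [0.0, 1.0, 2.0]
mu_samples = [0.5, 1.5]
dic_value, p_d = dic(data, mu_samples)
print(dic_value, p_d)
```

Lower DIC indicates a better balance of fit and complexity, which is the sense in which the abstract reports a low DIC for model 3_N.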