15 results for selection model

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 70.00%

Abstract:

Compartmentalization of self-replicating molecules (templates) in protocells is a necessary step towards the evolution of modern cells. However, coexistence between distinct template types inside a protocell can be achieved only if there is a selective pressure favoring protocells with a mixed template composition. Here we study analytically a group selection model for the coexistence between two template types using the diffusion approximation of population genetics. The model combines competition at the template and protocell levels as well as genetic drift inside protocells. At the steady state, we find a continuous phase transition separating the coexistence and segregation regimes, with the order parameter vanishing linearly with the distance to the critical point. In addition, we derive explicit analytical expressions for the critical steady-state probability density of protocell compositions.
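A minimal Monte Carlo sketch of the setting described above, assuming Wright-Fisher-style resampling inside protocells and an illustrative group-level fitness that peaks at mixed composition; it is not the paper's analytical diffusion approximation, and all parameter names and values are made up.

```python
# Toy Monte Carlo sketch of group selection for two template types.
# This is NOT the paper's analytical diffusion approximation; the fitness
# functions and parameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_templates, generations = 500, 50, 2000
s = 0.02          # replication advantage of template type 1 inside a protocell
sigma = 0.05      # assumed group-level advantage of mixed protocells

# x[i] = frequency of type-1 templates in protocell i
x = rng.uniform(0.0, 1.0, n_cells)

for _ in range(generations):
    # Template-level competition plus genetic drift (Wright-Fisher resampling)
    p = x * (1 + s) / (x * (1 + s) + (1 - x))
    x = rng.binomial(n_templates, p) / n_templates

    # Group-level selection: protocells with mixed composition leave more offspring
    w = 1 + sigma * 4 * x * (1 - x)          # assumed fitness, maximal at x = 1/2
    parents = rng.choice(n_cells, size=n_cells, p=w / w.sum())
    x = x[parents]

coexisting = np.mean((x > 0) & (x < 1))
print(f"fraction of protocells still carrying both templates: {coexisting:.2f}")
```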

Relevance: 60.00%

Abstract:

Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information theoretical measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results revealed that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
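A small sketch of the evaluation side of this comparison: fitting the naive logistic regression on simulated credit data and computing the sensitivity, specificity and accuracy measures cited above. The state-dependent sample-selection model of Cramer (2004) is not implemented here, and the covariates and coefficients are illustrative.

```python
# Minimal sketch: fit a "naive" logistic regression on simulated credit data and
# report some of the predictive measures mentioned in the abstract.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))                      # hypothetical client covariates
lin = -2.0 + X @ np.array([0.8, -0.5, 0.3])      # assumed true coefficients
default = rng.binomial(1, 1 / (1 + np.exp(-lin)))

model = sm.Logit(default, sm.add_constant(X)).fit(disp=False)
p_hat = model.predict(sm.add_constant(X))
pred = (p_hat > 0.5).astype(int)

tp = np.sum((pred == 1) & (default == 1))
tn = np.sum((pred == 0) & (default == 0))
fp = np.sum((pred == 1) & (default == 0))
fn = np.sum((pred == 0) & (default == 1))

print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("accuracy   :", (tp + tn) / n)
print("mean estimated default probability:", p_hat.mean())
```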

Relevance: 60.00%

Abstract:

In this study, we analyzed the relationship between household expenditure on computer purchases and the demographic and socioeconomic characteristics of Brazilian households. Microdata from two Household Budget Surveys (Pesquisas de Orçamentos Familiares, POF) conducted by the Brazilian Institute of Geography and Statistics (IBGE) were used: 2002-2003 and 2008-2009. These databases made it possible to use total per capita expenditure as the variable defining the household's purchasing power. An econometric approach suited to this type of analysis was adopted, namely the Heckman selection model, which involves two stages. In the first stage, we analyzed the factors associated with the probability of the expenditure occurring; in the second, we evaluated the factors associated with the amount spent. The main results indicated that the profile of the household head (gender and age), household composition and the schooling of residents are relevant factors both for the decision to spend and for the decision on how much to spend. The reduction in the elasticity relating computer expenditure to household purchasing power (0.56763 in 2002-2003, falling to 0.41546 in 2008-2009) can be explained by the drop in computer prices and by the increase in families' purchasing power.
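For readers unfamiliar with the two-stage procedure, here is a compact two-step Heckman sketch on simulated data (probit participation equation, inverse Mills ratio, corrected OLS for the amount spent). The POF microdata are not reproduced; variable names and coefficients are placeholders.

```python
# Two-step Heckman selection sketch on simulated data (the POF microdata are not
# reproduced here; covariate names and coefficients are illustrative).
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 4000
educ = rng.normal(size=n)                  # schooling of residents (standardized)
income = rng.normal(size=n)                # log per-capita expenditure proxy
u, e = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n).T

# Stage 1: does the household spend on computers at all? (probit)
# (a real application would also include an exclusion restriction here)
select = (0.3 + 0.8 * income + 0.5 * educ + u > 0).astype(int)
Z = sm.add_constant(np.column_stack([income, educ]))
probit = sm.Probit(select, Z).fit(disp=False)
zb = Z @ probit.params
imr = norm.pdf(zb) / norm.cdf(zb)          # inverse Mills ratio

# Stage 2: how much is spent, among households that spend (OLS + IMR correction)
log_spend = 1.0 + 0.6 * income + 0.2 * educ + e
mask = select == 1
X = sm.add_constant(np.column_stack([income[mask], educ[mask], imr[mask]]))
ols = sm.OLS(log_spend[mask], X).fit()
print(ols.params)   # last coefficient is the selection-correction term
```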

Relevance: 40.00%

Abstract:

A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not assume marker effects to estimate breeding values when a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete data set for this herd comprised 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with the MTDFREML software to estimate fixed- and random-effect solutions using this complete data set. The estimated additive effects were taken as the reference breeding values for those animals. The observed phenotype of each trait was adjusted for the fixed- and random-effect solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals in this herd, only 3,160 were genotyped for 106 SNP markers. Three models were compared in terms of changes in animal ranking, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with the TM software, was used to analyze the data for model comparison. Two different priors were adopted for the marker effects in models 2 and 3: the first was a uniform distribution (U) and the second assumed that marker effects were normally distributed (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating greater similarity between these models' animal rankings and the ranking based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for the marker effects in models 2 and 3 can be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is limited. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This can be attributed to the variation captured jointly by the marker and polygenic effects and to the normal prior assumed for the marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
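As a rough stand-in for the marker models compared above, the sketch below fits ridge regression to SNP genotypes (the classical equivalent of assuming i.i.d. normal marker effects) and checks the rank correlation against a reference ranking. The actual analysis used Bayesian MCMC via the TM software with polygenic effects included; everything below is simulated.

```python
# Minimal stand-in for the "markers with normal prior" idea: ridge regression on
# SNP genotypes, compared to a reference ranking by Spearman correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_animals, n_snp = 3160, 106
Z = rng.binomial(2, 0.3, size=(n_animals, n_snp)).astype(float)   # genotypes 0/1/2
true_effects = rng.normal(0, 0.1, n_snp)
adjusted_phenotype = Z @ true_effects + rng.normal(0, 1, n_animals)

lam = 10.0                                    # ridge parameter, roughly var(e)/var(marker)
lhs = Z.T @ Z + lam * np.eye(n_snp)
rhs = Z.T @ (adjusted_phenotype - adjusted_phenotype.mean())
u_hat = np.linalg.solve(lhs, rhs)             # estimated marker effects

molecular_ebv = Z @ u_hat
reference_ebv = Z @ true_effects              # stands in for the reference breeding values
print("rank correlation with reference:",
      spearmanr(molecular_ebv, reference_ebv).correlation)
```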

Relevance: 30.00%

Abstract:

Sugarcane-breeding programs take at least 12 years to develop new commercial cultivars. Molecular markers offer a possibility to study the genetic architecture of quantitative traits in sugarcane, and they may be used in marker-assisted selection to speed up artificial selection. Although the performance of sugarcane progenies in breeding programs is commonly evaluated across a range of locations and harvest years, many of the QTL detection methods ignore two- and three-way interactions between QTL, harvest, and location. In this work, a strategy for QTL detection in multi-harvest-location trial data, based on interval mapping and mixed models, is proposed and applied to map QTL effects on a segregating progeny from a biparental cross of pre-commercial Brazilian cultivars, evaluated at two locations and three consecutive harvest years for cane yield (tonnes per hectare), sugar yield (tonnes per hectare), fiber percent, and sucrose content. In the mixed model, we have included appropriate (co)variance structures for modeling heterogeneity and correlation of genetic effects and non-genetic residual effects. Forty-six QTLs were found: 13 QTLs for cane yield, 14 for sugar yield, 11 for fiber percent, and 8 for sucrose content. In addition, QTL by harvest, QTL by location, and QTL by harvest by location interaction effects were significant for all evaluated traits (30 QTLs showed some interaction, and 16 none). Our results contribute to a better understanding of the genetic architecture of complex traits related to biomass production and sucrose content in sugarcane.
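A deliberately simplified stand-in for the QTL-by-environment idea: a single-marker fixed-effects scan testing marker x harvest x location terms with a likelihood-ratio comparison on simulated data. The paper's method uses interval mapping within a mixed model with structured (co)variances, none of which is reproduced here, and the column names are hypothetical.

```python
# Simplified single-marker scan with marker-by-environment interaction terms,
# using ordinary least squares on simulated data (no pedigree, no (co)variance
# structures; hypothetical column names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "marker": rng.binomial(1, 0.5, n),                       # 0/1 marker genotype
    "harvest": rng.choice(["H1", "H2", "H3"], n),
    "location": rng.choice(["L1", "L2"], n),
})
qtl_effect = np.where(df["harvest"] == "H1", 2.0, 0.5)       # QTL-by-harvest interaction
df["cane_yield"] = 80 + qtl_effect * df["marker"] + rng.normal(0, 5, n)

full = smf.ols("cane_yield ~ marker * harvest * location", data=df).fit()
null = smf.ols("cane_yield ~ harvest * location", data=df).fit()
lr = 2 * (full.llf - null.llf)
print(f"LR statistic for all marker terms: {lr:.1f}")
```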

Relevance: 30.00%

Abstract:

In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of the lifetimes, the generalized gamma distribution is assumed, and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some discussions on model selection criteria are given. The methodology is illustrated on simulated and real lifetime data sets.
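A bare-bones sketch of sampling-based inference for censored generalized-gamma lifetimes with vague priors, using a random-walk Metropolis step. The inverse power law link between the scale and the threshold stress is omitted, and all tuning values are arbitrary.

```python
# Random-walk Metropolis for censored generalized-gamma lifetimes with vague
# normal priors on the log-parameters (regression on stress omitted).
import numpy as np
from scipy.stats import gengamma, norm

rng = np.random.default_rng(5)
t = gengamma.rvs(a=2.0, c=1.5, scale=10.0, size=200, random_state=42)
censor = rng.uniform(5, 40, size=200)
obs, delta = np.minimum(t, censor), (t <= censor).astype(int)

def log_post(theta):
    a, c, scale = np.exp(theta)                        # positivity via log-parameters
    ll = np.sum(delta * gengamma.logpdf(obs, a, c, scale=scale)
                + (1 - delta) * gengamma.logsf(obs, a, c, scale=scale))
    return ll + norm.logpdf(theta, 0, 10).sum()        # proper but vague priors

theta = np.zeros(3)
lp = log_post(theta)
chain = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05, 3)              # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(np.exp(theta))
print("posterior means of (a, c, scale):", np.mean(chain[2500:], axis=0))
```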

Relevance: 30.00%

Abstract:

The purpose of this paper is to develop a Bayesian analysis for the right-censored survival data when immune or cured individuals may be present in the population from which the data is taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Also, some discussions on the model selection and an illustration with a real data set are considered.
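A numeric sketch of how a Conway-Maxwell-Poisson number of competing causes induces a cure fraction and a long-term survival function, with an illustrative exponential latent-time survival; the parameter values and the truncation below are assumptions, not the paper's.

```python
# Population survival under a Conway-Maxwell-Poisson (CMP) number of competing
# causes: S_pop(t) = E[S(t)^N], with the CMP series truncated for normalization.
import numpy as np
from scipy.special import gammaln

def cmp_pmf(lam, nu, m_max=200):
    m = np.arange(m_max + 1)
    logw = m * np.log(lam) - nu * gammaln(m + 1)     # unnormalized log-weights
    w = np.exp(logw - logw.max())
    return w / w.sum()                               # P(N = m), m = 0..m_max

def population_survival(t, lam, nu, rate=0.5):
    p = cmp_pmf(lam, nu)
    s = np.exp(-rate * t)                            # illustrative latent-time survival
    m = np.arange(p.size)
    return np.sum(p * s ** m)

lam, nu = 1.5, 0.8                                   # nu < 1: overdispersed relative to Poisson
print("cure fraction P(N = 0):", cmp_pmf(lam, nu)[0])
print("S_pop(3):", population_survival(3.0, lam, nu))
```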

Relevance: 30.00%

Abstract:

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also present a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
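The penalty-constant issue addressed by the smallest maximizer criterion can be illustrated with fixed-order Markov chains: the order selected by a BIC-type score, log-likelihood minus c times df times log(n), changes with the constant c. The sketch below uses a fixed-order binary chain rather than a context tree, purely to keep the code short.

```python
# How the order selected by a penalized-likelihood (BIC-type) criterion depends
# on the penalty constant c, for a simulated order-2 binary Markov chain.
import numpy as np
from collections import Counter

rng = np.random.default_rng(6)
probs = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.6, (1, 1): 0.3}   # P(next = 1 | last two)
x = [0, 1]
for _ in range(5000):
    x.append(int(rng.uniform() < probs[tuple(x[-2:])]))
x, n = np.array(x), len(x)

def log_likelihood(seq, order):
    counts = Counter((tuple(seq[i - order:i]), seq[i]) for i in range(order, len(seq)))
    ctx_tot = Counter()
    for (ctx, _), c in counts.items():
        ctx_tot[ctx] += c
    return sum(c * np.log(c / ctx_tot[ctx]) for (ctx, _), c in counts.items())

for c in (0.1, 0.5, 1.0, 4.0):
    # df = 2**k free transition probabilities for a binary order-k chain
    scores = [log_likelihood(x, k) - c * (2 ** k) * np.log(n) for k in range(5)]
    print(f"penalty constant {c}: selected order = {int(np.argmax(scores))}")
```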

Relevance: 30.00%

Abstract:

Adult stem cells are distributed throughout the whole organism and present a great potential for the therapy of different types of disease. For the design of efficient therapeutic strategies, it is important to have a more detailed understanding of their basic biological characteristics, as well as of the signals produced by damaged tissues to which they respond. Myocardial infarction (MI), a disease caused by a lack of blood supply to the heart, represents the most common cause of morbidity and mortality in the Western world. Stem cell therapy arises as a promising alternative to conventional treatments, which are often ineffective in preventing loss of cardiomyocytes and fibrosis. Cell therapy protocols must take into account the molecular events that occur in the regenerative niche of MI. In the present study, we investigated the expression profile of ten genes coding for chemokines or cytokines in a murine model of MI, aiming at the characterization of the regenerative niche. MI was induced in adult C57BL/6 mice and heart samples were collected after 24 h and 30 days, as well as from control animals, for quantitative RT-PCR. Expression of the chemokine genes CCL2, CCL3, CCL4, CCL7, CXCL2 and CXCL10 was significantly increased 24 h after infarction, returning to baseline levels on day 30. Expression of the CCL8 gene significantly increased only on day 30, whereas expression of the CXCL12 and CX3CL1 genes was not significantly increased in either period. Finally, expression of the IL-6 gene increased 24 h after infarction and was maintained at a significantly higher level than in control samples 30 days later. These results contribute to a better knowledge of the regenerative niche in MI, allowing a more efficient selection or genetic manipulation of cells in therapeutic protocols.

Relevance: 30.00%

Abstract:

Various factors are believed to govern the selection of references in citation networks, but a precise, quantitative determination of their importance has remained elusive. In this paper, we show that three factors can account for the referencing pattern of citation networks for two topics, namely "graphenes" and "complex networks", thus allowing one to reproduce the topological features of the networks built with papers being the nodes and the edges established by citations. The most relevant factor was content similarity, while the other two - in-degree (i.e. citation counts) and age of publication - had varying importance depending on the topic studied. This dependence indicates that additional factors could play a role. Indeed, by intuition one should expect the reputation (or visibility) of authors and/or institutions to affect the referencing pattern, and this is only indirectly considered via the in-degree that should correlate with such reputation. Because information on reputation is not readily available, we simulated its effect on artificial citation networks considering two communities with distinct fitness (visibility) parameters. One community was assumed to have twice the fitness value of the other, which amounts to twice the probability of a paper being cited. While the h-index for authors in the community with larger fitness evolved with time with slightly higher values than for the control network (no fitness considered), a drastic effect was noted for the community with smaller fitness. (C) 2012 Elsevier Ltd. All rights reserved.
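A toy growth model along the lines sketched above, in which the probability of citing an existing paper combines content similarity, in-degree and age, modulated by a community fitness that is twice as large in one community; the multiplicative weighting and all parameter values are illustrative assumptions, not the paper's exact rule.

```python
# Toy citation-network growth with similarity, in-degree, age and community fitness.
import numpy as np

rng = np.random.default_rng(7)
n_papers, refs_per_paper = 2000, 5

topic = rng.uniform(size=n_papers)                    # 1-D stand-in for content
community = rng.integers(0, 2, n_papers)              # community 0 or 1
fitness = np.where(community == 1, 2.0, 1.0)          # community 1 is twice as visible
indeg = np.zeros(n_papers)

for new in range(10, n_papers):                       # the first 10 papers seed the network
    old = np.arange(new)
    similarity = np.exp(-np.abs(topic[new] - topic[old]) / 0.1)
    age_decay = np.exp(-(new - old) / 500.0)
    w = similarity * (indeg[old] + 1) * age_decay * fitness[old]
    cited = rng.choice(old, size=min(refs_per_paper, new), replace=False, p=w / w.sum())
    indeg[cited] += 1

for c in (0, 1):
    print(f"community {c}: mean citations = {indeg[community == c].mean():.2f}")
```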

Relevance: 30.00%

Abstract:

The study of the effects of spatially uniform fields on the steady-state properties of Axelrod's model has yielded plenty of counterintuitive results. Here, we reexamine the impact of this type of field for a selection of parameters such that the field-free steady state of the model is heterogeneous or multicultural. Analyses of both one- and two-dimensional versions of Axelrod's model indicate that the steady state remains heterogeneous regardless of the value of the field strength. Turning on the field leads to a discontinuous decrease in the number of cultural domains, which we argue is due to the instability of zero-field heterogeneous absorbing configurations. We find, however, that spatially nonuniform fields that implement a consensus rule among the neighborhood of the agents enforce homogenization. Although the overall effects of the fields are essentially the same irrespective of the dimensionality of the model, we argue that the dimensionality has a significant impact on the stability of the field-free homogeneous steady state.
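A minimal one-dimensional Axelrod sketch with a spatially uniform external field: with probability B an agent interacts with a fixed field vector instead of a lattice neighbor. Lattice size, the number of features and traits, and the stopping rule are illustrative, not the values used in the paper.

```python
# 1-D Axelrod model with a spatially uniform external field of strength B.
import numpy as np

rng = np.random.default_rng(8)
L, F, q, B, steps = 200, 5, 10, 0.1, 400_000

culture = rng.integers(0, q, size=(L, F))             # F cultural features, q traits each
field = rng.integers(0, q, size=F)                    # spatially uniform field vector

for _ in range(steps):
    i = rng.integers(L)
    partner = field if rng.uniform() < B else culture[(i + rng.choice([-1, 1])) % L]
    diff = np.flatnonzero(culture[i] != partner)
    overlap = 1 - diff.size / F
    if 0 < diff.size < F and rng.uniform() < overlap:
        f = rng.choice(diff)                          # copy one differing feature
        culture[i, f] = partner[f]

# Count cultural domains as runs of identical neighbors on the ring
walls = sum(not np.array_equal(culture[k], culture[(k + 1) % L]) for k in range(L))
print("number of cultural domains:", walls if walls > 0 else 1)
```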

Relevance: 30.00%

Abstract:

The objective of this study was to estimate (co)variance components using random regression on B-spline functions for weight records obtained from birth to adulthood. A total of 82,064 weight records of 8,145 females, obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil), which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as a random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and to a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. Particularly, this is important in most of Nellore beef cattle production systems, where the cow herd is maintained on range conditions. There is limited modification of the growth curve of Nellore cattle with respect to the aim of selecting them for rapid growth at young ages while maintaining constant adult weight.
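To make the covariance-function setup concrete, the sketch below builds the quadratic B-spline design matrix over age with four knots (three segments), the configuration the abstract identifies as most parsimonious for the direct additive and animal permanent environmental effects; the knot positions and age range are illustrative.

```python
# Quadratic B-spline basis over age with 4 distinct knots (3 segments).
import numpy as np
from scipy.interpolate import BSpline

degree = 2                                           # quadratic segments
# Distinct knots from birth to an assumed adult age of 3000 days, with boundary
# knots repeated as required for B-splines of this degree.
t = np.concatenate([[0.0] * (degree + 1), [365.0, 730.0], [3000.0] * (degree + 1)])
n_basis = len(t) - degree - 1                        # 5 basis functions

ages = np.linspace(0.0, 2999.0, 7)
design = np.column_stack([BSpline(t, np.eye(n_basis)[j], degree)(ages)
                          for j in range(n_basis)])
print(design.round(3))                               # each row sums to 1 within the knot range
```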

Relevance: 30.00%

Abstract:

In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.
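A simulation sketch of the destructive mechanism described above: an initial number of altered cells drawn from a weighted Poisson distribution (a length-biased weight is assumed here purely for illustration), each cell surviving the treatment or repair process with probability p, and cure corresponding to zero surviving cells.

```python
# Destructive mechanism: N altered cells ~ weighted (length-biased) Poisson,
# D | N ~ Binomial(N, p) cells survive treatment/repair, cure means D = 0.
import numpy as np

rng = np.random.default_rng(9)
n, lam, p = 100_000, 3.0, 0.4

# Length-biased Poisson: P(N = m) proportional to m * lam**m / m!
m = np.arange(1, 60)
w = m * np.exp(m * np.log(lam) - np.cumsum(np.log(m)))   # unnormalized weights
N = rng.choice(m, size=n, p=w / w.sum())

D = rng.binomial(N, p)                  # damaged cells not eliminated or repaired
cure_fraction = np.mean(D == 0)

theory = (1 - p) * np.exp(-lam * p)     # exact P(D = 0) under this particular weight
print("simulated cure fraction:  ", cure_fraction)
print("theoretical cure fraction:", theory)
```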

Relevance: 30.00%

Abstract:

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements.

Results: A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score taking into account the credibility and the bolstered error values is proposed in order to rank the groups of considered genes. Results obtained using SAGE data from gliomas are presented, corroborating the introduced methodology.

Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE, and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful to generate classifiers.
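A minimal sketch of the credibility-interval ingredient, assuming a Beta-Binomial model for a tag's abundance in each biological state; the bolstered-error computation and the ranking score of the paper are not reproduced, and the counts below are invented.

```python
# Credibility intervals for SAGE tag abundance under an assumed Beta(1, 1) prior:
# flag a gene when the two 95% intervals do not overlap.
from scipy.stats import beta

def credibility_interval(tag_count, library_size, level=0.95):
    """Posterior interval for the tag proportion under a Beta(1, 1) prior."""
    a, b = 1 + tag_count, 1 + library_size - tag_count
    lo, hi = beta.ppf([(1 - level) / 2, (1 + level) / 2], a, b)
    return lo, hi

# hypothetical counts for one gene in two SAGE libraries (e.g. two glioma grades)
ci_state1 = credibility_interval(tag_count=45, library_size=60_000)
ci_state2 = credibility_interval(tag_count=12, library_size=55_000)

separated = ci_state1[0] > ci_state2[1] or ci_state2[0] > ci_state1[1]
print("state 1 CI:", ci_state1)
print("state 2 CI:", ci_state2)
print("credibly different between states:", separated)
```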

Relevance: 30.00%

Abstract:

Background: Organ-sharing criteria have developed into a system that prioritizes liver transplantation (LT) for patients with hepatocellular carcinoma (HCC) who have the highest risk of wait-list mortality. In some countries this model allows only patients within the Milan Criteria (MC, defined by the presence of a single nodule up to 5 cm, or up to three nodules none larger than 3 cm, with no evidence of extrahepatic spread or macrovascular invasion) to be evaluated for liver transplantation. This policy implies that some patients with HCC slightly more advanced than allowed by the current strict selection criteria will be excluded, even though LT for these patients might be associated with acceptable long-term outcomes.

Methods: We propose a mathematical approach to study the consequences of relaxing the MC for patients with HCC who do not comply with the current rules for inclusion in the transplantation candidate list. We consider overall 5-year survival rates compatible with those reported in the literature. We calculate the strategy that minimizes the total mortality of the affected population, that is, the total number of people in both groups of HCC patients who die within 5 years of the implementation of the strategy, either from post-transplantation death or from death due to the underlying HCC. We illustrate the analysis with a simulation of a theoretical population of 1,500 HCC patients with exponentially distributed tumor sizes. The parameter λ obtained from the literature was 0.3. As the total number of patients in the real samples was 327, this implied an average tumor size of 3.3 cm with a 95% confidence interval of [2.9; 3.7]. The total number of available livers to be grafted was assumed to be 500.

Results: With 1,500 patients on the waiting list and 500 grafts available, we simulated the total number of deaths among both transplanted and non-transplanted HCC patients after 5 years as a function of the tumor size of transplanted patients. The total number of deaths drops monotonically with tumor size, reaching a minimum at a size of 7 cm and increasing thereafter. With a tumor size of 10 cm, the total mortality equals that obtained at the 5 cm threshold of the Milan criteria.

Conclusion: We conclude that it is possible to include patients with tumor sizes up to 10 cm without increasing the total mortality of this population.
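A re-creation sketch of the allocation trade-off, under assumed size-dependent 5-year survival curves (not the authors' values): 1,500 patients with exponentially distributed tumor sizes (λ = 0.3 per cm), 500 grafts allocated at random among patients below a size cutoff, and expected deaths computed as a function of the cutoff.

```python
# Expected 5-year deaths as a function of the maximum tumor size accepted for
# transplantation; survival curves are assumed placeholders chosen only so that
# transplantation helps less for very small and very large tumors.
import numpy as np

rng = np.random.default_rng(10)
n_patients, n_grafts = 1500, 500
tumor = rng.exponential(scale=1 / 0.3, size=n_patients)       # mean ~ 3.3 cm

surv_tx = 0.80 * np.exp(-0.08 * tumor)       # assumed post-transplant 5-year survival
surv_no = 0.60 * np.exp(-0.35 * tumor)       # assumed survival without transplant

def expected_deaths(size_cutoff):
    eligible = np.flatnonzero(tumor <= size_cutoff)
    p_graft = min(1.0, n_grafts / eligible.size)              # random allocation of grafts
    deaths = np.sum(1 - surv_no)                              # baseline: nobody grafted
    deaths -= p_graft * np.sum(surv_tx[eligible] - surv_no[eligible])
    return deaths

for cutoff in (3, 5, 7, 10, 15):
    print(f"cutoff {cutoff:>2} cm: expected 5-year deaths = {expected_deaths(cutoff):.0f}")
```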