921 results for Heckman selection model
Abstract:
We propose a general framework for the analysis of animal telemetry data through the use of weighted distributions. It is shown that several interpretations of resource selection functions arise when constructed from the ratio of a use and availability distribution. Through the proposed general framework, several popular resource selection models are shown to be special cases of the general model by making assumptions about animal movement and behavior. The weighted distribution framework is easily extended to account for telemetry data that are highly autocorrelated, as is typical of animal relocations collected with new technologies such as global positioning systems. An analysis of simulated data using several models constructed within the proposed framework is also presented to illustrate the possible gains from the flexible modeling framework. The proposed model is applied to a brown bear data set from southeast Alaska.
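The ratio construction described in the abstract above can be illustrated with a minimal discrete sketch: the use distribution is proportional to a selection weight times the availability distribution. The habitat cells, weights, and values below are hypothetical, not from the paper.

```python
# Minimal sketch of a resource selection function as a weighted
# distribution: use(x) is proportional to w(x) * f(x), where f is the
# availability distribution and w a selection weight.  The three
# habitat cells and all numbers are illustrative only.

def use_distribution(availability, weights):
    """Normalize w(x) * f(x) over a discrete set of habitat cells."""
    raw = [w * f for w, f in zip(weights, availability)]
    total = sum(raw)
    return [r / total for r in raw]

f = [0.5, 0.3, 0.2]         # availability of each habitat cell (sums to 1)
w = [1.0, 2.0, 4.0]         # relative selection strength per cell
u = use_distribution(f, w)  # resulting use distribution (sums to 1)
```

Because the normalizing constant is recomputed from the weights, only relative selection strengths matter, which is one reason several resource selection models can be written in this common form.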
Abstract:
We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships between abundance and environmental covariates. Spatial Poisson process likelihoods can be used to simultaneously estimate detection and intensity parameters by modeling distance sampling data as a thinned spatial point process. A model-based spatial approach to distance sampling data has three main benefits: it allows complex and opportunistic transect designs to be employed, it allows estimation of abundance in small subregions, and it provides a framework to assess the effects of habitat or experimental manipulation on density. We demonstrate the model-based methodology with a small simulation study and an analysis of the Dubbo weed data set. A simple ad hoc method for handling overdispersion is also proposed. The simulation study showed that the model-based approach compared favorably to conventional distance sampling methods for abundance estimation. In addition, the overdispersion correction performed adequately when the number of transects was high. Analysis of the Dubbo data set indicated a transect effect on abundance via Akaike’s information criterion model selection. Further goodness-of-fit analysis, however, indicated some potential confounding of intensity with the detection function.
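The thinned-point-process view of distance sampling described above can be sketched numerically: animals arise from a homogeneous Poisson process over the transect strip, and each is detected with a probability that decays with perpendicular distance. The half-normal detection function and all parameter values below are common illustrative choices, not the paper's fitted model.

```python
import math
import random

random.seed(1)

def poisson_sample(mean, rng=random):
    """Knuth's method for one Poisson draw (adequate for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_thinned_transect(intensity, half_width, length, sigma):
    """Simulate animals on a transect strip as a homogeneous Poisson
    process, then thin with a half-normal detection function
    g(d) = exp(-d**2 / (2 * sigma**2)) of perpendicular distance d."""
    n = poisson_sample(intensity * 2 * half_width * length)
    detected = []
    for _ in range(n):
        d = random.uniform(-half_width, half_width)  # perpendicular distance
        if random.random() < math.exp(-d * d / (2 * sigma ** 2)):
            detected.append(abs(d))
    return n, detected

n_total, detected = simulate_thinned_transect(
    intensity=2.0, half_width=3.0, length=20.0, sigma=1.5)
```

Fitting the model amounts to maximizing the thinned-process likelihood over the intensity and detection parameters jointly, which is what allows covariates on abundance to enter directly.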
Abstract:
Sugarcane-breeding programs take at least 12 years to develop new commercial cultivars. Molecular markers offer a possibility to study the genetic architecture of quantitative traits in sugarcane, and they may be used in marker-assisted selection to speed up artificial selection. Although the performance of sugarcane progenies in breeding programs is commonly evaluated across a range of locations and harvest years, many of the QTL detection methods ignore two- and three-way interactions between QTL, harvest, and location. In this work, a strategy for QTL detection in multi-harvest-location trial data, based on interval mapping and mixed models, is proposed and applied to map QTL effects on a segregating progeny from a biparental cross of pre-commercial Brazilian cultivars, evaluated at two locations and three consecutive harvest years for cane yield (tonnes per hectare), sugar yield (tonnes per hectare), fiber percent, and sucrose content. In the mixed model, we have included appropriate (co)variance structures for modeling heterogeneity and correlation of genetic effects and non-genetic residual effects. Forty-six QTLs were found: 13 for cane yield, 14 for sugar yield, 11 for fiber percent, and 8 for sucrose content. In addition, QTL by harvest, QTL by location, and QTL by harvest by location interaction effects were significant for all evaluated traits (30 QTLs showed some interaction, and 16 none). Our results contribute to a better understanding of the genetic architecture of complex traits related to biomass production and sucrose content in sugarcane.
Abstract:
In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of lifetimes, the generalized gamma distribution is assumed, and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some discussion of model selection criteria is given. The methodology is illustrated on simulated and real lifetime data sets.
Abstract:
The purpose of this paper is to develop a Bayesian analysis for right-censored survival data when immune or cured individuals may be present in the population from which the data are taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution, which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Some discussion of model selection and an illustration with a real data set are also provided.
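For concreteness, the Conway-Maxwell-Poisson pmf is P(Y = y) = λ^y / (y!)^ν / Z(λ, ν), with normalizing constant Z(λ, ν) = Σ_j λ^j / (j!)^ν; setting ν = 1 recovers the Poisson distribution. A minimal sketch, truncating the infinite sum and using illustrative parameter values:

```python
import math

def com_poisson_pmf(y, lam, nu, terms=200):
    """Conway-Maxwell-Poisson pmf P(Y=y) = lam**y / (y!)**nu / Z,
    with the normalizing constant Z truncated after `terms` summands.
    nu = 1 recovers the ordinary Poisson distribution."""
    z, term = 0.0, 1.0                # term_0 = lam**0 / (0!)**nu = 1
    for j in range(terms):
        z += term
        term *= lam / (j + 1) ** nu   # term_{j+1} from term_j
    return lam ** y / math.factorial(y) ** nu / z
```

Values of ν above 1 give under-dispersion and values below 1 give over-dispersion relative to the Poisson, which is what makes this distribution attractive for modeling the latent number of competing causes.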
Abstract:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also present a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
Abstract:
Adult stem cells are distributed through the whole organism, and present great potential for the therapy of different types of disease. For the design of efficient therapeutic strategies, it is important to have a more detailed understanding of their basic biological characteristics, as well as of the signals produced by damaged tissues and to which they respond. Myocardial infarction (MI), a disease caused by a lack of blood flow supply in the heart, represents the most common cause of morbidity and mortality in the Western world. Stem cell therapy arises as a promising alternative to conventional treatments, which are often ineffective in preventing loss of cardiomyocytes and fibrosis. Cell therapy protocols must take into account the molecular events that occur in the regenerative niche of MI. In the present study, we investigated the expression profile of ten genes coding for chemokines or cytokines in a murine model of MI, aiming at the characterization of the regenerative niche. MI was induced in adult C57BL/6 mice and heart samples were collected after 24 h and 30 days, as well as from control animals, for quantitative RT-PCR. Expression of the chemokine genes CCL2, CCL3, CCL4, CCL7, CXCL2 and CXCL10 was significantly increased 24 h after infarction, returning to baseline levels on day 30. Expression of the CCL8 gene significantly increased only on day 30, whereas gene expression of CXCL12 and CX3CL1 was not significantly increased in either ischemic period. Finally, expression of the IL-6 gene increased 24 h after infarction and was maintained at a significantly higher level than control samples 30 days later. These results contribute to a better understanding of the regenerative niche in MI, allowing a more efficient selection or genetic manipulation of cells in therapeutic protocols.
Abstract:
Various factors are believed to govern the selection of references in citation networks, but a precise, quantitative determination of their importance has remained elusive. In this paper, we show that three factors can account for the referencing pattern of citation networks for two topics, namely "graphenes" and "complex networks", thus allowing one to reproduce the topological features of networks in which papers are the nodes and edges are established by citations. The most relevant factor was content similarity, while the other two - in-degree (i.e. citation counts) and age of publication - had varying importance depending on the topic studied. This dependence indicates that additional factors could play a role. Indeed, by intuition one should expect the reputation (or visibility) of authors and/or institutions to affect the referencing pattern, and this is only indirectly considered via the in-degree, which should correlate with such reputation. Because information on reputation is not readily available, we simulated its effect on artificial citation networks considering two communities with distinct fitness (visibility) parameters. One community was assumed to have twice the fitness value of the other, which amounts to a double probability for a paper being cited. While the h-index for authors in the community with larger fitness evolved over time with slightly higher values than for the control network (no fitness considered), a drastic effect was noted for the community with smaller fitness. (C) 2012 Elsevier Ltd. All rights reserved.
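Since the simulation above tracks the h-index of authors in each community, it may help to recall its definition: the largest h such that at least h of an author's papers have at least h citations each. A minimal sketch:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:   # paper at this rank still has enough citations
            h = rank
        else:
            break
    return h
```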
Abstract:
The study of the effects of spatially uniform fields on the steady-state properties of Axelrod's model has yielded plenty of counterintuitive results. Here, we reexamine the impact of this type of field for a selection of parameters such that the field-free steady state of the model is heterogeneous or multicultural. Analyses of both one- and two-dimensional versions of Axelrod's model indicate that the steady state remains heterogeneous regardless of the value of the field strength. Turning on the field leads to a discontinuous decrease in the number of cultural domains, which we argue is due to the instability of zero-field heterogeneous absorbing configurations. We find, however, that spatially nonuniform fields that implement a consensus rule among the neighborhood of the agents enforce homogenization. Although the overall effects of the fields are essentially the same irrespective of the dimensionality of the model, we argue that the dimensionality has a significant impact on the stability of the field-free homogeneous steady state.
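For readers unfamiliar with the field-free dynamics discussed above, a minimal one-dimensional Axelrod update can be sketched as follows; the ring size, number of features F, and number of traits q are arbitrary illustrative choices, and no external field is included.

```python
import random

random.seed(42)

def axelrod_step(culture, F, rng=random):
    """One interaction of the 1-d Axelrod model on a ring: a random
    agent interacts with a random neighbour with probability equal to
    their cultural overlap, then copies one differing feature."""
    i = rng.randrange(len(culture))
    j = (i + rng.choice([-1, 1])) % len(culture)   # periodic boundary
    overlap = sum(a == b for a, b in zip(culture[i], culture[j]))
    if 0 < overlap < F and rng.random() < overlap / F:
        differing = [f for f in range(F) if culture[i][f] != culture[j][f]]
        k = rng.choice(differing)
        culture[i][k] = culture[j][k]

F, q = 3, 2                                        # features, traits
sites = [[random.randrange(q) for _ in range(F)] for _ in range(10)]
for _ in range(5000):
    axelrod_step(sites, F)
```

Absorbing configurations are those in which every pair of neighbours is either identical or completely different, which is why heterogeneous (multicultural) steady states can survive.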
Abstract:
The objective of this study was to estimate (co)variance components using random regression on B-spline functions to weight records obtained from birth to adulthood. A total of 82 064 weight records of 8145 females obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil), which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and to a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight at young ages should be performed taking into account the accompanying increase in mature cow weight. This is particularly important in most Nellore beef cattle production systems, where the cow herd is maintained on range conditions.
The growth curve of Nellore cattle offers limited scope for modification when the aim is to select for rapid growth at young ages while maintaining a constant adult weight.
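The B-spline bases used for the random regressions above can be evaluated with the standard Cox-de Boor recursion; the sketch below (with an arbitrary uniform knot vector, not the knot placements of the study) verifies the partition-of-unity property that makes adjacent segments join smoothly.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: the i-th B-spline basis function of
    degree k evaluated at t, for a non-decreasing knot vector."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + k] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    right = 0.0
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

# Uniform knots; the 5 quadratic basis functions sum to 1 on [2, 5).
knots = [0, 1, 2, 3, 4, 5, 6, 7]
values = [bspline_basis(i, 2, 3.5, knots) for i in range(len(knots) - 3)]
```

Because each basis function is nonzero over only a few adjacent segments, the resulting covariance matrices are sparse, which is one reason B-spline random regressions can be more parsimonious than high-order Legendre polynomials.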
Abstract:
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP, Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest, as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that result in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Some discussion of model selection and an illustration with the cutaneous melanoma data set analysed by Rodrigues et al. are also presented.
Abstract:
Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score that takes into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, thus corroborating the introduced methodology. Conclusion The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for generating classifiers.
Abstract:
Abstract Background The criteria for organ sharing have developed into a system that prioritizes liver transplantation (LT) for patients with hepatocellular carcinoma (HCC) who have the highest risk of wait-list mortality. In some countries this model allows only patients within the Milan Criteria (MC, defined by the presence of a single nodule up to 5 cm, or up to three nodules none larger than 3 cm, with no evidence of extrahepatic spread or macrovascular invasion) to be evaluated for liver transplantation. This policy implies that some patients with HCC slightly more advanced than those allowed by the current strict selection criteria will be excluded, even though LT for these patients might be associated with acceptable long-term outcomes. Methods We propose a mathematical approach to study the consequences of relaxing the MC for patients with HCC who do not comply with the current rules for inclusion in the transplantation candidate list. We consider overall 5-year survival rates compatible with the ones reported in the literature. We calculate the best strategy that would minimize the total mortality of the affected population, that is, the total number of people in both groups of HCC patients who die within 5 years of the implementation of the strategy, either by post-transplantation death or by death due to the underlying HCC. We illustrate the above analysis with a simulation of a theoretical population of 1,500 HCC patients with tumor size exponentially distributed. The parameter λ obtained from the literature was equal to 0.3. As the total number of patients in these real samples was 327, this implied an average size of 3.3 cm and a 95% confidence interval of [2.9; 3.7]. The total number of available livers to be grafted was assumed to be 500.
Results With 1,500 patients on the waiting list and 500 grafts available, we simulated the total number of deaths among both transplanted and non-transplanted HCC patients after 5 years as a function of the tumor-size threshold for transplantation. The total number of deaths decreases monotonically with the threshold, reaching a minimum at 7 cm and increasing thereafter. At a threshold of 10 cm the total mortality equals that obtained at the 5 cm threshold of the Milan criteria. Conclusion We conclude that it is possible to include patients with tumor size up to 10 cm without increasing the total mortality of this population.
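The allocation trade-off described above can be sketched with a toy simulation using the abstract's numbers (1,500 patients, 500 grafts, exponentially distributed tumour sizes with λ = 0.3). The 5-year survival probabilities below are hypothetical placeholders, not the paper's estimates, and are held constant rather than varying with tumour size as the paper's fuller model would allow.

```python
import random

random.seed(0)

LAM = 0.3          # exponential rate for tumour size (mean ~3.3 cm)
N_PATIENTS = 1500
N_GRAFTS = 500

def deaths_after_5y(cutoff, s_tx, s_no_tx, sizes, n_grafts):
    """Expected 5-year deaths when grafts go to patients with tumour
    size <= cutoff (cm); s_tx and s_no_tx are hypothetical 5-year
    survival probabilities with and without transplantation."""
    eligible = [s for s in sizes if s <= cutoff]
    n_tx = min(len(eligible), n_grafts)   # grafts are the binding limit
    n_no = len(sizes) - n_tx
    return n_tx * (1 - s_tx) + n_no * (1 - s_no_tx)

sizes = [random.expovariate(LAM) for _ in range(N_PATIENTS)]
d7 = deaths_after_5y(7.0, s_tx=0.7, s_no_tx=0.1,
                     sizes=sizes, n_grafts=N_GRAFTS)
```

With size-dependent post-transplant survival, total deaths would no longer be flat once the grafts are exhausted, which is what produces the interior minimum at 7 cm reported in the abstract.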
Abstract:
Master's in Oceanography
Abstract:
The aim of my dissertation is to study the gender wage gap with a specific focus on developing and transition countries. In the first chapter I present the main existing theories proposed to analyse the gender wage gap and I review the empirical literature on the gender wage gap in developing and transition countries and its main findings. Then, I discuss the overall empirical issues related to the estimation of the gender wage gap and the issues specific to developing and transition countries. The second chapter is an empirical analysis of the gender wage gap in a developing country, the Union of Comoros, using data from the multidimensional household budget survey “Enquete integrale auprès des ménages” (EIM) conducted in 2004. The interest of my work is to provide a benchmark analysis for further studies on the situation of women in the Comorian labour market and to contribute to the literature on the gender wage gap in Africa by making available more information on the dynamics and mechanisms of the gender wage gap, given the limited interest in the topic in this area of the world. The third chapter is an applied analysis of the gender wage gap in a transition country, Poland, using data from the Labour Force Survey (LFS) collected for the years 1994 and 2004. I provide a detailed examination of how gender earnings differentials have changed over the period from 1994 to a more advanced transition phase in 2004, when market elements had become much more important in the functioning of the Polish economy than in the earlier phase. The main contribution of my dissertation is the application of the econometric methodology that I describe at the beginning of the second chapter. First, I run a preliminary OLS and quantile regression analysis to estimate and describe the raw and conditional wage gaps along the distribution. Second, I estimate quantile regressions separately for males and females, in order to allow for different rewards to characteristics.
Third, I proceed to decompose the raw wage gap estimated at the mean through the Oaxaca-Blinder (1973) procedure. In the second chapter I run a two-step Heckman procedure, estimating a model of participation in the labour market which shows a significant selection bias for females. Fourth, I apply the Machado-Mata (2005) technique to extend the decomposition analysis to all points of the distribution. For Poland I can also implement the Juhn, Murphy and Pierce (1991) decomposition over the period 1994-2004, to account for effects on the pay gap due to changes in overall wage dispersion beyond Oaxaca’s standard decomposition.
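The selection correction in the two-step Heckman procedure mentioned above hinges on the inverse Mills ratio λ(z) = φ(z)/Φ(z), evaluated at the fitted probit index from the participation equation and added as a regressor in the wage equation. A minimal sketch of that correction term (the probit estimation itself is omitted):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z), via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z): the correction
    regressor appended to the wage equation in Heckman's second step."""
    return norm_pdf(z) / norm_cdf(z)
```

A statistically significant coefficient on this regressor in the wage equation is the usual evidence of selection bias, which is what the analysis of female participation in the second chapter reports.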