922 resultados para Heckman-type selection models
Resumo:
Background: Heckman-type selection models have been used to control HIV prevalence estimates for selection bias when participation in HIV testing and HIV status are associated after controlling for observed variables. These models typically rely on the strong assumption that the error terms in the participation and the outcome equations that comprise the model are distributed as bivariate normal.
Methods: We introduce a novel approach for relaxing the bivariate normality assumption in selection models using copula functions. We apply this method to estimating HIV prevalence and new confidence intervals (CI) in the 2007 Zambia Demographic and Health Survey (DHS) by using interviewer identity as the selection variable that predicts participation (consent to test) but not the outcome (HIV status).
Results: We show in a simulation study that selection models can generate biased results when the bivariate normality assumption is violated. In the 2007 Zambia DHS, HIV prevalence estimates are similar irrespective of the structure of the association assumed between participation and outcome. For men, we estimate a population HIV prevalence of 21% (95% CI = 16%–25%) compared with 12% (11%–13%) among those who consented to be tested; for women, the corresponding figures are 19% (13%–24%) and 16% (15%–17%).
Conclusions: Copula approaches to Heckman-type selection models are a useful addition to the methodological toolkit of HIV epidemiology and of epidemiology in general. We develop the use of this approach to systematically evaluate the robustness of HIV prevalence estimates based on selection models, both empirically and in a simulation study.
Resumo:
Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with HIV status. Longitudinal data suggests that individuals who know or suspect they are HIV positive are less likely to participate in testing in HIV surveys, in which case methods to correct for missing data which are based on imputation and observed characteristics will produce biased results. Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable allowing estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV status. We introduce a new random effects method to these selection models which overcomes non-convergence caused by collinearity, small sample bias, and incorrect inference in existing approaches. Our method is easy to implement in standard statistical software, and allows the construction of bootstrapped standard errors which adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated. Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia (2007). In Ghana, we find little evidence of selection bias as our selection model gives an HIV prevalence estimate of 1.4% (95% CI 1.2% – 1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model gives an HIV prevalence estimate of 16.3% (95% CI 11.0% - 18.4%), compared to 12.1% among those with a valid HIV test. Therefore, those who decline to test in Zambia are found to be more likely to be HIV positive. Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, is possible to implement even when HIV prevalence or non-participation is very high or very low, and provides a practical solution to account for both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that may occur when a large proportion of people refuse to test.
Adjusting HIV Prevalence Estimates for Non-participation: an Application to Demographic Surveillance
Resumo:
Introduction: HIV testing is a cornerstone of efforts to combat the HIV epidemic, and testing conducted as part of surveillance provides invaluable data on the spread of infection and the effectiveness of campaigns to reduce the transmission of HIV. However, participation in HIV testing can be low, and if respondents systematically select not to be tested because they know or suspect they are HIV positive (and fear disclosure), standard approaches to deal with missing data will fail to remove selection bias. We implemented Heckman-type selection models, which can be used to adjust for missing data that are not missing at random, and established the extent of selection bias in a population-based HIV survey in an HIV hyperendemic community in rural South Africa.
Methods: We used data from a population-based HIV survey carried out in 2009 in rural KwaZulu-Natal, South Africa. In this survey, 5565 women (35%) and 2567 men (27%) provided blood for an HIV test. We accounted for missing data using interviewer identity as a selection variable which predicted consent to HIV testing but was unlikely to be independently associated with HIV status. Our approach involved using this selection variable to examine the HIV status of residents who would ordinarily refuse to test, except that they were allocated a persuasive interviewer. Our copula model allows for flexibility when modelling the dependence structure between HIV survey participation and HIV status.
Results: For women, our selection model generated an HIV prevalence estimate of 33% (95% CI 27–40) for all people eligible to consent to HIV testing in the survey. This estimate is higher than the estimate of 24% generated when only information from respondents who participated in testing is used in the analysis, and the estimate of 27% when imputation analysis is used to predict missing data on HIV status. For men, we found an HIV prevalence of 25% (95% CI 15–35) using the selection model, compared to 16% among those who participated in testing, and 18% estimated with imputation. We provide new confidence intervals that correct for the fact that the relationship between testing and HIV status is unknown and requires estimation.
Conclusions: We confirm the feasibility and value of adopting selection models to account for missing data in population-based HIV surveys and surveillance systems. Elements of survey design, such as interviewer identity, present the opportunity to adopt this approach in routine applications. Where non-participation is high, true confidence intervals are much wider than those generated by standard approaches to dealing with missing data suggest.
Resumo:
Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.
Resumo:
In this paper we propose a simple method of characterizing countervailing incentives in adverse selection problems. The key element in our characterization consists of analyzing properties of the full information problem. This allows solving the principal problem without using optimal control theory. Our methodology can also be applied to different economic settings: health economics, monopoly regulation, labour contracts, limited liabilities and environmental regulation.
Resumo:
The maintenance of genetic variation in a spatially heterogeneous environment has been one of the main research themes in theoretical population genetics. Despite considerable progress in understanding the consequences of spatially structured environments on genetic variation, many problems remain unsolved. One of them concerns the relationship between the number of demes, the degree of dominance, and the maximum number of alleles that can be maintained by selection in a subdivided population. In this work, we study the potential of maintaining genetic variation in a two-deme model with deme-independent degree of intermediate dominance, which includes absence of G x E interaction as a special case. We present a thorough numerical analysis of a two-deme three-allele model, which allows us to identify dominance and selection patterns that harbor the potential for stable triallelic equilibria. The information gained by this approach is then used to construct an example in which existence and asymptotic stability of a fully polymorphic equilibrium can be proved analytically. Noteworthy, in this example the parameter range in which three alleles can coexist is maximized for intermediate migration rates. Our results can be interpreted in a specialist-generalist context and (among others) show when two specialists can coexist with a generalist in two demes if the degree of dominance is deme independent and intermediate. The dominance relation between the generalist allele and the specialist alleles play a decisive role. We also discuss linear selection on a quantitative trait and show that G x E interaction is not necessary for the maintenance of more than two alleles in two demes.
Resumo:
For climate risk management, cumulative distribution functions (CDFs) are an important source of information. They are ideally suited to compare probabilistic forecasts of primary (e.g. rainfall) or secondary data (e.g. crop yields). Summarised as CDFs, such forecasts allow an easy quantitative assessment of possible, alternative actions. Although the degree of uncertainty associated with CDF estimation could influence decisions, such information is rarely provided. Hence, we propose Cox-type regression models (CRMs) as a statistical framework for making inferences on CDFs in climate science. CRMs were designed for modelling probability distributions rather than just mean or median values. This makes the approach appealing for risk assessments where probabilities of extremes are often more informative than central tendency measures. CRMs are semi-parametric approaches originally designed for modelling risks arising from time-to-event data. Here we extend this original concept beyond time-dependent measures to other variables of interest. We also provide tools for estimating CDFs and surrounding uncertainty envelopes from empirical data. These statistical techniques intrinsically account for non-stationarities in time series that might be the result of climate change. This feature makes CRMs attractive candidates to investigate the feasibility of developing rigorous global circulation model (GCM)-CRM interfaces for provision of user-relevant forecasts. To demonstrate the applicability of CRMs, we present two examples for El Ni ? no/Southern Oscillation (ENSO)-based forecasts: the onset date of the wet season (Cairns, Australia) and total wet season rainfall (Quixeramobim, Brazil). This study emphasises the methodological aspects of CRMs rather than discussing merits or limitations of the ENSO-based predictors.
Resumo:
All systems found in nature exhibit, with different degrees, a nonlinear behavior. To emulate this behavior, classical systems identification techniques use, typically, linear models, for mathematical simplicity. Models inspired by biological principles (artificial neural networks) and linguistically motivated (fuzzy systems), due to their universal approximation property, are becoming alternatives to classical mathematical models. In systems identification, the design of this type of models is an iterative process, requiring, among other steps, the need to identify the model structure, as well as the estimation of the model parameters. This thesis addresses the applicability of gradient-basis algorithms for the parameter estimation phase, and the use of evolutionary algorithms for model structure selection, for the design of neuro-fuzzy systems, i.e., models that offer the transparency property found in fuzzy systems, but use, for their design, algorithms introduced in the context of neural networks. A new methodology, based on the minimization of the integral of the error, and exploiting the parameter separability property typically found in neuro-fuzzy systems, is proposed for parameter estimation. A recent evolutionary technique (bacterial algorithms), based on the natural phenomenon of microbial evolution, is combined with genetic programming, and the resulting algorithm, bacterial programming, advocated for structure determination. Different versions of this evolutionary technique are combined with gradient-based algorithms, solving problems found in fuzzy and neuro-fuzzy design, namely incorporation of a-priori knowledge, gradient algorithms initialization and model complexity reduction.
Resumo:
An example of the evolution of the interacting behaviours of parents and progeny is studied using iterative equations linking the frequencies of the gametes produced by the progeny to the frequencies of the gametes in the parental generation. This population genetics approach shows that a model in which both behaviours are determined by a single locus can lead to a stable equilibrium in which the two behaviours continue to segregate. A model in which the behaviours are determined by genes at two separate loci leads eventually to fixation of the alleles at both loci but this can take many generations of selection. Models of the type described in this paper will be needed to understand the evolution of complex behaviour when genomic or experimental information is available about the genetic determinants of behaviour and the selective values of different genomes. (c) 2007 Elsevier Inc. All rights reserved.
Resumo:
Background: Dengue is the most important mosquito-borne viral disease worldwide. Dengue virus comprises four antigenically related viruses named dengue virus type 1 to 4 (DENV1-4). DENV-3 was re-introduced into the Americas in 1994 causing outbreaks in Nicaragua and Panama. DENV-3 was introduced in Brazil in 2000 and then spread to most of the Brazilian States, reaching the neighboring country, Paraguay in 2002. In this study, we have analyzed the phylogenetic relationship of DENV-3 isolated in Brazil and Paraguay with viruses isolated worldwide. We have also analyzed the evolutionary divergence dynamics of DENV-3 viruses. Results: The entire open reading frame (ORF) of thirteen DENV-3 isolated in Brazil (n = 9) and Paraguay (n = 4) were sequenced for phylogenetic analysis. DENV-3 grouped into three main genotypes (I, II and III). Several internal clades were found within each genotype that we called lineage and sub-lineage. Viruses included in this study belong to genotype III and grouped together with viruses isolated in the Americas within the lineage III. The Brazilian viruses were further segregated into two different sub-lineage, A and B, and the Paraguayan into the sub-lineage B. All three genotypes showed internal grouping. The nucleotide divergence was in average 6.7% for genotypes, 2.7% for lineages and 1.5% for sub-lineages. Phylogenetic trees constructed with any of the protein gene sequences showed the same segregation of the DENV-3 in three genotypes. Conclusion: Our results showed that two groups of DENV-3 genotypes III circulated in Brazil during 2002-2009, suggesting different events of introduction of the virus through different regions of the country. In Paraguay, only one group DENV-3 genotype III is circulating that is very closely related to the Brazilian viruses of sub-lineage B. Different degree of grouping can be observed for DENV-3 and each group showed a characteristic evolutionary divergence. Finally, we have observed that any protein gene sequence can be used to identify the virus genotype.