920 results for Regression imputation


Relevance:

20.00%

Abstract:

robreg provides a number of robust estimators for linear regression models. Among them are the high breakdown-point and high efficiency MM-estimator, the Huber and bisquare M-estimator, and the S-estimator, each supporting classic or robust standard errors. Furthermore, basic versions of the LMS/LQS (least median of squares) and LTS (least trimmed squares) estimators are provided. Note that the moremata package, also available from SSC, is required.

Relevance:

20.00%

Abstract:

BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.

Relevance:

20.00%

Abstract:

Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate for modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest.

Relevance:

20.00%

Abstract:

We consider the problem of nonparametric estimation of a concave regression function F. We show that the supremum distance between the least squares estimator and F on a compact interval is typically of order (log(n)/n)^{2/5}. This entails rates of convergence for the estimator’s derivative. Moreover, we discuss the impact of additional constraints on F such as monotonicity and pointwise bounds. Then we apply these results to the analysis of current status data, where the distribution function of the event times is assumed to be concave.

Relevance:

20.00%

Abstract:

Let $Y_i = f(x_i) + E_i$ $(1 \le i \le n)$ with given covariates $x_1 < x_2 < \cdots < x_n$, an unknown regression function $f$ and independent random errors $E_i$ with median zero. It is shown how to apply several linear rank test statistics simultaneously in order to test monotonicity of $f$ in various regions and to identify its local extrema.

Relevance:

20.00%

Abstract:

When considering data from many trials, it is likely that some of them present a markedly different intervention effect or exert an undue influence on the summary results. We develop a forward search algorithm for identifying outlying and influential studies in meta-analysis models. The forward search algorithm starts by fitting the hypothesized model to a small subset of likely outlier-free studies and proceeds by adding, one at a time, the study that is closest to the model fitted to the existing set. As each study is added to the set, plots of estimated parameters and measures of fit are monitored, and outliers are identified by sharp changes in these forward plots. We apply the proposed outlier detection method to two real data sets: a meta-analysis of 26 studies that examines the effect of writing-to-learn interventions on academic achievement, adjusting for three possible effect modifiers, and a meta-analysis of 70 studies that compares a fluoride toothpaste treatment to placebo for preventing dental caries in children. A simple simulated example is used to illustrate the steps of the proposed methodology, and a small-scale simulation study is conducted to evaluate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
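The forward search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed-effect (inverse-variance) meta-analysis model, a hypothetical starting rule (the `m0` studies closest to the median effect), and a standardized distance to the pooled estimate as the closeness measure. The function name `forward_search` and its parameters are invented for illustration.

```python
import numpy as np

def forward_search(effects, variances, m0=3):
    """Return the entry order of studies and the pooled estimate at each step."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n = len(effects)
    # Assumed starting rule: the m0 studies closest to the median effect.
    start = np.argsort(np.abs(effects - np.median(effects)))[:m0]
    subset = [int(i) for i in start]
    path = []
    while True:
        # Fit the (assumed) fixed-effect model on the current subset.
        w = 1.0 / variances[subset]
        pooled = np.sum(w * effects[subset]) / np.sum(w)
        path.append(pooled)
        remaining = [i for i in range(n) if i not in subset]
        if not remaining:
            break
        # Add the study closest to the current fit (standardized distance).
        dist = np.abs(effects[remaining] - pooled) / np.sqrt(variances[remaining])
        subset.append(remaining[int(np.argmin(dist))])
    return subset, np.array(path)

# Toy data (made up): five similar studies and one clear outlier.
order, path = forward_search([0.10, 0.12, 0.08, 0.11, 0.09, 1.50], [0.01] * 6)
print(order)  # the outlying study (index 5) enters last
print(path)   # the pooled estimate jumps at the final step
```

Plotting `path` against the step number gives a crude forward plot; a sharp change near the end of the search flags the studies that entered last as candidate outliers.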

Relevance:

20.00%

Abstract:

Coronary atherosclerosis has been considered a chronic disease characterized by ongoing progression in response to systemic risk factors and local pro-atherogenic stimuli. As our understanding of the pathobiological mechanisms implicated in atherogenesis and plaque progression is evolving, effective treatment strategies have been developed that led to substantial reduction of the clinical manifestations and acute complications of coronary atherosclerotic disease. More recently, intracoronary imaging modalities have enabled detailed in vivo quantification and characterization of coronary atherosclerotic plaque, serial evaluation of atherosclerotic changes over time, and assessment of vascular responses to effective anti-atherosclerotic medications. The use of intracoronary imaging modalities has demonstrated that intensive lipid lowering can halt plaque progression and may even result in regression of coronary atheroma when the highest doses of the most potent statins are used. While current evidence indicates the feasibility of atheroma regression and of reversal of presumed high-risk plaque characteristics in response to intensive anti-atherosclerotic therapies, these changes of plaque size and composition are modest and their clinical implications remain largely elusive. Growing interest has focused on achieving more pronounced regression of coronary plaque using novel anti-atherosclerotic medications, and more importantly on elucidating ways toward clinical translation of favorable changes of plaque anatomy into more favorable clinical outcomes for our patients.

Relevance:

20.00%

Abstract:

BACKGROUND Studies that systematically assess change in ulcerative colitis (UC) extent over time in adult patients are scarce. AIM To assess changes in disease extent over time and to evaluate clinical parameters associated with this change. METHODS Data from the Swiss IBD cohort study were analysed. We used logistic regression modelling to identify factors associated with a change in disease extent. RESULTS A total of 918 UC patients (45.3% females) were included. At diagnosis, UC patients presented with the following disease extent: proctitis [199 patients (21.7%)], left-sided colitis [338 patients (36.8%)] and extensive colitis/pancolitis [381 patients (41.5%)]. During a median disease duration of 9 [4-16] years, progression and regression were documented in 145 patients (15.8%) and 149 patients (16.2%), respectively. In addition, 624 patients (68.0%) had a stable disease extent. The following factors were identified to be associated with disease progression: treatment with systemic glucocorticoids [odds ratio (OR) 1.704, P = 0.025] and calcineurin inhibitors (OR: 2.716, P = 0.005). No specific factors were found to be associated with disease regression. CONCLUSIONS Over a median disease duration of 9 [4-16] years, about two-thirds of UC patients maintained the initial disease extent; the remaining one-third had experienced either progression or regression of the disease extent.

Relevance:

20.00%

Abstract:

The adult male golden hamster, when exposed to blinding (BL), short photoperiod (SP), or daily melatonin injections (MEL), demonstrates dramatic reproductive collapse. This collapse can be blocked by removal of the pineal gland prior to treatment. Reproductive collapse is characterized by a dramatic decrease in both testicular weight and serum gonadotropin titers. The present study was designed to examine the interactions of the hypothalamus and pituitary gland during testicular regression, and specifically to compare and contrast changes caused by the three commonly employed methods of inducing testicular regression (BL, SP, MEL). Hypothalamic LHRH content was altered by all three treatments. There was an initial increase in content of LHRH that occurred concomitantly with the decreased serum gonadotropin titers, followed by a precipitous decline in LHRH content which reflected the rapid increases in both serum LH and FSH which occur during spontaneous testicular recrudescence. In vitro pituitary responsiveness was altered by all three treatments: there was a decline in basal and maximally stimulatable release of both LH and FSH which paralleled the fall of serum gonadotropins. During recrudescence both basal and maximal release dramatically increased in a manner comparable to serum hormone levels. While all three treatments were equally effective in their ability to induce changes at all levels of the endocrine system, there were important temporal differences in the effects of the various treatments. Melatonin injections induced the most rapid changes in endocrine parameters, followed by exposure to short photoperiod. Blinding required the most time to induce the same changes. This study has demonstrated that pineal-mediated testicular regression is a process which involves dynamic changes in multiply-dependent endocrine relationships, and proper evaluation of these changes must be performed with specific temporal events in mind.

Relevance:

20.00%

Abstract:

Consider a nonparametric regression model Y=mu*(X) + e, where the explanatory variables X are endogenous and e satisfies the conditional moment restriction E[e|W]=0 w.p.1 for instrumental variables W. It is well known that in these models the structural parameter mu* is 'ill-posed' in the sense that the function mapping the data to mu* is not continuous. In this paper, we derive the efficiency bounds for estimating linear functionals E[p(X)mu*(X)] and int_{supp(X)}p(x)mu*(x)dx, where p is a known weight function and supp(X) the support of X, without assuming mu* to be well-posed or even identified.

Relevance:

20.00%

Abstract:

The Data Envelopment Analysis (DEA) efficiency score obtained for an individual firm is a point estimate without any confidence interval around it. In recent years, researchers have resorted to bootstrapping in order to generate empirical distributions of efficiency scores. This procedure assumes that all firms have the same probability of getting an efficiency score from any specified interval within the [0,1] range. We propose a bootstrap procedure that empirically generates the conditional distribution of efficiency for each individual firm given systematic factors that influence its efficiency. Instead of resampling directly from the pooled DEA scores, we first regress these scores on a set of explanatory variables not included at the DEA stage and bootstrap the residuals from this regression. These pseudo-efficiency scores incorporate the systematic effects of unit-specific factors along with the contribution of the randomly drawn residual. Data from the U.S. airline industry are utilized in an empirical application.
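The two-stage idea in this abstract (regress the scores on explanatory variables, then bootstrap the residuals) can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the DEA scores are already computed, uses ordinary least squares for the second stage, and does not say how scores outside [0,1] should be handled, since the abstract does not specify that. The names `pseudo_efficiency_scores`, `scores`, and `Z` are hypothetical.

```python
import numpy as np

def pseudo_efficiency_scores(scores, Z, n_boot=1000, seed=0):
    """Residual bootstrap of second-stage regression of scores on covariates Z."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Design matrix with an intercept column followed by the covariates.
    X = np.column_stack([np.ones(len(scores)), np.asarray(Z, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    fitted = X @ beta            # systematic, firm-specific part
    resid = scores - fitted      # pooled residuals to resample from
    # Each pseudo-score = firm's fitted value + a randomly drawn residual.
    draws = rng.choice(resid, size=(n_boot, len(scores)), replace=True)
    return fitted, resid, fitted + draws

# Toy data (made up): five firms, one explanatory variable.
scores = [0.62, 0.75, 0.81, 0.90, 0.97]
Z = [1.0, 2.0, 2.5, 3.5, 4.0]
fitted, resid, pseudo = pseudo_efficiency_scores(scores, Z, n_boot=200)
print(pseudo.shape)  # (200, 5): one empirical distribution per firm
```

Each column of `pseudo` is an empirical distribution centred on that firm's fitted value, which is the sense in which the distribution is conditional on the systematic factors rather than pooled across firms.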

Relevance:

20.00%

Abstract:

Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable. In this methodological study, four regression models are proposed. The first is the multinomial logistic model, the second is the adjacent-category logit model, the third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients’ confidence in completing colorectal cancer screening (CRCS). The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
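For orientation, the four models named in this abstract are commonly written as below. The notation is an assumption, not taken from the dissertation: $Y$ is a response with ordered categories $1,\dots,J$, $x$ is the covariate vector, and $j = 1,\dots,J-1$. Sign conventions and the choice of reference category vary between textbooks and software.

```latex
% Multinomial (baseline-category) logit: separate slopes \beta_j per category
\log\frac{P(Y = j \mid x)}{P(Y = J \mid x)} = \alpha_j + \beta_j^{\top} x

% Adjacent-category logit: common slope \beta across adjacent pairs
\log\frac{P(Y = j \mid x)}{P(Y = j+1 \mid x)} = \alpha_j + \beta^{\top} x

% Proportional odds (cumulative logit): common slope \beta across cut-points
\log\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j + \beta^{\top} x

% Continuation-ratio logit: category j versus all higher categories
\log\frac{P(Y = j \mid x)}{P(Y > j \mid x)} = \alpha_j + \beta^{\top} x
```

Only the first form ignores the ordering of the categories; the other three use it through the choice of which probabilities are compared, which is why their coefficients are interpreted differently.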

Relevance:

20.00%

Abstract:

Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. It also estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification were created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. The X1 0-1 misclassification rate was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the estimated odds ratio changed when the analysis included adjustment for misclassification.

Relevance:

20.00%

Abstract:

The purpose of this dissertation was to estimate HIV incidence among the individuals who had HIV tests performed at the Houston Department of Health and Human Services (HDHHS) public health laboratory, and to examine the prevalence of HIV and AIDS concurrent diagnoses among HIV cases reported between 2000 and 2007 in Houston/Harris County. The first study in this dissertation estimated the cumulative HIV incidence among the individuals testing at the Houston public health laboratory using Serologic Testing Algorithms for Recent HIV Seroconversion (STARHS) during the two-year study period (June 1, 2005 to May 31, 2007). The HIV incidence was estimated using two independently developed statistical imputation methods, one developed by the Centers for Disease Control and Prevention (CDC), and the other developed by HDHHS. Among the 54,394 persons who tested for HIV during the study period, 942 tested HIV positive (positivity rate = 1.7%). Of these HIV positives, 448 (48%) were newly reported to the Houston HIV/AIDS Reporting System (HARS), and 417 of these 448 blood specimens (93%) were available for STARHS testing. The STARHS results showed that 139 (33%) of the 417 specimens came from persons newly infected with HIV. Using both the CDC and HDHHS methods, the estimated cumulative HIV incidences over the two-year study period were similar: 862 per 100,000 persons (95% CI: 655-1,070) by the CDC method, and 925 per 100,000 persons (95% CI: 908-943) by the HDHHS method. Consistent with the national finding, this study found that African Americans and men who have sex with men (MSM) accounted for most of the new HIV infections among the individuals testing at the Houston public health laboratory.
Using the CDC statistical method, this study also found that the highest cumulative HIV incidence (2,176 per 100,000 persons [95% CI: 1,536-2,798]) was among those who tested in the HIV counseling and testing sites, compared to the sexually transmitted disease clinics (1,242 per 100,000 persons [95% CI: 871-1,608]) and city health clinics (215 per 100,000 persons [95% CI: 80-353]). This finding suggested that the HIV counseling and testing sites in Houston were successful in reaching high-risk populations and testing them early for HIV. In addition, older age groups had higher cumulative HIV incidence, but accounted for smaller proportions of new HIV infections. The incidence in the 30-39 age group (994 per 100,000 persons [95% CI: 625-1,363]) was 1.5 times the incidence in the 13-29 age group (645 per 100,000 persons [95% CI: 447-840]); the incidences in the 40-49 age group (1,371 per 100,000 persons [95% CI: 765-1,977]) and the 50 or above age group (1,369 per 100,000 persons [95% CI: 318-2,415]) were each 2.1 times the incidence in the youngest (13-29) age group. The increased HIV incidence in older age groups suggested that persons 40 or above were still at risk of contracting HIV infection. HIV prevention programs should encourage more people aged 40 and above to test for HIV. The second study investigated concurrent diagnoses of HIV and AIDS in Houston. Concurrent HIV/AIDS diagnosis is defined as AIDS diagnosis within three months of HIV diagnosis. This study found that about one-third of the HIV cases in Houston/Harris County were diagnosed with HIV and AIDS concurrently (within three months). Using multivariable logistic regression analysis, this study found that being male, Hispanic, older, and diagnosed in the private sector of care were positively associated with concurrent HIV and AIDS diagnoses. By contrast, men who had sex with men and also used injection drugs (MSM/IDU) were less likely to have concurrent HIV and AIDS diagnoses (odds ratio 0.64, 95% CI: 0.44-0.93).
A sensitivity analysis comparing different durations of elapsed time in the definition of concurrent HIV and AIDS diagnosis (1-month, 3-month, and 12-month cut-offs) showed that the choice of cut-off affected the size of the odds ratios, but not their direction. The results of these two studies, one describing characteristics of the individuals who were newly infected with HIV, and the other describing persons who were diagnosed with HIV and AIDS concurrently, can be used as a reference for HIV prevention program planning in Houston/Harris County.
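The percentages and incidence ratios quoted in this abstract follow directly from the reported counts and rates. The short check below only reproduces that arithmetic; the variable names are chosen for illustration, and all numbers are copied from the abstract.

```python
# Counts reported in the abstract.
tested, positive = 54_394, 942
newly_reported, available, recent = 448, 417, 139

positivity_pct = 100 * positive / tested             # share testing HIV positive
newly_reported_pct = 100 * newly_reported / positive # share newly reported to HARS
available_pct = 100 * available / newly_reported     # share available for STARHS
recent_pct = 100 * recent / available                # share classified as recent

# Cumulative incidence per 100,000 persons by age group (CDC method).
incidence = {"13-29": 645, "30-39": 994, "40-49": 1371, "50+": 1369}
ratios = {age: rate / incidence["13-29"] for age, rate in incidence.items()}

print(round(positivity_pct, 1))                      # 1.7
print(round(newly_reported_pct), round(available_pct), round(recent_pct))  # 48 93 33
print({age: round(r, 1) for age, r in ratios.items()})
```

The ratios for the 30-39, 40-49, and 50+ groups round to 1.5, 2.1, and 2.1, matching the multiples stated in the text.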

Relevance:

20.00%

Abstract:

Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that overall misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.