967 results for Instrumental variable regression


Relevance: 30.00%

Abstract:

Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this methodological study, we describe four logistic regression models for analyzing an ordinal response variable: the multinomial logistic model, the adjacent-category logit model, the proportional odds model, and the continuation-ratio model. We illustrate and compare the fit of these models using data from a survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in completing colorectal cancer screening (CRCS). The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) a sensitivity analysis by fitting and comparing different models.
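As an illustration of one of the four models named in this abstract, the following is a minimal sketch of how a proportional odds model turns cumulative logits into category probabilities. The parameterisation logit P(Y <= j | x) = cutpoint_j - beta * x, the function name, and the numeric values are assumptions made for this example; they are not taken from the study.

```python
import math


def proportional_odds_probs(cutpoints, beta, x):
    """Category probabilities under a proportional odds model.

    Assumed parameterisation: logit P(Y <= j | x) = cutpoints[j] - beta * x,
    with strictly increasing cutpoints. Some texts flip the sign of beta.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities P(Y <= j | x); the last category closes at 1.
    cumulative = [sigmoid(c - beta * x) for c in cutpoints] + [1.0]

    # Differences of consecutive cumulative probabilities give P(Y = j | x).
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


# Hypothetical values: three cutpoints define four ordered categories.
example_probs = proportional_odds_probs([-1.0, 0.5, 2.0], beta=0.8, x=1.0)
```

Because the same beta appears in every cumulative logit, the odds ratio for a one-unit change in x is identical at each cutpoint, which is the "proportional odds" assumption that a goodness-of-fit check would probe.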

Relevance: 30.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It performs well even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can handle large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high ratio of variables to observations, Random Forests complemented the established logistic regression. This study suggests that Random Forests is suitable for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect sizes of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
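The abstract does not spell out how the noise level is turned into a cut-off, so the following is only one plausible reading, sketched under stated assumptions: importance scores are already available (for example, mean decrease in accuracy from a fitted forest), a few deliberately random "noise probe" variables were included in the fit, and the cut-off is the largest importance any probe achieved. The function name, variable names, and scores are hypothetical toy values, not results from the thesis.

```python
def select_above_noise(importances, noise_names):
    """Keep predictors whose importance exceeds that of every noise probe.

    importances: dict mapping variable name -> importance score.
    noise_names: set of names known to be pure-noise probes (assumption).
    """
    # The cut-off is the best score achieved by a variable known to be noise.
    cutoff = max(importances[name] for name in noise_names)
    selected = [
        name
        for name, score in importances.items()
        if name not in noise_names and score > cutoff
    ]
    # Report the survivors from most to least important.
    selected.sort(key=lambda name: -importances[name])
    return cutoff, selected


# Toy, made-up scores purely to show the mechanics.
toy_importances = {
    "smoking": 0.120,
    "age": 0.045,
    "snp_17": 0.004,
    "noise_1": 0.006,
    "noise_2": 0.003,
}
cutoff, kept = select_above_noise(toy_importances, {"noise_1", "noise_2"})
```

In this toy run `snp_17` is dropped because it scores below a variable that is random by construction; a synthetic positive control, as mentioned in the abstract, could then be used to check whether the cut-off is too strict.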

Relevance: 30.00%

Abstract:

Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, a mathematical method for calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with the linear or categorical models, the fractional polynomial models, which had the higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations in which the correct model specification is unknown or complex. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.

Relevance: 30.00%

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of the available information and often introduces bias. Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed to be missing at random (MAR). One method (PD) used the predictive distribution as a weight to calculate the average of the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus, for the analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.

Relevance: 30.00%

Abstract:

The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
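To make the Newton-Raphson step concrete, here is a minimal sketch for the simpler binary case with a single predictor, not the polychotomous model the abstract derives. The function name, the toy data, and the stopping rule are assumptions for illustration only.

```python
import math


def fit_logistic_newton(xs, ys, iterations=25, tol=1e-10):
    """Maximum likelihood fit of logit P(y = 1 | x) = b0 + b1 * x.

    Each iteration solves I(b) * delta = score(b) for the 2x2 information
    matrix I and adds delta to the current estimate (Newton-Raphson).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Closed-form solution of the 2x2 linear system.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up, non-separable toy data so that a finite maximum exists.
toy_x = [0, 1, 2, 3, 4, 5, 6, 7]
toy_y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic_newton(toy_x, toy_y)
```

At convergence the score vector is approximately zero, which is the defining condition of the maximum likelihood estimate; the inverse of the information matrix at that point gives the usual asymptotic variance estimate.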

Relevance: 30.00%

Abstract:

logitcprplot can be used after logistic regression to graph a component-plus-residual plot (a.k.a. partial residual plot) for a given predictor, with a choice of smooth: lowess, local polynomial, restricted cubic spline, fractional polynomial, penalized spline, regression spline, running line, or adaptive variable span running line.

Relevance: 30.00%

Abstract:

In variable charge soils, anion retention and accumulation through adsorption at exchange sites is a competitive process. The objectives of this study in the wet tropics of far north Queensland were to investigate (i) whether the pre-existing high sulphate in variable charge soils had any impact on the retention of chloride and nitrate, derived mostly from the applied fertilizer; and (ii) whether chloride competed with nitrate during the adsorption processes. Soil cores up to 12.5 m deep were taken from seven sites, representing four soil types, in the Johnstone River Catchment. Six of these sites had been under sugarcane (Saccharum officinarum-S) cultivation for at least 50 years and one was an undisturbed rainforest. The cores were segmented at 1.0 m depth increments, and subsamples were analysed for nitrate-N, cation- and anion-exchange capacities (CEC and AEC), pH, exchangeable cations (Ca, Mg, K, Na), soil organic C (SOC), electrical conductivity (EC), sulphate-S, and chloride. The sulphate-S load at 1-12 m depth under cropping ranged from 9.4 to 73.9 t ha⁻¹ (mean = 40 t ha⁻¹) compared with 74.4 t ha⁻¹ in the rainforest. The chloride load under cropping ranged from 1.5 to 9.6 t ha⁻¹ (mean = 4.9 t ha⁻¹) compared with 0.9 t ha⁻¹ in the rainforest, and the nitrate-N load ranged from 113 to 2760 kg ha⁻¹ (mean = 910 kg ha⁻¹) under cropping compared with 12 kg ha⁻¹ in the rainforest. Regardless of the soil type, the total chloride or nitrate-N input in fertilisers was 7.5 t ha⁻¹ over the last 50 years. Sulphate-S in soil profiles decreased with depth at >2 m, whereas bulges of chloride or nitrate-N were observed at depths >2 m. This suggests that chloride or nitrate adsorption and retention increased with decreasing sulphate dominance. Abrupt decreases in the equivalent fraction of sulphate (EFSO4) at depths >2 m were accompanied by rapid increases in the equivalent fraction of chloride (EFCl), followed by that of nitrate (EFNO3). The stepwise regression for EFCl and EFNO3 indicated that nitrate retention was reduced by the pre-existing sulphate and imported chloride, whereas only sulphate reduced chloride adsorption. The results indicate that chloride and nitrate adsorption and retention occurred, in the order chloride > nitrate, in soils containing large amounts of sulphate under approximately similar total inputs of N- and Cl-fertilisers. © 2004 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Background and Objective: To examine whether commonly recommended assumptions for multivariable logistic regression are addressed in two major epidemiological journals. Methods: Ninety-nine articles from the Journal of Clinical Epidemiology and the American Journal of Epidemiology were surveyed for 10 criteria: six dealing with computation and four with reporting multivariable logistic regression results. Results: Three of the 10 criteria were addressed in 50% or more of the articles. Statistical significance testing or confidence intervals were reported in all articles. Methods for selecting independent variables were described in 82%, and specific procedures used to generate the models were discussed in 65%. Fewer than 50% of the articles indicated whether interactions were tested or met the recommended events per independent variable ratio of 10:1. Fewer than 20% of the articles described conformity to a linear gradient, examined collinearity, reported information on validation procedures, goodness-of-fit, or discrimination statistics, or provided complete information on variable coding. There was no significant difference (P > .05) in the proportion of articles meeting the criteria across the two journals. Conclusion: The articles reviewed frequently did not report commonly recommended assumptions for using multivariable logistic regression. © 2004 Elsevier Inc. All rights reserved.
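The events-per-variable criterion mentioned in this abstract is simple arithmetic, sketched below. Counting the less frequent outcome class as the "events" is a common convention and an assumption here, as are the function names and the example counts.

```python
def events_per_variable(n_events, n_non_events, n_predictors):
    """Events per independent variable (EPV) for a logistic model.

    Assumption: the limiting sample size is the smaller outcome class.
    """
    limiting = min(n_events, n_non_events)
    return limiting / n_predictors


def meets_epv_rule(n_events, n_non_events, n_predictors, threshold=10.0):
    """True when the model meets the 10:1 ratio cited in the abstract."""
    return events_per_variable(n_events, n_non_events, n_predictors) >= threshold


# Hypothetical study: 80 cases, 420 non-cases, 10 candidate predictors.
example_epv = events_per_variable(80, 420, 10)
example_ok = meets_epv_rule(80, 420, 10)
```

In this hypothetical study the ratio is 8 events per variable, so the model would fall short of the 10:1 guideline even though the total sample size is 500.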

Relevance: 30.00%

Abstract:

Long-term forecasts of pest pressure are central to the effective management of many agricultural insect pests. In the eastern cropping regions of Australia, serious infestations of Helicoverpa punctigera (Wallengren) and H. armigera (Hübner) (Lepidoptera: Noctuidae) are experienced annually. Regression analyses of a long series of light-trap catches of adult moths were used to describe the seasonal dynamics of both species. The size of the spring generation in eastern cropping zones could be related to rainfall in putative source areas in inland Australia. Subsequent generations could be related to the abundance of various crops in agricultural areas, rainfall, and the magnitude of the spring population peak. As rainfall figured prominently as a predictor variable, and can itself be predicted using the Southern Oscillation Index (SOI), trap catches were also related to this variable. The geographic distribution of each species was modelled in relation to climate, and CLIMEX was used to predict temporal variation in abundance at given putative source sites in inland Australia using historical meteorological data. These predictions were then correlated with subsequent pest abundance data in a major cropping region. The regression-based and bioclimatic-based approaches to predicting pest abundance are compared, and their utility in predicting and interpreting pest dynamics is discussed.

Relevance: 30.00%

Abstract:

Resilience is a construct that refers to the human ability to succeed in the face of life's adversities, to overcome them, and even to be strengthened or transformed by them. Fields of psychological research such as Health Psychology, Positive Psychology, and Positive Organizational Behavior have regarded resilience as an important route to understanding the positive and healthy aspects of individuals. This work sought to broaden knowledge about resilience and its relationships with other constructs in the organizational context. To that end, the general objective of the study was to verify the predictive capacity of intragroup conflict (task and relationship), social support at work (emotional, informational, and instrumental), and professional self-concept (health, fulfillment, self-confidence, and competence) on the resilience (adaptation to or positive acceptance of change, spirituality, resignation toward life, personal competence, and persistence in the face of difficulties) of military police officers. The participants were 133 military police officers from a battalion in the interior of the state of São Paulo, predominantly male (97.7%), with a mean age of 30 years (SD = 5.7). The variables were measured with the following validated scales: the short-form Resilience Assessment Scale, the Intragroup Conflict Scale, the Perceived Social Support at Work Scale, and the Professional Self-Concept Scale. The data were submitted to descriptive calculations and standard multiple linear regression analyses. The results indicated that the model combining the antecedent variables (intragroup conflict, social support at work, and professional self-concept) significantly explained the variance of the resilience dimensions: 30% of persistence in the face of difficulties, 29% of adaptation to or positive acceptance of change, 28% of personal competence, and 11% of spirituality. The variables with a statistically important impact on persistence in the face of difficulties were emotional support at work, which was an inverse predictor, and self-confidence, which was a direct predictor. Adaptation to or positive acceptance of change had health as an inverse predictor and self-confidence as a direct predictor. Personal competence was significantly affected by self-confidence, which was a direct predictor. Spirituality, in turn, had a single significant predictor, fulfillment, which was a direct predictor. The results suggest that, among the antecedent variables, professional self-concept showed the greatest power to explain the variance in resilience. These findings were discussed in light of the literature in the field. Finally, the limitations of the study were presented, along with a proposed research agenda to help confirm and extend the results of this investigation.

Relevance: 30.00%

Abstract:

It is generally assumed when using Bayesian inference methods for neural networks that the input data contain no noise or corruption. For real-world (errors-in-variables) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework that allows for input noise, provided that some model of the noise process exists. In the limit where this noise process is small and symmetric, it is shown, using the Laplace approximation, that there is an additional term in the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable and sampling it jointly with the network's weights using Markov chain Monte Carlo methods, it is demonstrated that it is possible to infer the unbiased regression over the noiseless input.

Relevance: 30.00%

Abstract:

Correlation and regression are two of the statistical procedures most widely used by optometrists. However, these tests are often misused or interpreted incorrectly, leading to erroneous conclusions from clinical experiments. This review examines the major statistical tests concerned with correlation and regression that are most likely to arise in clinical investigations in optometry. First, the use, interpretation, and limitations of Pearson's product moment correlation coefficient are described. Second, the least squares method of fitting a linear regression to data and methods for testing how well a regression line fits the data are described. Third, the problems of using linear regression methods in observational studies, when there are errors in measuring the independent variable, and of predicting a new value of Y for a given X are discussed. Finally, methods for testing whether a non-linear relationship provides a better fit to the data and for comparing two or more regression lines are considered.
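The two procedures named first in this abstract share the same building blocks, as the following minimal sketch shows. The function name and the toy data are assumptions for illustration; the formulas are the standard ones for Pearson's r and the least squares line.

```python
import math


def pearson_and_least_squares(xs, ys):
    """Return (r, slope, intercept) for paired observations.

    r is Pearson's product moment correlation coefficient; slope and
    intercept define the least squares regression of y on x.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept


# Toy data lying exactly on y = 1 + 2x, so the expected answer is known.
r, slope, intercept = pearson_and_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

Note that r is symmetric in x and y, whereas the regression of y on x is not the same line as the regression of x on y; this asymmetry is one reason measurement error in the independent variable matters for regression.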

Relevance: 30.00%

Abstract:

Researchers often use 3-way interactions in moderated multiple regression analysis to test the joint effect of 3 independent variables on a dependent variable. However, further probing of significant interaction terms varies considerably and is sometimes error prone. The authors developed a significance test for slope differences in 3-way interactions and illustrate its importance for testing psychological hypotheses. Monte Carlo simulations revealed that sample size, magnitude of the slope difference, and data reliability affected test power. Application of the test to published data yielded detection of some slope differences that were undetected by alternative probing techniques and led to changes of results and conclusions. The authors conclude by discussing the test's applicability for psychological research. Copyright 2006 by the American Psychological Association.

Relevance: 30.00%

Abstract:

Two types of prediction problem can be solved using a regression line viz., prediction of the ‘population’ regression line at the point ‘x’ and prediction of an ‘individual’ new member of the population ‘y1’ for which ‘x1’ has been measured. The second problem is probably the most commonly encountered and the most relevant to calibration studies. A regression line is likely to be most useful for calibration if the range of values of the X variable is large, if there is a good representation of the ‘x,y’ values across the range of X, and if several estimates of ‘y’ are made at each ‘x’. It is poor statistical practice to use a regression line for calibration or prediction beyond the limits of the data.
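The difference between the two prediction problems shows up in their standard errors. The sketch below uses the textbook formulas for simple linear regression with constant error variance; the function name and the toy calibration data are assumptions for illustration.

```python
import math


def prediction_standard_errors(xs, ys, x0):
    """Return (y_hat, se_line, se_individual) at the point x0.

    se_line is the standard error of the fitted 'population' line at x0;
    se_individual is the standard error for predicting one new observation.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance on n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_line = math.sqrt(s2 * leverage)
    se_individual = math.sqrt(s2 * (1.0 + leverage))
    return intercept + slope * x0, se_line, se_individual


# Made-up calibration data.
cal_x = [1, 2, 3, 4, 5]
cal_y = [2.1, 3.9, 6.2, 7.8, 10.1]
near = prediction_standard_errors(cal_x, cal_y, 3.0)   # inside the data range
far = prediction_standard_errors(cal_x, cal_y, 10.0)   # extrapolation
```

The individual prediction is always less certain than the line itself, and both standard errors grow with the distance of x0 from the mean of X, which is the quantitative reason for not extrapolating beyond the limits of the data.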

Relevance: 30.00%

Abstract:

Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straightforward to fit, as taking logarithms of the Y variable linearises the relationship, which can then be treated by the methods of linear regression.
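The linearisation described here can be sketched as follows: if Y = a * exp(b * X), then ln Y = ln a + b * X, so an ordinary least squares fit of ln Y on X recovers b as the slope and ln a as the intercept. The function name and the noise-free toy data are assumptions for illustration.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y).

    Requires every y to be strictly positive. Returns (a, b).
    """
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(log_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, log_ys))
    b = sxl / sxx                  # slope on the log scale
    ln_a = l_bar - b * x_bar       # intercept on the log scale
    return math.exp(ln_a), b


# Toy data generated exactly from y = 2 * exp(0.5 * x).
toy_xs = [0, 1, 2, 3]
toy_ys = [2.0 * math.exp(0.5 * x) for x in toy_xs]
a_hat, b_hat = fit_exponential(toy_xs, toy_ys)
```

One design consequence worth keeping in mind is that least squares on the log scale assumes multiplicative errors in Y; with additive errors, a direct non-linear fit can give different estimates.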