942 results for Logistic regression methodology
Abstract:
Graduate Program in Public Health - FMB
Abstract:
Static analysis tools report software defects that may or may not be detected by other verification methods. Two challenges complicating the adoption of these tools are spurious false positive warnings and legitimate warnings that are not acted on. This paper reports automated support to help address these challenges using logistic regression models that predict the foregoing types of warnings from signals in the warnings and implicated code. Because examining many potential signaling factors in large software development settings can be expensive, we use a screening methodology to quickly discard factors with low predictive power and cost-effectively build predictive models. Our empirical evaluation indicates that these models can achieve high accuracy in predicting accurate and actionable static analysis warnings, and suggests that the models are competitive with alternative models built without screening.
Abstract:
Graduate Program in Public Health - FMB
Abstract:
Background: Over the last century the incidence of cutaneous melanoma has increased worldwide, a trend that has also been observed in Brazil. The identified risk factors for melanoma include the pattern of sun exposure, family history, and certain phenotypic features. In addition, the incidence of melanoma might be influenced by ethnicity. Like many countries, Brazil has high immigration rates and consequently a heterogeneous population. However, Brazil is unique among such countries in that the ethnic heterogeneity of its population is primarily attributable to admixture. This study aimed to evaluate the contribution of European ethnicity to the risk of cutaneous melanoma in Brazil. Methodology/Principal Findings: We carried out a hospital-based case-control study in the metropolitan area of Sao Paulo, Brazil. We evaluated 424 hospitalized patients (202 melanoma patients and 222 control patients) regarding phenotypic features, sun exposure, and number of grandparents born in Europe. Through multivariate logistic regression analysis, we found the following variables to be independently associated with melanoma: grandparents born in Europe, specifically in Spain (OR = 3.01, 95% CI: 1.03-8.77), Italy (OR = 3.47, 95% CI: 1.41-8.57), a Germanic/Slavic country (OR = 3.06, 95% CI: 1.05-8.93), or >= 2 European countries (OR = 2.82, 95% CI: 1.06-7.47); eye color, specifically light brown (OR = 1.99, 95% CI: 1.14-3.84) and green/blue (OR = 4.62, 95% CI: 2.22-9.58); pigmented lesion removal (OR = 3.78, 95% CI: 2.21-6.49); no lifetime sunscreen use (OR = 3.08, 95% CI: 1.03-9.22); and lifetime severe sunburn (OR = 1.81, 95% CI: 1.03-3.19). Conclusions: Our results indicate that European ancestry is a risk factor for cutaneous melanoma. Such risk appears to be related not only to skin type, eye color, and tanning capacity but also to other specific characteristics of European populations introduced in the New World by European immigrants.
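The odds ratios and intervals reported above can be sanity-checked with a few lines of arithmetic. For a logistic regression coefficient β with standard error SE, OR = exp(β), and a 95% Wald interval is exp(β ± 1.96·SE), so on the log scale the interval is centered on the log odds ratio. The sketch below assumes Wald intervals were used (the abstract does not say); the function names are illustrative and not taken from the study.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z95):
    """Recover the standard error of log(OR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def implied_or(lower, upper):
    """Geometric midpoint of the interval; equals the OR for a Wald interval."""
    return math.sqrt(lower * upper)

# Reported for grandparents born in Italy: OR = 3.47, 95% CI 1.41-8.57.
se_italy = se_from_ci(1.41, 8.57)
print(round(implied_or(1.41, 8.57), 2), round(se_italy, 3))
```

If the geometric midpoint of a reported interval is far from the reported odds ratio, the interval was probably not a Wald interval, or a figure was mistyped.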
Abstract:
Within the nutritional context, microminerals are often supplemented in bird feed in quantities exceeding those required, in an attempt to ensure proper animal performance. Dose-response experiments are very common in determining optimal nutrient levels in balanced feed, and they rely on regression models to achieve this objective. Nevertheless, the regression analysis routine, generally, uses a priori information about a possible relationship between the response variable. Isotonic regression is a least-squares estimation method that generates estimates that preserve the ordering of the data. In the theory of isotonic regression this information is essential, and it is expected to increase fitting efficiency. The objective of this work was to use an isotonic regression methodology as an alternative way of analyzing data on Zn deposition in the tibia of male birds of the Hubbard lineage. We considered plateau response models of quadratic polynomial and linear exponential forms. In addition to these models, we also proposed fitting a logarithmic model to the data, and the efficiency of the methodology was evaluated by Monte Carlo simulations, considering different scenarios for the parametric values. The isotonization of the data yielded an improvement in all the fitting quality parameters evaluated. Among the models used, the logarithmic model yielded parameter estimates more consistent with the values reported in the literature.
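The "isotonization" step mentioned above can be illustrated with the pool-adjacent-violators algorithm (PAVA), the standard way to compute a least-squares fit constrained to be non-decreasing. This is a minimal sketch, not the authors' implementation; the function name and the sample values are hypothetical.

```python
def pava(y, w=None):
    """Non-decreasing least-squares fit to y (optionally weighted) via PAVA."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical mean responses at increasing doses; the dip at the third dose
# is pooled with its neighbour so the fitted values never decrease.
print(pava([1.0, 3.0, 2.0, 4.0]))
```

The isotonized values can then be passed to any parametric fit (plateau or logarithmic models, for example) in place of the raw responses.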
Abstract:
Background: In a classical study, Durkheim mapped suicide rates, wealth, and low family density and realized that they clustered in northern France. Assessing other variables, such as religious society, he constructed a framework for the analysis of suicide, which still allows international comparisons using the same basic methodology. The present study aims to identify possible significant clusters of suicide in the city of Sao Paulo and then verify their statistical associations with socio-economic and cultural characteristics. Methods: A spatial scan statistical test was performed to analyze the geographical pattern of suicide deaths of residents in the city of Sao Paulo by Administrative District, from 1996 to 2005. Relative risks and high and/or low clusters were calculated using spatial scan statistics, accounting for gender and age as covariates, to identify geographical patterns. Logistic regression was used to estimate associations with socioeconomic variables, considering the spatial cluster of high suicide rates as the response variable. Drawing from Durkheim's original work, current World Health Organization (WHO) reports and recent reviews, the following independent variables were considered: marital status, income, education, religion, and migration. Results: The mean suicide rate was 4.1/100,000 inhabitant-years. Against this baseline, two clusters were identified: the first, of increased risk (RR = 1.66), comprising 18 districts in the central region; the second, of decreased risk (RR = 0.78), including 14 districts in the southern region. The downtown area toward the southwestern region of the city displayed the highest risk for suicide, and though the overall risk may be considered low, the rate climbs to an intermediate level in this region. One logistic regression analysis contrasted the risk cluster (18 districts) against the remaining 78 districts, testing the effects of socioeconomic-cultural variables.
The following categories of proportion of persons within the clusters were identified as risk factors: singles (OR = 2.36), migrants (OR = 1.50), Catholics (OR = 1.37) and higher income (OR = 1.06). In a second logistic model, likewise conceived, the following categories of proportion of persons were identified as protective factors: married (OR = 0.49) and Evangelical (OR = 0.60). Conclusions: This risk/protection profile is in accordance with the interpretation that, as a social phenomenon, suicide is related to social isolation. Thus, the classical framework put forward by Durkheim seems to still hold, even though its categorical expression requires re-interpretation.
Abstract:
Introduction: The literature suggests that individuals with a history of cleft lip and palate who present with midfacial growth deficiency are at higher risk of lisping. The relationship between distortions during production of linguoalveolar fricative sounds and the severity of malocclusion, however, has not been established for the population with cleft. Objective: To study the association between lisping and dental arch relationship. Methodology: Speech samples and dental arch casts were obtained from 106 children with operated unilateral cleft lip and palate (UCLP) during the stage of mixed dentition and before orthodontic treatment. Videotaped productions of the phrase /u saci saiw sedu/ were rated by speech-language pathologists for the identification of lisping during [s]. Dental arch casts were rated by orthodontists using the Goslon Yardstick and the Five-Year Index to establish dental arch relationship. Results: Multiple logistic regression showed no significant association between lisping and dento-occlusal index (p = .802) or age (p = .662). Substantial interjudge agreement during auditory-perceptual ratings was found (kappa = .63). Almost perfect agreement was found between orthodontists while establishing the dental arch relationship (kappa = .81). Discussion: This study failed to reveal an association between lisping and dental arch relationship in children with operated UCLP. Multiple variables may play a role in determining occurrence of lisping, warranting further investigation.
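The agreement figures above are kappa statistics, and the labels "substantial" and "almost perfect" match the Landis and Koch bands (0.61-0.80 and 0.81-1.00). A minimal sketch of Cohen's kappa for two raters follows. It assumes the two-rater form of the statistic (the abstract does not say which variant was used), and the counts in the example table are hypothetical.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table: table[i][j] = items rated i by A, j by B."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)

def landis_koch(kappa):
    """Verbal label for a kappa value, following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical lisping ratings (present / absent) from two judges.
ratings = [[20, 5], [10, 15]]
print(round(cohen_kappa(ratings), 2), landis_koch(0.63), landis_koch(0.81))
```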
Abstract:
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far revealed no statistically significant difference in predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
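Several of the predictive-quality measures listed above follow directly from a 2x2 confusion matrix. The sketch below computes sensitivity, specificity, the predictive values, accuracy, and the Matthews correlation coefficient as one example of a correlation coefficient. The abstract does not say which coefficient the authors used, and the counts are hypothetical.

```python
import math

def classification_measures(tp, fn, fp, tn):
    """Standard measures for a binary classifier from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "mcc": (tp * tn - fp * fn) / denom,     # Matthews correlation coefficient
    }

# Hypothetical credit-scoring outcome: "positive" means predicted default.
measures = classification_measures(tp=40, fn=10, fp=5, tn=45)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that these measures depend on the threshold used to turn an estimated default probability into a class label, so two models with different probability distributions can still score alike on them.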
Abstract:
This study analyzes rural landscape changes. In particular, it focuses on understanding the driving forces acting on the rural built environment using a statistical spatial model implemented through GIS techniques. It is well known that the study of landscape changes is essential for informed decision making in land planning. A literature review reveals a general lack of studies dealing with the modeling of the rural built environment, and hence a theoretical modelling approach for this purpose is needed. Advances in technology and modernization in building construction and agriculture have gradually changed the rural built environment. In addition, the phenomenon of urbanization led to the construction of new volumes beside abandoned or derelict rural buildings. Consequently, two types of transformation dynamics mainly affecting the rural built environment can be observed: the conversion of rural buildings and the increase in the number of buildings. The specific aim of this study is to propose a methodology for developing a spatial model that allows the identification of the driving forces that acted on building allocation. In fact, one of the most concerning dynamics nowadays is the irrational expansion of building sprawl across the landscape. The proposed methodology is composed of conceptual steps that cover different aspects of developing a spatial model: the selection of a response variable that best describes the phenomenon under study, the identification of possible driving forces, the sampling methodology for data collection, the most suitable algorithm to adopt given the statistical theory and method used, and the calibration and evaluation of the model.
A different combination of factors in various parts of the territory generated more or less favourable conditions for building allocation, and the existence of buildings represents the evidence of such an optimum. Conversely, the absence of buildings expresses a combination of agents that is not suitable for building allocation. Presence or absence of buildings can be adopted as indicators of such driving conditions, since they express the action of driving forces in the land suitability sorting process. The existence of a correlation between site selection and hypothetical driving forces, evaluated by means of modeling techniques, provides evidence of which driving forces are involved in the allocation dynamic and insight into their level of influence on the process. GIS software, by means of spatial analysis tools, makes it possible to associate the concept of presence and absence with point features, generating a point process. Presence or absence of buildings at given site locations expresses the interaction of these driving factors. Presence points represent locations of real existing buildings; conversely, absence points represent locations where buildings do not exist, and so they are generated by a stochastic mechanism. Possible driving forces are selected and the existence of a causal relationship with building allocations is assessed through a spatial model. The adoption of empirical statistical models provides a mechanism for analyzing the explanatory variables and for identifying the key driving variables behind the site selection process for new building allocation. The model developed by following the methodology is applied to a case study to test the validity of the methodology. In particular, the study area used for testing is the New District of Imola, characterized by a prevailing agricultural production vocation and where transformation dynamics occurred intensively.
The development of the model involved the identification of predictive variables (related to the geomorphologic, socio-economic, structural and infrastructural systems of the landscape) capable of representing the driving forces responsible for landscape changes. The model is calibrated on spatial data for the periurban and rural parts of the study area over the 1975-2005 period by means of a generalized linear model. The output of the model fit is a continuous grid surface whose cells take values from 0 to 1, giving the probability of building occurrence across the rural and periurban parts of the study area. Hence the response variable assesses the changes in the rural built environment that occurred in this time interval and is related to the selected explanatory variables by means of a generalized linear model using logistic regression. By comparing the probability map obtained from the model with the actual rural building distribution in 2005, the interpretation capability of the model can be evaluated. The proposed model can also be applied to the interpretation of trends that occurred in other study areas, and to different time intervals, depending on the availability of data. The use of suitable data in terms of time, information, and spatial resolution, and the costs related to data acquisition, pre-processing, and survey, are among the most critical aspects of model implementation. Future in-depth studies can focus on using the proposed model to predict short/medium-range future scenarios for the rural built environment distribution in the study area. In order to predict future scenarios it is necessary to assume that the driving forces do not change and that their levels of influence within the model are not far from those assessed for the time interval used for the calibration.
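The probability surface described above can be sketched as follows: once a logistic GLM has been calibrated on presence/absence points, each grid cell's probability is the inverse logit of a linear combination of that cell's explanatory variables. The covariates, coefficients, and grid values below are hypothetical placeholders, not the fitted model from the study.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def probability_grid(intercept, coefs, layers):
    """Apply a fitted logistic model cell by cell.

    coefs:  dict mapping covariate name -> coefficient
    layers: dict mapping covariate name -> 2D list of cell values (same shape)
    """
    names = list(coefs)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [inverse_logit(intercept + sum(coefs[k] * layers[k][r][c] for k in names))
         for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical driving forces on a 2x2 grid.
layers = {
    "road_distance_km": [[0.1, 0.5], [2.0, 4.0]],
    "slope_deg": [[2.0, 3.0], [10.0, 25.0]],
}
coefs = {"road_distance_km": -0.8, "slope_deg": -0.05}  # assumed signs and sizes
surface = probability_grid(intercept=0.5, coefs=coefs, layers=layers)
for row in surface:
    print([round(p, 3) for p in row])
```

Comparing such a surface with the observed building locations, as the abstract describes for 2005, is then a matter of checking whether presence points fall in high-probability cells.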
Abstract:
Background: A healthy lifestyle including sufficient physical activity may mitigate or prevent adverse long-term effects of childhood cancer. We described daily physical activities and sports in childhood cancer survivors and controls, and assessed determinants of both activity patterns. Methodology/Principal Findings: The Swiss Childhood Cancer Survivor Study is a questionnaire survey including all children diagnosed with cancer 1976–2003 at age 0–15 years, registered in the Swiss Childhood Cancer Registry, who survived ≥5 years and reached adulthood (≥20 years). Controls came from the population-based Swiss Health Survey. We compared the two populations and determined risk factors for both outcomes in separate multivariable logistic regression models. The sample included 1058 survivors and 5593 controls (response rates 78% and 66%). Sufficient daily physical activities were reported by 52% (n = 521) of survivors and 37% (n = 2069) of controls (p < 0.001). In contrast, 62% (n = 640) of survivors and 65% (n = 3635) of controls reported engaging in sports (p = 0.067). Risk factors for insufficient daily activities in both populations were: older age (OR for ≥35 years: 1.5, 95% CI 1.2–2.0), female gender (OR 1.6, 95% CI 1.3–1.9), French/Italian speaking (OR 1.4, 95% CI 1.1–1.7), and higher education (OR for university education: 2.0, 95% CI 1.5–2.6). Risk factors for no sports were: being a survivor (OR 1.3, 95% CI 1.1–1.6), older age (OR for ≥35 years: 1.4, 95% CI 1.1–1.8), migration background (OR 1.5, 95% CI 1.3–1.8), French/Italian speaking (OR 1.4, 95% CI 1.2–1.7), lower education (OR for compulsory schooling only: 1.6, 95% CI 1.2–2.2), being married (OR 1.7, 95% CI 1.5–2.0), having children (OR 1.3, 95% CI 1.4–1.9), obesity (OR 2.4, 95% CI 1.7–3.3), and smoking (OR 1.7, 95% CI 1.5–2.1). Type of diagnosis was only associated with sports.
Conclusions/Significance: Physical activity levels in survivors were lower than recommended, but comparable to controls and mainly determined by socio-demographic and cultural factors. Strategies to improve physical activity levels could be similar to those for the general population.
Abstract:
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions.
Abstract:
This paper considers a wide class of semiparametric problems with a parametric part for some covariate effects and repeated evaluations of a nonparametric function. Special cases in our approach include marginal models for longitudinal/clustered data, conditional logistic regression for matched case-control studies, multivariate measurement error models, generalized linear mixed models with a semiparametric component, and many others. We propose profile-kernel and backfitting estimation methods for these problems, derive their asymptotic distributions, and show that in likelihood problems the methods are semiparametric efficient. While generally not true, with our methods profiling and backfitting are asymptotically equivalent. We also consider pseudolikelihood methods where some nuisance parameters are estimated from a different algorithm. The proposed methods are evaluated using simulation studies and applied to the Kenya hemoglobin data.
Abstract:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), which is core to the implementation of RC, in different spatial patterns and diameter distributions revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands except those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed.
I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. The effects of the variance estimate, especially in the application phase, justify future efforts to improve its accuracy. The second technique implemented and evaluated is a survival regression model that accounts for the time-dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Abstract:
The municipality of San Juan La Laguna, Guatemala is home to approximately 5,200 people and located on the western side of the Lake Atitlán caldera. Steep slopes surround all but the eastern side of San Juan. The Lake Atitlán watershed is susceptible to many natural hazards, but the most predictable are the landslides that can occur annually with each rainy season, especially during high-intensity events. Hurricane Stan hit Guatemala in October 2005; the resulting flooding and landslides devastated the Atitlán region. Locations of landslide and non-landslide points were obtained from field observations and orthophotos taken following Hurricane Stan. This study used data from multiple attributes, at every landslide and non-landslide point, and applied different multivariate analyses to optimize a model for landslide prediction during high-intensity precipitation events like Hurricane Stan. The attributes considered in this study are: geology, geomorphology, distance to faults and streams, land use, slope, aspect, curvature, plan curvature, profile curvature and topographic wetness index. The attributes were pre-evaluated for their ability to predict landslides using four different attribute evaluators, all available in the open source data mining software Weka: filtered subset, information gain, gain ratio and chi-squared. Three multivariate algorithms (decision tree J48, logistic regression and BayesNet) were optimized for landslide prediction using different attributes. The following statistical parameters were used to evaluate model accuracy: precision, recall, F measure and area under the receiver operating characteristic (ROC) curve. The algorithm BayesNet yielded the most accurate model and was used to build a probability map of landslide initiation points. The probability map developed in this study was also compared to the results of a bivariate landslide susceptibility analysis conducted for the watershed, encompassing Lake Atitlán and San Juan.
Landslides from Tropical Storm Agatha 2010 were used to independently validate this study’s multivariate model and the bivariate model. The ultimate aim of this study is to share the methodology and results with municipal contacts from the author's time as a U.S. Peace Corps volunteer, to facilitate more effective future landslide hazard planning and mitigation.
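The accuracy measures named in this abstract (precision, recall, F measure, and area under the ROC curve) can be computed without any library. The sketch below is a plain-Python illustration, not Weka's implementation. It uses the rank interpretation of AUC (the probability that a randomly chosen landslide point scores higher than a randomly chosen non-landslide point), and the counts and scores are hypothetical.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall and F measure (F1 when beta = 1) from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure

def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model output: predicted landslide probabilities.
landslide_scores = [0.9, 0.8, 0.3]
stable_scores = [0.7, 0.2]
print(precision_recall_f(tp=8, fp=2, fn=4))
print(roc_auc(landslide_scores, stable_scores))
```

The pairwise loop is quadratic in the number of points, which is fine for an illustration but slow for large point sets, where a sort-based rank computation is the usual choice.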
Abstract:
BACKGROUND: Propofol and sevoflurane display additivity for gamma-aminobutyric acid receptor activation, loss of consciousness, and tolerance of skin incision. Information about their interaction regarding electroencephalographic suppression is unavailable. This study examined this interaction as well as the interaction on the probability of tolerance of shake and shout and three noxious stimulations by using a response surface methodology. METHODS: Sixty patients preoperatively received different combined concentrations of propofol (0-12 microg/ml) and sevoflurane (0-3.5 vol.%) according to a crisscross design (274 concentration pairs, 3 to 6 per patient). After having reached pseudo-steady state, the authors recorded bispectral index, state and response entropy and the response to shake and shout, tetanic stimulation, laryngeal mask airway insertion, and laryngoscopy. For the analysis of the probability of tolerance by logistic regression, a Greco interaction model was used. For the separate analysis of bispectral index, state and response entropy suppression, a fractional Emax Greco model was used. All calculations were performed with NONMEM V (GloboMax LLC, Hanover, MD). RESULTS: Additivity was found for all endpoints. The Ce(50, PROP)/Ce(50, SEVO) was 3.68 microg/ml / 1.53 vol.% for bispectral index suppression, 2.34 microg/ml / 1.03 vol.% for tolerance of shake and shout, 5.34 microg/ml / 2.11 vol.% for tetanic stimulation, 5.92 microg/ml / 2.55 vol.% for laryngeal mask airway insertion, and 6.55 microg/ml / 2.83 vol.% for laryngoscopy. CONCLUSION: For both electroencephalographic suppression and tolerance to stimulation, the interaction of propofol and sevoflurane was identified as additive. The response surface data can be used for more rational dose finding for sequential administration and coadministration of propofol and sevoflurane.
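To make the additive finding concrete, the sketch below evaluates a Greco-type response surface for the probability of tolerance. It uses the commonly cited form, in which each concentration is normalized by its Ce50 and combined as U = Up + Us + α·Up·Us, with P = U^γ / (1 + U^γ). The exact parameterization fitted in NONMEM is not given in the abstract, so this form is an assumption, and the slope γ used here is a hypothetical value. The Ce50 values are those reported above for tolerance of shake and shout.

```python
def greco_probability(ce_prop, ce_sevo, ce50_prop, ce50_sevo, gamma, alpha=0.0):
    """Probability of tolerance under a Greco-type interaction model.

    alpha = 0 gives additivity; alpha > 0 synergy; alpha < 0 antagonism.
    """
    u_prop = ce_prop / ce50_prop   # propofol in units of its Ce50
    u_sevo = ce_sevo / ce50_sevo   # sevoflurane in units of its Ce50
    u = u_prop + u_sevo + alpha * u_prop * u_sevo
    if u <= 0.0:
        return 0.0
    u_gamma = u ** gamma
    return u_gamma / (1.0 + u_gamma)

CE50_PROP = 2.34  # microg/ml, reported for tolerance of shake and shout
CE50_SEVO = 1.03  # vol.%, reported for tolerance of shake and shout
GAMMA = 4.0       # hypothetical slope; not reported in the abstract

# Under additivity, half of each Ce50 together gives a 50% probability.
print(greco_probability(CE50_PROP / 2, CE50_SEVO / 2, CE50_PROP, CE50_SEVO, GAMMA))
```

With α = 0 the 50% isobole is the straight line joining the two Ce50 values, which is what "additive" means on the response surface.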