14 results for Logistic maps

in DigitalCommons@The Texas Medical Center


Relevance:

20.00%

Publisher:

Abstract:

This study investigates the degree to which gender, ethnicity, relationship to perpetrator, and geo-mapped socio-economic factors significantly predict the incidence of childhood sexual abuse, physical abuse and non-abuse. These variables are then linked to geographic identifiers using geographic information system (GIS) technology to develop a geo-mapping framework for child sexual and physical abuse prevention.

Relevance:

20.00%

Publisher:

Abstract:

In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size, but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of association between haplotypes and disease with the use of a Bayesian mathematical method to infer haplotypes that uses prior information from known genetic sources. Analysis based on haplotypes using information from publicly available genetic sources generally shows increased odds ratios and smaller p-values in all three of the Hodgkin, Glioma, and Lung data sets. For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR=12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer.
Future analysis for this work could incorporate the 1000 Genomes project, which offers a larger selection of SNPs that can be added to the information from known genetic sources. Other future analyses include testing non-binary outcomes, such as the levels of biomarkers that are present in lung cancer (NNK), and extending this analysis to full GWAS studies.

Relevance:

20.00%

Publisher:

Abstract:

There is a growing interest in the location of Treatment, Storage, and Disposal Facility (TSDF) sites in relation to minority communities. A number of studies have been completed, and their results have varied. Some of the studies have shown a strong positive correlation between the location of TSDF sites and minority populations, while a few have shown no significance in that relationship. The major difference between these studies has been in the areal unit used.

This study compared the minority populations of Texas census tracts and ZIP codes containing a TSDF using the associated county as the comparison population. The hypothesis of this study was that there was no difference between using census tracts and ZIP codes to analyze the relationship of minority populations and TSDFs. The census data used were from 1990, and the initial list of TSDF sites was supplied by the Texas Natural Resource Conservation Commission. The TSDF site locations were checked using geographic information system (GIS) programs in order to increase the accuracy of the identity of exposed ZIP codes and census tracts. The minority populations of the exposed areal units were compared using proportional differences, crosstables, maps, and logistic regression. The dependent variable used was the exposure status of the areal units under study, including counties, census tracts, and ZIP codes. The independent variables used included minority group proportion and grouping of the proportions, educational status, household income, and home value.

In all cases, education was significant or near significant at the .05 level. Education rather than minority proportion was therefore the most significant predictor of the exposure status of a census tract or ZIP code.

Relevance:

20.00%

Publisher:

Abstract:

The persistence of low birth weight and intrauterine growth retardation (IUGR) in the United States has puzzled researchers for decades. Much of the work that has been conducted on adverse birth outcomes has focused on low birth weight in general and not on IUGR. Studies that have examined IUGR specifically thus far have focused primarily on individual-level maternal risk factors. These risk factors have only been able to explain a small portion of the variance in IUGR. Therefore, recent work has begun to focus on community-level risk factors in addition to the individual-level maternal characteristics. This study uses Social Ecology to examine the relationship of individual and community-level risk factors and IUGR. Logistic regression was used to establish an individual-level model based on 155,856 births recorded in Harris County, TX during 1999-2001. IUGR was characterized using a fetal growth ratio method with race/ethnicity- and sex-specific mean birth weights calculated from national vital records. The spatial distributions of 114,460 birth records spatially located within the City of Houston were examined using choropleth, probability and density maps. Census tracts with higher than expected rates of IUGR and high levels of neighborhood disadvantage were highlighted. Neighborhood disadvantage was constructed using socioeconomic variables from the 2000 U.S. Census. Factor analysis was used to create a unified single measure. Lastly, a random coefficients model was used to examine the relationship between varying levels of community disadvantage, given the set of individual-level risk factors for 152,997 birth records spatially located within Harris County, TX. Neighborhood disadvantage was measured using three different indices adapted from previous work. The findings show that pregnancy-induced hypertension, previous preterm infant, tobacco use and insufficient weight gain have the highest association with IUGR.
Neighborhood disadvantage only slightly further increases the risk of IUGR (OR 1.12 to 1.23). Although community-level disadvantage only helped to explain a small proportion of the variance of IUGR, it did have a significant impact. This finding suggests that community-level risk factors should be included in future work with IUGR and that more work needs to be conducted.

Relevance:

20.00%

Publisher:

Abstract:

Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this study, we describe four logistic regression models for analyzing an ordinal response variable.

In this methodological study, four regression models are proposed. The first is the multinomial logistic model. The second is the adjacent-category logit model. The third is the proportional odds model, and the fourth is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over) to study patients' confidence in the completion of colorectal cancer screening (CRCS).

The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. The four ordinal logistic models used in this study are (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
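For reference, the four model families named in this abstract are commonly written as shown below for an ordinal response $Y$ with categories $1,\dots,J$ and covariate vector $\mathbf{x}$. These are standard textbook forms and not necessarily the parameterization used in the study; the symbols $\alpha_j$, $\boldsymbol{\beta}$, and $\pi_j = P(Y = j \mid \mathbf{x})$ are notation assumed here, and sign conventions vary between references.

```latex
\begin{align*}
\text{Multinomial (baseline-category):}\quad
  & \log\frac{\pi_j}{\pi_J} = \alpha_j + \boldsymbol{\beta}_j^{\top}\mathbf{x},
  & j &= 1,\dots,J-1 \\
\text{Adjacent-category:}\quad
  & \log\frac{\pi_{j+1}}{\pi_j} = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x},
  & j &= 1,\dots,J-1 \\
\text{Proportional odds:}\quad
  & \log\frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})} = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x},
  & j &= 1,\dots,J-1 \\
\text{Continuation-ratio:}\quad
  & \log\frac{P(Y = j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})} = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x},
  & j &= 1,\dots,J-1
\end{align*}
```

Only the multinomial form lets the slope vector differ by category; the other three share a single $\boldsymbol{\beta}$ and differ in which pair of outcome sets each logit contrasts.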

Relevance:

20.00%

Publisher:

Abstract:

Purpose. To examine the association between living in proximity to Toxics Release Inventory (TRI) facilities and the incidence of childhood cancer in the State of Texas.

Design. This is a secondary data analysis utilizing the publicly available Toxics Release Inventory (TRI), maintained by the U.S. Environmental Protection Agency, which lists the facilities that release any of the 650 TRI chemicals. Total childhood cancer cases and childhood cancer rates (age 0-14 years) by county for the years 1995-2003 were obtained from the Texas Cancer Registry, available at the Texas Department of State Health Services website. Setting. This study was limited to the child population of the State of Texas.

Method. Analysis was done using Stata version 9 and SPSS version 15.0. SaTScan was used for geographical spatial clustering of childhood cancer cases based on county centroids using the Poisson clustering algorithm, which adjusts for population density. Pictorial maps were created using MapInfo Professional version 8.0.

Results. One hundred and twenty-five counties had no TRI facilities in their region, while 129 counties had at least one TRI facility. An increasing trend for number of facilities and total disposal was observed except for the highest category based on cancer rate quartiles. Linear regression analysis using log transformation for number of facilities and total disposal in predicting cancer rates was computed; however, neither of these variables was found to be a significant predictor. Seven significant geographical spatial clusters of counties for high childhood cancer rates (p<0.05) were indicated. Binomial logistic regression, categorizing the cancer rate into two groups (<=150 and >150), indicated an odds ratio of 1.58 (CI 1.127, 2.222) for the natural log of number of facilities.

Conclusion. We have used a unique methodology by combining GIS and spatial clustering techniques with existing statistical approaches in examining the association between living in proximity to TRI facilities and the incidence of childhood cancer in the State of Texas. Although a concrete association was not indicated, further studies examining specific TRI chemicals are required. Use of this information can enable researchers and the public to identify potential concerns, gain a better understanding of potential risks, and work with industry and government to reduce toxic chemical use, disposal or other releases and the risks associated with them. TRI data, in conjunction with other information, can be used as a starting point in evaluating exposures and risks.
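The odds ratio of 1.58 (CI 1.127, 2.222) reported in the Results can be related back to a logistic regression coefficient. The sketch below is illustrative only: it assumes the interval is a 95% Wald interval (the abstract does not say), and the function names are hypothetical.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error into an
    odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coefficient_from_ci(lower, upper, z=1.96):
    """Recover the coefficient and standard error implied by a Wald
    interval (assumes the interval is symmetric on the log scale)."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Interval quoted in the abstract for ln(number of facilities).
beta, se = coefficient_from_ci(1.127, 2.222)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta, se)
print(round(odds_ratio, 2), round(ci_low, 3), round(ci_high, 3))
```

Because the predictor is the natural log of the number of facilities, the odds ratio applies to an e-fold increase in facilities; under the same assumptions, doubling the number of facilities would correspond to an odds ratio of about 1.37 (that is, 1.58 raised to the power ln 2).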

Relevance:

20.00%

Publisher:

Abstract:

Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. This study also estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification were created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. X1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis included adjustment for misclassification.

Relevance:

20.00%

Publisher:

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias.

Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to calculate the average of the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus, for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.

Relevance:

20.00%

Publisher:

Abstract:

The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
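To make the Newton-Raphson procedure mentioned in this abstract concrete, here is a minimal sketch for the simplest case: a binary response with an intercept and one covariate. It is not the polychotomous derivation from the thesis; the function name and the toy data are assumptions chosen for illustration.

```python
import math

def fit_logistic_newton(x, y, iterations=25, tol=1e-10):
    """Newton-Raphson maximum likelihood for logit P(y=1) = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector and the observed information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (hypothetical): 1 of 4 events when x=0, 3 of 4 events when x=1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1)
```

With a single binary covariate the maximum likelihood estimates have a closed form (the intercept is the log odds in the x=0 group and the slope is the log odds ratio), which gives a convenient check on the iteration.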

Relevance:

20.00%

Publisher:

Abstract:

Invasive pneumococcal disease (IPD) causes a significant health burden in the US, is responsible for the majority of bacterial meningitis, and causes more deaths than any other vaccine-preventable bacterial disease in the US. The estimated national IPD rate is 14.3 cases per 100,000 population with a case-fatality rate of 1.5 cases per 100,000 population. Although cases of IPD are routinely reported to the local health department in Harris County, Texas, the incidence (IR) and case-fatality (CFR) rates have not been reported. Additionally, it is important to know which serotypes of S. pneumoniae are circulating in Harris County, Texas and to determine if ‘replacement disease’ is occurring.

This study reported incidence and case-fatality rates from 2003 to 2009, and described the trends in IPD, including the IPD serotypes circulating in Harris County, Texas during the study period, particularly in 2008 and 2010. Annual incidence rates were calculated and reported for 2003 to 2009, using complete surveillance-year data.

Geographic information system (GIS) software was used to create a series of maps of the data reported during the study period. Cluster and outlier analysis and hot spot analysis were conducted using both case counts by census tract and disease rate by census tract.

IPD age- and race-adjusted IRs for Harris County, Texas and their 95% confidence intervals (CIs) were 1.40 (95% CI 1.0, 1.8), 1.71 (95% CI 1.24, 2.17), 3.13 (95% CI 2.48, 3.78), 3.08 (95% CI 2.43, 3.74), 5.61 (95% CI 4.79, 6.43), 8.11 (95% CI 7.11, 9.1), and 7.65 (95% CI 6.69, 8.61) for the years 2003 to 2009, respectively (rates were age- and race-adjusted to each year's midyear US population estimates). A Poisson regression model demonstrated a statistically significant increasing trend of about 32 percent per year in the IPD rates over the course of the study period. IPD age- and race-adjusted case-fatality rates (CFR) for Harris County, Texas were also calculated and reported.
A Poisson regression model demonstrated a statistically significant increasing trend of about 26 percent per year in the IPD case-fatality rates from 2003 through 2009. A logistic regression model associated the risk of dying from IPD with alcohol abuse (OR 4.69, 95% CI 2.57, 8.56) and with meningitis (OR 2.42, 95% CI 1.46, 4.03).

The prevalence of non-vaccine serotypes (NVT) among IPD cases with serotyped isolates was 98.2 percent. In 2008, the year with the sample most geographically representative of all areas of Harris County, Texas, the prevalence was 96 percent. Given these findings, it is reasonable to conclude that ‘replacement disease’ is occurring in Harris County, Texas, meaning that the majority of IPD is caused by serotypes not included in the PCV7 vaccine. In addition, IPD rates increased during the study period in Harris County, Texas.
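As a rough consistency check on the reported incidence trend, a per-year percentage increase from a Poisson (log-linear) model corresponds to a coefficient on calendar year of ln(1 + increase), and it compounds multiplicatively over time. The sketch below uses only the figures quoted in the abstract; the variable names are hypothetical and this is not the authors' fitted model.

```python
import math

annual_increase = 0.32                    # "about 32 percent per year"
beta = math.log(1 + annual_increase)      # implied log-linear coefficient per year

adjusted_ir = {2003: 1.40, 2009: 7.65}    # adjusted incidence rates from the abstract
years_elapsed = 2009 - 2003

# Fold change implied by the trend versus the fold change between endpoints.
predicted_fold = math.exp(beta * years_elapsed)
observed_fold = adjusted_ir[2009] / adjusted_ir[2003]
print(round(predicted_fold, 2), round(observed_fold, 2))
```

The two fold changes are of similar size, which is what one would expect if the yearly rates roughly follow the fitted exponential trend; the endpoints alone are a crude comparison and ignore the intermediate years.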

Relevance:

20.00%

Publisher:

Abstract:

The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is a recognized carcinogen for lung cancer. Since the cytokinesis-blocked micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important factor for predicting lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined.

This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed for very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model, including transition probabilities of this chain as covariates, was established. Maximum likelihood estimation was applied to carry out the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model.

Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated that the extent of DNA damage/non-damage measured using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.

Relevance:

20.00%

Publisher:

Abstract:

Children who experience early pubertal development have an increased risk of developing cancer (breast, ovarian, and testicular), osteoporosis, insulin resistance, and obesity as adults. Early pubertal development has been associated with depression, aggressiveness, and increased sexual prowess. Possible explanations for the decline in age of pubertal onset include genetics, exposure to environmental toxins, better nutrition, and a reduction in childhood infections. In this study we (1) evaluated the association between 415 single nucleotide polymorphisms (SNPs) from hormonal pathways and early puberty, defined as menarche prior to age 12 in females and Tanner Stage 2 development prior to age 11 in males, and (2) measured endocrine hormone trajectories (estradiol, testosterone, and DHEAS) in relation to age, race, and Tanner Stage in a cohort of children from Project HeartBeat! At the end of the 4-year study, 193 females had onset of menarche and 121 males had pubertal staging at age 11. African American females had a younger mean age at menarche than Non-Hispanic White females. African American females and males had a lower mean age at each pubertal stage (1-5) than Non-Hispanic White females and males. African American females had higher mean BMI measures at each pubertal stage than Non-Hispanic White females. Of the 415 SNPs evaluated in females, 22 SNPs were associated with early menarche when adjusted for race (p<0.05), but none remained significant after adjusting for multiple testing by False Discovery Rate (p<0.00017). In males, 17 SNPs were associated with early pubertal development when adjusted for race (p<0.05), but none remained significant when adjusted for multiple testing (p<0.00017).

There were 4955 hormone measurements taken during the 4-year study period from 632 African American and Non-Hispanic White males and females.
On average, African American females started and ended the pubertal process at a younger age than Non-Hispanic White females. The mean age of Tanner Stage 2 breast development in African American and Non-Hispanic White females was 9.7 (S.D.=0.8) and 10.2 (S.D.=1.1) years, respectively. There was a significant difference by race in mean age for each pubertal stage, except Tanner Stage 1 for pubic hair development. Both estradiol and DHEAS levels in females varied significantly with age, but not by race. Estradiol and DHEAS levels increased from Tanner Stage 1 to Tanner Stage 5.

African American males had a lower mean age at each Tanner Stage of development than Non-Hispanic White males. The mean age of Tanner Stage 2 genital development in African American and Non-Hispanic White males was 10.5 (S.D.=1.1) and 10.8 (S.D.=1.1) years, respectively, but this difference was not significant (p=0.11). Testosterone levels varied significantly with age and race. Non-Hispanic White males had higher levels of testosterone than African American males from Tanner Stage 1-4. Testosterone levels increased for both races from Tanner Stage 1 to Tanner Stage 5. Testosterone levels had the steepest increase from ages 11-15 for both races. DHEAS levels in males varied significantly with age, but not by race. DHEAS levels had the steepest increase from ages 14-17.

In conclusion, African American males and females experience pubertal onset at a younger age than Non-Hispanic White males and females, but in this study, we could not find a specific gene that explained the observed variation in age of pubertal onset. Future studies with larger study populations may provide a better understanding of the contribution of genes in early pubertal onset.
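The abstract reports that no SNP remained significant after a False Discovery Rate adjustment but does not say which FDR procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used FDR method; the function name and the example p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= q * k / m.
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for four tests.
rejected = benjamini_hochberg([0.001, 0.02, 0.04, 0.5], q=0.05)
print(rejected)
```

With many tests, the threshold for the smallest p-value becomes very strict, which is how a set of nominally significant SNPs (p<0.05) can all fail to survive adjustment.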

Relevance:

20.00%

Publisher:

Abstract:

Dengue fever is a strictly human and non-human primate disease characterized by a high fever, thrombocytopenia, retro-orbital pain, and severe joint and muscle pain. Over 40% of the world population is at risk. Recent re-emergence of dengue outbreaks in Texas and Florida following the re-introduction of competent Aedes mosquito vectors in the United States has raised growing concerns about the potential for increased occurrences of dengue fever outbreaks throughout the southern United States. Current deficiencies in vector control, active surveillance and awareness among medical practitioners may contribute to a delay in recognizing and controlling a dengue virus outbreak. Previous studies have shown links between low-income census tracts, high population density, and dengue fever within the United States. Areas of low income and high population density that correlate with the distribution of Aedes mosquitoes have a higher potential for outbreaks. In this retrospective ecologic study, nine maps were generated to model U.S. census tracts’ potential to sustain dengue virus transmission if the virus were introduced into the area. Variables in the model included presence of a competent vector in the county and census tract percent poverty and population density. Thirty states, 1,188 counties, and 34,705 census tracts were included in the analysis. Among counties with Aedes mosquito infestation, the census tracts were ranked as having high, moderate, or low risk potential for sustained transmission of the virus. High risk census tracts were identified as areas having the vector, ≥20% poverty, and ≥500 persons per square mile. Census tracts that have the vector present and either ≥20% poverty or ≥500 persons per square mile are considered moderate risk. Census tracts that have the vector present but have <20% poverty and <500 persons per square mile are considered low risk.
Furthermore, counties were characterized as moderate risk if 50% or more of the census tracts in that county were rated high or moderate risk, and high risk if 25% or more were rated high risk. Extreme risk counties, which were primarily concentrated in Texas and Mississippi, were defined as having 50% or more of their census tracts ranked as high risk. Mapping of geographic areas with potential to sustain dengue virus transmission will support surveillance efforts and assist medical personnel in recognizing potential cases.
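The tract-level and county-level rules described in this abstract can be sketched as two small functions. The thresholds come from the abstract; the function names, the order in which overlapping county categories are checked (most severe first), and the "low" label for counties meeting none of the stated thresholds are assumptions made for illustration.

```python
def tract_risk(vector_present, pct_poverty, persons_per_sq_mile):
    """Classify a census tract's potential to sustain dengue transmission."""
    if not vector_present:
        return None  # only tracts in counties with the vector are ranked
    poor = pct_poverty >= 20
    dense = persons_per_sq_mile >= 500
    if poor and dense:
        return "high"
    if poor or dense:
        return "moderate"
    return "low"

def county_risk(tract_levels):
    """Classify a county from the risk levels of its census tracts."""
    ranked = [level for level in tract_levels if level is not None]
    if not ranked:
        return None
    n = len(ranked)
    share_high = sum(level == "high" for level in ranked) / n
    share_high_or_moderate = sum(level in ("high", "moderate") for level in ranked) / n
    # Assumption: categories overlap in the abstract, so test the most severe first.
    if share_high >= 0.50:
        return "extreme"
    if share_high >= 0.25:
        return "high"
    if share_high_or_moderate >= 0.50:
        return "moderate"
    return "low"  # assumption: the abstract does not name this remaining category

# Hypothetical county with four tracts.
tracts = [tract_risk(True, 25, 600), tract_risk(True, 25, 100),
          tract_risk(True, 10, 100), tract_risk(True, 5, 50)]
print(tracts, county_risk(tracts))
```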

Relevance:

20.00%

Publisher:

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.

The study found that the $C_g^*$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations.
Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal size groups by separating ties should be avoided.
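For readers unfamiliar with the statistic, the sketch below computes a Hosmer-Lemeshow style statistic by grouping fitted probabilities into quantiles and comparing observed with expected event counts. It is a simplified illustration and not the simulation code from the study: the function name is hypothetical, fitted probabilities are assumed to lie strictly between 0 and 1, and tied probabilities are kept in the same group, in line with the recommendation above.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, degrees of freedom) for quantile-grouped data."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    used_groups = 0
    start = 0
    for g in range(1, groups + 1):
        end = max(start, g * n // groups)
        # Keep tied probabilities together instead of splitting them.
        while 0 < end < n and pairs[end][0] == pairs[end - 1][0]:
            end += 1
        chunk = pairs[start:end]
        start = end
        if not chunk:
            continue
        used_groups += 1
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    # Conventionally compared with a chi-square on (groups - 2) degrees of freedom.
    return stat, used_groups - 2

# Hypothetical fitted probabilities and outcomes, two groups for brevity.
probs = [0.25] * 4 + [0.75] * 4
well_calibrated = hosmer_lemeshow(probs, [0, 0, 0, 1, 0, 1, 1, 1], groups=2)
poorly_calibrated = hosmer_lemeshow(probs, [1, 1, 1, 1, 0, 1, 1, 1], groups=2)
print(well_calibrated, poorly_calibrated)
```

A statistic near zero indicates that observed and expected counts agree within each group; larger values indicate lack of fit, subject to the power limitations the abstract describes.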