13 results for Prediction Models for Air Pollution
in DigitalCommons@The Texas Medical Center
Abstract:
Many studies have shown relationships between air pollution and the rate of hospital admissions for asthma. A few studies have controlled for age-specific effects by adding separate smoothing functions for each age group. However, it has not yet been reported whether air pollution effects are significantly different for different age groups. This lack of information is the motivation for this study, which tests the hypothesis that air pollution effects on asthmatic hospital admissions are significantly different by age groups. Each air pollutant's effect on asthmatic hospital admissions by age groups was estimated separately. In this study, daily time-series data for hospital admission rates from seven cities in Korea from June 1999 through 2003 were analyzed. The outcome variable, daily hospital admission rates for asthma, was related to five air pollutants which were used as the independent variables, namely particulate matter <10 micrometers (μm) in aerodynamic diameter (PM10), carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2). Meteorological variables were considered as confounders. Admission data were divided into three age groups: children (<15 years of age), adults (ages 15-64), and elderly (≥ 65 years of age). The adult age group was considered to be the reference group for each city. In order to estimate age-specific air pollution effects, the analysis was separated into two stages. In the first stage, Generalized Additive Models (GAMs) with cubic spline for smoothing were applied to estimate the age-city-specific air pollution effects on asthmatic hospital admission rates by city and age group. In the second stage, the Bayesian Hierarchical Model with non-informative prior which has large variance was used to combine city-specific effects by age groups. The hypothesis test showed that the effects of PM10, CO and NO2 were significantly different by age groups. 
Taking the air pollution effect for adults as the zero reference, the age-specific air pollution effects were: -0.00154 (95% confidence interval (CI) = (-0.0030, -0.0001)) for children and 0.00126 (95% CI = (0.0006, 0.0019)) for the elderly for PM10; -0.0195 (95% CI = (-0.0386, -0.0004)) for children for CO; and 0.00494 (95% CI = (0.0028, 0.0071)) for the elderly for NO2. Relative rates (RRs) were 1.008 (95% CI = 1.000-1.017) in adults and 1.021 (95% CI = 1.012-1.030) in the elderly for every 10 μg/m3 increase in PM10; 1.019 (95% CI = 1.005-1.033) in adults and 1.022 (95% CI = 1.012-1.033) in the elderly for every 0.1 part per million (ppm) increase in CO; and 1.006 (95% CI = 1.002-1.009) and 1.019 (95% CI = 1.007-1.032) in the elderly for every 1 part per billion (ppb) increase in NO2 and SO2, respectively. Asthma hospital admissions increased significantly with PM10 and CO in adults, and with PM10, CO, NO2 and SO2 in the elderly.
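Relative rates such as those reported above are usually obtained by exponentiating a coefficient from a log-linear model: for a coefficient β per unit of pollutant and an exposure increment Δ, RR = exp(β·Δ). The sketch below illustrates that conversion under the assumption of a log link, which the abstract does not state explicitly. The function name, coefficient, and standard error are hypothetical and are not taken from the study.

```python
import math

def relative_rate(beta, se, increment, z=1.96):
    """Convert a log-scale coefficient into a relative rate with a Wald CI.

    beta and se are per one unit of pollutant on the log scale;
    increment is the exposure change of interest (e.g. 10 ug/m3).
    """
    rr = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return rr, lower, upper

# Hypothetical values for illustration only (not from the abstract):
# a coefficient of 0.002 per ug/m3 with standard error 0.0005.
rr, lower, upper = relative_rate(beta=0.002, se=0.0005, increment=10)
print(f"RR per 10 ug/m3: {rr:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

A coefficient of zero maps to a relative rate of exactly 1, which is why a confidence interval that excludes 1 is read as a significant effect.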
Abstract:
An investigation was undertaken to characterize the chemical composition of inhalable particulate matter in the Houston area, with special emphasis on source identification and apportionment of outdoor and indoor atmospheric aerosols using multivariate statistical analyses. Fine (<2.5 μm) particle aerosol samples were collected with dichotomous samplers at two fixed-site ambient monitoring stations (Clear Lake and Sunnyside) and one mobile monitoring van in the Houston area during June-October 1981 as part of the Houston Asthma Study. The mobile van allowed particulate sampling both inside and outside twelve homes. The samples, collected over 12-h periods on a 7 AM-7 PM and 7 PM-7 AM (CDT) schedule, were analyzed for mass, trace elements, and two anions. Mass was determined gravimetrically. An energy-dispersive X-ray fluorescence (XRF) spectrometer was used to determine elemental composition. Ion chromatography (IC) was used to determine sulfate and nitrate. Average chemical compositions of fine aerosol at each site were presented. Sulfate was found to be the largest single component of the fine fraction mass, comprising approximately 30% of the fine mass outdoors and 12% indoors. Principal components analysis (PCA) was applied to identify sources of aerosols and to assess the role of meteorological factors in the variation in particulate samples. The results suggested that meteorological parameters were not associated with sources of aerosol samples collected at these Houston sites. Source factor contributions to fine mass were calculated using a combination of PCA and stepwise multivariate regression analysis. It was found that much of the total fine mass was apparently contributed by sulfate-related aerosols.
The average contributions of sulfate-related aerosols to the fine mass were 56% of the Houston outdoor ambient fine particulate matter and 26% of the indoor fine particulate matter. The characterization of indoor aerosol in residential environments was compared with the results for outdoor aerosols. The comparison suggested that much of the indoor aerosol may be due to outdoor sources, but there may be important contributions from common indoor sources in the home environment, such as smoking and gas cooking.
Abstract:
The association between fine particulate matter air pollution (PM2.5) and cardiovascular disease (CVD) mortality was spatially analyzed for Harris County, Texas, at the census tract level. The objective was to assess how increased PM2.5 exposure related to CVD mortality in this area while controlling for race, income, education, and age. An estimated exposure raster was created for Harris County using kriging to estimate the PM2.5 exposure at the census tract level. The PM2.5 exposure and the CVD mortality rates were analyzed in an Ordinary Least Squares (OLS) regression model and the residuals were subsequently assessed for spatial autocorrelation. Race, median household income, and age were all found to be significant (p<0.05) predictors in the model. This study found that for every 1 μg/m3 increase in PM2.5 exposure, holding age and education variables constant, an increase of 16.57 CVD deaths per 100,000 would be predicted for increased minimum exposure values and an increase of 14.47 CVD deaths per 100,000 would be predicted for increased maximum exposure values. This finding supports previous studies associating PM2.5 exposure with CVD mortality. This study further identified the areas of greatest PM2.5 exposure in Harris County as being the geographical locations of populations with the highest risk of CVD (i.e., predominantly older, low-income populations with a predominance of African Americans). The magnitude of the effect of PM2.5 exposure on CVD mortality rates in the study region indicates a need for further community-level studies in Harris County, and suggests that reducing excess PM2.5 exposure would reduce CVD mortality.
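The abstract does not say which statistic was used to assess the regression residuals for spatial autocorrelation. Moran's I is one common choice, and the sketch below computes it from a list of residuals and a spatial weight matrix. The residual values and the four-tract adjacency layout are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between neighbours.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    squared_dev = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared_dev)

# Hypothetical residuals for four tracts arranged in a line,
# where each tract neighbours only the tracts directly beside it.
residuals = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(residuals, adjacency))
```

Positive values indicate that neighbouring tracts have similar residuals, which would suggest the OLS model leaves spatial structure unexplained. Negative values indicate that neighbours tend to differ.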
Abstract:
Southeast Texas, including Houston, has a large presence of industrial facilities and has been documented to have poorer air quality and significantly higher cancer rates than the remainder of Texas. Given citizens’ concerns in this 4th largest city in the U.S., Mayor Bill White recently partnered with the UT School of Public Health to determine methods to evaluate the health risks of hazardous air pollutants (HAPs). Sexton et al. (2007) published a report that strongly encouraged analytic studies linking these pollutants with health outcomes. In response, we set out to complete the following aims: 1. determine the optimal exposure assessment strategy to assess the association between childhood cancer rates and increased ambient levels of benzene and 1,3-butadiene (in an ecologic setting) and 2. evaluate whether census tracts with the highest levels of benzene or 1,3-butadiene have higher incidence of childhood lymphohematopoietic cancer compared with census tracts with the lowest levels of benzene or 1,3-butadiene, using Poisson regression. The first aim was achieved by evaluating the usefulness of four data sources: geographic information systems (GIS) to identify proximity to point sources of industrial air pollution, industrial emission data from the U.S. EPA’s Toxic Release Inventory (TRI), routine monitoring data from the U.S. EPA Air Quality System (AQS) from 1999-2000 and modeled ambient air levels from the U.S. EPA’s 1999 National Air Toxic Assessment Project (NATA) ASPEN model. Further, once these four data sources were evaluated, we narrowed them down to two: the routine monitoring data from the AQS for the years 1998-2000 and the 1999 U.S. EPA NATA ASPEN modeled data. We applied kriging (spatial interpolation) methodology to the monitoring data and compared the kriged values to the ASPEN modeled data. Our results indicated poor agreement between the two methods. Relative to the U.S. 
EPA ASPEN modeled estimates, relying on kriging to classify census tracts into exposure groups would have caused a great deal of misclassification. To address the second aim, we additionally obtained childhood lymphohematopoietic cancer data for 1995-2004 from the Texas Cancer Registry. The U.S. EPA ASPEN modeled data were used to estimate ambient levels of benzene and 1,3-butadiene in separate Poisson regression analyses. All data were analyzed at the census tract level. We found that census tracts with the highest benzene levels had elevated rates of all leukemia (rate ratio (RR) = 1.37; 95% confidence interval (CI), 1.05-1.78). Among census tracts with the highest 1,3-butadiene levels, we observed an RR of 1.40 (95% CI, 1.07-1.81) for all leukemia. We detected no associations between benzene or 1,3-butadiene levels and childhood lymphoma incidence. This study is the first to examine this association in Harris and surrounding counties in Texas and is among the first to correlate monitored levels of HAPs with childhood lymphohematopoietic cancer incidence, evaluating several analytic methods in an effort to determine the most appropriate approach to test this association. Despite the recognized weaknesses of ecologic analyses, our analysis suggests an association between childhood leukemia and hazardous air pollution.
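The rate ratios above come from Poisson regression. As a simplified illustration of the quantity being estimated, the sketch below computes a crude (unadjusted) rate ratio between two groups of census tracts, with a Wald confidence interval on the log scale. This is a stand-in for the regression rather than the study's method, and the case counts and populations are hypothetical.

```python
import math

def rate_ratio(cases_exposed, pop_exposed, cases_ref, pop_ref, z=1.96):
    """Crude rate ratio with a Wald confidence interval on the log scale."""
    rr = (cases_exposed / pop_exposed) / (cases_ref / pop_ref)
    # Standard error of log(RR) when both counts are treated as Poisson.
    se_log = math.sqrt(1 / cases_exposed + 1 / cases_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 30 cases among 1,000 children in the highest-exposure
# tracts versus 20 cases among 1,000 children in the lowest-exposure tracts.
rr, lower, upper = rate_ratio(30, 1000, 20, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A regression model additionally adjusts for covariates, so its estimates will generally differ from a crude ratio like this one.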
Abstract:
Although the area under the receiver operating characteristic curve (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new biomarker in the model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcomes. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring the discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this dissertation, we have extended the NRI and IDI to survival analysis settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies were conducted to compare the performance of the time-dependent NRI and IDI with Pencina's NRI and IDI. For illustration, we have applied the proposed method to a breast cancer study. Key words: Prognostic model, Discrimination, Time-dependent NRI and IDI
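For orientation, the binary-outcome NRI and IDI can be computed as in the sketch below. This illustrates the original, non-time-dependent indices only, not the survival extension developed in the dissertation. The risk cutoffs, predicted probabilities, and function names are hypothetical.

```python
from bisect import bisect_right

def risk_category(p, cutoffs):
    """Index of the risk category that probability p falls into."""
    return bisect_right(cutoffs, p)

def nri(old, new, outcome, cutoffs):
    """Category-based net reclassification improvement for a binary outcome."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_events = sum(outcome)
    n_nonevents = len(outcome) - n_events
    for p_old, p_new, y in zip(old, new, outcome):
        move = risk_category(p_new, cutoffs) - risk_category(p_old, cutoffs)
        if y == 1:
            up_event += move > 0
            down_event += move < 0
        else:
            up_nonevent += move > 0
            down_nonevent += move < 0
    # Events should move up; non-events should move down.
    return ((up_event - down_event) / n_events
            + (down_nonevent - up_nonevent) / n_nonevents)

def idi(old, new, outcome):
    """Integrated discrimination improvement for a binary outcome."""
    def mean_change(flag):
        diffs = [pn - po for po, pn, y in zip(old, new, outcome) if y == flag]
        return sum(diffs) / len(diffs)
    return mean_change(1) - mean_change(0)

# Hypothetical predicted risks from a model without and with a new marker.
old_risk = [0.20, 0.40, 0.30, 0.60]
new_risk = [0.55, 0.30, 0.10, 0.60]
had_event = [1, 1, 0, 0]
print(nri(old_risk, new_risk, had_event, cutoffs=[0.25, 0.50]))
print(idi(old_risk, new_risk, had_event))
```

Both indices treat every subject's event status as known, which is the assumption that breaks down when onset times are censored.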
Abstract:
The federal regulatory regime for addressing airborne toxic pollutants functions fairly well in most of the country. However, it has proved deficient in addressing local risk issues, especially in urban areas with densely concentrated sources. The problem is especially pronounced in Houston, which is home to one of the world's biggest petrochemical complexes and a major port, both located near a large metropolitan center. Despite the fact that local government's role in regulating air toxics is typically quite limited, from 2004-2009, the City of Houston implemented a novel municipality-based air toxics reduction strategy. The initiatives ranged from voluntary agreements to litigation and legislation. This case study considers why the city chose the policy tools it did, how the tools performed relative to the designers' intentions, and how the debate among actors with conflicting values and goals shaped the policy landscape. The city's unconventional approach to controlling hazardous air pollution has not yet been examined rigorously. The case study was developed through reviews of publicly available documents and quasi-public documents obtained through public record requests, as well as interviews with key informants. The informants represented a range of experience and perspectives. They included current and former public officials at the city (including Mayor White), former Texas Commission on Environmental Quality staff, faculty at local universities, industry representatives, and environmental public health advocates. Some of the city's tools were successful in meeting their designers' intent, some were less successful. Ultimately, even those tools that did not achieve their stated purpose were nonetheless successful in bringing attention and resources to the air quality issue. Through a series of pleas and prods, the city managed to draw attention to the problem locally and get reluctant policymakers at higher levels of government to respond. 
This work demonstrates the potential for local government to overcome limitations in the federal regulatory regime for air toxics control, shifting the balance of local, state, and federal initiative. It also highlights the importance of flexible, cooperative strategies in local environmental protection.
Abstract:
Exposure to air pollutants in urban locales has been associated with increased risk for chronic diseases, including cardiovascular disease (CVD) and pulmonary diseases, in epidemiological studies. The exact mechanism by which air pollution affects chronic disease is still unknown. However, oxidative stress and inflammatory pathways have been posited as likely mechanisms. Data from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Mexican-American Cohort Study (2003-2009) were used to examine the following aims, respectively: 1) to evaluate the association between long-term exposure to ambient particulate matter (PM10 and PM2.5) and nitrogen oxides (NOx) and telomere length (TL) among approximately 1,000 participants within MESA; and 2) to evaluate the association of traffic-related air pollution with self-reported asthma, diabetes, and hypertension among Mexican-Americans in Houston, Texas. Our results from MESA regarding associations between long-term exposure to air pollution and shorter telomere length were inconsistent, depending on whether the participants came from New York (NY) or Los Angeles (LA). We observed a negative, though not statistically significant, association between long-term air pollution exposure and mean telomere length for NY participants, which was consistent with our hypothesis. Positive (not statistically significant) associations were observed for LA participants. It is possible that our findings were influenced more by outcome and exposure misclassification than by the absence of a relationship between pollution and TL. Future studies are needed that include longitudinal measures of telomere length and that focus on the effects of specific constituents of PM and other pollutant exposures on changes in telomere length over time. This research provides support that Mexican-American adults who live near a major roadway or in close proximity to a dense street network have a higher prevalence of asthma.
There was a non-significant trend towards an increased prevalence of adult asthma with increasing residential traffic exposure, especially for residents who had lived three or more years at their baseline address. Even though the prevalence of asthma is low in the Mexican-origin population, this population is the fastest-growing minority group in the U.S., and we would expect a growing number of Mexican-Americans to suffer from asthma in the future. Future studies are needed to better characterize risks for asthma associated with air pollution in this population.
Abstract:
There is scant evidence regarding the associations between ambient levels of combustion pollutants and small for gestational age (SGA) infants. No studies of this type have been completed in the Southern United States. The main objective of this project was to determine associations between combustion pollutants and SGA infants in Texas using three different exposure assessments. Birth certificate data that contained information on maternal and infant characteristics were obtained from the Texas Department of State Health Services (TX DSHS). Exposure assessment data for the three aims came from: (1) U.S. Environmental Protection Agency (EPA) National Air Toxics Assessment (NATA), (2) U.S. EPA Air Quality System (AQS), and (3) TX Department of Transportation (DOT), respectively. Multiple logistic regression models were used to determine the associations between combustion pollutants and SGA. The first study looked at annual estimates of four air toxics at the census tract level in the Greater Houston Area. After controlling for maternal race, maternal education, tobacco use, maternal age, number of prenatal visits, marital status, maternal weight gain, and median census tract income level, adjusted odds ratios (AORs) and 95% confidence intervals (CIs) for exposure to PAHs (per 10 ng/m3), naphthalene (per 10 ng/m3), benzene (per 1 µg/m3), and diesel engine emissions (per 10 µg/m3) were 1.01 (0.97–1.05), 1.00 (0.99–1.01), 1.01 (0.97–1.05), and 1.08 (0.95–1.23), respectively. For the second study, which looked at Hispanics in El Paso County, AORs and 95% CIs for increases of 5 ng/m3 in the sum of carcinogenic PAHs (Σ c-PAHs), 1 ng/m3 in benzo[a]pyrene, and 100 ng/m3 in naphthalene during the third trimester of pregnancy were 1.02 (0.97–1.07), 1.03 (0.96–1.11), and 1.01 (0.97–1.06), respectively.
For the third study, which used maternal proximity to major roadways as the exposure metric, there was a negative association with increasing distance from a maternal residence to the nearest major roadway (odds ratio (OR) = 0.96 per 1000 m; 95% CI = 0.94–0.97); however, once adjusted for covariates, this effect was no longer significant (AOR = 0.98; 95% CI = 0.96–1.00). There was no association with distance weighted traffic density (DWTD). This project is the first to look at SGA and combustion pollutants in the Southern United States with three different exposure metrics. Although no evidence of associations was found between SGA and the air pollutants examined in these studies, the results contribute to the body of literature assessing maternal exposure to ambient air pollution and adverse birth outcomes.
Abstract:
Objectives. To predict who will develop a dissection by creating male and female prediction models using the following risk factors: age, ethnicity, hypertension, high cholesterol, smoking, alcohol use, diabetes, heart attack, congestive heart failure, congenital and non-congenital heart disease, Marfan syndrome, and bicuspid aortic valve. Methods. Using 572 patients diagnosed with aortic aneurysms, a model was developed separately for males and for females using 80% of the data and then verified using the remaining 20% of the data. Results. The male model predicted the probability of a male having a dissection (p=0.076) and the female model predicted the probability of a female having a dissection (p=0.054). The validation models did not support the choice of the developmental models. Conclusions. The best models obtained suggested that those at greater risk of having a dissection are males with non-congenital heart disease who drink alcohol, and females with non-congenital heart disease and bicuspid aortic valve.
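The abstract does not describe how the 80%/20% split was made. The sketch below assumes a simple seeded random split of patient records into development and validation sets. The function name, seed, and integer patient identifiers are hypothetical.

```python
import random

def split_development_validation(records, dev_fraction=0.8, seed=0):
    """Randomly split records into development and validation sets."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = round(dev_fraction * len(shuffled))
    return shuffled[:n_dev], shuffled[n_dev:]

# Hypothetical identifiers standing in for the 572 patients.
patient_ids = list(range(572))
development, validation = split_development_validation(patient_ids)
print(len(development), len(validation))
```

Because the abstract describes separate models for males and females, a split like this would in practice be applied within each sex rather than to the pooled cohort.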
Abstract:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. 
The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data are represented by the standard one-value-per-variable paradigm and are widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model.
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. 
In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
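The abstract leaves the specific time series operations to the modeler. One simple trend feature is the least-squares slope of a vital sign over a fixed window, which condenses many raw readings into a single candidate feature. The sketch below computes that slope. The window length and the pulse oximetry readings are hypothetical, and the slope is only one of many possible analyses.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (change per time unit)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator

# Hypothetical pulse oximetry readings (percent) taken one minute apart.
minutes = [0, 1, 2, 3, 4]
spo2 = [98, 97, 96, 95, 94]
print(trend_slope(minutes, spo2))  # percentage points per minute
```

A steadily falling reading yields a negative slope, whereas a single snapshot value cannot show a decline at all.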
Abstract:
It is well accepted that tumorigenesis is a multi-step process involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays as well as next generation sequencing assays interrogating somatic mutation, insertion, deletion, translocation and structural rearrangements. Given the massive amount of data, a major challenge is to integrate information from multiple sources and formulate testable hypotheses. This thesis focuses on developing methodologies for integrative analyses of genomic assays profiled on the same set of samples. We have developed several novel methods for integrative biomarker identification and cancer classification. We introduce a regression-based approach to identify biomarkers predictive of therapy response or survival by integrating multiple assays, including gene expression, methylation and copy number data, through penalized regression. To identify key cancer-specific genes accounting for multiple mechanisms of regulation, we have developed the integIRTy software, which provides robust and reliable inferences about gene alteration by automatically adjusting for sample heterogeneity as well as technical artifacts using Item Response Theory.
To cope with the increasing need for accurate cancer diagnosis and individualized therapy, we have developed a robust and powerful algorithm called SIBER to systematically identify bimodally expressed genes using next generation RNAseq data. We have shown that prediction models built from these bimodal genes have the same accuracy as models built from all genes. Further, prediction models with dichotomized gene expression measurements based on their bimodal shapes still perform well. The effectiveness of outcome prediction using discretized signals paves the road for more accurate and interpretable cancer classification by integrating signals from multiple sources.
Abstract:
Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second objective was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) from ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients received ≥ 10 RBC units within 24h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate the upper and lower bounds of correct classification using assumptions about missing data under best and worst case scenarios. Most variables (17/33=52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% demonstrated less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables missing >1%, study site was associated with missingness (all p≤0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables. 
Complete case proportions for the MT models ranged from 41% to 88%. Complete case analysis and multiple imputation demonstrated similar correct classification results. Sensitivity analysis upper-lower bound ranges for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random and that complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis, which estimates correct classification under upper (best case scenario) and lower (worst case scenario) bounds, may be more informative than multiple imputation, which provided results similar to complete case analysis.
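The abstract does not give the formula behind the best and worst case bounds. One plausible reading, sketched below, treats every patient with missing predictors as correctly classified for the upper bound and as misclassified for the lower bound. The function name and the counts are hypothetical, and this reading is an assumption rather than the study's stated procedure.

```python
def classification_bounds(correct_complete, n_complete, n_missing):
    """Best and worst case correct classification when some cases are missing.

    Worst case: every patient with missing predictors is misclassified.
    Best case: every patient with missing predictors is correctly classified.
    """
    total = n_complete + n_missing
    lower = correct_complete / total
    upper = (correct_complete + n_missing) / total
    return lower, upper

# Hypothetical counts: 60 of 80 complete cases correctly classified,
# with 20 further patients excluded for missing predictors.
lower, upper = classification_bounds(correct_complete=60, n_complete=80, n_missing=20)
print(f"Correct classification between {lower:.0%} and {upper:.0%}")
```

The width of the interval grows with the share of incomplete cases, so the bounds show directly how much missing data limits what can be said about model accuracy.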