8 resultados para Data Driven Clustering
em DigitalCommons@The Texas Medical Center
Resumo:
The overarching goal of the Pathway Semantics Algorithm (PSA) is to improve the in silico identification of clinically useful hypotheses about molecular patterns in disease progression. By framing biomedical questions within a variety of matrix representations, PSA has the flexibility to analyze combined quantitative and qualitative data over a wide range of stratifications. The resulting hypothetical answers can then move to in vitro and in vivo verification, research assay optimization, clinical validation, and commercialization. Herein PSA is shown to generate novel hypotheses about the significant biological pathways in two disease domains: shock / trauma and hemophilia A, and validated experimentally in the latter. The PSA matrix algebra approach identified differential molecular patterns in biological networks over time and outcome that would not be easily found through direct assays, literature or database searches. In this dissertation, Chapter 1 provides a broad overview of the background and motivation for the study, followed by Chapter 2 with a literature review of relevant computational methods. Chapters 3 and 4 describe PSA for node and edge analysis respectively, and apply the method to disease progression in shock / trauma. Chapter 5 demonstrates the application of PSA to hemophilia A and the validation with experimental results. The work is summarized in Chapter 6, followed by extensive references and an Appendix with additional material.
Resumo:
Similar to other health care processes, referrals are susceptible to breakdowns. These breakdowns in the referral process can lead to poor continuity of care, slow diagnostic processes, delays and repetition of tests, patient and provider dissatisfaction, and can lead to a loss of confidence in providers. These facts and the necessity for a deeper understanding of referrals in healthcare served as the motivation to conduct a comprehensive study of referrals. The research began with the real problem and need to understand referral communication as a mean to improve patient care. Despite previous efforts to explain referrals and the dynamics and interrelations of the variables that influence referrals there is not a common, contemporary, and accepted definition of what a referral is in the health care context. The research agenda was guided by the need to explore referrals as an abstract concept by: 1) developing a conceptual definition of referrals, and 2) developing a model of referrals, to finally propose a 3) comprehensive research framework. This dissertation has resulted in a standard conceptual definition of referrals and a model of referrals. In addition a mixed-method framework to evaluate referrals was proposed, and finally a data driven model was developed to predict whether a referral would be approved or denied by a specialty service. The three manuscripts included in this dissertation present the basis for studying and assessing referrals using a common framework that should allow an easier comparative research agenda to improve referrals taking into account the context where referrals occur.
Resumo:
A crucial link in preserving and protecting the future of our communities resides in maintaining the health and well being of our youth. While every member of the community owns an opinion regarding where to best utilize monies for prevention and intervention, the data to support such opinion is often scarce. In an effort to generate data-driven indices for community planning and action, the United Way of Comal County, Texas partnered with the University Of Texas - Houston Health Science Center, School Of Public Health to accomplish a county-specific needs assessment. A community-based participatory research emphasis utilizing the Mobilization for Action through Planning and Partnership (MAPP) format developed by the National Association of City and County Health Officials (NACCHO) was implemented to engage community members in identifying and addressing community priorities. The single greatest area of consensus and concern identified by community members was the health and well being of the youth population. Thus, a youth survey, targeting these specific areas of community concern, was designed, coordinated and administered to all 9-11th grade students in the county. 20% of the 3,698 completed surveys (72% response rate) were randomly selected for analysis. These 740 surveys were coded and scanned into an electronic survey database. Statistical analysis provided youth-reported data on the status of the multiple issues affecting the health and well being of the community's youth. These data will be reported back to the community stakeholders, as part of the larger Comal County Needs Assessment, for the purposes of community planning and action. Survey data will provide community planners with an awareness of the high risk behaviors and habit patterns amongst their youth. This knowledge will permit more effective targeting of the means for encouraging healthy behaviors and preventing the spread of disease. Further, the community-oriented, population-based nature of this effort will provide answers to questions raised by the community and will provide an effective launching pad for the development and implementation of targeted, preventive health strategies. ^
Resumo:
The objectives of this study were to identify and measure the average outcomes of the Open Door Mission's nine-month community-based substance abuse treatment program, identify predictors of successful outcomes, and make recommendations to the Open Door Mission for improving its treatment program.^ The Mission's program is exclusive to adult men who have limited financial resources: most of which were homeless or dependent on parents or other family members for basic living needs. Many, but not all, of these men are either chemically dependent or have a history of substance abuse.^ This study tracked a cohort of the Mission's graduates throughout this one-year study and identified various indicators of success at short-term intervals, which may be predictive of longer-term outcomes. We tracked various levels of 12-step program involvement, as well as other social and spiritual activities, such as church affiliation and recovery support.^ Twenty-four of the 66 subjects, or 36% met the Mission's requirements for success. Specific to this success criteria; Fifty-four, or 82% reported affiliation with a home church; Twenty-six, or 39% reported full-time employment; Sixty-one, or 92% did not report or were not identified as having any post-treatment arrests or incarceration, and; Forty, or 61% reported continuous abstinence from both drugs and alcohol.^ Five research-based hypotheses were developed and tested. The primary analysis tool was the web-based non-parametric dependency modeling tool, B-Course, which revealed some strong associations with certain variables, and helped the researchers generate and test several data-driven hypotheses. Full-time employment is the greatest predictor of abstinence: 95% of those who reported full time employment also reported continuous post-treatment abstinence, while 50% of those working part-time were abstinent and 29% of those with no employment were abstinent. Working with a 12-step sponsor, attending aftercare, and service with others were identified as predictors of abstinence.^ This study demonstrates that associations with abstinence and the ODM success criteria are not simply based on one social or behavioral factor. Rather, these relationships are interdependent, and show that abstinence is achieved and maintained through a combination of several 12-step recovery activities. This study used a simple assessment methodology, which demonstrated strong associations across variables and outcomes, which have practical applicability to the Open Door Mission for improving its treatment program. By leveraging the predictive capability of the various success determination methodologies discussed and developed throughout this study, we can identify accurate outcomes with both validity and reliability. This assessment instrument can also be used as an intervention that, if operationalized to the Mission’s clients during the primary treatment program, may measurably improve the effectiveness and outcomes of the Open Door Mission.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
Semantic Web technologies offer a promising framework for integration of disparate biomedical data. In this paper we present the semantic information integration platform under development at the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston (UTHSC-H) as part of our Clinical and Translational Science Award (CTSA) program. We utilize the Semantic Web technologies not only for integrating, repurposing and classification of multi-source clinical data, but also to construct a distributed environment for information sharing, and collaboration online. Service Oriented Architecture (SOA) is used to modularize and distribute reusable services in a dynamic and distributed environment. Components of the semantic solution and its overall architecture are described.
Resumo:
Double minutes (dm) are small chromatin particles of 0.3 microns diameter found only in the metaphase cells of human and murine tumors. Dm are unique cytogenetic structures since their numbers per cell show wide variation. At cell division, dm are retained despite the lack of centromeres. In squash preparations, dm show clustering often in association with chromosomes. Human carcinoma cell line SW613-S18 was found to have large numbers of dm and biological characteristics favorable for mitotic synchronization and chromosome isolation experiments.^ S18 cells were synchronized to mitosis with metabolic and mitotic blocking compounds. Mitotic cells were lysed to release chromosomes and dm from the mitotic spindle and the resulting suspensions were fractionated to enrich for dm. The DNA in enriched fractions was characterized. The reassociation kinetics of dm-DNA driven with placental human DNA was similar to the reassociation curve of labeled placental DNA under similar conditions. In situ hybridization of dm-DNA to tumor and normal metaphase cells showed grain localization over the entire karyotype. Dm-DNA was shown by pulse chase DNA replication experiments to replicate during early and mid S-phase of the cell cycle, but not in late S-phase. In addition, BrdUrd incorporation studies showed that dm-DNA replicates only once during the S-phase. Premature chromosome condensation studies suggest the basis of numerical heterogeneity of dm is nondisjunction, not anomalous or unscheduled DNA replication.^ These data and previous cytochemical banding studies of dm in SW613-S18 indicate that dm-DNA is chromosomal in origin. No evidence of gene amplification was found in the DNA reassociation data. It is likely that dm-DNA represents the pale-staining G-band regions of the human karyotype in this cell line. ^
Resumo:
To reach the goals established by the Institute of Medicine (IOM) and the Centers for Disease Control's (CDC) STOP TB USA, measures must be taken to curtail a future peak in Tuberculosis (TB) incidence and speed the currently stagnant rate of TB elimination. Both efforts will require, at minimum, the consideration and understanding of the third dimension of TB transmission: the location-based spread of an airborne pathogen among persons known and unknown to each other. This consideration will require an elucidation of the areas within the U.S. that have endemic TB. The Houston Tuberculosis Initiative (HTI) was a population-based active surveillance of confirmed Houston/Harris County TB cases from 1995–2004. Strengths in this dataset include the molecular characterization of laboratory confirmed cases, the collection of geographic locations (including home addresses) frequented by cases, and the HTI time period that parallels a decline in TB incidence in the United States (U.S.). The HTI dataset was used in this secondary data analysis to implement a GIS analysis of TB cases, the locations frequented by cases, and their association with risk factors associated with TB transmission. ^ This study reports, for the first time, the incidence of TB among the homeless in Houston, Texas. The homeless are an at-risk population for TB disease, yet they are also a population whose TB incidence has been unknown and unreported due to their non-enumeration. The first section of this dissertation identifies local areas in Houston with endemic TB disease. Many Houston TB cases who reported living in these endemic areas also share the TB risk factor of current or recent homelessness. Merging the 2004–2005 Houston enumeration of the homeless with historical HTI surveillance data of TB cases in Houston enabled this first-time report of TB risk among the homeless in Houston. The homeless were more likely to be US-born, belong to a genotypic cluster, and belong to a cluster of a larger size. The calculated average incidence among homeless persons was 411/100,000, compared to 9.5/100,000 among housed. These alarming rates are not driven by a co-infection but by social determinants. The unsheltered persons were hospitalized more days and required more follow-up time by staff than those who reported a steady housing situation. The homeless are a specific example of the increased targeting of prevention dollars that could occur if TB rates were reported for specific areas with known health disparities rather than as a generalized rate normalized over a diverse population. ^ It has been estimated that 27% of Houstonians use public transportation. The city layout allows bus routes to run like veins connecting even the most diverse of populations within the metropolitan area. Secondary data analysis of frequent bus use (defined as riding a route weekly) among TB cases was assessed for its relationship with known TB risk factors. The spatial distribution of genotypic clusters associated with bus use was assessed, along with the reported routes and epidemiologic-links among cases belonging to the identified clusters. ^ TB cases who reported frequent bus use were more likely to have demographic and social risk factors associated with poverty, immune suppression and health disparities. An equal proportion of bus riders and non-bus riders were cultured for Mycobacterium tuberculosis, yet 75% of bus riders were genotypically clustered, indicating recent transmission, compared to 56% of non-bus riders (OR=2.4, 95%CI(2.0, 2.8), p<0.001). Bus riders had a mean cluster size of 50.14 vs. 28.9 (p<0.001). Second order spatial analysis of clustered fingerprint 2 (n=122), a Beijing family cluster, revealed geographic clustering among cases based on their report of bus use. Univariate and multivariate analysis of routes reported by cases belonging to these clusters found that 10 of the 14 clusters were associated with use. Individual Metro routes, including one route servicing the local hospitals, were found to be risk factors for belonging to a cluster shown to be endemic in Houston. The routes themselves geographically connect the census tracts previously identified as having endemic TB. 78% (15/23) of Houston Metro routes investigated had one or more print groups reporting frequent use for every HTI study year. We present data on three specific but clonally related print groups and show that bus-use is clustered in time by route and is the only known link between cases in one of the three prints: print 22. (Abstract shortened by UMI.)^