19 resultados para process data
em DigitalCommons@The Texas Medical Center
Resumo:
The current state of health and biomedicine includes an enormity of heterogeneous data ‘silos’, collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, this process uses a reference terminology, and associated "maps" to transform heterogeneous data to a standard representation for comparability and secondary use. The capture of quality and precision of the “maps” between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, and this process can be applied and extended to other problems where heterogeneous data needs to be merged.
Resumo:
People often use tools to search for information. In order to improve the quality of an information search, it is important to understand how internal information, which is stored in user’s mind, and external information, represented by the interface of tools interact with each other. How information is distributed between internal and external representations significantly affects information search performance. However, few studies have examined the relationship between types of interface and types of search task in the context of information search. For a distributed information search task, how data are distributed, represented, and formatted significantly affects the user search performance in terms of response time and accuracy. Guided by UFuRT (User, Function, Representation, Task), a human-centered process, I propose a search model, task taxonomy. The model defines its relationship with other existing information models. The taxonomy clarifies the legitimate operations for each type of search task of relation data. Based on the model and taxonomy, I have also developed prototypes of interface for the search tasks of relational data. These prototypes were used for experiments. The experiments described in this study are of a within-subject design with a sample of 24 participants recruited from the graduate schools located in the Texas Medical Center. Participants performed one-dimensional nominal search tasks over nominal, ordinal, and ratio displays, and searched one-dimensional nominal, ordinal, interval, and ratio tasks over table and graph displays. Participants also performed the same task and display combination for twodimensional searches. Distributed cognition theory has been adopted as a theoretical framework for analyzing and predicting the search performance of relational data. It has been shown that the representation dimensions and data scales, as well as the search task types, are main factors in determining search efficiency and effectiveness. In particular, the more external representations used, the better search task performance, and the results suggest the ideal search performance occurs when the question type and corresponding data scale representation match. The implications of the study lie in contributing to the effective design of search interface for relational data, especially laboratory results, which are often used in healthcare activities.
Resumo:
Any functionally important mutation is embedded in an evolutionary matrix of other mutations. Cladistic analysis, based on this, is a method of investigating gene effects using a haplotype phylogeny to define a set of tests which localize causal mutations to branches of the phylogeny. Previous implementations of cladistic analysis have not addressed the issue of analyzing data from related individuals, though in human studies, family data are usually needed to obtain unambiguous haplotypes. In this study, a method of cladistic analysis is described in which haplotype effects are parameterized in a linear model which accounts for familial correlations. The method was used to study the effect of apolipoprotein (Apo) B gene variation on total-, LDL-, and HDL-cholesterol, triglyceride, and Apo B levels in 121 French families. Five polymorphisms defined Apo B haplotypes: the signal peptide Insertion/deletion, Bsp 1286I, XbaI, MspI, and EcoRI. Eleven haplotypes were found, and a haplotype phylogeny was constructed and used to define a set of tests of haplotype effects on lipid and apo B levels.^ This new method of cladistic analysis, the parametric method, found significant effects for single haplotypes for all variables. For HDL-cholesterol, 3 clusters of evolutionarily-related haplotypes affecting levels were found. Haplotype effects accounted for about 10% of the genetic variance of triglyceride and HDL-cholesterol levels. The results of the parametric method were compared to those of a method of cladistic analysis based on permutational testing. The permutational method detected fewer haplotype effects, even when modified to account for correlations within families. Simulation studies exploring these differences found evidence of systematic errors in the permutational method due to the process by which haplotype groups were selected for testing.^ The applicability of cladistic analysis to human data was shown. The parametric method is suggested as an improvement over the permutational method. This study has identified candidate haplotypes for sequence comparisons in order to locate the functional mutations in the Apo B gene which may influence plasma lipid levels. ^
Resumo:
Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^
Resumo:
Analysis of recurrent events has been widely discussed in medical, health services, insurance, and engineering areas in recent years. This research proposes to use a nonhomogeneous Yule process with the proportional intensity assumption to model the hazard function on recurrent events data and the associated risk factors. This method assumes that repeated events occur for each individual, with given covariates, according to a nonhomogeneous Yule process with intensity function λx(t) = λ 0(t) · exp( x′β). One of the advantages of using a non-homogeneous Yule process for recurrent events is that it assumes that the recurrent rate is proportional to the number of events that occur up to time t. Maximum likelihood estimation is used to provide estimates of the parameters in the model, and a generalized scoring iterative procedure is applied in numerical computation. ^ Model comparisons between the proposed method and other existing recurrent models are addressed by simulation. One example concerning recurrent myocardial infarction events compared between two distinct populations, Mexican-American and Non-Hispanic Whites in the Corpus Christi Heart Project is examined. ^
Resumo:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. ^ For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power. ^
Resumo:
The joint modeling of longitudinal and survival data is a new approach to many applications such as HIV, cancer vaccine trials and quality of life studies. There are recent developments of the methodologies with respect to each of the components of the joint model as well as statistical processes that link them together. Among these, second order polynomial random effect models and linear mixed effects models are the most commonly used for the longitudinal trajectory function. In this study, we first relax the parametric constraints for polynomial random effect models by using Dirichlet process priors, then three longitudinal markers rather than only one marker are considered in one joint model. Second, we use a linear mixed effect model for the longitudinal process in a joint model analyzing the three markers. In this research these methods were applied to the Primary Biliary Cirrhosis sequential data, which were collected from a clinical trial of primary biliary cirrhosis (PBC) of the liver. This trial was conducted between 1974 and 1984 at the Mayo Clinic. The effects of three longitudinal markers (1) Total Serum Bilirubin, (2) Serum Albumin and (3) Serum Glutamic-Oxaloacetic transaminase (SGOT) on patients' survival were investigated. Proportion of treatment effect will also be studied using the proposed joint modeling approaches. ^ Based on the results, we conclude that the proposed modeling approaches yield better fit to the data and give less biased parameter estimates for these trajectory functions than previous methods. Model fit is also improved after considering three longitudinal markers instead of one marker only. The results from analysis of proportion of treatment effects from these joint models indicate same conclusion as that from the final model of Fleming and Harrington (1991), which is Bilirubin and Albumin together has stronger impact in predicting patients' survival and as a surrogate endpoints for treatment. ^
Resumo:
Background. Health literacy is an important determinant for quality health care, and affects communication between patients and physicians. Poor communication may result in negative effects in health. Improved communication between patients and physicians could positively affect health outcomes. Communication skills are teachable.^ Objectives. (1) to evaluate the process involved in the design and implementation of a health literacy intervention targeting pediatric providers’ communication skills at the Texas Children’s Health Plan in Houston, Texas; and (2) to describe lessons learned from this process that may be used in future attempts to address the issue of health literacy and health communication. ^ Design/methods. The process evaluation of the implementation of a health literacy strategy at the Texas Children’s Health Plan (TCHP) consisted of a critical analysis of all documents and minutes from meetings of the team of investigators. It also involved a secondary analysis of data collected between December 2006 and June 2007. Descriptive statistics, paired t-test and Wilcoxon-signed-rank test were employed in analyzing the data. This information was complemented with a limited review of existing literature on communication skills training programs. ^ Results. The design of the educational intervention followed recommendations from experts in the field of health literacy. The delivery of the intervention was possible and benefited from existing resources and logistics within the TCHP. Very few targeted providers participated in two offerings of the workshop (6.6% and 1.7% respectively). After the educational intervention, providers showed increased knowledge of health literacy facts and its effects in health (p=0.001); increased awareness of the low health literacy problem (p=0.003); increased expectations for change in practice (p=0.002), and intent to use health literacy strategies for communication immediately following the intervention (p=0.001). Low participation indicated the need for further investigation of barriers to, and means for successful implementation of programs aimed to improving health communication. ^ Conclusions. A short, focused intervention utilizing health literacy strategies for communication appeared effective in increasing knowledge and intentions for change in a small group of pediatric providers. ^
Resumo:
Colorectal cancer is the forth most common diagnosed cancer in the United States. Every year about a hundred forty-seven thousand people will be diagnosed with colorectal cancer and fifty-six thousand people lose their lives due to this disease. Most of the hereditary nonpolyposis colorectal cancer (HNPCC) and 12% of the sporadic colorectal cancer show microsatellite instability. Colorectal cancer is a multistep progressive disease. It starts from a mutation in a normal colorectal cell and grows into a clone of cells that further accumulates mutations and finally develops into a malignant tumor. In terms of molecular evolution, the process of colorectal tumor progression represents the acquisition of sequential mutations. ^ Clinical studies use biomarkers such as microsatellite or single nucleotide polymorphisms (SNPs) to study mutation frequencies in colorectal cancer. Microsatellite data obtained from single genome equivalent PCR or small pool PCR can be used to infer tumor progression. Since tumor progression is similar to population evolution, we used an approach known as coalescent, which is well established in population genetics, to analyze this type of data. Coalescent theory has been known to infer the sample's evolutionary path through the analysis of microsatellite data. ^ The simulation results indicate that the constant population size pattern and the rapid tumor growth pattern have different genetic polymorphic patterns. The simulation results were compared with experimental data collected from HNPCC patients. The preliminary result shows the mutation rate in 6 HNPCC patients range from 0.001 to 0.01. The patients' polymorphic patterns are similar to the constant population size pattern which implies the tumor progression is through multilineage persistence instead of clonal sequential evolution. The results should be further verified using a larger dataset. ^
Resumo:
The difficulty of detecting differential gene expression in microarray data has existed for many years. Several correction procedures try to avoid the family-wise error rate in multiple comparison process, including the Bonferroni and Sidak single-step p-value adjustments, Holm's step-down correction method, and Benjamini and Hochberg's false discovery rate (FDR) correction procedure. Each multiple comparison technique has its advantages and weaknesses. We studied each multiple comparison method through numerical studies (simulations) and applied the methods to the real exploratory DNA microarray data, which detect of molecular signatures in papillary thyroid cancer (PTC) patients. According to our results of simulation studies, Benjamini and Hochberg step-up FDR controlling procedure is the best process among these multiple comparison methods and we discovered 1277 potential biomarkers among 54675 probe sets after applying the Benjamini and Hochberg's method to PTC microarray data.^
Resumo:
Introduction: The Texas Occupational Safety & Health Surveillance System (TOSHSS) was created to collect, analyze and interpret occupational injury and illness data in order to decrease the impact of occupational injuries within the state of Texas. This process evaluation was performed midway through the 4-year grant to assess the efficiency and effectiveness of the surveillance system’s planning and implementation activities1. ^ Methods: Two evaluation guidelines published by the Centers for Disease Control and Prevention (CDC) were used as the theoretical models for this process evaluation. The Framework for Program Evaluation in Public Health was used to examine the planning and design of TOSHSS using logic models. The Framework for Evaluating Public Health Surveillance Systems was used to examine the implementation of approximately 60 surveillance activities, including uses of the data obtained from the surveillance system. ^ Results/Discussion: TOSHSS planning activities omitted the creation of a scientific advisory committee and specific activities designed to maintain contacts with stakeholders; and proposed activities should be reassessed and aligned with ongoing performance measurement criteria, including the role of collaborators in helping the surveillance system achieve each proposed activity. TOSHSS implementation activities are substantially meeting expectations and received an overall score of 61% for all activities being performed. TOSHSS is considered a surveillance system that is simple, flexible, acceptable, fairly stable, timely, moderately useful, with good data quality and a PVP of 86%. ^ Conclusions: Through the third year of TOSHSS implementation, the surveillance system is has made a considerable contribution to the collection of occupational injury and illness information within the state of Texas. Implementation of the nine recommendations provided under this process evaluation is expected to increase the overall usefulness of the surveillance system and assist TDSHS in reducing occupational fatalities, injuries, and diseases within the state of Texas. ^ 1 Disclaimer: The Texas Occupational Safety and Health Surveillance System is supported by Grant/Cooperative Agreement Number (U60 OH008473-01A1). The content of the current evaluation are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health.^
Resumo:
The need for timely population data for health planning and Indicators of need has Increased the demand for population estimates. The data required to produce estimates is difficult to obtain and the process is time consuming. Estimation methods that require less effort and fewer data are needed. The structure preserving estimator (SPREE) is a promising technique not previously used to estimate county population characteristics. This study first uses traditional regression estimation techniques to produce estimates of county population totals. Then the structure preserving estimator, using the results produced in the first phase as constraints, is evaluated.^ Regression methods are among the most frequently used demographic methods for estimating populations. These methods use symptomatic indicators to predict population change. This research evaluates three regression methods to determine which will produce the best estimates based on the 1970 to 1980 indicators of population change. Strategies for stratifying data to improve the ability of the methods to predict change were tested. Difference-correlation using PMSA strata produced the equation which fit the data the best. Regression diagnostics were used to evaluate the residuals.^ The second phase of this study is to evaluate use of the structure preserving estimator in making estimates of population characteristics. The SPREE estimation approach uses existing data (the association structure) to establish the relationship between the variable of interest and the associated variable(s) at the county level. Marginals at the state level (the allocation structure) supply the current relationship between the variables. The full allocation structure model uses current estimates of county population totals to limit the magnitude of county estimates. The limited full allocation structure model has no constraints on county size. The 1970 county census age - gender population provides the association structure, the allocation structure is the 1980 state age - gender distribution.^ The full allocation model produces good estimates of the 1980 county age - gender populations. An unanticipated finding of this research is that the limited full allocation model produces estimates of county population totals that are superior to those produced by the regression methods. The full allocation model is used to produce estimates of 1986 county population characteristics. ^
Resumo:
As the requirements for health care hospitalization have become more demanding, so has the discharge planning process become a more important part of the health services system. A thorough understanding of hospital discharge planning can, then, contribute to our understanding of the health services system. This study involved the development of a process model of discharge planning from hospitals. Model building involved the identification of factors used by discharge planners to develop aftercare plans, and the specification of the roles of these factors in the development of the discharge plan. The factors in the model were concatenated in 16 discrete decision sequences, each of which produced an aftercare plan.^ The sample for this study comprised 407 inpatients admitted to the M. D. Anderson Hospital and Tumor Institution at Houston, Texas, who were discharged to any site within Texas during a 15 day period. Allogeneic bone marrow donors were excluded from the sample. The factors considered in the development of discharge plans were recorded by discharge planners and were used to develop the model. Data analysis consisted of sorting the discharge plans using the plan development factors until for some combination and sequence of factors all patients were discharged to a single site. The arrangement of factors that led to that aftercare plan became a decision sequence in the model.^ The model constructs the same discharge plans as those developed by hospital staff for every patient in the study. Tests of the validity of the model should be extended to other patients at the MDAH, to other cancer hospitals, and to other inpatient services. Revisions of the model based on these tests should be of value in the management of discharge planning services and in the design and development of comprehensive community health services.^
Resumo:
A general model for the illness-death stochastic process with covariates has been developed for the analysis of survival data. This model incorporates important baseline and time-dependent covariates to make proper adjustment for the transition probabilities and survival probabilities. The follow-up period is subdivided into small intervals and a constant hazard is assumed for each interval. An approximation formula is derived to estimate the transition parameters when the exact transition time is unknown.^ The method developed is illustrated by using data from a study on the prevention of the recurrence of a myocardial infarction and subsequent mortality, the Beta-Blocker Heart Attack Trial (BHAT). This method provides an analytical approach which simultaneously includes provision for both fatal and nonfatal events in the model. According to this analysis, the effectiveness of the treatment can be compared between the Placebo and Propranolol treatment groups with respect to fatal and nonfatal events. ^
Resumo:
Problems due to the lack of data standardization and data management have lead to work inefficiencies for the staff working with the vision data for the Lifetime Surveillance of Astronaut Health. Data has been collected over 50 years in a variety of manners and then entered into a software. The lack of communication between the electronic health record (EHR) form designer, epidemiologists, and optometrists has led to some level to confusion on the capability of the EHR system and how its forms can be designed to fit all the needs of the relevant parties. EHR form customizations or form redesigns were found to be critical for using NASA's EHR system in the most beneficial way for its patients, optometrists, and epidemiologists. In order to implement a protocol, data being collected was examined to find the differences in data collection methods. Changes were implemented through the establishment of a process improvement team (PIT). Based on the findings of the PIT, suggestions have been made to improve the current EHR system. If the suggestions are implemented correctly, this will not only improve efficiency of the staff at NASA and its contractors, but set guidelines for changes in other forms such as the vision exam forms. Because NASA is at the forefront of such research and health surveillance the impact of this management change could have a drastic improvement on the collection of and adaptability of the EHR. Accurate data collection from this 50+ year study is ongoing and is going to help current and future generations understand the implications of space flight on human health. It is imperative that the vast amount of information is documented correctly.^