14 results for structure, analysis, modeling

in DigitalCommons@The Texas Medical Center


Relevance: 100.00%

Abstract:

The purpose of this research is to develop a new statistical method to determine the minimum set of rows (R) in an R × C contingency table of discrete data that explains the dependence of observations. The statistical power of the method will be determined empirically by computer simulation to judge its efficiency over presently existing methods. The method will be applied to data on DNA fragment length variation at six VNTR loci in over 72 populations from five major racial groups of humans (total sample size over 15,000 individuals, each sample having at least 50 individuals). DNA fragment lengths grouped in bins will form the basis of studying inter-population DNA variation within the racial groups. Determining whether these inter-population differences are significant will provide a rigorous re-binning procedure for forensic computation of DNA profile frequencies that takes intra-racial DNA variation among populations into account.
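As a hedged illustration of the kind of procedure this abstract describes, the sketch below greedily drops rows from an R × C table while a chi-square test of independence on the remaining rows stays significant. The function name, significance threshold, and toy table are assumptions for illustration; the dissertation's actual selection criterion and power analysis are not reproduced here.

```python
import numpy as np
from scipy.stats import chi2_contingency

def minimal_dependent_rows(table, alpha=0.05):
    """Greedily remove rows from an R x C contingency table while the
    remaining rows still show significant dependence (chi-square test).
    Returns indices of a minimal row set that explains the dependence.
    Illustrative only; the dissertation's actual criterion may differ."""
    rows = list(range(table.shape[0]))
    while len(rows) > 2:
        best_drop, best_p = None, None
        for r in rows:
            sub = table[[i for i in rows if i != r], :]
            p = chi2_contingency(sub)[1]
            # only consider drops that leave the table significant
            if p < alpha and (best_p is None or p < best_p):
                best_drop, best_p = r, p
        if best_drop is None:
            break  # no row can be dropped without losing significance
        rows.remove(best_drop)
    return rows

# Toy example: 4 populations x 3 fragment-length bins
table = np.array([[30, 10, 10],
                  [28, 12, 10],
                  [10, 30, 10],
                  [10, 10, 30]])
print(minimal_dependent_rows(table))
```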

Relevance: 50.00%

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU," lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrest, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase, including selecting the proper candidate features on which to base the model and selecting the most appropriate tools to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of the time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data are represented by the standard one-value-per-variable paradigm and are widely employed in a host of clinical models and tools; they are often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The other two classes are unique to time series data elements. The first of these is the raw data elements: multiple values per variable, constituting the measured observations typically available to end users when they review time series data, often represented as dots on a graph. The final class of data results from performing time series analysis, and it represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are left to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes.
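A minimal sketch of one such "explicit mathematical operation": the least-squares slope of a vital sign over a fixed window, used as a latent candidate feature. The window duration, resolution, and heart-rate values below are hypothetical; the papers consider many analysis operations beyond a simple linear trend.

```python
import numpy as np

def trend_feature(values, times):
    """Least-squares slope of a vital-sign series over a window: one
    example of a time series analysis result used as a latent candidate
    feature (illustrative; not the papers' full analysis suite)."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(t, v, 1)  # fit v = slope*t + intercept
    return slope

# Hypothetical 60-minute window of heart-rate observations at 5-minute resolution
times = np.arange(0, 60, 5)
hr = np.array([110, 112, 111, 115, 118, 117, 121, 124, 128, 131, 135, 140])
print(trend_feature(hr, times))  # positive slope: rising heart rate
```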
The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU," provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time-series-based models are infeasible due to the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model. The process is broken into seventeen steps, each analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each step, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after the complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) raise issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances.

The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit," presents the results obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the receiver operating characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy compared with the baseline multivariate model, but diminished classification accuracy compared with adding the trend analysis features alone (i.e., without the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve performance beyond that achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
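The sketch below illustrates the shape of the comparison reported above: a baseline model on snapshot features versus the same model with a trend feature added, compared by ROC AUC. The data are synthetic and the printed numbers will not match the study's 87%/98% figures; this shows only the evaluation pattern, not the study's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins: one point-in-time vital sign and its recent trend.
# In this toy setup the trend carries most of the arrest signal.
arrest = rng.integers(0, 2, n)
snapshot = rng.normal(0, 1, n) + 0.3 * arrest   # weak static signal
trend = rng.normal(0, 1, n) + 1.5 * arrest      # strong trend signal

X_train, X_test, y_train, y_test = train_test_split(
    np.column_stack([snapshot, trend]), arrest, random_state=0)

base = LogisticRegression().fit(X_train[:, :1], y_train)   # snapshot only
full = LogisticRegression().fit(X_train, y_train)          # snapshot + trend

print("baseline AUC:", roc_auc_score(y_test, base.predict_proba(X_test[:, :1])[:, 1]))
print("with trend AUC:", roc_auc_score(y_test, full.predict_proba(X_test)[:, 1]))
```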

Relevance: 40.00%

Abstract:

The factorial validity of the SF-36 was evaluated using confirmatory factor analysis (CFA) methods, structural equation modeling (SEM), and multigroup structural equation modeling (MSEM). First, the measurement and structural model of the hypothesized SF-36 was explicated. Second, the model was tested for the validity of a second-order factorial structure; upon evidence of model misfit, the best-fitting model was determined and its validity tested on a second random sample from the same population. Third, the best-fitting model was tested for invariance of the factorial structure across race, age, and educational subgroups using MSEM. The findings support the second-order factorial structure of the SF-36 as proposed by Ware and Sherbourne (1992). However, the results suggest that: (a) Mental Health and Physical Health covary; (b) general mental health cross-loads onto Physical Health; (c) general health perception loads onto Mental Health instead of Physical Health; (d) many of the error terms are correlated; and (e) the physical function scale is not reliable across these two samples. This hierarchical factor pattern was replicated across both samples of health care workers, suggesting that the post hoc model fitting was not data specific. Subgroup analysis suggests that the physical function scale is not reliable across the "age" or "education" subgroups and that the general mental health scale path from Mental Health is not reliable across the "white/nonwhite" or "education" subgroups. The importance of this study lies in the use of SEM and MSEM in evaluating sample data from the SF-36. These methods are uniquely suited to the analysis of latent variable structures and are widely used in other fields. The use of latent variable models for self-reported outcome measures has become widespread and should now be applied to medical outcomes research. Invariance testing is superior to mean scores or summary scores when evaluating differences between groups. From a practical as well as a psychometric perspective, it seems imperative that construct validity research related to the SF-36 establish whether this same hierarchical structure and invariance holds for other populations. This project is presented as three articles to be submitted for publication.
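As a hedged sketch of how such a second-order model can be specified, the snippet below uses the Python semopy package with lavaan-style syntax. The subscale abbreviations follow Ware and Sherbourne's eight SF-36 scales; the file name is hypothetical, and the dissertation itself does not name a software package.

```python
import pandas as pd
import semopy

# First-order Physical and Mental Health factors over the eight SF-36
# subscales, plus a single second-order factor (lavaan-style syntax).
MODEL_DESC = """
PhysicalHealth =~ PF + RP + BP + GH
MentalHealth =~ VT + SF + RE + MH
HRQoL =~ PhysicalHealth + MentalHealth
"""

df = pd.read_csv("sf36_subscale_scores.csv")  # hypothetical file of subscale scores
model = semopy.Model(MODEL_DESC)
model.fit(df)
print(model.inspect())            # factor loadings and variances
print(semopy.calc_stats(model))   # fit indices (chi-square, CFI, RMSEA, ...)
```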

Relevance: 40.00%

Abstract:

With substance abuse treatment expanding in prisons and jails, understanding how behavior change interacts with a restricted setting becomes more essential. The Transtheoretical Model (TTM) has been used to understand intentional behavior change in unrestricted settings; however, evidence indicates that restricted settings can affect the measurement and structure of the TTM constructs. The present study examined data from problem drinkers at baseline and end of treatment from three studies: (1) Project CARE (n = 187), which recruited inmates from a large county jail; (2) Project Check-In (n = 116), which recruited inmates from a state prison; and (3) Project MATCH, a large multi-site alcohol study with two recruitment arms, aftercare (n = 724 pre-treatment and 650 post-treatment) and outpatient (n = 912 pre-treatment and 844 post-treatment). Cross-sectional data were analyzed with structural equation modeling (SEM) to test for non-invariance of measures of the TTM constructs (readiness, confidence, temptation, and processes of change) across restricted and unrestricted settings. Two restricted groups (jail and aftercare) and one unrestricted group (outpatient) entering treatment, and one restricted group (prison) and two unrestricted groups (aftercare and outpatient) at end of treatment, were contrasted. In addition, TTM end-of-treatment profiles were tested as predictors of 12-month drinking outcomes (profile analysis). Although SEM did not indicate structural differences in the overall TTM construct model across setting types, there were factor structure differences on the confidence and temptation constructs at pre-treatment and in the factor structure of the behavioral processes at end of treatment. For pre-treatment temptation and confidence, differences were found in the social-situations factor loadings and in the variance of the confidence and temptation latent factors. For the end-of-treatment behavioral processes, differences across the restricted and unrestricted settings were identified in the counter-conditioning and stimulus control factor loadings. The TTM end-of-treatment profiles were not predictive of drinking outcomes in the prison sample. Both the pre- and the post-treatment differences in structure across setting types involved constructs operationalized with behaviors that are limited for those in restricted settings. These studies suggest the TTM is a viable model for explicating addictive behavior change in restricted settings, but they call for modification of subscale items that refer to specific behaviors and for caution in interpreting mean differences across setting types for problem drinkers.
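A minimal sketch of the setting-type comparison, again assuming semopy and hypothetical item names: fit the same factor model separately in restricted and unrestricted subsamples and line up the estimates. A formal invariance test would instead fit one multigroup model with loadings constrained equal across settings and compare its fit to the unconstrained model (e.g., a chi-square difference test); this sketch only puts the per-group estimates side by side.

```python
import pandas as pd
import semopy

# Hypothetical three-item temptation factor; the real scales have more items.
DESC = "Temptation =~ social_situations + negative_affect + craving"

df = pd.read_csv("ttm_baseline.csv")  # hypothetical file with a 'setting' column
for setting, group in df.groupby("setting"):  # 'restricted' vs. 'unrestricted'
    model = semopy.Model(DESC)
    model.fit(group)
    print(setting)
    print(model.inspect())  # compare loadings and factor variances across settings
```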

Relevance: 40.00%

Abstract:

Colorectal cancer is the fourth most commonly diagnosed cancer in the United States. Every year, about 147,000 people are diagnosed with colorectal cancer and 56,000 lose their lives to the disease. Most hereditary nonpolyposis colorectal cancers (HNPCC) and 12% of sporadic colorectal cancers show microsatellite instability. Colorectal cancer is a multistep progressive disease: it starts from a mutation in a normal colorectal cell and grows into a clone of cells that further accumulates mutations, finally developing into a malignant tumor. In terms of molecular evolution, the process of colorectal tumor progression represents the acquisition of sequential mutations. Clinical studies use biomarkers such as microsatellites or single nucleotide polymorphisms (SNPs) to study mutation frequencies in colorectal cancer. Microsatellite data obtained from single-genome-equivalent PCR or small-pool PCR can be used to infer tumor progression. Since tumor progression is similar to population evolution, we used the coalescent, an approach well established in population genetics, to analyze this type of data; coalescent theory can infer a sample's evolutionary path through the analysis of microsatellite data. The simulation results indicate that the constant-population-size pattern and the rapid-tumor-growth pattern produce different genetic polymorphic patterns. The simulation results were compared with experimental data collected from HNPCC patients. The preliminary results show that the mutation rate in six HNPCC patients ranges from 0.001 to 0.01. The patients' polymorphic patterns are similar to the constant-population-size pattern, which implies that tumor progression proceeds through multilineage persistence instead of clonal sequential evolution. The results should be further verified using a larger dataset.
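The sketch below illustrates the simulation idea in its simplest form: a constant-population-size coalescent with a stepwise mutation model for a single microsatellite locus. The parameterization (time in units of 2N generations, mutation rate theta/2 per lineage) is a standard textbook setup, not the dissertation's exact simulator; the rapid-growth pattern would use shorter internal branch times.

```python
import numpy as np

rng = np.random.default_rng(1)

def coalescent_microsatellite(n=20, theta=1.0):
    """Simulate microsatellite repeat-count offsets for n sampled cells
    under a constant-size coalescent with the stepwise mutation model
    (each mutation shifts the repeat count by +/-1). Sketch only."""
    offsets = np.zeros(n, dtype=int)   # repeat count relative to the root
    groups = [[i] for i in range(n)]   # samples below each active lineage
    while len(groups) > 1:
        k = len(groups)
        t = rng.exponential(2.0 / (k * (k - 1)))  # time to next coalescence
        for g in groups:
            # mutations on this branch affect every sample below it
            steps = rng.choice([-1, 1], size=rng.poisson(theta / 2 * t)).sum()
            for s in g:
                offsets[s] += steps
        i, j = sorted(rng.choice(k, size=2, replace=False))
        groups[i] = groups[i] + groups.pop(j)     # coalesce two lineages
    return offsets

# Polymorphic pattern at one locus under constant population size
print(sorted(coalescent_microsatellite()))
```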

Relevance: 40.00%

Abstract:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis have an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free-energy models to integrate sequence information into microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements. Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy-based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on probes whose intended targets are absent from the sample. Several prospective models were proposed to improve the Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization. The problem addressed in this dissertation is fundamental to microarray technology. We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
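A minimal sketch of the nearest-neighbor free-energy calculation that models like PDNN build on: summing stacked dinucleotide contributions along a probe. The dG values below are illustrative placeholders rather than the dissertation's fitted parameters, and the PDNN model additionally weights each position along the probe, which this sketch omits.

```python
# Hypothetical dinucleotide stacking free energies (kcal/mol); placeholders,
# not fitted parameters from the dissertation.
NN_DG = {
    "AA": -1.0, "AT": -0.9, "AC": -1.4, "AG": -1.3,
    "TA": -0.6, "TT": -1.0, "TC": -1.3, "TG": -1.4,
    "CA": -1.4, "CT": -1.3, "CC": -1.8, "CG": -2.1,
    "GA": -1.3, "GT": -1.4, "GC": -2.2, "GG": -1.8,
}

def duplex_free_energy(probe: str) -> float:
    """Sum nearest-neighbor stacked-pair contributions over a probe;
    lower (more negative) values imply a more stable duplex."""
    return sum(NN_DG[probe[i:i + 2]] for i in range(len(probe) - 1))

print(duplex_free_energy("ATCGGCTAAGCTGGATCCTAGCTAA"))  # a 25-mer probe
```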

Relevance: 40.00%

Abstract:

Teen pregnancy is a continuing problem, bringing with it a host of associated health and social risks. Alternative school students are especially at risk but are historically under-represented in research. This is especially problematic in that instruments are needed to guide effective intervention development, yet psychometrics for these instruments cannot be assumed when they are used in new populations. Decisional balance from the transtheoretical model offers a framework for understanding condom decision making, but it has not been tested with alternative school students. Using responses from 640 subjects in Safer Choices 2 (a school-based HIV/STD/pregnancy prevention program implemented in 10 urban, southwestern alternative schools), a decisional balance scale for condom use was examined. A two-factor, mildly correlated model fit the data well. Tests of invariance examined scale functioning within gender and racial/ethnic groups. The underlying structure varied slightly by subgroup, but on a practical level the impact on the use of the scales was minimal. The structure and loadings were invariant across experimental conditions. The pro scale was associated with a lower probability of having engaged in unprotected sexual behavior among sexually active subjects, and this association remained significant while controlling for demographic variables. The con scale did not show a significant association with engagement in unprotected sexual behaviors. Limitations and directions for future research are also discussed.
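A hedged sketch of the association test described above: logistic regression of unprotected-sex status on the pro and con scale scores, controlling for demographics. The file and column names are hypothetical, and the study's exact covariate set is not listed in the abstract.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("decisional_balance.csv")  # hypothetical file of scale scores
fit = smf.logit(
    "unprotected ~ pro_scale + con_scale + C(gender) + C(race) + age",
    data=df,
).fit()
# Per the abstract, expect a significant negative pro_scale coefficient
# and a non-significant con_scale coefficient.
print(fit.summary())
```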

Relevance: 40.00%

Abstract:

Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In relatively recent years, the potential of finite mixture models has been applied to time-to-event data. However, the commonly used survival mixture model assumes that while the effects of the covariates involved in failure times differ across latent classes, the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity within a mixture modeling framework. A joint model is developed to incorporate the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that both the effects of covariates on survival times and the distribution of the covariates vary across latent classes. The unobservable survival trajectories are identified by estimating the probability that a subject belongs to a particular class based on the observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model over the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking interactions between covariates into consideration.
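To make the latent-class idea concrete, the sketch below runs EM on a deliberately stripped-down version of a survival mixture: two latent classes with different exponential hazards, no censoring, and no covariates. The dissertation's joint model is far richer (covariates, covariate distributions per class, censoring); this only shows how class membership probabilities are estimated from observed survival times.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: two hidden classes with different exponential hazards.
n = 1000
z = rng.random(n) < 0.4                                  # true class (hidden)
t = np.where(z, rng.exponential(1.0, n), rng.exponential(5.0, n))

# EM for a two-class exponential survival mixture.
lam = np.array([0.5, 2.0])    # initial hazard rates
pi = np.array([0.5, 0.5])     # initial class weights
for _ in range(200):
    # E-step: responsibility of each class for each subject
    dens = pi * lam * np.exp(-np.outer(t, lam))          # shape (n, 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update weights and hazard rates
    pi = r.mean(axis=0)
    lam = r.sum(axis=0) / (r * t[:, None]).sum(axis=0)

print("class weights:", pi)   # roughly [0.6, 0.4]
print("hazard rates:", lam)   # roughly [0.2, 1.0]
```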

Relevance: 40.00%

Abstract:

Structure-Function Analysis of Human Integrator Subunit 4. Anupama Sataluri. Advisor: Eric J. Wagner, Ph.D.

Uridine-rich small nuclear RNAs (U snRNAs) are RNA Polymerase II (RNAPII) transcripts that are ubiquitously expressed and are known to be essential for gene expression. snRNAs play a key role in mRNA splicing and in histone mRNA expression, and inaccurate snRNA biosynthesis can lead to diseases related to defective splicing and histone mRNA expression. Although the 3′ end formation mechanism and processing machinery of other RNAPII transcripts such as mRNA have been well studied, the mechanism of snRNA 3′ end processing remained a mystery until the recent discovery of the machinery that mediates this process. In 2005, a complex of 14 subunits associated with RNA Polymerase II, the Integrator complex, was discovered. The 14 subunits were annotated Integrator 1-14 based on their size, and together they were found to facilitate 3′ end processing of snRNA. Identification of the Integrator complex propelled research toward understanding the events of snRNA 3′ end processing. Recent studies from our lab confirmed that Integrator subunits (IntS) 9 and 11 together perform the endonucleolytic cleavage of the nascent snRNA 3′ end to generate mature snRNA. However, the roles of the other members of the Integrator complex remain elusive, and current research in our lab is focused on deciphering the role of each subunit within the complex.

This work focuses on elucidating the role of human Integrator subunit 4 (IntS4) and understanding how it facilitates the overall function of the complex. IntS4 has structural similarity to a protein called Symplekin, which is part of the mRNA 3′ end processing machinery. Symplekin has been thoroughly researched in recent years, and structure-function correlation studies in the context of mRNA 3′ end processing have reported a scaffold function for Symplekin due to the presence of HEAT repeat motifs in its N-terminus. Based on the structural similarity between IntS4 and Symplekin, we hypothesized that IntS4 may behave as a Symplekin-like scaffold molecule that facilitates the interactions between other members of the Integrator complex. To test this hypothesis, the two goals of this study were to: (1) identify the region of IntS4 that is important for snRNA 3′ end processing, and (2) determine the binding partners of IntS4 that promote its function as a scaffold.

IntS4 structurally consists of a highly conserved N-terminus with eight HEAT repeats, followed by a nonconserved C-terminus. A series of siRNA-resistant N- and C-terminal deletion constructs, as well as specific point mutants within the N-terminal HEAT repeats, were generated for human IntS4, and, using an snRNA transcriptional-readthrough GFP reporter assay, we tested their ability to rescue misprocessing. This assay revealed a possible scaffold-like property of IntS4. To probe IntS4 for interaction partners, we performed co-immunoprecipitation on nuclear extracts of stable cell lines expressing IntS4 and identified IntS3 and IntS5, among other Integrator subunits, as binding partners that facilitate the scaffold-like function of hIntS4. These findings establish a critical role for IntS4 in snRNA 3′ end processing, show that both its N- and C-termini are essential for its function, and map putative interaction domains with other Integrator subunits.