6 results for Earnings and dividend announcements, high frequency data, information asymmetry
in DigitalCommons@The Texas Medical Center
Abstract:
It is well accepted that tumorigenesis is a multi-step process involving aberrant functioning of genes that regulate cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. A full understanding of tumorigenesis therefore requires information on all aspects of cell activity. Recent advances in high-throughput technologies allow biologists to generate massive amounts of data, more than could have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays, as well as next-generation sequencing assays that interrogate somatic mutations, insertions, deletions, translocations and structural rearrangements. Given this volume of data, a major challenge is to integrate information from multiple sources and formulate testable hypotheses. This thesis focuses on developing methodologies for integrative analysis of genomic assays profiled on the same set of samples. We have developed several novel methods for integrative biomarker identification and cancer classification. We introduce a regression-based approach that identifies biomarkers predictive of therapy response or survival by integrating multiple assays, including gene expression, methylation and copy number data, through penalized regression. To identify key cancer-specific genes while accounting for multiple mechanisms of regulation, we have developed the integIRTy software, which uses Item Response Theory to provide robust and reliable inferences about gene alteration by automatically adjusting for sample heterogeneity as well as technical artifacts.
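The abstract does not say which penalty the regression-based approach uses. As a minimal sketch, assuming an L1 (lasso) penalty and a continuous outcome, integrating assays through penalized regression can be illustrated by concatenating each sample's gene expression, methylation and copy number features into one standardized design matrix and fitting the lasso by cyclic coordinate descent. The names `integrate_assays` and `lasso_cd`, the toy data and the fixed penalty `lam` are hypothetical; survival or therapy-response outcomes would typically call for a different loss (for example, a Cox partial likelihood or a logistic loss).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def standardize(column: Sequence[float]) -> List[float]:
    """Center a feature to mean 0 and scale it to (population) variance 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if sd == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mean) / sd for v in column]


def integrate_assays(*assays: Matrix) -> Matrix:
    """Hypothetical helper: concatenate assay blocks profiled on the SAME samples
    (one row per sample, same row order) and standardize every resulting column."""
    n = len(assays[0])
    rows: Matrix = [[] for _ in range(n)]
    for block in assays:
        for i in range(n):
            rows[i].extend(block[i])
    p = len(rows[0])
    cols = [standardize([rows[i][j] for i in range(n)]) for j in range(p)]
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(z: float, gamma: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Matrix, y: Sequence[float], lam: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)||y_c - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    Assumes standardized X columns; y is centered here, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    beta = [0.0] * p
    resid = [v - y_mean for v in y]  # residual y_c - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)  # (1/n) * sum x_ij^2 == 1 after standardizing
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy example: 6 samples, one feature per assay (values are made up).
expression = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
methylation = [[0.2], [0.1], [0.4], [0.3], [0.2], [0.1]]
copy_number = [[2.0], [2.0], [3.0], [2.0], [3.0], [2.0]]
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # driven entirely by expression here

X = integrate_assays(expression, methylation, copy_number)
beta = lasso_cd(X, response, lam=0.1)
print(beta)  # expression keeps a nonzero weight; the other two are shrunk to exactly 0
```

Raising `lam` shrinks more coefficients to exactly zero, which is what makes the L1 penalty usable for biomarker selection; in practice the penalty would be tuned, for example by cross-validation.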
To meet the growing need for accurate cancer diagnosis and individualized therapy, we have developed a robust and powerful algorithm called SIBER that systematically identifies bimodally expressed genes from next-generation RNA-seq data. We have shown that prediction models built from these bimodal genes achieve the same accuracy as models built from all genes. Further, prediction models that use gene expression dichotomized according to each gene's bimodal shape still perform well. The effectiveness of outcome prediction from discretized signals paves the way for more accurate and interpretable cancer classification that integrates signals from multiple sources.
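SIBER's exact model is not described in this abstract. As a simplified stand-in under stated assumptions (not SIBER itself), the sketch below fits an equal-variance two-component Gaussian mixture to one gene's log-transformed expression by expectation-maximization (EM), scores bimodality with the bimodality index BI = sqrt(pi * (1 - pi)) * |mu2 - mu1| / sigma, where pi is the weight of the high component and sigma the shared standard deviation, and dichotomizes samples by their posterior component membership. The function names, the EM initialization and the toy values are hypothetical; for raw RNA-seq counts a count-based mixture (for example, negative binomial) would be a more natural fit.

```python
import math
from typing import List, Sequence, Tuple


def _posterior_high(v: float, pi: float, mu1: float, mu2: float, sigma: float) -> float:
    """P(sample is in the high component | v), computed in log space to avoid overflow."""
    z1 = math.log(1.0 - pi) - 0.5 * ((v - mu1) / sigma) ** 2
    z2 = math.log(pi) - 0.5 * ((v - mu2) / sigma) ** 2
    if z2 >= z1:
        return 1.0 / (1.0 + math.exp(z1 - z2))
    e = math.exp(z2 - z1)
    return e / (1.0 + e)


def fit_two_gaussians(x: Sequence[float], n_iter: int = 200) -> Tuple[float, float, float, float]:
    """EM for an equal-variance 2-component Gaussian mixture.
    Returns (pi, mu1, mu2, sigma): mu1 is the low mean, pi the weight of the high component."""
    xs = sorted(x)
    n, half = len(xs), len(xs) // 2
    mu1 = sum(xs[:half]) / half          # initialize the low mean from the lower half
    mu2 = sum(xs[half:]) / (n - half)    # and the high mean from the upper half
    mean = sum(xs) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each sample.
        r = [_posterior_high(v, pi, mu1, mu2, sigma) for v in x]
        # M-step: update the mixing weight, both means and the shared variance.
        n2 = sum(r)
        n1 = n - n2
        pi = n2 / n
        mu1 = sum((1 - ri) * v for ri, v in zip(r, x)) / n1
        mu2 = sum(ri * v for ri, v in zip(r, x)) / n2
        var = sum((1 - ri) * (v - mu1) ** 2 + ri * (v - mu2) ** 2 for ri, v in zip(r, x)) / n
        sigma = math.sqrt(var)
    return pi, mu1, mu2, sigma


def bimodality_index(pi: float, mu1: float, mu2: float, sigma: float) -> float:
    """BI = sqrt(pi * (1 - pi)) * |mu2 - mu1| / sigma."""
    return math.sqrt(pi * (1 - pi)) * abs(mu2 - mu1) / sigma


def dichotomize(x: Sequence[float], params: Tuple[float, float, float, float]) -> List[int]:
    """Label each sample 1 (high mode) or 0 (low mode) by its posterior membership."""
    pi, mu1, mu2, sigma = params
    return [1 if _posterior_high(v, pi, mu1, mu2, sigma) > 0.5 else 0 for v in x]


# Toy log-expression values for one gene across 10 samples (made up, clearly bimodal).
log_expr = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
params = fit_two_gaussians(log_expr)
print(round(bimodality_index(*params), 2))  # a large BI signals strong bimodality
print(dichotomize(log_expr, params))        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The 0/1 labels from `dichotomize` are the kind of discretized signal the abstract describes feeding into prediction models; the sketch assumes both modes are populated, since a degenerate fit (all samples in one component) would make `pi` reach 0 or 1.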
Abstract:
Data management and sharing are relatively new concepts in the health and life sciences fields. This presentation will cover some basic policies as well as the impediments to data sharing unique to health and life sciences data.
Abstract:
The phenomenon of premature chromosome condensation, resulting from fusion between mitotic and interphase cells, includes dissolution of the interphase nuclear framework, thus allowing direct visualization of interphase chromosomes. Light microscope morphology of prematurely condensed chromosomes (PCC) from synchronized HeLa cells supports the model of an interphase "chromosome condensation cycle". PCC are increasingly attenuated as cells progress through G1. A maximum degree of decondensation is observed at active sites of DNA replication during S phase, and a condensed morphology is rapidly resumed once replication of a chromosome segment is complete.
To permit ultrastructural and biochemical studies of PCC, a procedure was developed to induce premature chromosome condensation at high frequency. This was achieved by polyethylene glycol (PEG)-mediated fusion of a dense monolayer of mitotic and interphase cells, induced by centrifugation onto lectin-coated culture dishes. Using this method, PCC induction frequencies of 60-90% are routinely obtained.
Scanning electron microscope analysis of PCC spreads revealed that the extension of PCC during progression through G1 is accompanied by a transition of the basic 30 nm chromatin fiber from tightly packed looping fibers to extended longitudinal fibers. Sites of active DNA replication in S-PCC appeared to be organized as single longitudinal fibers. Following replication of a chromosome segment, a rapid reorganization from the extended longitudinal fiber to packed looping fibers occurs. The postreplication maturation process appears to include the assembly of a chromosome core consisting of multiple longitudinal fibers.
The role of histone H1 phosphorylation in PCC formation was investigated by acid-urea polyacrylamide gel electrophoresis of total histones extracted from metaphase chromosomes and from PCC following high-frequency fusion. This investigation failed to demonstrate extensive H1 phosphorylation associated with PCC formation. However, significant dephosphorylation of superphosphorylated metaphase chromosome H1 was observed, indicating that interphase H1 phosphatase activity is dominant over metaphase H1 kinase activity. These observations provide evidence against models suggesting a role for H1 superphosphorylation in triggering mitotic condensation of chromosomes.
Abstract:
Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and to assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review, and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) at ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients were defined as those receiving ≥ 10 RBC units within 24 h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate upper and lower bounds of correct classification under best-case and worst-case assumptions about the missing data. Most variables (17/33 = 52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% showed less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables with >1% missing data, study site was associated with missingness (all p ≤ 0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables.
Complete case proportions for the MT models ranged from 41% to 88%. Complete case analysis and multiple imputation gave similar correct classification results. Sensitivity analysis bounds (lower to upper) for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random and that complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis, which estimates correct classification under upper (best-case) and lower (worst-case) bounds, may be more informative than multiple imputation, which provided results similar to complete case analysis.
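The abstract does not spell out how the best-case and worst-case bounds were computed. A minimal sketch, assuming the simplest construction: correct classification is first computed on complete cases, then every patient whose predictors are missing is counted as correctly classified (upper bound) or misclassified (lower bound). The function `classification_bounds` and the toy predictions are hypothetical, not the study's code or data.

```python
from typing import Optional, Sequence, Tuple


def classification_bounds(
    predicted: Sequence[Optional[int]], observed: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (complete case %, complete-case correct classification %, lower bound %, upper bound %).

    predicted[i] is a model's MT call (1 = MT, 0 = no MT) for patient i, or None when a
    predictor is missing so no call can be made; observed[i] is the true MT status.
    Assumes at least one complete case.
    """
    n = len(observed)
    complete = [(p, o) for p, o in zip(predicted, observed) if p is not None]
    n_missing = n - len(complete)
    correct = sum(1 for p, o in complete if p == o)
    complete_pct = 100.0 * len(complete) / n       # share usable in complete case analysis
    cc_accuracy = 100.0 * correct / len(complete)  # correct classification among complete cases
    lower = 100.0 * correct / n                    # worst case: all incomplete patients misclassified
    upper = 100.0 * (correct + n_missing) / n      # best case: all incomplete patients classified correctly
    return complete_pct, cc_accuracy, lower, upper


# Toy data for 10 patients (made up); None marks a patient with a missing predictor.
predicted = [1, 0, None, 1, None, 0, 0, 1, None, 0]
observed = [1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
print(classification_bounds(predicted, observed))  # complete-case accuracy lies between the bounds
```

Under this construction the complete case estimate always falls between the two bounds, and the width of the interval equals the percentage of patients with missing predictors, so models relying on frequently missing variables (such as respiratory rate here) get wide, honest intervals.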