6 results for Elements, High Throughput Data, electrophysiology, data processing, Real Time analysis
in DigitalCommons@The Texas Medical Center
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Abstract:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. 
This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
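The taking-difference idea described in the abstract above can be sketched in a few lines. This is only an illustration under an assumed model, since the abstract gives no formulas: fluorescence at cycle n is taken to be F_n = B + F0 * E**n, with a constant unknown background B, initial signal F0, and per-cycle amplification factor E (so the efficiency is E - 1). Subtracting consecutive cycles cancels B, and the logarithm of the differences is linear in n. The function name `taking_difference_fit` and the synthetic data are hypothetical.

```python
import math

def taking_difference_fit(fluorescence):
    """Estimate amplification factor E and initial signal F0 from raw readings.

    fluorescence[i] is the raw reading at cycle i + 1 (background NOT removed).
    Assumed model (not stated in the abstract): F_n = B + F0 * E**n.
    """
    # Differences of consecutive cycles cancel the constant background B:
    # D_n = F_{n+1} - F_n = F0 * (E - 1) * E**n, so n readings give n - 1 values.
    cycles = []
    log_diffs = []
    for i in range(len(fluorescence) - 1):
        diff = fluorescence[i + 1] - fluorescence[i]
        if diff > 0:  # the logarithm needs a positive difference
            cycles.append(i + 1)
            log_diffs.append(math.log(diff))
    if len(cycles) < 2:
        raise ValueError("need at least two positive differences")

    # Ordinary least squares for ln(D_n) = intercept + slope * n
    k = len(cycles)
    mean_x = sum(cycles) / k
    mean_y = sum(log_diffs) / k
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, log_diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    amplification = math.exp(slope)                       # E
    initial = math.exp(intercept) / (amplification - 1)   # F0
    return amplification, initial

# Synthetic, noise-free exponential-phase data (made-up parameter values).
background, f0, e = 100.0, 0.01, 1.9
raw = [background + f0 * e ** n for n in range(1, 16)]
est_e, est_f0 = taking_difference_fit(raw)
print(f"E = {est_e:.4f}, efficiency = {est_e - 1:.4f}, F0 = {est_f0:.6f}")
```

On real runs only the cycles in the exponential phase would be passed in. Choosing those cycles is what the threshold, max R2, and max slope criteria in the abstract address, and it is not modeled here.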
Abstract:
Background: High grade serous carcinoma, whether ovarian, tubal or primary peritoneal, continues to be the most lethal gynecologic malignancy in the USA. Although combination chemotherapy and aggressive surgical resection have improved survival in the past decade, the majority of patients still succumb to chemo-resistant disease recurrence. It has recently been reported that amplification of 5q31-5q35.3 is associated with poor prognosis in patients with high grade serous ovarian carcinoma. Although the amplicon contains over 50 genes, it is notable for the presence of several members of the fibroblast growth factor signaling axis. In particular, acidic fibroblast growth factor (FGF1) has been demonstrated to be one of the driving genes in mediating the observed prognostic effect of the amplicon in ovarian cancer patients. This study seeks to further validate the prognostic value of fibroblast growth factor receptor 4 (FGFR4), another candidate gene of the FGF/FGFR axis located in the same amplicon. The emphasis will be on delineating the role the FGF1/FGFR4 signaling axis plays in high grade serous ovarian carcinoma and on testing the feasibility of targeting the FGF1/FGFR4 axis therapeutically. Materials and Methods: Spearman and Pearson correlation studies on data generated from array CGH and transcriptome profiling analyses on 51 microdissected tumor samples were used to identify genes located on chromosome 5q31-35.3 that showed significant correlation between DNA and mRNA copy numbers. Significant correlation between FGF1 and FGFR4 DNA copy numbers was further validated by qPCR analysis on DNA isolated from 51 microdissected tumor samples. Immunolocalization and quantification of FGFR4 expression were performed on paraffin embedded tissue samples from 183 cases of high-grade serous ovarian carcinoma. The expression was then correlated with clinical data to assess impact on survival.
The expression of FGF1 and FGFR4 in vitro was quantified by real-time PCR and western blotting in six high-grade serous ovarian carcinoma cell lines and compared to that in human ovarian surface epithelial (HOSE) cells to identify overexpression. The effect of FGF1 on these cell lines after serum starvation was quantified for in vitro cellular proliferation, migration/invasion, chemoresistance and survival utilizing a combination of commercially available colorimetric, fluorometric and electrical impedance assays. FGFR4 expression was then transiently silenced via siRNA transfection and the effects on response to FGF1, cellular proliferation, and migration were quantified. To identify relevant cellular pathways involved, responsive cell lines were transduced with different transcription response elements using the Cignal-Lenti reporter system and treated with FGF1 with and without transient FGFR4 knockdown. This was followed by western blot confirmation for the relevant phosphoproteins. Anti-FGF1 antibodies and FGFR trap proteins were used to attempt inhibition of FGF-mediated phenotypic changes and relevant signaling in vitro. Orthotopic intraperitoneal tumors were established in nude mice using serous cell lines that had previously been transfected with luciferase-expressing constructs. The mice were then treated with FGFR trap protein. Tumor progression was then followed via bioluminescent imaging. The FGFR4 gene from 52 clinical samples was sequenced to screen for mutations. Results: FGFR4 DNA and mRNA copy numbers were significantly correlated and FGFR4 DNA copy number was significantly correlated with that of FGF1. Survival of patients with high FGFR4 expressing tumors was significantly shorter than that of those with low expression (median survival 28 vs 55 months, p < 0.001). In a multivariate Cox regression model, FGFR expression significantly increased risk of death (HR 2.1, p < 0.001).
FGFR4 expression was significantly higher in all cell lines tested compared to HOSE cells; the OVCA432 cell line in particular had very high expression, suggesting amplification. FGF1 was also particularly overexpressed in OVCA432. FGF1 significantly increased cell survival after serum deprivation in all cell lines. Transient knockdown of FGFR4 caused significant reduction in cell migration and proliferation in vitro and significantly decreased the proliferative effects of FGF1 in vitro. FGFR1, FGFR4 traps and anti-FGF1 antibodies did not show activity in vitro. OVCA432 transfected with the Cignal-Lenti reporter system revealed significant activation of MAPK, NFkB and WNT pathways; western blotting confirmed the results. Reverse phase protein array (RPPA) analysis also showed activation of MAPK, AKT, WNT pathways and down regulation of E-cadherin. FGFR trap protein significantly reduced tumor growth in vivo in an orthotopic mouse model. Conclusions: Overexpression and amplification of several members of the FGF signaling axis present on the amplicon 5q31-35.3 is a negative prognostic indicator in high grade serous ovarian carcinoma and may drive poor survival associated with that amplicon. Activation of the FGF signaling pathway leads to downstream activation of MAPK, AKT, WNT and NFkB pathways, leading to a more aggressive cancer phenotype with increased tumor growth, evasion of apoptosis and increased migration and invasion. Inhibition of the FGF pathway in vivo via FGFR trap protein leads to significantly decreased tumor growth in an orthotopic mouse model.
Abstract:
Clinical Research Data Quality Literature Review and Pooled Analysis
We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 files to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70–5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had comparable error rates to data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to double entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.
Defining Data Quality for Clinical Research: A Concept Analysis
Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This current lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.
Medical Record Abstractor’s Perceptions of Factors Impacting the Accuracy of Abstracted Data
Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractors’ perceptions about the factors. The Delphi process identified 9 factors that were not found in the literature, and differed with the literature by 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into and indicate content validity of a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and registry and quality improvement abstractors.
Distributed Cognition Artifacts on Clinical Research Data Collection Forms
Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.
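The "errors per 10,000 fields" unit used in the pooled analysis above is a simple normalization, illustrated here. The abstract does not say how rates were pooled, so this sketch assumes the plainest option: summing errors and fields across studies, which weights each study by the number of fields it inspected. The function names and all counts are made up for illustration.

```python
def errors_per_10k(errors, fields):
    """Normalize an error count to errors per 10,000 fields inspected."""
    return 10_000 * errors / fields

def pooled_rate(studies):
    """Pool (errors, fields) pairs by summing counts (assumed pooling rule)."""
    total_errors = sum(errors for errors, _ in studies)
    total_fields = sum(fields for _, fields in studies)
    return errors_per_10k(total_errors, total_fields)

# Hypothetical (errors, fields inspected) pairs, not taken from the review.
studies = [(12, 40_000), (150, 25_000), (3, 15_000)]

per_study = [errors_per_10k(e, f) for e, f in studies]
rate = pooled_rate(studies)
print(per_study, rate)
```

With these made-up counts the per-study rates differ by more than an order of magnitude, and the pooled figure falls between them because it is dominated by the studies that inspected the most fields.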
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death to the gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Abstract:
High-throughput assays, such as the yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This tremendously increases the need for developing reliable methods to systematically and automatically suggest protein functions and relationships between them. With the available PPI data, it is now possible to study the functions and relationships in the context of a large-scale network. To date, several network-based schemes have been provided to effectively annotate protein functions on a large scale. However, due to the inherent noise in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins, and hence suggest their functions. One advantage of the work is that their algorithm is not sensitive to noise (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods which we applied to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work from Samanta and Liang, our algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins.
Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and performed pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. In particular, we performed a detailed analysis of a subcluster enriched in the transforming growth factor β signaling pathway (P < 10^-50), which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotations in this post-genomic era.
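The common-neighbor idea referred to in the abstract above, two proteins sharing "an unusually large number of neighbors", can be illustrated with a hypergeometric tail probability. This is a sketch of that general idea only, not the dissertation's new algorithm, which the abstract does not specify and which additionally corrects for hub proteins. It assumes neighbors are drawn uniformly at random from all proteins in the network. The function names and the toy network are hypothetical.

```python
from math import comb

def shared_neighbor_pvalue(total, deg_a, deg_b, shared):
    """P(at least `shared` common neighbors) under a hypergeometric null.

    total  : number of proteins in the network
    deg_a  : number of neighbors of protein A
    deg_b  : number of neighbors of protein B
    shared : observed number of common neighbors
    """
    upper = min(deg_a, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(total, deg_b)

def build_network(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    network = {}
    for u, v in edges:
        network.setdefault(u, set()).add(v)
        network.setdefault(v, set()).add(u)
    return network

# Toy interaction list with 10 proteins (made up for illustration).
edges = [("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("B", "D"), ("B", "F"),
         ("G", "H"), ("I", "J")]
network = build_network(edges)

common = network["A"] & network["B"]          # shared neighbors of A and B
p = shared_neighbor_pvalue(len(network), len(network["A"]),
                           len(network["B"]), len(common))
print(sorted(common), p)
```

A small p-value means the overlap is larger than expected by chance, which is the signal used to propose a functional association. In a real network, highly connected hubs inflate overlaps, which is the effect the abstract says the new algorithm measures and reduces.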
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This raises a great challenge for sequence-based genetic studies of complex diseases.
This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, shifting the paradigm of association analysis from the current locus-by-locus analysis to collective analysis of genome regions.
In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene, regardless of whether the variants are common or rare.
Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as a unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.