21 results for high throughput screening
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and yields false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing, and false discovery rates, patterns of joint effects from several genes, each with a weak individual effect, may go undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. An exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps toward this goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to examine the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) outperforms filtering by LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets (one SNP, two SNPs, or three-SNP subsets built from the best 100 composite 2-SNP pairs) can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always improve the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from exploring more complex subset states.

Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls reaches 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be no less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing disease among cases reaches 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association analysis using the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in this study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) would require more cost-effective methods or a more efficient computing system, neither of which can currently be accomplished in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
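To make the HMSS filter criterion concrete, the sketch below scores each SNP by the harmonic mean of the sensitivity and specificity of a cross-validated LDA classifier and keeps the top-ranked markers. This is a minimal Python/scikit-learn sketch, not the dissertation's actual pipeline; the 0/1/2 genotype coding, the 5-fold cross-validation, and all variable names are illustrative assumptions.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

def hmss(y_true, y_pred):
    # Harmonic mean of sensitivity and specificity; usable on imbalanced
    # data without modifying the dataset, unlike raw accuracy.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return 2 * sens * spec / (sens + spec) if (sens + spec) > 0 else 0.0

def rank_snps_by_hmss(genotypes, disease, top_k=1000):
    # genotypes: (n_subjects, n_snps) matrix of 0/1/2 minor-allele counts
    # disease:   (n_subjects,) array, 0 = control, 1 = case
    scores = []
    for j in range(genotypes.shape[1]):
        x = genotypes[:, [j]]
        pred = cross_val_predict(LinearDiscriminantAnalysis(), x, disease, cv=5)
        scores.append(hmss(disease, pred))
    return np.argsort(scores)[::-1][:top_k]  # indices of the top-scoring SNPs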
Abstract:
Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis have an apparent limitation in that they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free-energy models to integrate sequence information into microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements.

Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on probes whose intended targets are absent from the sample.

Several prospective models were proposed to improve the Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization.

The problem addressed in this dissertation is fundamental to microarray technology. We expect that this study will help us understand the detailed mechanism that determines sensitivity and specificity on microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
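To make the free-energy idea concrete, here is a minimal Python sketch of a positionally weighted nearest-neighbor energy in the spirit of the PDNN model: each adjacent base pair ("stack") of the probe contributes a nearest-neighbor energy scaled by a position-dependent weight. The weight profile and the energy table below are placeholder values for illustration only; the dissertation fits such parameters from microarray data.

import math

# Hypothetical nearest-neighbor stacking energies (arbitrary units).
NN_ENERGY = {
    "AA": -1.0, "AC": -1.4, "AG": -1.3, "AT": -0.9,
    "CA": -1.5, "CC": -1.8, "CG": -2.2, "CT": -1.3,
    "GA": -1.3, "GC": -2.2, "GG": -1.8, "GT": -1.4,
    "TA": -0.6, "TC": -1.3, "TG": -1.5, "TT": -1.0,
}

def positional_weight(k, length):
    # Placeholder bell-shaped profile: interior stacks contribute more
    # than stacks near the free ends of the probe.
    center = (length - 1) / 2.0
    return math.exp(-((k - center) / (length / 4.0)) ** 2)

def pdnn_energy(probe):
    # Sum of positionally weighted nearest-neighbor terms over the probe.
    e = 0.0
    for k in range(len(probe) - 1):
        e += positional_weight(k, len(probe)) * NN_ENERGY[probe[k:k + 2]]
    return e

# More negative energy implies stronger expected hybridization:
print(pdnn_energy("ATCGGCCATTCGGATTACCGGTACC"))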
Abstract:
Detection of multidrug-resistant tuberculosis (MDR-TB), a frequent cause of treatment failure, takes 2 or more weeks by culture. RIF resistance is a hallmark of MDR-TB, and detection of mutations in the rpoB gene of Mycobacterium tuberculosis using molecular beacon probes with real-time quantitative polymerase chain reaction (qPCR) is a novel approach that takes ≤2 days. However, qPCR identification of resistant isolates, particularly isolates with mixed RIF-susceptible and RIF-resistant bacteria, is reader dependent, which limits its clinical use. The aim of this study was to develop an objective, reader-independent method to define rpoB mutants using beacon qPCR. This would facilitate the transition from a research protocol to the clinical setting, where high-throughput methods with objective interpretation are required. For this, DNAs from 107 M. tuberculosis clinical isolates with known susceptibility to RIF by culture-based methods were obtained from 2 regions where isolates had not previously been evaluated using molecular beacon qPCR: the Texas–Mexico border and Colombia. Using coded DNA specimens, mutations within an 81-bp hot-spot region of rpoB were established by qPCR with 5 beacons spanning this region. Visual and mathematical approaches were used to establish whether the qPCR cycle threshold of the experimental isolate was significantly higher (mutant) than that of a reference wild-type isolate. Visual classification of the beacon qPCR required reader training for strains with a mixture of RIF-susceptible and RIF-resistant bacteria. Only then did visual interpretation by an experienced reader achieve 100% sensitivity and 94.6% specificity versus RIF resistance by culture phenotype, and 98.1% sensitivity and 100% specificity versus mutations based on DNA sequence. The mathematical approach was 98% sensitive and 94.5% specific versus culture, and 96.2% sensitive and 100% specific versus DNA sequence. Our findings indicate that the mathematical approach has advantages over visual reading: it uses a Microsoft Excel template to eliminate reader bias or inexperience, and it allows objective interpretation of high-throughput analyses, even in the presence of a mixture of RIF-resistant and RIF-susceptible isolates, without the need for reader training.
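The following is a hedged sketch of the reader-independent idea behind the mathematical approach: for each of the five beacons, call a mutation when the isolate's cycle threshold (Ct) rises sufficiently above that of a wild-type reference amplified in the same assay. The study implements this in a Microsoft Excel template; the Python version, the beacon names, and the 3.0-cycle cutoff below are illustrative assumptions, not the study's calibrated values.

def call_rpob_mutant(sample_ct, wildtype_ct, delta_ct_cutoff=3.0):
    # sample_ct, wildtype_ct: dicts mapping beacon name -> Ct value.
    # A beacon that fails to amplify can be recorded as float('inf').
    calls = {}
    for beacon, ct in sample_ct.items():
        delta = ct - wildtype_ct[beacon]
        calls[beacon] = delta >= delta_ct_cutoff  # delayed Ct => probe mismatch
    return calls, any(calls.values())

# Five beacons span the 81-bp rpoB hot-spot region:
sample = {"B1": 24.1, "B2": 31.7, "B3": 23.9, "B4": 24.4, "B5": 24.0}
wild   = {"B1": 23.8, "B2": 24.0, "B3": 23.7, "B4": 24.1, "B5": 23.9}
per_beacon, is_mutant = call_rpob_mutant(sample, wild)
print(per_beacon, "mutant" if is_mutant else "wild-type")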
Abstract:
Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard-of-care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment. The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important because tumor shrinkage is an important cancer treatment endpoint that is correlated with probability of disease progression and overall survival; accurate prediction of tumor shrinkage could also lead to individually customized treatment plans. To accomplish this objective, 64 stage NSCLC patients with similar treatments were all imaged using the same CT scanner and protocol. Quantitative image features were extracted, and principal component regression with simulated annealing subset selection was used to predict shrinkage. Cross-validation and permutation tests were used to validate the results. The optimal model gave a strong correlation between the observed and predicted shrinkages.

The second objective of this work was to identify sets of NSCLC CT image features that are reproducible, non-redundant, and informative across multiple machines. Feature sets with these qualities are needed for NSCLC radiomics models to be robust to machine variation and spurious correlation. To accomplish this objective, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines at two institutions. For each machine, quantitative image features with concordance correlation coefficient values greater than 0.90 were considered reproducible. Multi-machine reproducible feature sets were created by taking the intersection of the individual machines' reproducible feature sets. Redundant features were removed through hierarchical clustering. The findings showed that image feature reproducibility and redundancy depended on both the CT machine and the CT image type (average cine 4D-CT imaging vs. end-exhale cine 4D-CT imaging vs. helical inspiratory breath-hold 3D-CT). For each image type, a set of cross-machine reproducible, non-redundant, and informative image features was identified. Compared to end-exhale 4D-CT and breath-hold 3D-CT, image features derived from average 4D-CT showed superior multi-machine reproducibility and are the best candidates for clinical correlation.
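The reproducibility filter described above can be sketched directly: compute Lin's concordance correlation coefficient (CCC) between test and retest values of each feature, keep features with CCC > 0.90 on each machine, and intersect the per-machine sets. This is a minimal Python sketch under the assumption that features arrive as per-machine (patients x features) matrices; the names and shapes are illustrative.

import numpy as np

def lin_ccc(x, y):
    # Lin's concordance correlation coefficient between paired measurements.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def reproducible_features(test, retest, threshold=0.90):
    # test, retest: (n_patients, n_features) matrices from paired scans.
    return {j for j in range(test.shape[1])
            if lin_ccc(test[:, j], retest[:, j]) > threshold}

def multi_machine_set(pairs_by_machine, threshold=0.90):
    # Multi-machine reproducible set = intersection over individual machines.
    sets = [reproducible_features(t, r, threshold) for t, r in pairs_by_machine]
    return set.intersection(*sets)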
Abstract:
Transcriptional enhancers are genomic DNA sequences that contain clustered transcription factor (TF) binding sites. When combinations of TFs bind to enhancer sequences, they act together with the basal transcriptional machinery to regulate the timing, location, and quantity of gene transcription. Elucidating the genetic mechanisms responsible for differential gene expression, including the role of enhancers, during embryological and postnatal development is essential to an understanding of evolutionary processes and disease etiology. Numerous methods are in use to identify and characterize enhancers, and several high-throughput methods generate large datasets of enhancer sequences with putative roles in embryonic development. However, few enhancers have been deleted from the genome to determine their roles in the development of specific structures, such as the limb. Manipulation of enhancers at their endogenous loci, such as the deletion of such elements, leads to a better understanding of the regulatory interactions, rules, and complexities that contribute to faithful and variant gene transcription, the molecular genetic substrate of evolution and disease. To understand the endogenous roles of two distinct enhancers known to be active in the mouse embryo limb bud, we deleted them from the mouse genome. I hypothesized that deletion of these enhancers would lead to aberrant limb development. The enhancers were selected because of their association with p300, a protein associated with active transcription, and because the human enhancer sequences drive distinct lacZ expression patterns in limb buds of embryonic day (E) 11.5 transgenic mice. To confirm that the orthologous mouse enhancers, mouse 280 and 1442 (M280 and M1442, respectively), regulate expression in the developing limb, we generated stable transgenic lines and examined lacZ expression. In M280-lacZ mice, expression was detected in E11.5 fore- and hindlimbs in a region that corresponds to digits II-IV. M1442-lacZ mice exhibited lacZ expression in the posterior and anterior margins of the fore- and hindlimbs that overlapped with digits I and V and several wrist bones. We generated mice lacking the M280 and M1442 enhancers by gene targeting. Intercrosses between M280 -/+ mice and between M1442 -/+ mice generated M280 and M1442 null mice, which are born at expected Mendelian ratios and manifest no gross limb malformations. Quantitative real-time PCR of mutant E11.5 limb buds indicated that significant changes in the transcriptional output of enhancer-proximal genes accompanied the deletion of both M280 and M1442. In neonatal null mice we observed that all limb bones are present in their expected positions, an observation also confirmed by histology of E18.5 distal limbs. Fine-scale measurement of E18.5 digit bone lengths found no differences between mutant and control embryos. Furthermore, when the developmental progression of cartilaginous elements was analyzed in M280 and M1442 embryos from E13.5 to E15.5, no transient developmental defects were detected. These results demonstrate that M280 and M1442 are not required for mouse limb development. Though M280 is not required for embryonic limb development, it is required for the development and/or maintenance of body size: adult M280 null mice are significantly smaller than control littermates. These studies highlight the importance of experiments that manipulate enhancers in situ to understand their contribution to development.
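For context on the quantitative real-time PCR comparison of enhancer-proximal gene expression, the sketch below shows the standard delta-delta-Ct fold-change calculation commonly used for such analyses. The abstract does not state the exact normalization scheme used, so the reference-gene setup and the example Ct values here are hypothetical.

def fold_change_ddct(ct_gene_mut, ct_ref_mut, ct_gene_ctrl, ct_ref_ctrl):
    # Normalize the gene of interest to a reference gene in each genotype,
    # then express mutant levels relative to control: 2 ** -(ddCt).
    d_mut = ct_gene_mut - ct_ref_mut
    d_ctrl = ct_gene_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_mut - d_ctrl)

# Hypothetical example: an enhancer-proximal gene reduced in null limb buds
print(fold_change_ddct(26.0, 18.0, 24.5, 18.1))  # ~0.33-fold of control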
Abstract:
Autophagy is an evolutionarily conserved process that maintains homeostasis and provides energy during nutrient deprivation and environmental stress, supporting cell survival by delivering cytoplasmic contents to the lysosomes for recycling and energy generation. Dysregulation of this process has been linked to human diseases, including immune disorders, neurodegenerative and muscular diseases, and cancer. Autophagy is a double-edged sword in that it has both pro-survival and pro-death roles in cancer cells. Its cancer-suppressive roles include the clearance of damaged organelles, which could otherwise lead to inflammation and thereby promote tumorigenesis. In its pro-survival role, autophagy allows cancer cells to overcome cytotoxic stresses generated by the cancer environment or by cancer treatments such as chemotherapy, and to evade cell death. A better understanding of how drugs that perturb autophagy affect cancer cell signaling is of critical importance to improve the cancer treatment arsenal. To gain insight into the relationship between autophagy and drug treatments, we conducted a high-throughput drug screen to identify autophagy modulators. Our high-throughput screen utilized image-based fluorescence microscopy for single-cell analysis to identify chemical perturbants of the autophagic process. Phenothiazines emerged as the largest family of drugs that alter the autophagic process, increasing LC3-II puncta levels in different cancer cell lines. In addition, we observed multiple biological effects in cancer cells treated with phenothiazines. These antitumorigenic effects include decreased cell migration, cell viability, and ATP production, along with abortive autophagy. Our studies highlight the potential role of phenothiazines as agents for combination therapy with other chemotherapeutic agents in the treatment of different cancers.
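A minimal sketch of the kind of single-cell readout such an image-based screen relies on: threshold the LC3 fluorescence channel within a segmented cell and count connected components above a minimum size as puncta. The fixed intensity threshold, the minimum-size filter, and the use of scipy.ndimage here are illustrative assumptions, not the screen's actual pipeline.

import numpy as np
from scipy import ndimage

def count_lc3_puncta(cell_image, intensity_threshold, min_pixels=4):
    # cell_image: 2-D fluorescence intensity array for one segmented cell.
    binary = cell_image > intensity_threshold
    labels, n = ndimage.label(binary)  # connected bright regions
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    return int(np.sum(np.asarray(sizes) >= min_pixels))

# A compound would be scored as an autophagy modulator when the per-cell
# puncta count distribution shifts relative to vehicle-treated controls.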