3 resultados para DNA-microarray data
em Digital Commons at Florida International University
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
The purpose of this research was to demonstrate the applicability of reduced-size STR (Miniplex) primer sets to challenging samples and to provide the forensic community with new information regarding the analysis of degraded and inhibited DNA. The Miniplex primer sets were validated in accordance with guidelines set forth by the Scientific Working Group on DNA Analysis Methods (SWGDAM) in order to demonstrate the scientific validity of the kits. The Miniplex sets were also used in the analysis of DNA extracted from human skeletal remains and telogen hair. In addition, a method for evaluating the mechanism of PCR inhibition was developed using qPCR. The Miniplexes were demonstrated to be a robust and sensitive tool for the analysis of DNA with as low as 100 pg of template DNA. They also proved to be better than commercial kits in the analysis of DNA from human skeletal remains, with 64% of samples tested producing full profiles, compared to 16% for a commercial kit. The Miniplexes also produced amplification of nuclear DNA from human telogen hairs, with partial profiles obtained from as low as 60 pg of template DNA. These data suggest smaller PCR amplicons may provide a useful alternative to mitochondrial DNA for forensic analysis of degraded DNA from human skeletal remains, telogen hairs, and other challenging samples. In the evaluation of inhibition by qPCR, the effect of amplicon length and primer melting temperature was evaluated in order to determine the binding mechanisms of different PCR inhibitors. Several mechanisms were indicated by the inhibitors tested, including binding of the polymerase, binding to the DNA, and effects on the processivity of the polymerase during primer extension. The data obtained from qPCR illustrated a method by which the type of inhibitor could be inferred in forensic samples, and some methods of reducing inhibition for specific inhibitors were demonstrated. An understanding of the mechanism of the inhibitors found in forensic samples will allow analysts to select the proper methods for inhibition removal or the type of analysis that can be performed, and will increase the information that can be obtained from inhibited samples.
Resumo:
Microarray platforms have been around for many years and while there is a rise of new technologies in laboratories, microarrays are still prevalent. When it comes to the analysis of microarray data to identify differentially expressed (DE) genes, many methods have been proposed and modified for improvement. However, the most popular methods such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product are far from perfect. When it comes down to choosing which method is most powerful, it comes down to the characteristics of the sample and distribution of the gene expressions. The most practiced method is usually SAM or samroc but when the data tends to be skewed, the power of these methods decrease. With the concept that the median becomes a better measure of central tendency than the mean when the data is skewed, the tests statistics of the SAM and fold change methods are modified in this thesis. This study shows that the median modified fold change method improves the power for many cases when identifying DE genes if the data follows a lognormal distribution.