3 results for Automatic Analysis of Multivariate Categorical Data Sets

in Digital Commons at Florida International University


Abstract:

Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
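Three of the pooling techniques named in the abstract have simple closed forms, and the following Python sketch illustrates how they might combine p-values for one gene across independent experiments. It is a minimal illustration, not the dissertation's implementation: the function names, the one-sided p-value convention, and the example values are all assumptions. Fisher's method uses the statistic $-2\sum \ln p_i$, which follows a chi-square distribution with $2k$ degrees of freedom under the null hypothesis; Stouffer's method sums the corresponding normal scores, and supplying weights gives the Liptak-Stouffer weighted variant.

```python
import math
from statistics import NormalDist


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_pool(pvalues):
    """Fisher's inverse chi-square method for k independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def stouffer_pool(pvalues, weights=None):
    """Stouffer's Z-transform method; with weights, the Liptak-Stouffer variant.

    Assumes one-sided p-values strictly between 0 and 1.
    """
    normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    z = sum(w * normal.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - normal.cdf(z)


# Hypothetical p-values for one gene in three independent experiments.
pvals = [0.04, 0.10, 0.07]
print(fisher_pool(pvals))
print(stouffer_pool(pvals))
print(stouffer_pool(pvals, weights=[3.0, 1.0, 2.0]))  # e.g. weights by sample size
```

With a single input, both functions return that p-value unchanged, which is a quick sanity check on any implementation.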

Abstract:

The purpose of this research was to demonstrate the applicability of reduced-size STR (Miniplex) primer sets to challenging samples and to provide the forensic community with new information regarding the analysis of degraded and inhibited DNA. The Miniplex primer sets were validated in accordance with guidelines set forth by the Scientific Working Group on DNA Analysis Methods (SWGDAM) in order to demonstrate the scientific validity of the kits. The Miniplex sets were also used in the analysis of DNA extracted from human skeletal remains and telogen hair. In addition, a method for evaluating the mechanism of PCR inhibition was developed using qPCR. The Miniplexes were demonstrated to be a robust and sensitive tool for the analysis of DNA, with as little as 100 pg of template DNA. They also proved to be better than commercial kits in the analysis of DNA from human skeletal remains, with 64% of samples tested producing full profiles, compared to 16% for a commercial kit. The Miniplexes also produced amplification of nuclear DNA from human telogen hairs, with partial profiles obtained from as little as 60 pg of template DNA. These data suggest that smaller PCR amplicons may provide a useful alternative to mitochondrial DNA for forensic analysis of degraded DNA from human skeletal remains, telogen hairs, and other challenging samples. In the evaluation of inhibition by qPCR, the effects of amplicon length and primer melting temperature were evaluated in order to determine the binding mechanisms of different PCR inhibitors. Several mechanisms were indicated by the inhibitors tested, including binding of the polymerase, binding to the DNA, and effects on the processivity of the polymerase during primer extension. The data obtained from qPCR illustrated a method by which the type of inhibitor could be inferred in forensic samples, and some methods of reducing inhibition for specific inhibitors were demonstrated. An understanding of the mechanism of the inhibitors found in forensic samples will allow analysts to select the proper methods for inhibition removal or the type of analysis that can be performed, and will increase the information that can be obtained from inhibited samples.

Abstract:

The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have an extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rate, and short generation time. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods that analyze serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from which viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another using a process called recombination.

A genealogy involving recombination is best described by a network structure. A more general approach was implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more thorough evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data and it was shown that the results represented within an evolutionary network structure can be interpreted in biologically meaningful ways.
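To make the idea of being "mindful of the sampling times" concrete, the sketch below links each sequence to its closest sequence from a strictly earlier time point. This is not the Sliding MinPD algorithm: it performs no recombination detection and produces a forest of parent links rather than a network. The Hamming distance, the function names, and the toy aligned sequences are all assumptions chosen for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_earlier_ancestors(samples):
    """Map each sample name to (ancestor_name, distance), or None.

    samples is a list of (name, time_point, sequence) tuples. Only
    sequences from strictly earlier time points are candidate ancestors;
    ties in distance are broken by name order.
    """
    links = {}
    for name, time_point, seq in samples:
        candidates = [
            (hamming(seq, other_seq), other_name)
            for other_name, other_time, other_seq in samples
            if other_time < time_point
        ]
        if candidates:
            distance, ancestor = min(candidates)
            links[name] = (ancestor, distance)
        else:
            links[name] = None  # earliest time point: no ancestor available
    return links


# Hypothetical aligned sequences sampled at three time points.
samples = [
    ("a0", 0, "ACGTACGT"),
    ("b1", 1, "ACGTACGA"),
    ("c1", 1, "TCGTACGT"),
    ("d2", 2, "TCGTTCGT"),
]
for name, link in nearest_earlier_ancestors(samples).items():
    print(name, link)
```

A recombinant sequence would typically be close to different earlier sequences over different regions of the alignment, which is why a single parent link per sequence cannot represent it and a network structure is needed.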