2 resultados para Data Bases
em Digital Commons at Florida International University
Resumo:
The aim of this work was to develop a new methodology, which can be used to design new refrigerants that are better than the currently used refrigerants. The methodology draws some parallels with the general approach of computer aided molecular design. However, the mathematical way of representing the molecular structure of an organic compound and the use of meta models during the optimization process make it different. In essence, this approach aimed to generate molecules that conform to various property requirements that are known and specified a priori. A modified way of mathematically representing the molecular structure of an organic compound having up to four carbon atoms, along with atoms of other elements such as hydrogen, oxygen, fluorine, chlorine and bromine, was developed. The normal boiling temperature, enthalpy of vaporization, vapor pressure, tropospheric lifetime and biodegradability of 295 different organic compounds, were collected from open literature and data bases or estimated. Surrogate models linking the previously mentioned quantities with the molecular structure were developed. Constraints ensuring the generation of structurally feasible molecules were formulated and used in commercially available optimization algorithms to generate molecular structures of promising new refrigerants. This study was intended to serve as a proof-of-concept of designing refrigerants using the newly developed methodology.
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^