2 results for Biological database

in Digital Commons at Florida International University


Relevance:

30.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be achieved better than is possible from a single source alone.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. Two machine learning tools, support vector machines (SVM) and k-nearest neighbors (KNN), were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focuses on gene regulation of Plasmodium falciparum, contains diverse information, and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
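As a rough illustration of the feature-level integration mentioned in the first part, the sketch below concatenates the feature vectors that two sources provide for the same samples and then classifies a new sample with a plain k-nearest-neighbors vote. The abstract does not give implementation details, so the function names (`integrate_features`, `knn_predict`), the toy values, and the labels are all hypothetical; this is not the dissertation's code.

```python
import math
from collections import Counter


def integrate_features(*sources):
    """Feature-level integration: concatenate each sample's features across sources."""
    n_samples = len(sources[0])
    if any(len(src) != n_samples for src in sources):
        raise ValueError("every source must describe the same samples")
    return [
        [value for src in sources for value in src[i]]
        for i in range(n_samples)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two sources describing the same four samples.
source_a = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
source_b = [[5.0], [5.2], [1.0], [1.1]]
labels = ["type_1", "type_1", "type_2", "type_2"]

train_x = integrate_features(source_a, source_b)
prediction = knn_predict(train_x, labels, [0.05, 0.25, 5.1], k=3)
print(prediction)
```

In practice the sources would usually be scaled before concatenation so that no single source dominates the distance; that step is omitted here for brevity.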

Relevance:

30.00%

Publisher:

Abstract:

There is limited scientific knowledge on the composition of human odor from different biological specimens and on the effect that physiological and psychological health conditions could have on it. There is currently no direct comparison of the volatile organic compounds (VOCs) emanating from different biological specimens collected from healthy individuals as well as from individuals with certain diagnosed medical conditions. Therefore, the question of matching the VOCs present in human odor across various biological samples and across health statuses remains unanswered. The main purpose of this study was to use analytical instrumental methods to compare the VOCs from different biological specimens from the same individual and to compare the populations evaluated in this project. The goal was to evaluate the potential of headspace solid-phase microextraction gas chromatography mass spectrometry (HS-SPME-GC/MS) for profiling VOCs from specimens collected using standard forensic and medical methods across three populations: a healthy group with no diagnosed medical or psychological condition, a group with diagnosed type 2 diabetes, and a group with diagnosed major depressive disorder. The pre-treatment methods developed for the collection materials allowed targeted VOCs to be removed from the sampling kits prior to sampling, extraction, and analysis. Optimized SPME-GC/MS conditions have been demonstrated to be capable of sampling, identifying, and differentiating the VOCs present in the five biological specimens collected from different subjects, and yielded excellent detection limits for the VOCs from buccal swab, breath, blood, and urine, with average limits of detection of 8.3 ng. Visual, Spearman rank correlation, and PCA comparisons of the most abundant and frequent VOCs from each specimen demonstrated that each specimen has characteristic VOCs that allow the specimens to be differentiated for both healthy and diseased individuals.
Preliminary comparisons of the VOC profiles of healthy individuals, patients with type 2 diabetes, and patients with major depressive disorder revealed compounds that could be used as potential biomarkers to differentiate between healthy and diseased individuals. Finally, a human biological specimen compound database has been created, compiling the volatile compounds present in the emanations of human hand odor, oral fluids, breath, blood, and urine.
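For readers unfamiliar with the Spearman rank correlation used in the specimen comparisons, a minimal sketch follows. It ranks two lists of values (ties receive their average rank) and computes the Pearson correlation of the ranks. The abstract does not say how the comparison was implemented, so the function names and the toy abundance values are assumptions for illustration only, not data from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (spread_x * spread_y)


# Hypothetical relative abundances of the same four compounds in two specimens.
specimen_a = [10.0, 20.0, 30.0, 40.0]
specimen_b = [1.0, 4.0, 9.0, 16.0]
rho = spearman(specimen_a, specimen_b)
print(rho)
```

Because only the ordering of the values matters, the two toy profiles above are perfectly rank-correlated even though their magnitudes differ.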