4 results for Genetic clustering analysis

in Digital Commons at Florida International University


Relevance:

100.00%

Abstract:

Preimplantation genetic diagnosis (PGD) following in vitro fertilization (IVF) offers couples at risk for transmitting genetic disorders the opportunity to identify affected embryos prior to replacement. In particular, embryo gender determination permits screening for X-linked diseases of unknown etiology. Analysis of embryos can be performed by polymerase chain reaction (PCR) amplification of material obtained by micromanipulation. This approach provides an alternative to the termination of an established pregnancy following chorionic villi sampling or amniocentesis.

Lately, the focus of preimplantation diagnosis and intervention has been shifting toward an attempt to correct cytoplasmic deficiencies. Accordingly, it is the aim of this investigation to develop methods to permit the examination of single cells or components thereof for clinical evaluation. In an attempt to lay the groundwork for precise therapeutic intervention for age-related aneuploidy, transcripts encoding proteins believed to be involved in the proper segregation of chromosomes during human oocyte maturation were examined and quantified. Following fluorescent rapid cycle RT-PCR analysis, it was determined that the concentration of cell cycle checkpoint gene transcripts decreases significantly as maternal age increases. Given the well-established link between increasing maternal age and the incidence of aneuploidy, these results suggest that the degradation of these messages in aging oocytes may be involved in inappropriate chromosome separation during meiosis.

In order to investigate the cause of embryonic rescue observed following clinical cytoplasmic transfer procedures and with the objective of developing a diagnostic tool, mtDNA concentrations in polar bodies and subcellular components were evaluated. First, the typical concentration of mtDNA in human and mouse oocytes was determined by fluorescent rapid cycle PCR. Some disparity was noted between the copy numbers of individual cytoplasmic samples, which may limit the use of the current methodology for the clinical assessment of the corresponding oocyte.

Relevance:

80.00%

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
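To make the idea of K-means clustering guided by constraints more concrete, here is a minimal sketch in the style of COP-KMeans, where pairwise must-link and cannot-link constraints restrict which cluster a point may join. This is not the dissertation's GCC algorithm, whose details the abstract does not give; the function names, the hard-constraint handling, and the choice of the first `k` points as initial centers are all assumptions made for illustration.

```python
import math


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint
    with a point that has already been labelled in this pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points, k, must_link=(), cannot_link=(), iters=20):
    """Hypothetical constrained K-means sketch (hard pairwise constraints)."""
    # Assumption: seed the centers with the first k points for determinism.
    centers = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first feasible one.
            ranked = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in ranked:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels
```

For example, `constrained_kmeans(points, 2, cannot_link=[(0, 1)])` keeps points 0 and 1 in different clusters even when they lie close together. A soft-constraint variant that penalizes violations instead of forbidding them would be a natural alternative when the constraints come from noisy or incomplete data.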

Relevance:

80.00%

Abstract:

The accurate and reliable estimation of travel time based on point detector data is needed to support Intelligent Transportation System (ITS) applications. The quality of travel time estimation has been found to depend on the estimation method used and to vary with traffic conditions. In this study, two hybrid on-line travel time estimation models, and their corresponding off-line methods, were developed to achieve better estimation performance under various traffic conditions, including recurrent congestion and incidents. The first model combines the Mid-Point method, which is a speed-based method, with a traffic flow-based method. The second model integrates two speed-based methods: the Mid-Point method and the Minimum Speed method. In both models, the switch between travel time estimation methods is based on the congestion level and queue status automatically identified by clustering analysis. During incident conditions with rapidly changing queue lengths, shock wave analysis-based refinements are applied for on-line estimation to capture the fast queue propagation and recovery. Travel time estimates obtained from existing speed-based methods, traffic flow-based methods, and the models developed were tested using both simulation and real-world data. The results indicate that all tested methods performed at an acceptable level during periods of low congestion. However, their performance varies as congestion increases. Comparisons with other estimation methods also show that the developed hybrid models perform well in all cases. Further comparisons between the on-line and off-line travel time estimation methods reveal that off-line methods perform significantly better only during fast-changing congested conditions, such as during incidents. 
The impacts of major influential factors on the performance of travel time estimation, including data preprocessing procedures, detector errors, detector spacing, frequency of travel time updates to traveler information devices, travel time link length, and posted travel time range, were investigated in this study. The results show that these factors have more significant impacts on the estimation accuracy and reliability under congested conditions than during uncongested conditions. For the incident conditions, the estimation quality improves with the use of a short rolling period for data smoothing, more accurate detector data, and frequent travel time updates.
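The two speed-based methods named above can be sketched for a single link between an upstream and a downstream detector. The sketch assumes the commonly used definitions: the Mid-Point method applies each detector's speed to its half of the link, and the Minimum Speed method applies the lower of the two speeds to the whole link. The switching rule is only a stand-in: the study selects the method from congestion level and queue status found by clustering analysis, whereas the fixed speed threshold `congested_below_kmh` here is a hypothetical simplification, as are all function and parameter names.

```python
def midpoint_travel_time(length_km, v_up_kmh, v_down_kmh):
    """Mid-Point method (assumed definition): each detector's speed
    applies to the half of the link nearest to it. Returns seconds."""
    half = length_km / 2.0
    return (half / v_up_kmh + half / v_down_kmh) * 3600.0


def minimum_speed_travel_time(length_km, v_up_kmh, v_down_kmh):
    """Minimum Speed method (assumed definition): the lower of the two
    detector speeds applies to the whole link. Returns seconds."""
    return length_km / min(v_up_kmh, v_down_kmh) * 3600.0


def hybrid_travel_time(length_km, v_up_kmh, v_down_kmh, congested_below_kmh=50.0):
    """Hypothetical hybrid: switch methods with a fixed speed threshold.
    The threshold replaces the clustering-based switch described in the
    abstract and is purely illustrative."""
    if min(v_up_kmh, v_down_kmh) < congested_below_kmh:
        return minimum_speed_travel_time(length_km, v_up_kmh, v_down_kmh)
    return midpoint_travel_time(length_km, v_up_kmh, v_down_kmh)
```

On a link where the downstream detector reports a much lower speed than the upstream one, the Minimum Speed estimate is the more conservative of the two, which is one plausible reason to prefer it once a queue is detected.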

Relevance:

40.00%

Abstract:

To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and many more. With advances in science and technology, high-throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, the technology itself has its limitations and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches; doing so is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, there is another type of relationship, called a semantic relationship, that cannot be detected by a single technique but can be inferred using multiple sources of biological data. The contributions of this thesis involved the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These included (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring a functional network of genes, and (iv) techniques for clustering genes using multi-source data.