3 resultados para Statistical Analysis

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and in carrying out feasibility studies were provided by Beckman-Coulter. It is noted that although the data used is flow cytometric in nature, the mathematics are general in their derivation to include other types of data as long as their statistical behavior approximate Gaussian distributions. ^ Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate for the accumulation process involved with histogram data. When data is accumulated into a frequency histogram, the data is inherently smoothed in a linear fashion, since an averaging effect is taking place as the histogram is generated. This new filtering scheme addresses data that is accumulated in the uneven resolution of the channels of the frequency histogram. ^ The qualitative interpretation of flow cytometric data is currently a time consuming and imprecise method for evaluating histogram data. This method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The adverse health effects of long-term exposure to lead are well established, with major uptake into the human body occurring mainly through oral ingestion by young children. Lead-based paint was frequently used in homes built before 1978, particularly in inner-city areas. Minority populations experience the effects of lead poisoning disproportionately. ^ Lead-based paint abatement is costly. In the United States, residents of about 400,000 homes, occupied by 900,000 young children, lack the means to correct lead-based paint hazards. The magnitude of this problem demands research on affordable methods of hazard control. One method is encapsulation, defined as any covering or coating that acts as a permanent barrier between the lead-based paint surface and the environment. ^ Two encapsulants were tested for reliability and effective life span through an accelerated lifetime experiment that applied stresses exceeding those encountered under normal use conditions. The resulting time-to-failure data were used to extrapolate the failure time under conditions of normal use. Statistical analysis and models of the test data allow forecasting of long-term reliability relative to the 20-year encapsulation requirement. Typical housing material specimens simulating walls and doors coated with lead-based paint were overstressed before encapsulation. A second, un-aged set was also tested. Specimens were monitored after the stress test with a surface chemical testing pad to identify the presence of lead breaking through the encapsulant. ^ Graphical analysis proposed by Shapiro and Meeker and the general log-linear model developed by Cox were used to obtain results. Findings for the 80% reliability time to failure varied, with close to 21 years of life under normal use conditions for encapsulant A. The application of product A on the aged gypsum and aged wood substrates yielded slightly lower times. Encapsulant B had an 80% reliable life of 19.78 years. ^ This study reveals that encapsulation technologies can offer safe and effective control of lead-based paint hazards and may be less expensive than other options. The U.S. Department of Health and Human Services and the CDC are committed to eliminating childhood lead poisoning by 2010. This ambitious target is feasible, provided there is an efficient application of innovative technology, a goal to which this study aims to contribute. ^