4 results for grid, clustering, statistical, clustering
in Digital Commons at Florida International University
Abstract:
Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation.
These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
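As a rough illustration of how p-value pooling of this kind works, the sketch below combines independent p-values with Fisher's inverse chi-square method and with Stouffer's Z transform (optionally weighted, in the Liptak-Stouffer style). It is not the dissertation's code: the function names, the weights, and the input values are hypothetical, and the p-values are assumed to be independent, one-sided, and strictly between 0 and 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_combined(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the test statistic
    # Closed-form chi-square survival function, valid for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combined(p_values, weights=None):
    """Stouffer's Z; passing weights gives the Liptak-Stouffer variant."""
    if weights is None:
        weights = [1.0] * len(p_values)
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)


# Hypothetical p-values for one gene from three independent experiments.
p_per_experiment = [0.04, 0.10, 0.07]
print(fisher_combined(p_per_experiment))
print(stouffer_combined(p_per_experiment))
print(stouffer_combined(p_per_experiment, weights=[3.0, 1.0, 2.0]))
```

In this sketch a gene with moderately small p-values in every experiment receives a combined p-value smaller than any single one, which is the sense in which pooling can sharpen a gene list.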
Abstract:
Driven by the political and economic forces of cross-strait relations, Taiwan has been one of the major source markets for the Hong Kong tourism industry since 1987. This study investigated (1) the factors that influence travel motivation, (2) the clusters of travel motivations, and (3) the market segmentation of clusters of Taiwanese tourists visiting Hong Kong. Self-report surveys were distributed through ten travel agents to collect data from 366 Taiwanese travelers. Factor analysis identified four push factors and six pull factors as travel motivations. Cluster analysis of these factors then yielded five groups. These five clusters, which possess unique profiles (location difference, visiting frequency, travel satisfaction, and destination loyalty), are discussed. Suggestions for developing effective marketing strategies to attract Taiwanese tourists to Hong Kong are also provided.
Abstract:
The present study examines the extent to which blacks are segregated in the suburban community of Coconut Grove, Florida. Hypersegregation, or the general tendency for blacks and whites to live apart, was examined in terms of four distinct dimensions: evenness, exposure, clustering, and concentration. Together, these dimensions define the geographic traits of the target area. Alone, these indices cannot capture the multi-dimensional levels of segregation and, therefore, by themselves underestimate the severity of segregation and isolation in this community. This study takes a contemporary view of segregation in a Dade County community to see whether segregation is the catalyst for the sometimes-cited violent response of blacks. This study yields results that support the information in the literature review and the thesis research questions sections, namely that the blacks within the Grove do respond violently to the negative effects that racial segregation causes. This thesis is unique in two ways. It examines segregation in a suburban environment rather than an urban inner city, and it presents a responsive analysis of the individuals studied, rather than relying only on demographic and statistical data.
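The abstract does not say which formulas were used for each dimension. As a minimal sketch, two measures commonly used for evenness and exposure are the dissimilarity index and the isolation index. The function names and the tract counts below are hypothetical and are not data from the thesis.

```python
def dissimilarity_index(group_a, group_b):
    """Evenness: share of either group that would need to move to match
    the other group's distribution across tracts (0 = even, 1 = separate)."""
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


def isolation_index(group_a, tract_totals):
    """Exposure: average share of group A in the tract of a typical
    group A resident (higher means more isolated)."""
    total_a = sum(group_a)
    return sum(
        (a / total_a) * (a / t)
        for a, t in zip(group_a, tract_totals)
        if t > 0
    )


# Hypothetical counts for four census tracts.
black_residents = [900, 700, 50, 30]
white_residents = [100, 200, 950, 1200]
tract_totals = [b + w for b, w in zip(black_residents, white_residents)]

print(dissimilarity_index(black_residents, white_residents))
print(isolation_index(black_residents, tract_totals))
```

Each function returns a single number per dimension, which is one way to see why the abstract argues that no index alone describes the full pattern of segregation.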
Abstract:
In this study, I divided samples from individuals within Afghanistan based upon geography (i.e., north versus south). I determined allelic frequencies and other statistical parameters for 15 STR loci (i.e., D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA). I conducted pairwise comparisons with 19 neighboring Eurasian populations to assign G-statistics and p-values. Categorizing the populations into five groups (i.e., Central Asia, East Asia, South Asia, the Middle East, and the Caucasus/Anatolia), I derived values for intra-population, inter-population, and total variance. Admixture analyses determined the highest allelic contributions to be from the Caucasus/Anatolia, while negligible contributions were made by Central Asia and East Asia. A Correspondence Analysis revealed clustering of both northern and southern Afghanistan with Georgia, Turkey, northern Iran, and southern Iran of the Caucasus/Anatolia and the Middle East. A Neighbor-Joining phylogenetic tree was constructed to generate bootstrap values over 1,000 reiterations.
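The abstract does not name the software or the exact procedure behind the allelic frequencies and G-statistics. The sketch below shows one plausible reading for a single locus: allele frequencies counted from genotypes, and a G-test of homogeneity between two populations' allele counts. All names and genotype values are hypothetical, and the p-value step is omitted.

```python
import math
from collections import Counter


def allele_counts(genotypes):
    """Count alleles at one locus; each genotype is a pair of alleles."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def allele_frequencies(genotypes):
    """Relative frequency of each allele at one locus."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def g_statistic(counts_a, counts_b):
    """G = 2 * sum(O * ln(O / E)) over a 2 x (number of alleles) table.
    Returns the statistic and its degrees of freedom."""
    alleles = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand_total = total_a + total_b
    g = 0.0
    for allele in alleles:
        column_total = counts_a.get(allele, 0) + counts_b.get(allele, 0)
        for observed, row_total in (
            (counts_a.get(allele, 0), total_a),
            (counts_b.get(allele, 0), total_b),
        ):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_total * column_total / grand_total
                g += observed * math.log(observed / expected)
    return 2.0 * g, len(alleles) - 1


# Hypothetical genotypes at one STR locus for two populations.
north = [(12, 13), (12, 12), (13, 14), (12, 14)]
south = [(13, 13), (13, 14), (14, 14), (12, 13)]

print(allele_frequencies(north))
print(g_statistic(allele_counts(north), allele_counts(south)))
```

A p-value would follow by comparing G to a chi-square distribution with the returned degrees of freedom, or by permutation.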