7 resultados para text classification

em Digital Commons at Florida International University


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and in carrying out feasibility studies were provided by Beckman-Coulter. It is noted that although the data used is flow cytometric in nature, the mathematics are general in their derivation to include other types of data as long as their statistical behavior approximate Gaussian distributions. ^ Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate for the accumulation process involved with histogram data. When data is accumulated into a frequency histogram, the data is inherently smoothed in a linear fashion, since an averaging effect is taking place as the histogram is generated. This new filtering scheme addresses data that is accumulated in the uneven resolution of the channels of the frequency histogram. ^ The qualitative interpretation of flow cytometric data is currently a time consuming and imprecise method for evaluating histogram data. This method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research is to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data of numerous patients that are used as training sets to exploit the different physiological characteristics of the different samples provided. The methods of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for the specific disease states, leukemia. The obtained results prove that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm---Density based Adaptive Window Clustering algorithm (DAWC) was designed to process large volumes of data for finding location of high data cluster in real-time. It reduces the computational load to ∼O(N) number of computations, and thus making the algorithm more attractive and faster than current hierarchical algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Flow Cytometry analyzers have become trusted companions due to their ability to perform fast and accurate analyses of human blood. The aim of these analyses is to determine the possible existence of abnormalities in the blood that have been correlated with serious disease states, such as infectious mononucleosis, leukemia, and various cancers. Though these analyzers provide important feedback, it is always desired to improve the accuracy of the results. This is evidenced by the occurrences of misclassifications reported by some users of these devices. It is advantageous to provide a pattern interpretation framework that is able to provide better classification ability than is currently available. Toward this end, the purpose of this dissertation was to establish a feature extraction and pattern classification framework capable of providing improved accuracy for detecting specific hematological abnormalities in flow cytometric blood data. ^ This involved extracting a unique and powerful set of shift-invariant statistical features from the multi-dimensional flow cytometry data and then using these features as inputs to a pattern classification engine composed of an artificial neural network (ANN). The contribution of this method consisted of developing a descriptor matrix that can be used to reliably assess if a donor’s blood pattern exhibits a clinically abnormal level of variant lymphocytes, which are blood cells that are potentially indicative of disorders such as leukemia and infectious mononucleosis. ^ This study showed that the set of shift-and-rotation-invariant statistical features extracted from the eigensystem of the flow cytometric data pattern performs better than other commonly-used features in this type of disease detection, exhibiting an accuracy of 80.7%, a sensitivity of 72.3%, and a specificity of 89.2%. This performance represents a major improvement for this type of hematological classifier, which has historically been plagued by poor performance, with accuracies as low as 60% in some cases. This research ultimately shows that an improved feature space was developed that can deliver improved performance for the detection of variant lymphocytes in human blood, thus providing significant utility in the realm of suspect flagging algorithms for the detection of blood-related diseases.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

South Florida’s watersheds have endured a century of urban and agricultural development and disruption of their hydrology. Spatial characterization of South Florida’s estuarine and coastal waters is important to Everglades’ restoration programs. We applied Factor Analysis and Hierarchical Clustering of water quality data in tandem to characterize and spatially subdivide South Florida’s coastal and estuarine waters. Segmentation rendered forty-four biogeochemically distinct water bodies whose spatial distribution is closely linked to geomorphology, circulation, benthic community pattern, and to water management. This segmentation has been adopted with minor changes by federal and state environmental agencies to derive numeric nutrient criteria.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A comprehensive, broadly accepted vegetation classification is important for ecosystem management, particularly for planning and monitoring. South Florida vegetation classification systems that are currently in use were largely arrived at subjectively and intuitively with the involvement of experienced botanical observers and ecologists, but with little support in terms of quantitative field data. The need to develop a field data-driven classification of South Florida vegetation that builds on the ecological organization has been recognized by the National Park Service and vegetation practitioners in the region. The present work, funded by the National Park Service Inventory and Monitoring Program - South Florida/Caribbean Network (SFCN), covers the first stage of a larger project whose goal is to apply extant vegetation data to test, and revise as necessary, an existing, widely used classification (Rutchey et al. 2006). The objectives of the first phase of the project were (1) to identify useful existing datasets, (2) to collect these data and compile them into a geodatabase, (3) to conduct an initial classification analysis of marsh sites, and (4) to design a strategy for augmenting existing information from poorly represented landscapes in order to develop a more comprehensive south Florida classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this project was to evaluate the use of remote sensing 1) to detect and map Everglades wetland plant communities at different scales; and 2) to compare map products delineated and resampled at various scales with the intent to quantify and describe the quantitative and qualitative differences between such products. We evaluated data provided by Digital Globe’s WorldView 2 (WV2) sensor with a spatial resolution of 2m and data from Landsat’s Thematic and Enhanced Thematic Mapper (TM and ETM+) sensors with a spatial resolution of 30m. We were also interested in the comparability and scalability of products derived from these data sources. The adequacy of each data set to map wetland plant communities was evaluated utilizing two metrics: 1) model-based accuracy estimates of the classification procedures; and 2) design-based post-classification accuracy estimates of derived maps.