13 resultados para Optical pattern recognition Data processing
em Digital Commons at Florida International University
Resumo:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as “histogram binning” inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. ^ Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. ^ The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. ^ These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. ^ In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation. ^
Resumo:
The need to provide computers with the ability to distinguish the affective state of their users is a major requirement for the practical implementation of affective computing concepts. This dissertation proposes the application of signal processing methods on physiological signals to extract from them features that can be processed by learning pattern recognition systems to provide cues about a person's affective state. In particular, combining physiological information sensed from a user's left hand in a non-invasive way with the pupil diameter information from an eye-tracking system may provide a computer with an awareness of its user's affective responses in the course of human-computer interactions. In this study an integrated hardware-software setup was developed to achieve automatic assessment of the affective status of a computer user. A computer-based "Paced Stroop Test" was designed as a stimulus to elicit emotional stress in the subject during the experiment. Four signals: the Galvanic Skin Response (GSR), the Blood Volume Pulse (BVP), the Skin Temperature (ST) and the Pupil Diameter (PD), were monitored and analyzed to differentiate affective states in the user. Several signal processing techniques were applied on the collected signals to extract their most relevant features. These features were analyzed with learning classification systems, to accomplish the affective state identification. Three learning algorithms: Naïve Bayes, Decision Tree and Support Vector Machine were applied to this identification process and their levels of classification accuracy were compared. The results achieved indicate that the physiological signals monitored do, in fact, have a strong correlation with the changes in the emotional states of the experimental subjects. These results also revealed that the inclusion of pupil diameter information significantly improved the performance of the emotion recognition system. ^
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.
Resumo:
Use of remotely sensed data for environmental and ecological assessment has recently become more widespread in wetland research and management and advantages and limitations of this approach have been addresses (Ozesmi and Bauer 2002). Applications of remote sensing (RS) methods vary in spatial and temporal extent and resolution, in the types of data acquired, and in digital processing and pattern recognition algorithms used.
Resumo:
This presentation was given at the FLVC regional conference at Broward College on May 7, 2015 and introduced scanning, processing, record creation, dissemination, and preservation in FIU Libraries' Digital Collections Center. The main focus was on processing, specifically employing OCR technology with difficult sources.
Resumo:
This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore data partitioning and replication management problems need to be addresses in development of an efficient and scalable solution. ^ Part of the research is to study the patterns of geographical raster data processing and to propose the algorithms to improve availability of such data. These algorithms and approaches are targeting granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service(QoS) considering distributed data delivery and processing. To achieve this goal a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images upon demand and generates mosaicked tiles only for the required region according to user's requirements such as resolution, temporal range, and target bands to reduce redundancy in storage and to utilize available computing and storage resources more efficiently. ^ Another part of the research pursued methods for efficient acquiring of GIS data from external heterogeneous databases and Web services as well as end-user GIS data delivery enhancements, automation and 3D virtual reality presentation. ^ There are vast numbers of computing, network, and storage resources idling or not fully utilized available on the Internet. Proposed "Crawling Distributed Operating System "(CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in GIS database context. ^ The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in creation of TerraFly GIS database that is used by US government, researchers, and general public to facilitate Web access to remotely-sensed imagery and GIS vector information. ^
Resumo:
This research is to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data of numerous patients that are used as training sets to exploit the different physiological characteristics of the different samples provided. The methods of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for the specific disease states, leukemia. The obtained results prove that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm---Density based Adaptive Window Clustering algorithm (DAWC) was designed to process large volumes of data for finding location of high data cluster in real-time. It reduces the computational load to ∼O(N) number of computations, and thus making the algorithm more attractive and faster than current hierarchical algorithms.
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
This dissertation established a software-hardware integrated design for a multisite data repository in pediatric epilepsy. A total of 16 institutions formed a consortium for this web-based application. This innovative fully operational web application allows users to upload and retrieve information through a unique human-computer graphical interface that is remotely accessible to all users of the consortium. A solution based on a Linux platform with My-SQL and Personal Home Page scripts (PHP) has been selected. Research was conducted to evaluate mechanisms to electronically transfer diverse datasets from different hospitals and collect the clinical data in concert with their related functional magnetic resonance imaging (fMRI). What was unique in the approach considered is that all pertinent clinical information about patients is synthesized with input from clinical experts into 4 different forms, which were: Clinical, fMRI scoring, Image information, and Neuropsychological data entry forms. A first contribution of this dissertation was in proposing an integrated processing platform that was site and scanner independent in order to uniformly process the varied fMRI datasets and to generate comparative brain activation patterns. The data collection from the consortium complied with the IRB requirements and provides all the safeguards for security and confidentiality requirements. An 1-MR1-based software library was used to perform data processing and statistical analysis to obtain the brain activation maps. Lateralization Index (LI) of healthy control (HC) subjects in contrast to localization-related epilepsy (LRE) subjects were evaluated. Over 110 activation maps were generated, and their respective LIs were computed yielding the following groups: (a) strong right lateralization: (HC=0%, LRE=18%), (b) right lateralization: (HC=2%, LRE=10%), (c) bilateral: (HC=20%, LRE=15%), (d) left lateralization: (HC=42%, LRE=26%), e) strong left lateralization: (HC=36%, LRE=31%). Moreover, nonlinear-multidimensional decision functions were used to seek an optimal separation between typical and atypical brain activations on the basis of the demographics as well as the extent and intensity of these brain activations. The intent was not to seek the highest output measures given the inherent overlap of the data, but rather to assess which of the many dimensions were critical in the overall assessment of typical and atypical language activations with the freedom to select any number of dimensions and impose any degree of complexity in the nonlinearity of the decision space.
Resumo:
Accurately assessing the extent of myocardial tissue injury induced by Myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on (1) the normal heart region; (2) the region immediately surrounding the infarct; and (3) the infarcted region—one, two, three and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees/severities of myocardial tissue injury as well as various stages of myocardial tissue remodeling, post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were utilized in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue from the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed which produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool to non-invasively determine the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.
Resumo:
This dissertation introduces a novel automated book reader as an assistive technology tool for persons with blindness. The literature shows extensive work in the area of optical character recognition, but the current methodologies available for the automated reading of books or bound volumes remain inadequate and are severely constrained during document scanning or image acquisition processes. The goal of the book reader design is to automate and simplify the task of reading a book while providing a user-friendly environment with a realistic but affordable system design. This design responds to the main concerns of (a) providing a method of image acquisition that maintains the integrity of the source (b) overcoming optical character recognition errors created by inherent imaging issues such as curvature effects and barrel distortion, and (c) determining a suitable method for accurate recognition of characters that yields an interface with the ability to read from any open book with a high reading accuracy nearing 98%. This research endeavor focuses in its initial aim on the development of an assistive technology tool to help persons with blindness in the reading of books and other bound volumes. But its secondary and broader aim is to also find in this design the perfect platform for the digitization process of bound documentation in line with the mission of the Open Content Alliance (OCA), a nonprofit Alliance at making reading materials available in digital form. The theoretical perspective of this research relates to the mathematical developments that are made in order to resolve both the inherent distortions due to the properties of the camera lens and the anticipated distortions of the changing page curvature as one leafs through the book. This is evidenced by the significant increase of the recognition rate of characters and a high accuracy read-out through text to speech processing. This reasonably priced interface with its high performance results and its compatibility to any computer or laptop through universal serial bus connectors extends greatly the prospects for universal accessibility to documentation.
Resumo:
Accurately assessing the extent of myocardial tissue injury induced by Myocardial infarction (MI) is critical to the planning and optimization of MI patient management. With this in mind, this study investigated the feasibility of using combined fluorescence and diffuse reflectance spectroscopy to characterize a myocardial infarct at the different stages of its development. An animal study was conducted using twenty male Sprague-Dawley rats with MI. In vivo fluorescence spectra at 337 nm excitation and diffuse reflectance between 400 nm and 900 nm were measured from the heart using a portable fiber-optic spectroscopic system. Spectral acquisition was performed on - (1) the normal heart region; (2) the region immediately surrounding the infarct; and (3) the infarcted region - one, two, three and four weeks into MI development. The spectral data were divided into six subgroups according to the histopathological features associated with various degrees / severities of myocardial tissue injury as well as various stages of myocardial tissue remodeling, post infarction. Various data processing and analysis techniques were employed to recognize the representative spectral features corresponding to various histopathological features associated with myocardial infarction. The identified spectral features were utilized in discriminant analysis to further evaluate their effectiveness in classifying tissue injuries induced by MI. In this study, it was observed that MI induced significant alterations (p < 0.05) in the diffuse reflectance spectra, especially between 450 nm and 600 nm, from myocardial tissue within the infarcted and surrounding regions. In addition, MI induced a significant elevation in fluorescence intensities at 400 and 460 nm from the myocardial tissue from the same regions. The extent of these spectral alterations was related to the duration of the infarction. Using the spectral features identified, an effective tissue injury classification algorithm was developed which produced a satisfactory overall classification result (87.8%). The findings of this research support the concept that optical spectroscopy represents a useful tool to non-invasively determine the in vivo pathophysiological features of a myocardial infarct and its surrounding tissue, thereby providing valuable real-time feedback to surgeons during various surgical interventions for MI.