919 resultados para Optical pattern recognition Data processing
Resumo:
This article presents the principal results of the doctoral thesis “Recognition of neume notation in historical documents” by Lasko Laskov (Institute of Mathematics and Informatics at Bulgarian Academy of Sciences), successfully defended before the Specialized Academic Council for Informatics and Mathematical Modelling on 07 June 2010.
Resumo:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require special attention to tuning necessitating several extensions of cross-validation to be investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
Resumo:
With the latest development in computer science, multivariate data analysis methods became increasingly popular among economists. Pattern recognition in complex economic data and empirical model construction can be more straightforward with proper application of modern softwares. However, despite the appealing simplicity of some popular software packages, the interpretation of data analysis results requires strong theoretical knowledge. This book aims at combining the development of both theoretical and applicationrelated data analysis knowledge. The text is designed for advanced level studies and assumes acquaintance with elementary statistical terms. After a brief introduction to selected mathematical concepts, the highlighting of selected model features is followed by a practice-oriented introduction to the interpretation of SPSS1 outputs for the described data analysis methods. Learning of data analysis is usually time-consuming and requires efforts, but with tenacity the learning process can bring about a significant improvement of individual data analysis skills.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as “histogram binning” inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. ^ Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. ^ The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. ^ These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. ^ In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation. ^
Resumo:
The need to provide computers with the ability to distinguish the affective state of their users is a major requirement for the practical implementation of affective computing concepts. This dissertation proposes the application of signal processing methods on physiological signals to extract from them features that can be processed by learning pattern recognition systems to provide cues about a person's affective state. In particular, combining physiological information sensed from a user's left hand in a non-invasive way with the pupil diameter information from an eye-tracking system may provide a computer with an awareness of its user's affective responses in the course of human-computer interactions. In this study an integrated hardware-software setup was developed to achieve automatic assessment of the affective status of a computer user. A computer-based "Paced Stroop Test" was designed as a stimulus to elicit emotional stress in the subject during the experiment. Four signals: the Galvanic Skin Response (GSR), the Blood Volume Pulse (BVP), the Skin Temperature (ST) and the Pupil Diameter (PD), were monitored and analyzed to differentiate affective states in the user. Several signal processing techniques were applied on the collected signals to extract their most relevant features. These features were analyzed with learning classification systems, to accomplish the affective state identification. Three learning algorithms: Naïve Bayes, Decision Tree and Support Vector Machine were applied to this identification process and their levels of classification accuracy were compared. The results achieved indicate that the physiological signals monitored do, in fact, have a strong correlation with the changes in the emotional states of the experimental subjects. These results also revealed that the inclusion of pupil diameter information significantly improved the performance of the emotion recognition system. ^
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.
Resumo:
Use of remotely sensed data for environmental and ecological assessment has recently become more widespread in wetland research and management and advantages and limitations of this approach have been addresses (Ozesmi and Bauer 2002). Applications of remote sensing (RS) methods vary in spatial and temporal extent and resolution, in the types of data acquired, and in digital processing and pattern recognition algorithms used.
Resumo:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
Resumo:
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques.
Resumo:
The advancement of GPS technology has made it possible to use GPS devices as orientation and navigation tools, but also as tools to track spatiotemporal information. GPS tracking data can be broadly applied in location-based services, such as spatial distribution of the economy, transportation routing and planning, traffic management and environmental control. Therefore, knowledge of how to process the data from a standard GPS device is crucial for further use. Previous studies have considered various issues of the data processing at the time. This paper, however, aims to outline a general procedure for processing GPS tracking data. The procedure is illustrated step-by-step by the processing of real-world GPS data of car movements in Borlänge in the centre of Sweden.
Resumo:
The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.
Resumo:
The only method used to date to measure dissolved nitrate concentration (NITRATE) with sensors mounted on profiling floats is based on the absorption of light at ultraviolet wavelengths by nitrate ion (Johnson and Coletti, 2002; Johnson et al., 2010; 2013; D’Ortenzio et al., 2012). Nitrate has a modest UV absorption band with a peak near 210 nm, which overlaps with the stronger absorption band of bromide, which has a peak near 200 nm. In addition, there is a much weaker absorption due to dissolved organic matter and light scattering by particles (Ogura and Hanya, 1966). The UV spectrum thus consists of three components, bromide, nitrate and a background due to organics and particles. The background also includes thermal effects on the instrument and slow drift. All of these latter effects (organics, particles, thermal effects and drift) tend to be smooth spectra that combine to form an absorption spectrum that is linear in wavelength over relatively short wavelength spans. If the light absorption spectrum is measured in the wavelength range around 217 to 240 nm (the exact range is a bit of a decision by the operator), then the nitrate concentration can be determined. Two different instruments based on the same optical principles are in use for this purpose. The In Situ Ultraviolet Spectrophotometer (ISUS) built at MBARI or at Satlantic has been mounted inside the pressure hull of a Teledyne/Webb Research APEX and NKE Provor profiling floats and the optics penetrate through the upper end cap into the water. The Satlantic Submersible Ultraviolet Nitrate Analyzer (SUNA) is placed on the outside of APEX, Provor, and Navis profiling floats in its own pressure housing and is connected to the float through an underwater cable that provides power and communications. Power, communications between the float controller and the sensor, and data processing requirements are essentially the same for both ISUS and SUNA. There are several possible algorithms that can be used for the deconvolution of nitrate concentration from the observed UV absorption spectrum (Johnson and Coletti, 2002; Arai et al., 2008; Sakamoto et al., 2009; Zielinski et al., 2011). In addition, the default algorithm that is available in Satlantic sensors is a proprietary approach, but this is not generally used on profiling floats. There are some tradeoffs in every approach. To date almost all nitrate sensors on profiling floats have used the Temperature Compensated Salinity Subtracted (TCSS) algorithm developed by Sakamoto et al. (2009), and this document focuses on that method. It is likely that there will be further algorithm development and it is necessary that the data systems clearly identify the algorithm that is used. It is also desirable that the data system allow for recalculation of prior data sets using new algorithms. To accomplish this, the float must report not just the computed nitrate, but the observed light intensity. Then, the rule to obtain only one NITRATE parameter is, if the spectrum is present then, the NITRATE should be recalculated from the spectrum while the computation of nitrate concentration can also generate useful diagnostics of data quality.
Resumo:
To analyze the characteristics and predict the dynamic behaviors of complex systems over time, comprehensive research to enable the development of systems that can intelligently adapt to the evolving conditions and infer new knowledge with algorithms that are not predesigned is crucially needed. This dissertation research studies the integration of the techniques and methodologies resulted from the fields of pattern recognition, intelligent agents, artificial immune systems, and distributed computing platforms, to create technologies that can more accurately describe and control the dynamics of real-world complex systems. The need for such technologies is emerging in manufacturing, transportation, hazard mitigation, weather and climate prediction, homeland security, and emergency response. Motivated by the ability of mobile agents to dynamically incorporate additional computational and control algorithms into executing applications, mobile agent technology is employed in this research for the adaptive sensing and monitoring in a wireless sensor network. Mobile agents are software components that can travel from one computing platform to another in a network and carry programs and data states that are needed for performing the assigned tasks. To support the generation, migration, communication, and management of mobile monitoring agents, an embeddable mobile agent system (Mobile-C) is integrated with sensor nodes. Mobile monitoring agents visit distributed sensor nodes, read real-time sensor data, and perform anomaly detection using the equipped pattern recognition algorithms. The optimal control of agents is achieved by mimicking the adaptive immune response and the application of multi-objective optimization algorithms. The mobile agent approach provides potential to reduce the communication load and energy consumption in monitoring networks. The major research work of this dissertation project includes: (1) studying effective feature extraction methods for time series measurement data; (2) investigating the impact of the feature extraction methods and dissimilarity measures on the performance of pattern recognition; (3) researching the effects of environmental factors on the performance of pattern recognition; (4) integrating an embeddable mobile agent system with wireless sensor nodes; (5) optimizing agent generation and distribution using artificial immune system concept and multi-objective algorithms; (6) applying mobile agent technology and pattern recognition algorithms for adaptive structural health monitoring and driving cycle pattern recognition; (7) developing a web-based monitoring network to enable the visualization and analysis of real-time sensor data remotely. Techniques and algorithms developed in this dissertation project will contribute to research advances in networked distributed systems operating under changing environments.
Resumo:
This presentation was given at the FLVC regional conference at Broward College on May 7, 2015 and introduced scanning, processing, record creation, dissemination, and preservation in FIU Libraries' Digital Collections Center. The main focus was on processing, specifically employing OCR technology with difficult sources.
Resumo:
Liquid chromatography coupled with mass spectrometry is one of the most powerful tools in the toxicologist’s arsenal to detect a wide variety of compounds from many different matrices. However, the huge number of potentially abused substances and new substances especially designed as intoxicants poses a problem in a forensic toxicology setting. Most methods are targeted and designed to cover a very specific drug or group of drugs while many other substances remain undetected. High resolution mass spectrometry, more specifically time-of-flight mass spectrometry, represents an extremely powerful tool in analysing a multitude of compounds not only simultaneously but also retroactively. The data obtained through the time-of-flight instrument contains all compounds made available from sample extraction and chromatography, which can be processed at a later time with an improved library to detect previously unrecognised compounds without having to analyse the respective sample again. The aim of this project was to determine the utility and limitations of time-of-flight mass spectrometry as a general and easily expandable screening method. The resolution of time-of-flight mass spectrometry allows for the separation of compounds with the same nominal mass but distinct exact masses without the need to separate them chromatographically. To simulate the wide variety of potentially encountered drugs in such a general screening method, seven drugs (morphine, cocaine, zolpidem, diazepam, amphetamine, MDEA and THC) were chosen to represent this variety in terms of mass, properties and functional groups. Consequently, several liquid-liquid and solid phase extractions were applied to urine samples to determine the most general suitable and unspecific extraction. Chromatography was optimised by investigating the parameters pH, concentration, organic solvent and gradient of the mobile phase to improve data obtained by the time-of-flight instrument. The resulting method was validated as a qualitative confirmation/identification method. Data processing was automated using the software TargetAnalysis, which provides excellent analyte recognition according to retention time, exact mass and isotope pattern. The recognition of isotope patterns allows excellent recognition of analytes even in interference rich mass spectra and proved to be a good positive indicator. Finally, the validated method was applied to samples received from the A& E Department of Glasgow Royal Infirmary in suspected drug abuse cases and samples received from the Scottish Prison Service, which we received from their own prevalence study targeting drugs of abuse in the prison population. The obtained data was processed with a library established in the course of this work.