913 results for 0801 Artificial Intelligence and Image Processing
Abstract:
The need to provide computers with the ability to distinguish the affective state of their users is a major requirement for the practical implementation of affective computing concepts. This dissertation proposes the application of signal processing methods to physiological signals in order to extract features that can be processed by learning pattern recognition systems to provide cues about a person's affective state. In particular, combining physiological information sensed non-invasively from a user's left hand with pupil diameter information from an eye-tracking system may provide a computer with an awareness of its user's affective responses in the course of human-computer interactions. In this study an integrated hardware-software setup was developed to achieve automatic assessment of the affective state of a computer user. A computer-based "Paced Stroop Test" was designed as a stimulus to elicit emotional stress in the subject during the experiment. Four signals were monitored and analyzed to differentiate affective states in the user: the Galvanic Skin Response (GSR), the Blood Volume Pulse (BVP), the Skin Temperature (ST) and the Pupil Diameter (PD). Several signal processing techniques were applied to the collected signals to extract their most relevant features, which were then analyzed with learning classification systems to accomplish affective state identification. Three learning algorithms, Naïve Bayes, Decision Tree and Support Vector Machine, were applied to this identification process and their levels of classification accuracy were compared. The results achieved indicate that the physiological signals monitored do, in fact, have a strong correlation with changes in the emotional states of the experimental subjects. These results also revealed that the inclusion of pupil diameter information significantly improved the performance of the emotion recognition system.
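A minimal sketch of the kind of classifier comparison described above, assuming feature vectors have already been extracted from the physiological signals (the feature values, labels and model settings below are invented placeholders, not the study's data or configuration):

```python
# Hypothetical sketch: comparing Naive Bayes, Decision Tree and SVM classifiers
# on feature vectors extracted from physiological signals (GSR, BVP, ST, PD).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # e.g. mean/peak GSR, BVP amplitude, ST slope, PD mean ...
y = rng.integers(0, 2, size=200)   # 0 = relaxed segment, 1 = stressed (Stroop) segment

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: accuracy {scores.mean():.2f} +/- {scores.std():.2f}")
```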
Abstract:
This dissertation introduces the design of a multimodal, adaptive, real-time assistive system as an alternative human-computer interface that can be used by individuals with severe motor disabilities. The proposed design is based on the integration of a remote eye-gaze tracking system, voice recognition software, and a virtual keyboard. The methodology relies on a user profile that customizes eye-gaze tracking using neural networks. The user profiling feature facilitates the notion of universal access to computing resources for a wide range of applications such as web browsing, email, word processing and editing. The study is significant in terms of the integration of key algorithms to yield an adaptable and multimodal interface. The contributions of this dissertation stem from the following accomplishments: (a) establishment of the data transport mechanism between the eye-gaze system and the host computer, yielding a low failure rate of 0.9%; (b) accurate translation of eye data into cursor movement through successive steps that conclude with calibrated cursor coordinates computed by an improved conversion function, resulting in an average reduction of 70% in the disparity between the point of gaze and the actual position of the mouse cursor, compared with initial findings; (c) use of both a moving average and a trained neural network to minimize the jitter of the mouse cursor, yielding an average jitter reduction of 35%; (d) introduction of a new mathematical methodology to measure the degree of jitter of the mouse trajectory; (e) embedding of an onscreen keyboard to facilitate text entry, and a graphical interface used to generate user profiles for system adaptability. The adaptive nature of the interface is achieved through the establishment of user profiles, which may contain the jitter and voice characteristics of a particular user as well as a customized list of the most commonly used words ordered according to the user's preferences, in alphabetical or statistical order. This allows the system to provide an effective means of interacting with a computer, and each time any of the sub-systems is retrained, the accuracy of the interface response improves further.
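A minimal sketch of the moving-average smoothing idea mentioned in contribution (c), with a simple point-to-point displacement measure standing in for a jitter metric (the window size, metric and synthetic trajectory are illustrative assumptions, not the dissertation's exact formulation):

```python
# Smoothing gaze-derived cursor coordinates with a moving average and quantifying
# residual jitter as the mean point-to-point displacement of the trajectory.
import numpy as np

def moving_average(track: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth an (N, 2) array of (x, y) cursor samples with a boxcar filter."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(track[:, i], kernel, mode="valid")
                            for i in range(track.shape[1])])

def jitter(track: np.ndarray) -> float:
    """Mean Euclidean distance between consecutive samples (lower = steadier)."""
    return float(np.linalg.norm(np.diff(track, axis=0), axis=1).mean())

raw = np.cumsum(np.random.default_rng(1).normal(size=(500, 2)), axis=0)  # noisy gaze path
smooth = moving_average(raw, window=7)
print(f"jitter raw: {jitter(raw):.3f}, smoothed: {jitter(smooth):.3f}")
```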
Abstract:
Aerial observations of light pollution can fill an important gap between ground-based surveys and nighttime satellite data. Terrestrially bound surveys are labor intensive and generally limited to a small spatial extent, and while existing satellite data cover the whole world, they are limited to coarse resolution. This paper describes the production of a high resolution (1 m) mosaic image of the city of Berlin, Germany at night. The dataset is spatially analyzed to identify the major sources of light pollution in the city based on urban land use data. An area-independent 'brightness factor' is introduced that allows direct comparison of the light emission from differently sized land use classes, and the percentage of area with values above average brightness is calculated for each class. Using this methodology, lighting associated with streets was found to be the dominant source of zenith-directed light pollution (31.6%), although other land use classes have much higher average brightness. These results are compared with other urban light pollution quantification studies. The minimum resolution required for an analysis of this type is found to be near 10 m. Future applications of high resolution datasets such as this one could include studies of the efficacy of light pollution mitigation measures, improved light pollution simulations, analyses of economics and energy use, the relationship between artificial light and ecological parameters (e.g. circadian rhythm, fitness, mate selection, species distributions, migration barriers and seasonal behavior), or the management of nightscapes. To encourage further scientific inquiry, the mosaic data are freely available at Pangaea.
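An illustrative sketch of one way an area-independent brightness comparison could be computed: per-class mean emission normalised by the city-wide mean, plus the share of samples above the city average. This is a hedged reading of the 'brightness factor' idea, not the paper's exact definition, and the class names and values are invented:

```python
import numpy as np

city_pixels = {               # zenith-directed radiance samples per land use class (made up)
    "streets":     np.array([8.0, 12.0, 9.5, 11.0]),
    "industrial":  np.array([15.0, 18.0]),
    "residential": np.array([3.0, 2.5, 4.0, 3.5, 2.8]),
}
city_mean = np.concatenate(list(city_pixels.values())).mean()

for cls, values in city_pixels.items():
    brightness_factor = values.mean() / city_mean          # independent of class area
    share_above_avg = (values > city_mean).mean() * 100    # % of class area above city average
    print(f"{cls}: factor {brightness_factor:.2f}, {share_above_avg:.0f}% above average")
```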
Abstract:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy, with only one semi-structured report predicted as unstructured, was produced by the layout classifier with the k-nearest neighbours algorithm, using the binary term occurrence word vector type with stopword filtering and pruning. For chunk recognition, the best results were obtained with the PAUM algorithm using the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83, while for unstructured reports precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
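A hedged sketch of the report-layout classification step: binary term-occurrence vectors with stop-word filtering feeding a k-nearest-neighbours classifier, broadly mirroring the RapidMiner setup described above. The toy reports and the single-neighbour setting are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

reports = [
    "Tumour size: 22 mm. ER: positive. Nodes positive: 2.",                            # semi-structured
    "The specimen shows an invasive carcinoma measuring roughly two centimetres ...",  # unstructured
]
labels = ["semi-structured", "unstructured"]

layout_clf = make_pipeline(
    CountVectorizer(binary=True, stop_words="english"),   # binary term occurrence + stopword filter
    KNeighborsClassifier(n_neighbors=1),
)
layout_clf.fit(reports, labels)
print(layout_clf.predict(["ER status: negative. Tumour size: 15 mm."]))
```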
Abstract:
The mismatch between human capacity and the acquisition of Big Data such as Earth imagery undermines commitments to the Convention on Biological Diversity (CBD) and the Aichi targets. Artificial intelligence (AI) solutions to Big Data issues are urgently needed, as these could prove to be faster, more accurate, and cheaper. Reducing the costs of managing protected areas in remote deep waters and on the High Seas is of great importance, and this is a realm where autonomous technology will be transformative.
Abstract:
Facial image processing is becoming widespread in human-computer applications, despite its complexity. High-level processes such as face recognition or gender determination rely on low-level routines that must effectively detect and normalize the faces that appear in the input image. In this paper, a face detection and normalization system is described. The approach taken is based on a cascade of fast, weak classifiers that together try to determine whether a frontal face is present in the image.
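A minimal sketch of the cascade idea: each cheap, weak stage may reject a candidate window, and only windows that survive every stage are reported as faces. The stage functions below are placeholders for the trained classifiers in the paper, not its actual tests:

```python
from typing import Callable, Iterable, List, Tuple

Window = Tuple[int, int, int, int]  # x, y, width, height

def cascade_detect(windows: Iterable[Window],
                   stages: List[Callable[[Window], bool]]) -> List[Window]:
    """Return the windows accepted by every stage of the cascade."""
    faces = []
    for w in windows:
        if all(stage(w) for stage in stages):   # short-circuits on the first failing stage
            faces.append(w)
    return faces

# Hypothetical stages: quick geometric checks standing in for learned appearance tests.
stages = [
    lambda w: w[2] >= 24 and w[3] >= 24,        # minimum window size
    lambda w: 0.8 <= w[2] / w[3] <= 1.25,       # roughly square aspect ratio
]
print(cascade_detect([(10, 10, 30, 30), (0, 0, 10, 40)], stages))
```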
Abstract:
Digital Image Processing is a rapidly evolving field with growing applications in Science and Engineering. It involves changing the nature of an image in order to either improve its pictorial information for human interpretation or render it more suitable for autonomous machine perception. One of the major areas of image processing for human vision applications is image enhancement. The principal goal of image enhancement is to improve the visual quality of an image, typically by taking advantage of the response of the human visual system. Image enhancement methods are usually carried out in the pixel domain. Transform domain methods can often provide another way to interpret and understand image contents. A suitable transform, thus selected, should have low computational complexity. A sequency ordered arrangement of unique MRT (Mapped Real Transform) coefficients can give rise to an integer-to-integer transform, named Sequency based unique MRT (SMRT), suitable for image processing applications. The development of the SMRT from the UMRT (Unique MRT), the forward and inverse SMRT algorithms, and the basis functions are introduced. A few properties of the SMRT are explored and its scope in lossless text compression is presented.
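An illustrative sketch of the generic transform-domain enhancement pipeline the abstract alludes to: transform the image, rescale selected coefficients, and invert. A 2-D DCT is used here purely as a stand-in, since the SMRT coefficients and their sequency ordering are not reproduced in this listing:

```python
import numpy as np
from scipy.fft import dctn, idctn

def enhance(image: np.ndarray, gain: float = 1.5) -> np.ndarray:
    """Boost all non-DC coefficients to emphasise detail (a deliberately simple rule)."""
    coeffs = dctn(image, norm="ortho")
    mask = np.full_like(coeffs, gain)
    mask[0, 0] = 1.0                      # keep the DC (mean brightness) term unchanged
    return idctn(coeffs * mask, norm="ortho")

img = np.random.default_rng(2).random((64, 64))
print(img.std(), enhance(img).std())
```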
Abstract:
Multiphase flows of the oil-water-gas type are very common in industrial activities such as chemical processing and petroleum extraction, and their measurement presents considerable difficulties. Precisely determining the volume fraction of each element that composes a multiphase flow is very important in chemical plants and in the petroleum industry. This work presents a methodology able to determine volume fractions in annular and stratified multiphase flow systems with the use of neutrons and artificial intelligence, using the principles of transmission/scattering of fast neutrons from a 241Am-Be source and point flux measurements that are influenced by variations in the volume fractions. The geometries proposed in the mathematical model were used to obtain a data set in which the thicknesses of each material were varied in order to obtain the volume fraction of each phase, providing 119 compositions that were used in simulations with MCNP-X, a computer code based on the Monte Carlo method that simulates radiation transport. An artificial neural network (ANN) was trained with the data obtained from MCNP-X and used to correlate such measurements with the respective real fractions. The ANN was able to correlate the data obtained in the MCNP-X simulations with the volume fractions of the multiphase flows (oil-water-gas), in both the annular and the stratified flow patterns, resulting in average relative errors (%) for each production set of: annular (air = 3.85; water = 4.31; oil = 1.08); stratified (air = 3.10; water = 2.01; oil = 1.45). The method demonstrated good efficiency in determining each material that composes the phases, thus demonstrating the feasibility of the technique.
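A hedged sketch of the regression step: a small neural network mapping simulated detector readings (standing in for the MCNP-X transmission/scattering responses) to the three volume fractions. The synthetic inputs and network size below are assumptions for illustration, not the paper's setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((119, 4))                          # detector responses per simulated composition
fractions = rng.dirichlet(np.ones(3), size=119)   # gas, water, oil fractions (sum to 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, fractions, test_size=0.2, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
ann.fit(X_tr, y_tr)
pred = ann.predict(X_te)
print("mean absolute error per phase:", np.abs(pred - y_te).mean(axis=0))
```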
Abstract:
Over the past few decades, work on infrared sensor applications has progressed considerably around the world. A difficulty remains, however, in that objects are not clear enough, or cannot always be easily distinguished, in the image obtained of the observed scene. Infrared image enhancement has played an important role in the development of infrared computer vision, image processing and non-destructive testing technologies. This thesis addresses infrared image enhancement techniques from two angles: the processing of a single infrared image in the hybrid space-frequency domain, and the fusion of infrared and visible images using the non-subsampled contourlet transform (NSCT). Image fusion can be seen as a continuation of single infrared image enhancement, since it combines infrared and visible images into a single image in order to represent and enhance all the useful information and features of the source images; a single image cannot contain all the relevant or available information because of the restrictions inherent in any single imaging sensor. We review the development of infrared image enhancement techniques, then focus on single infrared image enhancement and propose a hybrid-domain enhancement scheme with an improved fuzzy threshold evaluation method, which yields higher image quality and improves human visual perception. The infrared and visible image fusion techniques are built upon an accurate registration of the source images acquired by different sensors. The SURF-RANSAC algorithm is applied for registration throughout this research, leading to very accurately registered images and increased benefits for the fusion processing. For infrared and visible image fusion, a series of advanced and effective approaches is proposed. A standard multi-channel NSCT-based fusion method is presented as a reference for the subsequent proposed fusion approaches. A joint fusion approach involving the Adaptive-Gaussian NSCT and the wavelet transform (WT) is proposed, which leads to fusion results that are better than those obtained with general non-adaptive methods. An NSCT-based fusion approach employing compressed sensing (CS) and total variation (TV) on sparse sample coefficients, and performing an accurate reconstruction of the fused coefficients, is proposed; it obtains much better fusion results through pre-enhancement of the infrared image and a reduction of redundant information in the fusion coefficients. Finally, an NSCT-based fusion procedure using a fast iterative-shrinking compressed sensing (FISCS) technique is proposed to compress the decomposed coefficients and reconstruct the fused coefficients within the fusion process, which leads to better results faster and in an efficient manner.
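A simplified sketch of multiresolution image fusion, using a wavelet decomposition in place of the NSCT used in the thesis: decompose both registered images, average the approximation band, take the per-coefficient maximum of the detail bands, and invert. It assumes the images are already co-registered (e.g. via SURF-RANSAC) and of equal size; the fusion rule is a common baseline, not the thesis's method:

```python
import numpy as np
import pywt

def fuse(ir: np.ndarray, vis: np.ndarray, wavelet: str = "db2", level: int = 2) -> np.ndarray:
    c_ir = pywt.wavedec2(ir, wavelet, level=level)
    c_vis = pywt.wavedec2(vis, wavelet, level=level)
    fused = [(c_ir[0] + c_vis[0]) / 2]                            # average approximation band
    for d_ir, d_vis in zip(c_ir[1:], c_vis[1:]):
        fused.append(tuple(np.where(np.abs(a) > np.abs(b), a, b)  # max-abs rule per detail band
                           for a, b in zip(d_ir, d_vis)))
    return pywt.waverec2(fused, wavelet)

rng = np.random.default_rng(4)
print(fuse(rng.random((64, 64)), rng.random((64, 64))).shape)
```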
Abstract:
Multimedia objects, especially images and figures, are essential for the visualization and interpretation of research findings. The distribution and reuse of these scientific objects is significantly improved under open access conditions, for instance in Wikipedia articles, in research literature, and in education and knowledge dissemination, where licensing of images often represents a serious barrier. Whereas scientific publications are retrievable through library portals or other online search services thanks to standardized indices, there is as yet no targeted retrieval of, and access to, the accompanying images and figures. Consequently, there is great demand for standardized indexing methods for these multimedia open access objects in order to improve the accessibility of this material. With our proposal, we hope to serve the broad audience that first looks up a scientific or technical term in a web search portal. Until now, this audience has had little chance of finding an openly accessible and reusable image closely matching the search term on the first try, frustratingly so even if such an image is in fact included in some open access article.
Abstract:
Increasing the size of the training data in many computer vision tasks has been shown to be very effective. Using large-scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers), one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large-scale image sets is demanding in itself: they take a significant amount of memory, which makes it impossible to process the images with complex algorithms on single-CPU machines. Finding an efficient image representation can be key to attacking this problem. A representation being efficient is not enough for image understanding; it should also be comprehensive and rich in the semantic information it carries. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large-scale image classification. We present techniques to learn the binary features from supervised image sets (with different types of semantic supervision: class labels, textual descriptions). We also propose several problems that are important in finding and using efficient image representations.
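A minimal sketch of binary image codes: random-projection hashing (an LSH-style baseline, not the learned codes proposed above) followed by Hamming-distance search. The descriptors here are random placeholders for real image features:

```python
import numpy as np

rng = np.random.default_rng(5)
n_bits, dim = 64, 512
projection = rng.normal(size=(dim, n_bits))

def encode(features: np.ndarray) -> np.ndarray:
    """Map real-valued descriptors of shape (N, dim) to binary codes of shape (N, n_bits)."""
    return (features @ projection > 0).astype(np.uint8)

database = encode(rng.normal(size=(10000, dim)))
query = encode(rng.normal(size=(1, dim)))

hamming = (database != query).sum(axis=1)       # Hamming distance to every database code
print("nearest neighbours:", np.argsort(hamming)[:5])
```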
Abstract:
Theories of sparse signal representation, wherein a signal is decomposed as the sum of a small number of constituent elements, play increasing roles in both mathematical signal processing and neuroscience. This happens despite the differences between signal models in the two domains. After reviewing preliminary material on sparse signal models, I use work on compressed sensing for the electron tomography of biological structures as a target for exploring the efficacy of sparse signal reconstruction in a challenging application domain. My research in this area addresses a topic of keen interest to the biological microscopy community, and has resulted in the development of tomographic reconstruction software which is competitive with the state of the art in its field. Moving from the linear signal domain into the nonlinear dynamics of neural encoding, I explain the sparse coding hypothesis in neuroscience and its relationship with olfaction in locusts. I implement a numerical ODE model of the activity of neural populations responsible for sparse odor coding in locusts as part of a project involving offset spiking in the Kenyon cells. I also explain the validation procedures we have devised to help assess the model's similarity to the biology. The thesis concludes with the development of a new, simplified model of locust olfactory network activity, which seeks with some success to explain statistical properties of the sparse coding processes carried out in the network.
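A hedged sketch of sparse signal recovery in the compressed-sensing spirit described above: recover a sparse vector x from a few linear measurements y = A x by solving an l1-regularised least-squares problem (here with scikit-learn's Lasso; the tomographic reconstruction software in the thesis is far more specialised):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, m, k = 200, 80, 5                      # signal length, number of measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)  # random measurement matrix
y = A @ x_true

lasso = Lasso(alpha=0.01, max_iter=50000)
lasso.fit(A, y)
print("relative recovery error:", np.linalg.norm(lasso.coef_ - x_true) / np.linalg.norm(x_true))
```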
Abstract:
The rapid growth of urban centres, emerging technologies and the population's demand for new services call for efforts towards the development of smart cities. This concept has gained momentum across the political, economic, social, academic, environmental and civil sectors; in parallel, initiatives have emerged that lead towards the integration of infrastructure, technology and services for citizens. In this context, one of the problems with the greatest impact on society is road safety. Mechanisms are needed that reduce accident rates, improve incident response, optimise urban mobility and municipal planning, help to lower fuel consumption and greenhouse gas emissions, and provide dynamic and effective information to travellers. This article describes two approaches that contribute efficiently to this problem: video games as serious games, and intelligent transportation systems. Both approaches aim to prevent collisions, and their design and implementation require highly technological components (e.g. telematic and computing systems, artificial intelligence, image processing and 3D modelling).
Abstract:
Thesis (Master's), Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Mecânica, 2016.
Abstract:
It is well known that rib cage dimensions depend on gender and vary with the age of the individual. In this setting it is reasonable to assume that a computational approach to the problem can be devised; consequently, this work focuses on the development of an Artificial Intelligence grounded decision support system to predict an individual's age based on such measurements. On the one hand, basic image processing techniques were used to extract such descriptors from chest X-rays (i.e., the rib cage's maximum width and height). On the other hand, the computational framework was built on top of a Logic Programming Case Base approach to knowledge representation and reasoning, which caters for the handling of incomplete, unknown, or even contradictory information. Furthermore, clustering methods based on similarity analysis among cases were used to distinguish and aggregate collections of historical data in order to reduce the search space, thereby enhancing case retrieval and the overall computational process. The accuracy of the proposed model is satisfactory, close to 90%.
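An illustrative sketch of the clustering-assisted case retrieval idea: cluster the historical cases (rib cage width and height plus gender), then search for similar cases only inside the query's cluster. The data, encoding and cluster count are invented assumptions, not the paper's case base:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
widths = rng.normal(300, 30, size=500)     # rib cage maximum width (illustrative units)
heights = rng.normal(220, 20, size=500)    # rib cage maximum height
gender = rng.integers(0, 2, size=500)      # 0 = female, 1 = male (assumed encoding)
cases = np.column_stack([widths, heights, gender])
ages = rng.uniform(18, 90, size=500)       # known ages of the historical cases

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(cases)

def retrieve(query: np.ndarray, n_cases: int = 5) -> np.ndarray:
    """Return the ages of the most similar cases within the query's cluster."""
    members = np.where(kmeans.labels_ == kmeans.predict(query[None])[0])[0]
    order = np.argsort(np.linalg.norm(cases[members] - query, axis=1))
    return ages[members[order[:n_cases]]]

print("estimated age:", retrieve(np.array([310.0, 230.0, 1.0])).mean())
```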