992 results for swd: Multimodal System
Abstract:
Over the past few years, multimodal interaction has been gaining importance in virtual environments. Although multimodality renders interaction with an environment more natural and intuitive, the development cycle of such an application is often long and expensive. In our overall field of research, we investigate how model-based design can facilitate the development process by designing environments through the use of high-level diagrams. In this scope, we present ‘NiMMiT’, a graphical notation for expressing and evaluating multimodal user interaction; we elaborate on the NiMMiT primitives and demonstrate its use by means of a comprehensive example.
Abstract:
Any automatically measurable, robust and distinctive physical characteristic or personal trait that can be used to identify an individual or verify an individual's claimed identity, referred to as a biometric, has gained significant interest in the wake of heightened concerns about security and rapid advancements in networking, communication and mobility. Multimodal biometrics is expected to be ultra-secure and reliable, owing to the presence of multiple, independent verification clues. In this study, a multimodal biometric system utilising audio and facial signatures has been implemented and an error analysis has been carried out. A total of one thousand face images and 250 sound tracks of 50 users were used to train the proposed system. To account for attempts by unregistered users, the data of 25 new users were also tested. Short-term spectral features were extracted from the sound data and vector quantization was performed using the K-means algorithm. Face images are identified with the eigenface approach using Principal Component Analysis. The success rate of the multimodal system using speech and face is higher than that of the individual unimodal recognition systems.
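A rough Python sketch of the pipeline described above: an eigenface (PCA) face matcher, a K-means vector-quantization voice matcher, and a simple score-level fusion. All data, dimensions and weights are synthetic placeholders rather than the study's actual settings.

```python
# Minimal sketch of the two unimodal matchers plus score-level fusion, using
# synthetic data in place of the real face images and spectral voice features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# --- Face branch: eigenfaces via PCA ---------------------------------------
# Hypothetical gallery: 20 flattened 32x32 face images per user, 3 users.
faces = {u: rng.normal(u, 1.0, size=(20, 32 * 32)) for u in range(3)}
gallery = np.vstack(list(faces.values()))
pca = PCA(n_components=10).fit(gallery)                      # eigenface basis
templates = {u: pca.transform(x).mean(axis=0) for u, x in faces.items()}

def face_score(probe_img, user):
    """Negative distance to the user's mean eigenface projection."""
    proj = pca.transform(probe_img.reshape(1, -1))[0]
    return -np.linalg.norm(proj - templates[user])

# --- Voice branch: vector quantization with K-means ------------------------
# Hypothetical short-term spectral features: 200 frames x 12 coefficients per user.
voices = {u: rng.normal(u, 1.0, size=(200, 12)) for u in range(3)}
codebooks = {u: KMeans(n_clusters=8, n_init=5, random_state=0).fit(x)
             for u, x in voices.items()}

def voice_score(probe_frames, user):
    """Negative average quantization distortion against the user's codebook."""
    return -codebooks[user].transform(probe_frames).min(axis=1).mean()

# --- Score-level fusion -----------------------------------------------------
def fused_score(probe_img, probe_frames, user, w_face=0.5, w_voice=0.5):
    return w_face * face_score(probe_img, user) + w_voice * voice_score(probe_frames, user)

probe_face, probe_voice = faces[1][0], voices[1][:50]
best = max(range(3), key=lambda u: fused_score(probe_face, probe_voice, u))
print("claimed identity:", best)
```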
Abstract:
The rigors of establishing innateness and domain specificity pose challenges to adaptationist models of music evolution. In articulating a series of constraints, the authors of the target articles provide strategies for investigating the potential origins of music. We propose additional approaches for exploring theories based on exaptation. We discuss a view of music as a multimodal system for engaging with affect, enabled by the capacities for symbolism and a theory of mind.
Abstract:
The present thesis is a study of movie review entertainment (MRE), a contemporary Internet-based genre of texts. MRE consists of movie reviews in video form which are published online, usually as episodes of an MRE web show. Characteristic of MRE is the combination of humor and honest opinions in varying degrees, as well as the use of subject material, i.e. clips of the movies, as part of the review. The study approached MRE from a linguistic perspective, aiming to discover 1) whether MRE is primarily text- or image-based and what the primary functions of the modes are, 2) how a reviewer linguistically relates subject footage to his/her commentary, 3) whether there is any internal variation in MRE regarding the aforementioned questions, and 4) how suitable the selected models and theories are for the analysis of this type of contemporary multimodal data. To answer these questions, the multimodal system of image-text relations by Martinec and Salway (2005), in combination with the categories of cohesion by Halliday and Hasan (1976), was applied to four full MRE videos which were transcribed in their entirety for the study. The primary data represent varying types of MRE: a current movie review, an analytic essay, a riff review, and a humorous essay. The results demonstrated that image vs. text prioritization can vary between reviews and also within a review. The current movie review and the two essays were primarily commentary-focused, whereas the riff review was significantly more dependent on the use of imagery, as the clips are a major source of humor, which is a prominent value in that type of review. In addition to humor, clips are used to exemplify the commentary. A reviewer also relates new information to the imagery and uses the two modes together to present the information in a review. Linguistically, the most frequent case was that the reviewer names participants and processes lexically in the commentary. Grammatical relations (reference items such as pronouns and adverbs, and conjunctive items in the riff review) were also encountered. There was internal variation to a considerable degree. The methods chosen were deemed appropriate for answering the research questions. Further study could go beyond linguistics to include, for instance, genre and media studies.
Abstract:
Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of a mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules for performing segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets they were given.
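The thesis' rules themselves are not reproduced here, but the basic idea of aligning time-stamped speech with sketch strokes can be pictured with a hypothetical Python sketch based on temporal overlap:

```python
# Illustrative sketch (not the thesis' actual rules): pair each time-stamped
# speech segment with the sketch stroke it overlaps most in time.
from dataclasses import dataclass

@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float

def overlap(a: Segment, b: Segment) -> float:
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def align(speech: list[Segment], strokes: list[Segment]) -> list[tuple[str, str]]:
    """Return (speech label, stroke label) pairs by maximal temporal overlap."""
    pairs = []
    for s in speech:
        best = max(strokes, key=lambda k: overlap(s, k), default=None)
        if best is not None and overlap(s, best) > 0:
            pairs.append((s.label, best.label))
    return pairs

speech = [Segment("this spring pulls the block", 1.0, 3.0),
          Segment("and here is a pivot", 3.5, 5.0)]
strokes = [Segment("spring", 0.8, 2.5), Segment("pivot", 3.6, 4.2)]
print(align(speech, strokes))
```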
Abstract:
This work discusses the application of ensemble techniques to the development of multimodal recognition systems based on revocable biometrics. Biometric systems are the identification and access-control techniques of the future, as evidenced by their steadily growing presence in today's society. However, much progress is still needed, mainly with regard to the accuracy, security and processing time of such systems. In the search for more efficient techniques, multimodal systems and revocable biometrics are promising, and can address many of the problems involved in traditional biometric recognition. A multimodal system is characterized by combining different biometric security techniques and overcomes many limitations, such as failures in the extraction or processing of the dataset. Among the various ways of developing a multimodal system, the use of ensembles is quite promising, motivated by the performance and flexibility they have demonstrated over the years in their many applications. With regard to security, one of the biggest problems is that a biometric trait is permanently bound to the user and cannot be changed if compromised. This problem has been addressed by techniques known as revocable biometrics, which apply a transformation to the biometric data in order to protect the unique characteristics, making cancellation and replacement possible. To contribute to this important subject, this work compares the performance of individual classifiers, as well as ensembles of classifiers, on the original data and on biometric spaces transformed by different functions. Another highlighted aspect is the use of Genetic Algorithms (GA) in different parts of the systems, seeking to further maximize their efficiency. One of the motivations of this work is to evaluate the gain that ensemble systems optimized by different GAs can bring to the data in the transformed space. Another relevant goal is to generate even more efficient revocable systems by combining two or more transformation functions, demonstrating that it is possible to extract information of a similar standard by applying different transformation functions. All of this makes clear the importance of revocable biometrics, ensembles and GAs in the development of more efficient biometric systems, something that is increasingly important today.
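A minimal Python sketch of the two central ingredients: a key-dependent (revocable) feature transformation followed by an ensemble of classifiers. The random orthogonal projection, the classifier choices and the synthetic data are illustrative assumptions, and the GA-based optimization described in the work is omitted.

```python
# Sketch of a cancellable transform plus a voting ensemble in the transformed space.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def revocable_transform(X, user_key, dim):
    """Project features with a key-derived random matrix; reissuing the key
    'revokes' the old template without exposing the original biometrics."""
    key_rng = np.random.default_rng(user_key)
    R = key_rng.normal(size=(X.shape[1], dim))
    Q, _ = np.linalg.qr(R)          # orthonormal projection columns
    return X @ Q

# Synthetic biometric features: 200 samples, 30 features, 4 identities.
X = rng.normal(size=(200, 30)) + np.repeat(np.arange(4), 50)[:, None]
y = np.repeat(np.arange(4), 50)
Xt = revocable_transform(X, user_key=2024, dim=20)

ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(5)),
    ("dt", DecisionTreeClassifier(max_depth=5)),
], voting="hard")
ensemble.fit(Xt, y)
print("training accuracy in transformed space:", ensemble.score(Xt, y))
```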
Abstract:
Data from various studies carried out over the last years in Italy on the problem of school dropout in secondary school show that difficulty in studying mathematics is one of the most frequent reasons for discomfort reported by students. Nevertheless, it is unrealistic to think we can do without such knowledge in today's society: mathematics is widely taught in secondary school and is not confined to technical-scientific courses only. It is reasonable to say that, although students may choose academic courses that are apparently far from mathematics, all students will have to come to terms with this subject sooner or later in their lives. Among the reasons for the discomfort caused by the study of mathematics, some mention the very nature of the subject and in particular the complex symbolic language through which it is expressed. In fact, mathematics is a multimodal system composed of oral and written verbal texts, symbolic expressions such as formulae and equations, figures and graphs. For this reason, the study of mathematics represents a real challenge for those who suffer from dyslexia: a constitutional condition limiting people's performance in reading and writing and, in particular, in the study of mathematical content. Here the difficulties in working with verbal and symbolic codes entail, in turn, difficulties in comprehending the texts from which to deduce the operations that, once combined, would lead to the final solution of a problem. Information technologies may support this learning disorder effectively. However, these tools have some implementation limits that restrict their use in the study of scientific subjects. Word processors with vocal synthesis are currently used to compensate for reading difficulties in the area of classical studies, but they are not used in mathematics. This is because the vocal synthesis (or rather the screen reader supporting it) is not able to interpret anything that is not textual, such as symbols, images and graphs. The DISMATH software, which is the subject of this project, allows dyslexic users to read technical-scientific documents with the help of vocal synthesis, to understand the spatial structure of formulae and matrices, and to write documents with technical-scientific content in a format compatible with the main scientific editors. The system uses LaTeX, a mathematical text language, as its mediation system. It is set up as a LaTeX editor whose graphical interface, in line with the main commercial products, offers additional specific functions designed to support the needs of users who are not able to manage verbal and symbolic codes on their own. LaTeX is translated in real time into standard symbolic notation and is read by the vocal synthesis in natural language, in order to increase, through this bimodal representation, the ability to process information. The understanding of a mathematical formula through its reading is made possible by the deconstruction of the formula itself and its “tree” representation, which allows the logical elements composing it to be identified. Users, even without knowing the LaTeX language, are able to write whatever scientific document they need: the symbolic elements are recalled from dedicated menus and automatically translated by the software, which manages the correct syntax.
The final aim of the project, therefore, is to implement an editor enabling dyslexic people (but not only them) to manage mathematical formulae effectively, through the integration of different software tools, thus also allowing better teacher/learner interaction.
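The “tree” deconstruction of a formula can be pictured with a toy Python sketch that parses a tiny LaTeX subset (only \frac, bare symbols and '+') into nested nodes and verbalizes them for speech output; this is a hypothetical simplification, not the actual DISMATH parser.

```python
# Toy deconstruction of a formula into a tree and a spoken rendering.
def parse(expr: str):
    """Parse a tiny LaTeX subset: \\frac{...}{...}, bare symbols, and '+'."""
    expr = expr.strip()
    if expr.startswith(r"\frac"):
        args, rest = split_braces(expr[len(r"\frac"):], 2)
        node = ("fraction", parse(args[0]), parse(args[1]))
        return node if not rest.strip() else ("sum", node, parse(rest.lstrip("+ ")))
    if "+" in expr:
        left, right = expr.split("+", 1)
        return ("sum", parse(left), parse(right))
    return ("symbol", expr)

def split_braces(s: str, n: int):
    """Pull off n brace-delimited groups, return (groups, remainder)."""
    groups = []
    for _ in range(n):
        assert s[0] == "{"
        depth, i = 0, 0
        for i, c in enumerate(s):
            depth += (c == "{") - (c == "}")
            if depth == 0:
                break
        groups.append(s[1:i])
        s = s[i + 1:]
    return groups, s

def verbalize(node) -> str:
    kind = node[0]
    if kind == "symbol":
        return node[1]
    if kind == "fraction":
        return f"the fraction {verbalize(node[1])} over {verbalize(node[2])}"
    return f"{verbalize(node[1])} plus {verbalize(node[2])}"

tree = parse(r"\frac{a+b}{c} + d")
print(tree)
print(verbalize(tree))   # "the fraction a plus b over c plus d"
```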
Abstract:
This article begins with some recent considerations about real-time music, inspired by the latest contribution of the French composer Philippe Manoury. Then, through the case study of the scenic performance La Traversée de la nuit, we analyse some perspectives for designing an Informed Virtual Environment dedicated to the live-performance artistic domain.
Abstract:
Biometrics is an efficient technology with great potential for the development of security systems for official and commercial applications. Biometrics has recently become a significant part of any efficient person-authentication solution. The advantage of using biometric traits is that they cannot be stolen, shared or even forgotten. The thesis addresses one of the emerging topics in authentication systems, viz., the implementation of an Improved Biometric Authentication System using Multimodal Cue Integration, since operator-assisted identification turns out to be tedious, laborious and time consuming. In order to derive the best performance from the authentication system, an appropriate feature selection criterion has been evolved: it has been seen that selecting too many features leads to deterioration in authentication performance and efficiency. In the work reported in this thesis, various judiciously chosen components of the biometric traits and their feature vectors are used to realize the newly proposed Biometric Authentication System using Multimodal Cue Integration. The feature vectors generated from the noisy biometric traits are compared with the feature vectors available in the knowledge base, and the best-matching pattern is identified for the purpose of user authentication. In an attempt to improve the success rate of the feature-vector-based authentication system, the proposed system has been augmented with a user-dependent weighted fusion technique.
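An illustrative Python sketch (not the thesis' exact method) of user-dependent weighted score fusion: each matcher's normalized score is weighted by how reliable that trait is for the particular enrolled user. The scores, weights and decision threshold below are placeholders.

```python
# User-dependent weighted fusion of normalized matcher scores.
def fuse(scores: dict, weights: dict) -> float:
    """Combine normalized matcher scores (0..1) with user-specific weights summing to 1."""
    return sum(weights[m] * s for m, s in scores.items())

# Hypothetical normalized match scores for one authentication attempt.
scores = {"face": 0.82, "voice": 0.64}

# User-dependent weights, e.g. tuned on enrollment data so that the trait that
# is more reliable for this particular user counts more.
weights_user_A = {"face": 0.7, "voice": 0.3}

fused = fuse(scores, weights_user_A)
print("fused score: %.3f ->" % fused, "accept" if fused >= 0.6 else "reject")
```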
Abstract:
An algorithm for the real-time registration of a retinal video sequence captured with a scanning digital ophthalmoscope (SDO) to a retinal composite image is presented. The method is designed for a computer-assisted retinal laser photocoagulation system to compensate for retinal motion and hence enhance the accuracy, speed, and patient safety of retinal laser treatments. The procedure combines intensity- and feature-based registration techniques. For the registration of an individual frame, the translational frame-to-frame motion between the preceding and current frame is detected by normalized cross correlation. Next, vessel points on the current video frame are identified, and an initial transformation estimate is constructed from the calculated translation vector and the quadratic registration matrix of the previous frame. The vessel points are then iteratively matched to the segmented vessel centerline of the composite image to refine the initial transformation and register the video frame to the composite image. Criteria for image quality and algorithm convergence are introduced, which govern the exclusion of single frames from the registration process and signal a loss of tracking if necessary. The algorithm was successfully applied to ten different video sequences recorded from patients. It achieved an average accuracy of 2.47 ± 2.0 pixels (∼23.2 ± 18.8 μm) over 2764 evaluated video frames, demonstrating that it meets the clinical requirements.
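The translational frame-to-frame step can be illustrated with a small Python sketch that estimates the shift by brute-force normalized cross correlation over a search window; the quadratic refinement against the composite image and all clinical details are omitted, and the frames below are synthetic.

```python
# Estimate frame-to-frame translation by maximizing normalized cross correlation.
import numpy as np

def ncc(a, b):
    a = a - a.mean(); b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def estimate_translation(prev, curr, max_shift=5):
    """Return (dy, dx) such that curr is approximately prev translated by (dy, dx)."""
    h, w = prev.shape
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a = prev[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            b = curr[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            score = ncc(a, b)
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift

rng = np.random.default_rng(1)
prev = rng.normal(size=(64, 64))
curr = np.roll(prev, shift=(2, -3), axis=(0, 1))   # simulated retinal motion
print(estimate_translation(prev, curr))            # expect (2, -3)
```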
Abstract:
Image-guided, computer-assisted neurosurgery has emerged to improve localization and targeting, to provide a better anatomical definition of the surgical field, and to decrease invasiveness. Usually, in image-guided surgery, a computer displays the surgical field in a CT/MR environment, using axial, coronal or sagittal views, or even a 3D representation of the patient. Such a system forces the surgeon to look away from the surgical scene to the computer screen. Moreover, this kind of information, being pre-operative imaging, cannot be modified during the operation, so it remains valid for guidance in the first stage of the surgical procedure, and mainly for rigid structures like bones. In order to address these two constraints, we are developing an ultrasound-guided surgical microscope. Such a system takes advantage of the fact that surgical microscopes and ultrasound systems are already used in neurosurgery, so it does not add complexity to the surgical procedure. We have integrated an optical tracking device into the microscope together with an augmented reality overlay system, with which we avoid the need to look away from the scene, providing correctly aligned surgical images with sub-millimeter accuracy. In addition to the standard CT and 3D views, we are able to track an ultrasound probe and, using a prior calibration and registration of the imaging, the acquired image is correctly projected onto the overlay system, so the surgeon can always localize the target and verify the effects of the intervention. Several tests of the system have already been performed to evaluate its accuracy, and clinical experiments are currently in progress to validate the clinical usefulness of the system.
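The geometric core of such an overlay can be sketched in Python as a rigid transform followed by a pinhole projection; the intrinsic matrix and transform below are placeholder values standing in for the system's actual calibration and registration results.

```python
# Map a tracked 3D point (tracker frame, mm) into overlay pixel coordinates.
import numpy as np

K = np.array([[1000.0, 0, 320],          # hypothetical overlay camera intrinsics
              [0, 1000.0, 240],
              [0, 0, 1]])
T_cam_from_tracker = np.eye(4)           # rigid transform from calibration/registration
T_cam_from_tracker[:3, 3] = [0.0, 0.0, 200.0]   # 200 mm in front of the optics

def project(point_tracker_mm):
    p = T_cam_from_tracker @ np.append(point_tracker_mm, 1.0)
    uvw = K @ p[:3]
    return uvw[:2] / uvw[2]              # pixel coordinates on the overlay

print(project(np.array([10.0, -5.0, 0.0])))   # -> [370. 215.]
```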
Abstract:
Social robots are receiving much interest in the robotics community. The most important goal for such robots lies in their interaction capabilities. An attention system is crucial, both as a filter to focus the robot’s perceptual resources and as a means of letting the observer know that the robot has intentionality. In this paper a simple but flexible and functional attentional model is described. The model, which has been implemented in an interactive robot currently under development, fuses visual and auditory information extracted from the robot’s environment, and can incorporate knowledge-based influences on attention.
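A minimal Python sketch of the kind of cue fusion described above; the weights, map sizes and optional knowledge-based prior are illustrative assumptions, not the paper's implementation.

```python
# Fuse visual and auditory cue maps into one attention map; attend to its peak.
import numpy as np

def fuse_attention(visual, auditory, prior=None, w=(0.5, 0.3, 0.2)):
    """Weighted sum of normalized cue maps; the maximum drives attention."""
    def norm(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m
    maps = [norm(visual), norm(auditory)]
    if prior is not None:
        maps.append(norm(prior))          # knowledge-based influence
    fused = sum(wi * mi for wi, mi in zip(w, maps))
    return fused, np.unravel_index(np.argmax(fused), fused.shape)

rng = np.random.default_rng(3)
visual = rng.random((48, 64))                             # e.g. visual saliency
auditory = np.zeros((48, 64)); auditory[:, 50:] = 1.0     # sound source to the right
fused, focus = fuse_attention(visual, auditory)
print("attend to (row, col):", focus)
```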
Abstract:
Mutations in the SPG4 gene (SPG4-HSP) are the most frequent cause of hereditary spastic paraplegia, but the extent of the neurodegeneration related to the disease is not yet known. Therefore, our objective is to identify regions of the central nervous system damaged in patients with SPG4-HSP using a multi-modal neuroimaging approach. In addition, we aimed to identify possible clinical correlates of such damage. Eleven patients (mean age 46.0 ± 15.0 years, 8 men) with molecular confirmation of hereditary spastic paraplegia, and 23 matched healthy controls (mean age 51.4 ± 14.1 years, 17 men) underwent MRI scans in a 3T scanner. We used 3D T1 images to perform volumetric measurements of the brain and spinal cord. We then performed tract-based spatial statistics and tractography analyses of diffusion tensor images to assess the microstructural integrity of white matter tracts. Disease severity was quantified with the Spastic Paraplegia Rating Scale. Correlations were then carried out between MRI metrics and clinical data. Volumetric analyses did not identify macroscopic abnormalities in the brains of hereditary spastic paraplegia patients. In contrast, we found extensive fractional anisotropy reduction in the corticospinal tracts, cingulate gyri and splenium of the corpus callosum. Spinal cord morphometry identified atrophy without flattening in the group of patients with hereditary spastic paraplegia. Fractional anisotropy of the corpus callosum and pyramidal tracts did correlate with disease severity. Hereditary spastic paraplegia is characterized by relative sparing of the cortical mantle and remarkable damage to the distal portions of the corticospinal tracts, extending into the spinal cord.
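For reference, the fractional anisotropy (FA) reported in these diffusion analyses is the standard scalar derived from the diffusion tensor eigenvalues:

```latex
\mathrm{FA} \;=\; \sqrt{\tfrac{3}{2}}\,
\sqrt{\frac{(\lambda_1-\bar{\lambda})^2 + (\lambda_2-\bar{\lambda})^2 + (\lambda_3-\bar{\lambda})^2}
           {\lambda_1^2 + \lambda_2^2 + \lambda_3^2}},
\qquad
\bar{\lambda} \;=\; \frac{\lambda_1+\lambda_2+\lambda_3}{3}.
```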
Abstract:
Few studies have investigated in vivo changes of the cholinergic basal forebrain in Alzheimer's disease (AD) and amnestic mild cognitive impairment (MCI), an at-risk stage of AD. Even less is known about alterations of cortical projecting fiber tracts associated with basal forebrain atrophy. In this study, we determined regional atrophy within the basal forebrain in 21 patients with AD and 16 subjects with MCI compared to 20 healthy elderly subjects using deformation-based morphometry of MRI scans. We assessed effects of basal forebrain atrophy on fiber tracts derived from high-resolution diffusion tensor imaging (DTI) using tract-based spatial statistics. We localized significant effects relative to a map of cholinergic nuclei in MRI standard space as determined from a postmortem brain. Patients with AD and MCI subjects showed reduced volumes in basal forebrain areas corresponding to the anterior medial and lateral, intermediate and posterior nuclei of the Nucleus basalis of Meynert (NbM) as well as in the nuclei of the diagonal band of Broca (P < 0.01). Effects in MCI subjects were spatially more restricted than in AD, but occurred at similar locations. The volume of the right antero-lateral NbM nucleus was correlated with the integrity of intracortical projecting fiber tracts such as the corpus callosum, cingulate, and the superior longitudinal, inferior longitudinal, inferior fronto-occipital, and uncinate fasciculi (P < 0.05, corrected for multiple comparisons). Our findings suggest that a multimodal MRI-DTI approach is supportive in determining atrophy of cholinergic nuclei and its effect on intracortical projecting fiber tracts in AD.
Abstract:
This dissertation presents the development of a multimodal signal acquisition and processing platform. The proposed project falls within the context of developing multimodal interfaces for robotic devices aimed at motor rehabilitation, adapting the control of these devices according to the user's intention. The developed interface acquires, synchronizes and processes electroencephalographic (EEG) and electromyographic (EMG) signals as well as signals from inertial measurement units (IMUs). Data acquisition was carried out in experiments with healthy subjects performing lower-limb motor tasks. The objective is to analyze movement intention, muscle activation and the effective onset of the performed movements through the EEG, EMG and IMU signals, respectively. To this end, an offline analysis was performed, using processing techniques for the biological signals and for the signals from the inertial sensors; from the latter, the knee joint angles are also measured throughout the movements. An experimental test protocol was proposed for the tasks performed. The results showed that the proposed system was able to acquire, synchronize, process and classify the signals in combination. Analyses of the accuracy of the classifiers used showed that the interface was able to identify movement intention in 76.0 ± 18.2% of the movements. The largest mean movement-anticipation time, 716.0 ± 546.1 milliseconds, was obtained from the analysis of the EEG signal; from the analysis of the EMG signal alone, this value was 88.34 ± 67.28 milliseconds. The results of the biological signal processing stages, the measurement of the joint angles, and the accuracy and anticipation-time values were consistent with the current related literature.
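As an illustration of one ingredient of such an analysis (not the dissertation's actual pipeline), a simple moving-RMS threshold in Python can mark EMG activation onset, the kind of reference point against which EEG-based anticipation times are compared:

```python
# Detect EMG activation onset with a moving-RMS threshold on synthetic data.
import numpy as np

def emg_onset(emg, fs, win_s=0.05, k=3.0, baseline_s=0.5):
    """Return the first time (s) where the moving RMS exceeds mean + k*std of the baseline."""
    win = max(1, int(win_s * fs))
    rms = np.sqrt(np.convolve(emg ** 2, np.ones(win) / win, mode="same"))
    base = rms[: int(baseline_s * fs)]
    thr = base.mean() + k * base.std()
    above = np.flatnonzero(rms > thr)
    return above[0] / fs if above.size else None

fs = 1000
t = np.arange(0, 3, 1 / fs)
rng = np.random.default_rng(7)
emg = 0.02 * rng.normal(size=t.size)                      # resting baseline noise
emg[int(1.2 * fs):] += 0.3 * rng.normal(size=t.size - int(1.2 * fs))  # burst near 1.2 s
print("estimated onset: %.3f s" % emg_onset(emg, fs))
```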