944 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevance:

100.00% 100.00%

Publisher:

Abstract:

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic-Bayesian-network-based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator-trained policy. © 2013 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Two new species of myxosporeans (Myxosporea: Myxidiidae), Myxidium tuanfengensis sp. n. and Zschokkella saurogobionis sp. n., parasitic in freshwater fishes collected from the Yangtze River of China, are described in this paper. M. tuanfengensis was found in the liver parenchyma and intestine lumen of Leptobotia taeniops Sauvage, 1878, while Z. saurogobionis was found in the gall bladder of Saurogobio dumerili Bleeker, 1871. The diagnostic characters of M. tuanfengensis are: round or elliptical polysporous plasmodia averaging 118 μm in size; spore oval in frontal view with smooth surface and nearly spindle-shaped in sutural view with slightly sinuous sutural ridge, averaging 19.5 × 9.75 × 8.9 μm in size; two large spherical polar capsules 6.8 μm in diameter, with polar filament wound in 4 to 5 coils. The diagnostic characters of Z. saurogobionis are: spore elliptical in both frontal and sutural view, measuring 18.3 × 9.8 × 10.8 μm in size; fine sutural ridge in S-form, spore shell marked with 10 to 12 distinct lines parallel to the sutural line; two spherical polar capsules, 6.7 μm in diameter, with polar filament in 5 coils.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Handheld communication devices have recently been developing very quickly, gaining users and spreading into new application fields, and they have a promising future. This study investigated the acceptance of the multimodal text entry method and users' behavioral characteristics when using it. Building on the general information-processing model of a bimodal system and on human-factors studies of the multimodal map system, the present study focused mainly on the hand-speech bimodal text entry method. For acceptance, the study investigated the subjective perception of speech recognition accuracy through a Wizard of Oz (WOz) experiment and a questionnaire. Results showed a linear relationship between speech recognition accuracy and subjective accuracy. Furthermore, as familiarity increased, the difference between the acceptable accuracy and the subjective accuracy gradually decreased. In addition, the similarity in meaning between the speech recognition output and the correct sentences was an important reference criterion. The second study investigated three aspects of the bimodal text entry method: input, error recovery, and modal shifts. The first experiment aimed to identify users' behavioral characteristics during error recovery tasks. Results indicated that participants preferred to correct errors by handwriting, regardless of the input modality. The second experiment aimed to identify users' behavioral characteristics when entering various types of text. Results showed that users preferred speech input in both the word and sentence conditions, a preference that was highly consistent across individuals, while no significant difference was found between handwriting and speech input in the character condition. Participants used the direct strategy more than the jumping strategy to deal with mixed text, especially for the Chinese-English mixed type.
The third experiment examined the cognitive load of the different modal shifts; the results suggested significant differences between shifts. Moreover, relatively little time was needed for the shift from speech input to hand input. Based on the main findings, the following implications were discussed. First, when evaluating a speech recognition system, attention should be paid to the fact that speech recognition accuracy is not equal to subjective accuracy. Second, to make a speech input system more acceptable, a good method is to train users and to give feedback on accuracy during training, which improves familiarity with and sensitivity to the system. Third, both universal and individual behavioral patterns should be taken into consideration to improve the error recovery method. Fourth, to make speech input easier to learn and use, its operations should be simpler. Fifth, a more convenient input method should be provided for non-Chinese text entry. Finally, the shifting time between hand input and speech input provides an important parameter for the design of automatically evoked speech recognition systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

I wish to propose a quite speculative new version of the grandmother cell theory to explain how the brain, or parts of it, may work. In particular, I discuss how the visual system may learn to recognize 3D objects. The model would apply directly to the cortical cells involved in visual face recognition. I will also outline the relation of our theory to existing models of the cerebellum and of motor control. Specific biophysical mechanisms can be readily suggested as part of a basic type of neural circuitry that can learn to approximate multidimensional input-output mappings from sets of examples and that is expected to be replicated in different regions of the brain and across modalities. The main points of the theory are:
- The brain uses modules for multivariate function approximation as basic components of several of its information processing subsystems.
- These modules are realized as HyperBF networks (Poggio and Girosi, 1990a,b).
- HyperBF networks can be implemented in terms of biologically plausible mechanisms and circuitry.
The theory predicts a specific type of population coding that represents an extension of schemes such as look-up tables. I will conclude with some speculations about the trade-off between memory and computation and the evolution of intelligence.
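As a rough illustration of what "learning to approximate a multidimensional input-output mapping from a set of examples" can look like, the sketch below fits a plain Gaussian radial basis function network with one unit centred on each example. This is a simplification chosen for illustration, not the HyperBF formulation itself; the function names, the fixed `width` parameter, and the toy data are all assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian(r2, width):
    """Gaussian basis function of a squared distance r2."""
    return math.exp(-r2 / (2.0 * width ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rbf(inputs, targets, width=1.0):
    """One basis unit per example; weights chosen to reproduce the targets."""
    G = [[gaussian(sq_dist(a, b), width) for b in inputs] for a in inputs]
    return solve(G, targets)


def predict_rbf(x, centers, weights, width=1.0):
    """Weighted sum of the basis-unit responses to input x."""
    return sum(w * gaussian(sq_dist(x, c), width)
               for w, c in zip(weights, centers))


# Toy examples (hypothetical): samples of f(x, y) = x + 2y on the unit square.
examples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 2.0, 3.0]
weights = fit_rbf(examples, values)
estimate = predict_rbf((0.5, 0.5), examples, weights)
```

Each stored example acts like a look-up table entry, and the Gaussian units blend neighbouring entries so that the network also responds to inputs it has not seen.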

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We report a 75dB, 2.8μW, 100Hz-10kHz envelope detector in a 1.5μm 2.8V CMOS technology. The envelope detector performs input-dc-insensitive voltage-to-current-converting rectification followed by novel nanopower current-mode peak detection. The use of a subthreshold wide-linear-range transconductor (WLR OTA) allows greater than 1.7Vpp input voltage swings. We show theoretically that this optimal performance is technology-independent for the given topology and may be improved only by spending more power. A novel circuit topology is used to perform 140nW peak detection with controllable attack and release time constants. The lower limits of envelope detection are determined by the more dominant of two effects: the first effect is caused by the inability of amplified high-frequency signals to exceed the deadzone created by exponential nonlinearities in the rectifier. The second effect is due to an output current caused by thermal noise rectification. We demonstrate good agreement of experimentally measured results with theory. The envelope detector is useful in low-power bionic implants for the deaf, hearing aids, and speech-recognition front ends. Extension of the envelope detector to higher-frequency applications is straightforward if power consumption is inc

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The work reported here lies in the area of overlap between artificial intelligence and software engineering. As research in artificial intelligence, it is a step towards a model of problem solving in the domain of programming. In particular, this work focuses on the routine aspects of programming which involve the application of previous experience with similar programs. I call this programming by inspection. Programming is viewed here as a kind of engineering activity. Analysis and synthesis by inspection are a prominent part of expert problem solving in many other engineering disciplines, such as electrical and mechanical engineering. The notion of inspection methods in programming developed in this work is motivated by similar notions in other areas of engineering. This work is also motivated by current practical concerns in the area of software engineering. The inadequacy of current programming technology is universally recognized. Part of the solution to this problem will be to increase the level of automation in programming. I believe that the next major step in the evolution of more automated programming will be interactive systems which provide a mixture of partially automated program analysis, synthesis and verification. One such system being developed at MIT, called the programmer's apprentice, is the immediate intended application of this work. This report concentrates on the knowledge base of the programmer's apprentice, which is in the form of a taxonomy of commonly used algorithms and data structures. To the extent that a programmer is able to construct and manipulate programs in terms of the forms in such a taxonomy, he may relieve himself of many details and generally raise the conceptual level of his interaction with the system, as compared with present-day programming environments.
Also, since it is practical to expend a great deal of effort pre-analyzing the entries in a library, the difficulty of verifying the correctness of programs constructed this way is correspondingly reduced. The feasibility of this approach is demonstrated by the design of an initial library of common techniques for manipulating symbolic data. This document also reports on the further development of a formalism called the plan calculus for specifying computations in a programming-language-independent manner. This formalism combines both data and control abstraction in a uniform framework that has facilities for representing multiple points of view and side effects.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

While cochlear implants (CIs) usually provide high levels of speech recognition in quiet, speech recognition in noise remains challenging. To overcome these difficulties, it is important to understand how implanted listeners separate a target signal from interferers. Stream segregation has been studied extensively in both normal and electric hearing, as a function of place of stimulation. However, the effects of pulse rate, independent of place, on the perceptual grouping of sequential sounds in electric hearing have not yet been investigated. A rhythm detection task was used to measure stream segregation. The results of this study suggest that while CI listeners can segregate streams based on differences in pulse rate alone, the amount of stream segregation observed decreases as the base pulse rate increases. Further investigation of the perceptual dimensions encoded by the pulse rate and the effect of sequential presentation of different stimulation rates on perception could be beneficial for the future development of speech processing strategies for CIs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper uses a case study approach to consider the effectiveness of the electronic survey as a research tool to measure the learner voice about experiences of e-learning in a particular institutional case. Two large-scale electronic surveys were carried out for the Student Experience of e-Learning (SEEL) project at the University of Greenwich in 2007 and 2008, funded by the UK Higher Education Academy (HEA). The paper considers this case to argue that, although the electronic web-based survey is a convenient method of quantitative and qualitative data collection, enabling higher education institutions swiftly to capture multiple views of large numbers of students regarding experiences of e-learning, for more robust analysis, electronic survey research is best combined with other methods of in-depth qualitative data collection. The advantages and disadvantages of the electronic survey as a research method to capture student experiences of e-learning are the focus of analysis in this short paper, which reports an overview of large-scale data collection (1,000+ responses) from two electronic surveys administered to students using SurveyMonkey as a web-based survey tool as part of the SEEL research project. Advantages of web-based electronic survey design include flexibility, ease of design, high degree of designer control, convenience, low costs, data security, ease of access and guarantee of confidentiality combined with researcher ability to identify users through email addresses. Disadvantages of electronic survey design include the self-selecting nature of web-enabled respondent participation, which tends to skew data collection towards students who respond effectively to email invitations.
The relative inadequacy of electronic surveys to capture in-depth qualitative views of students is discussed with regard to prior recommendations from the JISC-funded Learners' Experiences of e-Learning (LEX) project, in consideration of the results from SEEL in-depth interviews with students. The paper considers the literature on web-based and email electronic survey design, summing up the relative advantages and disadvantages of electronic surveys as a tool for student experience of e-learning research. The paper concludes with a range of recommendations for designing future electronic surveys to capture the learner voice on e-learning, contributing to evidence-based learning technology research development in higher education.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Belief merging operators combine multiple belief bases (a profile) into a collective one. When the conjunction of belief bases is consistent, all the operators agree on the result. However, if the conjunction of belief bases is inconsistent, the results vary between operators. There is no formal manner to measure the results and decide on which operator to select. So, in this paper we propose to evaluate the result of merging operators by using three ordering relations (fairness, satisfaction and strength) over operators for a given profile. Moreover, a relation of conformity over operators is introduced in order to classify how well the operator conforms to the definition of a merging operator. By using the four proposed relations we provide a comparison of some classical merging operators and evaluate the results for some specific profiles.
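To make the behaviour described above concrete, here is a minimal sketch of one classical distance-based merging operator: the sum operator over Hamming distance. The abstract does not say which operators were compared, so this choice, the representation of each belief base by its set of models, and all names below are assumptions made for illustration.

```python
from itertools import product


def hamming(w1, w2):
    """Number of atoms on which two interpretations disagree."""
    return sum(a != b for a, b in zip(w1, w2))


def dist_to_base(world, base_models):
    """Distance from an interpretation to the closest model of a base."""
    return min(hamming(world, m) for m in base_models)


def merge_sum(profile, n_atoms):
    """Return the interpretations with minimal summed distance to all bases.

    profile: list of belief bases, each given as a set of models, where a
             model is a tuple of 0/1 truth values of length n_atoms.
    """
    worlds = list(product((0, 1), repeat=n_atoms))
    cost = {w: sum(dist_to_base(w, base) for base in profile) for w in worlds}
    best = min(cost.values())
    return {w for w in worlds if cost[w] == best}


# Consistent profile over atoms (a, b): one base believes a, the other b.
consistent = [{(1, 0), (1, 1)}, {(0, 1), (1, 1)}]
merged_consistent = merge_sum(consistent, 2)

# Inconsistent profile over one atom: two bases believe a, one believes not a.
inconsistent = [{(1,)}, {(0,)}, {(1,)}]
merged_inconsistent = merge_sum(inconsistent, 1)
```

In the consistent case the result is exactly the set of models of the conjunction, as any merging operator would give. In the inconsistent case the sum operator sides with the majority, which is where other operators (for example, ones that minimise the maximum distance instead) can disagree.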

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we present an improved scheme for line and edge detection in cortical area V1, based on responses of simple and complex cells, truly multi-scale with no free parameters. We illustrate the multi-scale representation for visual reconstruction, and show how object segregation can be achieved with coarse-to-fine-scale groupings. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only, and final categorization on coarse plus fine scales. Processing schemes are discussed in the framework of a complete cortical architecture.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The goal of the project "SmartVision: active vision for the blind" is to develop a small and portable but intelligent and reliable system for assisting the blind and visually impaired while navigating autonomously, both outdoors and indoors. In this paper we present an overview of the prototype, design issues, and its different modules which integrate a GIS with GPS, Wi-Fi, RFID tags and computer vision. The prototype addresses global navigation by following known landmarks, local navigation with path tracking and obstacle avoidance, and object recognition. The system does not replace the white cane, but extends it beyond its reach. The user-friendly interface consists of a 4-button hand-held box, a vibration actuator in the handle of the cane, and speech synthesis. A future version may also employ active RFID tags for marking navigation landmarks, and speech recognition may complement speech synthesis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

3D registration (sometimes called alignment) is the process of transforming 3D data sets into a common coordinate system so that their shared elements are aligned. Two data sets to be aligned may be partial scans of two different views of the same object. They may also be two complete models, generated at different times, of the same object or of two distinct objects. Depending on the data sets to be processed, alignment methods are classified as rigid or non-rigid registration. In rigid registration, the data are generally acquired from rigid objects. The registration process can be accomplished by finding a single global rigid transformation (rotation, translation) that aligns the source data set with the target data set. In the non-rigid case, however, where the data are acquired from deformable objects, registration is harder because both a global transformation and local deformations must be found. In this thesis, three methods are proposed to solve the non-rigid registration problem between two data sets (represented as triangular meshes) acquired from deformable objects. The first method registers two partially overlapping surfaces. It overcomes the limitations of earlier methods in finding a large global deformation between two surfaces. However, this method is limited to small local deformations on the surface so that the descriptor used remains valid. The second method builds on the framework of the first and is applied to data for which the deformation between the two surfaces consists of both a large global deformation and small local deformations.
The third method, which builds on the other two, is proposed for registering more complex data sets. Although the quality it provides is not as good as that of the second method, its computation is about four times faster because the number of optimized parameters is halved. The effectiveness of the three methods rests on strategies by which correspondences are determined correctly and the deformation model is exploited judiciously. These methods are implemented and compared with other methods on a variety of data to assess their robustness in solving the non-rigid registration problem. The proposed methods are promising solutions that can be applied in applications such as non-rigid registration of multiple views, dynamic 3D reconstruction, 3D animation, and 3D model retrieval from databases.
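For orientation, the simplest case mentioned in this abstract, a single global rigid transformation (rotation, translation) aligning a source set with a target set, can be sketched in closed form. The sketch below is not one of the thesis's three methods: it assumes 2D points instead of 3D meshes, assumes the point correspondences are already known, and uses hypothetical function names.

```python
import math


def rigid_align_2d(source, target):
    """Least-squares rotation angle and translation mapping source to target.

    source, target: equal-length lists of (x, y) points, where source[i]
    is assumed to correspond to target[i].
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n

    # Accumulate dot and cross products of the centred point pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation carries the rotated source centroid onto the target centroid.
    t = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, t


def apply_rigid(points, theta, t):
    """Rotate each point by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


# Hypothetical data: a unit square and a rotated, shifted copy of it.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = apply_rigid(square, 0.5, (2.0, -1.0))
theta, t = rigid_align_2d(square, moved)
```

The non-rigid problem is harder precisely because no single `(theta, t)` pair explains the data: local deformations have to be estimated on top of the global transformation, and the correspondences are not known in advance.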

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Supervised Professional Practice Report, Master's in Preschool Education

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Master's in Computer Engineering, Specialization in Knowledge and Decision Technologies

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multifocal intraocular lenses (MF IOLs) have concentric optical zones with different dioptric power, enabling patients to have good visual acuity at multiple focal points. However, several optical limitations have been attributed to this particular design. The purpose of this study is to assess the effect of MF IOL design on the accuracy of retinal optical coherence tomography (OCT). This cross-sectional study was conducted at the Refractive Surgery Department of Central Lisbon Hospital Center. Twenty-three eyes of 15 patients with a diffractive MF IOL and 27 eyes of 15 patients with an aspheric monofocal IOL were included in this study. All patients underwent OCT macular scans using Heidelberg Spectralis®. Macular thickness and volume values and image quality (Q factor) were compared between the two groups. There were no statistically significant differences between the two groups regarding macular thickness or volume measurements. Retinal OCT image quality was significantly lower in the MF IOL group (p < 0.01). MF IOLs are associated with a significant decrease in OCT image quality. However, this fact does not seem to compromise the accuracy of spectral domain OCT retinal measurements.