944 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis contributes to research toward artificial intelligence using connectionist methods. Recurrent neural networks are an increasingly popular family of sequential models capable, in principle, of learning arbitrary algorithms. These models perform deep learning, a type of machine learning. Their generality and empirical success make them an interesting subject for research and a promising tool for creating more general artificial intelligence. The first chapter of this thesis gives a brief overview of the background topics: artificial intelligence, machine learning, deep learning, and recurrent neural networks. The next three chapters cover these topics in increasingly specific detail. Finally, we present some contributions to recurrent neural networks. Chapter \ref{arxiv1} presents our work on regularizing recurrent neural networks. Regularization aims to improve the generalization ability of the model and plays a key role in the performance of several applications of recurrent neural networks, particularly speech recognition. Our approach achieves state-of-the-art results on TIMIT, a standard benchmark for this task. Chapter \ref{cpgp} presents a second line of work, still in progress, which explores a new architecture for recurrent neural networks. Recurrent neural networks maintain a hidden state that represents their previous observations. The idea of this work is to encode certain abstract dynamics in the hidden state, giving the network a natural way to encode consistent trends in the state of its environment. Our work builds on an existing model; we describe that work and our contributions, including a preliminary experiment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The speed of information and knowledge has established in contemporary society a constant search for better information processes, with a view to ensuring faster processing and results. In the public sphere the demands are similar, under the eye of the voter-citizen. The aim of this research is therefore to provide an overview of the Brazilian electronic voting system, specifically the electronic ballot box (Urna Eletrônica), from the conception of the project in the 1990s to the present day, taking a scientific look at the communication actions of the Superior Electoral Court (Tribunal Superior Eleitoral, TSE) in promoting advertising campaigns to raise voters' awareness of the computerized voting system, which is supposedly faster and more efficient. For descriptive purposes, the research draws on multiple views of the communication surrounding the ballot box: those of the body that maintains it, of the politicians directly involved in the electoral contest, and of the political consultants who work on the behind-the-scenes strategies of election campaigns. This diversity of views and positions on the credibility of the system is intended to give the research a broad understanding of the impacts of a computerized system in a democratic environment. (AU)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The present study evaluated the transverse dimensions of the mandibular dental arches in individuals with different facial patterns. The sample consisted of right lateral cephalometric radiographs and plaster casts of 33 Caucasian individuals of both sexes, aged 13 to 25 years, in the permanent dentition stage. The facial pattern was determined by subjective facial analysis of frontal and profile photographs from 1500 orthodontic records; cephalometric analysis using the ANB angle was used to confirm the skeletal pattern, which had to coincide with Angle's malocclusion classification. The sample was divided into three groups: Group I, Pattern I, Angle Class I and ANB 2.0° ± 0.5°; Group II, Pattern II, Angle Class II division 1 and ANB ≥ 4.0°; and Group III, Pattern III, Angle Class III and ANB ≥ -4.5°. The transverse dimensions of the arch were measured after the plaster casts were digitized with the Dental Wings (3D) scanner, from which the following transverse distances were established with the aid of Geomagic Studio® 12 software: intercanine, inter-first-premolar, inter-second-premolar, inter-first-molar (mesial and distal cusps), and inter-second-molar (mesial and distal cusps). Means and standard deviations of the transverse dimensions were obtained, and analysis of variance and Tukey's test were used to compare the three groups. A significance level of 5% (p<0.05) was adopted in all statistical tests. There was a statistically significant difference in two of the 14 transverse dimensions evaluated: in the maxillary arch in the mesial region of the second molar (p=0.024) and in the mandibular arch in the distal region of the first molar (p=0.047). The mandibular dental arches were similar in the three groups studied.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of this research was to evaluate the facial changes resulting from Surgically Assisted Rapid Maxillary Expansion (Expansão Rápida da Maxila Assistida Cirurgicamente, ERM-AC). The sample consisted of 15 patients with a mean age of 24 years and 1 month, 10 female and 5 male, who had transverse maxillary deficiency, had not undergone previous orthodontic treatment, and had complete clinical records and frontal photographs taken before treatment (T1) and 6 months after ERM-AC (T2). Linear measurements were obtained by marking reference points on acetate sheets fixed over the photographs, which avoided the need for an anatomical tracing. It was concluded that standardizing the photographs at all time points of the study is of fundamental importance for the measurements to be reliable. When T1 and T2 were compared using Student's t-test, no statistically significant change was found in intercanthal width (Ind Ine), midfacial height (N - SN), right eye width (Exd Ind), left eye width (Exe Ine), facial height (N - Me), upper facial width (Zid - Zie), mouth width (Cbd Cbe), or mouth height (Ls Li). Lower facial height (Sn - Me) and nose width (Ald Ale) showed statistically significant changes after ERM-AC.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally-varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods is explained, and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Keyword identification in one of two simultaneous sentences is improved when the sentences differ in F0, particularly when they are almost continuously voiced. Sentences of this kind were recorded, monotonised using PSOLA, and re-synthesised to give a range of harmonic ΔF0s (0, 1, 3, and 10 semitones). They were additionally re-synthesised by LPC with the LPC residual frequency shifted by 25% of F0, to give excitation with inharmonic but regularly spaced components. Perceptual identification of frequency-shifted sentences showed a similar large improvement with nominal ΔF0 as seen for harmonic sentences, although overall performance was about 10% poorer. We compared performance with that of two autocorrelation-based computational models comprising four stages: (i) peripheral frequency selectivity and half-wave rectification; (ii) within-channel periodicity extraction; (iii) identification of the two major peaks in the summary autocorrelation function (SACF); (iv) a template-based approach to speech recognition using dynamic time warping. One model sampled the correlogram at the target-F0 period and performed spectral matching; the other deselected channels dominated by the interferer and performed matching on the short-lag portion of the residual SACF. Both models reproduced the monotonic increase observed in human performance with increasing ΔF0 for the harmonic stimuli, but not for the frequency-shifted stimuli. A revised version of the spectral-matching model, which groups patterns of periodicity that lie on a curve in the frequency-delay plane, showed a closer match to the perceptual data for frequency-shifted sentences. The results extend the range of phenomena originally attributed to harmonic processing to grouping by common spectral pattern.
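
The periodicity-extraction idea in stages (i) to (iii) above can be illustrated with a heavily simplified sketch. It is not the authors' model: it uses a single channel with no peripheral filterbank, finds only one autocorrelation peak rather than two, and omits the recognition stage. All function names, the 80-400 Hz search range, and the synthetic test tone are assumptions made for illustration.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency ratio corresponding to an interval in semitones."""
    return 2.0 ** (semitones / 12.0)

def harmonic_tone(f0, sample_rate, n_samples, n_harmonics=5):
    """Synthetic stand-in for a monotonised voiced sound: equal-amplitude harmonics."""
    return [sum(math.sin(2 * math.pi * h * f0 * t / sample_rate)
                for h in range(1, n_harmonics + 1))
            for t in range(n_samples)]

def half_wave_rectify(signal):
    """Keep positive half-cycles only, as in stage (i)."""
    return [max(0.0, s) for s in signal]

def autocorrelation(signal, max_lag):
    """Autocorrelation for lags 0..max_lag over a fixed-length window."""
    window = len(signal) - max_lag
    return [sum(signal[i] * signal[i + lag] for i in range(window))
            for lag in range(max_lag + 1)]

def estimate_f0(signal, sample_rate, f0_min=80.0, f0_max=400.0):
    """Pick the largest autocorrelation peak inside the allowed F0 range."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    acf = autocorrelation(half_wave_rectify(signal), max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return sample_rate / best_lag

if __name__ == "__main__":
    tone = harmonic_tone(100.0, 8000, 400)
    print("estimated F0 (Hz):", estimate_f0(tone, 8000))
    print("ratio for 10 semitones:", semitones_to_ratio(10))
```

In a two-talker mixture the same autocorrelation would show peaks at both talkers' periods, which is what stage (iii) exploits. Separating them reliably would need the per-channel analysis that this sketch leaves out.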

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Monoclonal and polyclonal antibodies have been produced for use in immunological assays for the detection of Burkholderia pseudomallei and Burkholderia mallei. Monoclonal antibodies recognising a high molecular weight polysaccharide material found in some strains of both species have been shown to be effective in recognising B. pseudomallei and B. mallei and distinguishing them from other organisms. The high molecular weight polysaccharide material is thought to be the capsule of B. pseudomallei and B. mallei and may have important links with virulence. B. pseudomallei and B. mallei are known to be closely related, sharing many epitopes, but antigenic variation has been demonstrated within both species. The lipopolysaccharide from strains of B. pseudomallei and B. mallei has been isolated and the silver stain profiles found to be visually very similar. A monoclonal antibody raised to B. mallei LPS has been found to recognise both B. mallei and B. pseudomallei strains. However, in a small number of B. pseudomallei strains a visually atypical LPS profile has been demonstrated. A monoclonal antibody raised against this atypical LPS showed no recognition of the typical LPS profile of either B. mallei or B. pseudomallei. This atypical LPS structure has not been reported and may be immunologically distinct from the typical LPS. Molecular biology and antibody engineering techniques have been used in an attempt to produce single-chain antibody fragments reactive to B. pseudomallei. Sequencing of one of the single-chain antibody fragments produced showed high homology with murine immunoglobulin genes, but none of the single-chain antibody fragments were found to be specific to B. pseudomallei.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The purpose of the following studies was to explore the effect of systemic vascular and endothelial dysfunction upon the ocular circulation and functionality of the retina. There are 6 principal sections to the present work. Retinal vessel activity in smokers and non-smokers: the principal findings of this work were: chronic smoking affects retinal vessel motion at baseline and during stimulation with flickering light; chronic smoking leads to a vaso-constrictory shift in retinal arteriolar reactivity to flicker; retinal arteriolar elasticity is decreased in chronic smokers. The effect of acute smoking on retinal vessel dynamics in smokers and non-smokers: the principal finding of this work was that retinal reactivity in chronic smokers is blunted when exposed to flicker light provocation immediately after smoking one cigarette. Ocular blood flow in coronary artery disease: the principal findings of this work were: retrobulbar and retinal blood flow is preserved in CAD patients, despite a change in pulse wave transmission; arterial retinal response to flickering light provocation is significantly delayed in CAD patients; retinal venular diameters are significantly dilated in CAD patients. Autonomic nervous system function and peripheral circulation in CAD: the principal findings in this work were: CAD patients demonstrate a sympathetic overdrive during a 24-hour period; a delay in peripheral vascular reactivity (nail-fold capillaries), as observed in patients suffering from CAD, could be caused either by arteriosclerotic changes of the vascular walls or by systemic haemodynamic changes.
Visual function in CAD: the principal findings in this work were: overall visual function in CAD patients is preserved, despite a decrease in contrast sensitivity; when a filtering technique is applied that selects those with a greater coefficient of variance, which in turn represents a decrease in reliability, some patients appear to have impaired visual function as assessed using FDT visual field evaluation. Multiple functional, structural and biochemical vascular endothelial dysfunctions in patients suffering from CAD: relationships and possible implications: the principal findings of this work were: BMI significantly correlated with vWF (a marker of endothelial function) in CAD patients; retinal vascular reactivity showed a significant correlation with peripheral reactivity parameters in controls that was absent in the CAD group and could reflect a loss of vascular endothelial integrity; visual field parameters as assessed by frequency doubling technology were strongly related to systemic vascular elasticity (ambulatory arterial stiffness index) in controls but not in CAD patients.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To represent the local orientation and energy of a 1-D image signal, many models of early visual processing employ bandpass quadrature filters, formed by combining the original signal with its Hilbert transform. However, representations capable of estimating an image signal's 2-D phase have been largely ignored. Here, we consider 2-D phase representations using a method based upon the Riesz transform. For spatial images there exist two Riesz transformed signals and one original signal from which orientation, phase and energy may be represented as a vector in 3-D signal space. We show that these image properties may be represented by a Singular Value Decomposition (SVD) of the higher-order derivatives of the original and the Riesz transformed signals. We further show that the expected responses of even and odd symmetric filters from the Riesz transform may be represented by a single signal autocorrelation function, which is beneficial in simplifying Bayesian computations for spatial orientation. Importantly, the Riesz transform allows one to weight linearly across orientation using both symmetric and asymmetric filters to account for some perceptual phase distortions observed in image signals - notably one's perception of edge structure within plaid patterns whose component gratings are either equal or unequal in contrast. Finally, exploiting the benefits that arise from the Riesz definition of local energy as a scalar quantity, we demonstrate the utility of Riesz signal representations in estimating the spatial orientation of second-order image signals. We conclude that the Riesz transform may be employed as a general tool for 2-D visual pattern recognition by its virtue of representing phase, orientation and energy as orthogonal signal quantities.
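
As a rough illustration of the representation described above, the sketch below computes the two Riesz-transformed signals of a small image with a naive DFT and reads off local energy, orientation and phase in the style of the monogenic signal. It is an assumption-laden sketch, not the paper's method: the SVD and autocorrelation formulations are not reproduced, the function names are invented here, and the input is assumed to be a small zero-mean (ideally bandpassed) image.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def dft2(image, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def signed_freq(k, n):
    """Signed frequency (cycles per sample) of DFT bin k."""
    return (k if 2 * k < n else k - n) / n

def riesz_transform(image):
    """Return the two Riesz components, using multipliers -i*fx/|f| and -i*fy/|f|."""
    n_rows, n_cols = len(image), len(image[0])
    spectrum = dft2(image)
    spec_x = [[0j] * n_cols for _ in range(n_rows)]
    spec_y = [[0j] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        fy = signed_freq(r, n_rows)
        for c in range(n_cols):
            fx = signed_freq(c, n_cols)
            radius = math.hypot(fx, fy)
            if radius > 0:  # the DC bin is left at zero
                spec_x[r][c] = -1j * fx / radius * spectrum[r][c]
                spec_y[r][c] = -1j * fy / radius * spectrum[r][c]
    riesz_x = [[z.real for z in row] for row in dft2(spec_x, inverse=True)]
    riesz_y = [[z.real for z in row] for row in dft2(spec_y, inverse=True)]
    return riesz_x, riesz_y

def monogenic_features(image):
    """Local energy, orientation and phase from the 3-D vector (f, Rx, Ry)."""
    riesz_x, riesz_y = riesz_transform(image)
    energy, orientation, phase = [], [], []
    for row, row_x, row_y in zip(image, riesz_x, riesz_y):
        energy.append([math.sqrt(f * f + rx * rx + ry * ry)
                       for f, rx, ry in zip(row, row_x, row_y)])
        orientation.append([math.atan2(ry, rx) for rx, ry in zip(row_x, row_y)])
        phase.append([math.atan2(math.hypot(rx, ry), f)
                      for f, rx, ry in zip(row, row_x, row_y)])
    return energy, orientation, phase
```

For a grating that varies only along x, the first Riesz component acts like a 1-D Hilbert transform and the second vanishes, so the energy is constant across the image. This matches the abstract's description of local energy as a scalar quantity.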

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: The purpose of this study was to determine the extent to which mobility indices (such as walking speed and postural sway), motor initiation, and cognitive function, specifically executive functions, including spatial planning, visual attention, and within-participant variability, differentially predicted collisions in the near and far sides of the road with increasing age. Methods: Adults aged over 45 years participated in cognitive tests measuring executive function and visual attention (using Useful Field of View; UFoV®), mobility assessments (walking speed, sit-to-stand, self-reported mobility, and postural sway assessed using motion capture cameras), and gave road crossing choices in a two-way filmed real traffic pedestrian simulation. Results: A stepwise regression model (walking speed, start-up delay variability, and processing speed) explained 49.4% of the variance in near-side crossing errors. Walking speed, start-up delay measures (average & variability), and spatial planning explained 54.8% of the variance in far-side unsafe crossing errors. Start-up delay was predicted by walking speed only (explained 30.5%). Conclusion: Walking speed and start-up delay measures were consistent predictors of unsafe crossing behaviours. Cognitive measures, however, differentially predicted near-side errors (processing speed), and far-side errors (spatial planning). These findings offer potential contributions for identifying and rehabilitating at-risk older pedestrians.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Modern technology has moved on and completely changed the way that people can use the telephone or mobile to dialogue with information held on computers. Well-developed “written speech analysis” does not work with “verbal speech”. The main purpose of our article is, firstly, to highlight the problems and, secondly, to show the possible ways to solve these problems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In an overcapacity world, where customers can choose from many similar products to satisfy their needs, enterprises are looking for new approaches and tools that can help them not only to maintain, but also to increase their competitive edge. Innovation, flexibility, quality, and service excellence are required to, at the very least, survive the on-going transition that industry is experiencing from mass production to mass customization. In order to help these enterprises, this research develops a Supply Chain Capability Maturity Model named S(CM)2. The Supply Chain Capability Maturity Model is intended to model, analyze, and improve the supply chain management operations of an enterprise. The Supply Chain Capability Maturity Model provides a clear roadmap for enterprise improvement, covering multiple views and abstraction levels of the supply chain, and provides tools to aid the firm in making improvements. The principal research tool applied is the Delphi method, which systematically gathered the knowledge and experience of eighty-eight experts in Mexico. The model is validated using a case study and interviews with experts in supply chain management. The resulting contribution is a holistic model of the supply chain integrating multiple perspectives, and providing a systematic procedure for the improvement of a company’s supply chain operations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation focuses on two vital challenges in relation to whale acoustic signals: detection and classification.

In detection, we evaluated the influence of the uncertain ocean environment on the spectrogram-based detector, and derived the likelihood ratio of the proposed Short Time Fourier Transform detector. Experimental results showed that the proposed detector outperforms detectors based on the spectrogram. The proposed detector is more sensitive to environmental changes because it includes phase information.

In classification, our focus is on finding a robust and sparse representation of whale vocalizations. Because whale vocalizations can be modeled as polynomial phase signals, we can represent the whale calls by their polynomial phase coefficients. In this dissertation, we used the Weyl transform to capture chirp rate information, and used a two dimensional feature set to represent whale vocalizations globally. Experimental results showed that our Weyl feature set outperforms chirplet coefficients and MFCC (Mel Frequency Cepstral Coefficients) when applied to our collected data.

Since whale vocalizations can be represented by polynomial phase coefficients, it is plausible that the signals lie on a manifold parameterized by these coefficients. We also studied the intrinsic structure of high dimensional whale data by exploiting its geometry. Experimental results showed that nonlinear mappings such as Laplacian Eigenmap and ISOMAP outperform linear mappings such as PCA and MDS, suggesting that the whale acoustic data is nonlinear.

We also explored deep learning algorithms on whale acoustic data. We built each layer as convolutions with either a PCA filter bank (PCANet) or a DCT filter bank (DCTNet). With the DCT filter bank, each layer has a different time-frequency scale representation, and from this, one can extract different physical information. Experimental results showed that our PCANet and DCTNet achieve a high classification rate on the whale vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.
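
To make the idea of a layer built from convolutions with a DCT filter bank more concrete, here is a minimal one-layer sketch on a 1-D signal. It is an illustration under stated assumptions, not the dissertation's DCTNet: the filter length, the unnormalised DCT-II basis, the "valid" boundary handling, and the function names are all choices made here.

```python
import math

def dct_filter_bank(size):
    """Return `size` filters; filter k is the unnormalised DCT-II basis vector k."""
    return [[math.cos(math.pi / size * (n + 0.5) * k) for n in range(size)]
            for k in range(size)]

def convolve_valid(signal, kernel):
    """Sliding inner product of signal and kernel (correlation form, no padding)."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]

def dct_layer(signal, size):
    """Apply every filter in the bank; returns one output channel per filter."""
    return [convolve_valid(signal, filt) for filt in dct_filter_bank(size)]
```

Channel 0 responds to the local mean of the signal and higher channels respond to progressively faster oscillations within the window. A second layer with a different filter length, applied to these channels, would analyse the signal at another time-frequency scale.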

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Integrating information from multiple sources is a crucial function of the brain. Examples of such integration include multiple stimuli of different modalities, such as visual and auditory, multiple stimuli of the same modality, such as auditory and auditory, and integrating stimuli from the sensory organs (i.e. ears) with stimuli delivered from brain-machine interfaces.

The overall aim of this body of work is to empirically examine stimulus integration in these three domains to inform our broader understanding of how and when the brain combines information from multiple sources.

First, I examine visually-guided auditory learning, a problem with implications for the general problem in learning of how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback that is obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3-1.7 degrees, or 22-28% of the original 6 degree visual-auditory mismatch. In contrast, when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.

My next line of research examines how electrical stimulation of the inferior colliculus influences perception of sounds in a nonhuman primate. The central nucleus of the inferior colliculus is the major ascending relay of auditory information: almost all auditory signals pass through it before reaching the forebrain. It is therefore an ideal structure for understanding the format of the inputs into the forebrain and, by extension, the processing of auditory scenes that occurs in the brainstem, which made it an attractive target for studying stimulus integration in the ascending auditory pathway.

Moreover, understanding the relationship between the auditory selectivity of neurons and their contribution to perception is critical to the design of effective auditory brain prosthetics. These prosthetics seek to mimic natural activity patterns to achieve desired perceptual outcomes. We measured the contribution of inferior colliculus (IC) sites to perception using combined recording and electrical stimulation. Monkeys performed a frequency-based discrimination task, reporting whether a probe sound was higher or lower in frequency than a reference sound. Stimulation pulses were paired with the probe sound on 50% of trials (0.5-80 µA, 100-300 Hz, n=172 IC locations in 3 rhesus monkeys). Electrical stimulation tended to bias the animals’ judgments in a fashion that was coarsely but significantly correlated with the best frequency of the stimulation site in comparison to the reference frequency employed in the task. Although there was considerable variability in the effects of stimulation (including impairments in performance and shifts in performance away from the direction predicted based on the site’s response properties), the results indicate that stimulation of the IC can evoke percepts correlated with the frequency tuning properties of the IC. Consistent with the implications of recent human studies, the main avenue for improvement for the auditory midbrain implant suggested by our findings is to increase the number and spatial extent of electrodes, to increase the size of the region that can be electrically activated and provide a greater range of evoked percepts.

My next line of research employs a frequency-tagging approach to examine the extent to which multiple sound sources are combined (or segregated) in the nonhuman primate inferior colliculus. In the single-sound case, most inferior colliculus neurons respond and entrain to sounds in a very broad region of space, and many are entirely spatially insensitive, so it is unknown how the neurons will respond to a situation with more than one sound. I use multiple AM stimuli of different frequencies, which the inferior colliculus represents using a spike timing code. This allows me to measure spike timing in the inferior colliculus to determine which sound source is responsible for neural activity in an auditory scene containing multiple sounds. Using this approach, I find that the same neurons that are tuned to broad regions of space in the single sound condition become dramatically more selective in the dual sound condition, preferentially entraining spikes to stimuli from a smaller region of space. I will examine the possibility that there may be a conceptual linkage between this finding and the finding of receptive field shifts in the visual system.

In chapter 5, I will comment on these findings more generally, compare them to existing theoretical models, and discuss what these results tell us about processing in the central nervous system in a multi-stimulus situation. My results suggest that the brain is flexible in its processing and can adapt its integration schema to fit the available cues and the demands of the task.