904 results for Audio-Visual Automatic Speech Recognition


Relevance:

30.00% 30.00%

Publisher:

Abstract:

A new language recognition technique based on applying the philosophy of Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features is described. The new methodology allows long-span phonetic information to be incorporated at a frame-by-frame level while dealing with the temporal length of each phone unit. The proposed features are used to train an i-vector based system and are tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg in comparison with different state-of-the-art acoustic i-vector based systems. In addition, the integration of parallel phone ASR systems, each of which generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension, is presented. Finally, the paper shows how incorporating state information from the phone ASR provides additional improvements, and how fusion with the other acoustic and phonotactic systems provides an important improvement of 25.8% over the system presented during the competition.
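As a rough illustration of the stacking idea behind SDC, the sketch below applies the conventional d-P-k formulation to a sequence of per-frame feature vectors (which could be PLLR vectors). The function name, the default parameter values, and the edge-clamping rule are assumptions made for this example; the abstract does not state the configuration the authors used.

```python
from typing import List, Sequence


def shifted_delta_coefficients(
    frames: Sequence[Sequence[float]], d: int = 1, p: int = 3, k: int = 7
) -> List[List[float]]:
    """Stack k delta blocks, spaced p frames apart, onto every frame.

    Block i at time t is frames[t + i*p + d] - frames[t + i*p - d].
    Indices that fall outside the sequence are clamped to the nearest
    valid frame (an assumption; other padding rules are possible).
    """
    total = len(frames)
    if total == 0:
        return []

    def frame_at(index: int) -> Sequence[float]:
        # Clamp the index so that edge frames reuse the first/last vector.
        return frames[min(max(index, 0), total - 1)]

    stacked_features = []
    for t in range(total):
        stacked: List[float] = []
        for i in range(k):
            centre = t + i * p
            ahead = frame_at(centre + d)
            behind = frame_at(centre - d)
            # One delta block: element-wise difference across 2*d frames.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        stacked_features.append(stacked)
    return stacked_features


if __name__ == "__main__":
    # Hypothetical 2-dimensional features that grow linearly with time.
    demo = [[float(t), 2.0 * t] for t in range(20)]
    print(shifted_delta_coefficients(demo, d=1, p=3, k=2)[5])
```

Each output vector has k times the input dimension, which is how a single frame comes to carry information spanning roughly k*p frames of context.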

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This research addresses the acoustic behavior of the Jesuit churches of the city of Córdoba (Argentina) and of San Ignacio Mini, located in the town of San Ignacio, Misiones province (Argentina), built two centuries ago and declared World Heritage Sites, with the aim of evaluating the parameters that determine speech intelligibility and the suitability of each church for singing and religious music. In a first stage, the research focused on examining the interior construction characteristics of each temple and on proposing an analysis methodology to compare the results of objective in situ measurements with the subjective assessments gathered through surveys, in order to characterize each sound space acoustically. For the objective characterization of each temple, the parameters selected were those that summarize the acoustic properties related to music and speech, and those that measure the effective proportion of early reflections, regarded as subjective indices of the listener's ability to distinguish sounds. The values obtained were compared with the subjective preferences gathered in the opinion surveys. High reverberation times were found in all the churches, outside the range considered optimal for each enclosure. The quality indices were analyzed, and the influence of the different materials on the acoustic behavior of each enclosure was verified. For the subjective evaluation, a previously validated survey was used that favored an easy association between acoustic and psychoacoustic parameters; this made it possible to identify the objective parameters, simulated with an audience present, that were strongly related to subjective judgment, as well as those with lower correlation.
The search for and survey of graphic, photographic, and other historical documents made it possible to reconstruct each church for modeling and to evaluate the behavior of all the temples with the congregation present, since measurements could not be carried out under that condition. The interest in obtaining more precise acoustic data for the San Ignacio Mini church, which is now in ruins, led to the use of more powerful calculation tools, such as the image-source method "Ray Tracing Impact", through which auralization was achieved. For this, an audio file was used that represented the male voice of a priest speaking the Jesuit-Guaraní language, thereby recovering intangible cultural heritage.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In behavior reminiscent of the responsiveness of human infants to speech, young songbirds innately recognize and prefer to learn the songs of their own species. The acoustic and physiological bases for innate recognition were investigated in fledgling white-crowned sparrows lacking song experience. A behavioral test revealed that the complete conspecific song was not essential for innate recognition: songs composed of single white-crowned sparrow phrases and songs played in reverse elicited vocal responses as strongly as did normal song. In all cases, these responses surpassed those to other species’ songs. Although auditory neurons in the song nucleus HVc and the underlying neostriatum of fledglings did not prefer conspecific song over foreign song, some neurons responded strongly to particular phrase types characteristic of white-crowned sparrows and, thus, could contribute to innate song recognition.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In optimal foraging theory, search time is a key variable defining the value of a prey type. But the sensory-perceptual processes that constrain the search for food have rarely been considered. Here we evaluate the flight behavior of bumblebees (Bombus terrestris) searching for artificial flowers of various sizes and colors. When flowers were large, search times correlated well with the color contrast of the targets with their green foliage-type background, as predicted by a model of color opponent coding using inputs from the bees' UV, blue, and green receptors. Targets that made poor color contrast with their backdrop, such as white, UV-reflecting ones, or red flowers, took longest to detect, even though brightness contrast with the background was pronounced. When searching for small targets, bees changed their strategy in several ways. They flew significantly slower and closer to the ground, so increasing the minimum detectable area subtended by an object on the ground. In addition, they used a different neuronal channel for flower detection. Instead of color contrast, they used only the green receptor signal for detection. We relate these findings to temporal and spatial limitations of different neuronal channels involved in stimulus detection and recognition. Thus, foraging speed may not be limited only by factors such as prey density, flight energetics, and scramble competition. Our results show that understanding the behavioral ecology of foraging can substantially gain from knowledge about mechanisms of visual information processing.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Visual habit formation in monkeys, assessed by concurrent visual discrimination learning with 24-h intertrial intervals (ITI), was found earlier to be impaired by removal of the inferior temporal visual area (TE) but not by removal of either the medial temporal lobe or inferior prefrontal convexity, two of TE's major projection targets. To assess the role in this form of learning of another pair of structures to which TE projects, namely the rostral portion of the tail of the caudate nucleus and the overlying ventrocaudal putamen, we injected a neurotoxin into this neostriatal region of several monkeys and tested them on the 24-h ITI task as well as on a test of visual recognition memory. Compared with unoperated monkeys, the experimental animals were unaffected on the recognition test but showed an impairment on the 24-h ITI task that was highly correlated with the extent of their neostriatal damage. The findings suggest that TE and its projection areas in the ventrocaudal neostriatum form part of a circuit that selectively mediates visual habit formation.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Although much of the brain’s functional organization is genetically predetermined, it appears that some noninnate functions can come to depend on dedicated and segregated neural tissue. In this paper, we describe a series of experiments that have investigated the neural development and organization of one such noninnate function: letter recognition. Functional neuroimaging demonstrates that letter and digit recognition depend on different neural substrates in some literate adults. How could the processing of two stimulus categories that are distinguished solely by cultural conventions become segregated in the brain? One possibility is that correlation-based learning in the brain leads to a spatial organization in cortex that reflects the temporal and spatial clustering of letters with letters in the environment. Simulations confirm that environmental co-occurrence does indeed lead to spatial localization in a neural network that uses correlation-based learning. Furthermore, behavioral studies confirm one critical prediction of this co-occurrence hypothesis, namely, that subjects exposed to a visual environment in which letters and digits occur together rather than separately (postal workers who process letters and digits together in Canadian postal codes) do indeed show less behavioral evidence for segregated letter and digit processing.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

When the illumination of a visual scene changes, the quantity of light reflected from objects is altered. Despite this, the perceived lightness of the objects generally remains constant. This perceptual lightness constancy is thought to be important behaviorally for object recognition. Here we show that interactions from outside the classical receptive fields of neurons in primary visual cortex modulate neural responses in a way that makes them immune to changes in illumination, as is perception. This finding is consistent with the hypothesis that the responses of neurons in primary visual cortex carry information about surface lightness in addition to information about form. It also suggests that lightness constancy, which is sometimes thought to involve “higher-level” processes, is manifest at the first stage of visual cortical processing.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Mental imagery and visual memory have been regarded as distinct components in the encoding of information, associated with different working-memory processes. Experimental evidence shows, for example, that performance on memory tasks based on the generation of mental images (visual imagery) is disrupted by dynamic visual noise (DVN), whereas no such effect is observed in visual memory tasks based on visual perception (visual memory). Although several lines of evidence indicate that imagery and visual memory tasks rely on different cognitive processes, this does not rule out the possibility that they also share common processes, or that some experimental results pointing to differences between the two tasks stem from methodological differences between the paradigms used to study them. Our aim was to equate visual mental imagery and visual memory tasks by means of recognition tasks using the spatial retro-cue paradigm. Sequences of Roman letters in visual form (visual memory task) and acoustic form (visual mental imagery task) were presented at four different spatial locations. In the first and second experiments, the time course of retrieval was analyzed for both the imagery process and the memory process. In the third experiment, the structure of the representations of the two components was compared by presenting DVN during the generation and retrieval stages. Our results show no differences in the storage of visual information over the period studied; however, DVN affects the efficiency of the retrieval process, that is, response time, with the visual mental image representation being more susceptible to the noise. Nevertheless, the time course of retrieval differs between the two components, particularly for imagery, which requires more time to retrieve information than memory does.
The data support the relevance of the retro-cue paradigm, which indicates that spatial attention is recruited for spatially organized representations regardless of whether they are viewed or imagined.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

INTRODUCTION: Clinical markers of development allow professionals to become familiar with the sequence in which auditory and language skills develop and to alert the family when a pattern deviates from what is expected for the child's development. The aim of this study was to determine the clinical markers of development of auditory and spoken-language skills, based on an analysis of the first five years of cochlear implant (CI) use in children implanted before 36 months of age, and to investigate the influence of age at implantation on the development of those skills. METHODS: A retrospective longitudinal study carried out at the Cochlear Implant Section of the Centro de Pesquisas Audiológicas (CPA-HRAC/USP). The sample comprised 230 children who, for comparative analysis, were divided into three groups: operated on and activated before 18 months, between 19 and 24 months, and between 25 and 36 months of age. The instruments analyzed were the Infant-Toddler: Meaningful Auditory Integration Scale (IT-MAIS), the Meaningful Use of Speech Scale (MUSS), and the Hearing and Language Categories. The data collected were analyzed using descriptive and inferential statistics. RESULTS: Nine follow-up visits by the children to the Center were analyzed over the first five years of CI use. Based on the median, by 30 ± 3 months of device use a large part of the sample had reached 100% on the IT-MAIS, by which point the skills of attending to sounds and attributing meaning to them had already been mastered. By 68 ± 6 months most children had reached the maximum percentage on the MUSS and the maximum score in the Hearing and Language Categories; that is, the children were already using spontaneous speech and communication strategies in their routine, and showed open-set auditory recognition skills and oral-language fluency, respectively.
When the groups' performances were compared, there was no pattern of statistical significance in the auditory assessments, and in the language assessments the results were significantly better for the children implanted after 18 months at the first follow-up visits. There were strong correlations between the results of the Scales and the Categories. CONCLUSIONS: The children in the sample progressively developed auditory and spoken-language skills over the first five years of CI use. It was possible to determine the clinical markers of development for the Scales and Categories studied. Based on these markers, the professionals who follow the child through the auditory habilitation process can guide the family, as well as the other professionals working with the child, regarding the results expected on the IT-MAIS, the MUSS, and the Hearing and Language Categories. It was also possible to identify that, even with restrictions placed on the variables that might interfere with determining the clinical markers, some patients had deviant results, which suggests the importance of defining the markers so that the professional, together with the family, can discuss and identify other variables that may contribute to the child's poor performance. Implantation within the sensitive period of development may explain the auditory behavior of the groups when compared. For spoken language, however, it is believed that other variables in the auditory habilitation process had an influence, and not only implantation during the critical period.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The contribution of music to the human sciences has been increasingly valued by the health sciences in recent decades, fostering links between Speech-Language Pathology and Audiology and Music Therapy. The assessment of music perception seeks to understand basic principles such as the discrimination of timbres, melodies, rhythms, intensity, pitch, note duration, and density, among others, as well as inherent knowledge related to hearing and musical experiences over the course of life. The aim of this study was to develop a computerized test for assessing the recognition of traditional Brazilian melodies and to examine the performance of children with normal hearing on this instrument. A test named Avaliação do Reconhecimento de Melodias Tradicionais em Crianças normo-ouvintes (ARMTC; Assessment of Traditional Melody Recognition in Normal-Hearing Children) was developed in website format. It comprises 15 traditional melodies from Brazilian culture, recorded with a synthesized piano timbre and standardized with variable tempos, similar intensities, keys matching the sheet music used, a playback time of 12 seconds per melody, and four-second pauses between melodies. The sample comprised 155 children aged eight to 11 years, of both sexes, with hearing thresholds within normal limits at frequencies from 500 Hz to 4000 Hz and a type A tympanometric curve. All children underwent audiological screening (at 500 Hz, 1 kHz, 2 kHz, and 4 kHz), tympanometry, and the ARMTC. The ARMTC was administered in free field at 65 dB HL, with the loudspeaker positioned at 0° azimuth, one meter from the participant, who remained seated. The children were instructed to click, on the notebook screen, the icon corresponding to the name and illustration of the melody they had heard, and to continue in this way until all 15 melodies had been presented.
For most of the selected melodies there was no significant difference in the number of errors/correct answers or in reaction time when these variables were correlated with sex, age, and the place where the test was administered. The most recognized melodies were Cai, cai balão; Boi da cara preta, which had the same score as Caranguejo; Escravos de Jó; O cravo; Parabéns a você; and Marcha soldado, each recognized with more than 70% correct answers. The least recognized melody was Capelinha de melão.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Deep brain stimulation (DBS) provides significant therapeutic benefit for movement disorders such as Parkinson's disease (PD). Current DBS devices lack real-time feedback (and are thus open loop), and stimulation parameters are adjusted during scheduled visits with a clinician. A closed-loop DBS system may reduce power consumption and side effects by adjusting stimulation parameters based on the patient's behavior. Behavior detection is therefore a major step in designing such systems. Various physiological signals can be used to recognize behaviors. The subthalamic nucleus (STN) local field potential (LFP) is a strong candidate signal for neural feedback because it can be recorded from the stimulation lead and does not require additional sensors. This thesis proposes novel detection and classification techniques for behavior recognition based on deep brain LFP. Behavior detection from such signals is a vital step in developing the next generation of closed-loop DBS devices. LFP recordings from 13 subjects are used in this study to design and evaluate our method. Recordings were performed during surgery, and the subjects were asked to perform various behavioral tasks. Various techniques are used to understand how the behaviors modulate the STN. One method studies the time-frequency patterns in the STN LFP during the tasks. Another method measures the temporal inter-hemispheric connectivity of the STN as well as the connectivity between the STN and the prefrontal cortex (PFC). Experimental results demonstrate that different behaviors create different modulation patterns in the STN and its connectivity. We use these patterns as features to classify behaviors. A method for single-trial recognition of the patient's current task is proposed. This method uses wavelet coefficients as features and a support vector machine (SVM) as the classifier to recognize a selection of behaviors: speech, motor, and random.
The proposed method is 82.4% accurate for binary classification and 73.2% accurate for classifying three tasks. As the next step, a practical method that detects behaviors asynchronously is proposed. This method does not use any a priori knowledge of behavior onsets and is capable of asynchronously detecting the finger movements of PD patients. Our study indicates that there is a motor-modulated inter-hemispheric connectivity between LFP signals recorded bilaterally from the STN. We use a non-linear regression method to measure this inter-hemispheric connectivity and to detect the finger movements. Our experimental results, using STN LFP recorded from eight patients with PD, demonstrate that this is a promising approach for behavior detection and for developing novel closed-loop DBS systems.
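To make the "wavelet coefficients as features" step more concrete, here is a minimal sketch that decomposes a single-trial signal with an orthonormal Haar wavelet and summarizes each sub-band by its energy. The choice of the Haar wavelet, the use of sub-band energies, the number of levels, and the function names are all assumptions made for illustration; the abstract does not specify the wavelet family or feature layout, and the SVM stage is left out.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def wavelet_energy_features(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Return [E(detail 1), ..., E(detail L), E(approximation L)].

    The signal length must be divisible by 2**levels so that every
    level splits cleanly into pairs.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    features: List[float] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        # Energy of the detail band captured at this scale.
        features.append(sum(c * c for c in detail))
    # Energy remaining in the coarsest approximation band.
    features.append(sum(c * c for c in approx))
    return features


if __name__ == "__main__":
    # Hypothetical 8-sample trial, not real LFP data.
    trial = [3.0, 1.0, -2.0, 4.0, 0.5, 0.5, 7.0, -1.0]
    print(wavelet_energy_features(trial, levels=3))
```

Because the transform is orthonormal, the feature values sum to the energy of the input, so each entry can be read as the share of signal energy at one time scale. A feature vector of this kind could then be passed to any standard classifier.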

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper, a multimodal and interactive prototype for music genre classification is presented. The system is oriented to multi-part files in symbolic format, but it can be adapted by using a transcription system to transform audio content into music scores. This prototype uses different sources of information to give a possible answer to the user. It has been developed to allow a human expert to interact with the system to improve its results. In its current implementation, it offers a limited range of interaction and multimodality. Further development aimed at full interactivity and multimodal interactions is discussed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

New low-cost sensors and free, open libraries for 3D image processing are enabling important advances in robot vision applications, such as three-dimensional object recognition, semantic mapping, navigation and localization of robots, human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. This method is based on point clouds from range images captured by an RGBD sensor. It works in real time and does not require visual marks, camera calibration or previous knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or when the ambient light changes. Furthermore, this method was designed to develop a human interface to control domestic or industrial devices remotely. In this paper, the method was tested by operating a robotic hand. Firstly, the human hand was recognized and the fingers were detected. Secondly, the movement of the fingers was analysed and mapped to be imitated by a robotic hand.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21%, respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.