961 resultados para Automatic Speech Recognition


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This idea was explored using a method that ensures interference cannot occur through energetic masking. Three-formant (F1+F2+F3) analogues of natural sentences were synthesized using a monotonous periodic source. Target formants were presented monaurally, with the target ear assigned randomly on each trial. A competitor for F2 (F2C) was presented contralaterally; listeners must reject F2C to optimize recognition. In experiment 1, F2Cs with various frequency and amplitude contours were used. F2Cs with time-varying frequency contours were effective competitors; constant-frequency F2Cs had far less impact. To a lesser extent, amplitude contour also influenced competitor impact; this effect was additive. In experiment 2, F2Cs were created by inverting the F2 frequency contour about its geometric mean and varying its depth of variation over a range from constant to twice the original (0%-200%). The impact on intelligibility was least for constant F2Cs and increased up to ∼100% depth, but little thereafter. The effect of an extraneous formant depends primarily on its frequency contour; interference increases as the depth of variation is increased until the range exceeds that typical for F2 in natural speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A novel approach of automatic ECG analysis based on scale-scale signal representation is proposed. The approach uses curvature scale-space representation to locate main ECG waveform limits and peaks and may be used to correct results of other ECG analysis techniques or independently. Moreover dynamic matching of ECG CSS representations provides robust preliminary recognition of ECG abnormalities which has been proven by experimental results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics-for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This idea was explored using a method that ensures interference occurs only through informational masking. Three-formant analogues of sentences were synthesized using a monotonous periodic source (F0 = 140 Hz). Target formants were presented monaurally; the target ear was assigned randomly on each trial. A competitor for F2 (F2C) was presented contralaterally; listeners must reject F2C to optimize recognition. In experiment 1, F2Cs with various frequency and amplitude contours were used. F2Cs with time-varying frequency contours were effective competitors; constant-frequency F2Cs had far less impact. Amplitude contour also influenced competitor impact; this effect was additive. In experiment 2, F2Cs were created by inverting the F2 frequency contour about its geometric mean and varying its depth of variation over a range from constant to twice the original (0–200%). The impact on intelligibility was least for constant F2Cs and increased up to ~100% depth, but little thereafter. The effect of an extraneous formant depends primarily on its frequency contour; interference increases as the depth of variation is increased until the range exceeds that typical for F2 in natural speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Perception and recognition of faces are fundamental cognitive abilities that form a basis for our social interactions. Research has investigated face perception using a variety of methodologies across the lifespan. Habituation, novelty preference, and visual paired comparison paradigms are typically used to investigate face perception in young infants. Storybook recognition tasks and eyewitness lineup paradigms are generally used to investigate face perception in young children. These methodologies have introduced systematic differences including the use of linguistic information for children but not infants, greater memory load for children than infants, and longer exposure times to faces for infants than for older children, making comparisons across age difficult. Thus, research investigating infant and child perception of faces using common methods, measures, and stimuli is needed to better understand how face perception develops. According to predictions of the Intersensory Redundancy Hypothesis (IRH; Bahrick & Lickliter, 2000, 2002), in early development, perception of faces is enhanced in unimodal visual (i.e., silent dynamic face) rather than bimodal audiovisual (i.e., dynamic face with synchronous speech) stimulation. The current study investigated the development of face recognition across children of three ages: 5 – 6 months, 18 – 24 months, and 3.5 – 4 years, using the novelty preference paradigm and the same stimuli for all age groups. It also assessed the role of modality (unimodal visual versus bimodal audiovisual) and memory load (low versus high) on face recognition. It was hypothesized that face recognition would improve across age and would be enhanced in unimodal visual stimulation with a low memory load. Results demonstrated a developmental trend (F(2, 90) = 5.00, p = 0.009) with older children showing significantly better recognition of faces than younger children. In contrast to predictions, no differences were found as a function of modality of presentation (bimodal audiovisual versus unimodal visual) or memory load (low versus high). This study was the first to demonstrate a developmental improvement in face recognition from infancy through childhood using common methods, measures and stimuli consistent across age.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The role of source properties in across-formant integration was explored using three-formant (F1+F2+F3) analogues of natural sentences (targets). In experiment 1, F1+F3 were harmonic analogues (H1+H3) generated using a monotonous buzz source and second-order resonators; in experiment 2, F1+F3 were tonal analogues (T1+T3). F2 could take either form (H2 or T2). Target formants were always presented monaurally; the receiving ear was assigned randomly on each trial. In some conditions, only the target was present; in others, a competitor for F2 (F2C) was presented contralaterally. Buzz-excited or tonal competitors were created using the time-reversed frequency and amplitude contours of F2. Listeners must reject F2C to optimize keyword recognition. Whether or not a competitor was present, there was no effect of source mismatch between F1+F3 and F2. The impact of adding F2C was modest when it was tonal but large when it was harmonic, irrespective of whether F2C matched F1+F3. This pattern was maintained when harmonic and tonal counterparts were loudness-matched (experiment 3). Source type and competition, rather than acoustic similarity, governed the phonetic contribution of a formant. Contrary to earlier research using dichotic targets, requiring across-ear integration to optimize intelligibility, H2C was an equally effective informational masker for H2 as for T2.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22 – 40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstractions. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than the comparable reported work, which is usually limited to round table meetings. The system is relatively simple: consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a novel analysis alternative, based on two Fourier Transforms for emotion recognition from speech -- Fourier analysis allows for display and synthesizes different signals, in terms of power spectral density distributions -- A spectrogram of the voice signal is obtained performing a short time Fourier Transform with Gaussian windows, this spectrogram portraits frequency related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds -- Emotions induce such characteristics in speech, which become apparent in spectrogram time-frequency distributions -- Later, the signal time-frequency representation from spectrogram is considered an image, and processed through a 2-dimensional Fourier Transform in order to perform the spatial Fourier analysis from it -- Finally features related with emotions in voiced speech are extracted and presented

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a semi-parametric Algorithm for parsing football video structures. The approach works on a two interleaved based process that closely collaborate towards a common goal. The core part of the proposed method focus perform a fast automatic football video annotation by looking at the enhance entropy variance within a series of shot frames. The entropy is extracted on the Hue parameter from the HSV color system, not as a global feature but in spatial domain to identify regions within a shot that will characterize a certain activity within the shot period. The second part of the algorithm works towards the identification of dominant color regions that could represent players and playfield for further activity recognition. Experimental Results shows that the proposed football video segmentation algorithm performs with high accuracy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work we focus on pattern recognition methods related to EMG upper-limb prosthetic control. After giving a detailed review of the most widely used classification methods, we propose a new classification approach. It comes as a result of comparison in the Fourier analysis between able-bodied and trans-radial amputee subjects. We thus suggest a different classification method which considers each surface electrodes contribute separately, together with five time domain features, obtaining an average classification accuracy equals to 75% on a sample of trans-radial amputees. We propose an automatic feature selection procedure as a minimization problem in order to improve the method and its robustness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the Amazon Region, there is a virtual absence of severe malaria and few fatal cases of naturally occurring Plasmodium falciparum infections; this presents an intriguing and underexplored area of research. In addition to the rapid access of infected persons to effective treatment, one cause of this phenomenon might be the recognition of cytoadherent variant proteins on the infected red blood cell (IRBC) surface, including the var gene encoded P. falciparum erythrocyte membrane protein 1. In order to establish a link between cytoadherence, IRBC surface antibody recognition and the presence or absence of malaria symptoms, we phenotype-selected four Amazonian P. falciparum isolates and the laboratory strain 3D7 for their cytoadherence to CD36 and ICAM1 expressed on CHO cells. We then mapped the dominantly expressed var transcripts and tested whether antibodies from symptomatic or asymptomatic infections showed a differential recognition of the IRBC surface. As controls, the 3D7 lineages expressing severe disease-associated phenotypes were used. We showed that there was no profound difference between the frequency and intensity of antibody recognition of the IRBC-exposed P. falciparum proteins in symptomatic vs. asymptomatic infections. The 3D7 lineages, which expressed severe malaria-associated phenotypes, were strongly recognised by most, but not all plasmas, meaning that the recognition of these phenotypes is frequent in asymptomatic carriers, but is not necessarily a prerequisite to staying free of symptoms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Universidade Estadual de Campinas. Faculdade de Educação Física

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A síndrome do X Frágil é a causa mais frequente de deficiência intelectual hereditária. A variante de Dandy-Walker trata-se de uma constelação específica de achados neurorradiológicos. Este estudo relata achados da comunicação oral e escrita de um menino de 15 anos com diagnóstico clínico e molecular da síndrome do X-Frágil e achados de neuroimagem do encéfalo compatíveis com variante de Dandy-Walker. A avaliação fonoaudiológica foi realizada por meio da Observação do Comportamento Comunicativo, aplicação do ABFW - Teste de Linguagem Infantil - Fonologia, Perfil de Habilidades Fonológicas, Teste de Desempenho Escolar, Teste Illinois de Habilidades Psicolinguísticas, avaliação do sistema estomatognático e avaliação audiológica. Observou-se: alteração de linguagem oral quanto às habilidades fonológicas, semânticas, pragmáticas e morfossintáticas; déficits nas habilidades psicolinguísticas (recepção auditiva, expressão verbal, combinação de sons, memória sequencial auditiva e visual, closura auditiva, associação auditiva e visual); e alterações morfológicas e funcionais do sistema estomatognático. Na leitura verificou-se dificuldades na decodificação dos símbolos gráficos e na escrita havia omissões, aglutinações e representações múltiplas com o uso predominante de vogais e dificuldades na organização viso-espacial. Em matemática, apesar do reconhecimento numérico, não realizou operações aritméticas. Não foram observadas alterações na avaliação audiológica periférica. A constelação de sintomas comportamentais, cognitivos, linguísticos e perceptivos, previstos na síndrome do X-Frágil, somada às alterações estruturais do sistema nervoso central, pertencentes à variante de Dandy-Walker, trouxeram interferências marcantes no desenvolvimento das habilidades comunicativas, no aprendizado da leitura e escrita e na integração social do indivíduo.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A modified version of the intruder-resident paradigm was used to investigate if social recognition memory lasts at least 24 h. One hundred and forty-six adult male Wistar rats were used. Independent groups of rats were exposed to an intruder for 0.083, 0.5, 2, 24, or 168 h and tested 24 h after the first encounter with the familiar or a different conspecific. Factor analysis was employed to identify associations between behaviors and treatments. Resident rats exhibited a 24-h social recognition memory, as indicated by a 3- to 5-fold decrease in social behaviors in the second encounter with the same conspecific compared to those observed for a different conspecific, when the duration of the first encounter was 2 h or longer. It was possible to distinguish between two different categories of social behaviors and their expression depended on the duration of the first encounter. Sniffing the anogenital area (49.9% of the social behaviors), sniffing the body (17.9%), sniffing the head (3%), and following the conspecific (3.1%), exhibited mostly by resident rats, characterized social investigation and revealed long-term social recognition memory. However, dominance (23.8%) and mild aggression (2.3%), exhibited by both resident and intruders, characterized social agonistic behaviors and were not affected by memory. Differently, sniffing the environment (76.8% of the non-social behaviors) and rearing (14.3%), both exhibited mostly by adult intruder rats, characterized non-social behaviors. Together, these results show that social recognition memory in rats may last at least 24 h after a 2-h or longer exposure to the conspecific.