901 resultados para temporal speech information


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este trabalho apresenta um sistema neural modular, que processa separadamente informações de contexto espacial e temporal, para a tarefa de reprodução de sequências temporais. Para o desenvolvimento do sistema neural foram considerados redes neurais recorrentes, modelos estocásticos, sistemas neurais modulares e processamento de informações de contexto. Em seguida, foram estudados três modelos com abordagens distintas para aprendizagem de seqüências temporais: uma rede neural parcialmente recorrente, um exemplo de sistema neural modular e um modelo estocástico utilizando a teoria de modelos markovianos escondidos. Com base nos estudos e modelos apresentados, esta pesquisa propõe um sistema formado por dois módulos sucessivos distintos. Uma rede de propagação direta (módulo estimador de contexto espacial) realiza o processamento de contexto espacial identificando a seqüência a ser reproduzida e fornecendo um protótipo do contexto para o segundo módulo. Este é formado por uma rede parcialmente recorrente (módulo de reprodução de sequências temporais) para aprender as informações de contexto temporal e reproduzir em suas saídas a seqüência identificada pelo módulo anterior. Para a finalidade mencionada, este mestrado utiliza a distribuição de Gibbs na saída do módulo para contexto espacial de forma que este forneça probabilidades de contexto espacial, indicando o grau de certeza do módulo e possibilitando a utilização de procedimentos especiais para os casos de dúvida. O sistema neural foi testado em conjuntos contendo trajetórias abertas, fechadas, e com diferentes situações de ambigüidade e complexidade. Duas situações distintas foram avaliadas: (a) capacidade do sistema em reproduzir trajetórias a partir de pontos iniciais treinados; e (b) capacidade de generalização do sistema reproduzindo trajetórias considerando pontos iniciais ou finais em situações não treinadas. A situação (b) é um problema de difícil ) solução em redes neurais devido à falta de contexto temporal, essencial na reprodução de seqüências. Foram realizados experimentos comparando o desempenho do sistema modular proposto com o de uma rede parcialmente recorrente operando sozinha e um sistema modular neural (TOTEM). Os resultados sugerem que o sistema proposto apresentou uma capacidade de generalização significamente melhor, sem que houvesse uma deterioração na capacidade de reproduzir seqüências treinadas. Esses resultados foram obtidos em sistema mais simples que o TOTEM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Federal Aviation Administration, Atlantic City International Airport, N.J.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Problems in subject access to information organization systems have been under investigation for a long time. Focusing on item-level information discovery and access, researchers have identified a range of subject access problems, including quality and application of metadata, as well as the complexity of user knowledge required for successful subject exploration. While aggregations of digital collections built in the United States and abroad generate collection-level metadata of various levels of granularity and richness, no research has yet focused on the role of collection-level metadata in user interaction with these aggregations. This dissertation research sought to bridge this gap by answering the question “How does collection-level metadata mediate scholarly subject access to aggregated digital collections?” This goal was achieved using three research methods: • in-depth comparative content analysis of collection-level metadata in three large-scale aggregations of cultural heritage digital collections: Opening History, American Memory, and The European Library • transaction log analysis of user interactions, with Opening History, and • interview and observation data on academic historians interacting with two aggregations: Opening History and American Memory. It was found that subject-based resource discovery is significantly influenced by collection-level metadata richness. The richness includes such components as: 1) describing collection’s subject matter with mutually-complementary values in different metadata fields, and 2) a variety of collection properties/characteristics encoded in the free-text Description field, including types and genres of objects in a digital collection, as well as topical, geographic and temporal coverage are the most consistently represented collection characteristics in free-text Description fields. Analysis of user interactions with aggregations of digital collections yields a number of interesting findings. Item-level user interactions were found to occur more often than collection-level interactions. Collection browse is initiated more often than search, while subject browse (topical and geographic) is used most often. Majority of collection search queries fall within FRBR Group 3 categories: object, concept, and place. Significantly more object, concept, and corporate body searches and less individual person, event and class of persons searches were observed in collection searches than in item searches. While collection search is most often satisfied by Description and/or Subjects collection metadata fields, it would not retrieve a significant proportion of collection records without controlled-vocabulary subject metadata (Temporal Coverage, Geographic Coverage, Subjects, and Objects), and free-text metadata (the Description field). Observation data shows that collection metadata records in Opening History and American Memory aggregations are often viewed. Transaction log data show a high level of engagement with collection metadata records in Opening History, with the total page views for collections more than 4 times greater than item page views. Scholars observed viewing collection records valued descriptive information on provenance, collection size, types of objects, subjects, geographic coverage, and temporal coverage information. They also considered the structured display of collection metadata in Opening History more useful than the alternative approach taken by other aggregations, such as American Memory, which displays only the free-text Description field to the end-user. The results extend the understanding of the value of collection-level subject metadata, particularly free-text metadata, for the scholarly users of aggregations of digital collections. The analysis of the collection metadata created by three large-scale aggregations provides a better understanding of collection-level metadata application patterns and suggests best practices. This dissertation is also the first empirical research contribution to test the FRBR model as a conceptual and analytic framework for studying collection-level subject access.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This paper proposes an automatic acoustic-phonetic method for estimating voice-onset time of stops. This method requires neither transcription of the utterance nor training of a classifier. It makes use of the plosion index for the automatic detection of burst onsets of stops. Having detected the burst onset, the onset of the voicing following the burst is detected using the epochal information and a temporal measure named the maximum weighted inner product. For validation, several experiments are carried out on the entire TIMIT database and two of the CMU Arctic corpora. The performance of the proposed method compares well with three state-of-the-art techniques. (C) 2014 Acoustical Society of America

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output—a key source of phonetic detail—from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (=30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (˜N1 + N2), F2 (˜N3 + N4) and the higher formants (F3' ˜ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Traditional analytic models for power system fault diagnosis are usually formulated as an unconstrained 0–1 integer programming problem. The key issue of the models is to seek the fault hypothesis that minimizes the discrepancy between the actual and the expected states of the concerned protective relays and circuit breakers. The temporal information of alarm messages has not been well utilized in these methods, and as a result, the diagnosis results may be not unique and hence indefinite, especially when complicated and multiple faults occur. In order to solve this problem, this paper presents a novel analytic model employing the temporal information of alarm messages along with the concept of related path. The temporal relationship among the actions of protective relays and circuit breakers, and the different protection configurations in a modern power system can be reasonably represented by the developed model, and therefore, the diagnosed results will be more definite under different circumstances of faults. Finally, an actual power system fault was served to verify the proposed method.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Studies of semantic impairment arising from brain disease suggest that the anterior temporal lobes are critical for semantic abilities in humans; yet activation of these regions is rarely reported in functional imaging studies of healthy controls performing semantic tasks. Here, we combined neuropsychological and PET functional imaging data to show that when healthy subjects identify concepts at a specific level, the regions activated correspond to the site of maximal atrophy in patients with relatively pure semantic impairment. The stimuli were color photographs of common animals or vehicles, and the task was category verification at specific (e.g., robin), intermediate (e.g., bird), or general (e.g., animal) levels. Specific, relative to general, categorization activated the antero-lateral temporal cortices bilaterally, despite matching of these experimental conditions for difficulty. Critically, in patients with atrophy in precisely these areas, the most pronounced deficit was in the retrieval of specific semantic information.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Autism and Asperger syndrome (AS) are neurodevelopmental disorders characterised by deficient social and communication skills, as well as restricted, repetitive patterns of behaviour. The language development in individuals with autism is significantly delayed and deficient, whereas in individuals with AS, the structural aspects of language develop quite normally. Both groups, however, have semantic-pragmatic language deficits. The present thesis investigated auditory processing in individuals with autism and AS. In particular, the discrimination of and orienting to speech and non-speech sounds was studied, as well as the abstraction of invariant sound features from speech-sound input. Altogether five studies were conducted with auditory event-related brain potentials (ERP); two studies also included a behavioural sound-identification task. In three studies, the subjects were children with autism, in one study children with AS, and in one study adults with AS. In children with autism, even the early stages of sound encoding were deficient. In addition, these children had altered sound-discrimination processes characterised by enhanced spectral but deficient temporal discrimination. The enhanced pitch discrimination may partly explain the auditory hypersensitivity common in autism, and it may compromise the filtering of relevant auditory information from irrelevant information. Indeed, it was found that when sound discrimination required abstracting invariant features from varying input, children with autism maintained their superiority in pitch processing, but lost it in vowel processing. Finally, involuntary orienting to sound changes was deficient in children with autism in particular with respect to speech sounds. This finding is in agreement with previous studies on autism suggesting deficits in orienting to socially relevant stimuli. In contrast to children with autism, the early stages of sound encoding were fairly unimpaired in children with AS. However, sound discrimination and orienting were rather similarly altered in these children as in those with autism, suggesting correspondences in the auditory phenotype in these two disorders which belong to the same continuum. Unlike children with AS, adults with AS showed enhanced processing of duration changes, suggesting developmental changes in auditory processing in this disorder.