967 results for fundamental frequency
Abstract:
The Universidad Politécnica de Madrid proposed and used two new features in the Rich Transcription Evaluation 2009, and both outperform the baseline system. The first is the intensity channel contribution, a feature related to the location of the speaker. The second is the logarithm of the interpolated fundamental frequency. This is the first time that both features have been applied to the clustering stage of speaker diarization for meetings recorded with multiple distant microphones. Including both features improves the baseline results by 15.36% and 16.71% relative on the development set and the RT09 set, respectively. Considering speaker errors only, the relative improvement is 23% and 32.83% on the development set and the RT09 set, respectively.
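The second feature above, the logarithm of the interpolated fundamental frequency, can be illustrated with a minimal sketch. The abstract does not state how the interpolation is done, so linear interpolation across unvoiced frames is an assumption here, as are the function name `log_interpolated_f0` and the convention that unvoiced frames are marked with 0 or NaN.

```python
import numpy as np

def log_interpolated_f0(f0_hz):
    """Log F0 per frame, with unvoiced frames filled by linear interpolation.

    Assumption: unvoiced frames are marked as 0 or NaN in the input track.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = np.isfinite(f0) & (f0 > 0)
    if not voiced.any():
        raise ValueError("no voiced frames to interpolate from")
    frames = np.arange(f0.size)
    # np.interp holds the first/last voiced value constant beyond the edges.
    filled = np.interp(frames, frames[voiced], f0[voiced])
    return np.log(filled)

# Hypothetical six-frame F0 track in Hz (0.0 marks an unvoiced frame).
track = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
feature = log_interpolated_f0(track)
```

Interpolating before taking the logarithm gives a value for every frame, which is convenient when the feature has to be stacked with other frame-level features for clustering.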
Abstract:
The response of high-speed railway bridges at resonance, particularly under flexural vibrations, is currently a subject of research for many scientists and engineers. The topic is of great interest because such behaviour is not unlikely, given the elevated operating speeds of modern trains, which in many cases equal or even exceed 300 km/h [1,2]. The present paper addresses the evolution of the wheel-rail contact forces during resonance in simply supported bridges. Based on a dimensionless formulation of the equations of motion presented in [4], very similar to the one introduced by Klasztorny and Langer in [3], a parametric study is conducted and the contact forces in realistic situations are analysed in detail. The effects of rail and wheel irregularities are not included in the model. The bridge is idealised as an Euler-Bernoulli beam, while the train is simulated by a system of rigid bodies, springs and dampers. The situations in which a severe reduction of the contact force could take place are identified and compared with typical situations in actual bridges. To this end, the simply supported bridge is excited at resonance by a theoretical train consisting of 15 equidistant axles. The mechanical characteristics of all axles (unsprung mass, semi-sprung mass, and primary suspension system) are identical. This theoretical train permits the identification of the key parameters that influence the wheel-rail contact forces. In addition, a real case of a 17.5 m bridge traversed by the Eurostar train is analysed and checked against the theoretical results.
The influence of three fundamental parameters is investigated in great detail: a) the ratio of the fundamental frequency of the bridge and the natural frequency of the primary suspension of the vehicle; b) the ratio of the total mass of the bridge and the semi-sprung mass of the vehicle; and c) the ratio between the length of the bridge and the characteristic distance between consecutive axles. The main conclusions derived from the investigation are as follows. The wheel-rail contact forces undergo oscillations during the passage of the axles over the bridge. During resonance, these oscillations are more severe for the rear wheels than for the front ones. The lower the ratio of the span of a simply supported bridge to the characteristic distance between consecutive groups of loads, the greater the oscillations of the contact forces at resonance. Above a certain value of this ratio, no likelihood of loss of wheel-rail contact has been detected. The frequency ratio is defined as the ratio between the frequency of the primary suspension of the vehicle and the fundamental frequency of the bridge, and the mass ratio as the ratio of the semi-sprung mass of the vehicle (mass of the bogie) and the total mass of the bridge. For any given frequency ratio, the greater the mass ratio, the greater the oscillations of the contact forces at resonance. The oscillations of the contact forces at resonance, and therefore the likelihood of loss of wheel-rail contact, present a minimum for a frequency ratio approximately between 0.5 and 1. For lower or higher values of the frequency ratio the oscillations of the contact forces increase. Neglecting the possible effects of torsional vibrations, metal or composite bridges with a low linear mass have been found to be the ones where the contact forces may suffer the most severe oscillations.
If single-track, simply supported, composite or metal bridges were used in high-speed lines, and damping ratios below 1% were expected, the minimum contact forces at resonance could drop to dangerous values. Nevertheless, structures of this kind are very unusual in modern high-speed railway lines.
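As a rough illustration of the quantities discussed in this abstract, the sketch below computes the fundamental flexural frequency of a simply supported Euler-Bernoulli beam and the train speeds at which equidistant axle groups would excite it. It uses the standard textbook expressions, not the paper's dimensionless formulation. Only the 17.5 m span comes from the abstract; the bending stiffness, linear mass and characteristic distance are hypothetical values chosen for illustration.

```python
import math

def beam_fundamental_frequency(span_m, bending_stiffness_nm2, linear_mass_kg_m):
    """First flexural frequency (Hz) of a simply supported Euler-Bernoulli beam."""
    return (math.pi / (2.0 * span_m ** 2)) * math.sqrt(
        bending_stiffness_nm2 / linear_mass_kg_m
    )

def resonance_speeds(frequency_hz, characteristic_distance_m, orders=(1, 2, 3)):
    """Speeds (m/s) at which loads spaced d apart pass every 1/f, 2/f, ... seconds."""
    return [characteristic_distance_m * frequency_hz / i for i in orders]

# Span taken from the abstract; the remaining inputs are hypothetical.
SPAN_M = 17.5
EI_NM2 = 1.0e10          # assumed bending stiffness
LINEAR_MASS = 1.0e4      # assumed mass per unit length, kg/m
AXLE_DISTANCE_M = 18.7   # assumed characteristic distance between load groups

f1 = beam_fundamental_frequency(SPAN_M, EI_NM2, LINEAR_MASS)
speeds = resonance_speeds(f1, AXLE_DISTANCE_M)
for order, v in enumerate(speeds, start=1):
    print(f"resonance order {order}: {v * 3.6:.0f} km/h")
```

With these definitions the span-to-distance ratio discussed above is simply `SPAN_M / AXLE_DISTANCE_M`.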
Abstract:
To make human-machine interfaces sound more human, we must first give them expressive capabilities, in the form of emotional and stylistic features, so that they can be closely adapted to the intended task. Replicating those features requires more than replicating the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we use the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining recognition rates of 95% on styled speech and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a separation of speaking styles for Spanish based on prosodic features and check its perceptual significance.
Abstract:
In the context of the present conference paper, a culvert is defined as an opening or conduit passing through an embankment, usually for the purpose of conveying water or providing safe pedestrian and animal crossings under rail infrastructure. The clear opening of culverts may reach values of up to 12 m; however, values around 3 m are encountered much more frequently. Depending on the topography, the number of culverts is about 10 times that of bridges. In spite of this, their dynamic behavior has received far less attention than that of bridges. The fundamental frequency of culverts is considerably higher than that of bridges, even in the case of short-span bridges. As the operational speed of modern high-speed passenger rail systems rises, higher frequencies are excited and thus more energy is encountered in the frequency bands where the fundamental frequency of box culverts is located. Many research efforts have been devoted to ballast instability due to bridge resonance since it was first observed when high-speed trains were introduced on the Paris/Lyon rail line. To prevent this phenomenon, design codes establish a limit value for the vertical deck acceleration. Estimating this acceleration level requires a numerical model, and at that point things become quite complicated. Not only acceleration but also displacement values are of interest, e.g. to estimate the impact factor. According to design manuals, the structural design should consider the depth of cover, trench width and condition, bedding type, backfill material, and compaction. The same applies to the numerical model; the question, however, is what type of model is appropriate for the job. A 3D model including the embankment and a substantial part of the soil underneath the culvert is computationally very expensive and hard to justify given the associated costs.
Consequently, there is a clear need for simplified models and design rules in order to achieve reasonable costs. This paper describes the results obtained from a 2D finite element model that has been calibrated by means of a 3D model and experimental data obtained at culverts on the high-speed railway line that links the towns of Segovia and Valladolid in Spain.
Abstract:
Gender detection is an important step for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used, on a gender-balanced database of running speech from 340 speakers.
Abstract:
An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, be they amplitude, frequency or noise.
Abstract:
Speech is the main communication tool available to human beings; it not only allows them to express thoughts and feelings but also distinguishes each person as an individual. Analysis of the speech signal is essential for many applications, such as speech synthesis and recognition, coding, detection of pathologies, and speaker identification and recognition. Commercial and freely distributed tools are available for this task. The aim of this Final Degree Project (Proyecto Fin de Grado) is to gather several speech signal analysis algorithms into a single tool operated through a graphical interface. The algorithms are being used by the Grupo de investigación en Aplicaciones MultiMedia y Acústica of the Universidad Politécnica de Madrid in its research work and in training workshops offered to undergraduate students of the Escuela Técnica Superior de Ingeniería y Sistemas de Telecomunicación. At present the algorithms are somewhat difficult to apply because they were developed over several years, by different people and in different programming environments. The existing programs have been adapted to produce a single MATLAB tool that provides: voice detection; voiced/unvoiced detection; extraction and manual review of the fundamental frequency of voiced sounds; and extraction and manual review of the formants of voiced sounds. In all cases the user can adjust the analysis parameters, and the functionality of the existing algorithms has been maintained and, in some cases, extended. The analysis results can be handled directly in the application or saved to a file. Finally, a user manual has been written and a standalone application has been generated that can be installed and run even if MATLAB, or the appropriate version of it, is not available.
Abstract:
The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
Abstract:
Auditory-perceptual evaluation plays a fundamental role in the study and assessment of the voice; however, because it is subjective, it is prone to imprecision and variation. Acoustic analysis, on the other hand, yields reproducible results but needs to be improved, since it does not accurately analyse voices with more severe dysphonia or with chaotic waveforms. Developing measures that provide reliable knowledge of vocal function is therefore a long-standing need in this line of research and clinical practice. In this context, artificial intelligence, such as artificial neural networks, appears to be a promising approach. Objective: To validate an automatic system that uses artificial neural networks to evaluate rough and breathy voices. Materials and methods: 150 voices, ranging from neutral to severe roughness and/or breathiness, were selected from the database of the Clínica de Fonoaudiologia da Faculdade de Odontologia de Bauru (FOB/USP). Of these voices, 23 were excluded for not meeting the inclusion criteria, so 123 voices were used. Procedures: auditory-perceptual evaluation using the 100 mm visual analogue scale and the four-point numerical scale; extraction of voice signal features by means of the Wavelet Packet Transform and the acoustic parameters jitter, shimmer, derivative amplitude and pitch amplitude; and validation of the classifier through parameterisation, training, testing and evaluation of the artificial neural networks. Results: In the auditory-perceptual evaluation, the Intraclass Correlation Coefficient (ICC) test showed excellent inter- and intra-rater agreement, with p = 0.85 for inter-rater agreement and p ranging from 0.87 to 0.93 for intra-rater agreement.
Regarding the performance of the artificial neural network in discriminating breathiness and roughness and their respective degrees, the best performance for breathiness was found with the subset composed of jitter, pitch amplitude and fundamental frequency, which gave a hit rate of 74%, excellent agreement with the auditory-perceptual evaluation on the visual analogue scale (ICC of 0.80) and a mean error of 9 mm. For roughness, the best subset was composed of the Wavelet Packet Transform with one decomposition level, jitter, shimmer, pitch amplitude and fundamental frequency, which gave a hit rate of 73%, excellent agreement (ICC of 0.84) and a mean error of 10 mm. Conclusion: The use of artificial intelligence based on artificial neural networks to identify and grade roughness and breathiness showed excellent reliability (ICC > 0.80), with results similar to the inter-rater agreement. The artificial neural network thus proves to be a promising methodology for voice evaluation, its main advantage being objectivity of assessment.
Abstract:
Paper presented at EVACES 2011, 4th International Conference on Experimental Vibration Analysis for Civil Engineering Structures, Varenna (Lecco), Italy, October 3-5, 2011.
Abstract:
Final project for the Integrated Master's Degree in Medicine (Mestrado Integrado em Medicina), Faculdade de Medicina, Universidade de Lisboa, 2014.
Abstract:
In recent years, acoustic perturbation measurement has gained clinical and research popularity because commercial acoustic analysis software packages have become readily available. However, because the measurement itself depends critically on the accuracy of frequency tracking from the voice signal, researchers argue that perturbation measures are not suitable for analysing dysphonic voice samples, which are aperiodic in nature. This study compares the fundamental frequency, relative amplitude perturbation, shimmer percent and noise-to-harmonic ratio between a group of dysphonic and non-dysphonic subjects. One hundred and twelve dysphonic subjects (93 females and 19 males) and 41 non-dysphonic subjects (35 females and 6 males) participated in the study. All 153 voice samples were categorized into type I (periodic or nearly periodic), type II (signals with subharmonic frequencies that approach the fundamental frequency) and type III (aperiodic) signals. Only the type I voice signals were acoustically analysed for perturbation measures. Results revealed that the dysphonic female group presented significantly lower fundamental frequency and significantly higher relative amplitude perturbation and shimmer percent values than the non-dysphonic female group. However, none of these three measures was able to differentiate between male dysphonic and male non-dysphonic subjects. The noise-to-harmonic ratio failed to differentiate between the dysphonic and non-dysphonic voices for both gender groups. These results question the sensitivity of acoustic perturbation measures in detecting dysphonia and suggest that contemporary acoustic perturbation measures are not suitable for analysing dysphonic voice signals, even those that are nearly periodic. Copyright (C) 2005 S. Karger AG, Basel.
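For readers unfamiliar with the perturbation measures named in this abstract, the sketch below shows how two of them are commonly computed from cycle-by-cycle data. The abstract gives no formulas, so these definitions are assumptions: the first function implements the three-cycle smoothed period perturbation that analysis software usually abbreviates as RAP, and the second the mean absolute difference between consecutive peak amplitudes, expressed in percent. Function names and sample values are hypothetical.

```python
import numpy as np

def rap_percent(periods):
    """Three-point smoothed period perturbation, in percent of the mean period."""
    t = np.asarray(periods, dtype=float)
    if t.size < 3:
        raise ValueError("need at least three glottal cycles")
    # Average of each cycle with its two neighbours.
    smoothed = (t[:-2] + t[1:-1] + t[2:]) / 3.0
    return 100.0 * np.mean(np.abs(t[1:-1] - smoothed)) / np.mean(t)

def shimmer_percent(amplitudes):
    """Mean absolute cycle-to-cycle amplitude difference, in percent of the mean."""
    a = np.asarray(amplitudes, dtype=float)
    if a.size < 2:
        raise ValueError("need at least two glottal cycles")
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

# Hypothetical cycle lengths (ms) and peak amplitudes for a short voiced segment.
cycle_periods_ms = [10.0, 11.0, 10.0, 11.0, 10.0]
cycle_amplitudes = [1.0, 1.2, 1.0, 1.2]

rap = rap_percent(cycle_periods_ms)
shimmer = shimmer_percent(cycle_amplitudes)
```

Both measures need a reliable cycle-by-cycle segmentation, which is why the abstract restricts the analysis to type I signals.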
Abstract:
Primary objective: To examine changes in the relationship between intonation, voice range and mood following music therapy programmes in people with traumatic brain injury (TBI). Research design: Data from four case studies were pooled and effect size, ANOVA and correlation calculations were performed to evaluate the effectiveness of treatment. Methods and procedures: Subjects sang three self-selected songs for 15 sessions. Speaking fundamental frequency, fundamental frequency variability, slope, voice range and mood were analysed pre- and post-session. Results: Immediate treatment effects were not found. Long-term improvements in affective intonation were found in three subjects, especially in fundamental frequency. Voice range improved over time and was positively correlated with the three intonation components. Mood scale data showed that immediate effects were in the negative direction, whereas there were increases in positive mood state in the longer term. Conclusions: Findings suggest that, in the long term, song singing can improve vocal range and mood and enhance the affective intonation styles of people with TBI.
Abstract:
Harmonically related components are typically heard as a unified entity with a rich timbre and a pitch corresponding to the fundamental frequency. Mistuning a component generally has four consequences: (i) the global pitch of the complex shifts in the same direction as the mistuning; (ii) the component makes a reduced contribution to global pitch; (iii) the component is heard out as a separate sound with a pure timbre; (iv) its pitch differs from that of a pure tone of equal frequency in a small but systematic way. Local interactions between neighbouring components cannot explain these effects; instead they are usually explained in terms of the global operation of a single harmonic-template mechanism. However, several observations indicate that separate mechanisms govern the selection of spectral components for perceptual fusion and for the computation of global pitch. First, an increase in mistuning causes a harmonic to be heard out before it begins to be excluded from the computation of global pitch. Second, a single even harmonic added to an odd-harmonic complex is typically more salient than its odd neighbours. Third, the mistuning of a component in frequency-shifted stimuli, or stimuli with a moderate spectral stretch, results in changes in salience and component pitch like those seen for harmonic stimuli. Fourth, the global pitch of frequency-shifted stimuli is predicted well by the weighted fit of a harmonic template, but, with the exception of the lowest component, the fusion of individual partials for shifted stimuli is best predicted by the common pattern of spectral spacing. Fifth, our sensitivity to spectral pattern is surprisingly resistant to random variations in component spacing induced by applying mistunings to several harmonics at once. These findings are evaluated in the context of an autocorrelogram model of the proposed pitch/grouping dissociation. © S. Hirzel Verlag · EAA.
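The kind of stimulus this abstract refers to, a harmonic complex with one mistuned component, can be sketched as follows. This is only an illustration of the concept, not the stimuli used in the cited experiments: the function name, equal-amplitude sine-phase components, sampling rate, duration and mistuning value are all assumptions.

```python
import numpy as np

def harmonic_complex(f0_hz, n_harmonics, mistuned_harmonic=None,
                     mistuning_pct=0.0, fs=16000, duration_s=0.5):
    """Sum of equal-amplitude harmonics of f0, optionally mistuning one of them."""
    t = np.arange(int(round(fs * duration_s))) / fs
    signal = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        if k == mistuned_harmonic:
            # Shift this component by a percentage of its harmonic frequency.
            freq *= 1.0 + mistuning_pct / 100.0
        signal += np.sin(2.0 * np.pi * freq * t)
    return signal / n_harmonics

# Ten harmonics of 200 Hz with the 4th harmonic raised by 3% (800 Hz -> 824 Hz).
tone = harmonic_complex(200.0, 10, mistuned_harmonic=4, mistuning_pct=3.0)
spectrum = np.abs(np.fft.rfft(tone))
```

With a 0.5 s signal the spectrum has 2 Hz resolution, so the mistuned component appears at bin 412 rather than at bin 400, where the in-tune fourth harmonic would fall.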
Abstract:
This thesis describes a series of experiments investigating both sequential and concurrent auditory grouping in implant listeners. Some grouping cues used by normal-hearing listeners should also be available to implant listeners, while others (e.g. fundamental frequency) are unlikely to be useful. As poor spectral resolution may also limit implant listeners' performance, the spread of excitation in the cochlea was assessed using Neural Response Telemetry (NRT) and the results were related to those of the perceptual tasks. Experiment 1 evaluated sequential segregation of alternating tone sequences; no effect of rate or evidence of perceptual ambiguity was found, suggesting that automatic stream segregation had not occurred. Experiment 2 was an electrode pitch-ranking task; some relationship was found between pitch-ranking judgements (especially confidence scores) and reported segregation. Experiment 3 used a temporal discrimination task; this also failed to provide evidence of automatic stream segregation, because no interaction was found between the effects of sequence length and electrode separation. Experiment 4 explored schema-based grouping using interleaved melody discrimination; listeners were not able to segregate targets and distractors based on pitch differences unless these were accompanied by substantial level differences. Experiment 5 evaluated concurrent segregation in a task requiring the detection of level changes in individual components of a complex tone. Generally, large changes were needed, and abrupt changes were no easier to detect than gradual ones. In Experiment 6, NRT testing confirmed substantially overlapping stimulation by intracochlear electrodes. Overall, little or no evidence of auditory grouping by implant listeners was found.