988 results for Speech Rate
Abstract:
The aims of the present study were to compare the perceptual assessments of deviant speech signs (dysarthria) exhibited by Australian and Swedish speakers with multiple sclerosis (MS) and to explore whether judgements of dysarthria differed depending on whether the speakers and the judges spoke the same or different languages. Ten Australian and 10 Swedish individuals with MS (matched as closely as possible for age, gender, progression type and severity of dysarthria) were assessed by 2 Australian and 2 Swedish clinically experienced judges using a protocol including 33 speech parameters. Results show that the following perceptual dimensions were identified by both pairs of judges in both groups of speakers to a just noticeable or moderate degree: imprecise consonants, inappropriate pitch level, reduced general rate, and glottal fry. The reliability (Spearman rank-order correlation) of the consensus ratings from the Australian and the Swedish judges was high, with a mean rho of 85.7 for the Australian speakers and mean rho of 84.3 for the Swedish speakers. The most difficult perceptual parameters to assess (i.e. to agree on) included harshness, level of pitch and loudness, precision of consonants and general stress pattern. The study indicated that perceptual assessments of speech characteristics in individuals with MS are informative and can be achieved with high inter-judge reliability irrespective of the judge's knowledge of the speaker's language. Copyright (C) 2003 S. Karger AG, Basel.
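As an illustration of the reliability measure named above, the following is a minimal sketch of a Spearman rank-order correlation between two sets of ratings. The rating values and the function names (`average_ranks`, `spearman_rho`) are hypothetical and chosen only for this example; the abstract does not give the raw ratings, the rating scale, or the software used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one rating list is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical consensus ratings for six speech parameters (illustrative only).
judge_a = [0, 1, 1, 2, 3, 4]
judge_b = [0, 1, 2, 2, 3, 4]
rho = spearman_rho(judge_a, judge_b)
```

Computed this way, rho always lies between -1 and 1, so the means of 85.7 and 84.3 reported above are presumably expressed on a percentage scale.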
Abstract:
Binocular rivalry occurs when different images are presented simultaneously to corresponding points within the left and right eyes. Under these conditions, the observer's perception will alternate between the two perceptual alternatives. Motivated by the reported link between the rate of perceptual alternations and symptoms of psychosis, and by an incidental observation that the rhythmicity of perceptual alternations during binocular rivalry was greatly increased 10 h after the consumption of LSD, this study aimed to investigate the pharmacology underlying binocular rivalry and to explore the connection between the timing of perceptual switching and psychosis. Psilocybin (4-phosphoryloxy-N,N-dimethyltryptamine, PY) was chosen for the study because, like LSD, it is known to act as an agonist at serotonin 5-HT1A and 5-HT2A receptors and to produce an altered state sometimes marked by psychosis-like symptoms. A total of 12 healthy human volunteers were tested under placebo, low-dose (115 µg/kg) and high-dose (250 µg/kg) PY conditions. In line with predictions, under both low- and high-dose conditions, the results show that at 90 min postadministration (the peak of drug action), rate and rhythmicity of perceptual alternations were significantly reduced from placebo levels. Following the 90 min testing period, the perceptual switch rate successively increased, with some individuals showing increases well beyond pretest levels at the final testing, 360 min postadministration. However, as some subjects had still not returned to pretest levels by this time, the mean phase duration at 360 min was not found to differ significantly from placebo. Reflecting the drug-induced changes in rivalry phase durations, subjects showed clear changes in psychological state as indexed by the 5D-ASC (altered states of consciousness) rating scales.
This study suggests the involvement of serotonergic pathways in binocular rivalry and supports the previously proposed role of a brainstem oscillator in perceptual rivalry alternations and symptoms of psychosis.
Abstract:
The present study examined 24 individuals with either complete or incomplete injuries to the cervical spinal cord through the use of standardized assessments of dysarthria and a perceptual rating scale. Perceptual assessment revealed predominantly prosodic and phonatory disturbances, while physical impairments were common in the respiratory and laryngeal subsystems of speech production. A reduction in intelligibility and speaking rate resulted in a diminished communicative effectiveness ratio for most participants. Individuals showed a high degree of variation, with no clear relationship between lesion type and impairments present. Further investigation is required to verify the physiological nature of the respiratory and laryngeal impairments found in the present investigation and to determine the relative contributions of these to the overall presentation of speech and voice post cervical spinal cord injury (CSI).
Abstract:
At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. Time Frequency Warping modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation used in conjunction with a listening effort test would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter to be used for assessing synthetic speech and assesses the results of several assessment tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW modulated and natural speech.
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. © 2014 The Author(s).
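The two contour manipulations described above (scaling depth and inverting about the geometric mean) can be sketched as follows. This is only an illustration under stated assumptions: it treats both operations as acting on log frequency, which the abstract does not specify, and the function names and the example F2 values are hypothetical.

```python
import math


def geometric_mean(freqs):
    """Geometric mean of a frequency track in Hz."""
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))


def rescale_depth(freqs, depth):
    """Scale excursions about the geometric mean on a log-frequency axis.

    depth = 1.0 leaves the contour unchanged, 0.5 halves its depth,
    0.0 gives a constant track, and -1.0 mirrors it (inversion).
    ASSUMPTION: depth is defined on log frequency; the abstract does not say.
    """
    gm = geometric_mean(freqs)
    return [gm * (f / gm) ** depth for f in freqs]


def invert_contour(freqs):
    """Mirror the contour about its geometric mean: f -> gm**2 / f."""
    return rescale_depth(freqs, -1.0)


# Hypothetical F2 track in Hz (illustrative values, not from the study).
f2_track = [1200.0, 1500.0, 1800.0, 1400.0, 1100.0]
f2c_inverted = invert_contour(f2_track)                 # 100%-depth competitor
f2c_half_depth = rescale_depth(f2c_inverted, 0.5)       # 50%-depth competitor
f2c_constant = rescale_depth(f2c_inverted, 0.0)         # 0%-depth competitor
```

Under this definition every rescaled track keeps the same geometric mean as the original, so only the depth and direction of variation change.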
Abstract:
Research on aphasia has struggled to identify apraxia of speech (AoS) as an independent deficit affecting a processing level separate from phonological assembly and motor implementation. This is because AoS is characterized by both phonological and phonetic errors and, therefore, can be interpreted as a combination of deficits at the phonological and the motoric level rather than as an independent impairment. We apply novel psycholinguistic analyses to the perceptually phonological errors made by 24 Italian aphasic patients. We show that only patients with a relatively high rate (>10%) of phonetic errors make sound errors that simplify the phonology of the target. Moreover, simplifications are strongly associated with other variables indicative of articulatory difficulties, such as a predominance of errors on consonants rather than vowels, but not with other measures, such as the rate of words reproduced correctly or rates of lexical errors. These results indicate that sound errors cannot arise at a single phonological level because they are different in different patients. Instead, the different patterns: (1) provide evidence for separate impairments and the existence of a level of articulatory planning/programming intermediate between phonological selection and motor implementation; (2) validate AoS as an independent impairment at this level, characterized by phonetic errors and phonological simplifications; (3) support the claim that linguistic principles of complexity have an articulatory basis since they only apply in patients with associated articulatory difficulties.
Abstract:
We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), allowing the decomposition of the initial audio signal into sets of coefficients, from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are sufficient to analyze and extract emotional content in audio signals, yielding a high accuracy rate in the classification of emotional states without the need for other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic human emotions (boredom, disgust, happiness, anxiety, anger and sadness), together with neutrality, for a total of seven states to identify.
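To make the decomposition step concrete, here is a minimal sketch of a multilevel discrete wavelet transform using the Haar (Db1) wavelet, followed by simple per-band statistics. It covers only Db1, not Db6, Db8 or Db10, and the choice of features (energy, mean, standard deviation) is an assumption for illustration; the abstract does not list the features the authors extracted. All names and the sample signal are hypothetical.

```python
import math
from statistics import mean, pstdev


def haar_step(signal):
    """One Haar analysis level: orthonormal pairwise sums and differences.

    A trailing unpaired sample (odd-length input) is dropped.
    """
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [detail_1, ..., detail_n, approx_n], finest detail first."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def band_features(band):
    """Illustrative statistics for one coefficient set (assumed feature set)."""
    return {
        "energy": sum(c * c for c in band),
        "mean": mean(band),
        "std": pstdev(band),
    }


# Hypothetical 8-sample frame (a real audio frame would be far longer).
frame = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5, 2.0, -1.0]
bands = haar_decompose(frame, levels=3)
features = [band_features(b) for b in bands]
```

A feature vector of this kind, one set of statistics per band, is what a classifier such as the ANN mentioned above would take as input.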
Abstract:
The study of acoustic communication in animals often requires not only the recognition of species-specific acoustic signals but also the identification of individual subjects, all in a complex acoustic background. Moreover, when very long recordings are to be analyzed, automatic recognition and identification processes are invaluable tools to extract the relevant biological information. A pattern recognition methodology based on hidden Markov models is presented inspired by successful results obtained in the most widely known and complex acoustical communication signal: human speech. This methodology was applied here for the first time to the detection and recognition of fish acoustic signals, specifically in a stream of round-the-clock recordings of Lusitanian toadfish (Halobatrachus didactylus) in their natural estuarine habitat. The results show that this methodology is able not only to detect the mating sounds (boatwhistles) but also to identify individual male toadfish, reaching an identification rate of ca. 95%. Moreover, this method also proved to be a powerful tool to assess signal durations in large data sets. However, the system failed in recognizing other sound types.
Abstract:
Human verbal communication runs in two directions, with understanding on both sides leading to certain conclusions. This type of communication, also called dialogue, can take place not only between human agents but also between human agents and machines. Interaction between humans and machines through natural language plays an important role in improving communication between the two. To better understand human-machine communication, this document presents background on human-machine conversation systems, including their modules and operation, dialogue strategies, and challenges to consider in their implementation. In addition, several Speech Recognition and Speech Synthesis systems are presented, along with systems that use human-machine conversation. Finally, performance tests are carried out on some Speech Recognition systems and, to put into practice some of the concepts presented in this work, the implementation of a human-machine conversation system is presented. Several conclusions were drawn from this work, among them the high complexity of human-machine conversation systems, the poor performance of speech recognition in noisy environments, and the barriers that can arise when implementing these systems.
Abstract:
Raman spectroscopy of formamide-intercalated kaolinites treated using controlled-rate thermal analysis technology (CRTA), allowing the separation of adsorbed formamide from intercalated formamide in formamide-intercalated kaolinites, is reported. The Raman spectra of the CRTA-treated formamide-intercalated kaolinites are significantly different from those of the intercalated kaolinites, which display a combination of both intercalated and adsorbed formamide. An intense band is observed at 3629 cm⁻¹, attributed to the inner surface hydroxyls hydrogen bonded to the formamide. Broad bands are observed at 3600 and 3639 cm⁻¹, assigned to the inner surface hydroxyls, which are hydrogen bonded to the adsorbed water molecules. The hydroxyl-stretching band of the inner hydroxyl is observed at 3621 cm⁻¹ in the Raman spectra of the CRTA-treated formamide-intercalated kaolinites. The results of thermal analysis show that the amount of intercalated formamide between the kaolinite layers is independent of the presence of water. Significant differences are observed in the CO stretching region between the adsorbed and intercalated formamide.
Abstract:
The thermal behaviour of halloysite fully expanded with hydrazine-hydrate has been investigated in nitrogen atmosphere under dynamic heating and at a constant, pre-set decomposition rate of 0.15 mg min⁻¹. Under controlled-rate thermal analysis (CRTA) conditions it was possible to resolve the closely overlapping decomposition stages and to distinguish between adsorbed and bonded reagent. Three types of bonded reagent could be identified. The loosely bonded reagent amounting to 0.20 mol hydrazine-hydrate per mol inner surface hydroxyl is connected to the internal and external surfaces of the expanded mineral and is present as a space filler between the sheets of the delaminated mineral. The strongly bonded (intercalated) hydrazine-hydrate is connected to the kaolinite inner surface OH groups by the formation of hydrogen bonds. Based on the thermoanalytical results two different types of bonded reagent could be distinguished in the complex. Type 1 reagent (approx. 0.06 mol hydrazine-hydrate/mol inner surface OH) is liberated between 77 and 103°C. Type 2 reagent is lost between 103 and 227°C, corresponding to a quantity of 0.36 mol hydrazine/mol inner surface OH. When heating the complex to 77°C under CRTA conditions a new reflection appears in the XRD pattern with a d-value of 9.6 Å, in addition to the 10.2 Å reflection. This new reflection disappears in contact with moist air and the complex re-expands to the original d-value of 10.2 Å within a few hours. The appearance of the 9.6 Å reflection is interpreted as the expansion of kaolinite with hydrazine alone, while the 10.2 Å one is due to expansion with hydrazine-hydrate. FTIR (DRIFT) spectroscopic results showed that the treated mineral after intercalation/deintercalation and heat treatment to 300°C is slightly more ordered than the original (untreated) clay.