12 results for Speaker Recognition, Text-constrained, Multilingual, Speaker Verification, HMMs
in Bucknell University Digital Commons - Pennsylvania - USA
Abstract:
Pictorial representations of three-dimensional objects are often used to investigate animal cognitive abilities; however, investigators rarely evaluate whether the animals conceptualize the two-dimensional image as the object it is intended to represent. We tested for picture recognition in lion-tailed macaques by presenting five monkeys with digitized images of familiar foods on a touch screen. Monkeys viewed images of two different foods and learned that they would receive a piece of the one they touched first. After demonstrating that they would reliably select images of their preferred foods on one set of foods, animals were transferred to images of a second set of familiar foods. We assumed that if the monkeys recognized the images, they would spontaneously select images of their preferred foods on the second set of foods. Three monkeys selected images of their preferred foods significantly more often than chance on their first transfer session. In an additional test of the monkeys' picture recognition abilities, animals were presented with pairs of food images containing a medium-preference food paired with either a high-preference food or a low-preference food. The same three monkeys selected the medium-preference foods significantly more often when they were paired with low-preference foods and significantly less often when those same foods were paired with high-preference foods. Our novel design provided convincing evidence that macaques recognized the content of two-dimensional images on a touch screen. Results also suggested that the animals understood the connection between the two-dimensional images and the three-dimensional objects they represented.
Abstract:
We present a new method for the enhancement of speech. The method is designed for scenarios in which targeted speaker enrollment as well as system training within the typical noise environment are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics. These characteristics are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications for the proposed method include jet cockpit communication systems and offline methods for the restoration of audio recordings.
Abstract:
Eighty-one listeners defined by three age ranges (18–30, 31–59, and over 60 years) and three levels of musical experience performed an immediate recognition task requiring the detection of alterations in melodies. On each trial, a brief melody was presented, followed 5 sec later by a test stimulus that either was identical to the target or had two pitches changed, for a same–different judgment. Each melody pair was presented at 0.6 notes/sec, 3.0 notes/sec, or 6.0 notes/sec. Performance was better with familiar melodies than with unfamiliar melodies. Overall performance declined slightly with age and improved substantially with increasing experience, in agreement with earlier results in an identification task. Tempo affected performance on familiar tunes (moderate was best), but not on unfamiliar tunes. We discuss these results in terms of theories of dynamic attending, cognitive slowing, and working memory in aging.
Abstract:
We investigated the effect of level-of-processing manipulations on “remember” and “know” responses in episodic melody recognition (Experiments 1 and 2) and how this effect is modulated by item familiarity (Experiment 2). In Experiment 1, participants performed 2 conceptual and 2 perceptual orienting tasks while listening to familiar melodies: judging the mood, continuing the tune, tracing the pitch contour, and counting long notes. The conceptual mood task led to higher d' rates for “remember” but not “know” responses. In Experiment 2, participants either judged the mood or counted long notes of tunes with high and low familiarity. A level-of-processing effect emerged again in participants’ “remember” d' rates regardless of melody familiarity. Results are discussed within the distinctive processing framework.
Abstract:
We tested normal young and elderly adults and elderly Alzheimer’s disease (AD) patients on recognition memory for tunes. In Experiment 1, AD patients and age-matched controls received a study list and an old/new recognition test of highly familiar, traditional tunes, followed by a study list and test of novel tunes. The controls performed better than did the AD patients. The controls showed the “mirror effect” of increased hits and reduced false alarms for traditional versus novel tunes, whereas the patients false-alarmed as often to traditional tunes as to novel tunes. Experiment 2 compared young adults and healthy elderly persons using a similar design. Performance was lower in the elderly group, but both younger and older subjects showed the mirror effect. Experiment 3 produced confusion between preexperimental familiarity and intraexperimental familiarity by mixing traditional and novel tunes in the study lists and tests. Here, the subjects in both age groups resembled the patients of Experiment 1 in failing to show the mirror effect. Older subjects again performed more poorly, and they differed qualitatively from younger subjects in setting stricter criteria for more nameable tunes. Distinguishing different sources of global familiarity is a factor in tune recognition, and the data suggest that this type of source monitoring is impaired in AD and involves different strategies in younger and older adults.
Abstract:
The authors examined the effects of age, musical experience, and characteristics of musical stimuli on a melodic short-term memory task in which participants had to recognize whether a tune was an exact transposition of another tune recently presented. Participants were musicians and nonmusicians between ages 18 and 30 or 60 and 80. In 4 experiments, the authors found that age and experience affected different aspects of the task, with experience becoming more influential when interference was provided during the task. Age and experience interacted only weakly, and neither age nor experience influenced the superiority of tonal over atonal materials. Recognition memory for the sequences did not reflect the same pattern of results as the transposition task. The implications of these results for theories of aging, experience, and music cognition are discussed.
Abstract:
Each year, the Research Committee of the Ohio Music Education Association sponsors a half-day Research Forum prior to the beginning of the state music education association conference. In 2004, Dr. Patricia J. Flowers, Professor of Music at the Ohio State University, was the guest speaker. This article summarizes her talk on the process of becoming a music education researcher.
Abstract:
Speech is often a multimodal process, presented audiovisually through a talking face. One area of speech perception influenced by visual speech is speech segmentation, or the process of breaking a stream of speech into individual words. Mitchel and Weiss (2013) demonstrated that a talking face contains specific cues to word boundaries and that subjects can correctly segment a speech stream when given a silent video of a speaker. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2013). In Experiment 1, subjects were found to spend the most time watching the eyes and mouth, with a trend suggesting that the mouth was viewed more than the eyes. Although subjects displayed significant learning of word boundaries, performance was not correlated with gaze duration on any individual feature, nor was performance correlated with a behavioral measure of autistic-like traits. However, trends suggested that as autistic-like traits increased, gaze duration on the mouth increased and gaze duration on the eyes decreased, similar to significant trends seen in autistic populations (Boraston & Blakemore, 2007). In Experiment 2, the same video was modified so that a black bar covered the eyes or mouth. Both videos elicited learning of word boundaries that was equivalent to that seen in the first experiment. Again, no correlations were found between segmentation performance and SRS scores in either condition. These results, taken with those in Experiment 1, suggest that neither the eyes nor the mouth is critical to speech segmentation and that perhaps more global head movements indicate word boundaries (see Graf, Cosatto, Strom, & Huang, 2002). Future work will elucidate the contribution of individual features relative to global head movements, as well as extend these results to additional types of speech tasks.
Abstract:
The Telephone Conference Network, sponsored by The Pennsylvania State University's Coordinating Council for Health Care, is designed as a cost-effective format for providing inservice training in geriatric mental health for individuals who serve the elderly. Institutions which subscribe to the Telephone Conference Network are equipped with a conference speaker and telephone hook-up providing a two-way line of communication, and may choose from a variety of inservice programs. Mailed evaluations were completed by participants (N=73) in the "Skills to Manage Moods" program, a series of four 1-hour sessions designed to teach participants the skills needed to help patients cope with depression and to deliver the program to others. The majority of respondents reported high levels of satisfaction with the Telephone Conference Network system and the specific program in which they participated. Although 85 percent reported that they would be able to use the skills learned in the program on the job, 50 percent reported that they would not be interested in teaching these skills to others. The convenience and efficiency of the Telephone Conference Network were the most frequently mentioned strengths of the system, while the physical facilities and the program delivery format adopted by the individual institutions were the most frequently mentioned weaknesses. These data suggested several recommendations for Network subscribers and for professionals offering telephone conference programs, including ensuring optimal class enrollment and adequate physical facilities, and participant involvement in program implementation.
Abstract:
A variety of research has documented high levels of depression among older adults in the health care setting. Additional research has shown that care providers in health care settings are not very effective at diagnosing comorbid depression. This is a troublesome finding since comorbid depression has been linked to a number of negative outcomes in older adults. Early results have indicated that comorbid depression may be associated with a number of unfavorable consequences ranging from impairments in physical functioning to increased mortality. The health care setting with arguably the highest rate of physical impairment is the nursing home, and it is the nursing home where the effects of comorbid depression may be most costly. Therefore, the current analysis uses data from the Institutional Population Component of the National Medical Expenditure Survey (US Department of Health and Human Services, 1990) to explore rates of both recognized and unrecognized comorbid depression in the nursing home setting. Using a constructed proxy variable representative of the DSM-III-R diagnosis of depression, results indicate that approximately 8.1% of nursing home residents have an unrecognized potential comorbid depression.
Abstract:
We investigated how well structural features such as note density or the relative number of changes in the melodic contour could predict success in implicit and explicit memory for unfamiliar melodies. We also analyzed which features are more likely to elicit increasingly confident judgments of "old" in a recognition memory task. An automated analysis program computed structural aspects of melodies, both independent of any context, and also with reference to the other melodies in the test set and the parent corpus of pop music. A few features predicted success in both memory tasks, which points to a shared memory component. However, motivic complexity compared to a large corpus of pop music had different effects on explicit and implicit memory. We also found that just a few features are associated with different rates of "old" judgments, whether the items were old or new. Rarer motives relative to the test set predicted hits and rarer motives relative to the corpus predicted false alarms. This data-driven analysis provides further support for both shared and separable mechanisms in implicit and explicit memory retrieval, as well as the role of distinctiveness in true and false judgments of familiarity.
Abstract:
Recent advances in the field of statistical learning have established that learners are able to track regularities of multimodal stimuli, yet it is unknown whether the statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker's face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally incongruent, which we predicted would produce the McGurk illusion, resulting in the perception of an audiovisual syllable (e.g., /ni/). In this way, we used the McGurk illusion to manipulate the underlying statistical structure of the speech streams, such that perception of these illusory syllables facilitated participants' ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.