878 resultados para syllable feature
Resumo:
Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environment studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including: syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experiment results show that syllable features can achieve an average classification accuracy of 90.5% which outperforms Mel-frequency cepstral coefficients features (79.0%).
Resumo:
Frog protection has become increasingly essential due to the rapid decline of its biodiversity. Therefore, it is valuable to develop new methods for studying this biodiversity. In this paper, a novel feature extraction method is proposed based on perceptual wavelet packet decomposition for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to the frog call. Then, a spectral peak track is extracted from each syllable if possible. Track duration, dominant frequency and oscillation rate are directly extracted from the track. With k-means clustering algorithm, the calculated dominant frequency of all frog species is clustered into k parts, which produce a frequency scale for wavelet packet decomposition. Based on the adaptive frequency scale, wavelet packet decomposition is applied to the frog calls. Using the wavelet packet decomposition coefficients, a new feature set named perceptual wavelet packet decomposition sub-band cepstral coefficients is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for the classification. The experiment results show that the proposed features can achieve an average classification accuracy of 97.45% which outperforms syllable features (86.87%) and Mel-frequency cepstral coefficients (MFCCs) feature (90.80%).
Resumo:
Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both the problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of the peak detection, we perform a mode-shape classification, which is formulated as a supervised binary classification problem - mode-shapes representing the syllabic nuclei as one class and remaining as the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using Switchboard, TIMIT and CTIMIT corpora in a five-fold cross validation setup. The average correlation coefficients for the syllable rate estimation turn out to be 0.6761, 0.6928 and 0.3604 for three corpora respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for three corpora respectively. (C) 2016 Elsevier B.V. All rights reserved.
Resumo:
OBJECTIVES The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. DESIGN Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. RESULTS Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late disciminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. CONCLUSIONS In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.
Resumo:
This study of the veranda as seen through the eyes of Lady Maria Nugent and Michael Scott, alias Tom Cringle, clearly demonstrates the important role that the piazza, as it was then more commonly known, played in the life of early nineteenth century Caribbean colonial society. The popularity of the veranda throughout the region, in places influenced by different European as well as African cultures, and among all classes of people, suggests that the appeal of this typical feature was based on something more than architectural fashion. A place of relative comfort in hot weather, the veranda is also a space at the interface of indoors and outdoors which allows for a wide variety of uses, for solitary or small or large group activities, many of which were noted by Nugent and Scott. Quintessentially, the veranda is a place in which to relax and take pleasure, not least of which is the enjoyment of the prospect, be it a panoramic view, a peaceful garden or a lively street scene. Despite the great changes in the nature of society, in the Caribbean and in many other parts of the world, the veranda and related structures such as the balcony continue to play at least as important a role in daily life as they did two centuries ago. The veranda of today’s Californian or Australian bungalow, and the balcony of the apartment block in the residential area of the modern city are among the contemporary equivalents of the lower and upper piazzas of Lady Nugent’s and Tom Cringle’s day.
Resumo:
Hybrid face recognition, using image (2D) and structural (3D) information, has explored the fusion of Nearest Neighbour classifiers. This paper examines the effectiveness of feature modelling for each individual modality, 2D and 3D. Furthermore, it is demonstrated that the fusion of feature modelling techniques for the 2D and 3D modalities yields performance improvements over the individual classifiers. By fusing the feature modelling classifiers for each modality with equal weights the average Equal Error Rate improves from 12.60% for the 2D classifier and 12.10% for the 3D classifier to 7.38% for the Hybrid 2D+3D clasiffier.