29 results for audiovisual speech perception
Abstract:
First, prosodic boundaries in 1991 common sentences were labeled on the basis of a speech perception experiment, the relation between prosodic structure and syntactic structure was examined after immediate constituent analysis, and an example of prosodic phrasing from text sentences was produced using CART. Then, using designed sentences, downstep and declination in the pitch downtrend of Chinese declarative sentences were examined, and the common and specific properties of Chinese intonation were discussed. The main results of the study are: 1) The distribution patterns of prosodic phrase boundaries differ across syntactic structures, and there is great freedom in prosodic chunking. The relation between syntactic structure and prosodic structure can only be discussed in a statistical sense. 2) Besides syntactic relation, the second most important factor influencing prosodic phrase boundaries is length. The distances to the preceding and following boundaries are more important than the lengths of the left and right syntactic constituents. In our corpus, prosodic phrases are 5±3 syllables long. 3) Automatic downstep can lower intonation linearly, but is easily affected by stress. Non-automatic downstep lowers the higher part of the pitch contour and has no effect on the lower part. 4) The downtrend of low points is caused by declination. The extent of declination depends not only on the tones of the low points but also on their positions in prosodic words: baselines decline much faster when low points are in the initial position of a prosodic word. In long sentences, the baselines of prosodic phrases are the basic declination units, and the overall declination pattern of a sentence is related to the syntactic relations between neighboring prosodic phrases.
Abstract:
Many studies have found that developmental dyslexia is associated with deficits in phonological processing skills, especially phonological awareness. To explore the nature of phonological awareness deficits in dyslexia, researchers have begun to investigate the role of speech perception. Findings on speech perception abilities in dyslexics are inconsistent, and the heterogeneity of dyslexia may be responsible for this inconsistency. Given the general suggestion that phonological awareness deficits in dyslexia are attributable to categorical perception deficits, it is more direct to examine whether children with phonological awareness difficulties or phonological dyslexia consistently show speech categorization deficits. The present study investigated whether Chinese children with phonological awareness deficits or phonological dyslexia showed abnormal speech perception. The study consisted of two parts. Part I screened children with phonological awareness deficits from Year 3 of kindergarten and examined their ability to perceive a native category continuum, nonnative category contrasts, and non-speech sound series. Part II selected phonological dyslexics from an elementary school as participants and further explored the relation between phonological deficits and speech perception. The first two experiments of Part II separately examined the ability to label stimuli in a native category continuum and brief stops in different contexts; the last experiment investigated adaptation effects in the different participant groups. The main conclusions are as follows: 1) Children with phonological dyslexia showed categorical perception deficits: they were less consistent than controls when perceiving stimuli within phonetic categories, especially stimuli that were not natural sounds. 2) Children with phonological dyslexia exhibited a general difficulty in perceiving brief segments of stops from different contexts. 3) Children with phonological dyslexia did not show adaptation to repeatedly presented stimuli. Based on these conclusions and the findings of previous studies, we suggest that the representations of sound stimuli in the brains of phonological dyslexics differ from those in normal children: the representations of sound stimuli in dyslexics' cortical neural networks are more diffuse and inconsistent.
Abstract:
The research investigates the acoustic-phonetic correlates of various levels of syntactic boundaries and the perception of prosody in Mandarin Chinese; more specifically, the way speakers express the syntactic relations between sentence components and the perceptual representations of prosody. The relation between phonology and syntax in Chinese is studied by comparing the perceptual representations and syntactic structures of sentences. The results may have theoretical and practical implications for research in speech perception, linguistics, and psycholinguistics, and for the development of speech engineering in China.
Abstract:
We compared early stages of face processing in young and older participants as indexed by ERPs elicited by faces and non-face stimuli presented in upright and inverted orientations. The P1 and N170 components were larger in older than in young participant
Abstract:
Whether mice perceive spatial depth on the basis of the visual size of object targets was explored with other visual cues, such as perspective and partial occlusion, excluded. A mouse was placed on a height-adjustable platform. The platform was located inside a box whose walls were all dark except the bottom, through which light was projected as the sole visual cue. The visual object cue was composed of 4x4 grids that allowed the mouse to estimate the distance of the platform from the grids. Three grid sizes, reduced in a proportion of 2/3, and seven equally spaced distances between the platform and the grids at the bottom were used in the experiments. The time a mouse stayed on the platform at each height was recorded while the different grid sizes were presented in random order, to test whether the mouse's judgment of the depth of the platform from the bottom was affected by the size information of the visual target. The results from all three object-size conditions show that the time mice stayed on the platform became longer as height increased. At distances of 20-30 cm, the mice did not use the size information of a target to judge depth, but mainly used binocular disparity information. At distances of less than 20 cm or more than 30 cm, however, and especially at the greater distances of 50 cm, 60 cm and 70 cm, the mice were able to use the size information to compensate for the lack of binocular disparity information, because only 1/3 of the mouse visual field is binocular. The behavioral paradigm established in the current study is a useful model and can be applied to experiments that use transgenic mice as an animal model to investigate the relationships between behaviors and gene functions.
Abstract:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digit speech recognition systems. We observe that this may be true for recognition tasks based on geometrical learning with little training data. In contrast to image processing, phase information is not essential for digit speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digit speech recognition results show promising accuracy. Experiments show that the method based on ICA and geometrical learning outperforms HMM across different numbers of training samples.
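As a rough illustration of why an analytic description removes phase sensitivity, the sketch below builds the analytic signal of a real sequence with a DFT-based Hilbert transform and takes its magnitude. It is not the scheme from the paper: the function names `analytic_signal` and `envelope` are hypothetical, and a plain cosine stands in for an ICA-adapted basis function.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a DFT-based Hilbert transform."""
    n = len(x)
    # Forward DFT, O(n^2); adequate for a short illustrative sequence.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep DC and, for even n, the Nyquist bin
        if k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # discard the negative frequencies
    # Inverse DFT gives x + j * Hilbert(x).
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def envelope(x):
    """Magnitude of the analytic signal; insensitive to the phase of x."""
    return [abs(z) for z in analytic_signal(x)]


# Stand-in for a basis function: the same cosine at two different phases.
N = 64
wave_a = [math.cos(2 * math.pi * 4 * t / N) for t in range(N)]
wave_b = [math.cos(2 * math.pi * 4 * t / N + 1.0) for t in range(N)]
env_a = envelope(wave_a)
env_b = envelope(wave_b)
```

The two waveforms differ sample by sample, yet their envelopes coincide, which is the property a phase-insensitive feature needs.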
Abstract:
Drawing on descriptive geometry and notions from set theory, this paper redefines basic elements of space such as curves and surfaces, presents some fundamental notions of point covering based on the high-dimensional space (HDS) point covering theory, and finally maps speech signals to points in HDS in order to analyze the distribution of these speech points in HDS, the various geometric covering objects for speech points, and their relationships. In addition, the paper proposes a new algorithm for speaker-independent continuous digit speech recognition based on the HDS point dynamic searching theory, without endpoint detection or segmentation. First, from the different digit syllables in real continuous digit speech, we establish the covering area in feature space for continuous speech. During recognition, we use the point covering dynamic searching theory in HDS and obtain satisfactory recognition results. Finally, in a comparison with the HMM (hidden Markov model)-based method, the trend of the results shows that as the number of samples increases, the difference in recognition rate between the two methods slowly decreases, and as the number of samples becomes very large, both recognition rates gradually approach 100%. The results show that the recognition rate of the HDS point covering method is higher than that of the HMM-based method, because point covering describes the morphological distribution of speech in HDS, whereas the HMM-based method describes only a probability distribution, whose accuracy is inferior to that of point covering.
Abstract:
In this paper, a novel approach to Mandarin speech emotion recognition, based on high dimensional geometry theory, is proposed. Human emotions are classified into 6 archetypal classes: fear, anger, happiness, sadness, surprise and disgust. According to the characteristics of these emotional speech signals, amplitude, pitch frequency and formants are used as the feature parameters for speech emotion recognition. The new method, based on high dimensional geometry theory, is applied for recognition. Compared with the traditional GSVM model, the new method has some advantages. The method is of significant value for future research and applications.
Abstract:
Based on biomimetic pattern recognition theory, we propose a novel speaker-independent keyword-spotting algorithm for continuous speech. Without endpoint detection or segmentation, we obtain the minimum-distance curve between continuous speech samples and every keyword-training net through dynamic searching over the feature-extracted continuous speech. We then count the keywords by examining the valley values and the number of valleys in the curve. Experiments on small-vocabulary continuous speech at various speaking rates gave good recognition results and demonstrated the validity of the algorithm.
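The valley-counting step can be sketched as follows. This is only an assumed reading of the abstract: the function name `count_keyword_valleys`, the fixed `threshold`, and the sample curve are hypothetical, and the paper may use a different valley criterion.

```python
def count_keyword_valleys(distance_curve, threshold):
    """Count local minima of a distance curve that fall below a threshold.

    Each qualifying valley is treated as one keyword occurrence. A flat
    valley floor is counted once, at its first sample.
    """
    count = 0
    for i in range(1, len(distance_curve) - 1):
        previous = distance_curve[i - 1]
        current = distance_curve[i]
        following = distance_curve[i + 1]
        is_valley = current < previous and current <= following
        if is_valley and current < threshold:
            count += 1
    return count


# Hypothetical minimum-distance curve for one keyword net.
curve = [5, 4, 1, 4, 5, 3, 5, 0.5, 0.5, 6]
hits = count_keyword_valleys(curve, threshold=2)
```

Here the valley at value 3 is ignored because it does not drop below the threshold, so only the deep valleys count as detections.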
Abstract:
In speaker-independent speech recognition, the disadvantage of the most widespread technology (HMMs, or hidden Markov models) is not only the need for many more training samples but also the long training time. This paper describes the use of biomimetic pattern recognition (BPR) to recognize Mandarin continuous speech in a speaker-independent manner. A speech database was developed for the study. The vocabulary of the database consists of the names of 15 Chinese dishes, each 4 Chinese words long. Neural networks (NNs) based on the multi-weight neuron (MWN) model are used to train on and recognize the speech sounds. The number of MWNs was investigated to achieve the optimal performance of the NN-based BPR. The system, which is based on BPR and can carry out real-time recognition, reaches a recognition rate of 98.14% for the first option and 99.81% for the first two options for speakers from different provinces of China speaking common Chinese speech. Experiments were also carried out to compare continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and BPR for speech recognition. The experimental results show that BPR outperforms CDHMM and DTW, especially when the sample size is limited.
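For readers unfamiliar with the DTW baseline mentioned above, a minimal textbook sketch of dynamic time warping on one-dimensional sequences follows. It is not the configuration evaluated in the paper; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# A stretched copy of a sequence aligns with zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Because the alignment may repeat samples, two utterances of the same pattern at different speaking rates can still match closely.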
Abstract:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digit speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digit speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digit speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM across different numbers of training samples.
Abstract:
In this paper, we present the HyperSausage Neuron based on High-Dimension Space (HDS) and propose a new algorithm for speaker-independent continuous digit speech recognition. Compared with the HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.
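The abstract does not define the HyperSausage Neuron. The sketch below assumes a common geometric reading, in which such a neuron covers every point lying within a fixed radius of the line segment joining two training samples. The function names, the radius, and the sample points are all hypothetical.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b, in any dimension."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_squared = sum(c * c for c in ab)
    if ab_squared == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = sum(x * y for x, y in zip(ap, ab)) / ab_squared
        t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, closest)))


def in_hypersausage(p, a, b, radius):
    """True if p lies inside the assumed sausage-shaped covering region."""
    return point_to_segment_distance(p, a, b) <= radius


# Two hypothetical training samples in a 3-dimensional feature space.
sample_a = (0.0, 0.0, 0.0)
sample_b = (2.0, 0.0, 0.0)
inside = in_hypersausage((1.0, 1.0, 0.0), sample_a, sample_b, radius=1.5)
outside = in_hypersausage((1.0, 2.0, 0.0), sample_a, sample_b, radius=1.5)
```

Under this assumption, a class is covered by a chain of such regions, and an unknown point is accepted if it falls inside any of them.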
Abstract:
In speaker-independent speech recognition, the disadvantage of the most widespread technology (Hidden Markov Models, HMMs) is not only the need for many more training samples but also the long training time. This paper describes the use of Biomimetic Pattern Recognition (BPR) to recognize Mandarin speech in a speaker-independent manner. The vocabulary of the system consists of the names of 15 Chinese dishes. Neural networks based on the Multi-Weight Neuron (MWN) model are used to train on and recognize the speech sounds. Experimental results show that the system, which can carry out real-time recognition of speakers from different provinces speaking common Chinese speech, outperforms HMMs, especially when the sample size is limited.