954 resultados para Digit speech recognition


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Speech is a natural mode of communication for people and speech recognition is an intensive area of research due to its versatile applications. This paper presents a comparative study of various feature extraction methods based on wavelets for recognizing isolated spoken words. Isolated words from Malayalam, one of the four major Dravidian languages of southern India are chosen for recognition. This work includes two speech recognition methods. First one is a hybrid approach with Discrete Wavelet Transforms and Artificial Neural Networks and the second method uses a combination of Wavelet Packet Decomposition and Artificial Neural Networks. Features are extracted by using Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Training, testing and pattern recognition are performed using Artificial Neural Networks (ANN). The proposed method is implemented for 50 speakers uttering 20 isolated words each. The experimental results obtained show the efficiency of these techniques in recognizing speech

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker independent, continuous transcription system for Malayalam. The system employs Hidden Markov Model (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficient (MFCC) for feature extraction. It is trained with 21 male and female speakers in the age group ranging from 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Graphical techniques for modeling the dependencies of randomvariables have been explored in a variety of different areas includingstatistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics.Formalisms for manipulating these models have been developedrelatively independently in these research communities. In this paper weexplore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independencenetworks (PINs). The paper contains a self-contained review of the basic principles of PINs.It is shown that the well-known forward-backward (F-B) and Viterbialgorithms for HMMs are special cases of more general inference algorithms forarbitrary PINs. Furthermore, the existence of inference and estimationalgorithms for more general graphical models provides a set of analysistools for HMM practitioners who wish to explore a richer class of HMMstructures.Examples of relatively complex models to handle sensorfusion and coarticulationin speech recognitionare introduced and treated within the graphical model framework toillustrate the advantages of the general approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

List of references in Harvard format for the accessibility text tutorial created by Denis's Angels.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper reviews a study to determine the relation between the aided articulation index and the aided speech recognition scores obtained with the Monosyllable, Trochee and Spondee (MTS) Test, when administered to hearing-impaired children.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The equivalency of 34 TIMIT sentence lists was evaluated using adult cochlear implant recipients to determine if they should be recommended for future clinical or research use. Because these sentences incorporate gender, dialect and speaking rate variations, they have the potential to better represent speech recognition abilities in real-world communication situations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Inconsistencies exist between traditional objective measures such as speech recognition and localization, and subjective reports of bimodal benefit. The purpose of this study was to expand the set of objective measures of bimodal benefit to include non-traditional listening tests, and to examine possible correlations between objective measures of auditory perception and subjective satisfaction reports.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

It has been shown through a number of experiments that neural networks can be used for a phonetic typewriter. Algorithms can be looked on as producing self-organizing feature maps which correspond to phonemes. In the Chinese language the utterance of a Chinese character consists of a very simple string of Chinese phonemes. With this as a starting point, a neural network feature map for Chinese phonemes can be built up. In this paper, feature map structures for Chinese phonemes are discussed and tested. This research on a Chinese phonetic feature map is important both for Chinese speech recognition and for building a Chinese phonetic typewriter.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This Capstone Project attempts to determine the ability of normal hearing children to resolve spectral information, and the relationship between spectral resolution ability and speech recognition ability in noise. This study also examines how these abilities develop with age.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dynamic Time Warping (DTW), a pattern matching technique traditionally used for restricted vocabulary speech recognition, is based on a temporal alignment of the input signal with the template models. The principal drawback of DTW is its high computational cost as the lengths of the signals increase. This paper shows extended results over our previously published conference paper, which introduces an optimized version of the DTW I hat is based on the Discrete Wavelet Transform (DWT). (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Allt eftersom utvecklingen går framåt inom applikationer och system så förändras också sättet på vilket vi interagerar med systemet på. Hittills har navigering och användning av applikationer och system mestadels skett med händerna och då genom mus och tangentbord. På senare tid så har navigering via touch-skärmar och rösten blivit allt mer vanligt. Då man ska styra en applikation med hjälp av rösten är det viktigt att vem som helst kan styra applikationen, oavsett vilken dialekt man har. För att kunna se hur korrekt ett röstigenkännings-API (Application Programming Interface) uppfattar svenska dialekter så initierades denna studie med dokumentstudier om dialekters kännetecken och ljudkombinationer. Dessa kännetecken och ljudkombinationer låg till grund för de ord vi valt ut till att testa API:et med. Varje dialekt fick alltså ett ord uppbyggt för att vara extra svårt för API:et att uppfatta när det uttalades av just den aktuella dialekten. Därefter utvecklades en prototyp, närmare bestämt en android-applikation som fungerade som ett verktyg i datainsamlingen. Då arbetet innehåller en prototyp och en undersökning så valdes Design and Creation Research som forskningsstrategi med datainsamlingsmetoderna dokumentstudier och observationer för att få önskat resultat. Data samlades in via observationer med prototypen som hjälpmedel och med hjälp av dokumentstudier. Det empiriska data som registrerats via observationerna och med hjälp av applikationen påvisade att vissa dialekter var lättare för API:et att uppfatta korrekt. I vissa fall var resultaten väntade då vissa ord uppbyggda av ljudkombinationer i enlighet med teorin skulle uttalas väldigt speciellt av en viss dialekt. Ibland blev det väldigt låga resultat på just dessa ord men i andra fall förvånansvärt höga. Slutsatsen vi drog av detta var att de ord vi valt ut med en baktanke om att de skulle få låga resultat för den speciella dialekten endast visade sig stämma vid två tillfällen. Det var istället det ord innehållande sje- och tje-ljud som enligt teorin var gemensamma kännetecken för alla dialekter som fick lägst resultat överlag.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We develop an algorithm for the detection and classification of affective sound events underscored by specific patterns of sound energy dynamics. We relate the portrayal of these events to proposed high level affect or emotional coloring of the events. In this paper, four possible characteristic sound energy events are identified that convey well established meanings through their dynamics to portray and deliver certain affect, sentiment related to the horror film genre. Our algorithm is developed with the ultimate aim of automatically structuring sections of films that contain distinct shades of emotion related to horror themes for nonlinear media access and navigation. An average of 82% of the energy events, obtained from the analysis of the audio tracks of sections of four sample films corresponded correctly to the proposed affect. While the discrimination between certain sound energy event types was low, the algorithm correctly detected 71% of the occurrences of the sound energy events within audio tracks of the films analyzed, and thus forms a useful basis for determining affective scenes characteristic of horror in movies.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

ARAUJO, Márcio V. ; ALSINA, Pablo J. ; MEDEIROS, Adelardo A. D. ; PEREIRA, Jonathan P.P. ; DOMINGOS, Elber C. ; ARAÚJO, Fábio M.U. ; SILVA, Jáder S. . Development of an Active Orthosis Prototype for Lower Limbs. In: INTERNATIONAL CONGRESS OF MECHANICAL ENGINEERING, 20., 2009, Gramado, RS. Proceedings… Gramado, RS: [s. n.], 2009