961 results for automatic speech recognition


Abstract:

Speech is one of the most important means of communication among human beings. In this paper, a comparative study of two feature extraction techniques is carried out for recognizing speaker-independent isolated spoken words. The first is a hybrid approach combining Linear Predictive Coding (LPC) with Artificial Neural Networks (ANN); the second combines Wavelet Packet Decomposition (WPD) with Artificial Neural Networks. Voice signals are sampled directly from a microphone and then processed with these two techniques to extract features. Words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. Training, testing and pattern recognition are performed using ANNs trained with the backpropagation method. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both methods produce good recognition accuracy, but Wavelet Packet Decomposition is found to be more suitable for speech recognition because of its multi-resolution characteristics and efficient time-frequency localization.
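The abstract does not give the LPC order, frame length or window used. As a minimal sketch, assuming 10th-order LPC computed per Hamming-windowed frame with the autocorrelation method and the Levinson-Durbin recursion, the coefficient vector that would feed the neural network could be obtained as follows. The function names and defaults are illustrative, not taken from the paper.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the predictor polynomial.

    Returns (a, err) with a[0] == 1, where the prediction error is
    e[n] = x[n] + sum(a[k] * x[n - k] for k in 1..order).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # residual energy shrinks at every stage
    return a, err


def lpc_features(frame, order=10):
    """Hamming-window one frame and return the LPC coefficients a[1..order]."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return a[1:]
```

The abstract also does not say how the per-frame vectors were combined into one network input per word (for example, a fixed number of frames per word), so that step is left out.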

Abstract:

Speech is a natural mode of communication for people, and speech recognition is an intensive area of research because of its versatile applications. This paper presents a comparative study of various wavelet-based feature extraction methods for recognizing isolated spoken words. Isolated words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. The work includes two speech recognition methods: the first is a hybrid approach combining Discrete Wavelet Transforms (DWT) with Artificial Neural Networks (ANN), and the second combines Wavelet Packet Decomposition (WPD) with Artificial Neural Networks. Features are extracted using DWT and WPD, and training, testing and pattern recognition are performed using ANNs. The proposed method is implemented for 50 speakers uttering 20 isolated words each. The experimental results show the efficiency of these techniques in recognizing speech.
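The structural difference between the two feature extractors compared above can be shown with a small sketch. It assumes a Haar wavelet and per-sub-band energy features; the abstract names neither the wavelet nor the decomposition depth. A DWT splits only the approximation (low-pass) branch again at each level. A wavelet packet decomposition splits every node, which yields finer frequency resolution in the upper bands.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar analysis filter bank (len(signal) must be even)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_bands(signal, levels):
    """DWT: only the approximation branch is split again -> levels + 1 bands."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def wpd_bands(signal, levels):
    """Wavelet packet: every node is split again -> 2 ** levels bands
    (returned in natural order, not sorted by frequency)."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            a, d = haar_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes


def band_energies(bands):
    """Energy per sub-band: a simple fixed-length feature vector for an ANN."""
    return [sum(x * x for x in band) for band in bands]
```

Because the Haar transform is orthonormal, the sub-band energies of either decomposition sum to the frame energy. With 3 levels, the DWT gives 4 features and the WPD gives 8 features. The frame length must be divisible by `2 ** levels`.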

Abstract:

Malayalam is one of the 22 scheduled languages of India, with more than 130 million speakers. This paper reports on the development of a speaker-independent, continuous transcription system for Malayalam. The system employs Hidden Markov Models (HMMs) for acoustic modeling and Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction. It is trained on speech from 21 male and female speakers aged 20 to 40 years. When tested on a set of continuous speech data, the system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%.
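MFCC extraction for one frame follows a standard sequence: power spectrum, triangular mel filterbank, log, then DCT. The sketch below implements that sequence in pure Python. It assumes 16 kHz audio, 512-sample frames, 26 filters and 13 coefficients. These are common choices, not values reported in the paper. Pre-emphasis, windowing, liftering and delta features are omitted. A real system would use an FFT library instead of the naive DFT used here for self-containment.

```python
import math


def power_spectrum(frame):
    """|DFT|^2 / N for bins 0..N/2 (naive DFT, kept simple for clarity)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [min(int((n_fft + 1) * mel_to_hz(m) / sample_rate), n_bins - 1) for m in mels]
    filters = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):  # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            weights[k] = (right - k) / (right - centre)
        filters.append(weights)
    return filters


def mfcc(power, sample_rate, n_filters=26, n_ceps=13):
    """Log mel filterbank energies followed by a DCT-II."""
    n_fft = 2 * (len(power) - 1)
    log_e = [math.log(max(sum(w * p for w, p in zip(f, power)), 1e-10))
             for f in mel_filterbank(n_filters, n_fft, sample_rate)]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_ceps)]
```

In a system like the one described, the HMM acoustic models would be trained on the sequence of such vectors computed over successive overlapping frames.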

Abstract:

This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. 
This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.

Abstract:

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
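The special-case relationship claimed in the abstract can be seen in miniature. On a chain-structured network (an HMM), the forward pass and the Viterbi pass run the same recursion. The forward pass sums over the previous state, and the Viterbi pass maximises over it. The following is a minimal sketch for a discrete HMM with toy, made-up parameters (not from the paper). It omits the backward pass and the log-space scaling that long sequences need.

```python
def forward(pi, A, B, obs):
    """P(obs): sum over all hidden-state paths (forward recursion)."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi(pi, A, B, obs):
    """Same recursion with max in place of sum; returns (best path, its probability)."""
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for s in range(n):
            best_r = max(range(n), key=lambda r: delta[r] * A[r][s])
            ptr.append(best_r)
            new_delta.append(delta[best_r] * A[best_r][s] * B[s][o])
        back.append(ptr)
        delta = new_delta
    # trace the back-pointers from the most probable final state
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


pi = [0.6, 0.4]                               # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]                  # A[r][s] = P(next = s | current = r)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]        # B[s][o] = P(symbol o | state s)
```

For the observation sequence `[0, 1, 2]`, the Viterbi path is `[0, 0, 1]` with probability 0.01512. The forward probability of the whole sequence is 0.03628, which is necessarily at least as large, because it sums over all paths rather than keeping the best one.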

Abstract:

List of references in Harvard format for the accessibility text tutorial created by Denis's Angels.

Abstract:

The voice, as teachers' working tool, can be affected by prolonged use, overuse or misuse behaviors, which trigger functional limitations of occupational origin. One of the most frequent symptoms among those who use their voice heavily for occupational purposes is laryngeal fatigue (LF), or vocal tiredness due to muscle weakening. This quasi-experimental, longitudinal pre-test/post-test study examined the effects of voice use. It analyzed sociodemographic, health and work variables, lifestyles and occupational risk factors, but focused mainly on the effect of prolonged voice use on physical-acoustic variables after a working day. The participants were 99 teachers at a higher-education institution in Colombia, compared with workers whose jobs involve less voice use. A vocal-symptom questionnaire was administered to control for bias. Pre- and post-shift recordings were taken from each worker with the Speech Analizer® software, and each worker's subjective changes after a working day were reported. Changes in the physical-acoustic variables were found as an effect of prolonged voice use after a working day in both groups of participants, with the effect being more significant in teachers than in administrative (non-teaching) staff. The risk of developing voice disorders was directly associated with exposure to occupational risk factors and to factors related to individuals' health conditions and lifestyle, with greater consequences for the teacher group. Because the voice is their main working tool, their voice use was greater, and so was the likelihood of developing vocal symptoms derived from laryngeal fatigue. Mean fundamental frequency (F0) for sustained phonation of the vowel /a/, which represents a tonally neutral sound or habitual pitch, showed significant differences between groups (p = 0.048).
In this case, the teacher group showed an increase in F0 at post-test, compared with a non-significant change in the administrative group after prolonged voice use. Accordingly, there were differences in maximum F0 (p = 0.025), minimum F0 (p = 0.011) and F0 range (p = 0.012) for sustained production of the vowel /a/. For the administrative group, the significant differences consisted of decreases in F0, F0 range, and maximum and minimum frequency across the three vowels (/a/, /i/, /o/), in contrast to what occurred in the teacher group. Significant between-group differences were also found in voice intensity (p = 0.001). In the teacher group, mean, minimum, maximum and range of intensity all decreased at post-test during sustained phonation of /a/. No statistically significant differences were found for these variables in the administrative group. Objective measurements and verifiable results demonstrated the phenomenon of laryngeal fatigue associated with the effects of continuous vocal demand, differentiating its impact by job role and gender.

Abstract:

This paper reviews a study to determine the relation between the aided articulation index and the aided speech recognition scores obtained with the Monosyllable, Trochee and Spondee (MTS) Test, when administered to hearing-impaired children.

Abstract:

The equivalency of 34 TIMIT sentence lists was evaluated using adult cochlear implant recipients to determine if they should be recommended for future clinical or research use. Because these sentences incorporate gender, dialect and speaking rate variations, they have the potential to better represent speech recognition abilities in real-world communication situations.

Abstract:

Inconsistencies exist between traditional objective measures such as speech recognition and localization, and subjective reports of bimodal benefit. The purpose of this study was to expand the set of objective measures of bimodal benefit to include non-traditional listening tests, and to examine possible correlations between objective measures of auditory perception and subjective satisfaction reports.

Abstract:

It has been shown through a number of experiments that neural networks can be used to build a phonetic typewriter. The underlying algorithms can be viewed as producing self-organizing feature maps that correspond to phonemes. In Chinese, the utterance of a character consists of a very simple string of phonemes. With this as a starting point, a neural network feature map for Chinese phonemes can be built. In this paper, feature map structures for Chinese phonemes are discussed and tested. This research on a Chinese phonetic feature map is important both for Chinese speech recognition and for building a Chinese phonetic typewriter.
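A minimal Kohonen-style sketch of the self-organizing feature maps mentioned above follows. It assumes a 1-D chain of units, a Gaussian neighbourhood, and a learning rate and radius that decay linearly. The input vectors stand in for whatever acoustic features the paper used, which the abstract does not specify, and `train_som` and its defaults are illustrative names.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def som_step(weights, x, lr, radius):
    """One Kohonen update on a 1-D chain: pull the winner and its chain
    neighbours towards x, scaled by a Gaussian neighbourhood function."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for u, w_u in enumerate(weights):
        h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
        updated.append([w + lr * h * (xi - w) for w, xi in zip(w_u, x)])
    return updated


def train_som(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Train with learning rate and neighbourhood radius shrinking linearly."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            radius = max((n_units / 2.0) * (1.0 - frac), 0.5)
            weights = som_step(weights, x, lr0 * (1.0 - frac), radius)
            step += 1
    return weights
```

After training, each unit acts as a prototype vector. Labelling each unit with the phoneme of the training samples it wins most often would give a phoneme map of the kind the abstract describes; that labelling step is not shown here.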

Abstract:

This Capstone Project attempts to determine the ability of normal hearing children to resolve spectral information, and the relationship between spectral resolution ability and speech recognition ability in noise. This study also examines how these abilities develop with age.

Abstract:

Dynamic Time Warping (DTW), a pattern-matching technique traditionally used for restricted-vocabulary speech recognition, is based on a temporal alignment of the input signal with the template models. The principal drawback of DTW is its high computational cost as the lengths of the signals increase. This paper presents extended results over our previously published conference paper, which introduced an optimized version of DTW based on the Discrete Wavelet Transform (DWT). (C) 2008 Elsevier B.V. All rights reserved.
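The abstract does not explain how the DWT is used to speed up DTW. The sketch below shows the baseline whose quadratic cost the paper targets. It also shows one generic way a DWT can reduce that cost: aligning Haar approximation coefficients, which are 2^k times shorter after k levels, so the DTW table shrinks by roughly 4^k. This coarse alignment is an illustrative assumption, not necessarily the authors' algorithm. The absolute-difference cost on 1-D sequences is also an assumption; speech templates would normally be sequences of per-frame feature vectors.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost.

    `window` optionally restricts |i - j| (a Sakoe-Chiba band)."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def haar_approx(x, levels=1):
    """Haar DWT approximation coefficients; each level halves the length."""
    x = list(x)
    for _ in range(levels):
        if len(x) % 2:
            x.append(x[-1])  # pad odd lengths by repeating the last sample
        x = [(x[i] + x[i + 1]) / math.sqrt(2.0) for i in range(0, len(x), 2)]
    return x
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0` because the warping path absorbs the repeated sample. Calling `dtw_distance(haar_approx(a, 2), haar_approx(b, 2))` compares the two signals at a quarter of their original length.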

Abstract:

In this thesis, a new algorithm is proposed to segment the foreground of a fingerprint image from its background. The algorithm uses three features: mean, variance and coherence. Based on these features, a rule system is built to help the algorithm segment the image efficiently. In addition, the proposed algorithm combines split-and-merge with a modified Otsu method. Two enhancement techniques, Gaussian filtering and histogram equalization, are applied to improve the quality of the image. Finally, a post-processing technique is implemented to counter undesirable effects in the segmented image. Fingerprint recognition is one of the oldest biometric recognition techniques. Everyone has a unique and unchangeable fingerprint, and because of this uniqueness and distinctness, fingerprint identification has long been used in many applications. A fingerprint image is a pattern consisting of two regions, foreground and background. The foreground contains all the important information needed by automatic fingerprint recognition systems, whereas the background is a noisy region that contributes to the extraction of false minutiae. To avoid extracting false minutiae, several steps such as preprocessing and enhancement should be followed. One of these steps is the transformation of the fingerprint image from a gray-scale image to a black-and-white image; this transformation is called segmentation or binarization. The aim of fingerprint segmentation is to separate the foreground from the background, and the nature of fingerprint images makes this an important and challenging task. The proposed algorithm is applied to the FVC2000 database. Manual examination by human experts shows that the proposed algorithm provides efficient segmentation results, and these improved results are demonstrated in diverse experiments.
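The abstract does not detail the rule system, the split-and-merge step or the Otsu modification. This sketch only shows the standard ingredients it names: per-block mean, variance and gradient coherence, plus the classic (unmodified) Otsu threshold. The block size, the central-difference gradient and the function names are assumptions for illustration. Ridge areas typically show high variance and coherence close to 1, while smooth background blocks score low on both.

```python
import math


def block_features(img, r0, c0, size):
    """Mean, variance and gradient coherence of one size x size block.

    img is a list of rows of grey levels."""
    rows, cols = len(img), len(img[0])
    pix = [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    mean = sum(pix) / len(pix)
    var = sum((p - mean) ** 2 for p in pix) / len(pix)
    gxx = gyy = gxy = 0.0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            # central differences, clamped at the image border
            gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
            gxx += gx * gx
            gyy += gy * gy
            gxy += gx * gy
    denom = gxx + gyy
    # 1.0 for a single dominant orientation, near 0 for flat or isotropic blocks
    coherence = math.sqrt((gxx - gyy) ** 2 + 4 * gxy * gxy) / denom if denom else 0.0
    return mean, var, coherence


def otsu_threshold(values, levels=256):
    """Classic Otsu: threshold t maximising between-class variance
    (class 0 is values <= t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

A block of vertical ridge-like stripes yields coherence 1.0, whereas a uniform block yields zero variance and zero coherence. A rule system of the kind described would combine such scores, possibly with a threshold like Otsu's, to label each block as foreground or background.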

Abstract:

As applications and systems develop, the way we interact with them also changes. Until now, applications and systems have mostly been navigated and used by hand, with a mouse and keyboard. Recently, navigation via touchscreens and voice has become increasingly common. When an application is controlled by voice, it is important that anyone can control it, regardless of their dialect. To see how accurately a speech recognition API (Application Programming Interface) recognizes Swedish dialects, this study began with document studies of the characteristic features and sound combinations of the dialects. These features and sound combinations formed the basis for the words we selected to test the API. Each dialect was thus assigned a word constructed to be especially difficult for the API to recognize when pronounced in that particular dialect. A prototype was then developed, specifically an Android application that served as a data-collection tool. Since the work comprises both a prototype and an investigation, Design and Creation Research was chosen as the research strategy, with document studies and observations as the data-collection methods. Data were collected through observations, aided by the prototype, and through document studies. The empirical data recorded through the observations and the application showed that some dialects were easier for the API to recognize correctly. In some cases the results were as expected, since according to theory certain words built from particular sound combinations would be pronounced very distinctively in a given dialect. Sometimes these particular words yielded very low results, but in other cases surprisingly high ones. We concluded that the words we had selected in the expectation that they would score low for a particular dialect did so on only two occasions.
Instead, the word containing the sje- and tje-sounds, which according to theory are features common to all dialects, received the lowest results overall.