5 results for robust speech recognition

in Dalarna University College Electronic Archive


Relevance:

100.00%

Abstract:

Background: Voice processing in real time is challenging. A drawback of previous work on Hypokinetic Dysarthria (HKD) recognition is the requirement for controlled laboratory settings. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound processing capabilities, which make it possible to develop a module for recognition and quantification of HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology is used. The first tier focuses on real-time problems in speech detection. In the second tier, acoustic features that are robust to medication changes in Levodopa-responsive patients are investigated for HKD recognition. Keywords such as "Hypokinetic Dysarthria" and "Speech recognition in real time" were used in the search engines. IEEE Xplore produced the most useful search hits compared to Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the most relevant acoustic parameters for reflecting PD medication changes. Since the relevant speech segments (consonants and vowels) contain only a minority of the speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak-to-average power ratio calculations for voice segmentation produce rich voice features in real time. Voice segmentation can be enhanced by incorporating the zero-crossing rate (ZCR): consonants have a high ZCR, whereas vowels have a low ZCR. The wavelet transform is found promising for voice analysis since it decomposes non-stationary voice signals over time using scale and translation parameters. In this way, voice intelligibility in the waveforms can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms with the aim of developing a tool for PD speech home assessment using modern mobile technology.
An algorithm that tackles real-time constraints in HKD recognition, based on the review synthesis, is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.
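The segmentation cues named in the Results (pause detection, peak-to-average power ratio, and ZCR separating consonants from vowels) can be illustrated with a minimal frame-based sketch. It assumes NumPy, 25 ms frames with a 10 ms hop, and hypothetical thresholds (`energy_ratio`, `zcr_threshold`); the function names are illustrative and not taken from the review.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean power per frame; near-zero values indicate pauses."""
    return np.mean(frames ** 2, axis=1)

def peak_to_average_power(frames, eps=1e-12):
    """Peak-to-average power ratio per frame (about 2 for a pure sine)."""
    power = frames ** 2
    return power.max(axis=1) / (power.mean(axis=1) + eps)

def label_frames(x, sr, frame_ms=25, hop_ms=10, energy_ratio=0.05, zcr_threshold=0.25):
    """Label frames as 'pause', 'consonant-like' or 'vowel-like'.

    Hypothetical rule: low energy -> pause; otherwise high ZCR -> consonant-like,
    low ZCR -> vowel-like. Thresholds are illustrative, not from the review.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    voiced = np.where(zcr > zcr_threshold, "consonant-like", "vowel-like")
    return np.where(energy < energy_ratio * energy.max(), "pause", voiced)
```

For example, a 150 Hz tone (vowel-like, low ZCR), followed by silence and then white noise (fricative-like, ZCR near 0.5), is labelled segment by segment. A real system would adapt the thresholds per speaker and recording environment.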

Relevance:

80.00%

Abstract:

As applications and systems continue to develop, the way we interact with them changes as well. Until now, applications and systems have mostly been navigated and used by hand, via mouse and keyboard. Recently, navigation via touch screens and voice has become increasingly common. When an application is controlled by voice, it is important that anyone can control it, regardless of their dialect. To determine how accurately a speech recognition API (Application Programming Interface) recognizes Swedish dialects, this study began with document studies of the characteristic features and sound combinations of the dialects. These features and sound combinations formed the basis for the words we selected to test the API. Each dialect was thus assigned a word constructed to be especially difficult for the API to recognize when pronounced in that particular dialect. A prototype was then developed, specifically an Android application that served as a tool for data collection. Since the work comprises both a prototype and an investigation, Design and Creation Research was chosen as the research strategy, with document studies and observations as the data collection methods. Data were collected through observations, using the prototype as an aid, and through document studies. The empirical data recorded through the observations and the application showed that some dialects were easier for the API to recognize correctly. In some cases the results were as expected, since, according to theory, certain words built from particular sound combinations would be pronounced very distinctively in a given dialect. Sometimes these very words scored very low, but in other cases surprisingly high. Our conclusion was that the words selected in the expectation that they would score low for a specific dialect did so on only two occasions.
Instead, the word containing the sje- and tje-sounds, which according to theory are characteristic features shared by all the dialects, received the lowest results overall.

Relevance:

30.00%

Abstract:

For the last two decades, researchers have been developing systems that assist drivers as well as possible and make driving safer. Computer vision has played a crucial part in the design of these systems. With the introduction of vision techniques, various autonomous and robust real-time traffic automation systems have been designed, such as traffic monitoring, traffic-related parameter estimation and intelligent vehicles. Among these, automatic detection and recognition of road signs has become an interesting research topic. Such a system can alert drivers to signs they do not notice before passing them. The aim of this research project is to present an Intelligent Road Sign Recognition System based on a state-of-the-art technique, the Support Vector Machine. The project is an extension of the work done at the ITS research platform at Dalarna University [25]. This work focuses on the recognition of the road signs under analysis. When an image is classified, the sign's location, size and orientation in the image plane are irrelevant, and one way to remove this ambiguity is to extract features that are invariant under these transformations. These invariant features are then used in a Support Vector Machine for classification. A Support Vector Machine is a supervised learning method that, with the help of kernel functions, solves problems in a higher-dimensional space; it is best known for classification problems.
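The abstract does not say which invariant features were extracted. One standard choice, assumed here for illustration, is Hu's moment invariants: normalized central moments $\eta_{pq} = \mu_{pq} / \mu_{00}^{1+(p+q)/2}$ remove translation and scale, and Hu's combinations of them also remove rotation. The sketch below computes the first three Hu invariants with NumPy; the resulting vectors could then be passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`), which is not shown.

```python
import numpy as np

def normalized_central_moment(img, p, q):
    """eta_pq: central moment mu_pq divided by m00^(1 + (p + q) / 2)."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    xc = (x * img).sum() / m00  # centroid removes translation
    yc = (y * img).sum() / m00
    mu_pq = (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
    return mu_pq / m00 ** (1 + (p + q) / 2)  # normalization removes scale

def hu_features(img):
    """First three Hu invariants (translation, scale and rotation invariant)."""
    n = {pq: normalized_central_moment(img, *pq)
         for pq in [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3), (2, 1), (1, 2)]}
    phi1 = n[2, 0] + n[0, 2]
    phi2 = (n[2, 0] - n[0, 2]) ** 2 + 4 * n[1, 1] ** 2
    phi3 = (n[3, 0] - 3 * n[1, 2]) ** 2 + (3 * n[2, 1] - n[0, 3]) ** 2
    return np.array([phi1, phi2, phi3])
```

Moving, enlarging or rotating a sign in the image leaves these three numbers (nearly) unchanged, which is exactly the property the abstract asks of the features fed to the classifier.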

Relevance:

30.00%

Abstract:

The project introduces an application that uses computer vision for hand gesture recognition. A camera records a live video stream, from which a snapshot is taken through the interface. The system is trained at least once on each counting gesture (one, two, three, four and five). After that, a test gesture is given to it and the system tries to recognize it. A number of algorithms were studied to find the one that best differentiates hand gestures, and the diagonal sum algorithm gave the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. The image is then converted into a binary image and the sums of all diagonal elements of the picture are taken. These sums help differentiate and classify the hand gestures. Previous systems have used data gloves or markers for input. My system has no such constraints: the user can make hand gestures naturally in view of the camera. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extensible foundation for future work.
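The diagonal sum step can be sketched as follows, under stated assumptions: "the sums of all diagonal elements" is read as one sum per diagonal of the binary image (giving a feature vector of length height + width − 1), all snapshots share one fixed size, the binarization threshold is hypothetical, and a test gesture is matched to the nearest stored training vector. Background removal is not reproduced, and the class and function names are illustrative only.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Convert a grayscale image to a 0/1 image (threshold is hypothetical)."""
    return (np.asarray(gray) >= threshold).astype(np.uint8)

def diagonal_sums(binary):
    """One sum per diagonal, from the bottom-left corner to the top-right corner."""
    h, w = binary.shape
    return np.array([np.trace(binary, offset=k) for k in range(-(h - 1), w)], dtype=float)

class DiagonalSumClassifier:
    """Nearest-template matcher on diagonal-sum vectors (an assumed design)."""

    def __init__(self):
        self.templates = []  # list of (label, feature vector)

    def train(self, label, gray):
        self.templates.append((label, diagonal_sums(binarize(gray))))

    def predict(self, gray):
        feature = diagonal_sums(binarize(gray))
        distances = [np.linalg.norm(feature - t) for _, t in self.templates]
        return self.templates[int(np.argmin(distances))][0]
```

Training once per gesture, as in the abstract, means each label has a single template, so a small amount of pixel noise in the test snapshot still leaves it closest to the correct gesture.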

Relevance:

30.00%

Abstract:

This thesis presents a system to recognise and classify road and traffic signs in order to build an inventory of them, which could assist highway engineers in updating and maintaining the signs. It uses images taken by a camera mounted on a moving vehicle. The system is based on three major stages: colour segmentation, recognition and classification. Four colour segmentation algorithms are developed and tested: a shadow and highlight invariant algorithm, a dynamic threshold algorithm, a modification of de la Escalera's algorithm and a fuzzy colour segmentation algorithm. All algorithms are tested on hundreds of images, and the shadow and highlight invariant algorithm is eventually chosen as the best performer because it is immune to shadows and highlights. It is also robust, as it was tested under different lighting conditions, weather conditions and times of day. A successful segmentation rate of approximately 97% was achieved with this algorithm. Recognition of traffic signs is carried out using a fuzzy shape recogniser. Fuzzy rules based on four shape measures (rectangularity, triangularity, ellipticity and octagonality) were developed to determine the shape of the sign; octagonality is introduced in this research. The final decision of the recogniser is based on the combination of both the colour and the shape of the sign. The recogniser was tested under a variety of conditions, giving an overall performance of approximately 88%. Classification was undertaken using a Support Vector Machine (SVM) classifier in two stages: classification of the rim's shape, followed by classification of the interior of the sign. The classifier was trained and tested using binary images in addition to five different types of features: geometric moments, Zernike moments, Legendre moments, orthogonal Fourier-Mellin moments and binary Haar features.
The performance of the SVM was tested using different features, kernels, SVM types, SVM parameters and moment orders. The average classification rate achieved is about 97%. Binary images give the best test results, followed by Legendre moments. The linear kernel gives the best test results, followed by the RBF kernel. C-SVM shows very good performance, but ν-SVM gives better results in some cases.
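The abstract names the shape measures but not their formulas. As an assumption, the sketch below uses Rosin's moment-based measures for two of them: the affine moment invariant $I_1 = (\mu_{20}\mu_{02} - \mu_{11}^2)/\mu_{00}^4$ equals $1/(16\pi^2)$ for any ellipse and $1/108$ for any triangle, so each measure is the ratio of $I_1$ to its reference value, folded into $(0, 1]$. Rectangularity is approximated, again as an assumption, by region area over the area of its axis-aligned bounding box. Octagonality and the fuzzy rules that combine the measures are not reproduced.

```python
import numpy as np

def _central_moments(mask):
    """Return m00, mu20, mu02, mu11 of a binary region."""
    ys, xs = np.nonzero(mask)
    dx = xs - xs.mean()
    dy = ys - ys.mean()
    return float(len(xs)), (dx ** 2).sum(), (dy ** 2).sum(), (dx * dy).sum()

def affine_invariant_i1(mask):
    m00, mu20, mu02, mu11 = _central_moments(mask)
    return (mu20 * mu02 - mu11 ** 2) / m00 ** 4

def _closeness(value, reference):
    """1.0 when value equals reference, smaller the further away it is."""
    return value / reference if value <= reference else reference / value

def ellipticity(mask):
    return _closeness(affine_invariant_i1(mask), 1.0 / (16 * np.pi ** 2))

def triangularity(mask):
    return _closeness(affine_invariant_i1(mask), 1.0 / 108)

def rectangularity(mask):
    """Area over axis-aligned bounding-box area (a rotation-sensitive simplification)."""
    ys, xs = np.nonzero(mask)
    box = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return len(xs) / box
```

Because $I_1$ is affine invariant, the ellipticity and triangularity scores do not depend on the sign's size or on moderate perspective skew, which suits images taken from a moving vehicle.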