13 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI
in the Dalarna University College Electronic Archive
Abstract:
Condition monitoring of wooden railway sleepers is generally carried out by visual inspection, supplemented when necessary by impact acoustic examination performed intuitively by skilled personnel. In this work, a pattern recognition solution has been proposed to automate the process and achieve robust results. The study presents a comparison of several pattern recognition techniques together with various non-stationary feature extraction techniques for the classification of impact acoustic emissions. Pattern classifiers such as the multilayer perceptron, learning vector quantization and Gaussian mixture models are combined with non-stationary feature extraction techniques such as the Short-Time Fourier Transform, Continuous Wavelet Transform, Discrete Wavelet Transform and Wigner-Ville Distribution. Given the presence of several different feature extraction and classification techniques, data fusion has been investigated, mainly on two levels: the feature level and the classifier level. Fusion at the feature level demonstrated the best results, with an overall accuracy of 82% when compared to the human operator.
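The feature-level fusion described above can be sketched as concatenating the vectors produced by two extraction techniques before a single classifier sees them (classifier-level fusion would instead combine the outputs of separately trained classifiers). The sketch below is illustrative, not the thesis implementation: the frame length, decomposition depth, and the simple recursive Haar step standing in for a full Discrete Wavelet Transform are all assumptions.

```python
import numpy as np

def stft_features(signal, frame=64):
    # summarise magnitude spectra of overlapping Hann-windowed frames
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame)[::frame // 2]
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    return np.concatenate([mags.mean(axis=0), mags.std(axis=0)])

def haar_dwt_features(signal, levels=3):
    # recursive single-level Haar decomposition (a stand-in for a DWT library):
    # keep mean/std of detail coefficients at each level, plus the final approximation
    feats = []
    a = signal[: 2 ** int(np.log2(len(signal)))]
    for _ in range(levels):
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation coefficients
        feats += [np.mean(np.abs(d)), np.std(d)]
    feats += [np.mean(np.abs(a)), np.std(a)]
    return np.array(feats)

def fused_features(signal):
    # feature-level fusion: one concatenated vector feeds one classifier
    return np.concatenate([stft_features(signal), haar_dwt_features(signal)])
```

Any of the classifiers named in the abstract could then be trained on the fused vectors.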
Abstract:
Parkinson’s disease is a clinical syndrome manifesting with slowness and instability. As it is a progressive disease with varying symptoms, repeated assessments are necessary to determine the outcome of treatment changes in the patient. In the recent past, a computer-based method was developed to rate impairment in spiral drawings. The downside of this method is that it cannot separate bradykinetic from dyskinetic spiral drawings. This work intends to construct a computer method that can overcome this weakness by using the Hilbert-Huang Transform (HHT) of the tangential velocity. The work is done under supervised learning, so a target class is used, acquired from a neurologist through a web interface. After reducing the dimensionality of the HHT features using PCA, classification is performed with the C4.5 classifier. The classification results are close to random guessing, which shows that the computer method is unsuccessful in assessing the cause of drawing impairment in spirals when evaluated against human ratings. One possible explanation is that there is no difference between the two classes of spiral drawings. Another is that the patients' self-ratings were displayed alongside the spirals in the web application, and the neurologist may have relied too heavily on these in his own ratings.
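The pipeline the abstract describes (velocity features, PCA dimensionality reduction, then a decision-tree classifier) can be sketched as follows. This is a hedged illustration, not the thesis code: the synthetic spiral generator, the simple spectral stand-in for the HHT features, and the use of scikit-learn's `DecisionTreeClassifier` with the entropy criterion as a rough substitute for C4.5 are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def tangential_velocity(x, y, t):
    # pen-tip speed along the drawn spiral trace
    return np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2) / np.diff(t)

def velocity_features(v):
    # illustrative spectral summary; the thesis used Hilbert-Huang
    # Transform features of the tangential velocity instead
    spec = np.abs(np.fft.rfft(v - v.mean()))
    return np.array([v.mean(), v.std(), float(spec.argmax()), spec.max()])

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)

def synthetic_spiral(tremor):
    # Archimedean spiral; optional oscillation on the radius mimics tremor
    r = t + tremor * 0.3 * np.sin(40 * t) + 0.01 * rng.standard_normal(t.size)
    return r * np.cos(4 * t), r * np.sin(4 * t)

X, y = [], []
for label in (0, 1):               # 0 = smooth drawing, 1 = tremor-like
    for _ in range(20):
        xs, ys = synthetic_spiral(tremor=label)
        X.append(velocity_features(tangential_velocity(xs, ys, t)))
        y.append(label)
X, y = np.array(X), np.array(y)

# reduce dimensionality, then classify
Z = PCA(n_components=2).fit_transform(X)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(Z, y)
```

On real patient data the evaluation would of course be against held-out drawings and the neurologist's ratings, not the training set.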
Abstract:
This thesis examines the film sound of the war films Apocalypse Now and Saving Private Ryan. This was done in an attempt to contribute to a greater understanding of the uses and functions of film sound, primarily in the films in question, but also in war films in general. Film sound in this context encompasses all the sound present in a film, excluding all non-diegetic music. Both films were examined through an audio-visual analysis. Such an analysis is performed by scrutinising the sound and picture content of each film separately, and finally examining the same film sequence as a whole once sound and picture have been put back together. The audio-visual analysis method used in the thesis is Michel Chion's method, Masking. The 30 minutes of film that were analysed were then placed into different film-sound zones, where the sound content of each zone showed, among other things, the main functions that the film sound had in these films. These functions serve to maintain the viewer's focus and interest, to create closeness to the characters, and to add a strong sense of realism and presence. The intention of the film sound appeared to be to move the viewer into the reality of the film, to let the viewer become one with it. Reflecting this sense of realism, presence, focus and interest also proved to be among the intentions present as early as the pre-production stages of both films, which shows that the filmmakers succeeded in achieving what they strove for. Whether film sound is used in the same way, or has the same functions, in war films in general cannot, however, be said.

For this bachelor's thesis I have examined the film sound of the classic war films Apocalypse Now and Saving Private Ryan. This is an attempt to contribute to a more profound comprehension of the use and importance of film sound. In this context, film sound comprises all sounds within the films except non-diegetic music. The two films have been examined through an audio-visual analysis. This is done by auditing the sound and picture content separately, and then auditing the same sequence as a whole once they are combined. Michel Chion, the founder of this analysis, calls this method Masking. The sound in the 30-minute sequence was then divided into different zones, where every zone represented a certain main function. These functions serve to create a stronger connection to the characters, sustain the viewer's interest, and bring a sense of realism and presence. The intention of the film sound seems to be to bring the viewers into the scene at hand and let it become their reality. Mirroring this sense of realism, presence, focus and interest proved to be the intention from an early stage of the production, and this bachelor's thesis demonstrates that these endeavours were successful. It cannot, however, confirm whether film sound is used in the same manner, or has the same functions, in war films in general.
Abstract:
The project introduces an application using computer vision for hand gesture recognition. A camera records a live video stream, from which a snapshot is taken with the help of an interface. The system is trained for each type of counting hand gesture (one, two, three, four and five) at least once. A test gesture is then given to it, and the system tries to recognize it. Research was carried out on a number of algorithms that could best differentiate hand gestures, and the diagonal sum algorithm was found to give the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. The image is then converted into a binary image, and the sums of all diagonal elements of the picture are taken; these sums help in differentiating and classifying the different hand gestures. Previous systems have used data gloves or markers for input; this system has no such constraints, and the user can make hand gestures in view of the camera naturally. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extendible foundation for future work.
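As a rough illustration of the diagonal-sum idea, not the thesis's exact algorithm, the sums along every diagonal of a binary gesture image can form a feature vector that is matched against stored training gestures; nearest-neighbour matching by Euclidean distance is an assumption here.

```python
import numpy as np

def diagonal_sums(binary_img):
    # sum of pixels along each diagonal of the binary image,
    # from the bottom-left corner to the top-right corner
    n_rows, n_cols = binary_img.shape
    offsets = range(-(n_rows - 1), n_cols)
    return np.array([np.trace(binary_img, offset=k) for k in offsets])

def classify(test_img, templates):
    # pick the training gesture whose diagonal-sum vector is closest
    v = diagonal_sums(test_img)
    dists = {label: np.linalg.norm(v - diagonal_sums(t))
             for label, t in templates.items()}
    return min(dists, key=dists.get)
```

With background removal and binarisation done beforehand, each snapshot reduces to one such vector, which keeps the matching step cheap.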
Abstract:
This paper examines a popular music song (Heartbeats by Jose Gonzalez) as a sign system in television advertising. The study was conducted through qualitative questionnaires in connection with an audio-visual method of analysis called Masking. The method facilitates the analysis of isolated parts of the audio-visual spectrum by masking/hiding parts of the audio-visual totality. The survey had seven respondents, and a hermeneutic epistemological approach was used. For the analysis, Cooper's theory of brand identity (Practical and Symbolic Attitudes to Buying Brands) was used together with an interaction model for music in audio-visual advertising called "Modes of music-image interaction". The results showed that the music was associated with values such as genuineness, honesty, responsibility, purity, independence and innovation. The music's symbolic values helped to position the brand in a lifestyle context. The music also helped to express the target group's identity and attitudes by being innovative and independent, and it enhanced the perception of the visual colour rendition in the film. In general, the television advertisement was perceived as more positive and entertaining when the music was present. In other words, the music's social and cultural position contributed to raising the film's credibility. A deeper social and cultural value was created in the film through resonance between the symbolic values of the music and those of the film.
Abstract:
Literacy is an invaluable asset, and has allowed for communication, documentation and the spreading of ideas since the beginning of written language. With technological advancements and new possibilities to communicate, it is important to question the degree to which people's abilities to utilise these new methods have developed in relation to the emerging technologies. The purpose of this bachelor's thesis is to analyse the state of the multimodal literacy of students at Dalarna University, as well as their experience of multimodality in their education. This has led to the two main research questions: What is the state of the multimodal literacy of students at Dalarna University? And: How have students at Dalarna University experienced multimodality in education? The paper is based on a mixed-method study incorporating both quantitative and qualitative aspects. The main thrust of the research is, however, based on a quantitative study that was conducted online and emailed to students via their programme coordinators. The scope of the research is audio-visual modes, i.e. audio, video and images, while textual literacy is presumed and serves as an inspiration for the study. The study revealed that students at Dalarna University are most skilled in image editing, while not being very literate in audio or video editing. The students seem to have had mediocre experience of creating meaning through multimodality, both in private use and in their respective educational institutions. The study also reveals that students prefer learning by means of video (rather than text or audio), yet are not able to create meaning (communicate) through it.
Abstract:
Background: Voice processing in real time is challenging. A drawback of previous work on Hypokinetic Dysarthria (HKD) recognition is the requirement of controlled settings in a laboratory environment. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound processing capabilities, which allow for developing a module for the recognition and quantification of HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology is utilized. The first tier focuses on real-time problems in speech detection. In the second tier, acoustic features that are robust to medication changes in Levodopa-responsive patients are investigated for HKD recognition. Keywords such as "Hypokinetic Dysarthria" and "speech recognition in real time" were used in the search engines. IEEE Xplore produced the most useful search hits compared to Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the most relevant acoustic parameters for reflecting PD medication changes. Since the relevant speech segments (consonants and vowels) contain a minority of the speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak-to-average power ratio calculations for voice segmentation produce rich voice features in real time. Voice segmentation can be enhanced by introducing the zero-crossing rate (ZCR): consonants have a high ZCR, whereas vowels have a low ZCR. The wavelet transform is found promising for voice analysis, since it quantizes non-stationary voice signals over a time series using scale and translation parameters; in this way, voice intelligibility in the waveforms can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms in order to develop a tool for PD speech home assessment using modern mobile technology. An algorithm that tackles real-time constraints in HKD recognition, based on the review synthesis, is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for the detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.
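The ZCR heuristic mentioned in the results (high ZCR for consonants, low ZCR for vowels) can be sketched frame by frame. This is a minimal illustration: the frame length and threshold below are assumptions, not values from the review.

```python
import numpy as np

def zero_crossing_rate(frame):
    # fraction of adjacent sample pairs whose signs differ
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def segment(signal, frame_len=160, zcr_threshold=0.25):
    # label each fixed-length frame: high ZCR -> consonant-like,
    # low ZCR -> vowel-like (threshold is an illustrative assumption)
    n = len(signal) // frame_len
    labels = []
    for i in range(n):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        labels.append("consonant" if zero_crossing_rate(frame) > zcr_threshold
                      else "vowel")
    return labels
```

In a real-time pipeline this frame-wise labelling would be combined with the pause-detection and peak-to-average power calculations described above.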
Abstract:
Loop technique in solo contexts is today a well-established form of music-making, but the possibilities of using the technique in ensemble form remain a more or less untried field. The project "Audiovisuella loopar" ("Audio-visual loops"), carried out at Högskolan Dalarna, presents a system for collective live looping in which up to four people make loop music together, while the looping can simultaneously be recorded and played back as video clips. The technique has proven to have strong creative potential. With only a brief instruction, a collective process begins in which the immediate feedback creates a "flow" that brings out the joy of creation in the musicians. These experiences point to exciting possibilities for using the technique in music education and music therapy.
Abstract:
The ever-increasing spurt in digital crimes such as image manipulation, image tampering, signature forgery, image forgery and illegal transactions has intensified the demand to combat these forms of criminal activity. In this direction, biometrics, the computer-based validation of a person's identity, is becoming more and more essential, particularly for high-security systems. The essence of biometrics is the measurement of a person's physiological or behavioural characteristics, which enables authentication of that person's identity. Biometric-based authentication is also becoming increasingly important in computer-based applications because the amount of sensitive data stored in such systems is growing. The new demands on biometric systems are robustness, high recognition rates, the capability to handle imprecision and uncertainties of a non-statistical kind, and great flexibility. It is exactly here that soft computing techniques come into play. The main aim of this write-up is to present a pragmatic view of applications of soft computing techniques in biometrics and to analyse their impact. It is found that soft computing has already made inroads, in terms of individual methods or in combination. Applications of varieties of neural networks top the list, followed by fuzzy logic and evolutionary algorithms. In a nutshell, soft computing paradigms are used for biometric tasks such as feature extraction, dimensionality reduction, pattern identification, pattern mapping and the like.
Abstract:
For those who are not new to the world of Japanese animation, known mainly as anime, the "dub vs. sub" debate is by no means out of the ordinary, but rather a heated argument among fans. The study focuses on the differences in the US English version between two approaches to translating audio-visual media, namely subtitling (official subtitles and fan-made subtitles) and dubbing, in a qualitative context: more precisely, which of the approaches can retain the most information from the same audio-visual segment, in order to satisfy the needs of the anime audience. In order to draw substantial conclusions, the analysis is conducted on a corpus of one episode from the first season of the popular mid-nineties animated TV series Sailor Moon. The main objective of this research is to analyse the three versions and compare the findings to what anime fans expect each of them to provide, in terms of how culture-specific terms are handled, how accurate the translation is, localisation, censorship and omission. As for the fans' opinions, the study includes a survey on fans' personal preference when choosing between the official subtitled version, the fan-made subtitles and the dubbed version.
Abstract:
The purpose of this work-in-progress study was to test the concept of recognising plants using images acquired by image sensors in a controlled, noise-free environment. The presence of vegetation on railway trackbeds and embankments presents potential problems. Woody plants (e.g. Scots pine, Norway spruce and birch) often establish themselves on railway trackbeds. This may cause problems because legal herbicides are not effective in controlling them; this is particularly the case for conifers. Thus, if maintenance administrators knew the spatial positions of plants along the railway system, it might be feasible to mechanically harvest them. Primary data comprising around 700 leaves and conifer seedlings from 11 species were collected outdoors and then photographed in a laboratory environment. In order to classify the species in the acquired image set, a machine learning approach known as Bag-of-Features (BoF) was chosen. Irrespective of the chosen type of feature extraction and classifier, the ability to classify a previously unseen plant correctly was greater than 85%. The maintenance planning of vegetation control could be improved if plants, in particular woody plants, were recognised and localised for mechanical harvesting. In addition, listed endangered species growing on the trackbeds could be avoided; where relevant, species can also be listed under the Endangered Species Act. Both cases are likely to reduce the amount of herbicides used, which is often in the public interest. Bearing in mind that natural objects like plants are often more heterogeneous within their own class than outside it, the results do present a stable classification performance, which is a sound prerequisite for the next step of including a natural background.
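The Bag-of-Features approach can be sketched as: extract local descriptors from each image, cluster all training descriptors into a codebook of "visual words", represent each image as a histogram of word occurrences, and train a classifier on those histograms. The sketch below is illustrative only: the flattened raw patches (real BoF systems typically use descriptors such as SIFT), the toy striped "leaf" images, the codebook size, and the linear SVM are all assumptions, not details from the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def patch_descriptors(img, patch=4):
    # non-overlapping patches flattened as simple local descriptors
    h, w = img.shape
    return np.array([img[i:i + patch, j:j + patch].ravel()
                     for i in range(0, h - patch + 1, patch)
                     for j in range(0, w - patch + 1, patch)])

def bof_histogram(img, codebook):
    # quantise each descriptor to its nearest visual word, then histogram
    words = codebook.predict(patch_descriptors(img))
    return np.bincount(words, minlength=codebook.n_clusters) / len(words)

rng = np.random.default_rng(0)

def leaf_image(species):
    # toy 16x16 "leaf photographs": two species with different textures
    base = np.tile([0.0, 1.0], 128).reshape(16, 16)   # vertical stripes
    img = base if species == 0 else base.T            # horizontal stripes
    return img + 0.1 * rng.standard_normal((16, 16))

train = [(leaf_image(s), s) for s in [0, 1] * 10]

# build the codebook from all training descriptors, then train on histograms
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(
    np.vstack([patch_descriptors(im) for im, _ in train]))
H = np.array([bof_histogram(im, codebook) for im, _ in train])
labels = [s for _, s in train]
clf = SVC(kernel="linear").fit(H, labels)
```

Because the histogram discards descriptor positions, the representation tolerates the within-class variation that makes natural objects like plants hard to classify.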
Abstract:
As applications and systems develop, the way we interact with them also changes. So far, navigation and use of applications and systems have mostly been done by hand, through mouse and keyboard. More recently, navigation via touch screens and the voice has become increasingly common. When an application is to be controlled by voice, it is important that anyone can control it, regardless of their dialect. To examine how correctly a speech recognition API (Application Programming Interface) perceives Swedish dialects, this study began with document studies of the characteristics and sound combinations of the dialects. These characteristics and sound combinations formed the basis for the words we selected to test the API with: each dialect was given a word constructed to be particularly difficult for the API to perceive when pronounced in that specific dialect. A prototype was then developed, namely an Android application, which served as a tool in the data collection. Since the work includes both a prototype and a survey, Design and Creation Research was chosen as the research strategy, with document studies and observations as the data collection methods. Data were collected through observations, with the prototype as an aid, and through document studies. The empirical data recorded through the observations and the application showed that some dialects were easier for the API to perceive correctly. In some cases the results were expected, since certain words built from sound combinations would, according to theory, be pronounced very distinctively in a certain dialect; sometimes these words received very low scores, but in other cases surprisingly high ones. Our conclusion was that the words selected in the expectation that they would score poorly for a particular dialect only did so on two occasions. Instead, it was the word containing the "sje" and "tje" sounds, which according to theory are characteristics common to all dialects, that received the lowest scores overall.