14 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

100.00%

Abstract:

Speaker diarization is the process of sorting speech segments according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example, lectures, telephone, television, and radio. In addition, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system in the meeting domain. The experimental part of this work indicates that zero-crossing rate can be used effectively to break down the audio stream into segments, and that adaptive Gaussian Models fit short audio segments adequately. Results show that 35 Gaussian Models and an average segment length of one second are the optimum values for building a diarization system for the tested data. Segments uttered by the same speaker are united through bottom-up clustering, using a new approach of categorizing the mixture weights.
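The zero-crossing-rate segmentation mentioned in this abstract can be illustrated with a minimal sketch. The frame length, the threshold, and the function names `zero_crossing_rate` and `segment_by_zcr` are assumptions made for illustration; the abstract does not state the thesis's actual segmentation rule.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def segment_by_zcr(samples, frame_len, threshold):
    """Group consecutive frames that fall on the same side of a ZCR threshold.

    Hypothetical rule: a segment boundary is placed wherever the frame-level
    ZCR switches between "above threshold" and "not above threshold".
    Returns a list of (start_index, end_index, is_high_zcr) tuples.
    """
    segments = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        high = zero_crossing_rate(frame) > threshold
        if segments and segments[-1][2] == high:
            # Same class as the previous frame: extend the current segment.
            segments[-1] = (segments[-1][0], start + frame_len, high)
        else:
            segments.append((start, start + frame_len, high))
    return segments
```

In a sketch like this, each resulting segment would then be handed to a later modelling step (the abstract mentions adaptive Gaussian models) rather than being interpreted on its own.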

Relevance:

100.00%

Abstract:

Localization, the ability of a mobile robot to estimate its position within its environment, is a key capability for the autonomous operation of any mobile robot. This thesis presents a system for indoor coarse and global localization of a mobile robot based on visual information. The system is based on image matching and uses SIFT features as natural landmarks. Features extracted from training images are stored in a database for later use in localization. During localization, an image of the scene is captured using the on-board camera of the robot, features are extracted from the image, and the best match is searched for in the database. Feature matching is done using the k-d tree algorithm. Experimental results showed that localization accuracy increases with the number of training features in the database, while an increasing number of features tended to have a negative impact on computation time. For some parts of the environment the error rate was relatively high due to a strong correlation between features taken from those places and features from elsewhere in the environment.
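The matching step described in this abstract (query descriptors looked up in a database through a k-d tree) can be sketched as follows. This is an illustrative exact nearest-neighbour search on toy 2-D vectors, not the thesis's implementation. Real SIFT descriptors are 128-dimensional, where approximate k-d tree search is commonly used instead, and the place-voting rule in `localize` is an assumption.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of (vector, place_label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared_distance, (vector, label)) of the closest stored point."""
    if node is None:
        return best
    vec = node["point"][0]
    d = sq_dist(vec, query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


def localize(tree, query_descriptors):
    """Hypothetical decision rule: each query feature votes for the place
    label of its nearest database feature; the most-voted place wins."""
    votes = Counter(nearest(tree, q)[1][1] for q in query_descriptors)
    return votes.most_common(1)[0][0]
```

The sketch also makes the reported trade-off visible: a larger database gives each query feature more candidate matches, but every lookup has a deeper tree to traverse.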

Relevance:

100.00%

Abstract:

This master's thesis examines speaker recognition and its usefulness for verifying a user's identity as part of value-added services in the telephone network. Telephone-controlled services have usually been based on tone-dialing (DTMF) selections sent with the telephone keypad. The user's identity has been verified, for example, by means of a user ID and a secret PIN code. In the future, services may be based on speech recognition, in which case verifying the user by voice also seems sensible. The thesis first introduces various biometric identification methods. It then examines voice-based speaker verification in more detail. In the practical part of the work, a prototype of a speaker verification application suitable for telephone network services was implemented. The purpose of the work was to investigate the possibilities for using the technology and to gather experience in implementing a speaker verification service with a view to the future. Java was used as the programming language for the prototype.

Relevance:

100.00%

Abstract:

This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus, the approaches apply to all other languages as well. This is possibly due to the fact that most of the studies included in this dissertation are about universal effects taking place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. This study is based on two corpora of news reading speech and sentences read aloud. One corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The purpose of using two corpora is twofold: it allows a comparison of the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the difference in duration between short and long phonemes. The phoneme categories are presented to alleviate the problem of variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In utterance-initial positions we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first phonemes at the beginning of an utterance. However, on the word level, the effect on average seems to shorten the whole first word. We establish the effect of final lengthening in Finnish. 
The existence of the effect in Finnish has long been an open question, and Finnish has been the last missing piece for final lengthening to be considered a universal phenomenon. Final lengthening is studied from various angles, and it is also shown that it is not a mere effect of prominence or an artefact of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This prevents various problematic variations within the corpora from affecting the results. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation also presents an implementation of speech synthesis on a mobile platform and its performance. We find that the rule-based method of speech synthesis runs in real time as a software solution, but the signal generation process slows the system down beyond real time. Future aspects of speech synthesis on limited platforms are discussed. Finally, the dissertation considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to any other speech technology approaches.
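One way to picture an utterance-level normalization of segmental durations is a per-utterance z-score. This choice is an assumption made for illustration; the abstract does not specify the formula the dissertation uses, and the function name `normalize_utterance` is hypothetical.

```python
from statistics import mean, pstdev


def normalize_utterance(durations_ms):
    """Z-score the segment durations of ONE utterance against that utterance's
    own mean and standard deviation (an assumed normalization, not necessarily
    the dissertation's). Differences in overall speaking rate between
    utterances drop out, while the within-utterance duration pattern remains.
    """
    mu = mean(durations_ms)
    sigma = pstdev(durations_ms)
    if sigma == 0:
        # All segments equally long: there is no pattern to express.
        return [0.0 for _ in durations_ms]
    return [(d - mu) / sigma for d in durations_ms]
```

Under this assumption, a slow and a fast rendition of the same duration pattern, for example `[100, 200, 300]` and `[50, 100, 150]` milliseconds, map to the same normalized values, which is the kind of variation the abstract says the normalization is meant to neutralize.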

Relevance:

100.00%

Abstract:

The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods varies significantly in object class matching, 2) local features can be applied in image alignment with results superior to the state of the art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method shows a promising new research direction. In conclusion, this thesis presents local features as a powerful tool in many applications, and immediate future work should concentrate on improving the quality of the local features.

Relevance:

100.00%

Abstract:

Feature extraction is the part of pattern recognition where the sensor data is transformed into a form more suitable for the machine to interpret. The purpose of this step is also to reduce the amount of information passed to the next stages of the system while preserving the information essential for discriminating the data into different classes. For instance, in image analysis the actual image intensities are vulnerable to various environmental effects, such as lighting changes, and feature extraction can be used as a means of detecting features that are invariant to certain types of illumination changes. Finally, classification tries to make decisions based on the previously transformed data. The main focus of this thesis is on developing new methods for embedded feature extraction based on local non-parametric image descriptors. Feature analysis is also carried out for the selected image features. Low-level Local Binary Pattern (LBP) based features play a main role in the analysis. In the embedded domain, the pattern recognition system must usually meet strict performance constraints, such as high speed, compact size and low power consumption. The characteristics of the final system can be seen as a trade-off between these metrics, which is largely affected by the decisions made during the implementation phase. The implementation alternatives of LBP-based feature extraction are explored in the embedded domain in the context of focal-plane vision processors. In particular, the thesis demonstrates LBP extraction with the MIPA4k massively parallel focal-plane processor IC. Higher-level processing is also incorporated into this framework, by means of a framework for implementing a single-chip face recognition system. Furthermore, a new method for determining optical flow based on LBPs, designed in particular for the embedded domain, is presented. 
Inspired by some of the principles observed through the feature analysis of the Local Binary Patterns, an extension to the well-known non-parametric rank transform is proposed, and its performance is evaluated in face recognition experiments with a standard dataset. Finally, an a priori model where the LBPs are seen as combinations of n-tuples is also presented.
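For readers unfamiliar with Local Binary Patterns, the basic 8-neighbour operator can be sketched in a few lines. This is the generic textbook form on a plain list-of-lists grayscale image, not the focal-plane implementation described in the thesis. The neighbour ordering and the `>=` comparison are conventions assumed here.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The starting point and direction are an assumed convention.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit i is set when the i-th
    neighbour is at least as bright as the centre pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

Because only the ordering between each neighbour and the centre matters, adding a constant brightness offset to every pixel leaves the codes unchanged, which is one concrete case of the illumination invariance the abstract refers to.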

Relevance:

100.00%

Abstract:

The subject of this thesis was the acquisition of difficult non-native vowels by speakers of two different languages. In order to study the subject, a group of Finnish speakers and another group of American English speakers were recruited and they underwent a short listen-and-repeat training that included as stimuli the semisynthetically created pseudowords /ty:ti/ and /tʉ:ti/. The aim was to study the effect of the training method on the subjects as well as the possible influence of the speakers’ native language on the process of acquisition. The selection of the target vowels /y/ and /ʉ/ was made according to the Speech Learning Model and Perceptual Assimilation Model, both of which predict that second language speech sounds that share similar features with sounds of a person’s native language are most difficult for the person to learn. The vowel /ʉ/ is similar to Finnish vowels as well as to vowels of English, whereas /y/ exists in Finnish but not in English, although it is similar to other English vowels. Therefore, it can be hypothesized that /ʉ/ is a difficult vowel for both groups to learn and /y/ is difficult for English speakers. The effect of training was tested with a pretest-training-posttest protocol in which the stimuli were played alternately and the subjects’ task was to repeat the heard stimuli. The training method was thought to improve the production of non-native sounds by engaging different feedback mechanisms, such as auditory and somatosensory. These, according to Template Theory, modify the production of speech by altering the motor commands from the internal speech system or the feedforward signal which translates the motoric commands into articulatory movements. The subjects’ productions during the test phases were recorded and an acoustic analysis was performed in which the formant values of the target vowels were extracted. 
Statistical analyses showed a statistically significant difference between groups in the first formant, signaling a possible effect of native motor commands. Furthermore, a statistically significant difference between groups was observed in the standard deviation of the formants in the production of /y/, showing the uniformity of native production. The training had no observable effect, possibly due to the short nature of the training protocol.

Relevance:

80.00%

Abstract:

Convolutional Neural Networks (CNNs) have become the state-of-the-art method for many large-scale visual recognition tasks. For many practical applications, CNN architectures have a restrictive requirement: a huge amount of labeled data is needed for training. The idea of generative pretraining is to obtain initial weights of the network by training the network in a completely unsupervised way and then to fine-tune the weights for the task at hand using supervised learning. In this thesis, a general introduction to Deep Neural Networks and algorithms is given, and these methods are applied to classification tasks of handwritten digits and natural images in order to develop unsupervised feature learning. The goal of this thesis is to find out whether the effect of pretraining is damped by recent practical advances in the optimization and regularization of CNNs. The experimental results show that pretraining is still a substantial regularizer, but not a necessary step in training Convolutional Neural Networks with rectified activations. On handwritten digits, the proposed pretraining model achieved a classification accuracy comparable to the state-of-the-art methods.

Relevance:

60.00%

Abstract:

Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval and object recognition, to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of the extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched for using random image pairs of the same class. The goal of this thesis is to find out which detector and descriptor methods are best for this task in terms of detector repeatability and descriptor matching rate.

Relevance:

60.00%

Abstract:

During a possible loss-of-coolant accident in BWRs, a large amount of steam will be released from the reactor pressure vessel to the suppression pool. The steam will condense in the suppression pool, causing dynamic and structural loads on the pool. The formation and break-up of bubbles can be measured by visual observation using a suitable pattern recognition algorithm. The aim of this study was to improve, using MATLAB, the preliminary pattern recognition algorithm developed by Vesa Tanskanen in his doctoral dissertation. Video material from the PPOOLEX test facility, recorded during thermal stratification and mixing experiments, was used as a reference in the development of the algorithm. The developed algorithm consists of two parts: the pattern recognition of the bubbles and the analysis of the recognized bubble images. The bubble recognition works well, but some errors appear due to the complex structure of the pool. The results of the image analysis were reasonable. The volume and the surface area of the bubbles were not evaluated. Chugging frequencies calculated using FFT agreed well with the oscillation frequencies measured in the experiments. The pattern recognition algorithm works in the conditions it was designed for. If the measurement configuration is changed, some modifications will have to be made. Numerous improvements are proposed for the future 3D equipment.
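The frequency estimate mentioned in this abstract can be illustrated with a short sketch that finds the dominant frequency of a sampled signal, for instance a per-frame bubble-size series. For self-containment it uses a naive discrete Fourier transform from the Python standard library. An FFT, as used in the study, computes the same spectrum faster. The input series, the sample rate and the function name `dominant_frequency` are hypothetical.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC spectral component.

    Naive O(n^2) DFT for illustration; an FFT yields the same magnitudes.
    """
    n = len(signal)
    offset = sum(signal) / n
    centered = [s - offset for s in signal]  # remove the constant (DC) part
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n
```

The frequency resolution of such an estimate is `sample_rate_hz / n`, so a longer recording at the same frame rate gives a finer frequency grid.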

Relevance:

60.00%

Abstract:

Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive. Yet real-time performance, high accuracy and low power consumption are essential measures of the system. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions: which algorithms have been implemented in hardware and what modifications have been made in order to adapt these algorithms to hardware.

Relevance:

50.00%

Abstract:

This thesis reports the development of an automatic analysis system for high-speed image sequences taken of hybrid welding. The purpose of the system was to produce information that would help the analyst assess the quality of the filmed welding process. The research focused on measuring the regularity of the arc frequency and the flight directions of filler-material droplets. Arcs were detected in the image sequences using the fuzzy c-means clustering method, and the time interval between consecutive arcs was used as the measure of the regularity of the arc frequency. Droplets were located with a method that combined principal component analysis and a support vector classifier. A Kalman filter was used to produce estimates of the droplets' flight directions and velocities. The flight-direction determination method classified the droplets on the basis of their estimated flight directions. The image sequences available for developing the system differed considerably from one another in image quality and droplet appearance, owing to differences in the imaging and welding processes. The analysis system was developed to work on a small subset of image sequences that shared a particular imaging and welding process and had similar image quality and droplet appearance, but the system was also tested on image sequences outside the subset. The test results showed that the accuracy of flight-direction determination was reasonably high within the subset and low in the other image sequences. The determination of the regularity of the arc frequency was accurate in several image sequences.

Relevance:

50.00%

Abstract:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. The system is unsupervised, and hence it does not need known images for training, which would have to be obtained manually. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for an image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
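The pipeline outlined in this abstract (local features, then a codebook, then a per-image feature vector) resembles the common bag-of-visual-words scheme, which can be sketched as below. The use of k-means for the codebook, the naive initialisation, and all function names are assumptions; the abstract does not say how the thesis builds its codebook.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def build_codebook(descriptors, k, iterations=10):
    """Plain k-means over descriptors pooled from all images (assumed method).

    Naive initialisation: the first k descriptors serve as starting centres.
    """
    codebook = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster went empty
                codebook[i] = [sum(v) / len(members) for v in zip(*members)]
    return codebook


def bow_histogram(image_descriptors, codebook):
    """Normalised histogram of codebook-word occurrences for one image."""
    hist = [0] * len(codebook)
    for d in image_descriptors:
        hist[nearest_word(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The resulting fixed-length histograms are what a clustering step would then group, so that images dominated by the same visual words end up in the same category.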

Relevance:

50.00%

Abstract:

Predation is an important selective force that has led to the evolution of a variety of fascinating anti-predator adaptations, such as many types of protective coloration and prey behaviours. Because the evolution of life began in the aquatic environment and many anti-predator adaptations are already found in relatively primitive taxa, it is likely that many of these adaptations evolved initially in the aquatic environment. Yet, there has been surprisingly little research on the mechanisms and function of anti-predator adaptations in aquatic systems. To understand the function of anti-predator adaptations and natural selection imposed on prey appearance and behaviour, I have investigated how protective coloration can be used, either as such or together with behavioural adaptations, to manipulate predator behaviour and decrease predation risk. To this end I conducted a series of behavioural ecology laboratory experiments in which I manipulated the visual appearance of artificial backgrounds and prey items. In paper I of this thesis, I investigated background choice as an anti-predator strategy, by observing the habitat choice of the least killifish (Heterandria formosa) between pairs of artificial backgrounds, both in the presence and absence of predation threat. It has been suggested that prey could decrease their risk of being detected by predators either by preferring backgrounds into which they blend or by preferring visually complex backgrounds. The least killifish preferred a background that matched their patterning to a background that mismatched it, showing that they are able to respond to cues of visual similarity between their colour pattern and the surrounding environment. Interestingly, however, in female least killifish visual complexity of the background was a more important cue for habitat safety and may override or act together with background matching when searching for a safe habitat. 
It is possible that in females, preference for visually complex backgrounds is associated with lower opportunity costs than preference for matching backgrounds would be. Generally, the least killifish showed stronger preference while under predation threat, indicating that their background choice behaviour is an anti-predator adaptation. Many aquatic prey species have eyespots, which are colour patterns that consist of roughly concentric rings and have received their name because, to humans, they often resemble the vertebrate eye. I investigated the anti-predator function of eyespots against predation by fish in papers II, III and IV. Some eyespots have been suggested to benefit prey by diverting the strikes of predators away from vital parts of the prey body or towards a direction that facilitates prey escape. Although proposed over a century ago, the divertive effect of eyespots has proven to be difficult to show experimentally. In papers II and III, I tested for a divertive effect of eyespots towards attacking fish by presenting artificial prey with eyespots to laboratory-reared three-spined sticklebacks (Gasterosteus aculeatus). I found that eyespots strongly influenced the behaviour of attacking sticklebacks and effectively drew their strikes towards the eyespots. To further investigate this divertive effect and whether the specific shape of eyespots is important for it, I also tested in paper III the response of fish to markings other than eyespots. I found that eyespots were generally more effective in diverting the first strikes of attacking fish compared to other prey markings. My findings suggest that the common occurrence of eyespots in aquatic prey species can at least partly be explained by the divertive effect of the eyespot shape, possibly together with the relatively simple developmental mechanisms underlying circular colour patterns. 
An eyebar is a stripe that runs through the eye, and this pattern has been suggested to obscure the real eyes of the prey by visually blending parts of the eyes and head of the prey and by creating false edges. In paper III, I show that an eyebar effectively disrupts an eyelike shape. This suggests that eyebars provide an effective way to conceal the eyes and consequently obstruct detection and recognition of prey. This experiment also demonstrates that through concealment of the eyes, eyebars could be used to enhance the divertive effect of eyespots, which can explain the common occurrence of eyebars in many species of fish that have eyespots. Larger eyespots have been shown to intimidate some terrestrial predators, such as passerine birds, either because they resemble the eyes of the predator’s own enemy or because highly salient features may have an intimidating effect. In papers II and IV, I investigated whether the occurrence of eyespots in some aquatic prey could be explained by their intimidating effect on predatory fish. In paper IV, I also investigated the reason for the intimidating effect of eyelike prey marks. In paper II, I found no clear intimidating effect of eyespots, whereas in paper IV, using a different approach, I found that sticklebacks hesitated to attack towards eyelike but not towards non-eyelike marks. Importantly, paper IV therefore presents the first rigorous evidence for the idea that eye mimicry, and not merely conspicuousness, underlies the intimidating effect. It also showed that the hesitation shown by fish towards eyelike marks is partly an innate response that is reinforced by encounters with predators. Collectively, this thesis shows that prey colour pattern and the visual appearance of the habitat influence the behaviour of fish. The results demonstrate that protective coloration provides numerous distinctive ways for aquatic prey to escape predation. 
Thus, visual perception and behaviour of fish are important factors shaping the appearance and behaviours of aquatic prey.