961 results for automatic speech recognition
Abstract:
In fetal brain MRI, most high-resolution reconstruction algorithms rely on brain segmentation as a preprocessing step. Manual brain segmentation is, however, highly time-consuming and therefore not a realistic solution. In this work, we assess on a large dataset the performance of Multiple Atlas Fusion (MAF) strategies for addressing this problem automatically. First, we show that MAF significantly increases the accuracy of brain segmentation compared with a single-atlas strategy. Second, we show that MAF compares favorably with the most recent approach (Dice above 0.90). Finally, we show that MAF could in turn improve reconstruction quality.
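The Dice score quoted above can be illustrated with a minimal sketch. It assumes binary masks flattened to lists of 0/1. The fusion step shown here is simple majority voting, which is only one common fusion rule and is an assumption: the abstract does not say which MAF strategies were evaluated. The function names are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement by convention
    return 2.0 * overlap / (size_a + size_b)


def majority_vote_fusion(masks):
    """Fuse several candidate masks voxel by voxel (assumed fusion rule).

    A voxel is labelled 1 when strictly more than half of the masks say so.
    """
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


# Three hypothetical atlas-propagated masks and a manual reference.
atlas_masks = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
reference = [1, 1, 0]
fused = majority_vote_fusion(atlas_masks)
score = dice_coefficient(fused, reference)
```

In this toy case the fused mask agrees with the reference better than the second or third atlas mask alone, which is the kind of gain over a single atlas that the abstract reports.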
Abstract:
The recognition of prior experiential learning (RPEL) involves the assessment of skills and knowledge acquired by an individual through previous experience, which is not necessarily related to an academic context. RPEL practices are far from generalised in higher education, and there is a lack of specific guidelines on how to implement RPL programs in particular settings, such as management education or online programs. The RPEL pilot program developed in a Spanish virtual university is used throughout the article as the basis for further reflection on the design and implementation of RPEL in online postgraduate education in the business field. The role of competences as a central theoretical foundation for RPEL is explained, and the context and characteristics of the RPEL program are described. Special attention is paid to the key elements of the program's design and to the practical aspects of its implementation. The results of the program are assessed, and general conclusions and suggestions for further research are discussed.
Abstract:
In this paper, we propose a new supervised linear feature extraction technique for multiclass classification problems that is especially suited to the nearest neighbor (NN) classifier. The problem of finding the optimal linear projection matrix is defined as a classification problem, and the AdaBoost algorithm is used to compute it iteratively. This strategy allows the introduction of a multitask learning (MTL) criterion in the method and results in a solution that makes no assumptions about the data distribution and is especially appropriate for the small sample size problem. The performance of the method is illustrated by an application to the face recognition problem. The experiments show that the representation obtained with the multitask approach improves on the classic feature extraction algorithms when using the NN classifier, especially when only a few examples from each class are available.
Abstract:
Peer-reviewed
Abstract:
Behavior-based navigation of autonomous vehicles requires the recognition of navigable areas and potential obstacles. In this paper we describe a model-based object recognition system that is part of an image interpretation system intended to assist the navigation of autonomous vehicles operating in industrial environments. The recognition system integrates color, shape and texture information together with the location of the vanishing point. The recognition process starts from some prior scene knowledge, that is, a generic model of the expected scene and the potential objects. In this approach, different low-level vision techniques extract a multitude of image descriptors, which are then analyzed by a rule-based reasoning system to interpret the image content. The system has been implemented using a rule-based cooperative expert system.
Abstract:
This Master's thesis addresses the design and implementation of an optical character recognition (OCR) system for a mobile device running the Symbian operating system. The developed OCR system, named OCRCapriccio, emphasizes modularity, effective extensibility and reuse. The system consists of two parts: the graphical user interface and the OCR engine, which was implemented as a plug-in. The plug-in includes two implementations of the OCR engine, enabling two types of recognition: recognition based on bitmap comparison and statistical recognition. The implementation results show that the approach based on bitmap comparison is more suitable for the Symbian environment because of its nature. Although the current implementation of bitmap comparison is lacking in accuracy, further development should continue in this direction. The biggest challenges of this work were related to developing an OCR scheme suitable for Symbian OS smartphones, which have limited computational power and restricted resources.
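Recognition by bitmap comparison can be sketched as follows. This is a minimal illustration and not the OCRCapriccio engine: it assumes glyphs and templates are equally sized grids of 0/1 pixels, and it scores a candidate by the fraction of matching pixels. The names `pixel_agreement` and `recognize` and the 3x3 templates are hypothetical.

```python
def pixel_agreement(glyph, template):
    """Fraction of pixels that are identical in two equally sized bitmaps."""
    total = 0
    same = 0
    for row_g, row_t in zip(glyph, template):
        for g, t in zip(row_g, row_t):
            total += 1
            if g == t:
                same += 1
    return same / total


def recognize(glyph, templates):
    """Return the label of the template that agrees best with the glyph."""
    return max(templates, key=lambda label: pixel_agreement(glyph, templates[label]))


# Hypothetical 3x3 templates for two characters.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# An "I" with one noisy pixel in the bottom-right corner.
noisy_glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label = recognize(noisy_glyph, templates)
```

The comparison needs only integer counting and no trained model, which is one plausible reason such an approach fits a device with limited computational power.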
Abstract:
The degradation of the catalytic filaments is the main factor limiting the industrial implementation of the hot wire chemical vapor deposition (HWCVD) technique. Up to now, no solution has been found to protect the catalytic filaments used in HWCVD without compromising their catalytic activity. The definitive solution probably lies in the automatic replacement of the catalytic filaments. In this work, the results of the validation tests of a new apparatus for the automatic replacement of the catalytic filaments are reported. The functionalities of the different parts have been validated using a 0.2 mm diameter tungsten filament under μc-Si:H deposition conditions.
Abstract:
In liberalized electricity markets, which have been introduced in many countries around the world, electricity distribution companies operate under competitive conditions. Accurate information about customers' energy consumption therefore plays an essential role in the distribution company's budgeting and in the correct planning and operation of the distribution network. This master's thesis describes the possible benefits of automatic meter reading (AMR) systems for electric utilities and residential customers. The major benefits of AMR illustrated in the thesis are distribution network management, power quality monitoring, load modelling, and detection of illegal electricity usage. Using power system state estimation as an example, it is shown that even a partial installation of AMR on the customer side leads to more accurate data about the voltage and power levels in the whole network. The thesis also describes the present state of AMR integration in Russia.
Abstract:
Language acquisition is a complex process that requires the synergic involvement of different cognitive functions, which include extracting and storing the words of the language and their embedded rules for progressive acquisition of grammatical information. As has been shown in other fields that study learning processes, synchronization mechanisms between neuronal assemblies might have a key role during language learning. In particular, studying these dynamics may help uncover whether different oscillatory patterns sustain more item-based learning of words and rule-based learning from speech input. Therefore, we tracked the modulation of oscillatory neural activity during the initial exposure to an artificial language that contained embedded rules. We analyzed both spectral power variations, as a measure of local neuronal ensemble synchronization, and phase coherence patterns, as an index of the long-range coordination of these local groups of neurons. Synchronized activity in the gamma band (20-40 Hz), previously reported to be related to the engagement of selective attention, showed a clear dissociation of local power and phase coherence between distant regions. In this frequency range, local synchrony characterized the subjects who were focused on word identification and was accompanied by increased coherence in the theta band (4-8 Hz). Only those subjects who were able to learn the embedded rules showed increased gamma band phase coherence between frontal, temporal, and parietal regions.
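Phase coherence between two recording sites is often quantified with the phase-locking value, the length of the mean unit vector of the phase differences. The sketch below uses that measure as an assumption, since the abstract does not state which coherence estimator was used. It takes instantaneous phases in radians that are presumed to have been extracted already, for example after band-pass filtering; the function name and the sample values are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Phase-locking value between two equally long phase series (radians).

    Returns a number between 0 (no consistent phase relation) and
    1 (constant phase difference across all samples).
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Constant lag of 0.3 rad between sites: strongly coherent.
site_a = [0.1, 0.5, 1.0, 1.7]
site_b = [p - 0.3 for p in site_a]
locked = phase_locking_value(site_a, site_b)

# Phase differences of 0 and pi cancel out: no coherence.
unlocked = phase_locking_value([0.0, math.pi], [0.0, 0.0])
```

Unlike spectral power, this measure ignores amplitude entirely, which is why local power and long-range phase coherence can dissociate as the abstract describes.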
Abstract:
In this paper, we present the Melodic Analysis of Speech method (MAS) that enables us to carry out complete and objective descriptions of a language's intonation, from a phonetic (melodic) point of view as well as from a phonological point of view. It is based on the acoustic-perceptive method by Cantero (2002), which has already been used in research on prosody in different languages. In this case, we present the results of its application in Spanish and Catalan.
Abstract:
This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus, the approaches apply to other languages as well. This is possible because most of the studies included in this dissertation concern universal effects that take place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. This study is based on two corpora, one of news reading speech and one of sentences read aloud. One corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The purpose of using two corpora is twofold: it allows a comparison of the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system in the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the difference in duration between short and long phonemes. The phoneme categories are presented to ease the problem of variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In initial positions of utterances we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first phonemes at the beginning of an utterance. However, on the word level, the effect on average seems to shorten the whole first word. We establish the effect of final lengthening in Finnish.
The effect in Finnish has been an open question for a long time, and Finnish has been the last missing piece for establishing it as a universal phenomenon. Final lengthening is studied from various angles, and it is also shown that it is not a mere effect of prominence or of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This limits the impact of problematic variation within the corpora. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation also presents an implementation of speech synthesis on a mobile platform and its performance. We find that the rule-based method of speech synthesis runs in real time in software, but the signal generation process makes the system slower than real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to other speech technology approaches.
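One way to picture utterance-level normalization is a z-score computed separately within each utterance. This is an assumption made for illustration, since the abstract does not give the dissertation's actual formula. The function name and the duration values (in milliseconds) are hypothetical.

```python
from statistics import mean, pstdev


def normalize_utterance(durations):
    """Z-score segment durations within a single utterance (assumed method).

    Each duration is expressed relative to the mean and spread of its own
    utterance, so overall speaking rate no longer dominates the values.
    """
    mu = mean(durations)
    sigma = pstdev(durations)
    if sigma == 0:
        return [0.0 for _ in durations]  # all segments equally long
    return [(d - mu) / sigma for d in durations]


# The same duration pattern spoken at two different rates (hypothetical data).
fast_utterance = [50, 100, 150]
slow_utterance = [100, 200, 300]
fast_norm = normalize_utterance(fast_utterance)
slow_norm = normalize_utterance(slow_utterance)
```

After normalization the two utterances yield the same values, so a longer final segment still stands out while differences in speaking rate between speakers or situations drop out.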