924 results for acoustic speech recognition system
Abstract:
In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For these systems to remain in use, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper, we present an objective definition of voice pleasantness, based on the composition of a representative feature subset, and a new system for automatic voice pleasantness classification and intensity estimation. Our study is based on a database of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation, the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation.
Abstract:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming to obtain better performance than the individual modalities. However, when these modalities are combined, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, extracting useful information from this data requires feature selection (FS) techniques that lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques on silent speech data, using a dataset with four non-invasive and promising modalities: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised FS filters (mutual information and Fisher's ratio) and two unsupervised ones (mean-median (MM) and arithmetic mean geometric mean (AMGM)). The evaluation assessed the classification accuracy (word recognition error) of three well-known classifiers: k-nearest neighbors, support vector machines, and dynamic time warping. The key results of this study show that both unsupervised and supervised FS techniques improve classification accuracy on both individual and combined modalities. For instance, on the video modality, we attain a relative performance gain of 36.2% in error rate. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.
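As a rough illustration of the filters named above, the Python sketch below scores each feature column and keeps the k best. It assumes common textbook forms: MM as |mean - median|, AMGM computed on exp-transformed values so that the geometric mean is always defined, and a multi-class Fisher's ratio of between-class to within-class scatter. The function names and the toy data are hypothetical, the paper's exact normalization may differ, and mutual information is left out for brevity.

```python
import math
import statistics


def mean_median(column):
    """Unsupervised MM score: absolute difference between mean and median."""
    return abs(statistics.fmean(column) - statistics.median(column))


def amgm(column):
    """Unsupervised AMGM score on exp-transformed values (assumption).

    Exponentiating makes every value positive, so the ratio
    mean(exp(x)) / exp(mean(x)) is well defined and always >= 1.
    """
    arithmetic = statistics.fmean(math.exp(x) for x in column)
    geometric = math.exp(statistics.fmean(column))
    return arithmetic / geometric


def fisher_ratio(column, labels):
    """Supervised score: between-class scatter over within-class scatter."""
    overall = statistics.fmean(column)
    between = within = 0.0
    for label in set(labels):
        values = [x for x, y in zip(column, labels) if y == label]
        class_mean = statistics.fmean(values)
        between += len(values) * (class_mean - overall) ** 2
        within += sum((x - class_mean) ** 2 for x in values)
    if within == 0.0:
        # Perfect separation scores infinitely high; a constant column scores 0.
        return math.inf if between > 0.0 else 0.0
    return between / within


def select_top_k(samples, k, score):
    """Return the sorted indices of the k feature columns with the highest score."""
    columns = list(zip(*samples))
    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j]), reverse=True)
    return sorted(ranked[:k])


# Toy data: 4 samples x 3 features, two word classes (illustrative values only).
samples = [[1.0, 0.0, 5.0], [2.0, 0.0, 5.1], [3.0, 10.0, 4.9], [4.0, 0.0, 5.0]]
labels = ["a", "a", "b", "b"]
print(select_top_k(samples, 1, mean_median))                        # [1]
print(select_top_k(samples, 1, lambda c: fisher_ratio(c, labels)))  # [0]
```

The two kinds of filter pick different features here: MM favours the skewed column, while Fisher's ratio favours the column that separates the two classes, which is the supervised/unsupervised distinction the abstract draws.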
Abstract:
The aim of this work is to characterize the nanofilm formed on a benzoic acid-modified glassy carbon (GC) electrode system by means of multidimensional scaling (MDS) space analysis. The surface modification is based on the electrochemical reaction between the GC electrode and a benzoic acid diazonium salt (BA-DAS). As a result, a nanofilm was obtained on the benzoic acid-glassy carbon (BA-GC) electrode surface. To analyze the nanofilm of the BA-GC electrode system, the IR spectra of the modified BA-GC electrode surface, the GC surface and BA-DAS were recorded in the spectral range from 599.84 to 3996.34 cm⁻¹. The IR data vectors of these three forms were processed using the multidimensional scaling space approach to demonstrate the existence of a nanofilm on the modified BA-GC electrode system. Two- and three-dimensional MDS profiles, obtained by applying the multidimensional scaling approach to the data sets {CG1,...,CG10}, {BA-GC1,...,BA-GC10} and {FILM1,...,FILM10}, allow the nanofilm on the modified glassy carbon (GC) electrode system to be readily recognized.
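Classical (Torgerson) MDS is one standard way to turn a set of spectra into the kind of two- or three-dimensional maps described above. The sketch assumes that each spectrum is a row vector sampled on a common wavenumber grid and uses Euclidean distance between rows; the abstract does not say which MDS variant or distance the authors used, so this is an illustration rather than their procedure.

```python
import numpy as np


def classical_mds(X, k=2):
    """Classical (Torgerson) MDS: embed the rows of X in k dimensions.

    Each row is one observation (e.g. one IR spectrum on a common
    wavenumber grid); Euclidean distance between rows is assumed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Matrix of squared pairwise Euclidean distances.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq_dist @ J
    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy check: a 2-D configuration is recovered up to rotation/reflection.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
Y = classical_mds(points, k=2)
```

With the 30 spectra stacked as rows (CG1 to CG10, BA-GC1 to BA-GC10, FILM1 to FILM10), `classical_mds(spectra, k=3)` would give a three-dimensional profile in which each group forms its own cluster if the spectra differ systematically; the toy points only check that pairwise distances are preserved.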
Abstract:
This paper reports on the design and development of an Android-based context-aware system to support Erasmus students during their mobility in Porto. It enables: (i) guest users to create, rate and store personal points of interest (POI) in a private, local on-board database; and (ii) authenticated users to upload and share POI as well as get and rate recommended POI from the shared central database. The system is a distributed client/server application. The server interacts with a central database that maintains the user profiles and the shared POI, organized by category and rating. The Android GUI application works both as a standalone application and as a client module. In standalone mode, guest users have access to generic information, a map-based interface and a local database to store and retrieve personal POI. Upon successful authentication, users can additionally share POI as well as get and rate recommendations sorted by category, rating and distance-to-user.
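The abstract says shared recommendations are sorted by category, rating and distance-to-user. One plausible ordering, sketched below under that assumption, filters by category and then sorts by rating (descending) and great-circle distance (ascending, using the haversine formula). The `Poi` class, its fields, the rating scale and the coordinates are hypothetical and not the system's actual schema.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass
class Poi:
    name: str
    category: str
    rating: float  # average user rating, e.g. 1..5 (assumed scale)
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def recommend(pois, category, user_lat, user_lon):
    """Filter by category, then sort by rating (desc) and distance (asc)."""
    matches = [p for p in pois if p.category == category]
    return sorted(
        matches,
        key=lambda p: (-p.rating, haversine_km(user_lat, user_lon, p.lat, p.lon)),
    )


# Illustrative POIs around Porto (approximate coordinates, not real venues).
pois = [
    Poi("Cafe A", "cafe", 4.5, 41.160, -8.600),
    Poi("Cafe B", "cafe", 4.5, 41.150, -8.611),
    Poi("Cafe C", "cafe", 3.0, 41.149, -8.610),
    Poi("Museum D", "museum", 5.0, 41.148, -8.612),
]
ranked = recommend(pois, "cafe", 41.1496, -8.6110)
print([p.name for p in ranked])  # ['Cafe B', 'Cafe A', 'Cafe C']
```

Using distance only as a tie-breaker is a design choice; a weighted score combining rating and distance would be an equally reasonable reading of the abstract.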
Abstract:
As the wireless cellular market reaches unprecedented levels of competition, network operators need to make maintaining Quality of Service (QoS) a main priority if they wish to attract new subscribers while keeping existing customers satisfied. Speech quality as perceived by the end user is one major example of a characteristic in constant need of maintenance and improvement, and it is the topic addressed by this master's thesis project. An intrusive method of speech quality evaluation is used to further study and characterize the performance of speech codecs in second-generation (2G) and third-generation (3G) technologies, to look for further correlation between codecs with similar bit rates, and to explore transmission parameters that may aid in the assessment of speech quality. Because of limitations in the audio analyzer equipment originally intended for the tests, a different system for recording the test samples was adopted. Although the newly designed system is not standard, after extensive testing and optimization of its parameters the final results were found to be reliable and satisfactory. The tests include a set of high and low bit rate codecs for both 2G and 3G; comparing and analysing the values led to the conclusion that, under approximately the same conditions, 3G speech codecs perform better than 2G codecs. This reinforces the idea that 3G is clearly the better choice for a customer looking for the best possible listening speech quality. The transmission parameters chosen for the experiment, Receiver Quality (RxQual) and the ratio of Received Energy per Chip to Power Density (Ec/N0), were subjected to speech quality correlation tests. The final RxQual results were compared with those of prior studies by other researchers and are considered highly relevant, confirming RxQual as a reliable indicator of speech quality.
As for Ec/N0, it cannot be established as a speech quality indicator; however, it shows clear thresholds at which the Mean Opinion Score (MOS) values decrease significantly. The studied transmission parameters can therefore be used not only for network management purposes but also to give the communications engineer (or technician) an idea of the expected end-to-end speech quality consequences. The work also points to new ideas for future studies. Since fourth-generation (4G) cellular technologies, the first all-IP network structure, are now beginning to take an important place in the global market, 4G speech quality should be evaluated and compared with 3G, not only in narrowband but also in wideband scenarios, using the most recent standard objective method of speech quality assessment, POLQA. In addition, new data from the Ec/N0 tests justify further research to validate the assumptions made in this work.
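Correlating a transmission parameter with MOS, as done for RxQual, usually starts from a correlation coefficient. A minimal Pearson sketch follows; the thesis may have used a different statistic, and the sample values are synthetic, chosen only to show the expected sign (RxQual runs from 0, best, to 7, worst, so a useful indicator should correlate negatively with MOS).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic pairs for illustration only (not measurements from this work).
rxqual = [0, 1, 2, 3, 4, 5]
mos = [4.2, 4.1, 3.8, 3.4, 3.0, 2.5]
r = pearson(rxqual, mos)
print(round(r, 3))
```

A strong negative r supports using a parameter as a quality indicator; a parameter like Ec/N0 that only shows thresholds would instead show a weak linear correlation with a sharp MOS drop in a particular range, which a correlation coefficient alone does not capture.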
Abstract:
Work presented within the scope of the Master's in Informatics Engineering, as a partial requirement for obtaining the degree of Master in Informatics Engineering
Abstract:
The robotics community needs to be able to assess and compare results from researchers in areas such as visual perception and multi-robot cooperative behavior. To accomplish that task, this paper proposes a real-time indoor visual ground truth system whose accuracy is at least one order of magnitude better than the precision of the algorithm to be evaluated. A multi-camera architecture is proposed within the ROS (Robot Operating System) framework to estimate the 3D position of objects, and the implementation and results were contextualized to the RoboCup Middle Size League scenario.
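To make the 3D estimation step concrete: when two calibrated cameras observe the same object, each yields a viewing ray, and a simple position estimate is the midpoint of the shortest segment between the two rays. This midpoint method is a swapped-in textbook technique, not necessarily the one used in the paper, and the sketch assumes camera centres and ray directions are already expressed in a common world frame; the helper names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two viewing rays.

    Ray i starts at camera centre ci and points along direction di,
    both given in the same world frame (assumption).
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p = _along(c1, d1, s)
    q = _along(c2, d2, t)
    return tuple((pi + qi) / 2 for pi, qi in zip(p, q))


# Two cameras 2 m apart on the x axis, both looking at the point (1, 1, 0).
print(triangulate_midpoint((0, 0, 0), (1, 1, 0), (2, 0, 0), (-1, 1, 0)))  # (1.0, 1.0, 0.0)
```

With noisy detections the rays no longer intersect, and the length of the segment between p and q gives a cheap consistency check; with more than two cameras, a least-squares formulation over all rays is the usual generalization.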
Abstract:
Visually exploring underwater environments is still a complex problem. Underwater vision systems require complementary sensor information to help overcome water disturbances. This work proposes calibration methods for a structured light system consisting of a camera and a laser with a line beam. Two different calibration procedures, each requiring only two images from different viewpoints, were developed and tested in dry and underwater environments. The results show an accurate calibration of the camera/projector pair, with errors close to 1 mm even with a small stereo baseline.
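Once a calibration like this provides the laser plane in the camera frame, each pixel on the laser line can be converted to a 3D point by intersecting its back-projected ray with that plane. The sketch assumes an ideal pinhole camera (intrinsics fx, fy, cx, cy, no lens distortion) and a known plane written as normal . X = offset; it ignores refraction at the housing port, which a real underwater setup has to model. Function names and numbers are illustrative.

```python
def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a ray direction in the camera frame.

    Ideal pinhole model without lens distortion (assumption).
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_laser_plane(ray, normal, offset):
    """Intersect the camera ray X = t * ray with the plane normal . X = offset."""
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the laser plane")
    t = offset / denom
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return tuple(t * r for r in ray)


# Hypothetical intrinsics (pixels) and a laser plane z = 500 mm.
ray = pixel_to_ray(400, 240, fx=800, fy=800, cx=320, cy=240)
point = intersect_laser_plane(ray, normal=(0.0, 0.0, 1.0), offset=500.0)
print(point)  # approximately (50.0, 0.0, 500.0)
```

The accuracy of each reconstructed point depends directly on the estimated plane parameters, which is why the calibration error (about 1 mm in the reported results) matters for the whole system.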
Abstract:
In this paper, we describe a rule-based automatic syllabifier for Danish built on the Maximal Onset Principle. This work was motivated by the prior success of rule-based methods in Portuguese and Catalan syllabification modules. The system was implemented and tested using a very small set of rules. It achieved word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, since Danish has a complex syllabic structure and is therefore difficult to describe with rules. Comparison with a data-driven syllabification system based on artificial neural networks showed a higher accuracy rate for the rule-based system.
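The Maximal Onset Principle can be sketched in a few lines: between two vowel nuclei, the following syllable takes the longest consonant sequence that is a legal onset, and the remaining consonants stay as the coda of the preceding syllable. The vowel set, the `LEGAL_ONSETS` list and the purely orthographic (letter-based) treatment are simplifying assumptions for illustration; the paper's actual rule set is not given in the abstract.

```python
VOWELS = set("aeiouyæøå")

# Hypothetical, deliberately small set of legal Danish onsets (orthographic).
LEGAL_ONSETS = {
    "", "b", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "r", "s",
    "t", "v", "bl", "br", "dr", "fl", "fr", "gl", "gr", "kl", "kn", "kr",
    "pl", "pr", "sk", "sl", "sm", "sn", "sp", "st", "sv", "tr",
    "skr", "spr", "str",
}


def vowel_nuclei(word):
    """Return (start, end) spans of maximal vowel runs, treated as nuclei."""
    spans, i = [], 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def syllabify(word):
    """Split a word into syllables using the Maximal Onset Principle."""
    word = word.lower()
    nuclei = vowel_nuclei(word)
    boundaries = []
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cluster = word[prev_end:next_start]
        # The longest suffix of the cluster that is a legal onset goes to the
        # next syllable; "" is always legal, so the loop always breaks.
        for k in range(len(cluster) + 1):
            if cluster[k:] in LEGAL_ONSETS:
                boundaries.append(prev_end + k)
                break
    pieces, prev = [], 0
    for b in boundaries:
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return pieces


print(syllabify("søster"))  # ['sø', 'ster']
print(syllabify("andre"))   # ['an', 'dre']
```

The whole rule system reduces to the onset inventory, which is consistent with the abstract's point that a very small rule set can perform well; a real Danish syllabifier would work on phonemes and treat diphthongs and long vowels explicitly.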
Abstract:
This paper presents a preliminary acoustic study concerning the development of the first prototype of a patented removable module for interior partitioning. It is a prefabricated, vertical element for dividing interior spaces that does not require gutters or technical support. A set of such modules, arranged in a line, creates a division, allowing the personalization of any indoor area, including open office spaces and rooms, among others. The main characteristic that distinguishes this element from existing solutions on the market is that its mobility relies exclusively on a set of bearings integrated at the base of each module. Through an incorporated elevation system, the user can lower the module, move it to the desired position and raise it again until it is pressed against the ceiling ledge, making it stable. To assess its acoustic behavior, several tests were carried out in the LNEC acoustics laboratory. Airborne sound insulation tests for different typologies of the prototype were conducted according to the applicable standards EN ISO 354:2003, EN ISO 717-1:2013 and EN ISO 10140-2:2010. Important conclusions were drawn, and the viability of the prototype was analysed.
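EN ISO 717-1 condenses laboratory airborne insulation data into a single number, the weighted sound reduction index Rw. A sketch of its curve-shifting procedure follows, assuming 16 one-third-octave values of the sound reduction index R from 100 Hz to 3150 Hz; it omits the spectrum adaptation terms C and Ctr that the standard also defines, and the example input is made up rather than taken from the prototype tests.

```python
import math

# One-third-octave band centre frequencies (Hz) and ISO 717-1 reference values (dB).
BANDS_HZ = (100, 125, 160, 200, 250, 315, 400, 500,
            630, 800, 1000, 1250, 1600, 2000, 2500, 3150)
REFERENCE_DB = (33, 36, 39, 42, 45, 48, 51, 52,
                53, 54, 55, 56, 56, 56, 56, 56)
MAX_UNFAVOURABLE_SUM_DB = 32.0  # limit for 16 one-third-octave bands


def unfavourable_sum(measured, shift):
    """Sum of deviations where the shifted reference lies above the measurement."""
    return sum(max(0.0, ref + shift - r) for ref, r in zip(REFERENCE_DB, measured))


def weighted_sound_reduction_index(measured):
    """Single-number Rw (dB) from 16 one-third-octave sound reduction indices R."""
    if len(measured) != len(REFERENCE_DB):
        raise ValueError("expected 16 one-third-octave values, 100 Hz to 3150 Hz")
    # Start low enough that the shifted curve lies entirely below the data.
    shift = math.floor(min(measured)) - max(REFERENCE_DB)
    # Raise the curve in 1 dB steps while the unfavourable sum stays <= 32 dB.
    while unfavourable_sum(measured, shift + 1) <= MAX_UNFAVOURABLE_SUM_DB:
        shift += 1
    # Rw is the value of the shifted reference curve at 500 Hz.
    return REFERENCE_DB[BANDS_HZ.index(500)] + shift


print(weighted_sound_reduction_index([40.0] * 16))  # 40
```

Because the high-frequency part of the reference curve is flat at 56 dB, a partition that is weak in the upper bands is penalized strongly, which is one reason single-number ratings of lightweight movable modules can be sensitive to leaks at the base or at the ceiling contact.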
Abstract:
Graphics-based Augmentative and Alternative Communication systems are widely used to promote communication in people with Autism Spectrum Disorders. This study discusses the integration of Augmented Reality into communication interventions by relating elements of Augmentative and Alternative Communication to Applied Behaviour Analysis strategies. An architecture for an Augmented Reality-based interactive system to assist interventions is proposed. The proposed system, STAR, provides an Augmented Reality tool to assist interventions performed by therapists and supports parents in joining and participating in the child's intervention. Finally, we report on the use of the Augmented Reality tool in interventions with children with Autism Spectrum Disorders.
Abstract:
Hand gestures are a powerful means of human communication, with many potential applications in human-computer interaction. Vision-based hand gesture recognition techniques have many proven advantages over traditional devices, giving users a simpler and more natural way to communicate with electronic devices. This work proposes a generic system architecture based on computer vision and machine learning that can be used with any human-computer interaction interface. The proposed solution is composed mainly of three modules: a pre-processing and hand segmentation module, a static gesture interface module and a dynamic gesture interface module. The experiments showed that the core of vision-based interaction systems can be the same for all applications, which facilitates implementation. To test the proposed solutions, three prototypes were implemented. For hand posture recognition, a support vector machine (SVM) model was trained and used, achieving a final accuracy of 99.4%. For dynamic gestures, a hidden Markov model (HMM) was trained for each gesture the system can recognize, with a final average accuracy of 93.7%. The proposed solution has the advantage of being generic enough, with trained models able to work in real time, allowing its application in a wide range of human-machine applications.
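For the dynamic-gesture stage, the usual way to use one HMM per gesture is to score an observation sequence under every model and pick the most likely one. The sketch below assumes discrete observations (for example, quantized hand-motion directions) and uses the scaled forward algorithm; the symbol coding, the gesture names and all model parameters are hypothetical, and the abstract does not say whether the authors used discrete or continuous emissions.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    start[i]    initial probability of state i
    trans[i][j] probability of moving from state i to state j
    emit[i][o]  probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik


def classify_gesture(obs, models):
    """Pick the gesture whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical symbols: 0 = right, 1 = up, 2 = left, 3 = down.
RIGHT = [0.85, 0.05, 0.05, 0.05]
LEFT = [0.05, 0.05, 0.85, 0.05]
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]  # 2-state left-to-right topology
models = {
    "swipe_right": ([1.0, 0.0], LEFT_TO_RIGHT, [RIGHT, RIGHT]),
    "swipe_left": ([1.0, 0.0], LEFT_TO_RIGHT, [LEFT, LEFT]),
}
print(classify_gesture([0, 0, 1, 0, 0], models))  # swipe_right
```

Comparing log-likelihoods rather than raw probabilities keeps long gesture sequences from underflowing; a rejection threshold on the best score is a common addition when the system must ignore movements that match no trained gesture.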
Abstract:
Master's dissertation in Language Sciences
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
Abstract:
Magdeburg University, Faculty of Electrical Engineering and Information Technology, doctoral dissertation, 2011