30 resultados para Speech processing systems.

em Instituto Politécnico do Porto, Portugal


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, a linguistically rule-based grapheme-to-phone (G2P) transcription algorithm is described for European Portuguese. A complete set of phonological and phonetic transcription rules regarding the European Portuguese standard variety is presented. This algorithm was implemented and tested by using online newspaper articles. The obtained experimental results gave rise to 98.80% of accuracy rate. Future developments in order to increase this value are foreseen. Our purpose with this work is to develop a module/ tool that can improve synthetic speech naturalness in European Portuguese. Other applications of this system can be expected like language teaching/learning. These results, together with our perspectives of future improvements, have proved the dramatic importance of linguistic knowledge on the development of Text-to-Speech systems (TTS).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The relation of automatic auditory discrimination, measured with MMN, with the type of stimuli has not been well established in the literature, despite its importance as an electrophysiological measure of central sound representation. In this study, MMN response was elicited by pure-tone and speech binaurally passive auditory oddball paradigm in a group of 8 normal young adult subjects at the same intensity level (75 dB SPL). The frequency difference in pure-tone oddball was 100 Hz (standard = 1 000 Hz; deviant = 1 100 Hz; same duration = 100 ms), in speech oddball (standard /ba/; deviant /pa/; same duration = 175 ms) the Portuguese phonemes are both plosive bi-labial in order to maintain a narrow frequency band. Differences were found across electrode location between speech and pure-tone stimuli. Larger MMN amplitude, duration and higher latency to speech were verified compared to pure-tone in Cz and Fz as well as significance differences in latency and amplitude between mastoids. Results suggest that speech may be processed differently than non-speech; also it may occur in a later stage due to overlapping processes since more neural resources are required to speech processing.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Database query languages on relations (for example SQL) make it possible to join two relations. This operation is very common in desktop/server database systems but unfortunately query processing systems in networked embedded computer systems currently do not support this operation; specifically, the query processing systems TAG, TinyDB, Cougar do not support this. We show how a prioritized medium access control (MAC) protocol can be used to efficiently execute the database operation join for networked embedded computer systems where all computer nodes are in a single broadcast domain.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper reports on the design and development of an Android-based context-aware system to support Erasmus students during their mobility in Porto. It enables: (i) guest users to create, rate and store personal points of interest (POI) in a private, local on board database; and (ii) authenticated users to upload and share POI as well as get and rate recommended POI from the shared central database. The system is a distributed client / server application. The server interacts with a central database that maintains the user profiles and the shared POI organized by category and rating. The Android GUI application works both as a standalone application and as a client module. In standalone mode, guest users have access to generic info, a map-based interface and a local database to store and retrieve personal POI. Upon successful authentication, users can, additionally, share POI as well as get and rate recommendations sorted by category, rating and distance-to-user.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Speech interfaces for Assistive Technologies are not common and are usually replaced by others. The market they are targeting is not considered attractive and speech technologies are still not well spread. Industry still thinks they present some performance risks, especially Speech Recognition systems. As speech is the most elemental and natural way for communication, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly staying at their homes. This work is a position paper in which the authors argue for the need to make speech become the basic interface in assistive technologies. Among the main arguments, we can state: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is expanding; speech technology is already mature to be used but needs adaptation to people with special needs; there is still a lot of R&D to be done in this area, especially when thinking about the Portuguese market. The main challenges are presented and future directions are proposed.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Network control systems (NCSs) are spatially distributed systems in which the communication between sensors, actuators and controllers occurs through a shared band-limited digital communication network. However, the use of a shared communication network, in contrast to using several dedicated independent connections, introduces new challenges which are even more acute in large scale and dense networked control systems. In this paper we investigate a recently introduced technique of gathering information from a dense sensor network to be used in networked control applications. Obtaining efficiently an approximate interpolation of the sensed data is exploited as offering a good tradeoff between accuracy in the measurement of the input signals and the delay to the actuation. These are important aspects to take into account for the quality of control. We introduce a variation to the state-of-the-art algorithms which we prove to perform relatively better because it takes into account the changes over time of the input signal within the process of obtaining an approximate interpolation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, a rule-based automatic syllabifier for Danish is described using the Maximal Onset Principle. Prior success rates of rule-based methods applied to Portuguese and Catalan syllabification modules were on the basis of this work. The system was implemented and tested using a very small set of rules. The results gave rise to 96.9% and 98.7% of word accuracy rate, contrary to our initial expectations, being Danish a language with a complex syllabic structure and thus difficult to be rule-driven. Comparison with data-driven syllabification system using artificial neural networks showed a higher accuracy rate of the former system.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The recent developments on Hidden Markov Models (HMM) based speech synthesis showed that this is a promising technology fully capable of competing with other established techniques. However some issues still lack a solution. Several authors report an over-smoothing phenomenon on both time and frequencies which decreases naturalness and sometimes intelligibility. In this work we present a new vowel intelligibility enhancement algorithm that uses a discrete Kalman filter (DKF) for tracking frame based parameters. The inter-frame correlations are modelled by an autoregressive structure which provides an underlying time frame dependency and can improve time-frequency resolution. The system’s performance has been evaluated using objective and subjective tests and the proposed methodology has led to improved results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

7th Mediterranean Conference on Information Systems, MCIS 2012, Guimaraes, Portugal, September 8-10, 2012, Proceedings Series: Lecture Notes in Business Information Processing, Vol. 129

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a methodology supported on the data base knowledge discovery process (KDD), in order to find out the failure probability of electrical equipments’, which belong to a real electrical high voltage network. Data Mining (DM) techniques are used to discover a set of outcome failure probability and, therefore, to extract knowledge concerning to the unavailability of the electrical equipments such us power transformers and high-voltages power lines. The framework includes several steps, following the analysis of the real data base, the pre-processing data, the application of DM algorithms, and finally, the interpretation of the discovered knowledge. To validate the proposed methodology, a case study which includes real databases is used. This data have a heavy uncertainty due to climate conditions for this reason it was used fuzzy logic to determine the set of the electrical components failure probabilities in order to reestablish the service. The results reflect an interesting potential of this approach and encourage further research on the topic.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Magnetic resonance (MR) imaging has been used to analyse and evaluate the vocal tract shape through different techniques and with promising results in several fields. Our purpose is to demonstrate the relevance of MR and image processing for the vocal tract study. The extraction of contours of the air cavities allowed the set - up of a number of 3D reconstruction image stacks by means of the combination of orthogonally oriented sets of slices for e ach articulatory gesture, as a new approach to solve the expected spatial under sampling of the imaging process. In result these models give improved information for the visualization of morphologic and anatomical aspects and are useful for partial measure ments of the vocal tract shape in different situations. Potential use can be found in Medical and therapeutic applications as well as in acoustic articulatory speech modelling.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The first and second authors would like to thank the support of the PhD grants with references SFRH/BD/28817/2006 and SFRH/PROTEC/49517/2009, respectively, from Fundação para a Ciência e Tecnol ogia (FCT). This work was partially done in the scope of the project “Methodologies to Analyze Organs from Complex Medical Images – Applications to Fema le Pelvic Cavity”, wi th reference PTDC/EEA- CRO/103320/2008, financially supported by FCT.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Temporal lobe epilepsy (TLE) is a neurological disorder that directly affects cortical areas responsible for auditory processing. The resulting abnormalities can be assessed using event-related potentials (ERP), which have high temporal resolution. However, little is known about TLE in terms of dysfunction of early sensory memory encoding or possible correlations between EEGs, linguistic deficits, and seizures. Mismatch negativity (MMN) is an ERP component – elicited by introducing a deviant stimulus while the subject is attending to a repetitive behavioural task – which reflects pre-attentive sensory memory function and reflects neuronal auditory discrimination and perceptional accuracy. Hypothesis: We propose an MMN protocol for future clinical application and research based on the hypothesis that children with TLE may have abnormal MMN for speech and non-speech stimuli. The MMN can be elicited with a passive auditory oddball paradigm, and the abnormalities might be associated with the location and frequency of epileptic seizures. Significance: The suggested protocol might contribute to a better understanding of the neuropsychophysiological basis of MMN. We suggest that in TLE central sound representation may be decreased for speech and non-speech stimuli. Discussion: MMN arises from a difference to speech and non-speech stimuli across electrode sites. TLE in childhood might be a good model for studying topographic and functional auditory processing and its neurodevelopment, pointing to MMN as a possible clinical tool for prognosis, evaluation, follow-up, and rehabilitation for TLE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

TLE in infancy has been the subject of varied research. Topographical and structural evidence is coincident with the neuronal systems responsible for auditory processing of the highest specialization and complexity. Recent studies have been showing the need of a hemispheric asymmetry for an optimization in central auditory processing (CAP) and acquisition and learning of a language system. A new functional research paradigm is required to study mental processes that require methods of cognitive-sensory information analysis processed in very short periods of time (msec), such as the ERPs. Thus, in this article, we hypothesize that the TLE in infancy could be a good model for topographic and functional study of CAP and its development process, contributing to a better understanding of the learning difficulties that children with this neurological disorder have.