968 results for Speech synthesis Data processing


Relevance: 100.00%

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Relevance: 100.00%

Abstract:

This talk, which was the keynote address of the NAS Colloquium on Human-Machine Communication by Voice, discusses the past, present, and future of human-machine communications, especially speech recognition and speech synthesis. Progress in these technologies is reviewed in the context of the general progress in computer and communications technologies.

Relevance: 100.00%

Abstract:

This paper reviews some basic issues and methods involved in using neural networks to respond in a desired fashion to a temporally varying environment. Some popular network models and training methods are introduced. A speech recognition example is then used to illustrate the central difficulty of temporal data processing: learning to notice and remember relevant contextual information. Feedforward network methods are applicable to cases where this problem is not severe. The application of these methods is explained, and applications are discussed in the areas of pure mathematics, chemical and physical systems, and economic systems. A more powerful but less practical algorithm for temporal problems, the moving targets algorithm, is sketched and discussed. For completeness, a few remarks are made on reinforcement learning.

Relevance: 100.00%

Abstract:

The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.

Relevance: 100.00%

Abstract:

This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as “histogram binning” inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data and consequently undermines the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field have contended with this dilemma for many years, either resorting to hardware approaches that are rather costly and carry inherent calibration and noise effects, or developing software techniques that filter the binning effect without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed.
This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seems intricate, the concise nature of the derivations allows for a procedure that lends itself to real-time implementation using lookup tables, a task that is also introduced in this dissertation.
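To make the binning effect concrete, here is a minimal Python sketch. It is not the dissertation's patented method, whose derivations are not given in the abstract. All function names, the channel counts, and the assumption that counts are spread uniformly in log space within each linear channel are illustrative choices. The sketch contrasts naive integer rebinning of linear channels onto a log scale with a fractional, multi-channel accumulation driven by a precomputed lookup table.

```python
import math

def log_position(value, n_in, n_out):
    """Map a linear value in [1, n_in] to a continuous log-scale position in [0, n_out]."""
    return n_out * math.log(value) / math.log(n_in)

def build_lookup(n_in, n_out):
    """For each linear channel c covering [c, c + 1), list (out_bin, weight) pairs.

    Each weight is the fraction of the channel's log-scale extent that falls
    inside the output bin (an assumed, illustrative weighting).
    """
    table = []
    for c in range(1, n_in):
        lo = log_position(c, n_in, n_out)
        hi = log_position(c + 1, n_in, n_out)
        pairs = []
        b = int(lo)
        while b < hi and b < n_out:
            overlap = min(hi, b + 1) - max(lo, b)
            if overlap > 0:
                pairs.append((b, overlap / (hi - lo)))
            b += 1
        table.append(pairs)
    return table

def naive_rebin(counts, n_out):
    """Send each linear channel's whole count to a single integer log bin."""
    n_in = len(counts) + 1
    out = [0.0] * n_out
    for i, count in enumerate(counts):
        b = min(int(log_position(i + 1, n_in, n_out)), n_out - 1)
        out[b] += count
    return out

def fractional_rebin(counts, n_out):
    """Spread each linear channel's count over every log bin it overlaps."""
    table = build_lookup(len(counts) + 1, n_out)
    out = [0.0] * n_out
    for count, pairs in zip(counts, table):
        for b, weight in pairs:
            out[b] += count * weight
    return out

# Hypothetical input: one event in each of 1023 linear channels, 256 log bins.
counts = [1.0] * 1023
naive = naive_rebin(counts, 256)
fractional = fractional_rebin(counts, 256)
print("empty bins, naive:", sum(1 for v in naive if v == 0.0))
print("empty bins, fractional:", sum(1 for v in fractional if v == 0.0))
```

With this input, the naive mapping leaves gaps at the low end of the log scale because consecutive linear channels land many log bins apart, while the fractional version fills every bin and keeps the total count unchanged. The lookup table depends only on the channel counts, so it can be built once and reused for every histogram.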

Relevance: 100.00%

Abstract:

One of the overarching questions in the field of infant perceptual and cognitive development concerns how selective attention is organized during early development to facilitate learning. The following study examined how infants' selective attention to properties of social events (i.e., prosody of speech and facial identity) changes in real time as a function of intersensory redundancy (redundant audiovisual, nonredundant unimodal visual) and exploratory time. Intersensory redundancy refers to the spatially coordinated and temporally synchronous occurrence of information across multiple senses. Real-time macro- and micro-structural change in infants' scanning patterns of dynamic faces was also examined. According to the Intersensory Redundancy Hypothesis, information presented redundantly and in temporal synchrony across two or more senses recruits infants' selective attention and facilitates perceptual learning of highly salient amodal properties (properties that can be perceived across several sensory modalities, such as the prosody of speech) at the expense of less salient modality-specific properties. Conversely, information presented to only one sense facilitates infants' learning of modality-specific properties (properties that are specific to a particular sensory modality, such as facial features) at the expense of amodal properties (Bahrick & Lickliter, 2000, 2002). Infants' selective attention and discrimination of prosody of speech and facial configuration were assessed in a modified visual paired comparison paradigm. In redundant audiovisual stimulation, it was predicted that infants would show discrimination of prosody of speech in the early phases of exploration and facial configuration in the later phases of exploration. Conversely, in nonredundant unimodal visual stimulation, it was predicted that infants would show discrimination of facial identity in the early phases of exploration and prosody of speech in the later phases of exploration.
Results provided support for the first prediction and indicated that following redundant audiovisual exposure, infants showed discrimination of prosody of speech earlier in processing time than discrimination of facial identity. Data from the nonredundant unimodal visual condition provided partial support for the second prediction and indicated that infants showed discrimination of facial identity, but not prosody of speech. The dissertation study contributes to the understanding of the nature of infants' selective attention and processing of social events across exploratory time.

Relevance: 100.00%

Abstract:

Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.

Relevance: 100.00%

Abstract:

Field-programmable gate arrays are ideal hosts to custom accelerators for signal, image, and data processing but demand manual register transfer level design if high performance and low cost are desired. High-level synthesis reduces this design burden but requires manual design of complex on-chip and off-chip memory architectures, a major limitation in applications such as video processing. This paper presents an approach to resolve this shortcoming. A constructive process is described that can derive such accelerators, including on- and off-chip memory storage, from a C description such that a user-defined throughput constraint is met. By employing a novel statement-oriented approach, dataflow intermediate models are derived and used to support simple approaches for on-/off-chip buffer partitioning, derivation of custom on-chip memory hierarchies and architecture transformation to ensure user-defined throughput constraints are met with minimum cost. When applied to accelerators for full search motion estimation, matrix multiplication, Sobel edge detection, and fast Fourier transform, it is shown how real-time performance up to an order of magnitude in advance of existing commercial HLS tools is enabled whilst including all requisite memory infrastructure. Further, optimizations are presented that reduce the on-chip buffer capacity and physical resource cost by up to 96% and 75%, respectively, whilst maintaining real-time performance.

Relevance: 100.00%

Abstract:

The advancement of GPS technology has made it possible to use GPS devices as orientation and navigation tools, but also as tools to track spatiotemporal information. GPS tracking data can be broadly applied in location-based services, such as spatial distribution of the economy, transportation routing and planning, traffic management and environmental control. Therefore, knowledge of how to process the data from a standard GPS device is crucial for further use. Previous studies have considered various issues of the data processing at the time. This paper, however, aims to outline a general procedure for processing GPS tracking data. The procedure is illustrated step-by-step by the processing of real-world GPS data of car movements in Borlänge in the centre of Sweden.
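The abstract does not list the steps of its procedure, so the following Python sketch shows only one cleaning step that is commonly applied to raw GPS tracks: computing the great-circle distance between consecutive fixes and dropping fixes that imply an implausible speed. The function names, the 60 m/s threshold, and the sample coordinates (roughly in the Borlänge area) are assumptions for illustration, not values from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def drop_speed_outliers(fixes, max_speed_mps=60.0):
    """Keep fixes whose speed relative to the last kept fix is plausible.

    fixes: list of (t_seconds, lat, lon) tuples sorted by time.
    max_speed_mps: assumed upper bound on vehicle speed (hypothetical default).
    """
    if not fixes:
        return []
    kept = [fixes[0]]
    for t, lat, lon in fixes[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(lat0, lon0, lat, lon) / dt
        if speed <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept

# Hypothetical track: the third fix jumps about one degree of latitude in 10 s.
track = [
    (0, 60.4858, 15.4371),
    (10, 60.4863, 15.4371),
    (20, 61.4863, 15.4371),
    (30, 60.4868, 15.4371),
]
cleaned = drop_speed_outliers(track)
print(len(track), "fixes in,", len(cleaned), "fixes kept")
```

Comparing each fix with the last kept fix, rather than with its immediate predecessor, prevents a single bad position from also causing the following good fix to be rejected.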

Relevance: 100.00%

Abstract:

This study aims to understand health professionals' social representations of multiprofessional work in the Public Health Service of the municipality of Bandeirantes, Paraná. Forty-four health professionals with higher education were interviewed using four open-ended questions covering aspects relevant to the topic. Data analysis was based on the framework of Social Representation Theory. For data processing, the Collective Subject Discourse technique was used, through which synthesis discourses were constructed with the aid of the Qualiquantisoft program. In the resulting discourses, the interviewed health professionals regarded their work as a routine of scheduled care, determined by demand, exhausting yet vocational. They emphasized that multiprofessional work is the integration of several fields within the health area, among professionals from other areas and other specialties, so as to have a team formed to solve problems. They reported that developing multiprofessional work would require greater interaction between managers and professionals; material and physical resources to improve care; training, awareness-raising, and hiring of professionals for the service; salary remuneration; and organization of the health service. The content revealed barriers to the development of multiprofessional work, such as the absence of new forms of management, the flexibilization of labor relations, and the need to resolve long-standing issues such as salary remuneration, job and career plans, and organization of the service, with the introduction of mechanisms that can prevent the intense turnover of professionals.

Relevance: 100.00%

Abstract:

Sound source localization (SSL) is an essential task in many applications involving speech capture and enhancement. As such, speaker localization with microphone arrays has received significant research attention. Nevertheless, existing SSL algorithms for small arrays still have two significant limitations: lack of range resolution, and accuracy degradation with increasing reverberation. The latter is natural and expected, given that strong reflections can have amplitudes similar to that of the direct signal, but different directions of arrival. Therefore, correctly modeling the room and compensating for the reflections should reduce the degradation due to reverberation. In this paper, we show a stronger result. If modeled correctly, early reflections can be used to provide more information about the source location than would have been available in an anechoic scenario. The modeling not only compensates for the reverberation, but also significantly increases resolution for range and elevation. Thus, we show that under certain conditions and limitations, reverberation can be used to improve SSL performance. Prior attempts to compensate for reverberation tried to model the room impulse response (RIR). However, RIRs change quickly with speaker position, and are nearly impossible to track accurately. Instead, we build a 3-D model of the room, which we use to predict early reflections, which are then incorporated into the SSL estimation. Simulation results with real and synthetic data show that even a simplistic room model is sufficient to produce significant improvements in range and elevation estimation, tasks which would be very difficult when relying only on direct path signal components.
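The idea of predicting early reflections from a room model can be illustrated with the classic image-source construction. The Python sketch below is not the paper's estimator: it assumes a rectangular ("shoebox") room with one corner at the origin, considers first-order reflections only, and uses hypothetical room dimensions and positions. For each wall it mirrors the source across that wall and computes how much later the reflection reaches a microphone than the direct-path signal.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def first_order_images(source, room):
    """Mirror the source across each of the six walls of a shoebox room.

    source: (x, y, z) in metres; room: (length, width, height) in metres,
    with walls at 0 and at room[axis] along each axis.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images

def reflection_delays(source, mic, room):
    """Sorted extra delays in seconds of first-order reflections relative to the direct path."""
    direct = math.dist(source, mic)
    return sorted(
        (math.dist(image, mic) - direct) / SPEED_OF_SOUND
        for image in first_order_images(source, room)
    )

# Hypothetical geometry: 5 m x 4 m x 3 m room, source and microphone 3 m apart.
room = (5.0, 4.0, 3.0)
source = (1.0, 2.0, 1.5)
mic = (4.0, 2.0, 1.5)
delays = reflection_delays(source, mic, room)
for delay in delays:
    print(f"reflection arrives {delay * 1000:.2f} ms after the direct path")
```

Because these delays depend on the source position, including its range and elevation, a localizer that matches predicted and observed reflection timing has more information to work with than one that uses the direct path alone, which is the intuition behind the abstract's claim.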

Relevance: 100.00%

Abstract:

This paper reviews current research and contemporary theories of subcortical participation in the motor control of speech production and language processing. As a necessary precursor to the discussion of the functional roles of the basal ganglia and thalamus, the neuroanatomy of the basal ganglial-thalamocortical circuitry is described. Contemporary models of hypokinetic and hyperkinetic movement disorders based on recent neuroanatomical descriptions of the multi-segmented circuits that characterise basal ganglion anatomy are described. Reported effects of surgically induced lesions in the globus pallidus and thalamus on speech production are reviewed. In addition, contemporary models proposed to explain the possible contribution of various subcortical structures to language processing are described and discussed in the context of evidence gained from observation of the effects of circumscribed surgically induced lesions in the basal ganglia and thalamus on language function. The potential of studies based on examination of the speech/language outcomes of patients undergoing pallidotomy and thalamotomy to further inform the debate relating to the role of subcortical structures in speech motor control and language processing is highlighted. Copyright (C) 2001 S. Karger AG, Basel.

Relevance: 100.00%

Abstract:

Master's dissertation in Educational Psychology, specializing in Community Contexts.