20 results for Decoding Speech Prosody
at the National Center for Biotechnology Information (NCBI)
Abstract:
Investigation of the three-generation KE family, half of whose members are affected by a pronounced verbal dyspraxia, has led to identification of their core deficit as one involving sequential articulation and orofacial praxis. A positron emission tomography activation study revealed functional abnormalities in both cortical and subcortical motor-related areas of the frontal lobe, while quantitative analyses of magnetic resonance imaging scans revealed structural abnormalities in several of these same areas, particularly the caudate nucleus, which was found to be abnormally small bilaterally. A recent linkage study [Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P. & Pembrey, M. E. (1998) Nat. Genet. 18, 168–170] localized the abnormal gene (SPCH1) to a 5.6-centimorgan interval in the chromosomal band 7q31. The genetic mutation or deletion in this region has resulted in the abnormal development of several brain areas that appear to be critical for both orofacial movements and sequential articulation, leading to marked disruption of speech and expressive language.
Abstract:
The temporally encoded information obtained by vibrissal touch could be decoded “passively,” involving only input-driven elements, or “actively,” utilizing intrinsically driven oscillators. A previous study suggested that the trigeminal somatosensory system of rats does not obey the bottom-up order of activation predicted by passive decoding. Thus, we have tested whether this system obeys the predictions of active decoding. We have studied cortical single units in the somatosensory cortices of anesthetized rats and guinea pigs and found that about a quarter of them exhibit clear spontaneous oscillations, many of them around whisking frequencies (≈10 Hz). The frequencies of these oscillations could be controlled locally by glutamate. These oscillations could be forced to track the frequency of induced rhythmic whisker movements at a stable, frequency-dependent, phase difference. During these stimulations, the response intensities of multiunits at the thalamic recipient layers of the cortex decreased, and their latencies increased, with increasing input frequency. These observations are consistent with thalamocortical loops implementing phase-locked loops, circuits that are most efficient in decoding temporally encoded information like that obtained by active vibrissal touch. According to this model, and consistent with our results, populations of thalamic “relay” neurons function as phase “comparators” that compare cortical timing expectations with the actual input timing and represent the difference by their population output rate.
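The phase-locked loop idea is easy to make concrete. Below is a minimal sketch, not taken from the paper: a discrete-time oscillator with an intrinsic period advances its next expected event time, a comparator reports the timing error against the actual input, and a gain term (the value 0.3 here is an arbitrary assumption) feeds that error back into the rate. The oscillator locks to the drive at a constant offset whose size depends on the drive frequency, echoing the stable, frequency-dependent phase difference reported above.

```python
# Minimal phase-locked-loop sketch; all names and constants are illustrative.
def phase_locked_loop(input_times, intrinsic_period=0.1, gain=0.3):
    """Track rhythmic input events (seconds); return predicted event times."""
    predicted = input_times[0] + intrinsic_period      # first expectation
    predictions = []
    for t in input_times[1:]:
        error = t - predicted                          # phase comparator output
        predicted += intrinsic_period + gain * error   # rate-controlled correction
        predictions.append(predicted)
    return predictions

drive = [i * 0.125 for i in range(40)]                 # 8 Hz rhythmic drive
preds = phase_locked_loop(drive)                       # ~10 Hz intrinsic oscillator
offsets = [p - t for p, t in zip(preds, drive[2:])]    # prediction minus event time
print([round(o, 4) for o in offsets[-3:]])
# Settles near -(0.125 - 0.1) / 0.3, i.e. about -0.0833 s: a stable phase
# difference whose magnitude changes with the drive frequency, as in the model.
```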
Abstract:
The three genes, gatC, gatA, and gatB, which constitute the transcriptional unit of the Bacillus subtilis glutamyl-tRNA(Gln) amidotransferase, have been cloned. Expression of this transcriptional unit results in the production of a heterotrimeric protein that has been purified to homogeneity. The enzyme furnishes a means for formation of correctly charged Gln-tRNA(Gln) through the transamidation of misacylated Glu-tRNA(Gln), functionally replacing the lack of glutaminyl-tRNA synthetase activity in Gram-positive eubacteria, cyanobacteria, Archaea, and organelles. Disruption of this operon is lethal. This demonstrates that transamidation is the only pathway to Gln-tRNA(Gln) in B. subtilis and that glutamyl-tRNA(Gln) amidotransferase is a novel and essential component of the translational apparatus.
Abstract:
The τ and γ subunits of DNA polymerase III are both encoded by a single gene in Escherichia coli and Thermus thermophilus. γ is two-thirds the size of τ and shares virtually all its amino acid sequence with τ. E. coli and T. thermophilus have evolved very different mechanisms for setting the approximate 1:1 ratio between τ and γ. Both mechanisms put ribosomes into alternate reading frames so that stop codons in the new frame serve to make the smaller γ protein. In E. coli, ≈50% of initiating ribosomes translate the dnaX mRNA conventionally to give τ, but the other 50% shift into the −1 reading frame at a specific site (A AAA AAG) in the mRNA to produce γ. In T. thermophilus, ribosomal frameshifting is not required: the dnaX mRNA is a heterogeneous population of molecules with different numbers of A residues arising from transcriptional slippage on a run of nine T residues in the DNA template. Translation of the subpopulation containing nine As (or nine ± multiples of three As) yields τ. The rest of the population of mRNAs (containing nine ± non-multiples of three As) puts ribosomes into the alternate reading frames to produce the γ protein(s). It is surprising that two rather similar dnaX sequences in E. coli and T. thermophilus lead to very different mechanisms of expression.
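The frame-setting arithmetic in this abstract reduces to counting A residues modulo three. Below is a toy sketch assuming nothing beyond the rule stated above (frame offset = (n - 9) mod 3); the labels are purely illustrative:

```python
# Toy illustration of frame choice by transcriptional slippage (not from the paper).
def downstream_frame(n_adenines, reference=9):
    """Reading-frame offset (0, 1, or 2) relative to the nine-A transcript."""
    return (n_adenines - reference) % 3

for n in range(7, 13):
    offset = downstream_frame(n)
    product = "tau (original frame)" if offset == 0 else f"gamma (frame offset {offset})"
    print(f"{n} As -> {product}")
# Runs of nine As (or nine +/- multiples of three) keep the original frame and
# yield tau; every other run length puts ribosomes into an alternate frame.
```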
Abstract:
Spoken language is one of the most compact and structured ways to convey information. The linguistic ability to structure individual words into larger sentence units permits speakers to express a nearly unlimited range of meanings. This ability is rooted in speakers' knowledge of syntax and in the corresponding process of syntactic encoding. Syntactic encoding is highly automatized, operates largely outside of conscious awareness, and overlaps closely in time with several other processes of language production. With the use of positron emission tomography we investigated the cortical activations during spoken language production that are related to the syntactic encoding process. In the paradigm of restrictive scene description, utterances varying in complexity of syntactic encoding were elicited. Results provided evidence that the left Rolandic operculum, caudally adjacent to Broca's area, is involved in both sentence-level and local (phrase-level) syntactic encoding during speaking.
Abstract:
Lesions to left frontal cortex in humans produce speech production impairments (nonfluent aphasia). These impairments vary from subject to subject, and performance on certain speech production tasks can be relatively preserved in some patients. A possible explanation for preservation of function under these circumstances is that areas outside left prefrontal cortex are used to compensate for the injured brain area. We report here a direct demonstration of preserved language function in a stroke patient (LF1), apparently due to the activation of a compensatory brain pathway. We used functional brain imaging with positron emission tomography (PET) as a basis for this study.
Abstract:
Computer speech synthesis has reached a high level of performance, with increasingly sophisticated models of linguistic structure, low error rates in text analysis, and high intelligibility in synthesis from phonemic input. Mass market applications are beginning to appear. However, the results are still not good enough for the ubiquitous application that such technology will eventually have. A number of alternative directions of current research aim at the ultimate goal of fully natural synthetic speech. One especially promising trend is the systematic optimization of large synthesis systems with respect to formal criteria of evaluation. Speech recognition has progressed rapidly in the past decade through such approaches, and it seems likely that their application in synthesis will produce similar improvements.
Abstract:
The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.
Abstract:
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming around these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue: how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing, along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. A successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Abstract:
The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for ordinary words that must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, the part of speech of each word must be computed, followed by phrase-level parsing. From this syntactic structure, the prosodic structure of the utterance can be determined, which is needed to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors, such as the specification of new and old information, contrast, and pronominal reference, can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
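The staged analysis this abstract walks through can be summarized as a pipeline from raw text to a prosodically annotated segmental specification. The sketch below is runnable but entirely toy: the abbreviation table, the letter-to-sound map, and the duration and F0 values are invented placeholders, and each step stands in for a whole subsystem; only the ordering of the stages follows the text.

```python
# Toy text-to-speech front end; every rule here is an invented placeholder.
ABBREVIATIONS = {"Dr.": "doctor"}                      # text normalization table
LETTER_TO_SOUND = {"a": "AE", "c": "K", "d": "D", "e": "EH", "i": "IY",
                   "n": "N", "o": "AO", "r": "R", "s": "S", "t": "T", "w": "W"}

def text_to_spec(text):
    # 1. Expand abbreviations, titles, numbers, and acronyms into ordinary words.
    words = [ABBREVIATIONS.get(w, w).lower() for w in text.split()]
    # 2. Letter-to-sound conversion (standing in for morphological analysis).
    prons = [[LETTER_TO_SOUND.get(ch, ch.upper()) for ch in w] for w in words]
    # 3. Trivial prosodic structure: the final word is phrase-final, so its
    #    segments are lengthened and its fundamental frequency is lowered.
    spec = []
    for i, (word, phones) in enumerate(zip(words, prons)):
        final = i == len(words) - 1
        spec.append({"word": word, "phones": phones,
                     "dur_ms": [120 if final else 80] * len(phones),  # durations
                     "f0_hz": 90 if final else 120})                  # F0 contour
    return spec  # a complete input a waveform synthesizer could consume

for entry in text_to_spec("Dr. Watson waits"):
    print(entry)
```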
Abstract:
This paper introduces the session on advanced speech recognition technology. The two papers that make up this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue, to the contrary, that present performance is far removed from human performance and that a revolution in our thinking is required to achieve the goal. It is further asserted that, to bring about the revolution, more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology.
Abstract:
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956–9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation is described in this paper. The recognizer seeks the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260–269] that, given the index string, has a high likelihood of finding the most probable utterance.
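In symbols, the recognizer seeks the utterance W maximizing P(W | A) ∝ P(A | W) · P(W): the HMM acoustic score times the language model probability. The sketch below is a minimal Viterbi search over a two-state toy HMM; the states, probabilities, and the 'lo'/'hi' index alphabet are invented for illustration, and a real recognizer searches a word lattice rather than this tiny state space.

```python
import math

# Toy Viterbi decoder (illustrative numbers only): find the most probable
# hidden state path for an observed index string under an HMM.
def viterbi(obs, states, init, trans, emit):
    # best[s]: log probability of the best path so far that ends in state s.
    best = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []                               # one backpointer table per step
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p, arg = max((prev[r] + math.log(trans[r][s]), r) for r in states)
            best[s] = p + math.log(emit[s][o])
            ptr[s] = arg
        back.append(ptr)
    last = max(best, key=best.get)          # best final state
    path = [last]
    for ptr in reversed(back):              # trace the winning path backward
        path.append(ptr[path[-1]])
    return path[::-1]

states = ("vowel", "consonant")
init = {"vowel": 0.5, "consonant": 0.5}
trans = {"vowel": {"vowel": 0.3, "consonant": 0.7},
         "consonant": {"vowel": 0.6, "consonant": 0.4}}
emit = {"vowel": {"lo": 0.8, "hi": 0.2},
        "consonant": {"lo": 0.1, "hi": 0.9}}
print(viterbi(["lo", "hi", "lo"], states, init, trans, emit))
# -> ['vowel', 'consonant', 'vowel']
```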
Abstract:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.