14 results for Freedom of Speech
in National Center for Biotechnology Information - NCBI
Abstract:
The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.
Abstract:
The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
Abstract:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.
Abstract:
Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.
Abstract:
Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.
Abstract:
Investigation of the three-generation KE family, half of whose members are affected by a pronounced verbal dyspraxia, has led to identification of their core deficit as one involving sequential articulation and orofacial praxis. A positron emission tomography activation study revealed functional abnormalities in both cortical and subcortical motor-related areas of the frontal lobe, while quantitative analyses of magnetic resonance imaging scans revealed structural abnormalities in several of these same areas, particularly the caudate nucleus, which was found to be abnormally small bilaterally. A recent linkage study [Fisher, S., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P. & Pembry, M. E. (1998) Nat. Genet. 18, 168-170] localized the abnormal gene (SPCH1) to a 5.6-centiMorgan interval in the chromosomal band 7q31. The genetic mutation or deletion in this region has resulted in the abnormal development of several brain areas that appear to be critical for both orofacial movements and sequential articulation, leading to marked disruption of speech and expressive language.
Abstract:
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
Abstract:
Atomic level structures have been determined for the soluble forms of several colicins and toxins, but the structural changes that occur after membrane binding have not been well characterized. Changes occurring in the transition from the soluble to membrane-bound state of the C-terminal 190-residue channel polypeptide of colicin E1 (P190) bound to anionic membranes are described. In the membrane-bound state, the α-helical content increases from 60-64% to 80-90%, with a concomitant increase in the average length of the helical segments from 12 to 16 or 17 residues, close to the length required to span the membrane bilayer in the open channel state. The average distance between helical segments is increased and interhelix interactions are weakened, as shown by a major loss of tertiary structure interactions, decreased efficiency of fluorescence resonance energy transfer from an energy donor on helix V of P190 to an acceptor on helix IX, and decreased resonance energy transfer at higher temperatures, not observed in soluble P190, implying freedom of motion of helical segments. Weaker interactions are also shown by a calorimetric thermal transition of low cooperativity, and the extended nature of the helical array is shown by a 3- to 4-fold increase in the average area subtended per molecule to 4,200 Å² on the membrane surface. The latter, with analysis of the heat capacity changes, implies the absence of a developed hydrophobic core in the membrane-bound P190. The membrane interfacial layer thus serves to promote formation of a highly helical extended two-dimensional flexible net. The properties of the membrane-bound state of the colicin channel domain (i.e., hydrophobic anchor, lengthened and loosely coupled α-helices, and close association with the membrane interfacial layer) are plausible structural features for the state that is a prerequisite for voltage gating, formation of transmembrane helices, and channel opening.
Abstract:
Although the catalytic (C) subunit of cAMP-dependent protein kinase is N-myristylated, it is a soluble protein, and no physiological role has been identified for its myristyl moiety. To determine whether the interaction of the two regulatory (R) subunit isoforms (RI and RII) with the N-myristylated C subunit affects its ability to target membranes, the effect of N-myristylation and the RI and RII subunit isoforms on C subunit binding to phosphatidylcholine/phosphatidylserine liposomes was examined. Only the combination of N-myristylation and RII subunit interaction produced a dramatic increase in the rate of liposomal binding. To assess whether the RII subunit also increased the conformational flexibility of the C subunit N terminus, the effect of N-myristylation and the RI and RII subunits on the rotational freedom of the C subunit N terminus was measured. Specifically, fluorescein maleimide was conjugated to Cys-16 in the N-terminal domain of a K16C mutant of the C subunit, and the time-resolved emission anisotropy was determined. The interaction of the RII subunit, but not the RI subunit, significantly increased the backbone flexibility around the site of mutation and labeling, strongly suggesting that RII subunit binding to the myristylated C subunit induced a unique conformation of the C subunit that is associated with an increase in both the N-terminal flexibility and the exposure of the N-myristate. The RII subunit thus appears to serve as an intermolecular switch that disrupts the link between the N-terminal and core catalytic domains of the C subunit to expose the N-myristate and poise the holoenzyme for interaction with membranes.
Abstract:
Conflicts can occur between the principle of freedom of information treasured by librarians and ethical standards of scientific research involving the propriety of using data derived from immoral or dishonorable experimentation. A prime example of this conflict was brought to the attention of the medical and library communities in 1995 when articles appeared claiming that the subjects of the illustrations in the classic anatomy atlas, Eduard Pernkopf's Topographische Anatomie des Menschen, were victims of the Nazi holocaust. While few have disputed the accuracy or the artistic or educational value of the Pernkopf atlas, some have argued that the use of such subjects violates standards of medical ethics involving inhuman and degrading treatment of subjects or disrespect of a human corpse. Efforts were made to remove the book from medical libraries. In this article, the history of the Pernkopf atlas and the controversy surrounding it are reviewed. The results of a survey of academic medical libraries concerning their treatment of the Pernkopf atlas are reported, and the ethical implications of these issues as they affect the responsibilities of librarians are discussed.
Abstract:
At the forefront of debates on language are new data demonstrating infants' early acquisition of information about their native language. The data show that infants perceptually map critical aspects of ambient language in the first year of life before they can speak. Statistical properties of speech are picked up through exposure to ambient language. Moreover, linguistic experience alters infants' perception of speech, warping perception in the service of language. Infants' strategies are unexpected and unpredicted by historical views. A new theoretical position has emerged, and six postulates of this position are described.
Abstract:
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Abstract:
This paper introduces the session "Technology in the Year 2001" and is the first of four papers dealing with the future of human-machine communication by voice. In looking to the future it is important to recognize both the difficulties of technological forecasting and the frailties of the technology as it exists today--frailties that are manifestations of our limited scientific understanding of human cognition. The technology to realize truly advanced applications does not yet exist and cannot be supported by our presently incomplete science of speech. To achieve this long-term goal, the authors advocate a fundamental research program using a cybernetic approach substantially different from more conventional synthetic approaches. In a cybernetic approach, feedback control systems will allow a machine to adapt to a linguistically rich environment using reinforcement learning.
Abstract:
Research in speech recognition and synthesis over the past several decades has brought speech technology to a point where it is being used in "real-world" applications. However, despite the progress, the perception remains that the current technology is not flexible enough to allow easy voice communication with machines. The focus of speech research is now on producing systems that are accurate and robust but that do not impose unnecessary constraints on the user. This chapter takes a critical look at the shortcomings of the current speech recognition and synthesis algorithms, discusses the technical challenges facing research, and examines the new directions that research in speech recognition and synthesis must take in order to form the basis of new solutions suitable for supporting a wide range of applications.