16 results for 080107 Natural Language Processing
in National Center for Biotechnology Information - NCBI
Abstract:
The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.
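The corpus-based methods surveyed above replace hand-written symbolic rules with statistics gathered from annotated text. As a minimal sketch of the idea behind probabilistic part-of-speech tagging, the toy tagger below simply assigns each word its most frequent tag in a tiny illustrative training corpus (the corpus and tag names are invented for illustration; real systems use large annotated corpora and context-sensitive models such as HMMs):

```python
from collections import Counter, defaultdict

# Tiny illustrative annotated corpus: (word, tag) pairs.
tagged_corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("ends", "VERB"),
    ("dogs", "NOUN"), ("run", "VERB"), ("run", "VERB"),
]

# Count how often each tag appears for each word form.
counts = defaultdict(Counter)
for word, tag_label in tagged_corpus:
    counts[word][tag_label] += 1

def tag(word):
    """Return the empirically most probable tag for a word."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"  # crude open-class fallback for unseen words

print([tag(w) for w in ["the", "run", "dog"]])
```

Note that "run" is tagged VERB despite one NOUN occurrence, because the corpus statistics favor VERB; this data-driven disambiguation is exactly what distinguishes the empirical approach from purely symbolic rule systems.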
Abstract:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.
Abstract:
This paper provides an overview of the colloquium's discussion session on natural language understanding, which followed presentations by M. Bates [Bates, M. (1995) Proc. Natl. Acad. Sci. USA 92, 9977-9982] and R. C. Moore [Moore, R. C. (1995) Proc. Natl. Acad. Sci. USA 92, 9983-9988]. The paper reviews the dual role of language processing in providing understanding of the spoken input and an additional source of constraint in the recognition process. To date, language processing has successfully provided understanding but has provided only limited (and computationally expensive) constraint. As a result, most current systems use a loosely coupled, unidirectional interface, such as N-best or a word network, with natural language constraints as a postprocess, to filter or resort the recognizer output. However, the level of discourse context provides significant constraint on what people can talk about and how things can be referred to; when the system becomes an active participant, it can influence this order. But sources of discourse constraint have not been extensively explored, in part because these effects can only be seen by studying systems in the context of their use in interactive problem solving. This paper argues that we need to study interactive systems to understand what kinds of applications are appropriate for the current state of technology and how the technology can move from the laboratory toward real applications.
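The loosely coupled, unidirectional N-best interface described above can be sketched in a few lines: the recognizer emits ranked hypotheses with acoustic scores, and a natural-language post-process penalizes hypotheses that fail its constraints before re-sorting. The hypotheses, scores, and the stand-in "grammar" check below are all invented for illustration; a real system would run a full parser:

```python
# Hypothetical N-best list from a recognizer: (hypothesis, acoustic log-score).
nbest = [
    ("show me flights to boston", -12.1),
    ("show me flights two boston", -11.8),
    ("show knee flights to boston", -13.0),
]

def nl_score(hypothesis):
    """Toy natural-language constraint: penalize hypotheses the 'grammar'
    rejects. Here we merely require the preposition 'to'; a real system
    would parse the whole utterance."""
    return 0.0 if "to" in hypothesis.split() else -5.0

# Post-process: re-sort by combined acoustic + NL score (higher is better).
rescored = sorted(nbest, key=lambda h: h[1] + nl_score(h[0]), reverse=True)
print(rescored[0][0])
```

Even though the homophone error "two boston" had the best acoustic score, the NL penalty demotes it, illustrating how language constraints filter recognizer output in this loosely coupled architecture.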
Abstract:
This paper surveys some of the fundamental problems in natural language (NL) understanding (syntax, semantics, pragmatics, and discourse) and the current approaches to solving them. Some recent developments in NL processing include increased emphasis on corpus-based rather than example- or intuition-based work, attempts to measure the coverage and effectiveness of NL systems, dealing with discourse and dialogue phenomena, and attempts to use both analytic and stochastic knowledge. Critical areas for the future include grammars that are appropriate to processing large amounts of real language; automatic (or at least semi-automatic) methods for deriving models of syntax, semantics, and pragmatics; self-adapting systems; and integration with speech processing. Of particular importance are techniques that can be tuned to such requirements as full versus partial understanding and spoken language versus text. Portability (the ease with which one can configure an NL system for a particular application) is one of the largest barriers to application of this technology.
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.
Abstract:
As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.
Abstract:
Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.
Abstract:
Cerebral organization during sentence processing in English and in American Sign Language (ASL) was characterized by employing functional magnetic resonance imaging (fMRI) at 4 T. Effects of deafness, age of language acquisition, and bilingualism were assessed by comparing results from (i) normally hearing, monolingual, native speakers of English, (ii) congenitally, genetically deaf, native signers of ASL who learned English late and through the visual modality, and (iii) normally hearing bilinguals who were native signers of ASL and speakers of English. All groups, hearing and deaf, processing their native language, English or ASL, displayed strong and repeated activation within classical language areas of the left hemisphere. Deaf subjects reading English did not display activation in these regions. These results suggest that the early acquisition of a natural language is important in the expression of the strong bias for these areas to mediate language, independently of the form of the language. In addition, native signers, hearing and deaf, displayed extensive activation of homologous areas within the right hemisphere, indicating that the specific processing requirements of the language also in part determine the organization of the language systems of the brain.
Abstract:
Mature immunologically competent dendritic cells are the most efficient antigen-presenting cells that powerfully activate T cells and initiate and sustain immune responses. Indeed, dendritic cells are able to efficiently capture antigens, express high levels of costimulatory molecules, and produce the combination of cytokines required to create a powerful immune response. They are also considered to be important in initiating autoimmune disease by efficiently presenting autoantigens to self-reactive T cells that, in this case, will mount a pathogenic autoimmune reaction. Triggering T cells is not a simple on–off procedure, as the T cell receptor responds to minor changes in ligand with gradations of T cell activation and effector functions. These “misfit” peptides have been called Altered Peptide Ligands, and have been shown to have important biological significance. Here, we show that fully capable dendritic cells may present, upon natural antigen processing, a self-epitope with Altered Peptide Ligand features that can unexpectedly induce anergy in a human autoreactive T cell clone. These results indicate that presentation of a self-epitope by immunologically competent dendritic cells does not always mean “danger” and show a mechanism involved in the fine balance between activation and tolerance induction in humans.
Abstract:
A simple evolutionary process can discover sophisticated methods for emergent information processing in decentralized spatially extended systems. The mechanisms underlying the resulting emergent computation are explicated by a technique for analyzing particle-based logic embedded in pattern-forming systems. Understanding how globally coordinated computation can emerge in evolution is relevant both for the scientific understanding of natural information processing and for engineering new forms of parallel computing systems.
Abstract:
We compared magnetoencephalographic responses for natural vowels and for sounds consisting of two pure tones that represent the two lowest formant frequencies of these vowels. Our aim was to determine whether spectral changes in successive stimuli are detected differently for speech and nonspeech sounds. The stimuli were presented in four blocks applying an oddball paradigm (20% deviants, 80% standards): (i) /α/ tokens as deviants vs. /i/ tokens as standards; (ii) /e/ vs. /i/; (iii) complex tones representing /α/ formants vs. /i/ formants; and (iv) complex tones representing /e/ formants vs. /i/ formants. Mismatch fields (MMFs) were calculated by subtracting the source waveform produced by standards from that produced by deviants. As expected, MMF amplitudes for the complex tones reflected acoustic deviation: the amplitudes were stronger for the complex tones representing /α/ than /e/ formants, i.e., when the spectral difference between standards and deviants was larger. In contrast, MMF amplitudes for the vowels were similar despite their different spectral composition, whereas the MMF onset time was longer for /e/ than for /α/. Thus the degree of spectral difference between standards and deviants was reflected by the MMF amplitude for the nonspeech sounds and by the MMF latency for the vowels.
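The mismatch-field computation described above is a simple subtraction: the averaged source waveform for standard stimuli is subtracted, sample by sample, from the averaged waveform for deviants. The sketch below illustrates the arithmetic with invented toy waveforms (the numbers are not real MEG data):

```python
# Hypothetical per-trial source waveforms (arbitrary units, 4 time samples).
standard_trials = [[0.0, 1.0, 2.0, 1.0], [0.0, 1.2, 1.8, 1.0]]
deviant_trials  = [[0.0, 1.5, 3.0, 1.5], [0.0, 1.5, 3.2, 1.5]]

def average(trials):
    """Average a list of equal-length waveforms sample by sample."""
    n = len(trials)
    return [sum(vals) / n for vals in zip(*trials)]

# Mismatch field (MMF): deviant average minus standard average.
mmf = [d - s for d, s in zip(average(deviant_trials), average(standard_trials))]
peak_amplitude = max(mmf)
print(round(peak_amplitude, 2))
```

The peak of the resulting difference waveform is the MMF amplitude, and the time at which the difference first departs from zero is its onset latency, the two measures the study uses to contrast vowels and complex tones.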
Abstract:
Several unanswered questions in T cell immunobiology relating to intracellular processing or in vivo antigen presentation could be approached if convenient, specific, and sensitive reagents were available for detecting the peptide–major histocompatibility complex (MHC) class I or class II ligands recognized by αβ T cell receptors. For this reason, we have developed a method using homogeneously loaded peptide–MHC class II complexes to generate and select specific mAb reactive with these structures using hen egg lysozyme (HEL) and I-Ak as a model system. mAbs specific for either HEL-(46–61)–Ak or HEL-(116–129)–Ak have been isolated. They cross-react with a small subset of I-Ak molecules loaded with self peptides but can nonetheless be used for flow cytometry, immunoprecipitation, Western blotting, and intracellular immunofluorescence to detect specific HEL peptide–MHC class II complexes formed by either peptide exposure or natural processing of native HEL. An example of the utility of these reagents is provided herein by using one of the anti-HEL-(46–61)–Ak specific mAbs to visualize intracellular compartments where I-Ak is loaded with HEL-derived peptides early after antigen administration. Other uses, especially for in vivo tracking of specific ligand-bearing antigen-presenting cells, are discussed.
Abstract:
Syntax denotes a rule system that allows one to predict the sequencing of communication signals. Despite its significance for both human speech processing and animal acoustic communication, the representation of syntactic structure in the mammalian brain has not been studied electrophysiologically at the single-unit level. In the search for a neuronal correlate for syntax, we used playback of natural and temporally destructured complex species-specific communication calls—so-called composites—while recording extracellularly from neurons in a physiologically well defined area (the FM–FM area) of the mustached bat’s auditory cortex. Even though this area is known to be involved in the processing of target distance information for echolocation, we found that units in the FM–FM area were highly responsive to composites. The finding that neuronal responses were strongly affected by manipulation in the time domain of the natural composite structure lends support to the hypothesis that syntax processing in mammals occurs at least at the level of the nonprimary auditory cortex.
Abstract:
This article reviews attempts to characterize the mental operations mediated by left inferior prefrontal cortex, especially the anterior and inferior portion of the gyrus, with the functional neuroimaging techniques of positron emission tomography and functional magnetic resonance imaging. Activations in this region occur during semantic, relative to nonsemantic, tasks for the generation of words to semantic cues or the classification of words or pictures into semantic categories. This activation appears in the right prefrontal cortex of people known to be atypically right-hemisphere dominant for language. In this region, activations are associated with meaningful encoding that leads to superior explicit memory for stimuli and deactivations with implicit semantic memory (repetition priming) for words and pictures. New findings are reported showing that patients with global amnesia show deactivations in the same region associated with repetition priming, that activation in this region reflects selection of a response from among numerous relative to few alternatives, and that activations in a portion of this region are associated specifically with semantic relative to phonological processing. It is hypothesized that activations in left inferior prefrontal cortex reflect a domain-specific semantic working memory capacity that is invoked more for semantic than nonsemantic analyses regardless of stimulus modality, more for initial than for repeated semantic analysis of a word or picture, more when a response must be selected from among many than few legitimate alternatives, and that yields superior later explicit memory for experiences.
Abstract:
Natural killer (NK) cells are inhibited from killing cellular targets by major histocompatibility complex (MHC) class I molecules. In the mouse, this can be mediated by the Ly-49A NK cell receptor that specifically binds the H-2Dd MHC class I molecule, then inhibits NK cell activity. Previous experiments have indicated that Ly-49A recognizes the alpha 1/alpha 2 domains of MHC class I and that no specific MHC-bound peptide appeared to be involved. We demonstrate here that alanine-substituted peptides, having only the minimal anchor motifs, stabilized H-2Dd expression and protected H-2Dd-transfected, transporter associated with processing (TAP)-deficient cells from lysis by Ly-49A+ NK cells. Peptide-induced resistance was blocked only by an mAb that binds a conformational determinant on H-2Dd. Moreover, stabilization of "empty" H-2Dd heavy chains by exogenous beta 2-microglobulin did not confer resistance. In contrast to data for MHC class I-restricted T cells that are specific for peptides displayed by MHC molecules, these data indicate that NK cells are specific for a peptide-induced conformational determinant, independent of specific peptide. This fundamental distinction between NK cells and T cells further implies that NK cells are sensitive only to global changes in MHC class I conformation or expression, rather than to specific pathogen-encoded peptides. This is consistent with the "missing self" hypothesis, which postulates that NK cells survey tissues for normal expression of MHC class I.