938 resultados para Linear Attention,Conditional Language Model,Natural Language Generation,FLAX,Rare diseases
Resumo:
Clinical text understanding (CTU) is of interest to health informatics because critical clinical information frequently represented as unconstrained text in electronic health records are extensively used by human experts to guide clinical practice, decision making, and to document delivery of care, but are largely unusable by information systems for queries and computations. Recent initiatives advocating for translational research call for generation of technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in multidisciplinary and collaborative environment envisioned by CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability, and ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that needs to be accounted for when conceptualizing implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in use of terms and documentation practices, and without commitmentto grammatical or syntactic structure of the language (e.g. Triage notes, physician and nurse notes, chief complaints, etc). This hinders performance of natural language processing technologies which typically rely heavily on the syntax of language and grammatical structure of the text. This document introduces our method to transform unconstrained clinical text found in electronic health information systems to a formal (computationally understandable) representation that is suitable for querying, integration, contextualization and reuse, and is resilient to the grammatical and syntactic irregularities of the clinical text. We present our design rationale, method, and results of evaluation in processing chief complaints and triage notes from 8 different emergency departments in Houston Texas. At the end, we will discuss significance of our contribution in enabling use of clinical text in a practical bio-surveillance setting.
Resumo:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of other two major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned.
Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Resumo:
The Quality of Life of a person may depend on early attention to his neurodevel-opment disorders in childhood. Identification of language disorders under the age of six years old can speed up required diagnosis and/or treatment processes. This paper details the enhancement of a Clinical Decision Support System (CDSS) aimed to assist pediatricians and language therapists at early identification and re-ferral of language disorders. The system helps to fine tune the Knowledge Base of Language Delays (KBLD) that was already developed and validated in clinical routine with 146 children. Medical experts supported the construction of Gades CDSS by getting scientific consensus from literature and fifteen years of regis-tered use cases of children with language disorders. The current research focuses on an innovative cooperative model that allows the evolution of the KBLD of Gades through the supervised evaluation of the CDSS learnings with experts¿ feedback. The deployment of the resulting system is being assessed under a mul-tidisciplinary team of seven experts from the fields of speech therapist, neonatol-ogy, pediatrics, and neurology.
Resumo:
An important part of human intelligence is the ability to use language. Humans learn how to use language in a society of language users, which is probably the most effective way to learn a language from the ground up. Principles that might allow an artificial agents to learn language this way are not known at present. Here we present a framework which begins to address this challenge. Our auto-catalytic, endogenous, reflective architecture (AERA) supports the creation of agents that can learn natural language by observation. We present results from two experiments where our S1 agent learns human communication by observing two humans interacting in a realtime mock television interview, using gesture and situated language. Results show that S1 can learn multimodal complex language and multimodal communicative acts, using a vocabulary of 100 words with numerous sentence formats, by observing unscripted interaction between the humans, with no grammar being provided to it a priori, and only high-level information about the format of the human interaction in the form of high-level goals of the interviewer and interviewee and a small ontology. The agent learns both the pragmatics, semantics, and syntax of complex sentences spoken by the human subjects on the topic of recycling of objects such as aluminum cans, glass bottles, plastic, and wood, as well as use of manual deictic reference and anaphora.
Resumo:
This paper presents a dynamic LM adaptation based on the topic that has been identified on a speech segment. We use LSA and the given topic labels in the training dataset to obtain and use the topic models. We propose a dynamic language model adaptation to improve the recognition performance in "a two stages" AST system. The final stage makes use of the topic identification with two variants: the first on uses just the most probable topic and the other one depends on the relative distances of the topics that have been identified. We perform the adaptation of the LM as a linear interpolation between a background model and topic-based LM. The interpolation weight id dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database. We achieved a relative reduction in WER of 11.13% over the baseline system which uses a single blackground LM.
Resumo:
This PhD dissertation is framed in the emergent fields of Reverse Logistics and ClosedLoop Supply Chain (CLSC) management. This subarea of supply chain management has gained researchers and practitioners' attention over the last 15 years to become a fully recognized subdiscipline of the Operations Management field. More specifically, among all the activities that are included within the CLSC area, the focus of this dissertation is centered in direct reuse aspects. The main contribution of this dissertation to current knowledge is twofold. First, a framework for the so-called reuse CLSC is developed. This conceptual model is grounded in a set of six case studies conducted by the author in real industrial settings. The model has also been contrasted with existing literature and with academic and professional experts on the topic as well. The framework encompasses four building blocks. In the first block, a typology for reusable articles is put forward, distinguishing between Returnable Transport Items (RTI), Reusable Packaging Materials (RPM), and Reusable Products (RP). In the second block, the common characteristics that render reuse CLSC difficult to manage from a logistical standpoint are identified, namely: fleet shrinkage, significant investment and limited visibility. In the third block, the main problems arising in the management of reuse CLSC are analyzed, such as: (1) define fleet size dimension, (2) control cycle time and promote articles rotation, (3) control return rate and prevent shrinkage, (4) define purchase policies for new articles, (5) plan and control reconditioning activities, and (6) balance inventory between depots. Finally, in the fourth block some solutions to those issues are developed. Firstly, problems (2) and (3) are addressed through the comparative analysis of alternative strategies for controlling cycle time and return rate. Secondly, a methodology for calculating the required fleet size is elaborated (problem (1)). This methodology is valid for different configurations of the physical flows in the reuse CLSC. Likewise, some directions are pointed out for further development of a similar method for defining purchase policies for new articles (problem (4)). The second main contribution of this dissertation is embedded in the solutions part (block 4) of the conceptual framework and comprises a two-level decision problem integrating two mixed integer linear programming (MILP) models that have been formulated and solved to optimality using AIMMS as modeling language, CPLEX as solver and Excel spreadsheet for data introduction and output presentation. The results obtained are analyzed in order to measure in a client-supplier system the economic impact of two alternative control strategies (recovery policies) in the context of reuse. In addition, the models support decision-making regarding the selection of the appropriate recovery policy against the characteristics of demand pattern and the structure of the relevant costs in the system. The triangulation of methods used in this thesis has enabled to address the same research topic with different approaches and thus, the robustness of the results obtained is strengthened.
Resumo:
Several languages have been proposed for the task of describing networks of systems, either to help on managing, simulate or deploy testbeds for testing purposes. However, there is no one specifically designed to describe the honeynets, covering the specific characteristics in terms of applications and tools included in the honeypot systems that make the honeynet. In this paper, the requirements of honeynet description are studied and a survey of existing description languages is presented, concluding that a CIM (Common Information Model) match the basic requirements. Thus, a CIM like technology independent honeynet description language (TIHDL) is proposed. The language is defined being independent of the platform where the honeynet will be deployed later, and it can be translated, either using model-driven techniques or other translation mechanisms, into the description languages of honeynet deployment platforms and tools. This approach gives flexibility to allow the use of a combination of heterogeneous deployment platforms. Besides, a flexible virtual honeynet generation tool (HoneyGen) based on the approach and description language proposed and capable of deploying honeynets over VNX (Virtual Networks over LinuX) and Honeyd platforms is presented for validation purposes.
Resumo:
This paper surveys some of the fundamental problems in natural language (NL) understanding (syntax, semantics, pragmatics, and discourse) and the current approaches to solving them. Some recent developments in NL processing include increased emphasis on corpus-based rather than example- or intuition-based work, attempts to measure the coverage and effectiveness of NL systems, dealing with discourse and dialogue phenomena, and attempts to use both analytic and stochastic knowledge. Critical areas for the future include grammars that are appropriate to processing large amounts of real language; automatic (or at least semi-automatic) methods for deriving models of syntax, semantics, and pragmatics; self-adapting systems; and integration with speech processing. Of particular importance are techniques that can be tuned to such requirements as full versus partial understanding and spoken language versus text. Portability (the ease with which one can configure an NL system for a particular application) is one of the largest barriers to application of this technology.
Resumo:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.
Resumo:
The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.
Resumo:
In recent years, several explanatory models have been developed which attempt to analyse the predictive worth of various factors in relation to academic achievement, as well as the direct and indirect effects that they produce. The aim of this study was to examine a structural model incorporating various cognitive and motivational variables which influence student achievement in the two basic core skills in the Spanish curriculum: Spanish Language and Mathematics. These variables included differential aptitudes, specific self-concept, goal orientations, effort and learning strategies. The sample comprised 341 Spanish students in their first year of Compulsory Secondary Education. Various tests and questionnaires were used to assess each student, and Structural Equation Modelling (SEM) was employed to study the relationships in the initial model. The proposed model obtained a satisfactory fit for the two subjects studied, and all the relationships hypothesised were significant. The variable with the most explanatory power regarding academic achievement was mathematical and verbal aptitude. Also notable was the direct influence of specific self-concept on achievement, goal-orientation and effort, as was the mediatory effect that effort and learning strategies had between academic goals and final achievement.
Resumo:
This introduction provides an overview of the state-of-the-art technology in Applications of Natural Language to Information Systems. Specifically, we analyze the need for such technologies to successfully address the new challenges of modern information systems, in which the exploitation of the Web as a main data source on business systems becomes a key requirement. It will also discuss the reasons why Human Language Technologies themselves have shifted their focus onto new areas of interest very directly linked to the development of technology for the treatment and understanding of Web 2.0. These new technologies are expected to be future interfaces for the new information systems to come. Moreover, we will review current topics of interest to this research community, and will present the selection of manuscripts that have been chosen by the program committee of the NLDB 2011 conference as representative cornerstone research works, especially highlighting their contribution to the advancement of such technologies.