53 results for Linear Attention, Conditional Language Model, Natural Language Generation, FLAX, Rare diseases
at Universidad Politécnica de Madrid
Abstract:
This Master's thesis analyzes techniques and tools for Natural Language Generation (NLG) and describes the modifications made to the SimpleNLG tool so that it can generate expressions in Spanish. This extension widens the audience that the generated information can reach, since around 540 million people speak Spanish. Keywords: Natural Language Generation, NLG techniques, NLG tools, Artificial Intelligence, analysis, SimpleNLG.
Abstract:
In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state-of-the-art we find some attempts to provide guidelines for URI local names in English, and also some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of some real experiences in the translation of ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.
Abstract:
Effective data summarization methods that use AI techniques can help humans understand large sets of data. In this paper, we describe a knowledge-based method for automatically generating summaries of geospatial and temporal data, i.e. data with geographical and temporal references. The method is useful for summarizing data streams, such as GPS traces and traffic information, that are becoming more prevalent with the increasing use of sensors in computing devices. The method presented here is an initial architecture for our ongoing research in this domain. We describe the data representations we have designed for our method and our implementations of the components that perform data abstraction and natural language generation. We also discuss evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
Abstract:
This thesis studies the characteristics of the outer region of zero-pressure-gradient turbulent boundary layers at moderate Reynolds numbers. Two relatively well-established theories are put to the test. The wall similarity theory states that, in the presence of roughness, turbulent motion is mostly affected by the additional drag caused by the roughness, and that other secondary effects are restricted to a region very close to the wall. The consensus is that this theory is valid, but only as a first approximation. At the edge of the boundary layer there is a thin layer caused by the interaction between the turbulent eddies and the irrotational fluid of the free stream, called the turbulent/non-turbulent interface. The bulk of the results about this layer suggest the presence of some localized shear, with properties that make it distinguishable from the rest of the turbulent flow. The properties of the interface are likely to change if the rate of spread of the turbulent boundary layer is amplified, an effect that is usually achieved by increasing the drag. Roughness and entrainment are therefore linked, and the local features of the turbulent/non-turbulent interface may explain why rough-wall boundary layers deviate from the wall similarity theory precisely far from the wall.
To study boundary layers at higher Reynolds numbers, a new high-resolution code for the direct numerical simulation of zero-pressure-gradient turbulent boundary layers over a flat plate has been developed. This code is able to simulate a wide range of Reynolds numbers, from Reτ = 100 to 2000, while showing linear weak scaling up to around two million threads on the BG/Q architecture. Special attention has been paid to the generation of proper inflow boundary conditions. The results are in good agreement with existing numerical and experimental data sets. The turbulent/non-turbulent interface of a boundary layer is analyzed by thresholding the vorticity magnitude field. The value of the threshold is treated as a parameter in the analysis of the surfaces obtained from isocontours of the vorticity magnitude. Two different regimes for the surface can be distinguished depending on the threshold, with a gradual topological transition across which its geometrical properties change significantly. The width of the transition scales well with δ99 when uτ/δ99 is used as the unit of vorticity. The properties of the flow relative to the position of the vorticity magnitude isocontour are analyzed within the same range of thresholds, using the ball distance field, which can be obtained regardless of the size of the domain and the complexity of the interface. The properties of the flow at a given distance from the interface also depend on the threshold, but they are similar regardless of the Reynolds number. The interaction between the turbulent and the non-turbulent flow occurs in a thin layer with a thickness that scales with the Kolmogorov length. Deeper into the turbulent side, the properties are indistinguishable from the rest of the turbulent flow. A zero-pressure-gradient turbulent boundary layer with a volumetric near-wall forcing has been simulated. The forcing has been designed to increase the wall friction without introducing any obvious geometrical effect.
The actual simulation is split into two domains: a smaller one in charge of generating correct inflow boundary conditions, and a second, larger one where the forcing is applied. The one-point and two-point statistics suggest that the failure of the forced and smooth cases to collapse beyond the logarithmic layer may be caused by the geometrical complexity of the intermittent region and by the fact that the scaling with the wall-normal coordinate is no longer present. The geometrical effects can be avoided by using the turbulent/non-turbulent interface as a reference frame, together with the minimum distance to it. The conditional analysis of the vorticity field in this alternative reference frame recovers the scaling with δ99 and ν/uτ already present in the logarithmic layer, the only two length scales allowed if Townsend's wall similarity hypothesis is valid.
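As a rough illustration of the thresholding idea described in this abstract, the sketch below locates the interface along a single wall-normal profile by expressing the vorticity magnitude in units of uτ/δ99 and comparing it with a threshold. The function name, the one-dimensional profile, and the sample values are assumptions made for illustration only; the thesis analyzes full three-dimensional isocontours, which this sketch does not attempt.

```python
def interface_height(y, vort_mag, u_tau, delta99, threshold):
    """Return the outermost wall-normal position where the vorticity
    magnitude, in units of u_tau / delta99, reaches `threshold`.

    `y` is assumed to be sorted from the wall outwards. Returns None if
    the whole profile stays below the threshold.
    """
    unit = u_tau / delta99  # unit of vorticity used for normalization
    height = None
    for yi, wi in zip(y, vort_mag):
        if wi / unit >= threshold:
            height = yi  # later hits are farther from the wall
    return height


# Hypothetical profile: vorticity decays away from the wall.
y = [0.0, 0.5, 1.0, 1.5, 2.0]
vort = [8.0, 4.0, 1.0, 0.1, 0.0]

low = interface_height(y, vort, u_tau=1.0, delta99=2.0, threshold=0.1)
high = interface_height(y, vort, u_tau=1.0, delta99=2.0, threshold=4.0)
print(low, high)
```

A low threshold places the detected interface far from the wall and a high threshold pulls it inwards, which is why the abstract treats the threshold as a parameter of the analysis rather than a fixed constant.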
Abstract:
This work investigates solutions for automating the enrichment of sensor-network data sources with linguistic descriptions, so as to facilitate the subsequent generation of natural language texts. Natural language descriptions make the data accessible to a wider range of users and, as a consequence, allow investments in sensor networks to be better leveraged. We consider the use of open databases to meet the need for a large volume and diversity of geographic knowledge. We also analyze data enrichment within methodological approaches to data curation and within natural language generation methods. As a result, we propose a general method based on a generate-and-test strategy, which includes a way of representing and using heuristic knowledge, with several reasoning stages, to construct the linguistic descriptions that enrich the data. The proposal was evaluated in three scenarios: two for generating geographic references over complex, real-scale sensor networks, and one for generating temporal references. The evaluation results show the practical validity of the proposal, with performance improvements over other approaches. Furthermore, the analysis of the results made it possible to identify and quantify the expected impact of several lines of improvement in open databases.
Abstract:
A new high-resolution code for the direct numerical simulation of zero-pressure-gradient turbulent boundary layers over a flat plate has been developed. Its purpose is to simulate a wide range of Reynolds numbers, from Reθ = 300 to 6800, while showing linear weak scaling up to 32,768 cores on the BG/P architecture. Special attention has been paid to the generation of proper inflow boundary conditions. The results are in good agreement with existing numerical and experimental data sets.
Abstract:
The presence of Pinus nigra in central Spain, where its natural populations are very rare, has led to different interpretations of the current vegetation dynamics. Complementary to the available palynological evidence, macroremains provide local information of high taxonomic resolution that helps to reconstruct the palaeobiogeography of a given species. Here we present new macrofossil data from Tubilla del Lago, a small palaeolake located in the eastern part of the northern Iberian Meseta. We identified 17 wood samples and 71 cones on the basis of their wood anatomy and morphology, respectively. Some of the fossil samples were radiocarbon dated (~4230-3210 cal yr BP). The results demonstrate the Holocene presence of P. nigra in the study area, where it is currently extinct. This evidence, together with other published palaeobotanical studies, indicates that the forests dominated by P. nigra must have had a greater importance in the landscape prior to the anthropogenic influence on the northern Iberian Meseta.
Abstract:
The Linked Data initiative offers a straightforward method to publish structured data on the World Wide Web and link it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing from the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes ortholog/disease information using Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
Abstract:
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate an LM for each of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate an LM for each cluster. Our evaluation shows that the system can estimate a dynamic model adapted to each dialogue turn, which improves speech recognition performance (a relative improvement of up to 14.82%) and, in turn, both the language understanding and the dialogue management tasks.
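The interpolation described in this abstract can be sketched with unigram models stored as dictionaries. This is only a minimal illustration under stated assumptions: the function names, the fixed background weight, and the rule that shares the remaining probability mass in proportion to the posteriors are all hypothetical, since the abstract does not give the exact weight-estimation formula.

```python
def weights_from_posteriors(posteriors, background_weight):
    """Share the non-background mass among dialogue elements in
    proportion to their posterior probabilities (assumed rule)."""
    mass = 1.0 - background_weight
    total = sum(posteriors.values())
    return {name: mass * p / total for name, p in posteriors.items()}


def interpolate_lms(background, element_lms, weights, background_weight):
    """P(w) = background_weight * P_bg(w) + sum_i weights[i] * P_i(w)."""
    if abs(background_weight + sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set(background)
    for lm in element_lms.values():
        vocab |= set(lm)
    adapted = {}
    for word in vocab:
        prob = background_weight * background.get(word, 0.0)
        for name, weight in weights.items():
            prob += weight * element_lms[name].get(word, 0.0)
        adapted[word] = prob
    return adapted


# Hypothetical unigram LMs for one dialogue turn.
background = {"flight": 0.5, "hotel": 0.5}
element_lms = {
    "book_flight": {"flight": 0.9, "hotel": 0.1},
    "book_hotel": {"flight": 0.2, "hotel": 0.8},
}
posteriors = {"book_flight": 0.75, "book_hotel": 0.25}

weights = weights_from_posteriors(posteriors, background_weight=0.6)
adapted = interpolate_lms(background, element_lms, weights, background_weight=0.6)
print(adapted)
```

Because every component model is a probability distribution and the weights sum to one, the adapted model is again a distribution, so it can replace the background LM for the next recognition pass without renormalization.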
Abstract:
Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, is useful to help users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The evaluation of our method on a national hydrologic sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with a lower development effort than other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact on the generation of sensor descriptions.
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major areas, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly its best-known applications are the tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools still have some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
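One simple way to combine annotations produced for a common level, as the paragraph above suggests, is a per-token majority vote. The sketch below is a hypothetical illustration, not the combination mechanism of OntoTag: the function name and the example tag sequences are assumptions, and it presupposes that all tools already share one tagset and one tokenization, which is exactly what limitation (3) makes hard to obtain in practice.

```python
from collections import Counter


def combine_annotations(per_tool_tags):
    """Combine aligned tag sequences from several tools by majority vote.

    `per_tool_tags` holds one tag sequence per tool, aligned token by
    token. Ties are resolved in favour of the tool listed first, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    if len({len(tags) for tags in per_tool_tags}) != 1:
        raise ValueError("all tools must annotate the same tokens")
    combined = []
    for token_tags in zip(*per_tool_tags):
        tag, _votes = Counter(token_tags).most_common(1)[0]
        combined.append(tag)
    return combined


# Three hypothetical POS taggers annotating the same three tokens.
tool_a = ["DET", "NOUN", "VERB"]
tool_b = ["DET", "VERB", "VERB"]
tool_c = ["DET", "NOUN", "NOUN"]

combined = combine_annotations([tool_a, tool_b, tool_c])
print(combined)
```

Each tool makes one error here, yet the vote recovers a consistent sequence because the errors fall on different tokens; when tools share the same systematic error, voting cannot correct it.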
In addition, most high-level annotation tools rely on lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic one) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by a higher-level one, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem, and ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
Thus, to summarise, the main aim of the present work was to combine these separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) into a hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
A person's quality of life may depend on early attention to neurodevelopmental disorders in childhood. Identifying language disorders before the age of six can speed up the required diagnosis and/or treatment processes. This paper details the enhancement of a Clinical Decision Support System (CDSS) that aims to assist pediatricians and language therapists in the early identification and referral of language disorders. The system helps to fine-tune the Knowledge Base of Language Delays (KBLD), which had already been developed and validated in clinical routine with 146 children. Medical experts supported the construction of the Gades CDSS by reaching scientific consensus from the literature and from fifteen years of registered use cases of children with language disorders. The current research focuses on an innovative cooperative model that allows the KBLD of Gades to evolve through the supervised evaluation of the CDSS learnings with experts' feedback. The deployment of the resulting system is being assessed by a multidisciplinary team of seven experts from the fields of speech therapy, neonatology, pediatrics, and neurology.
Abstract:
An important part of human intelligence is the ability to use language. Humans learn how to use language in a society of language users, which is probably the most effective way to learn a language from the ground up. Principles that might allow artificial agents to learn language this way are not known at present. Here we present a framework which begins to address this challenge. Our auto-catalytic, endogenous, reflective architecture (AERA) supports the creation of agents that can learn natural language by observation. We present results from two experiments where our S1 agent learns human communication by observing two humans interacting in a realtime mock television interview, using gesture and situated language. Results show that S1 can learn multimodal complex language and multimodal communicative acts, using a vocabulary of 100 words with numerous sentence formats, by observing unscripted interaction between the humans, with no grammar being provided to it a priori, and only high-level information about the format of the human interaction in the form of high-level goals of the interviewer and interviewee and a small ontology. The agent learns the pragmatics, semantics, and syntax of complex sentences spoken by the human subjects on the topic of recycling of objects such as aluminum cans, glass bottles, plastic, and wood, as well as the use of manual deictic reference and anaphora.
Abstract:
This paper presents a dynamic LM adaptation based on the topic identified in a speech segment. We use LSA and the topic labels given in the training dataset to obtain and use the topic models. We propose a dynamic language model adaptation to improve the recognition performance in a two-stage AST system. The final stage makes use of the topic identification with two variants: the first one uses just the most probable topic, and the other depends on the relative distances of the topics that have been identified. We perform the adaptation of the LM as a linear interpolation between a background model and a topic-based LM. The interpolation weight is dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database. We achieved a relative reduction in WER of 11.13% over the baseline system, which uses a single background LM.
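The two topic-selection variants mentioned in this abstract can be sketched as two ways of turning topic distances into mixture weights for the topic-based LM. Everything below is an assumption made for illustration: the function names, the inverse-distance normalization used for the soft variant, the fixed interpolation weight `lam`, and the toy unigram models. The abstract does not state how the paper maps distances to weights.

```python
def topic_weights(distances, hard=False):
    """Map topic distances (smaller means closer) to mixture weights.

    hard=True keeps only the closest topic; otherwise the weights are
    normalized inverse distances (an assumed rule for this sketch).
    """
    if hard:
        best = min(distances, key=distances.get)
        return {t: 1.0 if t == best else 0.0 for t in distances}
    inverse = {t: 1.0 / (d + 1e-12) for t, d in distances.items()}
    total = sum(inverse.values())
    return {t: v / total for t, v in inverse.items()}


def adapt_lm(background, topic_lms, weights, lam):
    """P(w) = (1 - lam) * P_bg(w) + lam * sum_t weights[t] * P_t(w)."""
    vocab = set(background).union(*(lm.keys() for lm in topic_lms.values()))
    adapted = {}
    for word in vocab:
        topic_prob = sum(weights[t] * topic_lms[t].get(word, 0.0) for t in weights)
        adapted[word] = (1.0 - lam) * background.get(word, 0.0) + lam * topic_prob
    return adapted


# Hypothetical distances from a speech segment to two topics.
distances = {"economy": 0.2, "health": 0.6}
soft = topic_weights(distances)
hard = topic_weights(distances, hard=True)

background = {"budget": 0.5, "vaccine": 0.5}
topic_lms = {
    "economy": {"budget": 0.9, "vaccine": 0.1},
    "health": {"budget": 0.1, "vaccine": 0.9},
}
adapted = adapt_lm(background, topic_lms, soft, lam=0.4)
print(soft, hard, adapted)
```

The hard variant commits fully to one topic and is therefore sensitive to identification errors, whereas the soft variant hedges across topics at the cost of a less sharply adapted model.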
Abstract:
This PhD dissertation is framed in the emergent fields of Reverse Logistics and Closed-Loop Supply Chain (CLSC) management. This subarea of supply chain management has gained the attention of researchers and practitioners over the last 15 years to become a fully recognized subdiscipline of the Operations Management field. More specifically, among all the activities included within the CLSC area, this dissertation focuses on direct reuse. The main contribution of this dissertation to current knowledge is twofold. First, a framework for the so-called reuse CLSC is developed. This conceptual model is grounded in a set of six case studies conducted by the author in real industrial settings. The model has also been contrasted with existing literature and with academic and professional experts on the topic. The framework encompasses four building blocks. In the first block, a typology for reusable articles is put forward, distinguishing between Returnable Transport Items (RTI), Reusable Packaging Materials (RPM), and Reusable Products (RP). In the second block, the common characteristics that render reuse CLSC difficult to manage from a logistical standpoint are identified, namely fleet shrinkage, significant investment and limited visibility. In the third block, the main problems arising in the management of reuse CLSC are analyzed: (1) defining the fleet size, (2) controlling cycle time and promoting article rotation, (3) controlling the return rate and preventing shrinkage, (4) defining purchase policies for new articles, (5) planning and controlling reconditioning activities, and (6) balancing inventory between depots. Finally, in the fourth block some solutions to those issues are developed. Firstly, problems (2) and (3) are addressed through the comparative analysis of alternative strategies for controlling cycle time and return rate. Secondly, a methodology for calculating the required fleet size is elaborated (problem (1)).
This methodology is valid for different configurations of the physical flows in the reuse CLSC. Likewise, some directions are pointed out for further development of a similar method for defining purchase policies for new articles (problem (4)). The second main contribution of this dissertation is embedded in the solutions part (block 4) of the conceptual framework and comprises a two-level decision problem integrating two mixed integer linear programming (MILP) models that have been formulated and solved to optimality using AIMMS as the modeling language, CPLEX as the solver and Excel spreadsheets for data input and output presentation. The results obtained are analyzed in order to measure, in a client-supplier system, the economic impact of two alternative control strategies (recovery policies) in the context of reuse. In addition, the models support decision-making regarding the selection of the appropriate recovery policy given the characteristics of the demand pattern and the structure of the relevant costs in the system. The triangulation of methods used in this thesis has made it possible to address the same research topic with different approaches, which strengthens the robustness of the results obtained.