Biblioteca Digital

53 resultados para Linear Attention,Conditional Language Model,Natural Language Generation,FLAX,Rare diseases

em Universidad Politécnica de Madrid

Adaptación de una herramienta de Generación de Lenguaje Natural al idioma Español

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En el presente Trabajo de Fin de Máster se ha realizado un análisis sobre las técnicas y herramientas de Generación de Lenguaje Natural (GLN), así como las modificaciones a la herramienta Simple NLG para generar expresiones en el idioma Español. Dicha extensión va a permitir ampliar el grupo de personas a las cuales se les transmite la información, ya que alrededor de 540 millones de personas hablan español. Keywords - Generación de Lenguaje Natural, técnicas de GLN, herramientas de GLN, Inteligencia Artificial, análisis, SimpleNLG.---ABSTRACT---In this Master's Thesis has been performed an analysis on techniques and tools for Natural Language Generation (NLG), also the Simple NLG tool has been modified in order to generate expressions in the Spanish language. This modification will allow transmitting the information to more people; around 540 million people speak Spanish. Keywords - Natural Language Generation, NLG tools, NLG techniques, Artificial Intelligence, analysis, SimpleNLG.

Style Guidelines for Naming and Labeling Ontologies in the Multilingual Web

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state-of-the-art we find some attempts to provide guidelines for URI local names in English, and also some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of some real experiences in the translation of ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.

Generating descriptions that summarize geospatial and temporal data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Effective data summarization methods that use AI techniques can help humans understand large sets of data. In this paper, we describe a knowledge-based method for automatically generating summaries of geospatial and temporal data, i.e. data with geographical and temporal references. The method is useful for summarizing data streams, such as GPS traces and traffic information, that are becoming more prevalent with the increasing use of sensors in computing devices. The method presented here is an initial architecture for our ongoing research in this domain. In this paper we describe the data representations we have designed for our method, our implementations of components to perform data abstraction and natural language generation. We also discuss evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.

Entrainment Effects in Turbulent Boundary Layers

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Esta tesis estudia el comportamiento de la región exterior de una capa límite turbulenta sin gradientes de presiones. Se ponen a prueba dos teorías relativamente bien establecidas. La teoría de semejanza para la pared supone que en el caso de haber una pared rugosa, el fluido sólo percibe el cambio en la fricción superficial que causa, y otros efectos secundarios quedarán confinados a una zona pegada a la pared. El consenso actual es que dicha teoría es aproximadamente cierta. En el extremo exterior de la capa límite existe una región producida por la interacción entre las estructuras turbulentas y el flujo irrotacional de la corriente libre llamada interfaz turbulenta/no turbulenta. La mayoría de los resultados al respecto sugieren la presencia de fuerzas de cortadura ligeramente más intensa, lo que la hace distinta al resto del flujo turbulento. Las propiedades de esa región probablemente cambien si la velocidad de crecimiento de la capa límite aumenta, algo que puede conseguirse aumentando la fricción en la pared. La rugosidad y la ingestión de masa están entonces relacionadas, y el comportamiento local de la interfaz turbulenta/no turbulenta puede explicar el motivo por el que las capas límite sobre paredes rugosas no se comportan como en el caso de tener paredes lisas precisamente en la zona exterior. Para estudiar las capas límite a números de Reynolds lo suficientemente elevados, se ha desarrollado un nuevo código de alta resolución para la simulación numérica directa de capas límite turbulentas sin gradiente de presión. Dicho código es capaz de simular capas límite en un intervalo de números de Reynolds entre ReT = 100 — 2000 manteniendo una buena escalabilidad hasta los dos millones de hilos en superordenadores de tipo Blue Gene/Q. Se ha guardado especial atención a la generación de condiciones de contorno a la entrada correctas. Los resultados obtenidos están en concordancia con los resultados previos, tanto en el caso de simulaciones como de experimentos. La interfaz turbulenta/no turbulenta de una capa límite se ha analizado usando un valor umbral del módulo de la vorticidad. Dicho umbral se considera un parámetro para analizar cada superficie obtenida de un contorno del módulo de la vorticidad. Se han encontrado dos regímenes distintos en función del umbral escogido con propiedades opuestas, separados por una transición topológica gradual. Las características geométricas de la zona escalan con o99 cuando u^/isdgg es la unidad de vorticidad. Las propiedades del íluido relativas a la posición del contorno de vorticidad han sido analizados para una serie de umbrales utilizando el campo de distancias esféricas, que puede obtenerse con independencia de la complejidad de la superficie de referencia. Las propiedades del fluido a una distancia dada del inerfaz también dependen del umbral de vorticidad, pero tienen características parecidas con independencia del número de Reynolds. La interacción entre la turbulencia y el flujo no turbulento se restringe a una zona muy fina con un espesor del orden de la escala de Kolmogorov local. Hacia el interior del flujo turbulento las propiedades son indistinguibles del resto de la capa límite. Se ha simulado una capa límite sin gradiente de presiones con una fuerza volumétrica cerca de la pared. La el forzado ha sido diseñado para aumentar la fricción en la pared sin introducir ningún efecto geométrico obvio. La simulación consta de dos dominios, un primer dominio más pequeño y a baja resolución que se encarga de generar condiciones de contorno correctas, y un segundo dominio mayor y a alta resolución donde se aplica el forzado. El estudio de los perfiles y los coeficientes de autocorrelación sugieren que los dos casos, el liso y el forzado, no colapsan más allá de la capa logarítmica por la complejidad geométrica de la zona intermitente, y por el hecho que la distancia a la pared no es una longitud característica. Los efectos causados por la geometría de la zona intermitente pueden evitarse utilizando el interfaz como referencia, y la distancia esférica para el análisis de sus propiedades. Las propiedades condicionadas del flujo escalan con 5QQ y u/uT, las dos únicas escalas contenidas en el modelo de semejanza de pared de Townsend, consistente con estos resultados. ABSTRACT This thesis studies the characteristics of the outer region of zero-pressure-gradient turbulent boundary layers at moderate Reynolds numbers. Two relatively established theories are put to test. The wall similarity theory states that with the presence of roughness, turbulent motion is mostly affected by the additional drag caused by the roughness, and that other secondary effects are restricted to a region very close to the wall. The consensus is that this theory is valid, but only as a first approximation. At the edge of the boundary layer there is a thin layer caused by the interaction between the turbulent eddies and the irroational fluid of the free stream, called turbulent/non-turbulent interface. The bulk of results about this layer suggest the presence of some localized shear, with properties that make it distinguishable from the rest of the turbulent flow. The properties of the interface are likely to change if the rate of spread of the turbulent boundary layer is amplified, an effect that is usually achieved by increasing the drag. Roughness and entrainment are therefore linked, and the local features of the turbulent/non-turbulent interface may explain the reason why rough-wall boundary layers deviate from the wall similarity theory precisely far from the wall. To study boundary layers at a higher Reynolds number, a new high-resolution code for the direct numerical simulation of a zero pressure gradient turbulent boundary layers over a flat plate has been developed. This code is able to simulate a wide range of Reynolds numbers from ReT =100 to 2000 while showing a linear weak scaling up to around two million threads in the BG/Q architecture. Special attention has been paid to the generation of proper inflow boundary conditions. The results are in good agreement with existing numerical and experimental data sets. The turbulent/non-turbulent interface of a boundary layer is analyzed by thresholding the vorticity magnitude field. The value of the threshold is considered a parameter in the analysis of the surfaces obtained from isocontours of the vorticity magnitude. Two different regimes for the surface can be distinguished depending on the threshold, with a gradual topological transition across which its geometrical properties change significantly. The width of the transition scales well with oQg when u^/udgg is used as a unit of vorticity. The properties of the flow relative to the position of the vorticity magnitude isocontour are analyzed within the same range of thresholds, using the ball distance field, which can be obtained regardless of the size of the domain and complexity of the interface. The properties of the flow at a given distance to the interface also depend on the threshold, but they are similar regardless of the Reynolds number. The interaction between the turbulent and the non-turbulent flow occurs in a thin layer with a thickness that scales with the Kolmogorov length. Deeper into the turbulent side, the properties are undistinguishable from the rest of the turbulent flow. A zero-pressure-gradient turbulent boundary layer with a volumetric near-wall forcing has been simulated. The forcing has been designed to increase the wall friction without introducing any obvious geometrical effect. The actual simulation is split in two domains, a smaller one in charge of the generation of correct inflow boundary conditions, and a second and larger one where the forcing is applied. The study of the one-point and twopoint statistics suggest that the forced and the smooth cases do not collapse beyond the logarithmic layer may be caused by the geometrical complexity of the intermittent region, and by the fact that the scaling with the wall-normal coordinate is no longer present. The geometrical effects can be avoided using the turbulent/non-turbulent interface as a reference frame, and the minimum distance respect to it. The conditional analysis of the vorticity field with the alternative reference frame recovers the scaling with 5QQ and v¡uT already present in the logarithmic layer, the only two length-scales allowed if Townsend’s wall similarity hypothesis is valid.

Enriquecimiento de fuentes de datos con descripciones lingüísticas mediante bases de datos geográficas externas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El presente trabajo se ha centrado en la investigación de soluciones para automatizar la tarea del enriquecimiento de fuentes de datos sobre redes de sensores con descripciones lingüísticas, con el fin de facilitar la posterior generación de textos en lenguaje natural. El uso de descripciones en lenguaje natural facilita el acceso a los datos a una mayor diversidad de usuarios y, como consecuencia, permite aprovechar mejor las inversiones en redes de sensores. En el trabajo se ha considerado el uso de bases de datos abiertas para abordar la necesidad de disponer de un gran volumen y diversidad de conocimiento geográfico. Se ha analizado también el enriquecimiento de datos dentro de enfoques metodológicos de curación de datos y métodos de generación de lenguaje natural. Como resultado del trabajo, se ha planteado un método general basado en una estrategia de generación y prueba que incluye una forma de representación y uso del conocimiento heurístico con varias etapas de razonamiento para la construcción de descripciones lingüísticas de enriquecimiento de datos. En la evaluación de la propuesta general se han manejado tres escenarios, dos de ellos para generación de referencias geográficas sobre redes de sensores complejas de dimensión real y otro para la generación de referencias temporales. Los resultados de la evaluación han mostrado la validez práctica de la propuesta general exhibiendo mejoras de rendimiento respecto a otros enfoques. Además, el análisis de los resultados ha permitido identificar y cuantificar el impacto previsible de diversas líneas de mejora en bases de datos abiertas. ABSTRACT This work has focused on the search for solutions to automate the task of enrichment sensor-network-based data sources with textual descriptions, so as to facilitate the generation of natural language texts. Using natural language descriptions facilitates data access to a wider range of users and, therefore, allows better leveraging investments in sensor networks. In this work we have considered the use of open databases to address the need for a large volume and diversity of geographical knowledge. We have also analyzed data enrichment in methodological approaches and data curation methods of natural language generation. As a result, it has raised a general method based on a strategy of generating and testing that includes a representation using heuristic knowledge with several stages of reasoning for the construction of linguistic descriptions of data enrichment. In assessing the overall proposal three scenarios have been addressed, two of them in the environmental domain with complex sensor networks and another real dimension in the time domain. The evaluation results have shown the validity and practicality of our proposal, showing performance improvements over other approaches. Furthermore, the analysis of the results has allowed identifying and quantifying the expected impact of various lines of improvement in open databases.

A code for direct numerical simulation of turbulent boundary layers at high Reynolds numbers in BG/P supercomputers

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new high-resolution code for the direct numerical simulation of a zero pressure gradient turbulent boundary layers over a flat plate has been developed. Its purpose is to simulate a wide range of Reynolds numbers from Reθ = 300 to 6800 while showing a linear weak scaling up to 32,768 cores in the BG/P architecture. Special attention has been paid to the generation of proper inflow boundary conditions. The results are in good agreement with existing numerical and experimental data sets.

The past distribution of pinus nigra arnold in northern iberia. Contribution from its macroremains.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The presence of Pinus nigra in central Spain, where its natural populations are very rare, has led to different interpretations of the current vegetation dynamics. Complementary to the available palynological evidence, macroremains provide local information of high taxonomic resolution that helps to reconstruct the palaeobiogeography of a given species. Here we present new macrofossil data from Tubilla del Lago, a small palaeolake located at the eastern part of the northern Iberian Meseta. We identified 17 wood samples and 71 cones on the basis of their wood anatomy and morphology, respectively. S ome of the fossil samples were radiocarbon dated (~4.230-3210 years cal BP). The results demonstrate the Holocene presence of P. nigra in the study area, where it is currently extinct. This evidence, together with other published palaeobotanical studies, indicates that the forests dominated by P. nigra must have had a larger importance on the landscape prior to the anthropogenic influence on the northern Iberian Meseta.

Publishing Orthology and Diseases Information in the Linked Open Data Cloud

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Linked Data initiative offers a straight method to publish structured data in the World Wide Web and link it to other data, resulting in a world wide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing in the Linked Open Data cloud. For example, the relation between orthologs genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes orthologs/diseases information using Linked Data. This gives the scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.

On the dynamic adaptation of language models based on dialogue information

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present an approach to adapt dynamically the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM, and one or more of the LMs associated to the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate a LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate a LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to improve the performance of the speech recognition (up to a 14.82% of relative improvement), which leads to an improvement in both the language understanding and the dialogue management tasks.

Using Open Geographic Data to Generate Natural Language Descriptions for Hydrological Sensor Networks

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, is useful to help users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The results of the evaluation of our method in a hydrologic national sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with a lower development effort compared to other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact in the generation of sensor descriptions.

OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web

Relevância:

60.00% 60.00%

Publicador:

Resumo:

OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web 1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS Computational Linguistics is already a consolidated research area. It builds upon the results of other two major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs. These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools. Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate. However, linguistic annotation tools have still some limitations, which can be summarised as follows: 1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.). 2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts. 3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc. A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved. In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool. Therefore, it would be quite useful to find a way to (i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools; (ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate. Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned. Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section. 2. GOALS OF THE PRESENT WORK As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based triples, as in the usual Semantic Web languages (namely RDF(S) and OWL), in order for the model to be considered suitable for the Semantic Web. Besides, to be useful for the Semantic Web, this model should provide a way to automate the annotation of web pages. As for the present work, this requirement involved reusing the linguistic annotation tools purchased by the OEG research group (http://www.oeg-upm.net), but solving beforehand (or, at least, minimising) some of their limitations. Therefore, this model had to minimise these limitations by means of the integration of several linguistic annotation tools into a common architecture. Since this integration required the interoperation of tools and their annotations, ontologies were proposed as the main technological component to make them effectively interoperate. From the very beginning, it seemed that the formalisation of the elements and the knowledge underlying linguistic annotations within an appropriate set of ontologies would be a great step forward towards the formulation of such a model (henceforth referred to as OntoTag). Obviously, first, to combine the results of the linguistic annotation tools that operated at the same level, their annotation schemas had to be unified (or, preferably, standardised) in advance. This entailed the unification (id. standardisation) of their tags (both their representation and their meaning), and their format or syntax. Second, to merge the results of the linguistic annotation tools operating at different levels, their respective annotation schemas had to be (a) made interoperable and (b) integrated. And third, in order for the resulting annotations to suit the Semantic Web, they had to be specified by means of an ontology-based vocabulary, and structured by means of ontology-based triples, as hinted above. Therefore, a new annotation scheme had to be devised, based both on ontologies and on this type of triples, which allowed for the combination and the integration of the annotations of any set of linguistic annotation tools. This annotation scheme was considered a fundamental part of the model proposed here, and its development was, accordingly, another major objective of the present work. All these goals, aims and objectives could be re-stated more clearly as follows: Goal 1: Development of a set of ontologies for the formalisation of the linguistic knowledge relating linguistic annotation. Sub-goal 1.1: Ontological formalisation of the EAGLES (1996a; 1996b) de facto standards for morphosyntactic and syntactic annotation, in a way that helps respect the triple structure recommended for annotations in these works (which is isomorphic to the triple structures used in the context of the Semantic Web). Sub-goal 1.2: Incorporation into this preliminary ontological formalisation of other existing standards and standard proposals relating the levels mentioned above, such as those currently under development within ISO/TC 37 (the ISO Technical Committee dealing with Terminology, which deals also with linguistic resources and annotations). Sub-goal 1.3: Generalisation and extension of the recommendations in EAGLES (1996a; 1996b) and ISO/TC 37 to the semantic level, for which no ISO/TC 37 standards have been developed yet. Sub-goal 1.4: Ontological formalisation of the generalisations and/or extensions obtained in the previous sub-goal as generalisations and/or extensions of the corresponding ontology (or ontologies). Sub-goal 1.5: Ontological formalisation of the knowledge required to link, combine and unite the knowledge represented in the previously developed ontology (or ontologies). Goal 2: Development of OntoTag’s annotation scheme, a standard-based abstract scheme for the hybrid (linguistically-motivated and ontological-based) annotation of texts. Sub-goal 2.1: Development of the standard-based morphosyntactic annotation level of OntoTag’s scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996a) and also the recommendations included in the ISO/MAF (2008) standard draft. Sub-goal 2.2: Development of the standard-based syntactic annotation level of the hybrid abstract scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996b) and the ISO/SynAF (2010) standard draft. Sub-goal 2.3: Development of the standard-based semantic annotation level of OntoTag’s (abstract) scheme. Sub-goal 2.4: Development of the mechanisms for a convenient integration of the three annotation levels already mentioned. These mechanisms should take into account the recommendations included in the ISO/LAF (2009) standard draft. Goal 3: Design of OntoTag’s (abstract) annotation architecture, an abstract architecture for the hybrid (semantic) annotation of texts (i) that facilitates the integration and interoperation of different linguistic annotation tools, and (ii) whose results comply with OntoTag’s annotation scheme. Sub-goal 3.1: Specification of the decanting processes that allow for the classification and separation, according to their corresponding levels, of the results of the linguistic tools annotating at several different levels. Sub-goal 3.2: Specification of the standardisation processes that allow (a) complying with the standardisation requirements of OntoTag’s annotation scheme, as well as (b) combining the results of those linguistic tools that share some level of annotation. Sub-goal 3.3: Specification of the merging processes that allow for the combination of the output annotations and the interoperation of those linguistic tools that share some level of annotation. Sub-goal 3.4: Specification of the merge processes that allow for the integration of the results and the interoperation of those tools performing their annotations at different levels. Goal 4: Generation of OntoTagger’s schema, a concrete instance of OntoTag’s abstract scheme for a concrete set of linguistic annotations. These linguistic annotations result from the tools and the resources available in the research group, namely • Bitext’s DataLexica (http://www.bitext.com/EN/datalexica.asp), • LACELL’s (POS) tagger (http://www.um.es/grupos/grupo-lacell/quees.php), • Connexor’s FDG (http://www.connexor.eu/technology/machinese/glossary/fdg/), and • EuroWordNet (Vossen et al., 1998). This schema should help evaluate OntoTag’s underlying hypotheses, stated below. Consequently, it should implement, at least, those levels of the abstract scheme dealing with the annotations of the set of tools considered in this implementation. This includes the morphosyntactic, the syntactic and the semantic levels. Goal 5: Implementation of OntoTagger’s configuration, a concrete instance of OntoTag’s abstract architecture for this set of linguistic tools and annotations. This configuration (1) had to use the schema generated in the previous goal; and (2) should help support or refute the hypotheses of this work as well (see the next section). Sub-goal 5.1: Implementation of the decanting processes that facilitate the classification and separation of the results of those linguistic resources that provide annotations at several different levels (on the one hand, LACELL’s tagger operates at the morphosyntactic level and, minimally, also at the semantic level; on the other hand, FDG operates at the morphosyntactic and the syntactic levels and, minimally, at the semantic level as well). Sub-goal 5.2: Implementation of the standardisation processes that allow (i) specifying the results of those linguistic tools that share some level of annotation according to the requirements of OntoTagger’s schema, as well as (ii) combining these shared level results. In particular, all the tools selected perform morphosyntactic annotations and they had to be conveniently combined by means of these processes. Sub-goal 5.3: Implementation of the merging processes that allow for the combination (and possibly the improvement) of the annotations and the interoperation of the tools that share some level of annotation (in particular, those relating the morphosyntactic level, as in the previous sub-goal). Sub-goal 5.4: Implementation of the merging processes that allow for the integration of the different standardised and combined annotations aforementioned, relating all the levels considered. Sub-goal 5.5: Improvement of the semantic level of this configuration by adding a named entity recognition, (sub-)classification and annotation subsystem, which also uses the named entities annotated to populate a domain ontology, in order to provide a concrete application of the present work in the two areas involved (the Semantic Web and Corpus Linguistics). 3. MAIN RESULTS: ASSESSMENT OF ONTOTAG’S UNDERLYING HYPOTHESES The model developed in the present thesis tries to shed some light on (i) whether linguistic annotation tools can effectively interoperate; (ii) whether their results can be combined and integrated; and, if they can, (iii) how they can, respectively, interoperate and be combined and integrated. Accordingly, several hypotheses had to be supported (or rejected) by the development of the OntoTag model and OntoTagger (its implementation). The hypotheses underlying OntoTag are surveyed below. Only one of the hypotheses (H.6) was rejected; the other five could be confirmed. H.1 The annotations of different levels (or layers) can be integrated into a sort of overall, comprehensive, multilayer and multilevel annotation, so that their elements can complement and refer to each other. • CONFIRMED by the development of: o OntoTag’s annotation scheme, o OntoTag’s annotation architecture, o OntoTagger’s (XML, RDF, OWL) annotation schemas, o OntoTagger’s configuration. H.2 Tool-dependent annotations can be mapped onto a sort of tool-independent annotations and, thus, can be standardised. • CONFIRMED by means of the standardisation phase incorporated into OntoTag and OntoTagger for the annotations yielded by the tools. H.3 Standardisation should ease: H.3.1: The interoperation of linguistic tools. H.3.2: The comparison, combination (at the same level and layer) and integration (at different levels or layers) of annotations. • H.3 was CONFIRMED by means of the development of OntoTagger’s ontology-based configuration: o Interoperation, comparison, combination and integration of the annotations of three different linguistic tools (Connexor’s FDG, Bitext’s DataLexica and LACELL’s tagger); o Integration of EuroWordNet-based, domain-ontology-based and named entity annotations at the semantic level. o Integration of morphosyntactic, syntactic and semantic annotations. H.4 Ontologies and Semantic Web technologies (can) play a crucial role in the standardisation of linguistic annotations, by providing consensual vocabularies and standardised formats for annotation (e.g., RDF triples). • CONFIRMED by means of the development of OntoTagger’s RDF-triple-based annotation schemas. H.5 The rate of errors introduced by a linguistic tool at a given level, when annotating, can be reduced automatically by contrasting and combining its results with the ones coming from other tools, operating at the same level. However, these other tools might be built following a different technological (stochastic vs. rule-based, for example) or theoretical (dependency vs. HPS-grammar-based, for instance) approach. • CONFIRMED by the results yielded by the evaluation of OntoTagger. H.6 Each linguistic level can be managed and annotated independently. • REJECTED: OntoTagger’s experiments and the dependencies observed among the morphosyntactic annotations, and between them and the syntactic annotations. In fact, Hypothesis H.6 was already rejected when OntoTag’s ontologies were developed. We observed then that several linguistic units stand on an interface between levels, belonging thereby to both of them (such as morphosyntactic units, which belong to both the morphological level and the syntactic level). Therefore, the annotations of these levels overlap and cannot be handled independently when merged into a unique multileveled annotation. 4. OTHER MAIN RESULTS AND CONTRIBUTIONS First, interoperability is a hot topic for both the linguistic annotation community and the whole Computer Science field. The specification (and implementation) of OntoTag’s architecture for the combination and integration of linguistic (annotation) tools and annotations by means of ontologies shows a way to make these different linguistic annotation tools and annotations interoperate in practice. Second, as mentioned above, the elements involved in linguistic annotation were formalised in a set (or network) of ontologies (OntoTag’s linguistic ontologies). • On the one hand, OntoTag’s network of ontologies consists of − The Linguistic Unit Ontology (LUO), which includes a mostly hierarchical formalisation of the different types of linguistic elements (i.e., units) identifiable in a written text; − The Linguistic Attribute Ontology (LAO), which includes also a mostly hierarchical formalisation of the different types of features that characterise the linguistic units included in the LUO; − The Linguistic Value Ontology (LVO), which includes the corresponding formalisation of the different values that the attributes in the LAO can take; − The OIO (OntoTag’s Integration Ontology), which  Includes the knowledge required to link, combine and unite the knowledge represented in the LUO, the LAO and the LVO;  Can be viewed as a knowledge representation ontology that describes the most elementary vocabulary used in the area of annotation. • On the other hand, OntoTag’s ontologies incorporate the knowledge included in the different standards and recommendations for linguistic annotation released so far, such as those developed within the EAGLES and the SIMPLE European projects or by the ISO/TC 37 committee: − As far as morphosyntactic annotations are concerned, OntoTag’s ontologies formalise the terms in the EAGLES (1996a) recommendations and their corresponding terms within the ISO Morphosyntactic Annotation Framework (ISO/MAF, 2008) standard; − As for syntactic annotations, OntoTag’s ontologies incorporate the terms in the EAGLES (1996b) recommendations and their corresponding terms within the ISO Syntactic Annotation Framework (ISO/SynAF, 2010) standard draft; − Regarding semantic annotations, OntoTag’s ontologies generalise and extend the recommendations in EAGLES (1996a; 1996b) and, since no stable standards or standard drafts have been released for semantic annotation by ISO/TC 37 yet, they incorporate the terms in SIMPLE (2000) instead; − The terms coming from all these recommendations and standards were supplemented by those within the ISO Data Category Registry (ISO/DCR, 2008) and also of the ISO Linguistic Annotation Framework (ISO/LAF, 2009) standard draft when developing OntoTag’s ontologies. Third, we showed that the combination of the results of tools annotating at the same level can yield better results (both in precision and in recall) than each tool separately. In particular, 1. OntoTagger clearly outperformed two of the tools integrated into its configuration, namely DataLexica and FDG in all the combination sub-phases in which they overlapped (i.e. POS tagging, lemma annotation and morphological feature annotation). As far as the remaining tool is concerned, i.e. LACELL’s tagger, it was also outperformed by OntoTagger in POS tagging and lemma annotation, and it did not behave better than OntoTagger in the morphological feature annotation layer. 2. As an immediate result, this implies that a) This type of combination architecture configurations can be applied in order to improve significantly the accuracy of linguistic annotations; and b) Concerning the morphosyntactic level, this could be regarded as a way of constructing more robust and more accurate POS tagging systems. Fourth, Semantic Web annotations are usually performed by humans or else by machine learning systems. Both of them leave much to be desired: the former, with respect to their annotation rate; the latter, with respect to their (average) precision and recall. In this work, we showed how linguistic tools can be wrapped in order to annotate automatically Semantic Web pages using ontologies. This entails their fast, robust and accurate semantic annotation. As a way of example, as mentioned in Sub-goal 5.5, we developed a particular OntoTagger module for the recognition, classification and labelling of named entities, according to the MUC and ACE tagsets (Chinchor, 1997; Doddington et al., 2004). These tagsets were further specified by means of a domain ontology, namely the Cinema Named Entities Ontology (CNEO). This module was applied to the automatic annotation of ten different web pages containing cinema reviews (that is, around 5000 words). In addition, the named entities annotated with this module were also labelled as instances (or individuals) of the classes included in the CNEO and, then, were used to populate this domain ontology. • The statistical results obtained from the evaluation of this particular module of OntoTagger can be summarised as follows. On the one hand, as far as recall (R) is concerned, (R.1) the lowest value was 76,40% (for file 7); (R.2) the highest value was 97, 50% (for file 3); and (R.3) the average value was 88,73%. On the other hand, as far as the precision rate (P) is concerned, (P.1) its minimum was 93,75% (for file 4); (R.2) its maximum was 100% (for files 1, 5, 7, 8, 9, and 10); and (R.3) its average value was 98,99%. • These results, which apply to the tasks of named entity annotation and ontology population, are extraordinary good for both of them. They can be explained on the basis of the high accuracy of the annotations provided by OntoTagger at the lower levels (mainly at the morphosyntactic level). However, they should be conveniently qualified, since they might be too domain- and/or language-dependent. It should be further experimented how our approach works in a different domain or a different language, such as French, English, or German. • In any case, the results of this application of Human Language Technologies to Ontology Population (and, accordingly, to Ontological Engineering) seem very promising and encouraging in order for these two areas to collaborate and complement each other in the area of semantic annotation. Fifth, as shown in the State of the Art of this work, there are different approaches and models for the semantic annotation of texts, but all of them focus on a particular view of the semantic level. Clearly, all these approaches and models should be integrated in order to bear a coherent and joint semantic annotation level. OntoTag shows how (i) these semantic annotation layers could be integrated together; and (ii) they could be integrated with the annotations associated to other annotation levels. Sixth, we identified some recommendations, best practices and lessons learned for annotation standardisation, interoperation and merge. They show how standardisation (via ontologies, in this case) enables the combination, integration and interoperation of different linguistic tools and their annotations into a multilayered (or multileveled) linguistic annotation, which is one of the hot topics in the area of Linguistic Annotation. And last but not least, OntoTag’s annotation scheme and OntoTagger’s annotation schemas show a way to formalise and annotate coherently and uniformly the different units and features associated to the different levels and layers of linguistic annotation. This is a great scientific step ahead towards the global standardisation of this area, which is the aim of ISO/TC 37 (in particular, Subcommittee 4, dealing with the standardisation of linguistic annotations and resources).

A supervised cooperative learning system for early detection of language disorders

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Quality of Life of a person may depend on early attention to his neurodevel-opment disorders in childhood. Identification of language disorders under the age of six years old can speed up required diagnosis and/or treatment processes. This paper details the enhancement of a Clinical Decision Support System (CDSS) aimed to assist pediatricians and language therapists at early identification and re-ferral of language disorders. The system helps to fine tune the Knowledge Base of Language Delays (KBLD) that was already developed and validated in clinical routine with 146 children. Medical experts supported the construction of Gades CDSS by getting scientific consensus from literature and fifteen years of regis-tered use cases of children with language disorders. The current research focuses on an innovative cooperative model that allows the evolution of the KBLD of Gades through the supervised evaluation of the CDSS learnings with experts¿ feedback. The deployment of the resulting system is being assessed under a mul-tidisciplinary team of seven experts from the fields of speech therapist, neonatol-ogy, pediatrics, and neurology.

Autonomous Acquisition of Natural Language

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An important part of human intelligence is the ability to use language. Humans learn how to use language in a society of language users, which is probably the most effective way to learn a language from the ground up. Principles that might allow an artificial agents to learn language this way are not known at present. Here we present a framework which begins to address this challenge. Our auto-catalytic, endogenous, reflective architecture (AERA) supports the creation of agents that can learn natural language by observation. We present results from two experiments where our S1 agent learns human communication by observing two humans interacting in a realtime mock television interview, using gesture and situated language. Results show that S1 can learn multimodal complex language and multimodal communicative acts, using a vocabulary of 100 words with numerous sentence formats, by observing unscripted interaction between the humans, with no grammar being provided to it a priori, and only high-level information about the format of the human interaction in the form of high-level goals of the interviewer and interviewee and a small ontology. The agent learns both the pragmatics, semantics, and syntax of complex sentences spoken by the human subjects on the topic of recycling of objects such as aluminum cans, glass bottles, plastic, and wood, as well as use of manual deictic reference and anaphora.

Dynamic topic-based adaptation of language models: a comparison between different approaches

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a dynamic LM adaptation based on the topic that has been identified on a speech segment. We use LSA and the given topic labels in the training dataset to obtain and use the topic models. We propose a dynamic language model adaptation to improve the recognition performance in "a two stages" AST system. The final stage makes use of the topic identification with two variants: the first on uses just the most probable topic and the other one depends on the relative distances of the topics that have been identified. We perform the adaptation of the LM as a linear interpolation between a background model and topic-based LM. The interpolation weight id dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database. We achieved a relative reduction in WER of 11.13% over the baseline system which uses a single blackground LM.

A management model for closed-loop supply chains of reusable articles

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This PhD dissertation is framed in the emergent fields of Reverse Logistics and ClosedLoop Supply Chain (CLSC) management. This subarea of supply chain management has gained researchers and practitioners' attention over the last 15 years to become a fully recognized subdiscipline of the Operations Management field. More specifically, among all the activities that are included within the CLSC area, the focus of this dissertation is centered in direct reuse aspects. The main contribution of this dissertation to current knowledge is twofold. First, a framework for the so-called reuse CLSC is developed. This conceptual model is grounded in a set of six case studies conducted by the author in real industrial settings. The model has also been contrasted with existing literature and with academic and professional experts on the topic as well. The framework encompasses four building blocks. In the first block, a typology for reusable articles is put forward, distinguishing between Returnable Transport Items (RTI), Reusable Packaging Materials (RPM), and Reusable Products (RP). In the second block, the common characteristics that render reuse CLSC difficult to manage from a logistical standpoint are identified, namely: fleet shrinkage, significant investment and limited visibility. In the third block, the main problems arising in the management of reuse CLSC are analyzed, such as: (1) define fleet size dimension, (2) control cycle time and promote articles rotation, (3) control return rate and prevent shrinkage, (4) define purchase policies for new articles, (5) plan and control reconditioning activities, and (6) balance inventory between depots. Finally, in the fourth block some solutions to those issues are developed. Firstly, problems (2) and (3) are addressed through the comparative analysis of alternative strategies for controlling cycle time and return rate. Secondly, a methodology for calculating the required fleet size is elaborated (problem (1)). This methodology is valid for different configurations of the physical flows in the reuse CLSC. Likewise, some directions are pointed out for further development of a similar method for defining purchase policies for new articles (problem (4)). The second main contribution of this dissertation is embedded in the solutions part (block 4) of the conceptual framework and comprises a two-level decision problem integrating two mixed integer linear programming (MILP) models that have been formulated and solved to optimality using AIMMS as modeling language, CPLEX as solver and Excel spreadsheet for data introduction and output presentation. The results obtained are analyzed in order to measure in a client-supplier system the economic impact of two alternative control strategies (recovery policies) in the context of reuse. In addition, the models support decision-making regarding the selection of the appropriate recovery policy against the characteristics of demand pattern and the structure of the relevant costs in the system. The triangulation of methods used in this thesis has enabled to address the same research topic with different approaches and thus, the robustness of the results obtained is strengthened.

«
1
2
3
4
»