59 results for Simplification of Ontologies
at Universidad Politécnica de Madrid
Abstract:
Ontologies and taxonomies are widely used to organize concepts, providing the basis for activities such as indexing and serving as background knowledge for NLP tasks. As such, translating these resources would prove useful for adapting such systems to new languages. However, we show that the nature of these resources differs significantly from the "free-text" paradigm used to train most statistical machine translation systems. In particular, these resources differ linguistically from free text and carry rich additional semantics. We demonstrate that, as a result of these linguistic differences, standard SMT methods, and in particular evaluation metrics, can perform poorly. We then turn to the task of leveraging these semantics for translation, which we approach in three ways: by adapting the translation system to the domain of the resource; by examining whether semantics can help predict the syntactic structure used in translation; and by evaluating whether existing translated taxonomies can be used to disambiguate translations. We present early results from these experiments, which shed light on the degree of success we may expect from each approach.
Abstract:
In this paper we define the notion of an axiom dependency hypergraph, which explicitly represents how axioms are included in a module by the algorithm for computing locality-based modules. A locality-based module of an ontology corresponds to a set of connected nodes in the hypergraph, and the atoms of an ontology to its strongly connected components. Collapsing the strongly connected components into single nodes yields a condensed hypergraph that comprises a representation of the atomic decomposition of the ontology. To speed up the condensation of the hypergraph, we first reduce its size by collapsing the strongly connected components of its graph fragment using a linear-time graph algorithm. This approach significantly reduces the time needed to compute the atomic decomposition of an ontology. We provide an experimental evaluation of computing the atomic decomposition of large biomedical ontologies. We also demonstrate a significant improvement in the time needed to extract locality-based modules from an axiom dependency hypergraph and its condensed version.
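The condensation step described above can be illustrated on the directed-graph fragment of the hypergraph. Below is a minimal Python sketch, assuming plain directed edges and using Kosaraju's two-pass algorithm, one standard linear-time choice (the abstract does not prescribe a specific algorithm): strongly connected components are computed and then collapsed into single nodes.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Kosaraju's two-pass algorithm: DFS on the graph records a finish
    order; DFS on the reversed graph in reverse finish order yields one
    SCC per traversal."""
    graph, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)

    visited = set()

    def dfs(start, adj, out):
        # Iterative DFS that appends each node to `out` once finished.
        stack = [(start, iter(adj[start]))]
        visited.add(start)
        while stack:
            node, it = stack[-1]
            advanced = False
            for w in it:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(adj[w])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                out.append(node)

    order = []
    for u in nodes:
        if u not in visited:
            dfs(u, graph, order)

    visited.clear()
    components = []
    for u in reversed(order):
        if u not in visited:
            comp = []
            dfs(u, rev, comp)
            components.append(comp)
    return components

def condense(nodes, edges):
    """Collapse each SCC into a single node and drop internal edges."""
    sccs = strongly_connected_components(nodes, edges)
    comp_of = {u: i for i, comp in enumerate(sccs) for u in comp}
    condensed_edges = {(comp_of[u], comp_of[v]) for u, v in edges
                       if comp_of[u] != comp_of[v]}
    return sccs, condensed_edges

# e.g. condense("abcd", [("a","b"), ("b","a"), ("b","c"), ("c","d")])
# yields SCCs {a,b}, {c}, {d} and the edges between them.
```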
Abstract:
Directed hypergraphs are an intuitive modelling formalism that has been used in problems related to propositional logic, relational databases, computational linguistics and machine learning. Directed hypergraphs are also presented as an alternative to directed (bipartite) graphs to facilitate the study of the interactions between components of complex systems that cannot naturally be modelled as binary relations. In this context, they are known as hyper-networks. A directed hypergraph is a generalization of a directed graph suitable for representing many-to-many relationships. While an edge in a directed graph defines a relation between two nodes of the graph, a hyperedge in a directed hypergraph defines a relation between two sets of nodes. Strong connectivity is an equivalence relation that induces a partition of the set of nodes of a directed hypergraph into strongly connected components. These components can be collapsed into single nodes. As a result, the size of the original hypergraph can be reduced significantly if the strongly connected components have many nodes. This approach can contribute to a better understanding of how the nodes of a hypergraph are connected, in particular when the hypergraphs are large. In the case of directed graphs, there are efficient algorithms that can be used to compute the strongly connected components of large graphs. For instance, it has been shown that the macroscopic structure of the World Wide Web can be represented as a "bow-tie" diagram in which more than 70% of the nodes are distributed into three large sets, one of which is a large strongly connected component.
This particular structure has also been observed in complex networks in other fields, such as biology. Similar studies could not be conducted on directed hypergraphs because no algorithms existed for computing the strongly connected components of such hypergraphs. In this thesis, we investigate how to compute the strongly connected components of directed hypergraphs. We present two new algorithms and show their correctness and computational complexity. One of these algorithms is inspired by Tarjan's algorithm for directed graphs. The second algorithm follows a simple approach based on the fact that two strongly connected nodes of a graph reach exactly the same nodes; in other words, their connected components coincide. Both algorithms are empirically evaluated to compare their performance. To this end, we have produced a selection of random directed hypergraphs inspired by well-known random graph models such as Erdős-Rényi, Newman-Watts-Strogatz and Barabási-Albert. Besides the application examples mentioned earlier, directed hypergraphs have also been employed in the field of knowledge representation. In particular, they have been used to compute the modules of an ontology. An ontology is defined as a collection of axioms that provides a formal specification of a set of terms and their relationships; a module is a subset of an ontology that completely captures the meaning of certain terms as defined in the ontology. We focus on modules computed using the notion of syntactic locality. As ontologies can be very large, the computation of modules facilitates their reuse and maintenance. Analysing all modules of an ontology, however, is in general not feasible, as the number of modules grows exponentially in the number of terms and axioms of the ontology. Nevertheless, the modules can be represented succinctly using the atomic decomposition of an ontology. Under this representation, an ontology is partitioned into atoms, which are maximal sets of axioms that co-occur in every module. The atomic decomposition is then defined as a directed graph in which each node corresponds to an atom and each edge represents a dependency relation between two atoms. In this thesis, we introduce the notion of an axiom dependency hypergraph, which generalizes the atomic decomposition of an ontology. A module in the ontology corresponds to a connected component in the hypergraph, and the atoms of the ontology to the strongly connected components. We apply our algorithms for directed hypergraphs to axiom dependency hypergraphs and, in this manner, compute the atoms of an ontology. To demonstrate the viability of this approach, we have implemented the algorithms in HyS, an application that computes the modules of ontologies and their atomic decomposition. In the thesis, we provide an experimental evaluation of HyS on a selection of large and prominent biomedical ontologies, most of which are available in the NCBO BioPortal. HyS outperforms state-of-the-art implementations in the tasks of extracting modules and computing the atomic decomposition of these ontologies.
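The reachability-based idea behind the second algorithm can be sketched naively in Python. The sketch below assumes B-connectivity semantics (a hyperedge "fires" only once all of its tail nodes are reachable), which is one common reachability notion for directed hypergraphs; the thesis algorithms are considerably more refined than this quadratic illustration.

```python
def b_reachable(source, hyperedges):
    """Nodes B-reachable from {source}: a hyperedge (tail, head) fires
    only when *all* of its tail nodes are already reachable."""
    reached = {source}
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if set(tail) <= reached and not set(head) <= reached:
                reached |= set(head)
                changed = True
    return reached

def scc_by_reachability(nodes, hyperedges):
    """Group mutually reachable nodes into components: u and v belong
    together exactly when each lies in the other's reachable set."""
    reach = {v: b_reachable(v, hyperedges) for v in nodes}
    components, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        comp = {u for u in reach[v] if v in reach[u]}
        components.append(comp)
        assigned |= comp
    return components

# Hyperedges relate node sets, e.g. ({"b","c"} -> {"a"}):
# scc_by_reachability("abc", [(("a",), ("b",)), (("a",), ("c",)),
#                             (("b", "c"), ("a",))])
```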
Abstract:
A growing number of ontologies are already available thanks to development initiatives in many different fields. In such developments, ontology developers must tackle a wide range of difficulties, which can result in anomalies in the resulting ontologies. Ontology evaluation therefore plays a key role in ontology development projects. OOPS! is an online tool that automatically detects pitfalls, understood as potential errors or problems, and thus may help ontology developers improve their ontologies. To gain insight into the existence of pitfalls, and to assess whether there are differences among ontologies developed by novices, a random set of already scanned ontologies, and existing well-known ones, 406 OWL ontologies were analysed against OOPS!'s 21 pitfalls; 24 of these ontologies were also examined manually for the detected pitfalls. The analyses performed show only minor differences between the three sets of ontologies, thereby providing a general landscape of pitfalls in ontologies.
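To give a flavour of what an automated pitfall check can look like, here is a minimal Python sketch using rdflib that flags OWL classes lacking an rdfs:label (missing human-readable annotations is one commonly cited pitfall). This is purely illustrative and is not OOPS!'s actual rule set or implementation.

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

def classes_missing_labels(ontology_path):
    """Return the OWL classes in the ontology that have no rdfs:label,
    in the spirit of an automated pitfall scanner (illustrative only)."""
    g = Graph()
    g.parse(ontology_path)  # rdflib guesses the serialization format
    return [cls for cls in g.subjects(RDF.type, OWL.Class)
            if (cls, RDFS.label, None) not in g]

# e.g. classes_missing_labels("myontology.owl")
```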
Abstract:
In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance, not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, and natural language generation. The state of the art includes some attempts to provide guidelines for URI local names in English, as well as some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of real experiences in translating ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.
Abstract:
Web 2.0 applications enabled users to classify information resources using their own vocabularies. The bottom-up nature of these user-generated classification systems has turned them into interesting knowledge sources, since they provide a rich terminology generated by potentially large user communities. Previous research has shown that it is possible to elicit some emergent semantics from the aggregation of individual classifications in these systems. However, the generation of ontologies from them is still an open research problem. In this thesis we address the problem of how to tap into user-generated classification systems for building domain ontologies. Our objective is to design a method to develop domain ontologies from user-generated classification systems. To do so, we rely on ontologies in the Web of Data to formalize the semantics of the knowledge collected from the classification system. Current ontology development methodologies have recognized the importance of reusing knowledge from existing resources. Thus, our work is framed within the NeOn methodology scenario for building ontologies by reusing and reengineering non-ontological resources. The main contributions of this work are: (1) an integrated method to develop ontologies from user-generated classification systems, with which we extract a domain terminology from the classification system and then formalize the semantics of this terminology by reusing ontologies in the Web of Data; (2) the identification and adaptation of existing techniques for implementing the activities in the method so that they can fulfil the requirements of each activity; and (3) a novel study of emergent semantics in user-generated lists.
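To make the terminology-extraction step concrete, a minimal sketch: a tag becomes a candidate domain term once enough independent users have employed it. The threshold and the normalization below are illustrative assumptions; the thesis adapts more elaborate techniques for this activity.

```python
from collections import Counter

def candidate_terminology(user_classifications, min_users=3):
    """Extract candidate domain terms from user-generated classifications:
    tags used independently by at least `min_users` users pass a simple
    emergence threshold (illustrative, not the thesis's full method).
    user_classifications: one iterable of tag strings per user."""
    users_per_tag = Counter()
    for tags in user_classifications:
        # Count each user at most once per tag, after light normalization.
        for tag in {t.strip().lower() for t in tags}:
            users_per_tag[tag] += 1
    return [t for t, n in users_per_tag.most_common() if n >= min_users]
```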
Abstract:
Modeling and predicting the overall elastic–plastic response and local damage mechanisms in heterogeneous materials, in particular particle-reinforced composites, is a very complex problem. Microstructural complexities, such as the inhomogeneous spatial distribution of particles, irregular particle morphology, and anisotropy in particle orientation after secondary processing such as extrusion, significantly affect deformation behavior. We have studied the effect of particle/matrix interface debonding in SiC particle reinforced Al alloy matrix composites with (a) the actual microstructure, consisting of angular SiC particles, and (b) idealized ellipsoidal SiC particles. Tensile deformation in SiC particle reinforced Al matrix composites was modeled using actual microstructures reconstructed by a serial sectioning approach. Interfacial debonding was modeled using user-defined cohesive zone elements. Modeling with the actual microstructure (versus idealized ellipsoids) has a significant influence on (a) the localized stresses and strains in particle and matrix, and (b) the far-field strain at which localized debonding takes place. The angular particles exhibited a higher degree of load transfer and are more sensitive to interfacial debonding. Larger stress decreases are observed in the angular particles because of their flat surfaces normal to the loading axis, which bear load. Thus, simplification of particle morphology may lead to erroneous results.
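For readers unfamiliar with cohesive zone elements, the sketch below shows a generic bilinear traction-separation law of the kind such elements typically implement: traction rises linearly to a peak and then softens to zero at complete debonding. The functional form is a common textbook assumption, not necessarily the law used in the paper's user-defined elements.

```python
def bilinear_traction(delta, delta0, delta_f, t_max):
    """Bilinear traction-separation law for a cohesive interface.
    delta:   current separation
    delta0:  separation at peak traction t_max (end of elastic branch)
    delta_f: separation at complete debonding (zero traction)."""
    if delta <= 0.0:
        return 0.0
    if delta < delta0:
        return t_max * delta / delta0                       # elastic loading
    if delta < delta_f:
        return t_max * (delta_f - delta) / (delta_f - delta0)  # softening
    return 0.0                                              # fully debonded
```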
Abstract:
Aerodynamic design significantly influences several aspects of high-speed train performance. Given that new aerodynamic problems have also arisen with increased cruise speeds and lighter vehicles, the need for an optimization study of train aerodynamics is evident. This thesis therefore presents the aerodynamic optimization of the nose shape of a high-speed train, based on advanced optimization methods; among these, genetic algorithms and the adjoint method have been selected. A theoretical description of their foundations, characteristics and implementation is given in the thesis, explaining the reasons for their selection as well as the advantages and drawbacks of each. Genetic algorithms require the geometrical parameterization of every optimal candidate and the generation of a metamodel, or surrogate model, that completes the optimization process. These points are addressed with special attention in the first block of the thesis, which focuses on the methodology followed in this study. The second block concerns the use of these methods to optimize the aerodynamic performance of a high-speed train in several scenarios. These scenarios encompass the most representative operating conditions of high-speed trains, as well as some of the most demanding aerodynamic problems: front-wind and cross-wind situations in open air, and the entrance of a high-speed train into a tunnel. Genetic algorithms and the adjoint method have both been applied to minimize the aerodynamic drag on the train under front wind in open air. The comparison of these methods allows the methodology and computational cost of each to be evaluated, along with the resulting minimization of aerodynamic drag. Simplicity and robustness, the straightforward realization of multi-objective optimization, and the capability of finding a global optimum are the main attributes of genetic algorithms. However, the requirement to geometrically parameterize every optimal candidate is a significant drawback, which is avoided by the adjoint method: its independence from the number of design variables leads to a relevant reduction of pre-processing and computational cost. For cross-wind stability, both methods are again used to minimize the side force. In this case, a simplified geometric parameterization of the train nose is adopted, which dramatically reduces the computational cost of the optimization process while still describing the most important geometrical characteristics. This analysis identifies and quantifies the influence of each design variable on the side force on the train. It is observed that the A-pillar roundness is the most influential design parameter, with a more important effect than the nose length or the train cross-section area. Finally, a third scenario is considered to validate these methods in the aerodynamic optimization of a high-speed train: the entrance of a train into a tunnel, one of the most demanding train aerodynamic problems.
The aerodynamic consequences of high-speed trains running in a tunnel essentially come down to two correlated phenomena: the generation of pressure waves and an increase in aerodynamic drag. This multi-objective optimization problem is solved with genetic algorithms. The result is a Pareto front containing the set of optimal solutions that minimize both objectives.
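The Pareto front mentioned here is simply the set of non-dominated designs. A minimal Python sketch for two minimization objectives (the objective names are illustrative stand-ins for the thesis's drag and pressure-peak objectives):

```python
def pareto_front(designs):
    """Return the non-dominated designs for two minimization objectives.
    designs: list of (drag, pressure_peak) tuples; a design is dominated
    if another is no worse in both objectives and strictly better in one."""
    front = []
    for i, (f1, f2) in enumerate(designs):
        dominated = any(g1 <= f1 and g2 <= f2 and (g1 < f1 or g2 < f2)
                        for j, (g1, g2) in enumerate(designs) if j != i)
        if not dominated:
            front.append((f1, f2))
    return sorted(front)

# e.g. pareto_front([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)])
# keeps the first three designs and discards the dominated (3.0, 3.0).
```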
Abstract:
Multiple indicators are of interest in smart cities, at different scales and for different stakeholders. In open environments such as the Web, or when indicator information has to be interchanged across systems, contextual information (e.g., unit of measurement, measurement method) should be transmitted together with the data; the lack of such information might cause undesirable effects. Describing the data by means of ontologies increases interoperability among datasets and applications. However, methodological guidance is crucial during ontology development in order to transform the art of modelling into an engineering activity. In this paper, we present a methodological approach for modelling data about Key Performance Indicators and their context, with an application example of such guidelines.
Abstract:
The dissertation argues that the practice of architecture can find instrumental support in the practices of communication, overcoming any classical simplification of the use of media as merely superficial, post-produced or simply promotional. Cases from the last decades of the 20th century are presented, and it is detected that threats such as the risk of trivialization, the saturation of the public image, or the foreseeable wrong association with other individuals in group or thematic presentations may have encouraged a noticeable increase in the control taken by architects over their media opportunities. In other words, architecture seems to have started to overcome and optimize the inevitable: the fact that exhibition formulas and publications, or rather the practices of (self-)exhibition and (self-)publication, are tools at its disposal for activating some kind of intellectual management of the communication and information circulating about itself. This practice of "self-edition" is analysed in a specific period of the trajectory of OMA (Office for Metropolitan Architecture), an office considered a pioneer in the efficient, opportunistic and personalized use of media. The second part of the thesis then dissects its well-known monograph S,M,L,XL (1995), a volume in which its protagonists were deeply involved during the edition and design, and whose production process had barely been investigated. This publication marked a turning point in its genre, disrupting all previous formats and restrictions, and it has become such an emblematic volume for the discipline that no subsequent attempt at a replica has been able to surpass it. Here, the book is also presented as the trigger for the construction of a "big event" that concludes in the transformation of OMA's identity over ten years, paradoxically between the birth of the Groszstadt Foundation and the start of the activity of AMO, two key parallel entities attached to OMA. This position emerges from how the research reveals that S,M,L,XL is one more piece, central but not independent, within a sum of actions and individuals, as well as other publications, exhibitions, events, essayed articles and projects, in particular Bigness, Generic City, Euralille and the competitions of 1989. Significant aspects include the openness to multiple authorship, headed by Rem Koolhaas and the graphic designer Bruce Mau, who share the acknowledgements page with the editor Jennifer Sigler and nearly a hundred names whose contributions are not necessarily specific fragments of the book. The dissolution of certain limits also made it possible to go beyond the tasks initially considered relevant in the edition of a publication. A general goal of the thesis is likewise to open a debate on typically questioned relations, particularly between architecture and markets or the economy.
Using the idea of "design intelligence", outlined by Michael Speaks in 2001, the thesis extracts its essence: the interest in detecting the singularity, or particular intelligence, of every office of architecture and design. It then explores whether, in the construction of this kind of ingenious formulas, one can find interesting and productive combinations of issues such as efficiency and creativity, or organization and ideas. This dynamic of bidirectional relations, especially urgent in the present moment of information excess, grounds the proposal of a more evident equivalence between the "socialization" of the work of architecture, whenever it is shared in public and new conversations are introduced, and the inverse relation, working on the act of "socialization" itself. As if an awareness of the use of media could indeed be instrumental and contribute to the development of the practice of architecture, from an ideally committed and intellectual perspective.
Abstract:
In the beginning of the 1990s, ontology development was similar to an art: ontology developers did not have clear guidelines on how to build ontologies, only some design criteria to be followed. Work on principles, methods and methodologies, together with supporting technologies and languages, turned ontology development into an engineering discipline: Ontology Engineering. Ontology Engineering refers to the set of activities that concern the ontology development process and the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them. Thanks to the work done in the Ontology Engineering field, the development of ontologies within and between teams has increased and improved, as has the possibility of reusing ontologies in other developments and in final applications. Currently, ontologies are widely used in (a) Knowledge Engineering, Artificial Intelligence and Computer Science, (b) applications related to knowledge management, natural language processing, e-commerce, intelligent information integration, information retrieval, database design and integration, bio-informatics and education, and (c) the Semantic Web, the Semantic Grid, and the Linked Data initiative. In this paper, we provide an overview of Ontology Engineering, covering the most prominent and widely used methodologies, languages, and tools for building ontologies. In addition, we comment on how all these elements can be used in the Linked Data initiative.
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly its best-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and integration of several linguistic tools into an appropriate software architecture could most likely solve the limitation stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
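As a toy illustration of combining annotations produced for a common level, the sketch below merges the outputs of several POS taggers over the same token sequence by majority vote. This is one simple combination strategy under the assumption of aligned token sequences, not the annotation model proposed in this work.

```python
from collections import Counter

def combine_pos_annotations(tagger_outputs):
    """Combine the tag sequences of several POS taggers for the same
    tokens by majority vote, letting the taggers correct one another.
    tagger_outputs: list of equal-length tag sequences, one per tagger."""
    combined = []
    for tags_for_token in zip(*tagger_outputs):
        tag, _count = Counter(tags_for_token).most_common(1)[0]
        combined.append(tag)
    return combined

# e.g. combine_pos_annotations([["DET", "NOUN", "VERB"],
#                               ["DET", "NOUN", "NOUN"],
#                               ["DET", "NOUN", "VERB"]])
# returns ["DET", "NOUN", "VERB"]
```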
In addition, most high-level annotation tools rely on other, lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic one) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by another, higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. Then again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
Thus, to summarise, the main aim of the present work was to combine these hitherto separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) into a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats or with custom access interfaces, leading to the problem that much of this data sits in "data silos" and is hence difficult to access. The Semantic Web, and in particular the Linked Data initiative, provides effective solutions to this problem, as well as possibilities for data reuse through inter-lexicon linking and for the incorporation of data categories via dereferenceable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but currently there is no standard for providing complex lexical information for such ontologies or for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps.
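To give a flavour of the model, the sketch below builds a minimal lemon-style lexical entry with rdflib, linking the written form "cat" to an ontology concept via a lexical sense. The example.org namespaces are hypothetical placeholders; the core properties used (canonicalForm, writtenRep, sense, reference) come from the published lemon vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

LEMON = Namespace("http://lemon-model.net/lemon#")
EX = Namespace("http://example.org/lexicon/")     # hypothetical lexicon
ONT = Namespace("http://example.org/ontology/")   # hypothetical ontology

g = Graph()
g.bind("lemon", LEMON)

entry, form, sense = EX["cat"], EX["cat-form"], EX["cat-sense"]

g.add((entry, RDF.type, LEMON.LexicalEntry))
g.add((entry, LEMON.canonicalForm, form))
g.add((form, LEMON.writtenRep, Literal("cat", lang="en")))
g.add((entry, LEMON.sense, sense))
# The sense's reference is what ties the lexicon to the ontology.
g.add((sense, LEMON.reference, ONT["Cat"]))

print(g.serialize(format="turtle"))
```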
Abstract:
The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to interoperate and share common information easily. Nevertheless, multilingual semantic applications are still rare, since most online ontologies are monolingual in English. To address this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from the free-text paradigm. In this paper, we propose an approach to enhance the machine translation of ontologies by exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross-Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). The work is novel in that, to the best of our knowledge, CLESA has not previously been applied in SMT.
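The disambiguation idea can be sketched as follows: represent the source concept's description and each candidate translation as vectors over a shared cross-lingual concept space, and pick the candidate closest to the context. The sketch assumes the concept vectors are already computed (in Explicit Semantic Analysis they are derived from Wikipedia articles, aligned across languages in CLESA); it is a simplification of the paper's SMT pipeline, not its implementation.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between sparse concept vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def pick_translation(source_context_vec, candidates):
    """Choose the candidate translation whose cross-lingual concept
    vector best matches the source concept's context vector.
    candidates: {translation_string: concept_vector}."""
    return max(candidates,
               key=lambda c: cosine(source_context_vec, candidates[c]))

# e.g. pick_translation({"Finance": 0.9, "Rivers": 0.1},
#                       {"banco":  {"Finance": 0.8, "Furniture": 0.2},
#                        "orilla": {"Rivers": 0.9}})
# returns "banco" when the ontology context is about finance.
```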
Abstract:
In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated into different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative (OAEI). By translating the ontologies of the OntoFarm dataset into eight different languages (Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish) we created a comprehensive set of realistic test cases. Based on these test cases, it is possible to evaluate and compare the performance of matching approaches with a special focus on multilingualism.
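Evaluation against such a dataset reduces to comparing a system's alignment with the reference alignment. A minimal sketch of the standard precision/recall/F-measure computation, with correspondences modelled as hashable (entity1, entity2, relation) triples:

```python
def evaluate_alignment(system, reference):
    """Precision, recall and F1 of a system alignment against a
    reference alignment, both given as iterables of correspondence
    triples, in the usual OAEI style of evaluation."""
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. evaluate_alignment(
#     [("o1#Paper", "o2#Article", "="), ("o1#Author", "o2#Writer", "=")],
#     [("o1#Paper", "o2#Article", "=")])
# returns (0.5, 1.0, 0.666...)
```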