841 results for Natural language techniques, Semantic spaces, Random projection, Documents


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the context of monolingual and bilingual retrieval, Simple Knowledge Organisation System (SKOS) datasets can play a dual role as knowledge bases for semantic annotations and as language-independent resources for translation. With no existing track of formal evaluations of these aspects for datasets in SKOS format, we describe a case study on the usage of the Thesaurus for the Social Sciences in SKOS format for a retrieval setup based on the CLEF 2004-2006 Domain-Specific Track topics, documents and relevance assessments. Results showed a mixed picture with significant system-level improvements in terms of mean average precision in the bilingual runs. Our experiments set a new and improved baseline for using SKOS-based datasets with the GIRT collection and are an example of component-based evaluation.
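The abstract above reports its results as mean average precision (MAP). As a minimal sketch of how that measure is usually computed, assuming each run is a ranked list of document ids paired with the set of ids judged relevant (the function names and the toy ids are illustrative and do not come from the paper):

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-topic average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two topics with made-up document ids.
topic_a = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
topic_b = (["d5", "d6"], {"d6"})
print(mean_average_precision([topic_a, topic_b]))
```

Comparing this value between a baseline run and a thesaurus-enriched run over the same topics is one way to read the system-level comparison the abstract describes.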

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis digitized by the Division de la gestion de documents et des archives (Records Management and Archives Division) of the Université de Montréal.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of the effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy, has been implemented, deployed, and evaluated. (The site "1001 Questions" is available at http://teach-computers.org/learner.html.) Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that the assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information."
Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that there is indeed a sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, the percentages of questions answered affirmatively, answered negatively and judged to be nonsensical in the cumulative analogy case compare favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.
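The newspaper example can be sketched in a few lines. This is only an illustration of the nearest-neighbor idea under simplifying assumptions, not Learner's actual implementation: the knowledge base is a hypothetical dictionary from topics to sets of assertion strings, similarity is the number of shared assertions, and only evidence in favor of a question is summed (the thesis also sums evidence against).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def propose_questions(kb: Dict[str, Set[str]], topic: str, k: int = 2) -> List[Tuple[str, int]]:
    """Rank candidate assertions about `topic` by evidence from its k nearest neighbors."""
    known = kb[topic]
    # Similarity (an assumption of this sketch): count of assertions shared with the topic.
    sims = {other: len(known & props) for other, props in kb.items() if other != topic}
    neighbours = sorted((t for t in sims if sims[t] > 0), key=lambda t: (-sims[t], t))[:k]
    scores: Dict[str, int] = defaultdict(int)
    for neighbour in neighbours:
        # Each neighbor votes, weighted by its similarity, for what it knows and the topic lacks.
        for assertion in kb[neighbour] - known:
            scores[assertion] += sims[neighbour]
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy knowledge base.
kb = {
    "newspaper": {"is made of paper", "can be read"},
    "book": {"is made of paper", "can be read", "contains information"},
    "magazine": {"is made of paper", "can be read", "contains information", "has a glossy cover"},
    "hammer": {"is a tool"},
}
for assertion, score in propose_questions(kb, "newspaper"):
    print(f"Is it true that a newspaper {assertion}? (evidence {score})")
```

Because the scores depend on what is already stored, adding confirmed answers back into `kb` changes the neighbors and the ranking on the next round, which is the bootstrapping behavior the abstract describes.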

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper is about the use of natural language to communicate with computers. Most research that has pursued this goal considers only requests expressed in English. One way to facilitate the use of several languages in natural language systems is to use an interlingua, an intermediary representation of natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua, the Universal Networking Language (UNL), and to execute these requests using software components. To achieve this goal, we propose OntoMap, an ontology-based architecture that performs the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, a challenge facing governments around the world. Our aim is to discuss the context of e-Government Big Data and to present a framework that promotes semantic interoperability through the automatic generation of ontologies from unstructured information found on the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related work in this area. The results of this study consist of the architectural definition of the proposed framework, together with its major components and requirements. With this framework, it is possible to take advantage of the large volume of information generated by e-Government initiatives and use it to benefit society.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
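One of the network measurements named above, word degree, can be illustrated with a minimal sketch. It assumes a text is modelled as an undirected network linking each pair of adjacent words, and that the shuffled baseline is a random permutation of the same tokens; both are simplifications chosen for illustration, and the paper's exact network construction, as well as assortativity and selectivity, are not reproduced here.

```python
import random
from collections import defaultdict
from typing import Dict, List


def adjacency_degrees(tokens: List[str]) -> Dict[str, int]:
    """Degree of each word in an undirected network linking adjacent words."""
    neighbours = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops from immediate repetitions
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


def shuffled(tokens: List[str], seed: int = 0) -> List[str]:
    """Return a shuffled copy: same words and frequencies, word order destroyed."""
    copy = list(tokens)
    random.Random(seed).shuffle(copy)
    return copy


# Hypothetical toy text.
tokens = "the cat saw the dog and the dog saw the cat".split()
real = adjacency_degrees(tokens)
baseline = adjacency_degrees(shuffled(tokens))
for word in sorted(real):
    print(word, real[word], baseline.get(word, 0))
```

On a corpus-sized text, a measurement is informative in the sense used above when its values for the real text differ systematically from those obtained over many shuffled versions.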

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Research and professional practices share the aim of restructuring preconceived notions of reality. Both seek to gain an understanding of social reality. Social workers use their professional competence to grasp the reality of their clients, while researchers seek to unlock the secrets of the research material. Development and research are now so intertwined and inherent in almost all professional practices that distinguishing between practising, developing and researching has become difficult and in many respects irrelevant. Moving towards research-based practices is possible, and it is easily done within the framework of the qualitative research approach (Dominelli 2005, 235; Humphries 2005, 280). Social work can be understood as acts and speech acts crisscrossing between social workers and clients. When trying to catch the verbal and non-verbal hints in each other's behaviour, the actors have to do a great deal of interpreting in a more or less uncertain mental landscape. Our point of departure is the idea that the study of social work practices requires tools which effectively reveal the internal complexity of social work (see, for example, Adams & Dominelli & Payne 2005, 294-295). The boom in qualitative research methodologies in recent decades is associated with a profound rupture in the humanities, called the linguistic turn (Rorty 1967). The idea that language does not transparently mediate our perceptions and thoughts about reality but, on the contrary, constitutes it was new and even confusing to many social scientists. Nowadays we have become used to reading research reports that apply different branches of discourse analysis or narratological or semiotic approaches. Although the differences between those orientations are subtle, they share the idea of the predominance of language.
Despite the lively research activity in today's social work and the research-minded atmosphere of social work practice, semiotics has rarely been applied in social work research. However, social work as a communicative practice concerns symbols, metaphors and all kinds of representative structures of language. Those items are at the core of semiotics, the science of signs, which examines how people use signs in their mutual interaction and in their endeavours to make sense of the world they live in, that is, their semiosis. When thinking about the practice of social work and doing research on it, a number of interpretational levels have to be passed before the research phase is reached. First of all, social workers have to interpret their clients' situations, which will be recorded in the files. In some very rare cases those past situations will be reflected in discussions or perhaps interviews, or put under the scrutiny of some researcher in the future. Each and every new observation adds its own flavour to the mixture of meanings. Social workers combine their observations with previous experience and professional knowledge; furthermore, the situation at hand also influences their reactions. In addition, the interpretations made by social workers in the course of their daily working routines are never limited to being part of the personal process of the social worker, but are always inherently cultural as well. Work aiming at social change is defined by the presence of an initial situation, a specific goal, and the means and ways of achieving it, which are (or which should be) agreed upon by the social worker and the client in a situation which is unique and at the same time socially driven. Because of the inherent plot-based nature of social work, the practices related to it can be analysed as stories (see Dominelli 2005, 234), given, of course, that they are signifying and told by someone.
Research on these practices concentrates on impressions, perceptions, judgements, accounts, documents, etc. All these multifarious elements can be scrutinized as textual corpora, but not just any textual material will do. In semiotic analysis, the material studied is characterised as verbal or textual and loaded with meanings. We present a contribution to research methodology, semiotic analysis, which to our mind has at least implicit references to social work practices. Our examples of semiotic interpretation have been taken from our dissertations (Laine 2005; Saurama 2002). The data are official documents from the archives of a child welfare agency and transcriptions of interviews with shelter employees. These data can be defined as stories told by the social workers about what they have seen and felt. The official documents present only fragments, and they are often written in the passive voice (Saurama 2002, 70). The interviews carried out in the shelters can be described as stories where the narrators are more familiar and known. The material is characterised by the interaction between the interviewer and the interviewee. The levels of the story and of the telling of the story become apparent when interviews or documents are examined with semiotic tools. The roots of semiotic interpretation can be found in three different branches: American pragmatism, Saussurean linguistics in Paris, and the so-called formalism in Moscow and Tartu. In this paper, however, we are concerned with the so-called Parisian School of semiology, whose prominent figure was A. J. Greimas. The Finnish sociologists Pekka Sulkunen and Jukka Törrönen (1997a; 1997b) have further developed the ideas of Greimas in their studies on socio-semiotics, and we lean on their ideas.
In semiotics, social reality is conceived as a relationship between subjects, observations, and interpretations, and it is seen as mediated by natural language, which is the most common sign system among human beings (Mounin 1985; de Saussure 2006; Sebeok 1986). Signification is an act of associating an abstract context (signified) with some physical instrument (signifier). These two elements together form the basic concept, the "sign", which never constitutes any kind of meaning alone. Meaning arises in a process of distinction in which signs are related to other signs. In this chain of signs, the meaning diverges from reality (Greimas 1980, 28; Potter 1996, 70; de Saussure 2006, 46-48). One interpretative tool is to think of speech as a surface under which deep structures, i.e. values and norms, exist (Greimas & Courtes 1982; Greimas 1987). To our mind semiotics is very much about playing with two different levels of text: the syntagmatic surface, which is more or less faithful to the grammar, and the paradigmatic, semantic structure of values and norms hidden in the deeper meanings of interpretations. Semiotic analysis deals precisely with the level of meaning which exists under the surface, but the only way to reach those meanings is through the textual level, the written or spoken text. That is why tools are needed. In our studies, we have used the semiotic square and actant analysis. The former is based on the distinctions and categorisations of meanings, and the latter on opening up the plots of narratives in order to reach the value structures.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In his influential article about the evolution of the Web, Berners-Lee [1] envisions a Semantic Web in which humans and computers alike are capable of understanding and processing information. This vision is yet to materialize. The main obstacle for the Semantic Web vision is that in today's Web meaning is most often rooted not in formal semantics but in natural language and, in the sense of semiology, does not emerge before interpretation and processing. Yet an automated form of interpretation and processing can be tackled by precisiating raw natural language. To do that, Web agents extract fuzzy grassroots ontologies through induction from existing Web content. Inductive fuzzy grassroots ontologies thus constitute organically evolved knowledge bases that resemble automated gradual thesauri, which allow precisiating natural language [2]. The Web agents' underlying dynamic, self-organizing, and best-effort induction enables a sub-syntactical, bottom-up learning of semiotic associations. Thus, knowledge is induced from the users' natural use of language in mutual Web interactions and stored in a gradual, thesaurus-like lexical-world knowledge database as a top-level ontology, eventually allowing a form of computing with words [3]. Since in computing with words the objects of computation are words, phrases and propositions drawn from natural languages, it proves to be a practical notion for yielding emergent semantics for the Semantic Web. In the end, an improved understanding by computers should, on the one hand, upgrade human-computer interaction on the Web and, on the other hand, allow an initial version of human-intelligence amplification through the Web.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many attempts have been made to provide multilinguality to the Semantic Web by means of annotation properties in Natural Language (NL), such as RDFS or SKOS labels, and other lexicon-ontology models, such as lemon, but there are still many issues to be solved if we want to have a truly accessible Multilingual Semantic Web (MSW). Among the most relevant problems are the reusability of monolingual resources (ontologies, lexicons, etc.), the accessibility of multilingual resources hindered by many formats, the reliability of ontological sources, disambiguation problems, and the multilingual presentation of all this information to the end user in NL. Unless this NL presentation is achieved, the MSW will remain restricted to IT experts and, even then, will cause great dissatisfaction and disenchantment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human resource management over the Internet is currently a latent problem on any employment website. This problem also appears in AFRICA BUILD Portal, an emerging socio-professional network that aims to create virtual communities fostering health research and education in African countries. One way to foster research and education is through the mobility of students and researchers between institutions, which gives rise to the human resource management problem. This dissertation therefore focuses on solving the human resource management problem in the specific environment of AFRICA BUILD Portal. To solve this problem, the objective is to develop a recommender system that assists the management of human resources with respect to the selection of the best mobility offers and demands. The recommender system is a semantic system that provides recommendations according to the rules and restrictions of the domain. The proposed approach follows that of semantic matchmaking systems: on the one hand, it uses a Description Logics reasoning engine that provides useful inferences for the recommendation process and, on the other hand, it uses Natural Language Processing techniques to support the recommendation process.
Finally, Web technologies are used to integrate the recommender system into AFRICA BUILD Portal. The system was evaluated by comparing the recommendations it created with those created by real users, and the results showed acceptable behavior and performance. Using evaluation measures for information retrieval systems, the average precision of the system was found to be 52%, which may be considered a satisfactory result given that the system is a semantic system. In conclusion, the implemented system is stable and modular. This, on the one hand, allows an easy evolution that should aim at higher performance by increasing its average precision and, on the other hand, leaves open new ways to extend the functionality of the system so as to exploit the potential of AFRICA BUILD Portal through Web 3.0.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The purpose of this thesis is to study the possibilities of performing problem-solving tasks in Spanish with knowledge-based systems. In the first two chapters, we study the development of techniques for natural language processing, with a particular interest in the logical formalisms that have been used to understand natural languages. Then, we present an evaluation of the current state of the art in the field of natural language processing systems. Finally, we introduce the main contribution of our work, Sirena, a system that allows the acquisition, understanding, retrieval and explanation of knowledge in Spanish with knowledge-based systems. Sirena can deal with a large, although simple, subset of Spanish. This subset has been formalised by means of a logic grammar, and the meaning of knowledge is based on logic.
Sirena has been implemented in the programming language Prolog II v2. Keywords: Logic Programming, Understanding Natural Language, Problem Solving, Logic Grammars, Computational Linguistics, Artificial Intelligence.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes our participation in the SemEval-2014 sentiment analysis task, in both contextual and message polarity classification. Our idea was to compare two different techniques for sentiment analysis: first, a machine learning classifier specifically built for the task using the provided training corpus and, second, a lexicon-based approach using natural language processing techniques, developed for a generic sentiment analysis task with no adaptation to the provided training corpus. Results, though far from the best runs, show that the generic model is more robust, as it achieves a more balanced evaluation for message polarity across the different test sets.
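To make the lexicon-based side of that comparison concrete, here is a minimal sketch of message polarity classification from a word list. It is not the authors' system: the tiny lexicon, the negator list and the rule that a negator flips only the next word are all assumptions made for illustration.

```python
# Hypothetical toy lexicon; a real system would use a much larger resource.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}
NEGATORS = {"not", "no", "never"}


def polarity(message: str) -> str:
    """Classify a message as positive, negative or neutral from lexicon hits."""
    score = 0
    negate = False
    for token in message.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = True  # flip the polarity of the next word only
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["I love this phone!", "This is not good.", "See you at noon"]:
    print(text, "->", polarity(text))
```

Nothing in this sketch is fitted to a training corpus, which is the property the abstract credits for the more balanced behaviour of the generic model across test sets.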

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This Master's thesis presents an analysis of techniques and tools for Natural Language Generation (NLG), together with the modifications made to the SimpleNLG tool so that it can generate expressions in Spanish. This extension will allow information to be transmitted to a larger group of people, since around 540 million people speak Spanish. Keywords: Natural Language Generation, NLG tools, NLG techniques, Artificial Intelligence, analysis, SimpleNLG.