928 results for Lexical Database
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In the architecture of a natural language processing system based on linguistic knowledge, two types of components are important: the knowledge databases and the processing modules. One of the knowledge databases is the lexical database, which is responsible for providing lexical units and their properties to the processing modules. Systems that process two or more languages require bilingual and/or multilingual lexical databases, which can be constructed by aligning distinct monolingual databases. In this paper, we present the interlingua and the strategy for aligning the two monolingual databases in REBECA, which stores only concepts from the "wheeled vehicle" domain.
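The alignment idea can be pictured with a minimal sketch, assuming (hypothetically) that each monolingual database maps a lexical unit to the identifier of an interlingual concept, so that the two databases are aligned by joining on those identifiers. The concept identifiers and word lists below are illustrative only and are not taken from REBECA.

```python
from collections import defaultdict

# Hypothetical monolingual lexical databases: lexical unit -> interlingual concept ID.
# Entries are illustrative only, not REBECA data.
DB_PT = {"carro": "C_CAR", "automóvel": "C_CAR", "bicicleta": "C_BICYCLE"}
DB_EN = {"car": "C_CAR", "automobile": "C_CAR", "bicycle": "C_BICYCLE", "truck": "C_TRUCK"}


def group_by_concept(db):
    """Invert a word -> concept mapping into concept -> set of words."""
    groups = defaultdict(set)
    for word, concept in db.items():
        groups[concept].add(word)
    return groups


def align(db_a, db_b):
    """Pair the lexical units of two databases that share an interlingual concept."""
    a, b = group_by_concept(db_a), group_by_concept(db_b)
    shared = sorted(a.keys() & b.keys())
    return {c: (sorted(a[c]), sorted(b[c])) for c in shared}


if __name__ == "__main__":
    for concept, (pt, en) in align(DB_PT, DB_EN).items():
        print(concept, pt, en)
```

Concepts present in only one database (here the hypothetical `C_TRUCK`) stay unaligned, which is one simple way to surface lexical gaps between the two languages.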
Abstract:
In this paper we present a complete Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the Lexical-Functional Grammar (LFG) formalism. Another important component of the system is the anaphora resolution module. To resolve anaphora, this module uses a method based on linguistic information (lexical, morphological, syntactic, and semantic), structural information (the anaphoric accessibility space within which the anaphor finds its antecedent), and statistical information. The method is based on constraints and preferences and resolves pronouns and definite descriptions. Moreover, the system handles the features of both dialogue and non-dialogue discourse. The anaphora resolution module uses several resources, such as a lexical database (Spanish WordNet) that provides semantic information and a POS tagger that supplies the part of speech and root of each word, which makes the resolution process easier.
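A constraints-and-preferences resolver can be sketched as a filter followed by a ranking. This is a minimal illustration, not the paper's rule set: the only constraints assumed here are gender/number agreement and a fixed sentence window (a stand-in for the accessibility space), and the two preferences and their weights are hypothetical. The semantic (WordNet) and statistical information the abstract mentions is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    gender: str       # "m" or "f"
    number: str       # "sg" or "pl"
    sentence: int     # index of the sentence containing the candidate
    is_subject: bool  # True if the candidate is a grammatical subject


def resolve(gender: str, number: str, sentence: int,
            candidates: List[Candidate], window: int = 2) -> Optional[Candidate]:
    """Choose an antecedent for a pronoun with the given features (illustrative rules)."""
    # Constraints: keep only preceding candidates inside the window
    # that agree with the pronoun in gender and number.
    kept = [c for c in candidates
            if 0 <= sentence - c.sentence <= window
            and c.gender == gender and c.number == number]
    if not kept:
        return None

    # Preferences: hypothetical weights; ties go to the more recent sentence.
    def score(c: Candidate):
        points = (2 if c.sentence == sentence else 0) + (1 if c.is_subject else 0)
        return (points, c.sentence)

    return max(kept, key=score)
```

For "María compró un libro. Ella lo leyó.", the agreement constraint alone removes "libro" as a candidate for "Ella", so the preferences only matter when several candidates survive filtering.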
Abstract:
This paper presents a Java-based hyperbolic-style browser designed to render RDF files as structured ontological maps. The program was motivated by the need to browse the content of a web-accessible ontology server, WEB KB-2. The ontology server contains descriptions of over 74,500 object types derived from the WordNet 1.7 lexical database and can be accessed using RDF syntax. Such a structure creates complications for hyperbolic-style displays. WEB KB-2 has 140 stable ontology link types, so a hyperbolic display needs to filter and iconify the view so that different link relations can be distinguished in multi-link views. Our browsing tool, OntoRama, therefore pursues two potentially conflicting aims: first, to display up to ten times as many nodes in a hyperbolic-style view as a conventional graphics display can; second, to render the multi-link ontology comprehensibly in that view.
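One common model for hyperbolic-style displays is the Poincaré disk, in which a node at hyperbolic distance d from the centre is drawn at Euclidean radius tanh(d/2). Deeper nodes therefore crowd toward the rim without ever leaving the unit disk, which is what lets far more nodes fit on screen. The abstract does not say which model OntoRama uses, so the sketch below is generic; the `edge_length` parameter and the link-type names in the filter are hypothetical.

```python
import cmath
import math


def place(depth: int, angle: float, edge_length: float = 1.0) -> complex:
    """Position of a node at hyperbolic distance depth * edge_length from the disk centre."""
    # In the Poincaré disk a hyperbolic distance d from the origin maps to
    # Euclidean radius tanh(d / 2), which is always < 1.
    radius = math.tanh(depth * edge_length / 2)
    return cmath.rect(radius, angle)


def refocus(z: complex, focus: complex) -> complex:
    """Möbius transformation of the unit disk that moves `focus` to the centre."""
    return (z - focus) / (1 - focus.conjugate() * z)


def filter_links(links, shown_types):
    """Keep only (source, target, link_type) triples whose type the user selected."""
    return [link for link in links if link[2] in shown_types]
```

Refocusing with a Möbius transformation keeps every node inside the disk while enlarging the region around the new focus; filtering by link type is one simple way to keep a multi-link view readable.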
Abstract:
In this article, we present the first open-access lexical database that provides phonological representations for 120,000 Italian word forms. Each entry also includes syllable boundaries, stress markings, and a comprehensive range of lexical statistics. Using data derived from this lexicon, we have also generated a set of derived databases and provided estimates of positional frequency for Italian phonemes, syllables, syllable onsets and codas, and character and phoneme bigrams. These databases are freely available from phonitalia.org. This article describes the methods, content, and summary statistics for these databases. In a first application of the database, we also demonstrate how the distribution of phonological substitution errors made by Italian aphasic patients relates to phoneme frequency. © 2013 Psychonomic Society, Inc.
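Positional bigram frequencies of the kind described above can be estimated by counting each bigram together with the position at which it starts in the word. A minimal sketch, assuming a lexicon of (word, token frequency) pairs; the two-word lexicon in the usage note is a toy example, and the article's exact counting conventions (for example, how positions are indexed) are not given in the abstract.

```python
from collections import Counter
from typing import Iterable, Tuple


def positional_bigrams(lexicon: Iterable[Tuple[str, int]], by_token: bool = True) -> Counter:
    """Count bigrams by their starting position within each word.

    lexicon: (word, token_frequency) pairs. With by_token=True each occurrence is
    weighted by the word's token frequency; otherwise each word type counts once.
    """
    counts = Counter()
    for word, freq in lexicon:
        weight = freq if by_token else 1
        for i in range(len(word) - 1):
            counts[(i, word[i:i + 2])] += weight
    return counts
```

For the toy lexicon `[("casa", 10), ("cane", 5)]`, the word-initial bigram "ca" has a token-weighted count of 15 and a type count of 2. Because slicing also works on tuples, passing phoneme transcriptions as tuples of phoneme symbols yields phoneme bigrams with the same function.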
Abstract:
In this paper we discuss how information technologies, as tools for creating digital bilingual dictionaries, can help preserve natural languages. Natural languages are an outstanding part of human cultural values and should therefore be preserved as part of the world's cultural heritage. We describe our work on the bilingual lexical database supporting the Bulgarian-Polish online dictionary and briefly describe the main software tools for the web presentation of the dictionary. We pay particular attention to the presentation of verbs, the linguistic category in Bulgarian that is richest in specific characteristics.
Abstract:
The paper presents a bilingual online dictionary as a useful tool for helping to preserve natural languages. The author focuses on the approach taken to develop a compatible bilingual lexical database for the Bulgarian-Polish online dictionary. A formal model for encoding the dictionary is developed in accordance with the complex structure of the dictionary entries, which varies with the grammatical characteristics of the Bulgarian headwords. The web application for presenting the bilingual dictionary is also described.
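An entry structure that depends on the headword's grammar can be sketched with a record whose optional fields are validated against the part of speech. This is a hypothetical model, not the author's encoding: it assumes only that verb entries carry aspect information (Bulgarian verbs form imperfective/perfective pairs), and all field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    polish: str             # Polish equivalent
    note: str = ""          # optional usage note


@dataclass
class Entry:
    headword: str                       # Bulgarian headword
    pos: str                            # part of speech, e.g. "noun", "verb"
    senses: List[Sense] = field(default_factory=list)
    aspect: Optional[str] = None        # verbs only: "impf" or "pf"
    aspect_pair: Optional[str] = None   # verbs only: aspectual counterpart

    def __post_init__(self):
        # The entry structure depends on the headword's grammar:
        # aspect fields are allowed only for verbs.
        if self.pos != "verb" and (self.aspect or self.aspect_pair):
            raise ValueError("aspect fields apply only to verbs")
        if self.aspect not in (None, "impf", "pf"):
            raise ValueError("aspect must be 'impf' or 'pf'")
```

For example, `Entry("пиша", "verb", [Sense("pisać")], aspect="impf", aspect_pair="напиша")` is accepted, while a noun entry given an aspect is rejected when it is created.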
Abstract:
Mark Pagel, Quentin D. Atkinson & Andrew Meade (2007). Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature, 449, 717-720. RAE2008
Abstract:
The aim of this article is to present the structure of the relational database that contains all the syntactic information included in the Diccionario Crítico Etimológico Castellano e Hispánico by J. Corominas and J. A. Pascual. Although this dictionary contains a wide range of historical information on each of its entries, this information is not presented in a structured way, so it was necessary to study and classify all the elements related to syntactic aspects. On the basis of this preliminary study, the fields of the database were designed; they are grouped into five thematic blocks: lemma information; grammatical information; syntactic information; other related aspects; and relevant observations or comments made by the researcher. The database not only reproduces the contents of the dictionary but also includes several interpretive fields. For this reason, Syntax.dbf is a fundamental working tool for all researchers interested in the diachronic syntax of Spanish.
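The five thematic blocks could be laid out relationally as one table per block, all keyed to a lemma table. The sketch below uses SQLite and is entirely hypothetical: the table and column names are assumptions rather than the structure of Syntax.dbf, and the single record is an illustrative example, not data taken from the dictionary.

```python
import sqlite3

# Hypothetical schema: one table per thematic block, all keyed by lemma id.
SCHEMA = """
CREATE TABLE lemma       (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
CREATE TABLE grammatical (lemma_id INTEGER REFERENCES lemma(id), category TEXT);
CREATE TABLE syntactic   (lemma_id INTEGER REFERENCES lemma(id), construction TEXT);
CREATE TABLE related     (lemma_id INTEGER REFERENCES lemma(id), aspect TEXT);
CREATE TABLE comments    (lemma_id INTEGER REFERENCES lemma(id), researcher_note TEXT);
"""


def build(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    # Illustrative record, not taken from the dictionary.
    conn.execute("INSERT INTO lemma (id, lemma) VALUES (1, 'depender')")
    conn.execute("INSERT INTO grammatical VALUES (1, 'verb')")
    conn.execute("INSERT INTO syntactic VALUES (1, 'depender de + NP')")
    conn.commit()


def constructions(conn: sqlite3.Connection, category: str):
    """List (lemma, construction) pairs for lemmas of a given grammatical category."""
    return conn.execute(
        """SELECT l.lemma, s.construction
           FROM lemma l
           JOIN grammatical g ON g.lemma_id = l.id
           JOIN syntactic s ON s.lemma_id = l.id
           WHERE g.category = ?""",
        (category,),
    ).fetchall()
```

Keeping each block in its own table lets interpretive fields (such as the researcher's comments) be added or queried without altering the tables that reproduce the dictionary's content.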
Abstract:
Greek speakers say "ουρά", Germans "Schwanz" and the French "queue" to describe what English speakers call a "tail", but all of these languages use a related form of "two" to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as "tail") evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number "two", for which all Indo-European language speakers use the same related word-form(1). No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English(2), Spanish(3), Russian(4) and Greek(5)) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages(6) to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that, owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
Abstract:
An implementation of a Lexical Functional Grammar (LFG) natural language front-end to a database is presented, and its capabilities are demonstrated by reference to a set of queries used in the Chat-80 system. The potential of LFG for such applications is explored. Other grammars previously used for this purpose are briefly reviewed and contrasted with LFG. The basic LFG formalism is described in full, covering both its syntax and its semantics, and the deficiencies of the latter for database-access applications are shown. Other current LFG implementations are reviewed and contrasted with the LFG implementation developed here specifically for database access. The implementation described here allows a natural language interface to a specific Prolog database to be produced from a set of grammar-rule and lexical specifications in an LFG-like notation. In addition, the interface system uses a simple database description to compile metadata about the database for later use in planning the execution of queries. Extensions to LFG's semantic component are shown to be necessary to produce a satisfactory functional analysis and semantic output for querying a database. A diverse set of natural language constructs is analysed using LFG, and the derivation of Prolog queries from the F-structure output of LFG is illustrated. The functional description produced by LFG is proposed as sufficient for resolving many problems of quantification and attachment.
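Deriving a Prolog query from an F-structure can be illustrated with a toy translator that walks the governed grammatical functions of a PRED and emits a goal string. This is a hypothetical simplification, not the paper's method: F-structures are plain dictionaries, a PRED is a `(relation, governed_functions)` pair, the relation name `borders` is an assumed Chat-80-style predicate, and only a single wh-variable `X` is supported. Quantification, which the abstract says needs semantic extensions, is not handled.

```python
def to_prolog(f: dict) -> str:
    """Translate a simplified f-structure into a Prolog goal string.

    f["PRED"] is a (relation, governed_functions) pair, e.g. ("borders", ["SUBJ", "OBJ"]).
    A nominal with no governed functions becomes a Prolog atom, or the
    variable X if it is marked as a wh-word.
    """
    relation, functions = f["PRED"]
    if not functions:
        return "X" if f.get("WH") else relation
    args = ", ".join(to_prolog(f[gf]) for gf in functions)
    return f"{relation}({args})"
```

Under these assumptions, "Does France border Spain?" becomes `borders(france, spain)` and "What borders Spain?" becomes `borders(X, spain)`. The order of arguments follows the order of grammatical functions in the PRED, which is where LFG's functional analysis does the work.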
Abstract:
Peer-to-peer systems are widely used on the Internet. However, most peer-to-peer information systems still lack some important features, such as cross-language information retrieval (IR) and collection selection/fusion. Cross-language IR is a state-of-the-art research area in the IR community, and it has not yet been used in any real-world IR system. Cross-language IR makes it possible to issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries, so their collections are in multiple languages, and cross-language IR can help them find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue a single query in Chinese and retrieve documents in both languages. The out-of-vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to this problem. However, extracting Multiword Lexical Units (MLUs) from web content and selecting the correct translations from the extracted candidate MLUs remain two difficult problems in web-mining-based automated translation. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and normalized-score-based merging are well-known approaches to these problems. However, such approaches consider only the content of the remote database, not the retrieval performance of the remote search engine. This thesis presents research on building a peer-to-peer IR system with cross-language IR and an advanced collection profiling technique for fusion.
Specifically, this thesis first presents a new Chinese term measure and a new Chinese MLU extraction process that works well on small corpora, together with an approach for selecting MLUs more accurately. It then proposes a collection profiling strategy that discovers not only the collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer is autonomous: peers are willing to share documents but do not share collection statistics, which is typical of peer-to-peer IR. Finally, all these approaches are combined to build a secure peer-to-peer multilingual IR system that cooperates through X.509 and an email system.
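Normalized-score merging, the baseline the abstract contrasts with, can be sketched by rescaling each peer's scores to [0, 1] (min-max normalization) before interleaving the lists. To reflect the thesis's point that the remote engine's retrieval performance also matters, the sketch multiplies each peer's normalized scores by a per-peer weight. How the thesis actually estimates that performance is not described in the abstract, so `peer_weight` is a hypothetical input here, not the thesis's fusion method.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (document id, raw score from a peer's search engine)


def min_max(results: List[Result]) -> List[Result]:
    """Rescale one peer's scores to [0, 1] so different engines become comparable."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def merge(per_peer: Dict[str, List[Result]], peer_weight: Dict[str, float]) -> List[Result]:
    """Merge ranked lists; each peer's normalized scores are scaled by its weight."""
    merged: Dict[str, float] = {}
    for peer, results in per_peer.items():
        w = peer_weight.get(peer, 1.0)  # unknown peers get a neutral weight
        for doc, s in min_max(results):
            merged[doc] = max(merged.get(doc, 0.0), w * s)  # keep best score for duplicates
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

With a weight of 1.0 for every peer this reduces to plain min-max merging. Lowering a peer's weight demotes its top results even when its raw scores are on a larger scale than those of other peers.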
Abstract:
This paper presents a database of ATP (Alternative Transients Program) simulated waveforms for shunt reactor switching cases with vacuum breakers in motor circuits following interruption of the starting current. The objective is to provide simulated multiple-reignition data for developing diagnostic and prognostic algorithms, and also to help ATP users with practical study cases and component data compilation for shunt reactor switching. The method can easily be applied with other data for the different dielectric curves of circuit-breakers and for different networks. This paper presents design details and discusses some of the available cases and the advantages of such simulated data.
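In switching simulations of this kind, a reignition is commonly detected when the transient recovery voltage (TRV) across the opening contacts exceeds the breaker's dielectric withstand. For vacuum breakers this withstand is often modelled as rising linearly with time after contact separation. The sketch below applies that check to a sampled TRV waveform. The linear curve, its `slope` and `offset`, and the sample values are assumptions, not the paper's ATP model or data.

```python
from typing import Optional, Sequence


def first_reignition(times: Sequence[float], trv: Sequence[float],
                     t_open: float, slope: float, offset: float) -> Optional[float]:
    """Return the first sample time at which |TRV| exceeds the withstand curve.

    Withstand is modelled as offset + slope * (t - t_open) for t >= t_open
    (a linear dielectric recovery assumption; units must be consistent,
    e.g. kV and microseconds).
    """
    for t, v in zip(times, trv):
        if t < t_open:
            continue  # contacts not yet separated
        withstand = offset + slope * (t - t_open)
        if abs(v) > withstand:
            return t
    return None
```

This only finds the first breakdown. A simulation that produces multiple reignitions would, after each high-frequency current zero, restart the withstand curve and repeat the check, and it would use the breaker-specific dielectric curve in place of the linear assumption.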