69 results for Lexical Database
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, whose key strategy is to build language-specific wordnets that keep as many as possible of the semantic relations available in WN.Pr. This paper stresses an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages: it makes it possible to implement an automatic procedure that maps the WN.Pr conceptual relations, such as hyponymy, co-hyponymy, troponymy, meronymy, cause, and entailment, onto the lexical database of the wordnet under construction. This is viable because those are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and the aforementioned automation procedure, illustrated with a sample of the automatic encoding of the hyponymy and co-hyponymy relations.
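The mapping idea in this abstract can be illustrated with a minimal sketch. It assumes that an alignment from source synsets to target synsets already exists and that relations are stored as (hyponym, hypernym) pairs. The synset identifiers, the function names, and the data layout are hypothetical and are not taken from WN.Pr or WN.Br.

```python
def project_hyponymy(source_relations, alignment):
    """Copy each (hyponym, hypernym) pair whose two synsets are both aligned."""
    projected = set()
    for hyponym, hypernym in source_relations:
        if hyponym in alignment and hypernym in alignment:
            projected.add((alignment[hyponym], alignment[hypernym]))
    return projected


def co_hyponyms(hyponymy):
    """Derive co-hyponym pairs: synsets that share the same hypernym."""
    children_by_parent = {}
    for hyponym, hypernym in hyponymy:
        children_by_parent.setdefault(hypernym, set()).add(hyponym)
    pairs = set()
    for children in children_by_parent.values():
        for a in children:
            for b in children:
                if a < b:  # keep each unordered pair once
                    pairs.add((a, b))
    return pairs


# Hypothetical source relations and alignment, for illustration only.
source_relations = {
    ("dog.n", "canine.n"),
    ("wolf.n", "canine.n"),
    ("canine.n", "carnivore.n"),
}
alignment = {"dog.n": "cão", "wolf.n": "lobo", "canine.n": "canídeo"}

target_hyponymy = project_hyponymy(source_relations, alignment)
target_co_hyponyms = co_hyponyms(target_hyponymy)
```

Because the relation holds between concepts rather than words, the pair for "carnivore.n" is simply skipped until that synset is aligned; nothing has to be recomputed for the pairs already projected.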
Abstract:
This paper discusses particular linguistic challenges in the task of reusing published dictionaries, conceived as structured sources of lexical information, in the compilation process of a machine-tractable thesaurus-like lexical database for Brazilian Portuguese. After delimiting the scope of the polysemous term "thesaurus", the paper focuses on the improvement of the resulting object by a small team, in a form compatible with and inspired by WordNet guidelines, comments on the dictionary entries, addresses selected problems found in the process of extracting the relevant lexical information from the selected dictionaries, and provides some strategies to overcome them.
Abstract:
This paper presents the overall methodology that has been used to encode both the Brazilian Portuguese WordNet (WordNet.Br) standard language-independent conceptual-semantic relations (hyponymy, co-hyponymy, meronymy, cause, and entailment) and the so-called cross-lingual conceptual-semantic relations between different wordnets. Accordingly, after contextualizing the project and outlining the current lexical database structure and statistics, it describes the WordNet.Br editing GUI that was designed to aid the linguist in carrying out the tasks of building synsets, selecting sample sentences from corpora, writing synset concept glosses, and encoding both language-independent conceptual-semantic relations and cross-lingual conceptual-semantic relations between WordNet.Br and Princeton WordNet. © Springer-Verlag Berlin Heidelberg 2006.
Abstract:
The need to represent both semantics and common sense, and to organize them in a lexical database or knowledge base, has motivated the development of large projects such as Wordnets, CYC and Mikrokosmos. Besides the generic bases, another approach is the construction of ontologies for specific domains. Among the advantages of such an approach is the possibility of a greater and more detailed coverage of a specific domain and its terminology. Domain ontologies are important resources in several tasks related to language processing, especially those related to information retrieval and extraction in textual bases. Information retrieval or even question-answering systems can benefit from the domain knowledge represented in an ontology. Besides embracing the terminology of the field, the ontology makes the relationships among the terms explicit. Copyright 2007 ACM.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In the architecture of a natural language processing system based on linguistic knowledge, two types of component are important: the knowledge databases and the processing modules. One of the knowledge databases is the lexical database, which is responsible for providing the lexical units and their properties to the processing modules. Systems that process two or more languages require bilingual and/or multilingual lexical databases. These databases can be constructed by aligning distinct monolingual databases. In this paper, we present the interlingua and the strategy for aligning the two monolingual databases in REBECA, which only stores concepts from the “wheeled vehicle” domain.
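As a rough illustration of aligning two monolingual databases through an interlingua, the sketch below assumes that each lexicon maps a word to a shared concept identifier. The concept labels, the example words, the language pair, and the function name are hypothetical; the abstract does not describe REBECA's actual data model.

```python
def align_via_interlingua(lexicon_a, lexicon_b):
    """Pair each word of lexicon_a with the lexicon_b words sharing its concept."""
    words_by_concept = {}
    for word, concept in lexicon_b.items():
        words_by_concept.setdefault(concept, []).append(word)
    return {
        word: sorted(words_by_concept.get(concept, []))
        for word, concept in lexicon_a.items()
    }


# Hypothetical entries from a "wheeled vehicle" domain.
english = {"car": "C_AUTOMOBILE", "bicycle": "C_BICYCLE", "cart": "C_CART"}
portuguese = {
    "carro": "C_AUTOMOBILE",
    "automóvel": "C_AUTOMOBILE",
    "bicicleta": "C_BICYCLE",
}

aligned = align_via_interlingua(english, portuguese)
```

Words whose concept has no counterpart in the other lexicon come back with an empty list, which makes lexical gaps easy to spot.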
Abstract:
The main goal of our research was to search for SSRs in the Eucalyptus EST FORESTs database using software for mining SSR motifs. With this objective, we created a database for cataloging Eucalyptus EST-derived SSRs, and developed a bioinformatics tool, named Satellyptus, for finding and analyzing microsatellites in the Eucalyptus EST database. The search for microsatellites in the FORESTs database containing 71,115 Eucalyptus EST sequences (52.09 Mb) revealed 20,530 SSRs in 15,621 ESTs. The SSR abundance detected in the Eucalyptus EST database (29% or one microsatellite every four sequences) is considered very high for plants. Amongst the categories of SSR motifs, the dimeric (37%) and trimeric ones (33%) predominated. The AG/CT motif was the most frequent (35.15%), followed by the trimeric CCG/CGG (12.81%). From a random sample of 1,217 sequences, 343 microsatellites in 265 SSR-containing sequences were identified. Approximately 48% of these ESTs containing microsatellites were homologous to proteins with known biological function. Most of the microsatellites detected in Eucalyptus ESTs were positioned at either the 5' or 3' end. Our next priority involves the design of flanking primers for codominant SSR loci, which could lead to the development of a set of microsatellite-based markers suitable for marker-assisted Eucalyptus breeding programs.
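For readers unfamiliar with SSR mining, the following minimal sketch shows one common way to detect perfect dimeric and trimeric repeats with a regular expression. It is not the Satellyptus algorithm, which the abstract does not describe; the function name, the minimum repeat thresholds, and the example sequence are assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=None):
    """Return (start, unit, repeat_count) for perfect tandem repeats."""
    if min_repeats is None:
        # Assumed thresholds: at least 5 dimer repeats or 4 trimer repeats.
        min_repeats = {2: 5, 3: 4}
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One unit captured, then the same unit repeated at least min_rep - 1 times.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence.upper()):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Made-up sequence containing one AG dimer repeat and one CCG trimer repeat.
example = "TTAGAGAGAGAGAGCCCGCCGCCGCCGTT"
ssrs = find_ssrs(example)
```

Running the function over every sequence in a FASTA file and counting the units found would give motif frequencies of the kind reported in the abstract, subject to the chosen thresholds.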
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
This article updates the Brazilian database on food carotenoids. Emphasis is on carotenoids that have been demonstrated to be important to human health: alpha-carotene, beta-carotene, beta-cryptoxanthin, lycopene, lutein and zeaxanthin. The sampling and sample preparation strategies and the analytical methodology are presented. Possible sources of analytical errors, as well as the measures taken to avoid them, are discussed. Compositional variation due to such factors as variety/cultivar, stage of maturity, part of the plant utilized, climate or season and production technique is demonstrated. The effects of post-harvest handling, preparation, processing and storage of food on the carotenoid composition are also discussed. The importance of biodiversity is manifested by the variety of carotenoid sources and the higher levels of carotenoids in native, uncultivated or semi-cultivated fruits and vegetables in comparison to commercially produced crops. © 2008 Elsevier B.V. All rights reserved.
Abstract:
Within Natural Language Processing (NLP), the development of lexical-semantic resources is a pressing need. When NLP systems are conceived as an exercise in human language engineering, the development of such resources is believed to benefit from the knowledge representation models developed by Knowledge Engineering. These models, in particular, provide both the theoretical-methodological framework and the formal metalanguage for the computational treatment of the meaning of lexical units. In this article, after presenting the computational-linguistic conception of the lexicon, the main knowledge representation paradigms are elucidated, with emphasis on the approach to meaning and the formal metalanguage associated with each of them.
Abstract:
This paper reports on the project, ongoing since 2002, of developing a wordnet for Brazilian Portuguese (Wordnet.Br) from scratch. In particular, it describes the process of constructing the Wordnet.Br core database, which has 44,000 words organized in 18,500 synsets. Accordingly, it briefly sketches the project's overall methodology, its lexical resources, the synset compilation process, and the Wordnet.Br editor, a GUI (graphical user interface) which aids the linguist in the compilation and maintenance of Wordnet.Br. It concludes with the planned further work.
Abstract:
Genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms. We are now faced with assigning, understanding, and modifying the functions of proteins encoded by these genomes. DBMODELING is a relational database of annotated comparative protein structure models and their metabolic pathway characterization, when identified. This procedure was applied to complete genomes such as Mycobacterium tuberculosis and Xylella fastidiosa. The main interest in the study of metabolic pathways is that some of these pathways are not present in humans, which makes them selective targets for drug design, decreasing the impact of drugs on humans. The database currently contains 1116 proteins from two genomes. It can be accessed by any researcher at http://www.biocristalografia.df.ibilce.unesp.br/tools/. This project confirms that homology modeling is a useful tool in structural bioinformatics and that it can be very valuable in annotating genome sequence information, contributing to structural and functional genomics, and analyzing protein-ligand docking.