377 results for Thesaurus
Abstract:
Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can be either because they represent what the author believes a paper is about, not what it actually is, or because they include keyphrases that are more classificatory than explanatory, e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes two possible solutions that examine the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. Using three different freely available thesauri, the work examines two different methods of producing keywords and compares the outcomes across multiple strands in the timeline. The primary method takes n-grams of the source document's phrases and examines their synonyms, while the secondary method groups the outputs by their synonyms. The experiments show that the primary method produces good results and that the secondary method produces both good results and potential for future work. In addition, the different qualities of the thesauri are examined, and it is concluded that the more entries a thesaurus has, the better it is likely to perform. Neither the age of a thesaurus nor the size of its entries correlates with performance.
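The primary method (taking n-grams of the document and looking up their synonyms) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `TOY_THESAURUS` dictionary is hypothetical and stands in for the three freely available thesauri, and ranking candidates by how many n-grams they are a synonym of is an assumed scoring rule.

```python
from collections import Counter

# Hypothetical toy thesaurus mapping a word or phrase to its synonyms.
# It stands in for the real thesauri used in the paper.
TOY_THESAURUS = {
    "data mining": ["knowledge discovery", "data analysis"],
    "mining": ["extraction", "excavation"],
    "knowledge": ["information", "understanding"],
    "analysis": ["examination", "data analysis"],
}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def synonym_keyphrases(text, thesaurus, max_n=2, top_k=3):
    """Rank candidate keyphrases by how many of the document's n-grams
    (n = 1..max_n) list them as a synonym (assumed frequency score)."""
    tokens = text.lower().split()
    scores = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            for syn in thesaurus.get(gram, []):
                scores[syn] += 1
    return [phrase for phrase, _ in scores.most_common(top_k)]
```

For example, `synonym_keyphrases("data mining supports knowledge analysis", TOY_THESAURUS)` ranks "data analysis" first, because it is reached both from the bigram "data mining" and from the unigram "analysis"; a phrase that several parts of the text point to is treated as an underlying theme.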
Abstract:
Deviations from the average can provide valuable insights about the organization of natural systems. The present article extends this important principle to the systematic identification and analysis of singular motifs in complex networks. Six measurements quantifying different and complementary features of the connectivity around each node of a network were calculated, and multivariate statistical methods applied to identify singular nodes. The potential of the presented concepts and methodology was illustrated with respect to different types of complex real-world networks, namely the US air transportation network, the protein-protein interactions of the yeast Saccharomyces cerevisiae and the Roget thesaurus networks. The obtained singular motifs possessed unique functional roles in the networks. Three classic theoretical network models were also investigated, with the Barabasi-Albert model resulting in singular motifs corresponding to hubs, confirming the potential of the approach. Interestingly, the number of different types of singular node motifs as well as the number of their instances were found to be considerably higher in the real-world networks than in any of the benchmark networks. Copyright (C) EPLA, 2009
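The general idea of flagging nodes whose connectivity deviates from the average can be illustrated with a small sketch. It is not the article's method: it uses three assumed measurements (degree, local clustering coefficient and average neighbour degree) instead of the six used in the paper, and a simple swapped-in criterion, the Euclidean norm of the z-scored measurement vector exceeding a hypothetical `threshold`, in place of the multivariate statistical methods the authors applied.

```python
import math
from statistics import mean, pstdev


def node_features(adj):
    """Per-node measurements (assumed stand-ins for the paper's six):
    degree, local clustering coefficient and average neighbour degree.
    `adj` maps each node to the set of its neighbours (undirected graph)."""
    feats = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # Count edges between pairs of neighbours (each pair once).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clust = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        avg_nbr_deg = mean(len(adj[u]) for u in nbrs) if nbrs else 0.0
        feats[v] = (k, clust, avg_nbr_deg)
    return feats


def singular_nodes(adj, threshold=2.0):
    """Flag nodes whose z-scored feature vector lies farther than
    `threshold` (Euclidean norm) from the network average."""
    feats = node_features(adj)
    cols = list(zip(*feats.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by zero
    flagged = []
    for v, values in feats.items():
        z = [(x - m) / s for x, m, s in zip(values, mus, sds)]
        if math.sqrt(sum(t * t for t in z)) > threshold:
            flagged.append(v)
    return flagged
```

On a star graph (one hub linked to nine leaves) only the hub is flagged, which echoes the abstract's observation that the Barabási-Albert model yields singular motifs corresponding to hubs.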
Abstract:
This paper discusses particular linguistic challenges in reusing published dictionaries, conceived as structured sources of lexical information, to compile a machine-tractable, thesaurus-like lexical database for Brazilian Portuguese. After delimiting the scope of the polysemous term thesaurus, the paper focuses on the improvement of the resulting object by a small team, in a form compatible with and inspired by WordNet guidelines; comments on the dictionary entries; addresses selected problems found in the process of extracting the relevant lexical information from the selected dictionaries; and provides some strategies to overcome them.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Graduate Program in Information Science - FFC
Abstract:
Graduate Program in Information Science - FFC
Abstract:
Graduate Program in Linguistics and Portuguese Language - FCLAR
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Three topics are examined: 1) A brief history of the lexicography of two Romance languages (Spanish and French) and of Portuguese. The main dictionaries of these languages from the 16th to the 20th century are evaluated. 2) A typology of lexicographic works. The main types of dictionary found in the Romance languages and in English are identified and discussed. 3) The use of the computer in contemporary lexicography. The computer has revolutionized lexicography, since it can carry out basic, tedious tasks such as compiling, classifying and sorting the lexical and contextual data used to make dictionaries, and then retrieving those data easily and quickly.
Abstract:
Monolingual dictionaries come in various sizes and formats. The number of entries they contain depends on the audience each dictionary is intended for. The standard dictionary of a language is a monolingual dictionary with approximately 50,000 entries; it includes a substantial lexical stock without, however, being a thesaurus that records every word of the lexicon. The standard dictionary is a very important cultural instrument in contemporary society. This article examines several problems involved in compiling a standard dictionary and monolingual dictionaries in general: the selection of lexical entries, the compilation of the corpus (database), the writing of entries, and lexicographic definition. It also analyzes the problems that polysemy and homonymy pose in dictionary making.
Abstract:
This article analyzes the specific features and processes of indexing and classification performed in school libraries to process and retrieve information from their collections. The subject languages used in Spanish, Portuguese and Brazilian school libraries are also analyzed. To achieve this goal, the concept of the school library was analyzed, its function was studied, and the techniques and tools that enable information organization were examined. Among the tools, we studied the Subject Headings List for children's and young people's books and the Subject Headings List for public libraries, the Universal Decimal Classification System (paperback edition) or classification by fields of interest, and specialized thesauri such as the Tesauro de la Educacion UNESCO-OIE and the Tesauro Europeo de la Educacion.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Twenty-eight species of Temnocerus Thunberg, 1815 are recognized from Central America (Mexico to Panama), comprising eight previously described species and 20 new species, as follows: T. abdominalis (Voss), T. chiapensis n. sp., T. chiriquensis (Sharp), T. confertus (Sharp), T. cyaneus n. sp., T. ellus n. sp., T. giganteus n. sp., T. guatemalenus (Sharp), T. guerrerensis n. sp., T. herediensis n. sp., T. mexicanus n. sp., T. michoacensis n. sp., T. minutus n. sp., T. niger n. sp., T. oaxacensis n. sp., T. obrieni n. sp., T. oculatus (Sharp), T. potosi n. sp., T. pseudaeratus n. sp., T. pueblensis n. sp., T. pusillus (Sharp), T. regularis (Sharp), T. rostralis n. sp., T. rugosus n. sp., T. salvensis n. sp., T. tamaulipensis n. sp., T. thesaurus (Sharp) and T. yucatensis n. sp. Rhynchites debilis Sharp is placed in synonymy with Temnocerus guatemalenus (Sharp), and Pselaphorhynchites lindae Hamilton is placed in synonymy with Temnocerus regularis (Sharp). A key to the species based on external characters and male genitalia is provided, together with digital images, drawings of the aedeagus, and distribution maps.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Complex networks have attracted increasing interest from various fields of science. It has been demonstrated that each complex network model presents specific topological structures that characterize its connectivity and dynamics. Complex network classification relies on representative measurements that describe these topological structures. Although a large number of measurements exist, most of them are correlated. To overcome this limitation, this paper presents a new measurement for complex network classification based on partially self-avoiding walks. We validate the measurement on a data set composed of 40,000 complex networks from four well-known models. Our results indicate that the proposed measurement improves the correct classification of networks compared with traditional measurements. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4737515]
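A partially self-avoiding walk can be sketched as a deterministic "tourist walk" with memory `mu`: the walker may not step onto its current node or any of the `mu` nodes it visited just before, so each trajectory settles into a transient followed by a cyclic attractor. The movement rule below (go to the highest-degree allowed neighbour, ties broken by the smallest label) and the helper names `tourist_walk` and `walk_signature` are assumptions for illustration only; the abstract does not say how the authors choose the next node or how they turn the walks into their measurement.

```python
from collections import Counter


def tourist_walk(adj, start, mu):
    """Deterministic partially self-avoiding walk (sketch, assumed rule).

    The walker moves to the highest-degree neighbour that is not among the
    current node and the previous `mu` nodes (ties: smallest label).
    Because the next step depends only on that window of visited nodes,
    a repeated window means the walk has entered a cycle.
    Returns (transient_length, attractor_period); period 0 means the
    walker got stuck with no allowed neighbour."""
    path = [start]
    first_seen = {}  # window of recent nodes -> step at which it first occurred
    step = 0
    while True:
        window = tuple(path[-(mu + 1):])  # current node plus up to mu previous ones
        if window in first_seen:
            first = first_seen[window]
            return first, step - first
        first_seen[window] = step
        forbidden = set(window)
        candidates = [u for u in adj[path[-1]] if u not in forbidden]
        if not candidates:
            return step, 0
        path.append(min(candidates, key=lambda u: (-len(adj[u]), u)))
        step += 1


def walk_signature(adj, mu):
    """Count (transient, period) pairs over all start nodes: a simple
    descriptor that could feed a network classifier (hypothetical)."""
    return Counter(tourist_walk(adj, v, mu) for v in adj)
```

On a four-node cycle with `mu=1`, every start node gives a transient of 1 step and an attractor of period 4, so `walk_signature` returns `Counter({(1, 4): 4})`. Different network models produce different (transient, period) distributions, which is what makes such walk statistics usable as classification features.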