9 results for Information retrieval, Web search behavior, Cognitive style
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
Abstract:
The University of São Paulo (USP) has seen a steady increase in content in electronic and digital formats, distributed by different suppliers and hosted remotely or in the cloud. It faces growing difficulty in giving users access to this digital collection while continuing to maintain traditional physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration and search agility. To determine whether and how such a system would meet USP's demands and expectations and, if so, which criteria should be used to analyze such a tool, an analytical study was structured on an essentially documentary basis, drawing on a literature review and on data available from official websites and from libraries that use this kind of resource. The conceptual basis of the study was defined after identifying existing software assessment methods, yielding a standard with 40 analysis criteria, covering, among others, details of the single access interface to information content, Web 2.0 features, intuitive interface and faceted navigation. The details of the studies conducted on four of the major systems currently available in this software category are presented, providing support for decision-making by other libraries interested in such systems.
Abstract:
Stereotyped behaviors have been routinely used as characters for phylogeny inference, but the same cannot be said of the plastic aspects of performance, which routinely are taken as a result of ecological processes. In this paper we examine the evolution of one of these plastic behavioral phenotypes, thus fostering a bridge between ecological and evolutionary processes. Foraging behavior in spiders is context dependent in many aspects, since it varies with prey type and size, spider nutritional and developmental state, previous experience and, in webweavers, is dependent on the structure of the web. Reeling is a predatory tactic typical of cobweb weavers (Theridiidae), in which the spider moves the prey toward her by pulling the capture thread (gumfoot) to which it is adhered. Predatory reeling is dependent on the gumfoot for its expression, and has not been previously reported in orbweavers. In order to investigate the evolution of this web-dependent behavior, we built artificial pseudogumfoot lines in orbwebs and recorded parameters of the predatory tactics in this modified web. Aspects of the predatory tactics of 240 individuals (12 species in 4 families) were measured, and the resulting data were optimized on the phylogeny of Orbiculariae. All species perform predatory reeling with the pseudogumfoot lines. Thus, predatory reeling is homologous for the whole Orbiculariae group. In nature, holes made by insects in ecribellate orbs produce pseudogumfoot lines (similar to our experimentally modified webs), and thus reeling occurred naturally in ecribellates. Nevertheless, outside lab conditions, predatory reeling does not occur among cribellate orbweavers, so that this behavior could not have been selected for in the cribellate ancestor of orbweavers. Cribellate spiders are flexible enough to present novel and adaptive predatory responses (reeling) even when exposed for the first time to conditions outside their usual environment.
Thus, the evolution of reeling suggests an alternative mechanism for the production of evolutionary novelties; that is, the exploration of unusual ecological conditions and of the regular effects these abnormal conditions have on phenotype expression.
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, with applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
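As a rough illustration of what modeling XML documents as ordered labeled trees and computing an edit distance between them can look like, the sketch below parses two small documents and compares them with a simplified top-down distance in the style of Selkow. It is not the framework described in the abstract: it uses unit costs, ignores text content and attributes, and has no semantic component. The helper names (`to_tree`, `size`, `tree_distance`) and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_tree(element):
    """Convert an XML element into an ordered labeled tree: (label, [children])."""
    return (element.tag, [to_tree(child) for child in element])


def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])


def tree_distance(a, b):
    """Top-down edit distance with unit costs (a Selkow-style simplification).

    Roots are matched (relabel cost 0 or 1). Child subtrees are aligned in
    order; inserting or deleting a whole subtree costs its node count.
    """
    relabel = 0 if a[0] == b[0] else 1
    kids_a, kids_b = a[1], b[1]
    rows, cols = len(kids_a) + 1, len(kids_b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j] + size(kids_a[i - 1]),      # delete subtree of a
                table[i][j - 1] + size(kids_b[j - 1]),      # insert subtree of b
                table[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1]),  # match subtrees
            )
    return relabel + table[-1][-1]


# Hypothetical sample documents (structure only).
doc_a = ET.fromstring("<paper><title/><author/><year/></paper>")
doc_b = ET.fromstring("<paper><title/><author><name/></author></paper>")
print(tree_distance(to_tree(doc_a), to_tree(doc_b)))
```

Here the two documents differ by one deleted node (`year`) and one inserted node (`name`), so the distance is 2. A full tree edit distance, or one with semantic costs as the abstract describes, would replace the unit costs and the top-down restriction.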
Abstract:
We present the first record and description of the gregarious behavior of the Neotropical harvestmen Serracutisoma proximum (Mello-Leitao 1922) and Serracutisoma spelaeum (Mello-Leitao 1933) (Opiliones: Gonyleptidae: Goniosomatinae) (DaSilva & Gnaspini 2010). We followed and described the pattern of these aggregations over a period of 17 months in a cave in southeastern Brazil. Individuals of the two species aggregated with both conspecifics and heterospecifics during the non-reproductive season (i.e., from October to March, the cool and dry season). Aggregations contained up to 81 individuals, usually with a female-biased adult sex ratio. Multispecific aggregations were usually composed mainly of representatives of one of the two species, suggesting that although these species also aggregate with heterospecifics, there is a preference for aggregating with conspecifics. This study provides novel information on the social behavior of harvestmen, specifically regarding the composition of multispecific aggregations.
Abstract:
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
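To make the idea of characterizing the local structure around an ambiguous word more concrete, the following minimal sketch builds an undirected word adjacency network from a sentence and computes the local clustering coefficient of one word. It is an illustration under simplifying assumptions (no stopword removal or lemmatization, adjacency only between consecutive words), not the method used in the paper; the function names and the sample sentence are hypothetical.

```python
from collections import defaultdict
import re


def build_adjacency_network(text):
    """Link every pair of distinct words that appear next to each other."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return neighbors


def local_clustering(neighbors, word):
    """Fraction of pairs of neighbors of `word` that are themselves linked."""
    adjacent = sorted(neighbors.get(word, ()))
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(adjacent)
        for v in adjacent[i + 1:]
        if v in neighbors[u]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical sentence in which "bank" is used in two senses.
sample = "the river bank was muddy and the bank raised the interest rate"
network = build_adjacency_network(sample)
print(sorted(network["bank"]))
print(local_clustering(network, "bank"))
```

In a disambiguation setting, a feature like this would presumably be computed for each occurrence of the ambiguous word over a window of surrounding text and then passed to a classifier together with other network measurements.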
Abstract:
This paper reports the experience and procedures adopted in analyzing and identifying the journal titles received by the Library of the Instituto de Medicina Tropical de São Paulo of the University of São Paulo since its creation. Data were collected from the bibliographic records in the Cataloging Module of the Bibliographic Database DEDALUS, Aleph 500 Version 18.1, of the University of São Paulo, following pre-established criteria. It is concluded that, although the problems detected are of minor relevance relative to the collection analyzed, a comparative study between user needs and the collection available in the Library should be maintained, so that the journals meet the information needs of its users.
Abstract:
In this paper, we propose an extension of the invariance principle for nonlinear switched systems under dwell-time switched solutions. This extension allows the derivative of an auxiliary function V, also called a Lyapunov-like function, along the solutions of the switched system to be positive on some sets. The results of this paper are useful to estimate attractors of nonlinear switched systems and the corresponding basins of attraction. Uniform estimates of attractors and basins of attraction with respect to time-invariant uncertain parameters are also obtained. Results for a common Lyapunov-like function and multiple Lyapunov-like functions are given. Illustrative examples show the potential of the theoretical results in providing information on the asymptotic behavior of nonlinear dynamical switched systems. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
The article analyzes the operability of folksonomies and the possibility of applying this tool in information organization systems in the field of Information Science. To this end, an analysis of tag coherence and of the resources available for tagging was carried out on two websites, Last.fm and CiteULike. This analysis found that inconsistencies and discrepancies occurred in the tags used on both websites. However, the Last.fm system proved more functional than that of CiteULike, achieving better performance. Finally, the article suggests combining folksonomies with ontologies, which would allow the creation of automated systems for organizing informational content fed by the users themselves.