13 resultados para Semantic metrics
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
Resumo:
Even though the digital processing of documents is increasingly widespread in industry, printed documents are still largely in use. In order to process electronically the contents of printed documents, information must be extracted from digital images of documents. When dealing with complex documents, in which the contents of different regions and fields can be highly heterogeneous with respect to layout, printing quality and the utilization of fonts and typing standards, the reconstruction of the contents of documents from digital images can be a difficult problem. In the present article we present an efficient solution for this problem, in which the semantic contents of fields in a complex document are extracted from a digital image.
Resumo:
Content-based image retrieval is still a challenging issue due to the inherent complexity of images and choice of the most discriminant descriptors. Recent developments in the field have introduced multidimensional projections to burst accuracy in the retrieval process, but many issues such as introduction of pattern recognition tasks and deeper user intervention to assist the process of choosing the most discriminant features still remain unaddressed. In this paper, we present a novel framework to CBIR that combines pattern recognition tasks, class-specific metrics, and multidimensional projection to devise an effective and interactive image retrieval system. User interaction plays an essential role in the computation of the final multidimensional projection from which image retrieval will be attained. Results have shown that the proposed approach outperforms existing methods, turning out to be a very attractive alternative for managing image data sets.
Resumo:
The Neotropical evaniid genus Evaniscus Szepligeti currently includes six species. Two new species are described, Evaniscus lansdownei Mullins, sp. n. from Colombia and Brazil and E. rafaeli Kawada, sp. n. from Brazil. Evaniscus sulcigenis Roman, syn. n., is synonymized under E. rufithorax Enderlein. An identification key to species of Evaniscus is provided. Thirty-five parsimony informative morphological characters are analyzed for six ingroup and four outgroup taxa. A topology resulting in a monophyletic Evaniscus is presented with E. tibialis and E. rafaeli as sister to the remaining Evaniscus species. The Hymenoptera Anatomy Ontology and other relevant biomedical ontologies are employed to create semantic phenotype statements in Entity-Quality (EQ) format for species descriptions. This approach is an early effort to formalize species descriptions and to make descriptive data available to other domains.
Resumo:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the golden standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantic and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
Resumo:
There is a wide range of telecommunications services that transmit voice, video and data through complex transmission networks and in some cases, the service has not an acceptable quality level for the end user. In this sense the study of methods for assessing video quality and voice have a very important role. This paper presents a classification scheme, based on different criteria, of the methods and metrics that are being studied in recent years. This paper presents how the video quality is affected by degradation in the transmission channel in two kinds of services: Digital TV (ISDB-TB) due the fading in the air interface and video streaming service on an IP network due packet loss. For Digital TV tests was set up a scenario where the digital TV transmitter is connected to an RF channel emulator, where are inserted different fading models and at the end, the videos are saved in a mobile device. The tests of streaming video were performed in an isolated scenario of IP network, which are scheduled several network conditions, resulting in different qualities of video reception. The video quality assessment is performed using objective assessment methods: PSNR, SSIM and VQM. The results show how the losses in the transmission channel affects the quality of end-user experience on both services studied.
Resumo:
Background: Early progressive nonfluent aphasia (PNFA) may be difficult to differentiate from semantic dementia (SD) in a nonspecialist setting. There are descriptions of the clinical and neuropsychological profiles of patients with PNFA and SD but few systematic comparisons. Method: We compared the performance of groups with SD (n = 27) and PNFA (n = 16) with comparable ages, education, disease duration, and severity of dementia as measured by the Clinical Dementia Rating Scale on a comprehensive neuropsychological battery. Principal components analysis and intergroup comparisons were used. Results: A 5-factor solution accounted for 78.4% of the total variance with good separation of neuropsychological variables. As expected, both groups were anomic with preserved visuospatial function and mental speed. Patients with SD had lower scores on comprehension-based semantic tests and better performance on verbal working memory and phonological processing tasks. The opposite pattern was found in the PNFA group. Conclusions: Neuropsychological tests that examine verbal and nonverbal semantic associations, verbal working memory, and phonological processing are the most helpful for distinguishing between PNFA and SD.
Resumo:
With the increase in research on the components of Body Image, validated instruments are needed to evaluate its dimensions. The Body Change Inventory (BCI) assesses strategies used to alter body size among adolescents. The scope of this study was to describe the translation and evaluation for semantic equivalence of the BCI in the Portuguese language. The process involved the steps of (1) translation of the questionnaire to the Portuguese language; (2) back-translation to English; (3) evaluation of semantic equivalence; and (4) assessment of comprehension by professional experts and the target population. The six subscales of the instrument were translated into the Portuguese language. Language adaptations were made to render the instrument suitable for the Brazilian reality. The questions were interpreted as easily understandable by both experts and young people. The Body Change Inventory has been translated and adapted into Portuguese. Evaluation of the operational, measurement and functional equivalence are still needed.
Resumo:
Estimators of home-range size require a large number of observations for estimation and sparse data typical of tropical studies often prohibit the use of such estimators. An alternative may be use of distance metrics as indexes of home range. However, tests of correlation between distance metrics and home-range estimators only exist for North American rodents. We evaluated the suitability of 3 distance metrics (mean distance between successive captures [SD], observed range length [ORL], and mean distance between all capture points [AD]) as indexes for home range for 2 Brazilian Atlantic forest rodents, Akodon montensis (montane grass mouse) and Delomys sublineatus (pallid Atlantic forest rat). Further, we investigated the robustness of distance metrics to low numbers of individuals and captures per individual. We observed a strong correlation between distance metrics and the home-range estimator. None of the metrics was influenced by the number of individuals. ORL presented a strong dependence on the number of captures per individual. Accuracy of SD and AD was not dependent on number of captures per individual, but precision of both metrics was low with numbers of captures below 10. We recommend the use of SD and AD instead of ORL and use of caution in interpretation of results based on trapping data with low captures per individual.
Resumo:
There is evidence that the explicit lexical-semantic processing deficits which characterize aphasia may be observed in the absence of implicit semantic impairment. The aim of this article was to critically review the international literature on lexical-semantic processing in aphasia, as tested through the semantic priming paradigm. Specifically, this review focused on aphasia and lexical-semantic processing, the methodological strengths and weaknesses of the semantic paradigms used, and recent evidence from neuroimaging studies on lexical-semantic processing. Furthermore, evidence on dissociations between implicit and explicit lexical-semantic processing reported in the literature will be discussed and interpreted by referring to functional neuroimaging evidence from healthy populations. There is evidence that semantic priming effects can be found both in fluent and in non-fluent aphasias, and that these effects are related to an extensive network which includes the temporal lobe, the pre-frontal cortex, the left frontal gyrus, the left temporal gyrus and the cingulated cortex.
Resumo:
Abstract Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated to the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heteregeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
Resumo:
With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments round the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found in the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related works found in this area. The results achieved in this study are based on the architectural definition and major components and requirements in order to compose the proposed framework. With this, it is possible to take advantage of the large volume of information generated from e-Government initiatives and use it to benefit society.
Resumo:
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.