13 resultados para XML, Information, Retrieval, Query, Language

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article discusses issues related to the organization and reception of information in the context of services and public information systems driven by technology. It stems from the assumption that in a ""technologized"" society, the distance between users and information is almost always of cognitive and socio-cultural nature, a product of our effort to design communication. In this context, we favor the approach of the information sign, seeking to answer how a documentary message turns into information, i.e. a structure recognized as socially useful. Observing the structural, cognitive and communicative aspects of the documentary message, based on Documentary Linguistics, Terminology, as well as on Textual Linguistics, the policy of knowledge management and innovation of the Government of the State of Sao Paulo is analyzed, which authorizes the use of Web 2.0, also questioning to what extent this initiative represents innovation in the environment of libraries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS need to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions to express similarity predicates to the existing SQL syntax and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBM. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright (c) 2008 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Successful classification, information retrieval and image analysis tools are intimately related with the quality of the features employed in the process. Pixel intensities, color, texture and shape are, generally, the basis from which most of the features are Computed and used in such fields. This papers presents a novel shape-based feature extraction approach where an image is decomposed into multiple contours, and further characterized by Fourier descriptors. Unlike traditional approaches we make use of topological knowledge to generate well-defined closed contours, which are efficient signatures for image retrieval. The method has been evaluated in the CBIR context and image analysis. The results have shown that the multi-contour decomposition, as opposed to a single shape information, introduced a significant improvement in the discrimination power. (c) 2008 Elsevier B.V. All rights reserved,

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings and k-cores. An additional summarizer was created which selects the highest ranked sentences in the 14 systems, as in a voting system. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts appears therefore as suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features. (c) 2008 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to both the widespread and multipurpose use of document images and the current availability of a high number of document images repositories, robust information retrieval mechanisms and systems have been increasingly demanded. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document images content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was experimented with document images repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents as well as among their respective document images. Considering those same document images, we ran further experiments in order to compare the performance of LinkDI when it exploits or not the LSI technique. Experimental results showed that LSI can mitigate the effects of usual OCR misrecognition, which reinforces the feasibility of LinkDI relating OCR output with high degradation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The article presents and discusses issues such as informativeness, offering of directions and information retrieval, and also lists definitions of information and mediation. Based on the topics presented, the possible problems faced by information professionals are discussed while cultural mediators in the context of art museums.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The TCABR data analysis and acquisition system has been upgraded to support a joint research programme using remote participation technologies. The architecture of the new system uses Java language as programming environment. Since application parameters and hardware in a joint experiment are complex with a large variability of components, requirements and specification solutions need to be flexible and modular, independent from operating system and computer architecture. To describe and organize the information on all the components and the connections among them, systems are developed using the extensible Markup Language (XML) technology. The communication between clients and servers uses remote procedure call (RPC) based on the XML (RPC-XML technology). The integration among Java language, XML and RPC-XML technologies allows to develop easily a standard data and communication access layer between users and laboratories using common software libraries and Web application. The libraries allow data retrieval using the same methods for all user laboratories in the joint collaboration, and the Web application allows a simple graphical user interface (GUI) access. The TCABR tokamak team in collaboration with the IPFN (Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Universidade Tecnica de Lisboa) is implementing this remote participation technologies. The first version was tested at the Joint Experiment on TCABR (TCABRJE), a Host Laboratory Experiment, organized in cooperation with the IAEA (International Atomic Energy Agency) in the framework of the IAEA Coordinated Research Project (CRP) on ""Joint Research Using Small Tokamaks"". (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance, the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired on neuroscience, it is proposed a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor), designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to ""predict"" thematic (semantic) roles assigned to words in a sentence context, employing biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies that were developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside of the field of healthcare, focusing on the lay public. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH for the problem that was put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saude. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web. (c) 2010 Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Assuming as a starting point the acknowledge that the principles and methods used to build and manage the documentary systems are disperse and lack systematization, this study hypothesizes that the notion of structure, when assuming mutual relationships among its elements, promotes more organical systems and assures better quality and consistency in the retrieval of information concerning users` matters. Accordingly, it aims to explore the fundamentals about the records of information and documentary systems, starting from the notion of structure. In order to achieve that, it presents basic concepts and relative matters to documentary systems and information records. Next to this, it lists the theoretical subsides over the notion of structure, studied by Benveniste, Ferrater Mora, Levi-Strauss, Lopes, Penalver Simo, Saussure, apart from Ducrot, Favero and Koch. Appropriations that have already been done by Paul Otlet, Garcia Gutierrez and Moreiro Gonzalez. In Documentation come as a further topic. It concludes that the adopted notion of structure to make explicit a hypothesis of real systematization achieves more organical systems, as well as it grants pedagogical reference to the documentary tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose is to present a scientific research that led to the modeling of an information system which aimed at the maintenance of traceability data in the Brazilian wine industry, according to the principles of a service-oriented architecture (SOA). Since 2005, traceability data maintenance is an obligation for all producers that intend to export to any European Union country. Also, final customers, including the Brazilian ones, have been asking for information about food products. A solution that collectively contemplated the industry was sought in order to permit that producer consortiums of associations could share the costs and benefits of such a solution. Following an extensive bibliographic review, a series of interviews conducted with Brazilian researchers and wine producers in Bento Goncalves - RS, Brazil, elucidated many aspects associated with the wine production process. Information technology issues related to the theme were also researched. The software was modeled with the Unified Modeling Language (UML) and uses web services for data exchange. A model for the wine production process was also proposed. A functional prototype showed that the adopted model is able to fulfill the demands of wine producers. The good results obtained lead us to consider the use of this model in other domains.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a simple high-level programming language, endowed with resources that help encoding self-modifying programs. With this purpose, a conventional imperative language syntax (not explicitly stated in this paper) is incremented with special commands and statements forming an adaptive layer specially designed with focus on the dynamical changes to be applied to the code at run-time. The resulting language allows programmers to easily specify dynamic changes to their own program`s code. Such a language succeeds to allow programmers to effortless describe the dynamic logic of their adaptive applications. In this paper, we describe the most important aspects of the design and implementation of such a language. A small example is finally presented for illustration purposes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.