924 results for Information Retrieval, Document Databases, Digital Libraries
Abstract:
Graduate Program in Digital Television: Information and Knowledge (Pós-graduação em Televisão Digital: Informação e Conhecimento) - FAAC
Abstract:
This article deals with bibliographic research in the context of planning, carrying out, and writing up a study, recommending the Internet as a research source, with the aim of guiding researchers in using the various search tools and digital libraries that help retrieve relevant information. The following stages are covered: choosing the subject and delimiting the topic, finding the terminology of the field, exhaustively retrieving the relevant works, locating the works, obtaining the documents, reading, selecting, and taking notes on the documents and, finally, writing the paper.
Abstract:
Open to the public since June 2009, the Biblioteca Brasiliana Digital of the Universidade de São Paulo aims to make available for research the largest Brasiliana collection held by a university. Its intention is to provide virtual access to part of the University's holdings, offering itself as a useful and functional instrument for research and for the study of Brazilian themes and culture, as well as offering a technological management model that can be disseminated to other collections, archives, and institutions. This work presents the results of implementing a metadata schema based on the Dublin Core format for describing rare and special works on the web. Specifically, it presents the procedures and processes for describing the content of the various document types (books, periodicals, engravings, etc.) and digital formats (PDF, JPEG, among others). Keywords: Digital libraries; Metadata; Dublin Core.
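As a rough illustration of what a Dublin Core description of a digitized work might look like, the sketch below builds a small record with Python's standard library. The wrapper element `record`, the choice of elements, and all field values are hypothetical; the abstract does not say which elements the library's schema actually uses. Only the Dublin Core element namespace URI is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a simple XML record from (Dublin Core element, value) pairs."""
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Hypothetical description of a digitized rare book (all values are made up).
example_fields = [
    ("title", "Example rare book"),
    ("creator", "Unknown author"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("language", "pt"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Different document types (books, periodicals, engravings) and digital formats would mainly change the values of `dc:type` and `dc:format` in a record of this kind.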
Abstract:
This paper describes the integration of information between the Digital Library of Historical Cartography and the Bibliographical Database (DEDALUS), both of the University of São Paulo (USP), to guarantee open, public access over the Internet to the maps in the collection and make them available to users everywhere. The digital library was designed by the Historical Cartography Studies Laboratory team (LECH/USP) and provides high-resolution map images on the Web, together with information on these maps such as technical-scientific data (projection, scale, coordinates), printing techniques, and the material support that made their circulation and cultural consumption possible. The Digital Library of Historical Cartography is accessible not only to historical cartography researchers but also to students and the general public. Beyond being a source of information about maps, it seeks to be interactive, exchanging information and seeking dialogue with different branches of knowledge.
Abstract:
The need for convergence between semi-structured data management and Information Retrieval techniques is evident to the scientific community. To meet this growing demand, the W3C has recently proposed XQuery Full Text, an IR-oriented extension of XQuery. However, query optimization requires the study of important properties such as query equivalence and containment; to this end, a formal representation of documents and queries is needed. The goal of this thesis is to establish such a formal background. We define a data model for XML documents and propose an algebra able to represent most XQuery Full-Text expressions. We show how an XQuery Full-Text expression can be translated into an algebraic expression and how an algebraic expression can be optimized.
Abstract:
Apart from the article that forms the main content, most HTML documents on the WWW contain additional content such as navigation menus, design elements, or commercial banners. In several applications it is necessary to distinguish between main and additional content automatically. Content extraction and template detection are the two approaches to solving this task. This thesis gives an extensive overview of existing algorithms from both areas. It contributes an objective way to measure and evaluate the performance of content extraction algorithms under different aspects. These evaluation measures allow the first objective comparison of existing extraction solutions. The newly introduced content code blurring algorithm overcomes several drawbacks of previous approaches and proves to be the best content extraction algorithm at the moment. An analysis of methods to cluster web documents according to their underlying templates is the third major contribution of this thesis. In combination with a localised crawling process, this clustering analysis can be used to automatically create sets of training documents for template detection algorithms. As the whole process can be automated, it makes it possible to perform template detection on a single document, thereby combining the advantages of single- and multi-document algorithms.
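The abstract does not describe how content code blurring works, so the following is only a simplified sketch of one plausible reading of the name: mark each character of the HTML source as text or tag code, smooth that vector, and keep text runs that lie in text-dense regions. The moving-average blur, the per-run decision, and the `threshold`, `radius`, and `rounds` parameters are assumptions made for illustration, not the thesis's algorithm.

```python
def content_code_vector(html):
    """Return one value per character: 1.0 for text, 0.0 for tag code."""
    codes = []
    in_tag = False
    for ch in html:
        if ch == "<":
            in_tag = True
            codes.append(0.0)
        elif ch == ">":
            in_tag = False
            codes.append(0.0)
        else:
            codes.append(0.0 if in_tag else 1.0)
    return codes


def blur(values, radius=5, rounds=3):
    """Smooth the vector with a repeated moving average (an assumed blur)."""
    for _ in range(rounds):
        smoothed = []
        for i in range(len(values)):
            lo = max(0, i - radius)
            hi = min(len(values), i + radius + 1)
            smoothed.append(sum(values[lo:hi]) / (hi - lo))
        values = smoothed
    return values


def extract_main_content(html, threshold=0.5, radius=5, rounds=3):
    """Keep each text run whose mean blurred score reaches the threshold."""
    codes = content_code_vector(html)
    scores = blur(codes, radius, rounds)
    kept_runs = []
    run_chars, run_scores = [], []
    # A trailing "<" acts as a sentinel so the last text run is flushed.
    for ch, code, score in zip(html + "<", codes + [0.0], scores + [0.0]):
        if code == 1.0:
            run_chars.append(ch)
            run_scores.append(score)
        elif run_chars:
            if sum(run_scores) / len(run_scores) >= threshold:
                kept_runs.append("".join(run_chars).strip())
            run_chars, run_scores = [], []
    return " ".join(run for run in kept_runs if run)


# Hypothetical page: a short navigation menu, one article paragraph, one banner.
article = ("Content extraction separates the main article of a web page from "
           "navigation menus, design elements and commercial banners that surround it.")
page = ('<ul><li><a href="/">Home</a></li><li><a href="/a">News</a></li></ul>'
        "<p>" + article + "</p>"
        '<div><a href="/x">Ads</a></div>')

result = extract_main_content(page)
print(result)
```

Short link texts surrounded by markup end up with low blurred scores and are dropped, while the long paragraph stays close to 1.0 and is kept.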
Abstract:
A system for digital holographic imaging of airborne objects, suitable for ground-based field measurements, was developed and built. Depending on the depth position, it is suitable for directly determining the size of airborne objects larger than about 20 µm, and their shape for sizes from about 100 µm up to the millimeter range. The development additionally included an algorithm for automated improvement of hologram quality and for semi-automatic distance determination of large objects. A way to intrinsically increase the efficiency of determining the depth position by computing angle-averaged profiles was presented. Furthermore, a method was developed that uses an iterative approach to recover the phase information for isolated objects and thus remove the twin image. In addition, simulations were used to investigate and discuss the effects of various limitations of digital holography, such as the finite pixel size. Suitably displaying the three-dimensional position information is a particular problem in digital holography, because the three-dimensional light field is not physically reconstructed. A method was developed and implemented that allows a quasi-three-dimensional, magnified view by constructing a stereoscopic representation of the numerically reconstructed measurement volume. Selected digital holograms recorded during field campaigns at the Jungfraujoch were reconstructed. In part, this revealed a very high proportion of irregular crystal shapes, in particular as a result of heavy riming. Objects down to the range ≤20 µm were also observed during periods with formally ice-subsaturated conditions. Furthermore, by applying the theory of the "phase edge effect" developed here, an object only about 40 µm in size could be identified as an ice platelet. The greatest disadvantage of digital holography compared with conventional photographic imaging methods is the need for laborious numerical reconstruction. A high computational effort is required to achieve a result comparable to a photograph. On the other hand, digital holography has unique features. Access to the three-dimensional position information can serve the local investigation of relative object distances. However, it turned out that the characteristics of digital holography currently make it difficult to observe sufficiently large numbers of objects on the basis of individual holograms. It was demonstrated that complete object boundaries could be reconstructed even when an object was partly or entirely outside the geometric measurement volume. Furthermore, sub-pixel reconstruction, first demonstrated in simulations, was applied to real holograms. It could be shown that, in some cases, quasi-point-like objects could be localized with sub-pixel accuracy, and that additional information could also be gained for extended objects. Finally, interference patterns were observed on reconstructed ice crystals and in part tracked over time. At present, both crystal-internal reflection and the existence of a (quasi-)liquid layer appear possible as explanations, although in some cases arguments could be made in favor of the latter. As a result of this work, a system is now available that comprises a new measuring instrument and extensive algorithms. S. M. F. Raupach, H.-J. Vössing, J. Curtius and S. Borrmann: Digital crossed-beam holography for in-situ imaging of atmospheric particles, J. Opt. A: Pure Appl. Opt. 8, 796-806 (2006). S. M. F. Raupach: A cascaded adaptive mask algorithm for twin image removal and its application to digital holograms of ice crystals, Appl. Opt. 48, 287-301 (2009). S. M. F. Raupach: Stereoscopic 3D visualization of particle fields reconstructed from digital inline holograms (accepted for publication, Optik - Int. J. Light El. Optics, 2009).
Abstract:
Our reality is characterized by constant progress, and to keep up with it people need to stay up to date on events. In a world with an abundance of news, searching for the right items can be difficult, and the obstacles will only grow over time as the volume of data increases. Information Retrieval (IR), an interdisciplinary branch of computer science that deals with the management and retrieval of information, offers considerable help. An IR system is designed to search a reference dataset for content considered relevant to the need expressed by a query. Most existing IR systems rely solely on textual similarity to identify relevant information, treating a document as relevant when it includes one or more of the keywords in the query. The idea studied here is that this is not always sufficient, especially when large collections such as the web must be managed. Existing solutions may return low-quality results that do not allow users to navigate them effectively. To overcome these limitations, the intuition was to define a new concept of relevance in order to rank the results differently. This led to Temporal PageRank, a new proposal for Web Information Retrieval that relies on a combination of several factors to improve the quality of search on the web. Temporal PageRank combines the advantages of a ranking algorithm, which favors information reported by web pages considered important by the context in which they reside, with the potential of techniques from Temporal Information Retrieval, which exploit the temporal aspects of data and describe their chronological contexts. In this thesis, the new proposal is discussed, its results are compared with those achieved by the best-known solutions, and its strengths and weaknesses are analyzed.
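The abstract does not give the formula behind Temporal PageRank. To make the general idea of combining link-based importance with temporal information concrete, the sketch below discounts a precomputed importance score by the age of each page. The exponential half-life decay, the `half_life_days` parameter, and all page data are assumptions chosen for illustration, not the method proposed in the thesis.

```python
def temporal_score(importance, age_days, half_life_days=30.0):
    """Discount a link-based importance score by the age of the page.

    The score halves every `half_life_days` days (an assumed decay model).
    """
    decay = 0.5 ** (age_days / half_life_days)
    return importance * decay


def rank(pages, half_life_days=30.0):
    """Sort pages by their time-discounted importance, best first."""
    return sorted(
        pages,
        key=lambda p: temporal_score(p["importance"], p["age_days"], half_life_days),
        reverse=True,
    )


# Hypothetical pages with made-up importance scores and ages.
pages = [
    {"url": "old-but-popular", "importance": 0.60, "age_days": 90},
    {"url": "fresh-report", "importance": 0.30, "age_days": 0},
    {"url": "recent-update", "importance": 0.40, "age_days": 30},
]

ranked_urls = [p["url"] for p in rank(pages)]
print(ranked_urls)
```

With these made-up numbers the most heavily linked page drops to last place because it is three half-lives old, which is the kind of reordering a temporally aware ranking is meant to produce.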
Abstract:
Web-scale knowledge retrieval can be enabled by distributed information retrieval, clustering Web clients into a large-scale computing infrastructure for knowledge discovery from Web documents. Based on this infrastructure, we propose to apply semiotic (i.e., sub-syntactical) and inductive (i.e., probabilistic) methods for inferring concept associations in human knowledge. These associations can be combined to form a fuzzy (i.e., gradual) semantic net representing a map of the knowledge in the Web. Thus, we propose to provide interactive visualizations of these cognitive concept maps to end users, who can browse and search the Web in a human-oriented, visual, and associative interface.
Abstract:
OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
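To illustrate the two citation-based strategies the study found most effective, the sketch below computes a simple citation count and a textbook power-iteration PageRank on a tiny citation graph. The graph, the damping factor of 0.85, and the iteration count are assumptions; the abstract does not state the parameters or implementation the authors used.

```python
def citation_counts(citations):
    """Count how often each article is cited.

    `citations` maps every article to the list of articles it cites;
    every cited article must also appear as a key.
    """
    counts = {article: 0 for article in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] += 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over the citation graph."""
    articles = list(citations)
    n = len(articles)
    rank = {a: 1.0 / n for a in articles}
    for _ in range(iterations):
        new_rank = {a: (1.0 - damping) / n for a in articles}
        for citing, cited_list in citations.items():
            if cited_list:
                # Split the citing article's rank among the articles it cites.
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
            else:
                # An article that cites nothing spreads its rank evenly.
                share = damping * rank[citing] / n
                for a in articles:
                    new_rank[a] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: B and C cite A; D cites A and B.
citations = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "B"]}

counts = citation_counts(citations)
ranks = pagerank(citations)
print(counts)
print(sorted(ranks, key=ranks.get, reverse=True))
```

On this toy graph both measures put article A first. They can diverge on real data because PageRank weights a citation by the rank of the citing article, whereas citation count treats all citations equally.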
Abstract:
Eukaryotic mRNAs with premature translation-termination codons (PTCs) are recognized and eliminated by nonsense-mediated mRNA decay (NMD). NMD substrates can be degraded by different routes that all require phosphorylated UPF1 (P-UPF1) as a starting point. The endonuclease SMG6, which cleaves mRNA near the PTC, is one of the three known NMD factors thought to be recruited to nonsense mRNAs via an interaction with P-UPF1, leading to eventual mRNA degradation. By artificial tethering of SMG6 and mutants thereof to a reporter mRNA combined with knockdowns of various NMD factors, we demonstrate that besides its endonucleolytic activity, SMG6 also requires UPF1 and SMG1 to reduce reporter mRNA levels. Using in vivo and in vitro approaches, we further document that SMG6 and the unique stalk region of the UPF1 helicase domain, along with a contribution from the SQ domain, form a novel interaction and we also show that this region of the UPF1 helicase domain is critical for SMG6 function and NMD. Our results show that this interaction is required for NMD and for the capability of tethered SMG6 to degrade its bound RNA, suggesting that it contributes to the intricate regulation of UPF1 and SMG6 enzymatic activities.
Abstract:
The metabolic network of a cell represents the catabolic and anabolic reactions that interconvert small molecules (metabolites) through the activity of enzymes, transporters and non-catalyzed chemical reactions. Our understanding of individual metabolic networks is increasing as we learn more about the enzymes that are active in particular cells under particular conditions and as technologies advance to allow detailed measurements of the cellular metabolome. Metabolic network databases are of increasing importance in allowing us to contextualise data sets emerging from transcriptomic, proteomic and metabolomic experiments. Here we present a dynamic database, TrypanoCyc (http://www.metexplore.fr/trypanocyc/), which describes the generic and condition-specific metabolic network of Trypanosoma brucei, a parasitic protozoan responsible for human and animal African trypanosomiasis. In addition to enabling navigation through the BioCyc-based TrypanoCyc interface, we have also implemented a network-based representation of the information through MetExplore, yielding a novel environment in which to visualise the metabolism of this important parasite.
Abstract:
Early Employee Assistance Programs (EAPs) had their origin in humanitarian motives, and there was little concern for their cost/benefit ratios. However, as some programs began accumulating data and analyzing it over time, even with single variables such as absenteeism, it became apparent that the humanitarian reasons for a program could be reinforced by cost savings, particularly when the existence of the program was subject to justification. Today there is general agreement that cost/benefit analyses of EAPs are desirable, but the specific models for such analyses, particularly those making use of sophisticated but simple computer-based data management systems, are few. The purpose of this research and development project was to develop a method, a design, and a prototype for gathering, managing, and presenting information about EAPs. This scheme provides information retrieval and analyses relevant to such aspects of EAP operations as: (1) EAP personnel activities, (2) supervisory training effectiveness, (3) client population demographics, (4) assessment and referral effectiveness, (5) treatment network efficacy, and (6) economic worth of the EAP. This scheme has been implemented and has been operational at The University of Texas Employee Assistance Programs for more than three years. Application of the scheme in the various programs has defined certain variables which remained necessary in all programs. Depending on the degree of aggressiveness for data acquisition maintained by program personnel, other program-specific variables are also defined.
Abstract:
This work addresses the construction of open source institutional repositories with the Greenstone software. It follows a theoretical path and a modeling path, the latter developing a practical application. The first path, which constitutes the theoretical framework, describes the open access and open source philosophies for creating institutional repositories. It also covers, in general terms, topics related to the OAI protocol, the legal framework regarding intellectual property, licenses, and an introduction to metadata. The same path addresses theoretical aspects of institutional repositories: definitions, benefits, types, components involved, open source tools for creating repositories, a description of those tools and, finally, an extended description of the Greenstone software, which was chosen for the model development of the institutional repository delivered on a digital demonstration medium. The second path, corresponding to the model development, includes on the one hand the model of the repository itself built with the Greenstone software, detailing one by one the components that make it up. This is the theoretical and practical input for the step-by-step design of the institutional repository. On the other hand, it includes the result of the modeling, that is, the repository created, which is exported in a web environment to a digital medium so that it can be viewed. The step-by-step design of the repository constitutes the core contribution of this thesis.
Abstract:
This descriptive, exploratory study sets out to analyze the information architecture (IA) of the library websites of the Universidad Nacional de La Plata (UNLP), Argentina. Seventeen libraries were analyzed, and a grid was applied to collect 10 relevant aspects. The results were: 1. Location of the library website: 9 sites included on the faculty's main page. 2. Content labeling: simple terminology, without jargon; there is no homogeneity among the libraries. 3. Search capability: 62 percent positive, 38 percent negative. 4. Search system: simple 43 percent, complex 10 percent, with help 10 percent, none 38 percent. 5. Navigation systems: global 5 percent, hierarchical 79 percent, local 5 percent, none 11 percent. 6. Navigation tools: bars 16 percent, frames 30 percent, indexes 2 percent, site maps 7 percent, horizontal menus 9 percent, vertical menus 35 percent. 7. RSS content syndication: 3 sites. 8. Other services: chat 7 percent, document download 16 percent, form submission 14 percent, how-to guides 21 percent, links to other pages 23 percent, tutorials 5 percent, other 14 percent. 9. Web accessibility: 1 site. 10. Other observations: none. It is concluded that the development of the sites is uneven, and it is recommended that IA guidelines be considered as part of the cooperation within the UNLP library network.