427 resultados para Indexing
Resumo:
Due to both the widespread and multipurpose use of document images and the current availability of a high number of document images repositories, robust information retrieval mechanisms and systems have been increasingly demanded. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document images content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was experimented with document images repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents as well as among their respective document images. Considering those same document images, we ran further experiments in order to compare the performance of LinkDI when it exploits or not the LSI technique. Experimental results showed that LSI can mitigate the effects of usual OCR misrecognition, which reinforces the feasibility of LinkDI relating OCR output with high degradation.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Author's pre-print
Resumo:
Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
Resumo:
Internet on elektronisen postin perusrakenne ja ollut tärkeä tiedonlähde akateemisille käyttäjille jo pitkään. Siitä on tullut merkittävä tietolähde kaupallisille yrityksille niiden pyrkiessä pitämään yhteyttä asiakkaisiinsa ja seuraamaan kilpailijoitansa. WWW:n kasvu sekä määrällisesti että sen moninaisuus on luonut kasvavan kysynnän kehittyneille tiedonhallintapalveluille. Tällaisia palveluja ovet ryhmittely ja luokittelu, tiedon löytäminen ja suodattaminen sekä lähteiden käytön personointi ja seuranta. Vaikka WWW:stä saatavan tieteellisen ja kaupallisesti arvokkaan tiedon määrä on huomattavasti kasvanut viime vuosina sen etsiminen ja löytyminen on edelleen tavanomaisen Internet hakukoneen varassa. Tietojen hakuun kohdistuvien kasvavien ja muuttuvien tarpeiden tyydyttämisestä on tullut monimutkainen tehtävä Internet hakukoneille. Luokittelu ja indeksointi ovat merkittävä osa luotettavan ja täsmällisen tiedon etsimisessä ja löytämisessä. Tämä diplomityö esittelee luokittelussa ja indeksoinnissa käytettävät yleisimmät menetelmät ja niitä käyttäviä sovelluksia ja projekteja, joissa tiedon hakuun liittyvät ongelmat on pyritty ratkaisemaan.
Resumo:
Search engine optimization & marketing is a set of processes widely used on websites to improve search engine rankings which generate quality web traffic and increase ROI. Content is the most important part of any website. CMS web development is now become very essential for most of organizations and online businesses to develop their online system and websites. Every online business using a CMS wants to get users (customers) to make profit and ROI. This thesis comprises a brief study of existing SEO methods, tools and techniques and how they can be implemented to optimize a content base website. In results, the study provides recommendations about how to use SEO methods; tools and techniques to optimize CMS based websites on major search engines. This study compares popular CMS systems like Drupal, WordPress and Joomla SEO features and how implementing SEO can be improved on these CMS systems. Having knowledge of search engine indexing and search engine working is essential for a successful SEO campaign. This work is a complete guideline for web developers or SEO experts who want to optimize a CMS based website on all major search engines.
Resumo:
This paper presents a DHT-based grid resource indexing and discovery (DGRID) approach. With DGRID, resource-information data is stored on its own administrative domain and each domain, represented by an index server, is virtualized to several nodes (virtual servers) subjected to the number of resource types it has. Then, all nodes are arranged as a structured overlay network or distributed hash table (DHT). Comparing to existing grid resource indexing and discovery schemes, the benefits of DGRID include improving the security of domains, increasing the availability of data, and eliminating stale data.
Resumo:
Chess endgame tables should provide efficiently the value and depth of any required position during play. The indexing of an endgame’s positions is crucial to meeting this objective. This paper updates Heinz’ previous review of approaches to indexing and describes the latest approach by the first and third authors. Heinz’ and Nalimov’s endgame tables (EGTs) encompass the en passant rule and have the most compact index schemes to date. Nalimov’s EGTs, to the Distance-to-Mate (DTM) metric, require only 30.6 × 10^9 elements in total for all the 3-to-5-man endgames and are individually more compact than previous tables. His new index scheme has proved itself while generating the tables and in the 1999 World Computer Chess Championship where many of the top programs used the new suite of EGTs.
Resumo:
Chess endgame tables should provide efficiently the value and depth of any required position during play. The indexing of an endgame’s positions is crucial to meeting this objective. This paper updates Heinz’ previous review of approaches to indexing and describes the latest approach by the first and third authors. Heinz’ and Nalimov’s endgame tables (EGTs) encompass the en passant rule and have the most compact index schemes to date. Nalimov’s EGTs, to the Distance-to-Mate (DTM) metric, require only 30.6 × 109 elements in total for all the 3-to-5-man endgames and are individually more compact than previous tables. His new index scheme has proved itself while generating the tables and in the 1999 World Computer Chess Championship where many of the top programs used the new suite of EGTs.
Resumo:
There are still major challenges in the area of automatic indexing and retrieval of digital data. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. Research has been ongoing for a few years in the field of ontological engineering with the aim of using ontologies to add knowledge to information. In this paper we describe the architecture of a system designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval.