837 resultados para Search Engine Indexing
Resumo:
This thesis describes a project of terminology and localization focused on local and traditional cuisine from the province of Modena: the final products of this project are a specialized termbase and the localized version of the website of Trattoria Ermes, a small Modenese restaurant offering traditional dishes. It is a known fact the Internet has drastically altered the way companies and businesses communicate with their audience. Considering that food tourism is an invaluable sector of Italy’s economy and a great aid to safeguarding its culinary traditions, business owners can benefit from localizing their websites, allowing them to reach wider international audiences. The project is divided into two main sections: the first focuses on the terminological systematization of specialized terminology collected from Sandro Bellei’s cooking book and two web-derived monolingual corpora, while the second section offers insight into the analysis of the localization and optimization process of Trattoria Ermes website. In particular, the thesis approaches localization from the point of view of web marketing, with a theoretical and practical section dedicated to the Search Engine Optimization (SEO) processes employed by web marketing teams to ensure the visibility and popularity of the website
Resumo:
La ricerca di documenti rilevanti è un task fondamentale, può avvenire tramite ambienti chiusi, come le biblioteche digitali o tramite ambienti aperti, come il World Wide Web. Quello che analizzeremo in questo progetto di tesi riguarderà le interfacce per mostrare i risultati di ricerca su una collezione di documenti. L'obiettivo, tuttavia, non è l'analisi dei motori di ricerca, ma analizzare i diversi meccanismi che permettono di visualizzare i risultati. Vedremo, inoltre, le diverse visualizzazioni rilevanti nella ricerca di informazioni sul web, in particolare parleremo di visualizzazioni nello spazio, messe a confronto con la classica visualizzazione testuale. Analizzeremo anche la classificazione in una collezione di documenti, oltre che la personalizzazione, ovvero la configurazione della visualizzazione a vantaggio dell'utente. Una volta trovati i documenti rilevanti, analizzeremo i frammenti di testo come, gli snippet, i riassunti descrittivi e gli abstract, mettendo in luce il modo in cui essi aiutato l'utente a migliorare l'accesso a determinati tipi di risultati. Infine, andremo ad analizzare le visualizzazioni di frammenti rilevanti all'interno di un testo. In particolare presenteremo le tecniche di navigazione e la ricerca di determinate parole all'interno di documenti, vale a dire le panoramiche di documenti e del metodo preferito dall'utente per cercare una parola.
Resumo:
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
Resumo:
With the exponential growth of the usage of web-based map services, the web GIS application has become more and more popular. Spatial data index, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-sourced an Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for the end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geo spatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU and I/O intensive tiers, which make it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly are techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands: 18.91% more accurate; and efficiently allocate resources to meet the QoS target: improves the QoS by 26.19% and saves resource usages by 20.83% compared to traditional peak load-based resource allocation.
Resumo:
CODEX SEARCH es un motor de recuperación de información especializado en derecho de extranjería que está basado en herramientas y conocimiento lingüísticos. Un motor o Sistema de Recuperación de Información (SRI) es un software capaz de localizar información en grandes colecciones documentales (entorno no trivial) en formato electrónico. Mediante un estudio previo se ha detectado que la extranjería es un ámbito discursivo en el que resulta difícil expresar la necesidad de información en términos de una consulta formal, objeto de los sistemas de recuperación actuales. Por lo tanto, para desarrollar un SRI eficiente en el dominio indicado no basta con emplear un modelo tradicional de RI, es decir, comparar los términos de la pregunta con los de la respuesta, básicamente porque no expresan implicaciones y porque no tiene que haber necesariamente una relación 1 a 1. En este sentido, la solución lingüística propuesta se basa en incorporar el conocimiento del especialista mediante la integración en el sistema de una librería de casos. Los casos son ejemplos de procedimientos aplicados por expertos a la solución de problemas que han ocurrido en la realidad y que han terminado en éxito o fracaso. Los resultados obtenidos en esta primera fase son muy alentadores pero es necesario continuar la investigación en este campo para mejorar el rendimiento del prototipo al que se puede acceder desde &http://161.116.36.139/~codex/&.
Resumo:
Iterated Local Search has many of the desirable features of a metaheuristic: it is simple, easy to implement, robust, and highly effective. The essential idea of Iterated Local Search lies in focusing the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for a given optimization engine. The success of Iterated Local Search lies in the biased sampling of this set of local optima. How effective this approach turns out to be depends mainly on the choice of the local search, the perturbations, and the acceptance criterion. So far, in spite of its conceptual simplicity, it has lead to a number of state-of-the-art results without the use of too much problem-specific knowledge. But with further work so that the different modules are well adapted to the problem at hand, Iterated Local Search can often become a competitive or even state of the artalgorithm. The purpose of this review is both to give a detailed description of this metaheuristic and to show where it stands in terms of performance.
Resumo:
Internet on elektronisen postin perusrakenne ja ollut tärkeä tiedonlähde akateemisille käyttäjille jo pitkään. Siitä on tullut merkittävä tietolähde kaupallisille yrityksille niiden pyrkiessä pitämään yhteyttä asiakkaisiinsa ja seuraamaan kilpailijoitansa. WWW:n kasvu sekä määrällisesti että sen moninaisuus on luonut kasvavan kysynnän kehittyneille tiedonhallintapalveluille. Tällaisia palveluja ovet ryhmittely ja luokittelu, tiedon löytäminen ja suodattaminen sekä lähteiden käytön personointi ja seuranta. Vaikka WWW:stä saatavan tieteellisen ja kaupallisesti arvokkaan tiedon määrä on huomattavasti kasvanut viime vuosina sen etsiminen ja löytyminen on edelleen tavanomaisen Internet hakukoneen varassa. Tietojen hakuun kohdistuvien kasvavien ja muuttuvien tarpeiden tyydyttämisestä on tullut monimutkainen tehtävä Internet hakukoneille. Luokittelu ja indeksointi ovat merkittävä osa luotettavan ja täsmällisen tiedon etsimisessä ja löytämisessä. Tämä diplomityö esittelee luokittelussa ja indeksoinnissa käytettävät yleisimmät menetelmät ja niitä käyttäviä sovelluksia ja projekteja, joissa tiedon hakuun liittyvät ongelmat on pyritty ratkaisemaan.
Resumo:
Open educational resources (OER) promise increased access, participation, quality, and relevance, in addition to cost reduction. These seemingly fantastic promises are based on the supposition that educators and learners will discover existing resources, improve them, and share the results, resulting in a virtuous cycle of improvement and re-use. By anecdotal metrics, existing web scale search is not working for OER. This situation impairs the cycle underlying the promise of OER, endangering long term growth and sustainability. While the scope of the problem is vast, targeted improvements in areas of curation, indexing, and data exchange can improve the situation, and create opportunities for further scale. I explore the way the system is currently inadequate, discuss areas for targeted improvement, and describe a prototype system built to test these ideas. I conclude with suggestions for further exploration and development.
Resumo:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, which is a knowledge-assisted semantic-driven context-aware visual information retrieval system applied in the film post production domain. We mainly focus on the automatic labelling and topic map related aspects of the framework. The use of the context- related collateral knowledge, represented by a novel probabilistic based visual keyword co-occurrence matrix, had been proven effective via the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine which can automatically construct ontological networks using Topic Maps technology, which dramatically enhances the indexing and retrieval performance of the system towards an even higher semantic level.
Resumo:
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Studies on ethics in information organization have deeply contributed to the recognition of the social dimension of Information Science. The subject approach to information is linked to an ethical dimension because one of its major concerns is related to its reliability and usefulness in a specific discursive community or knowledge domain. In this direction, we propose, through an exploratory research design with qualitative and inductive characteristics, to identify the specific terminology that Brazilian indexing languages allow for terms relating to male homosexuality. We also analyzed the terms assigned to papers published in the Journal of Homosexuality, Sexualities and Journal of Gay & Lesbian Mental Health between the years 2005 to 2009. From this analysis of terms and the Brazilian indexing languages, we see (1) the Brazilian context, (2) imprecision in the terminology, (3) indications of prejudices disseminated by political correctness, (4) biased representation of the subject matter, (5) and the presence of figures of speech.
Resumo:
This is the second part of a study investigating a model-based transient calibration process for diesel engines. The first part addressed the data requirements and data processing required for empirical transient emission and torque models. The current work focuses on modelling and optimization. The unexpected result of this investigation is that when trained on transient data, simple regression models perform better than more powerful methods such as neural networks or localized regression. This result has been attributed to extrapolation over data that have estimated rather than measured transient air-handling parameters. The challenges of detecting and preventing extrapolation using statistical methods that work well with steady-state data have been explained. The concept of constraining the distribution of statistical leverage relative to the distribution of the starting solution to prevent extrapolation during the optimization process has been proposed and demonstrated. Separate from the issue of extrapolation is preventing the search from being quasi-static. Second-order linear dynamic constraint models have been proposed to prevent the search from returning solutions that are feasible if each point were run at steady state, but which are unrealistic in a transient sense. Dynamic constraint models translate commanded parameters to actually achieved parameters that then feed into the transient emission and torque models. Combined model inaccuracies have been used to adjust the optimized solutions. To frame the optimization problem within reasonable dimensionality, the coefficients of commanded surfaces that approximate engine tables are adjusted during search iterations, each of which involves simulating the entire transient cycle. The resulting strategy, different from the corresponding manual calibration strategy and resulting in lower emissions and efficiency, is intended to improve rather than replace the manual calibration process.
Resumo:
With rapid advances in video processing technologies and ever fast increments in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation to retrieve videos of user interests. The video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, high complexity of video content has posed the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video database. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplying the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one dimensional values of ViTris' positions. Such a transformation enables B+-tree to achieve its optimal performance by quickly filtering a large portion of non-similar ViTris. Our extensive experiments on real large video datasets prove the effectiveness of our proposals that outperform existing methods significantly.
Resumo:
Background Qualitative research makes an important contribution to our understanding of health and healthcare. However, qualitative evidence can be difficult to search for and identify, and the effectiveness of different types of search strategies is unknown. Methods Three search strategies for qualitative research in the example area of support for breast-feeding were evaluated using six electronic bibliographic databases. The strategies were based on using thesaurus terms, free-text terms and broad-based terms. These strategies were combined with recognised search terms for support for breast-feeding previously used in a Cochrane review. For each strategy, we evaluated the recall (potentially relevant records found) and precision (actually relevant records found). Results A total yield of 7420 potentially relevant records was retrieved by the three strategies combined. Of these, 262 were judged relevant. Using one strategy alone would miss relevant records. The broad-based strategy had the highest recall and the thesaurus strategy the highest precision. Precision was generally poor: 96% of records initially identified as potentially relevant were deemed irrelevant. Searching for qualitative research involves trade-offs between recall and precision. Conclusions These findings confirm that strategies that attempt to maximise the number of potentially relevant records found are likely to result in a large number of false positives. The findings also suggest that a range of search terms is required to optimise searching for qualitative evidence. This underlines the problems of current methods for indexing qualitative research in bibliographic databases and indicates where improvements need to be made.
Resumo:
In this paper the technique of shorter route determination of fire engine to the fire place on time minimization criterion with the use of evolutionary modeling is offered. The algorithm of its realization on the base of complete and optimized space of search of possible decisions is explored. The aspects of goal function forming and program realization of method having a special purpose are considered. Experimental verification is executed and the results of comparative analysis with the expert conclusions are considered.