993 results for Query results


Relevance:

100.00%

Publisher:

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevance:

100.00%

Publisher:

Abstract:

The development of new technologies that use peer-to-peer networks grows every day, driven by the need to share information, resources, and database services around the world. Among them are peer-to-peer databases, which take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information that is semantically related but syntactically heterogeneous. However, given the structural characteristics of these networks, ensuring an efficient search for information without compromising the autonomy of each node and the flexibility of the network is a challenge. On the other hand, some studies propose the use of ontology semantics to assign a standardized categorization to information. The main original contribution of this work is an approach to this problem based on query optimization supported by the Ant Colony algorithm and classification through ontologies. The results show that this strategy provides semantic support for searches in peer-to-peer databases, expanding the results without compromising network performance. © 2011 IEEE.
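
As an illustration of the general idea (a minimal sketch, not the authors' implementation), ant-colony-inspired routing can be modelled by keeping a pheromone score per (neighbour, ontology concept) pair at each peer: queries are forwarded to neighbours with stronger trails for the query's concept, and trails are reinforced when a neighbour returns useful results. All names and parameters below are hypothetical.

```python
import random
from collections import defaultdict

class Peer:
    """A peer that routes queries with ant-colony-style pheromone scores.

    A pheromone value is kept per (neighbour, concept) pair; neighbours with
    stronger trails for the query's ontology concept are more likely chosen.
    """

    def __init__(self, neighbours, evaporation=0.1, deposit=1.0):
        self.neighbours = list(neighbours)
        self.evaporation = evaporation
        self.deposit = deposit
        self.pheromone = defaultdict(lambda: 1.0)   # (neighbour, concept) -> trail

    def choose_neighbour(self, concept):
        """Pick a neighbour to forward a query classified under `concept`."""
        weights = [self.pheromone[(n, concept)] for n in self.neighbours]
        return random.choices(self.neighbours, weights=weights, k=1)[0]

    def reinforce(self, neighbour, concept, n_results):
        """Evaporate all trails slightly, then reward the path that answered."""
        for key in list(self.pheromone):
            self.pheromone[key] *= (1.0 - self.evaporation)
        self.pheromone[(neighbour, concept)] += self.deposit * n_results

# Hypothetical usage: forward a query about "genomics" and reward the answerer.
peer = Peer(neighbours=["node-A", "node-B", "node-C"])
target = peer.choose_neighbour("genomics")
peer.reinforce(target, "genomics", n_results=7)
```

Evaporation keeps the routing adaptive when content moves between peers, while the deposit step concentrates traffic on nodes that have answered semantically similar queries before.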

Relevance:

100.00%

Publisher:

Abstract:

The central objective of research in Information Retrieval (IR) is to discover new techniques to retrieve relevant information in order to satisfy an Information Need. The Information Need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept that has changed over time, from popular to personal: what was considered relevant before was information for the whole population, whereas what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his or her social context, and thus an interdisciplinary field called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (the interconnected conditions that occur in an activity) and individualization (the characteristics that distinguish an individual). This shift of focus to the individual's need undermines the rigid linearity of the classical model, which is overtaken by the "berry picking" model: search terms change thanks to the informational feedback received from the search activity, introducing the concept of the evolution of search terms. The development of Information Foraging theory, which observed the correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and the recent advances in IR. The search method developed exploits what is relevant for the user by radically changing the way in which an Information Need is expressed: it is now expressed through the generated query together with its own context. In fact, the method was conceived to improve the quality of search by rewriting the query based on contexts automatically generated from a local knowledge base. Furthermore, the goal of optimizing any IR system led to its development as a middleware layer between the user and the IR system. The system therefore has just two possible actions: rewriting the query and reordering the results. Equivalent actions are described in the PS literature, which generally exploits information derived from the analysis of user behavior, whereas the proposed approach exploits knowledge provided by the user. The thesis goes further and proposes a novel assessment procedure, in line with the "Cranfield paradigm", in order to evaluate this type of IR system. The results are interesting considering both the effectiveness achieved and the innovative approach undertaken, together with the several applications inspired by the use of a local knowledge base.
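
A minimal sketch of the middleware pattern described above (hypothetical names and structures, not the thesis implementation): the middleware expands the user's query with terms drawn from a local knowledge base keyed by context, delegates the search to the underlying IR system, and reorders the results by how many context terms each one matches.

```python
def rewrite_query(query, context, knowledge_base):
    """Expand the query with terms the local knowledge base ties to the context."""
    expansions = knowledge_base.get(context, [])
    if not expansions:
        return query
    return query + " " + " ".join(expansions)

def rerank(results, context_terms):
    """Reorder results by how many context terms appear in their text."""
    return sorted(results,
                  key=lambda doc: sum(t.lower() in doc.lower() for t in context_terms),
                  reverse=True)

def personalized_search(query, context, knowledge_base, search_fn):
    """Middleware with only two actions: rewrite the query, reorder the results."""
    rewritten = rewrite_query(query, context, knowledge_base)
    results = search_fn(rewritten)                 # any underlying IR system
    return rerank(results, knowledge_base.get(context, []))

# Toy knowledge base and a stand-in for the underlying IR system.
kb = {"photography": ["aperture", "exposure", "lens"]}
docs = ["Choosing a lens and aperture", "Stock exchange report", "Exposure basics"]
search = lambda q: [d for d in docs if any(w in d.lower() for w in q.lower().split())]
print(personalized_search("exposure", "photography", kb, search))
```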

Relevance:

70.00%

Publisher:

Abstract:

Spatial data are particularly useful in mobile environments. However, due to the low bandwidth of most wireless networks, developing large spatial database applications becomes a challenging process. In this paper, we provide the first attempt to combine two important techniques, multiresolution spatial data structures and semantic caching, towards efficient spatial query processing in mobile environments. Based on a study of the characteristics of multiresolution spatial data (MSD) and multiresolution spatial queries, we propose a new semantic caching model called Multiresolution Semantic Caching (MSC) for caching MSD in mobile environments. MSC enriches the traditional three-category query processing in semantic caches to five categories, thus improving performance in three ways: 1) it reduces the amount and complexity of the remainder queries; 2) it avoids redundant transmission of spatial data already residing in the cache; and 3) it can provide satisfactory answers before 100% of the query results have been transmitted to the client side. Our extensive experiments on a very large and complex real spatial database show that MSC outperforms traditional semantic caching models significantly.
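
A minimal sketch of the semantic-caching idea that MSC builds on (hypothetical code, not the MSC model itself, which distinguishes five query categories and handles multiple resolutions): an incoming window query is split into a probe query answered from the cached region and remainder queries sent to the server.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rect:
    """Axis-aligned region used for both window queries and cached areas."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        xmin, ymin = max(self.xmin, other.xmin), max(self.ymin, other.ymin)
        xmax, ymax = min(self.xmax, other.xmax), min(self.ymax, other.ymax)
        if xmin < xmax and ymin < ymax:
            return Rect(xmin, ymin, xmax, ymax)
        return None

def split_query(query: Rect, cached: Rect):
    """Split a window query into a probe part (answered from the cache)
    and remainder parts (shipped to the server over the wireless link)."""
    probe = query.intersect(cached)
    if probe is None:
        return None, [query]                      # full miss: all remainder
    remainder = []
    if query.xmin < probe.xmin:                   # left strip
        remainder.append(Rect(query.xmin, query.ymin, probe.xmin, query.ymax))
    if probe.xmax < query.xmax:                   # right strip
        remainder.append(Rect(probe.xmax, query.ymin, query.xmax, query.ymax))
    if query.ymin < probe.ymin:                   # bottom strip
        remainder.append(Rect(probe.xmin, query.ymin, probe.xmax, probe.ymin))
    if probe.ymax < query.ymax:                   # top strip
        remainder.append(Rect(probe.xmin, probe.ymax, probe.xmax, query.ymax))
    return probe, remainder
```

Shrinking the remainder rectangles is what reduces traffic over the low-bandwidth link; the probe part can be rendered immediately on the client while the remainder is still in flight.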

Relevance:

60.00%

Publisher:

Abstract:

Background: With the decrease in DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole-genome sequencing approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaid with the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
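
To illustrate the Minimum Spanning Tree idea behind goeBURST (a toy sketch, not goeBURST itself, which applies additional tie-break rules), isolates can be linked so that connected profiles differ at the fewest loci:

```python
def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def minimum_spanning_tree(profiles):
    """Prim's algorithm over the complete graph of allelic profiles.
    Returns a list of (isolate_i, isolate_j, distance) edges."""
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Toy MLST-like profiles: isolate -> allele numbers at seven loci.
profiles = {
    "ST1": (1, 3, 1, 1, 1, 1, 3),
    "ST2": (1, 3, 1, 2, 1, 1, 3),
    "ST3": (4, 3, 1, 2, 1, 5, 3),
}
print(minimum_spanning_tree(profiles))   # [('ST1', 'ST2', 1), ('ST2', 'ST3', 2)]
```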

Relevance:

60.00%

Publisher:

Abstract:

Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies

Relevance:

60.00%

Publisher:

Abstract:

Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies

Relevance:

60.00%

Publisher:

Abstract:

This paper highlights the role of non-functional information when reusing from a component library. We describe a method for selecting appropriate implementations of Ada packages taking non-functional constraints into account; these constraints model the context of reuse. Constraints take the form of queries written in an interface description language called NoFun, which is also used to state non-functional information in Ada packages; query results are trees of implementations, following the import relationships between components. We distinguish two different situations when reusing components, depending on whether the library being searched is considered closed or extendible. The resulting tree of implementations can be manipulated by the user to resolve ambiguities, to state default behaviours, and the like. As part of the proposal, we address the problem of computing from code the non-functional information that determines the selection process.
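
As an informal illustration of constraint-driven selection (hypothetical catalogue and predicates, not the NoFun language itself): each implementation of a component advertises non-functional attributes, and the reuse context filters and ranks the candidates.

```python
# Hypothetical catalogue of implementations of one package, each annotated
# with non-functional attributes (as NoFun-style metadata would provide).
implementations = [
    {"name": "SetAsSortedArray", "lookup": "O(log n)", "memory_kb": 16, "thread_safe": False},
    {"name": "SetAsHashTable",   "lookup": "O(1)",     "memory_kb": 64, "thread_safe": False},
    {"name": "SetAsLockedTree",  "lookup": "O(log n)", "memory_kb": 24, "thread_safe": True},
]

def select(implementations, constraints, prefer):
    """Keep implementations satisfying every constraint, ranked by preference."""
    candidates = [impl for impl in implementations
                  if all(pred(impl) for pred in constraints)]
    return sorted(candidates, key=prefer)

# Context of reuse: must be thread-safe; smaller memory footprint preferred.
constraints = [lambda impl: impl["thread_safe"]]
chosen = select(implementations, constraints, prefer=lambda impl: impl["memory_kb"])
print([impl["name"] for impl in chosen])   # ['SetAsLockedTree']
```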

Relevance:

60.00%

Publisher:

Abstract:

The use of process simulation software has become more common in mapping paper industry processes, and such software has long been among Pöyry Engineering Oy's tools for process design. The objective of this work was to survey the use of process simulation software in Finnish paper mills and to assess the future prospects of process simulation in forest industry engineering services in order to develop the business. The theoretical part of the work examines, among other things, what process simulation is, why simulation is performed, and what the benefits and challenges of simulation are. It also introduces the most common process simulation software packages in use, the progress of a simulation project, and the requirements for productizing process simulation. In the experimental part, the use of process simulation software in Finnish paper mills was investigated with a questionnaire sent to all of the most important paper mills in Finland. The questionnaire asked, among other things, which programs are used, what has been simulated, what still needs to be simulated, and how necessary process simulation is considered to be. The results show that process simulation has been used at all of the Finnish paper mills that responded to the questionnaire. Most of the simulations have concerned machine lines as well as stock and water systems. Simulation of energy flows is considered the most important future target. There is room for improvement in the long-term utilization and maintenance of simulation models, and purchasing simulation as a service is the most likely option for the mills. The conclusion is that the mills have a need for process simulation. According to the questionnaire results, the attitude is favourable and simulation is seen as a necessary tool. In addition to a separate service product, the marketing of process simulation should be developed so that maintenance of the simulation model continues after the project as a local service. This marketing should take place already in the early phase of a project or during it. From the range of simulation programs available, an engineering office should choose the software packages that suit it best; in special cases, acquiring other programs should be considered according to the customer's wishes.

Relevance:

60.00%

Publisher:

Abstract:

In a process organization, the goal of management and of the processes is to satisfy the needs of the customer (internal or external). Tying measurement to process performance gives management a picture of the company's operations. With a performance measurement system and individual metrics, company management can assess the level of operations, set targets, and monitor how the targets it has set are being met. The first objective of this work was to map the prerequisites for, and to support, the future implementation of a performance measurement system based on the Balanced Scorecard. The measurement system is intended for measuring the efficiency of supply chain processes. The second objective was to support process-based thinking by means of the performance measurement system. The prerequisites for implementation were tested by selecting two first-level key metrics as pilot metrics. With these pilot metrics, which measure warehouse performance, the supply chain performance of the SC product group was determined for key customers and important market areas. The difference from the metrics currently in use is that the new key metrics cover the company's entire supply chain, whereas the metrics currently in use measure individual parts of it. The initial values of the new key metrics were obtained with database queries. The queries were run on several separate information systems, after which their results were collected into a single file and analyzed with PC applications. The measurement targets were chosen together with the line organization, which ensured the commitment of company management to the development and deployment of the measurement system. A responsibility matrix technique was tested for clarifying the areas of responsibility of individual processes (e.g., the measurement process) within the organization.

Relevance:

60.00%

Publisher:

Abstract:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web and, hence, web users relying on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters provided by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in searchers' results. Such search interfaces provide web users with online access to myriads of databases on the Web. In order to obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterization of the deep Web, finding and classifying deep web resources, and querying web databases.

Characterizing the deep Web: Though the term deep Web was coined in 2000, which is sufficiently long ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that the surveys of the deep Web carried out so far are predominantly based on the study of deep web sites in English. One can then expect that the findings from these surveys may be biased, especially owing to the steady increase in non-English web content. In this way, surveying national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from this national segment of the Web.

Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assume that the search interfaces to the web databases of interest have already been discovered and are known to the query systems. However, such assumptions do not hold, mostly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web proposed so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.

Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so since the interfaces of conventional search engines are also web forms. At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Manually filling out forms is infeasible and cumbersome for complex queries, yet such queries are essential for many web searches, especially in the area of e-commerce. Hence, automating the querying and retrieval of data behind search interfaces is desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts, and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. Besides, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
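
A minimal sketch of the kind of automation this line of work targets (hypothetical endpoint, field names, and result markup, not the thesis's form query language): programmatically submit a web search form and pull structured data out of the result page, here using the third-party requests and beautifulsoup4 packages.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical search interface: one text field ("title") and one field ("year").
FORM_URL = "https://books.example.org/search"   # placeholder endpoint

def query_web_database(title, year):
    """Fill out the search form programmatically and parse the result page."""
    response = requests.get(FORM_URL, params={"title": title, "year": year}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    # Assume each result is a <div class="result"> with an <a> and a <span class="price">.
    for item in soup.select("div.result"):
        link = item.find("a")
        price = item.find("span", class_="price")
        results.append({
            "title": link.get_text(strip=True) if link else None,
            "url": link["href"] if link else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return results

if __name__ == "__main__":
    for row in query_web_database("information retrieval", 2010):
        print(row)
```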

Relevance:

60.00%

Publisher:

Abstract:

In this thesis, the author presents a query language for an RDF (Resource Description Framework) database and discusses its applications in the context of the HELM project (the Hypertextual Electronic Library of Mathematics). This language aims at meeting the main requirements coming from the RDF community. In particular, it includes: a human-readable textual syntax and a machine-processable XML (Extensible Markup Language) syntax, both for queries and for query results; a rigorously defined formal semantics; a graph-oriented RDF data access model capable of exploring an entire RDF graph (including both RDF Models and RDF Schemata); a full set of Boolean operators to compose query constraints; fully customizable and highly structured query results with a 4-dimensional geometry; and some constructions taken from ordinary programming languages that simplify the formulation of complex queries. The HELM project aims at integrating modern tools for the automation of formal reasoning with the most recent electronic publishing technologies, in order to create and maintain a hypertextual, distributed virtual library of formal mathematical knowledge. In the spirit of the Semantic Web, the documents of this library include RDF metadata describing their structure and content in a machine-understandable form. Using the author's query engine, HELM exploits this information to implement functionalities that allow the interactive and automatic retrieval of documents on the basis of content-aware requests that take into account the mathematical nature of these documents.
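
For readers unfamiliar with graph-oriented RDF querying, the sketch below shows the same style of content-aware retrieval using today's standard tools (rdflib and SPARQL) rather than the query language developed in the thesis; the metadata vocabulary is invented for this example.

```python
from rdflib import Graph

# Toy metadata in the spirit of HELM's machine-understandable descriptions;
# the vocabulary (ex:provesTheoremAbout, ex:title) is invented for this example.
TURTLE = """
@prefix ex: <http://example.org/helm#> .
ex:doc1 ex:title "Fundamental theorem of arithmetic" ; ex:provesTheoremAbout "primes" .
ex:doc2 ex:title "Mean value theorem" ; ex:provesTheoremAbout "derivatives" .
"""

graph = Graph()
graph.parse(data=TURTLE, format="turtle")

# Graph-oriented, content-aware query: documents proving something about primes.
QUERY = """
PREFIX ex: <http://example.org/helm#>
SELECT ?title WHERE { ?doc ex:provesTheoremAbout "primes" ; ex:title ?title . }
"""
for row in graph.query(QUERY):
    print(row.title)
```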

Relevance:

60.00%

Publisher:

Abstract:

Although a vast amount of life sciences data is generated in the form of images, most scientists still store images on extremely diverse and often incompatible storage media, without any type of metadata structure, and thus with no standard facility with which to conduct searches or analyses. Here we present a solution to unlock the value of scientific images. The Global Image Database (GID) is a web-based (http://www.gwer.ch/qv/gid/gid.htm) structured central repository for scientific annotated images. The GID was designed to manage images from a wide spectrum of imaging domains, ranging from microscopy to automated screening. The annotations in the GID define the source experiment of the images by describing who the authors of the experiment are, when the images were created, the biological origin of the experimental sample, and how the sample was processed for visualization. A collection of experimental imaging protocols provides details of the sample preparation, labeling, or visualization procedures. In addition, the entries in the GID reference these imaging protocols with the probe sequences or antibody names used in labeling experiments. The GID annotations are searchable by field or globally. The query results are first shown as image thumbnail previews, enabling quick browsing prior to retrieval of the original-sized annotated images. Development of the GID continues, aiming at facilitating the management and exchange of image data in the scientific community, and at creating new query tools for mining image data.
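
The annotation structure described above can be pictured roughly as one record per image plus a reference to an imaging protocol; the sketch below is a hypothetical rendering of that structure, not the GID schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ImagingProtocol:
    name: str
    sample_preparation: str
    labeling: str            # e.g. probe sequence or antibody name
    visualization: str

@dataclass
class ImageRecord:
    authors: list[str]
    created: date
    biological_origin: str   # origin of the experimental sample
    processing: str          # how the sample was processed for visualization
    protocol: ImagingProtocol
    keywords: list[str] = field(default_factory=list)

def search(records, term):
    """Global search: match the term against every textual field of the record."""
    term = term.lower()
    return [r for r in records
            if term in " ".join([*r.authors, r.biological_origin, r.processing,
                                 r.protocol.name, *r.keywords]).lower()]
```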

Relevance:

60.00%

Publisher:

Abstract:

Quantile computation has many applications, including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query q(φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users are retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times clusters are reprocessed and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
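
The clustering idea can be illustrated with a simple greedy sketch (hypothetical, not the paper's algorithms): each query (φ, ε) tolerates any answer whose rank lies in [(φ−ε)N, (φ+ε)N], so queries whose tolerance intervals share a common point can be served by one representative quantile.

```python
def cluster_quantile_queries(queries):
    """Greedily group (phi, eps) queries whose tolerance intervals
    [phi - eps, phi + eps] share a common point, so each group can be
    answered by a single representative probe into the summary."""
    intervals = sorted((phi - eps, phi + eps, (phi, eps)) for phi, eps in queries)
    clusters, current, right = [], [], None
    for lo, hi, q in intervals:
        if current and lo <= right:
            current.append(q)
            right = min(right, hi)      # running intersection of the cluster
        else:
            if current:
                clusters.append(current)
            current, right = [q], hi
    if current:
        clusters.append(current)
    return clusters

# Three queries collapse into two clusters; each cluster needs one probe
# into the epsilon-approximate summary instead of one per query.
print(cluster_quantile_queries([(0.50, 0.02), (0.51, 0.02), (0.90, 0.01)]))
```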

Relevance:

60.00%

Publisher:

Abstract:

With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization, and the resource management of such services are becoming increasingly important for delivering the user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-sourced Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also provides the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response-time targets of map requests while using resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly is a set of techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can predict workload demands 18.91% more accurately and allocate resources efficiently to meet the QoS target, improving QoS by 26.19% and saving resource usage by 20.83% compared to traditional peak-load-based resource allocation.
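
A minimal sketch of what a Top-k Spatial Boolean Query computes (hypothetical code, not sksOpen, which uses a purpose-built index rather than a linear scan): keep the k objects nearest to a query point among those satisfying a conjunctive Boolean keyword predicate.

```python
import heapq
import math

def topk_spatial_boolean(objects, query_point, required_keywords, k):
    """Return the k objects nearest to query_point whose keyword sets
    contain all required keywords (a conjunctive Boolean predicate)."""
    qx, qy = query_point
    matching = (obj for obj in objects if required_keywords <= obj["keywords"])
    return heapq.nsmallest(
        k, matching,
        key=lambda obj: math.hypot(obj["x"] - qx, obj["y"] - qy))

pois = [
    {"name": "Cafe Sol",   "x": 1.0, "y": 2.0, "keywords": {"cafe", "wifi"}},
    {"name": "Bean There", "x": 4.0, "y": 0.5, "keywords": {"cafe"}},
    {"name": "Java Hut",   "x": 0.5, "y": 0.5, "keywords": {"cafe", "wifi"}},
]
print(topk_spatial_boolean(pois, (0.0, 0.0), {"cafe", "wifi"}, k=2))
```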