941 results for Grid search algorithm


Relevance: 20.00%

Publisher:

Abstract:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web and, hence, web users relying on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces give web users online access to myriad databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterizing the deep Web, finding and classifying deep web resources, and querying web databases.

Characterizing the deep Web: Although the term deep Web was coined in 2000, which is a long time ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that the surveys of the deep Web conducted so far are predominantly based on studies of deep web sites in English. One can therefore expect that findings from these surveys may be biased, especially owing to the steady increase in non-English web content. Surveying national segments of the deep Web is thus of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from this national segment of the Web.

Finding deep web resources: The deep Web has been growing at a very fast pace; it has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches have assumed that search interfaces to web databases of interest are already discovered and known to query systems. However, such assumptions rarely hold, mostly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other existing approaches to the deep Web, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.

Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so because the interfaces of conventional search engines are themselves web forms. At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Manually filling out forms is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in e-commerce. Automating the querying and retrieval of data behind search interfaces is therefore desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts, and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
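
A rough sketch of the form-querying step described above, submitting a query through a web search interface and extracting the embedded records from the resulting dynamic page, is given below. The endpoint, the query field name "q", and the result selector "div.result" are illustrative assumptions, not details of the thesis or of the I-Crawler.

```python
# Minimal sketch of issuing one query through a web search interface and
# extracting records from the resulting dynamic page. The URL, query field
# name "q", and result selector "div.result" are hypothetical.
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://example.org/search"   # hypothetical search interface

def query_web_database(term):
    """Submit a form query and return the text of each result record."""
    response = requests.get(SEARCH_URL, params={"q": term}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [row.get_text(strip=True) for row in soup.select("div.result")]

if __name__ == "__main__":
    for record in query_web_database("grid search"):
        print(record)
```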

Relevance: 20.00%

Publisher:

Abstract:

Coherent anti-Stokes Raman scattering (CARS) is a powerful laser spectroscopy method in which significant successes have been achieved. However, the non-linear nature of CARS complicates the analysis of the measured spectra. The objective of this thesis is to develop a new phase retrieval algorithm for CARS. It utilizes the maximum entropy method and a new wavelet approach for spectroscopic background correction of the phase function. The method was designed to be easily automated and applied to large numbers of spectra of different substances. The algorithm was successfully tested on experimental data.
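
As a rough illustration of the wavelet idea for background correction mentioned above, the sketch below keeps only the coarse approximation coefficients of a 1-D phase spectrum and treats the reconstruction as the slowly varying background to subtract. The wavelet family, decomposition level, and use of PyWavelets are assumptions for illustration, not the algorithm developed in the thesis.

```python
# Rough sketch of wavelet-based background correction of a phase function:
# keep only the coarse approximation coefficients so that the reconstruction
# captures the slowly varying background, then subtract it. The wavelet family
# and decomposition level are illustrative assumptions.
import numpy as np
import pywt

def wavelet_background(phase, wavelet="db8", level=5):
    """Estimate a smooth background of a 1-D phase spectrum via its wavelet approximation."""
    coeffs = pywt.wavedec(phase, wavelet, level=level)
    # Zero all detail coefficients; keep only the coarse approximation.
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    background = pywt.waverec(coeffs, wavelet)
    return background[: len(phase)]   # waverec may pad odd-length signals by one sample

# Usage: corrected_phase = phase - wavelet_background(phase)
```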

Relevance: 20.00%

Publisher:

Abstract:

The adequate selection of indicator groups of biodiversity is an important aspect of systematic conservation planning. However, such assessments differ in spatial scale, in the methods used, and in the groups considered, which generally produces contradictory results. Quantifying the spatial congruence between species richness and complementarity among different taxonomic groups is a fundamental step towards identifying potential indicator groups. Using a constructive approach, the main purpose of this study was to evaluate the performance and efficiency of eight potential indicator groups in representing amphibian diversity in the Brazilian Atlantic Forest. Data on the geographic ranges of amphibian species occurring in the Brazilian Atlantic Forest were overlaid on the full geographic extent of the biome, which was divided into a regular equal-area grid. Optimization routines based on the concept of complementarity were applied to verify the performance of each selected indicator group in representing the amphibians of the Brazilian Atlantic Forest as a whole; these routines were solved with the simulated annealing algorithm, using the MARXAN software. Some indicator groups were substantially more effective than others in representing the taxonomic groups assessed, which was confirmed by the high significance of the data (F = 312.76; p < 0.01). Leiuperidae was considered the best indicator group among the families analyzed, as it showed good performance, representing 71% of the amphibian species in the Brazilian Atlantic Forest (i.e., 290 species), which may be associated with the diffuse geographic distribution of its species. This study promotes understanding of how amphibian diversity patterns can inform systematic conservation planning at a regional scale.
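
The toy sketch below illustrates the kind of complementarity-based site selection that simulated annealing (as in MARXAN) performs: it searches for a small set of grid cells that still represents every species. The cost function, penalty weight, and cooling schedule are illustrative assumptions, not the actual MARXAN configuration used in the study.

```python
# Toy sketch of complementarity-based site selection by simulated annealing,
# in the spirit of MARXAN: minimise the number of grid cells selected while
# penalising every species left unrepresented. Data, penalty, and cooling
# schedule are illustrative assumptions, not the study's configuration.
import math
import random

def select_cells(presence, n_cells, penalty=10.0, steps=20000, temp=1.0, cooling=0.9995):
    """presence maps each species to the set of grid-cell indices where it occurs."""
    def cost(selected):
        missed = sum(1 for cells in presence.values() if not cells & selected)
        return len(selected) + penalty * missed   # cells used + unrepresented species

    current = set(random.sample(range(n_cells), max(1, n_cells // 2)))
    best, best_cost = set(current), cost(current)
    for _ in range(steps):
        candidate = set(current)
        candidate ^= {random.randrange(n_cells)}          # flip one cell in or out
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < best_cost:
                best, best_cost = set(current), cost(current)
        temp *= cooling                                   # accept fewer uphill moves over time
    return best

# Example with toy data: select_cells({"sp1": {0, 1}, "sp2": {2}, "sp3": {4, 5}}, n_cells=6)
```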

Relevance: 20.00%

Publisher:

Abstract:

Fetal MRI reconstruction aims at finding a high-resolution image given a small set of low-resolution images. It is usually modeled as an inverse problem in which the regularization term plays a central role in the reconstruction quality. The literature has considered several regularization terms, such as Dirichlet/Laplacian energy [1], Total Variation (TV)-based energies [2,3], and, more recently, non-local means [4]. Although TV energies are quite attractive because of their ability to preserve edges, only standard explicit steepest-gradient techniques have been applied to optimize fetal-based TV energies. The main contribution of this work lies in the introduction of a well-posed TV algorithm from the point of view of convex optimization. Specifically, our proposed TV optimization algorithm for fetal reconstruction is optimal with respect to the asymptotic and iterative convergence speeds, O(1/n²) and O(1/√ε), whereas existing techniques are in O(1/n) and O(1/ε). We apply our algorithm to (1) clinical newborn data, considered as ground truth, and (2) clinical fetal acquisitions. Our algorithm compares favorably with the literature in terms of speed and accuracy.
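
As a simplified illustration of the acceleration behind the O(1/n²) rate mentioned above, the 1-D sketch below applies a Nesterov/FISTA-style update to a least-squares data term plus a smoothed TV penalty. It is only a sketch of the acceleration idea under these assumptions, not the reconstruction algorithm proposed in the work.

```python
# Simplified 1-D sketch of a Nesterov/FISTA-style accelerated scheme applied to
# a least-squares data term plus a smoothed (Huber-like) TV penalty. The signal
# model, smoothing parameter, and step size are illustrative assumptions.
import numpy as np

def gradient(x, y, lam, eps=1e-2):
    """Gradient of 0.5*||x - y||^2 + lam * sum_i sqrt((x_{i+1} - x_i)^2 + eps)."""
    d = np.diff(x)
    g = d / np.sqrt(d * d + eps)        # derivative of the smoothed |d|
    tv_grad = np.zeros_like(x)
    tv_grad[:-1] -= g
    tv_grad[1:] += g
    return (x - y) + lam * tv_grad

def accelerated_tv(y, lam=0.2, step=0.1, iters=300):
    """FISTA-type iteration: gradient step at an extrapolated point plus momentum."""
    x = y.copy()
    z = y.copy()
    t = 1.0
    for _ in range(iters):
        x_new = z - step * gradient(z, y, lam)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Usage: x_rec = accelerated_tv(np.asarray(low_res_profile, dtype=float))
```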

Relevance: 20.00%

Publisher:

Abstract:

Open educational resources (OER) promise increased access, participation, quality, and relevance, in addition to cost reduction. These seemingly fantastic promises rest on the supposition that educators and learners will discover existing resources, improve them, and share the results, creating a virtuous cycle of improvement and re-use. By anecdotal metrics, existing web-scale search is not working for OER. This situation impairs the cycle underlying the promise of OER, endangering long-term growth and sustainability. While the scope of the problem is vast, targeted improvements in the areas of curation, indexing, and data exchange can improve the situation and create opportunities for further scale. I explore the ways in which the current system is inadequate, discuss areas for targeted improvement, and describe a prototype system built to test these ideas. I conclude with suggestions for further exploration and development.
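
As a generic illustration of the indexing idea discussed above, the sketch below builds a minimal keyword index over hypothetical OER metadata records. The record fields and data are invented for illustration; this is not the prototype system described here.

```python
# Generic illustration of keyword indexing over OER metadata records. The
# record fields and data are invented; this is not the prototype system
# described in the abstract.
from collections import defaultdict

records = [
    {"id": "oer-001", "title": "Introductory statistics course", "tags": ["statistics", "open-course"]},
    {"id": "oer-002", "title": "Linear algebra problem sets", "tags": ["mathematics", "exercises"]},
]

index = defaultdict(set)
for record in records:
    for token in record["title"].lower().split() + record["tags"]:
        index[token].add(record["id"])

print(sorted(index["statistics"]))   # -> ['oer-001']
```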

Relevance: 20.00%

Publisher:

Abstract:

The vast majority of users do not look at results beyond the second page returned by a search engine, so if a site fails to appear among the top 20 results (the first two pages), it can be said that the site does not have good SEO and is therefore effectively invisible to users. The overall objective of this project is to conduct a study to discover the factors that determine (or not) the positioning of websites in a search engine.

Relevance: 20.00%

Publisher:

Abstract:

Peer-reviewed