Biblioteca Digital

6 resultados para PBL tutorial search term

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland

Search Interfaces on the Web: Querying and Characterizing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of theWeb and, hence, web users relying on search engines only are unable to discover and access a large amount of information from the non-indexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in searchers’ results. Such search interfaces provide web users with an online access to myriads of databases on the Web. In order to obtain some information from a web database of interest, a user issues his/her query by specifying query terms in a search form and receives the query results, a set of dynamic pages that embed required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agents including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary and key object of study is a huge portion of the Web (hereafter referred as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterization of deep Web, finding and classifying deep web resources, and querying web databases. Characterizing deep Web: Though the term deep Web was coined in 2000, which is sufficiently long ago for any web-related concept/technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that surveys of the deep Web existing so far are predominantly based on study of deep web sites in English. One can then expect that findings from these surveys may be biased, especially owing to a steady increase in non-English web content. In this way, surveying of national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from the national segment of the Web. Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundred thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been a significant interest to approaches that allow users and computer applications to leverage this information. Most approaches assumed that search interfaces to web databases of interest are already discovered and known to query systems. However, such assumptions do not hold true mostly because of the large scale of the deep Web – indeed, for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deepWeb characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web existing so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms. Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user. This is all the more so as interfaces of conventional search engines are also web forms. At present, a user needs to manually provide input values to search interfaces and then extract required data from the pages with results. The manual filling out forms is not feasible and cumbersome in cases of complex queries but such kind of queries are essential for many web searches especially in the area of e-commerce. In this way, the automation of querying and retrieving data behind search interfaces is desirable and essential for such tasks as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and for extraction and integration of information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store results of form queries. Besides, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and components design.

Veja mais

Optimization as a method of search engine marketing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study is dedicated to search engine marketing (SEM). It aims for developing a business model of SEM firms and to provide explicit research of trustworthy practices of virtual marketing companies. Optimization is a general term that represents a variety of techniques and methods of the web pages promotion. The research addresses optimization as a business activity, and it explains its role for the online marketing. Additionally, it highlights issues of unethical techniques utilization by marketers which created relatively negative attitude to them on the Internet environment. Literature insight combines in the one place both technical and economical scientific findings in order to highlight technological and business attributes incorporated in SEM activities. Empirical data regarding search marketers was collected via e-mail questionnaires. 4 representatives of SEM companies were engaged in this study to accomplish the business model design. Additionally, the fifth respondent was a representative of the search engine portal, who provided insight on relations between search engines and marketers. Obtained information of the respondents was processed qualitatively. Movement of commercial organizations to the online market increases demand on promotional programs. SEM is the largest part of online marketing, and it is a prerogative of search engines portals. However, skilled users, or marketers, are able to implement long-term marketing programs by utilizing web page optimization techniques, key word consultancy or content optimization to increase web site visibility to search engines and, therefore, user’s attention to the customer pages. SEM firms are related to small knowledge-intensive businesses. On the basis of data analysis the business model was constructed. The SEM model includes generalized constructs, although they represent a wider amount of operational aspects. Constructing blocks of the model includes fundamental parts of SEM commercial activity: value creation, customer, infrastructure and financial segments. Also, approaches were provided on company’s differentiation and competitive advantages evaluation. It is assumed that search marketers should apply further attempts to differentiate own business out of the large number of similar service providing companies. Findings indicate that SEM companies are interested in the increasing their trustworthiness and the reputation building. Future of the search marketing is directly depending on search engines development.

Veja mais

Content-centered method for search engine optimization

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Value of online business has grown to over one trillion USD. This thesis is about search engine optimization, which focus is to increase search engine rankings. Search engine optimization is an important branch of online marketing because the first page of search engine results is generating majority of the search traffic. Current articles about search engine optimization and Google are indicating that with the proper use of quality content, there is potential to improve search engine rankings. However, the existing search engine optimization literature is not noticing content at a sufficient level. To decrease that difference, the content-centered method for search engine optimization is constructed, and content in search engine optimization is studied. This content-centered method consists of three search engine optimization tactics: 1) content, 2) keywords, and 3) links. Two propositions were used for testing these tactics in a real business environment and results are suggesting that the content-centered method is improving search engine rankings. Search engine optimization is constantly changing because Google is adjusting its search algorithm regularly. Still, some long-term trends can be recognized. Google has said that content is growing its importance as a ranking factor in the future. The content-centered method is taking advance of this new trend in search engine optimization to be relevant for years to come.