46 results for Web search engines

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

The ubiquity of the Internet and Web has led to the emergence of several Web search engines with varying capabilities. A weakness of existing search engines is the very large number of hits they can produce. Moreover, only a small number of web users actually know how to utilize the true power of Web search engines. Therefore, there is a need for searching infrastructure to help ease and guide the searching efforts of web users toward their desired objectives. In this paper, we propose a context-based meta-search engine and discuss its implementation on top of the actual Google.com search engine. The proposed meta-search engine benefits the user most when the user does not know exactly which document he or she is looking for. Comparison of the context-based meta-search engine with both Google and Guided Google shows that the results it returns are much more intuitive and accurate than those returned by either system.

Relevance: 100.00%

Abstract:

BACKGROUND: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. METHODS: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
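
The mapping step described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: it fits a closed-form, one-feature ridge regression from a hypothetical search-activity index to a measured prevalence series, then predicts a held-out year. All numbers are invented for illustration.

```python
# Hedged sketch: one-feature ridge regression mapping a web search
# activity index to a measured risk-factor prevalence. The paper's
# actual features, regulariser, and CDC data are not reproduced here.

def fit_ridge_1d(x, y, lam=0.1):
    """Closed-form ridge fit y ~ a*x + b (intercept not penalised)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    a = sxy / (sxx + lam)          # regularised slope
    b = ym - a * xm                # intercept
    return a, b

def predict(a, b, xs):
    return [a * xi + b for xi in xs]

# Illustrative: yearly search-activity index vs. measured prevalence (%)
search_index = [10.0, 12.0, 15.0, 18.0]   # training years, hypothetical
prevalence   = [20.1, 21.9, 25.2, 28.0]

a, b = fit_ridge_1d(search_index, prevalence)
pred_next = predict(a, b, [20.0])[0]      # predict a held-out year
```

The held-out-year test mirrors the paper's design of validating predictions against survey data not used in model derivation.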

Relevance: 100.00%

Abstract:

Spectral methods, as an unsupervised technique, have been used successfully in data mining applications such as LSI in information retrieval, HITS and PageRank in Web search engines, and spectral clustering in machine learning. The essence of their success in these applications is the spectral information that captures the semantics inherent in the large amounts of data processed during unsupervised learning. In this paper, we ask whether spectral methods can also be used in supervised learning, e.g., classification. In an attempt to answer this question, our research reveals a novel kernel in which spectral clustering information can be easily exploited and extended to new incoming data during classification tasks. Our experimental results show that the proposed Spectral Kernel speeds up classification tasks without compromising accuracy.
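
A minimal sketch of the general idea, assuming the usual spectral-clustering construction (the abstract does not specify the paper's actual Spectral Kernel): embed points using the bottom eigenvectors of a normalised graph Laplacian, then take inner products in that embedding as a kernel, so within-cluster pairs score higher than between-cluster pairs.

```python
# Hedged sketch: a kernel built from spectral-clustering structure.
# Gaussian affinity, normalised Laplacian, and in-degree of eigenvectors
# are standard choices, not the paper's specific formulation.
import numpy as np

def spectral_embedding(X, k=2, sigma=1.0):
    """Rows of X -> k-dim embedding via the normalised graph Laplacian."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian affinity
    D = W.sum(1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(D, D))   # normalised Laplacian
    vals, vecs = np.linalg.eigh(L)                     # ascending eigenvalues
    return vecs[:, :k]                                 # bottom-k eigenvectors

def spectral_kernel(E):
    """Kernel matrix: inner products in the spectral embedding."""
    return E @ E.T

# Two well-separated pairs of points: the kernel value within a pair
# should exceed the value across pairs.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
K = spectral_kernel(spectral_embedding(X))
```

Because the embedding is computed once, new points can be projected onto the same eigenvectors, which is the property the abstract highlights for classification.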

Relevance: 100.00%

Abstract:

In this thesis, the author designed three sets of preference-based ranking algorithms for information retrieval and provided the corresponding applications for the algorithms. The main goal is to return recommended, highly similar, and valuable ranked results to users.

Relevance: 100.00%

Abstract:

When educational research is conducted online, we sometimes promise our participants that they will be anonymous—but do we deliver on this promise? We have been warned since 1996 to be careful when using direct quotes in Internet research, as full-text web search engines make it easy to find chunks of text online. This paper details an empirical study into the prevalence of direct quotes from participants in a subset of the educational technology literature. Using basic web search techniques, the source of direct quotes could be found in 10 of 112 articles. Analysis of the articles revealed previously undiscussed threats from data triangulation and expert analysis/diagnosis. Issues of ethical obliviousness, obscurity and concern for future privacy-invasive technologies are also discussed. Recommendations for researchers, journals and institutional ethics review boards are made for how to better protect participants' anonymity against current and future threats.

Relevance: 100.00%

Abstract:

This paper presents an approach called the Co-Recommendation Algorithm, which combines features of recommendation rules and the co-citation algorithm. The algorithm addresses several challenges that are essential for further search and recommendation algorithms. It requires little interactive input from users, and it supports other query types, such as keyword, URL, and document investigations. Compared with other algorithms, its structure scales noticeably more easily. High online performance can be obtained, and the repository computation achieves high group-forming accuracy using only a fraction of the Web pages in a cluster.

Relevance: 100.00%

Abstract:

The rapid increase in web complexity and size often makes web search results unsatisfactory, owing to the huge amount of information returned by search engines. How to find intrinsic relationships among web pages at a higher level, so as to implement efficient management and retrieval of searched information, is becoming a challenging problem. In this paper, we propose an approach to measuring web page similarity. This approach takes hyperlink transitivity and page importance into consideration. From this new similarity measurement, an effective hierarchical web page clustering algorithm is proposed. Preliminary evaluations show the effectiveness of the new similarity measurement and the improvement in web page clustering. The proposed page similarity, as well as the matrix-based hyperlink analysis methods, could be applied to other web-based research areas.
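
A toy version of such a similarity measure, assuming simple stand-ins for both stated ingredients (bounded-depth reachability for hyperlink transitivity, normalised in-degree for page importance); neither stand-in is the paper's actual formulation.

```python
# Hedged sketch: importance-weighted overlap of transitive link
# neighbourhoods. All names and the link graph are illustrative.

def reachable(links, page, depth=2):
    """Pages reachable from `page` within `depth` hops."""
    frontier, seen = {page}, set()
    for _ in range(depth):
        frontier = {t for p in frontier for t in links.get(p, ())} - seen
        seen |= frontier
    return seen

def importance(links):
    """Toy importance score: normalised in-degree (PageRank stand-in)."""
    indeg = {}
    for targets in links.values():
        for t in targets:
            indeg[t] = indeg.get(t, 0) + 1
    total = sum(indeg.values()) or 1
    return {p: c / total for p, c in indeg.items()}

def page_similarity(links, a, b, depth=2):
    """Importance-weighted Jaccard overlap of transitive neighbourhoods."""
    ra, rb = reachable(links, a, depth), reachable(links, b, depth)
    imp = importance(links)
    shared = sum(imp.get(p, 0.0) for p in ra & rb)
    union = sum(imp.get(p, 0.0) for p in ra | rb)
    return shared / union if union else 0.0

# A and B both reach C (and, transitively, D); E links elsewhere.
links = {"A": {"C"}, "B": {"C"}, "C": {"D"}, "E": {"F"}}
sim_ab = page_similarity(links, "A", "B")
sim_ae = page_similarity(links, "A", "E")
```

Pages whose transitive neighbourhoods share important pages score high, which is the intuition behind combining transitivity with importance.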

Relevance: 100.00%

Abstract:

Purpose – The application of "Google" econometrics (Geco) has evolved rapidly in recent years and can be applied in various fields of research. Based on accepted theories in the existing economic literature, this paper seeks to contribute to the innovative use of Google search query data and to provide a new, innovative approach to property research.

Design/methodology/approach – In this study, existing data from Google Insights for Search (GI4S) is extended into a new potential source of consumer sentiment data based on visits to a commonly-used UK online real-estate agent platform (Rightmove.co.uk). In order to contribute to knowledge about the use of Geco's black box, namely the unknown sampling population and the specific search queries influencing the variables, the GI4S series are compared to direct web navigation.

Findings – The main finding from this study is that GI4S data produce immediate real-time results with a high level of reliability in explaining the future volume of transactions and house prices, in comparison to the direct website data. Furthermore, the results reveal that the number of visits to Rightmove.co.uk is driven by GI4S data and vice versa, although without a contemporaneous relationship.

Originality/value – This study contributes to the new emerging and innovative field of research involving search engine data. It also contributes to the knowledge base about the increasing use of online consumer data in economic research in property markets.

Relevance: 90.00%

Abstract:

This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended cocitation analysis of Web pages. It is intuitive and easy to implement. The second one takes advantage of linear algebra theories to reveal deeper relationships among Web pages and to identify relevant pages more precisely and effectively. The experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used for various Web applications, such as enhancing Web search. The ideas and techniques in this work would be helpful to other Web-related research.
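
The first, cocitation-based algorithm can be sketched directly: two pages are related when many third pages link to both, so candidate pages are ranked by how often they are cited together with the target URL. Plain (not extended) cocitation is shown, and the link data is invented for illustration.

```python
# Hedged sketch: rank pages by cocitation count with a target URL.
# `links` maps each source page to the set of pages it links to.

def cocitation_scores(links, target):
    """Count how often each other page is cited together with `target`."""
    scores = {}
    for sources in links.values():
        if target in sources:
            for page in sources:
                if page != target:
                    scores[page] = scores.get(page, 0) + 1
    return scores

# Illustrative link graph: A is co-cited with the target three times,
# B twice, C never.
links = {
    "p1": {"target", "A", "B"},
    "p2": {"target", "A"},
    "p3": {"target", "A"},
    "p4": {"target", "B"},
    "p5": {"C"},
}
ranked = sorted(cocitation_scores(links, "target").items(),
                key=lambda kv: -kv[1])
```

The second, linear-algebra-based algorithm of the paper operates on the same link data in matrix form, but its details are not given in this abstract.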

Relevance: 90.00%

Abstract:

Clustering Web documents efficiently yet accurately is of great interest to Web users and is a key component of the searching accuracy of a Web search engine. To achieve this, this paper introduces a new approach to the clustering of Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely; these center points are then used as initial points for the K-means algorithm. Our experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the document sets contain a large number of categories.
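
The seeding idea can be shown without the MFI mining itself, which the abstract does not detail: a plain K-means that accepts precomputed initial centres, here standing in for the centre points the MFI step would locate.

```python
# Hedged sketch: K-means seeded with explicit initial centres rather
# than random ones. The seed centres below pretend to come from an MFI
# dense-region step; points and centres are illustrative 2-D data.

def kmeans(points, centres, iters=10):
    """Plain K-means on 2-D points with explicit initial centres."""
    for _ in range(iters):
        # Assign each point to its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            j = min(range(len(centres)),
                    key=lambda i: (p[0] - centres[i][0]) ** 2
                                + (p[1] - centres[i][1]) ** 2)
            clusters[j].append(p)
        # Recompute each centre as its cluster mean (keep old if empty).
        centres = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
seed_centres = [(0.5, 0.5), (9.5, 9.5)]   # pretend MFI located these
centres, clusters = kmeans(points, seed_centres)
```

Because the seeds already sit inside the dense regions, K-means converges immediately to the intended partition, which is the sensitivity-to-initialisation point the abstract makes.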

Relevance: 90.00%

Abstract:

Yanchun Zhang and his co-authors explain how to construct and analyse Web communities based on information such as Web document contents, hyperlinks, or user access logs. Their approaches combine results from Web search algorithms, Web clustering methods, and Web usage mining. They also detail the preliminaries needed to understand the algorithms presented, and they discuss several successful existing applications. Researchers and students in information retrieval and Web search will find here all the necessary basics and methods to create and understand Web communities. Professionals developing Web applications will additionally benefit from the samples presented for their own designs and implementations.

Relevance: 90.00%

Abstract:

Background: After an acute cardiac event, adhering to recommendations for pharmacologic therapy is important in achieving optimal health outcomes. Despite the impressive evidence base for cardiovascular pharmacotherapy, strategies for promoting adherence remain less well developed. Furthermore, accessing reliable, valid, and cost-effective mechanisms for monitoring adherence in research and clinical settings is challenging. Aim: The aim of this article was to review published self-report measures for assessing and monitoring medication adherence in cardiovascular disease and to provide recommendations for research into medication adherence. Methods: The electronic databases CINAHL, Medline, and Science Direct were searched using the key search terms medication adherence and/or compliance, cardiovascular, self-report measures, and questionnaires. The World Wide Web was searched using the Google and Google Scholar search engines, and the reference lists of retrieved documents were reviewed. The search strategy was verified by a health librarian. Instruments were included if they specifically addressed medication adherence as a discrete construct rather than as a disease-specific or generic health status measurement. Findings: Despite the problems with medication adherence identified in the literature, only seven instruments met the search criteria. Instruments were used inconsistently across studies and settings, limiting comparison across populations and extensive psychometric evaluation. Conclusions: Medication adherence is a complex, multifaceted construct dependent on a range of physical, social, economic, and psychological considerations. In spite of the importance of adherence in ensuring optimal cardiovascular outcomes, the conceptual underpinnings and methods of assessing medication adherence require further discussion and debate.

Relevance: 90.00%

Abstract:

Many web servers contain dangerous pages (which we call eigenpages) that can reveal their vulnerabilities. Some worms, such as Santy, therefore locate their targets by searching for these eigenpages in search engines with well-crafted queries. In this paper, we focus on the modeling and containment of these special worms targeting web applications. We propose a containment system based on honey pots: search engines randomly insert, among the search results for arriving queries, a few honey pages that lead visitors to pre-established honey pots. Infected hosts can then be detected and reported to the search engines when their malicious scans hit the honey pots. We find that the Santy worm can be effectively stopped by inserting no more than two honey pages in every one hundred search results. We also solve the challenging issue of dynamically generating matching honey pages for dynamically arriving queries. Finally, a prototype is implemented to prove the technical feasibility of this system. © 2013 by CESER Publications.
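
The insertion step can be sketched as follows. The two-per-hundred rate comes from the abstract, while the function names and result data are illustrative; generating honey pages that actually match a query is the harder problem the paper addresses separately.

```python
# Hedged sketch: mix a small number of honey pages (pointing at honey
# pots) into a search result list at random positions, so that a worm
# scanning the results is likely to hit a honey pot.
import random

def insert_honey_pages(results, honey_pages, per_hundred=2, rng=random):
    """Insert up to `per_hundred` honey pages per 100 results."""
    out = list(results)
    n = max(1, round(len(results) * per_hundred / 100))
    for page in honey_pages[:n]:
        out.insert(rng.randrange(len(out) + 1), page)
    return out

results = [f"page{i}" for i in range(100)]      # ordinary search results
mixed = insert_honey_pages(results, ["honey1", "honey2"],
                           rng=random.Random(0))  # seeded for repeatability
```

A worm that visits every returned URL will hit a honey pot with near certainty, while an ordinary user clicking a handful of top results rarely encounters one.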

Relevance: 90.00%

Abstract:

For years, opinion polls have relied on data collected through telephone or person-to-person surveys. The process is costly, inconvenient, and slow. Recently, online search data have emerged as a potential proxy for survey data. However, considerable human involvement is still needed for the selection of search indices, a task that requires knowledge of both the target issue and how search terms are used by the online community. The robustness of such manually selected search indices can be questionable. In this paper, we propose an automatic polling system through a novel application of machine learning. In this system, the need for examining, comparing, and selecting search indices has been eliminated through automatic generation of candidate search indices and intelligent combination of those indices. The results include a publicly accessible web application that provides real-time, robust, and accurate measurements of public opinion on several subjects of general interest.
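
One minimal way to realise the "intelligent combination of the indices" step (the system's actual method is not given in this abstract) is to weight each automatically generated candidate index by its correlation with historical poll data, so no manual selection is needed. The candidate series and poll numbers below are invented.

```python
# Hedged sketch: correlation-weighted combination of candidate search
# indices into a single poll proxy. A stand-in for the paper's actual
# machine-learning combination step.

def corr(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def combine_indices(candidates, polls):
    """candidates: name -> time series. Weight each series by its
    (non-negative) correlation with the polls, then average."""
    weights = {k: max(corr(v, polls), 0.0) for k, v in candidates.items()}
    total = sum(weights.values()) or 1.0
    return [sum(weights[k] * candidates[k][i] for k in candidates) / total
            for i in range(len(polls))]

candidates = {
    "term_a": [1.0, 2.0, 3.0, 4.0],   # tracks the polls
    "term_b": [4.0, 1.0, 3.0, 2.0],   # uncorrelated noise
}
polls = [10.0, 20.0, 30.0, 40.0]
combined = combine_indices(candidates, polls)
```

Uninformative candidate indices receive near-zero weight automatically, which is the property that removes the manual selection step.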