31 results for Web Search

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. METHODS: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms mapping web search activity to risk factor/disease prevalence measured by the Centers for Disease Control and Prevention (CDC). Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruit and vegetable consumption (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
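
A minimal sketch of the modelling step described above, assuming scikit-learn and entirely synthetic data: a regularised linear regression maps per-state search-activity features to measured prevalence, and held-out predictions are checked with Pearson and Spearman correlations. The paper's actual feature construction from search logs is not reproduced here.

    # Sketch: regularised regression mapping state-level search-activity
    # features to measured risk-factor prevalence. All data is synthetic.
    import numpy as np
    from scipy.stats import pearsonr, spearmanr
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_states, n_terms = 50, 20
    w = rng.random(n_terms)                        # hidden "true" mapping
    X_train = rng.random((n_states, n_terms))      # search activity, training years
    y_train = X_train @ w + rng.normal(0, 0.1, n_states)
    X_test = rng.random((n_states, n_terms))       # a held-out target year
    y_test = X_test @ w + rng.normal(0, 0.1, n_states)  # "measured" prevalence

    model = Ridge(alpha=1.0).fit(X_train, y_train)  # L2-regularised regression
    pred = model.predict(X_test)
    print(f"Pearson r = {pearsonr(pred, y_test)[0]:.2f}")
    print(f"Spearman r = {spearmanr(pred, y_test)[0]:.2f}")
    print(f"mean |pred - measured| = {np.abs(pred - y_test).mean():.2f}")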

Relevance:

80.00%

Publisher:

Abstract:

The ubiquity of the Internet and the Web has led to the emergence of several Web search engines with varying capabilities. A weakness of existing search engines is the very large number of hits they can produce, and only a small number of web users actually know how to exploit the true power of Web search engines. There is therefore a need for search infrastructure that eases and guides the searching efforts of web users toward their desired objectives. In this paper, we propose a context-based meta-search engine and discuss its implementation on top of the actual Google.com search engine. The proposed meta-search engine benefits the user most when the user does not know exactly which document he or she is looking for. A comparison of the context-based meta-search engine with both Google and Guided Google shows that its results are much more intuitive and accurate than those returned by either Google or Guided Google.
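
One plausible reading of the context-based idea, sketched below under assumptions: a hypothetical fetch_results() stands in for the Google backend, and its results are re-ranked by TF-IDF similarity between their snippets and a user-supplied context. This is an illustration, not the paper's implementation.

    # Sketch of a context-based meta-search layer: backend results are
    # re-ranked by their similarity to the user's context description.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def fetch_results(query):
        """Hypothetical backend stand-in; returns (title, snippet) pairs."""
        return [("Jaguar cars", "Jaguar is a British luxury vehicle brand."),
                ("Jaguar habitat", "The jaguar is a large cat of the Americas.")]

    def context_search(query, context):
        results = fetch_results(query)
        snippets = [snippet for _, snippet in results]
        vec = TfidfVectorizer().fit(snippets + [context])
        sims = cosine_similarity(vec.transform([context]),
                                 vec.transform(snippets))[0]
        # Re-rank backend results by similarity to the context.
        return [r for _, r in sorted(zip(sims, results), key=lambda t: -t[0])]

    print(context_search("jaguar", "wildlife animals rainforest"))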

Relevance:

70.00%

Publisher:

Abstract:

This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended co-citation analysis of Web pages; it is intuitive and easy to implement. The second takes advantage of linear algebra to reveal deeper relationships among Web pages and to identify relevant pages more precisely and effectively. Experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used in various Web applications, such as enhancing Web search, and the ideas and techniques in this work should be helpful to other Web-related research.
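
A minimal sketch of the co-citation idea behind the first algorithm, on an invented toy link graph: pages that are frequently linked to together with the target URL are ranked as related.

    # Sketch of co-citation analysis: pages cited (linked to) together
    # with the target page are candidates for relevant pages.
    from collections import Counter

    # Toy link graph: page -> set of pages it links to (names invented).
    links = {
        "p1": {"target", "a", "b"},
        "p2": {"target", "a"},
        "p3": {"target", "b", "c"},
        "p4": {"c"},
    }

    def cocited_pages(target, links, top_k=3):
        counts = Counter()
        for outlinks in links.values():
            if target in outlinks:                  # this page cites the target...
                counts.update(outlinks - {target})  # ...and co-cites these pages
        return counts.most_common(top_k)

    print(cocited_pages("target", links))  # e.g. [('a', 2), ('b', 2), ('c', 1)]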

Relevance:

70.00%

Publisher:

Abstract:

Clustering Web documents efficiently yet accurately is of great interest to Web users and is a key component of the searching accuracy of a Web search engine. To this end, this paper introduces a new approach to clustering Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely; these center points are then used as initial points for the K-means algorithm. Experimental results on three Web document sets show that the MFI approach outperforms the compared methods in most cases, particularly when the document sets contain a large number of categories.
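
The MFI mining itself is not reproduced here; the sketch below illustrates only the seeding step, assuming scikit-learn: K-means is initialised with precomputed center points (stand-ins for the MFI-derived centers) instead of random ones.

    # Sketch: seeding K-means with precomputed cluster centers, as the
    # MFI approach does. Centers here are hand-picked for synthetic data;
    # in the paper they come from maximal frequent itemsets.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    docs = np.vstack([rng.normal(c, 0.3, (30, 2))
                      for c in ((0, 0), (3, 3), (0, 3))])

    # Stand-in for MFI-derived centers (one per dense region).
    mfi_centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])

    km = KMeans(n_clusters=3, init=mfi_centers, n_init=1).fit(docs)
    print(km.cluster_centers_)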

Relevance:

70.00%

Publisher:

Abstract:

Yanchun Zhang and his co-authors explain how to construct and analyse Web communities based on information such as Web document contents, hyperlinks, and user access logs. Their approaches combine results from Web search algorithms, Web clustering methods, and Web usage mining. They also detail the preliminaries needed to understand the algorithms presented, and they discuss several successful existing applications. Researchers and students in information retrieval and Web search will find here all the necessary basics and methods to create and understand Web communities, while professionals developing Web applications will additionally benefit from the samples presented when creating their own designs and implementations.

Relevance:

70.00%

Publisher:

Abstract:

For years, opinion polls have relied on data collected through telephone or person-to-person surveys. The process is costly, inconvenient, and slow. Recently, online search data has emerged as a potential proxy for survey data. However, considerable human involvement is still needed to select search indices, a task that requires knowledge of both the target issue and how search terms are used by the online community; the robustness of such manually selected search indices can be questionable. In this paper, we propose an automatic polling system built on a novel application of machine learning. In this system, the need to examine, compare, and select search indices is eliminated through automatic generation of candidate search indices and intelligent combination of those indices. The results include a publicly accessible web application that provides real-time, robust, and accurate measurements of public opinion on several subjects of general interest.
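
A sketch of the "intelligent combination" step under assumptions: many synthetic candidate search-index series are combined with an L1-regularised regression (LassoCV), whose sparsity pattern performs the index selection automatically. The paper's actual candidate generation and combination method may differ.

    # Sketch: automatic combination of candidate search indices via Lasso.
    # The L1 penalty drives most candidate weights to zero, removing the
    # need to hand-pick indices. All series here are synthetic.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(2)
    n_weeks, n_candidates = 120, 40
    X = rng.random((n_weeks, n_candidates))   # candidate search-index series
    true_w = np.zeros(n_candidates)
    true_w[:3] = [0.8, -0.5, 0.3]             # only 3 indices truly matter
    y = X @ true_w + rng.normal(0, 0.05, n_weeks)  # survey-measured opinion

    model = LassoCV(cv=5).fit(X, y)
    print("indices kept by the model:", np.flatnonzero(model.coef_))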

Relevance:

60.00%

Publisher:

Abstract:

Spectral methods, as unsupervised techniques, have been used with success in data mining: LSI in information retrieval, HITS and PageRank in Web search engines, and spectral clustering in machine learning. The essence of their success in these applications is the spectral information, which captures the semantics inherent in the large amounts of data processed during unsupervised learning. In this paper, we ask whether spectral methods can also be used in supervised learning, e.g., classification. In an attempt to answer this question, our research reveals a novel kernel in which spectral clustering information can be easily exploited and extended to new incoming data during classification tasks. Our experimental results show that the proposed Spectral Kernel speeds up classification tasks without compromising accuracy.
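
The abstract does not give the kernel's construction, so the sketch below shows one standard way to build a kernel from spectral information: take the top eigenvectors U of a normalised affinity matrix and use K = U U^T as a precomputed SVM kernel. This is an assumption-laden illustration, not necessarily the paper's exact Spectral Kernel.

    # Sketch: a kernel built from spectral-clustering information, used
    # as a precomputed kernel for classification. One plausible reading,
    # not necessarily the paper's construction.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

    # RBF affinity and symmetric normalisation D^{-1/2} A D^{-1/2}.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / 0.1)
    d = A.sum(1)
    L = A / np.sqrt(np.outer(d, d))

    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, -2:]                 # top-2 eigenvectors (2 clusters)
    K = U @ U.T                      # spectral kernel (positive semidefinite)

    clf = SVC(kernel="precomputed").fit(K, y)
    print("train accuracy:", clf.score(K, y))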

Relevance:

60.00%

Publisher:

Abstract:

In this thesis, the author designs three sets of preference-based ranking algorithms for information retrieval and presents corresponding applications for the algorithms. The main goal is to return recommended, highly similar, and valuable ranking results to users.

Relevance:

60.00%

Publisher:

Abstract:

Conventional relevance feedback schemes may not be suitable for all practical applications of content-based image retrieval (CBIR), since most ordinary users would like to complete their search in a single interaction, especially in web search. In this paper, we explore a new approach to improving retrieval performance based on a new concept, the bag of images, rather than relevance feedback. We consider an image collection to comprise image bags instead of independent individual images, where each image bag includes relevant images with the same perceptual meaning. A theoretical case study demonstrates that image retrieval can benefit from the new concept, and a number of experimental results show that a CBIR scheme based on bags of images can improve retrieval performance dramatically.
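
A sketch of retrieval over image bags rather than individual images, with synthetic feature vectors: each bag is scored by the minimum member-to-query distance and whole bags are ranked. The paper's actual bag construction and scoring are not specified in the abstract.

    # Sketch: content-based retrieval over "bags of images". Each bag
    # holds feature vectors of perceptually related images; bags are
    # ranked by minimum member-to-query distance. Data is synthetic.
    import numpy as np

    rng = np.random.default_rng(3)
    # Three bags of 4 images each, as 8-dim feature vectors.
    bags = {f"bag{i}": rng.random((4, 8)) for i in range(3)}
    query = rng.random(8)

    def rank_bags(query, bags):
        scores = {name: np.linalg.norm(feats - query, axis=1).min()
                  for name, feats in bags.items()}
        return sorted(scores.items(), key=lambda kv: kv[1])

    print(rank_bags(query, bags))  # closest bag first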

Relevance:

60.00%

Publisher:

Abstract:

When educational research is conducted online, we sometimes promise our participants that they will be anonymous, but do we deliver on this promise? We have been warned since 1996 to be careful when using direct quotes in Internet research, as full-text web search engines make it easy to find chunks of text online. This paper details an empirical study into the prevalence of direct quotes from participants in a subset of the educational technology literature. Using basic web search techniques, the source of direct quotes could be found in 10 of 112 articles. Analysis of the articles revealed previously undiscussed threats from data triangulation and expert analysis/diagnosis. Issues of ethical obliviousness, obscurity, and concern for future privacy-invasive technologies are also discussed. Recommendations are made for researchers, journals, and institutional ethics review boards on how to better protect participants' anonymity against current and future threats.

Relevance:

60.00%

Publisher:

Abstract:

Retrieval systems with non-deterministic output are widely used in information retrieval; common examples include sampling, approximation algorithms, and interactive user input. The effectiveness of such systems differs not just across topics but also across instances of the system. This inherent variance presents a dilemma: what is the best way to measure the effectiveness of a non-deterministic IR system? Existing approaches to IR evaluation do not consider this problem or its potential impact on statistical significance. In this paper, we explore how such variance can affect system comparisons, and propose an evaluation framework and methodologies capable of making such comparisons. Using distributed information retrieval as a case study, we show that the approaches provide a consistent and reliable methodology for comparing the effectiveness of a non-deterministic system with a deterministic or another non-deterministic system. In addition, we present a statistical best practice that can be used to safely show that a non-deterministic IR system has effectiveness equivalent to another IR system, and to avoid the common pitfall of misusing a lack of significance as proof that two systems have equivalent effectiveness.
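
A sketch of the closing point about equivalence, under assumptions: rather than reading a non-significant difference as equivalence, two one-sided tests (TOST) check whether the per-topic score difference lies within a pre-chosen margin. The scores are synthetic, and the paper's own best practice may differ in detail.

    # Sketch: equivalence testing (TOST) between two systems' per-topic
    # effectiveness scores. A non-significant difference is NOT proof of
    # equivalence; TOST tests whether the difference is within +/- margin.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    topics = 50
    sys_a = rng.normal(0.40, 0.05, topics)         # per-topic scores
    sys_b = sys_a + rng.normal(0.0, 0.01, topics)  # near-identical system
    diff = sys_a - sys_b
    margin = 0.02                                  # pre-chosen equivalence margin

    _, p_low = stats.ttest_1samp(diff, -margin, alternative="greater")
    _, p_high = stats.ttest_1samp(diff, margin, alternative="less")
    p_tost = max(p_low, p_high)
    print(f"TOST p = {p_tost:.4f}; equivalent at 0.05: {p_tost < 0.05}")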

Relevance:

30.00%

Publisher:

Abstract:

UDDI is a standard for publishing and discovery of web services. UDDI registries provide keyword searches for web services. The search functionality is very simple and fails to account for relationships between web services. In this paper, we propose an algorithm which retrieves closely related web services. The proposed algorithm is based on singular value decomposition (SVD) in linear algebra, which reveals semantic relationships among web services. The preliminary evaluation shows the effectiveness and feasibility of the algorithm.
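
A minimal sketch of the SVD step with invented service descriptions: a TF-IDF service-by-term matrix is factorised, and services are compared by cosine similarity in the truncated latent space, so that semantically related services can score higher even with little term overlap.

    # Sketch: latent-semantic similarity between web services via
    # truncated SVD over a TF-IDF description matrix (cf. LSI). The
    # descriptions are invented stand-ins for UDDI registry entries.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    descs = ["currency exchange rate conversion service",
             "foreign exchange currency converter",
             "weather forecast temperature service"]

    M = TfidfVectorizer().fit_transform(descs).toarray()  # services x terms
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Z = U[:, :2] * s[:2]                                  # rank-2 latent space

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print("svc0 vs svc1:", cos(Z[0], Z[1]))  # related services score higher
    print("svc0 vs svc2:", cos(Z[0], Z[2]))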

Relevance:

30.00%

Publisher:

Abstract:

This paper presents an approach called the Co-Recommendation Algorithm, which combines features of recommendation rules with the co-citation algorithm. The algorithm addresses several challenges that are essential for future searching and recommendation algorithms. It does not require extensive interactive input from users, and it supports other query types, such as keyword, URL and document investigations. Compared with other algorithms, its structure scales noticeably more easily. High online performance can be obtained alongside the repository computation, which can achieve high group-forming accuracy using only a fraction of the Web pages from a cluster.

Relevance:

30.00%

Publisher:

Abstract:

The rapid increase in the Web's complexity and size leaves web search results far from satisfactory in many cases, owing to the huge amount of information returned by search engines. Finding intrinsic relationships among web pages at a higher level, so as to support efficient management and retrieval of web search information, is becoming a challenging problem. In this paper, we propose an approach to measuring web page similarity that takes hyperlink transitivity and page importance into consideration. From this new similarity measurement, an effective hierarchical web page clustering algorithm is derived. Preliminary evaluations show the effectiveness of the new similarity measurement and the resulting improvement in web page clustering. The proposed page similarity, as well as the matrix-based hyperlink analysis methods, could be applied to other web-based research areas.
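
The hyperlink-transitivity weighting itself is specific to the paper; the sketch below shows only the downstream step, assuming SciPy: agglomerative clustering of pages from a given pairwise similarity matrix, with an invented matrix standing in for the paper's measure.

    # Sketch: hierarchical clustering of web pages from a pairwise
    # similarity matrix. The matrix is invented; in the paper it would
    # come from the hyperlink-transitivity similarity measure.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    sim = np.array([[1.0, 0.9, 0.2, 0.1],
                    [0.9, 1.0, 0.3, 0.2],
                    [0.2, 0.3, 1.0, 0.8],
                    [0.1, 0.2, 0.8, 1.0]])

    dist = 1.0 - sim                      # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist), method="average")
    print(fcluster(Z, t=2, criterion="maxclust"))  # e.g. [1 1 2 2]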

Relevance:

30.00%

Publisher:

Abstract:

Discovering intrinsic relationships and structures among web information objects of interest, such as web pages, is important for effectively processing and managing web information. In this work, a set of web pages that has its own intrinsic structure is called a web page community. This paper proposes a matrix model for describing relationships among the web pages of interest. Based on this model, intrinsic relationships among pages can be revealed and, in turn, a web page community can be constructed. Issues related to this model and its applications are investigated and studied, and several applications built on the model are presented, demonstrating its potential for different kinds of web page community construction and information processing.
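
One simple reading of the matrix model, on a toy graph: from the adjacency matrix A, the co-citation matrix C = A^T A counts how often two pages are linked from the same sources, and thresholding C yields candidate community pairs. The paper's model is richer than this illustration.

    # Sketch: a matrix model of page relationships. A is the adjacency
    # matrix of a toy link graph; C = A.T @ A counts co-citations, and
    # thresholding C yields candidate page communities.
    import numpy as np

    A = np.array([[0, 1, 1, 0],    # page 0 links to pages 1, 2
                  [0, 0, 1, 1],    # page 1 links to pages 2, 3
                  [0, 1, 0, 1],
                  [0, 0, 0, 0]])

    C = A.T @ A                    # C[i, j] = #pages linking to both i and j
    np.fill_diagonal(C, 0)
    pairs = np.argwhere(C >= 1)
    print("co-cited page pairs:", [tuple(p) for p in pairs if p[0] < p[1]])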