11 results for web clustering

in Deakin Research Online - Australia


Relevance: 80.00%

Abstract:

The rapid increase in the size and complexity of the web often makes search results unsatisfactory because of the huge amount of information that search engines return. Finding intrinsic relationships among web pages at a higher level, in order to manage and retrieve search results efficiently, is becoming a challenging problem. In this paper, we propose an approach to measuring web page similarity that takes hyperlink transitivity and page importance into consideration. From this new similarity measurement, an effective hierarchical web page clustering algorithm is proposed. The primary evaluations show the effectiveness of the new similarity measurement and the improvement in web page clustering. The proposed page similarity, as well as the matrix-based hyperlink analysis methods, could be applied to other web-based research areas.

Relevance: 70.00%

Abstract:

Yanchun Zhang and his co-authors explain how to construct and analyse Web communities based on information such as Web document contents, hyperlinks, or user access logs. Their approaches combine results from Web search algorithms, Web clustering methods, and Web usage mining. They also detail the preliminaries needed to understand the algorithms presented, and they discuss several successful existing applications. Researchers and students in information retrieval and Web search will find here all the basics and methods needed to create and understand Web communities. Professionals developing Web applications will additionally benefit from the samples presented for their own designs and implementations.

Relevance: 40.00%

Abstract:

This paper proposes a hyperlink-based web page similarity measurement and two matrix-based hierarchical web page clustering algorithms. The web page similarity measurement incorporates hyperlink transitivity and page importance within the concerned web page space. One clustering algorithm takes cluster overlapping into account; the other does not. These algorithms do not require predefined similarity thresholds for clustering, and are independent of the page order. The primary evaluations show the effectiveness of the proposed algorithms in clustering improvement.
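The abstract does not give the formula for its similarity measurement, so the following is only an illustrative sketch of how hyperlink transitivity can enter a link-based similarity. It is not the authors' method. It assumes a 0/1 adjacency matrix, adds two-step links with a hypothetical `decay` weight, and compares pages by the cosine of their out-link profiles. Page importance is omitted here.

```python
import math


def transitive_links(adj, decay=0.5):
    """Direct links plus decayed two-step links: A + decay * A^2 (an assumption)."""
    n = len(adj)
    return [
        [adj[i][j] + decay * sum(adj[i][k] * adj[k][j] for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_similarity(adj, decay=0.5):
    """Pairwise similarity of pages based on their transitive out-link profiles."""
    profiles = transitive_links(adj, decay)
    n = len(adj)
    return [[cosine(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


# Hypothetical four-page web: 0 -> 2, 1 -> 2, 2 -> 3, and page 3 has no out-links.
adjacency = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
similarity = link_similarity(adjacency)
```

In this toy graph, pages 0 and 2 share no direct link target, yet their similarity is positive because page 0 reaches page 3 in two steps. A similarity matrix of this kind could then be passed to a hierarchical clustering routine.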

Relevance: 40.00%

Abstract:

Clustering Web documents efficiently yet accurately is of great interest to Web users and is a key component of the search accuracy of a Web search engine. To achieve this, this paper introduces a new approach to clustering Web documents, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Our experimental results on 3 Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the Web document sets contain a large number of categories.
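The second stage described above, running K-means from supplied starting points, can be sketched as follows. This is a minimal illustration, not the paper's implementation. The MFI step that would locate the seeds is not shown, and the function name `seeded_kmeans`, the toy points, and the hand-picked seeds are all assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def seeded_kmeans(points, initial_centers, max_iter=100):
    """Plain K-means that starts from caller-supplied centers, not random ones."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups, with seeds placed near each group
# (standing in for the centers that a seeding step such as MFI would provide).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [(0.5, 0.5), (10.5, 10.5)]
labels, centers = seeded_kmeans(points, seeds)
```

Because the result of K-means depends on its starting points, a seeding step that lands close to the dense regions gives the iteration less room to settle in a poor local optimum.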

Relevance: 40.00%

Abstract:

The human immune system provides inspiration for solving a wide range of innovative problems. In this paper, we propose an immune-network-based approach for web document clustering. All the immune cells in the network competitively recognize the antigens (web documents), which are presented to the network one by one. The interaction between immune cells and an antigen leads to growth of the network through the clonal selection and somatic mutation of the stimulated immune cells, while the interaction among immune cells results in network compression. The structure of the immune network is well maintained by learning and self-regularity. We use a public web document data set to test the effectiveness of our method and compare it with other approaches. The experimental results demonstrate that the most striking advantages of immune-based data clustering are its adaptability in dynamic environments and its capability of finding new clusters automatically.

Relevance: 30.00%

Abstract:

For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectral analysis of the eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
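The abstract does not spell out its procedure, so the sketch below substitutes a common stand-in, the eigengap heuristic: sort the eigenvalues of a symmetric similarity matrix and suggest k at the largest drop between consecutive values. It is not the paper's method. The function names, the synthetic similarity matrix, and its `within` and `between` values are assumptions chosen for illustration. The eigenvalues are computed with a small Jacobi rotation routine so that the example needs only the standard library.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def estimate_k(similarity):
    """Suggest k as the position of the largest gap in the sorted eigenvalues."""
    eigenvalues = symmetric_eigenvalues(similarity)
    gaps = [eigenvalues[i] - eigenvalues[i + 1] for i in range(len(eigenvalues) - 1)]
    return gaps.index(max(gaps)) + 1


def block_similarity(sizes, within=0.9, between=0.05):
    """Synthetic similarity matrix: high similarity inside groups, low across them."""
    group = [g for g, size in enumerate(sizes) for _ in range(size)]
    n = len(group)
    return [
        [1.0 if i == j else (within if group[i] == group[j] else between)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical collection of eight documents in three groups of sizes 3, 3, and 2.
similarity = block_similarity([3, 3, 2])
k = estimate_k(similarity)
```

For this synthetic matrix, three eigenvalues stand well above the rest, so the heuristic suggests k = 3. Real document collections rarely show such a clean gap, which is one reason to report a range of candidate values for k, as the abstract describes.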

Relevance: 30.00%

Abstract:

For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?" We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.

Relevance: 30.00%

Abstract:

The high-throughput experimental data from the new gene microarray technology has spurred numerous efforts to find effective ways of processing microarray data to reveal real biological relationships among genes. This work proposes an innovative data pre-processing approach that identifies noise data in the data sets and eliminates or reduces the impact of the noise data on gene clustering. With the proposed algorithm, the pre-processed data sets make the clustering results stable across clustering algorithms with different similarity metrics, the important information about genes and features is kept, and the clustering quality is improved. The primary evaluation on real microarray data sets has shown the effectiveness of the proposed algorithm.

Relevance: 30.00%

Abstract:

Having an eye-catching and attractive website could help hotels to compete in the vigorous online market. This study attempts to examine the relationship between human personality and web design preferences. Kohonen networks were adopted to cluster people with similar personality characteristics and identify their differences in web design preferences. Empirical results indicated that people with similar personality traits have similar design preferences. For example, to attract those who score high in agreeableness, conscientiousness and openness but low in neuroticism, a website should start with a language selection page with an introductory movie, show one large image of the hotel interior design with a hotel guest in the photo, and play background music.

Relevance: 30.00%

Abstract:

BACKGROUND: Online social networks offer considerable potential for delivery of socially influential health behavior change interventions. OBJECTIVE: To determine the efficacy, engagement, and feasibility of an online social networking physical activity intervention with pedometers delivered via Facebook app. METHODS: A total of 110 adults with a mean age of 35.6 years (SD 12.4) were recruited online in teams of 3 to 8 friends. Teams were randomly allocated to receive access to a 50-day online social networking physical activity intervention which included self-monitoring, social elements, and pedometers ("Active Team" Facebook app; n=51 individuals, 12 teams) or a wait-listed control condition (n=59 individuals, 13 teams). Assessments were undertaken online at baseline, 8 weeks, and 20 weeks. The primary outcome measure was self-reported weekly moderate-to-vigorous physical activity (MVPA). Secondary outcomes were weekly walking, vigorous physical activity time, moderate physical activity time, overall quality of life, and mental health quality of life. Analyses were undertaken using random-effects mixed modeling, accounting for potential clustering at the team level. Usage statistics were reported descriptively to determine engagement and feasibility. RESULTS: At the 8-week follow-up, the intervention participants had significantly increased their total weekly MVPA by 135 minutes relative to the control group (P=.03), due primarily to increases in walking time (155 min/week increase relative to controls, P<.001). However, statistical differences between groups for total weekly MVPA and walking time were lost at the 20-week follow-up. There were no significant changes in vigorous physical activity, nor overall quality of life or mental health quality of life at either time point. High levels of engagement with the intervention, and particularly the self-monitoring features, were observed. 
CONCLUSIONS: An online, social networking physical activity intervention with pedometers can produce sizable short-term physical activity changes. Future work is needed to determine how to maintain behavior change in the longer term, how to reach at-need populations, and how to disseminate such interventions on a mass scale. TRIAL REGISTRATION: Australian New Zealand Clinical Trials Registry (ANZCTR): ACTRN12614000488606; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=366239 (Archived by WebCite at http://www.webcitation.org/6ZVtu6TMz).

Relevance: 30.00%

Abstract:

Ontology-driven systems with reasoning capabilities in the legal field are now better understood. Legal concepts are not discrete, but make up a dynamic continuum between common sense terms, specific technical use, and professional knowledge, in an evolving institutional reality. Thus, the tension between a plural understanding of regulations and a more general understanding of law is bringing into view a new landscape in which general legal frameworks – grounded in well-known legal theories stemming from 20th-century legal positivism or sociological jurisprudence – are made compatible with specific forms of rights management on the Web. In this sense, Semantic Web tools are not only being designed for information retrieval, classification, clustering, and knowledge management. They can also be understood as regulatory tools, i.e. as components of the contemporary legal architecture, to be used by multiple stakeholders – front-line practitioners, policymakers, legal drafters, companies, market agents, and citizens. That is the issue broadly addressed in this Special Issue on the Semantic Web for the Legal Domain, which overviews the work carried out over the last fifteen years and seeks to foster new research in this field, beyond the state of the art.