358 resultados para educational data mining


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Online dating networks, a type of social network, are gaining popularity. With many people joining and being available in the network, users are overwhelmed with choices when choosing their ideal partners. This problem can be overcome by utilizing recommendation methods. However, traditional recommendation methods are ineffective and inefficient for online dating networks where the dataset is sparse and/or large and two-way matching is required. We propose a methodology by using clustering, SimRank to recommend matching candidates to users in an online dating network. Data from a live online dating network is used in evaluation. The success rate of recommendation obtained using the proposed method is compared with baseline success rate of the network and the performance is improved by double.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recommender systems are widely used online to help users find other products, items etc that they may be interested in based on what is known about that user in their profile. Often however user profiles may be short on information and thus it is difficult for a recommender system to make quality recommendations. This problem is known as the cold-start problem. Here we investigate using association rules as a source of information to expand a user profile and thus avoid this problem. Our experiments show that it is possible to use association rules to noticeably improve the performance of a recommender system under the cold-start situation. Furthermore, we also show that the improvement in performance obtained can be achieved while using non-redundant rule sets. This shows that non-redundant rules do not cause a loss of information and are just as informative as a set of association rules that contain redundancy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Due to the change in attitudes and lifestyles, people expect to find new partners and friends via various ways now-a-days. Online dating networks create a network for people to meet each other and allow making contact with different objectives of developing a personal, romantic or sexual relationship. Due to the higher expectation of users, online matching companies are trying to adopt recommender systems. However, the existing recommendation techniques such as content-based, collaborative filtering or hybrid techniques focus on users explicit contact behaviors but ignore the implicit relationship among users in the network. This paper proposes a social matching system that uses past relations and user similarities in finding potential matches. The proposed system is evaluated on the dataset collected from an online dating network. Empirical analysis shows that the recommendation success rate has increased to 31% as compared to the baseline success rate of 19%.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In automatic facial expression detection, very accurate registration is desired which can be achieved via a deformable model approach where a dense mesh of 60-70 points on the face is used, such as an active appearance model (AAM). However, for applications where manually labeling frames is prohibitive, AAMs do not work well as they do not generalize well to unseen subjects. As such, a more coarse approach is taken for person-independent facial expression detection, where just a couple of key features (such as face and eyes) are tracked using a Viola-Jones type approach. The tracked image is normally post-processed to encode for shift and illumination invariance using a linear bank of filters. Recently, it was shown that this preprocessing step is of no benefit when close to ideal registration has been obtained. In this paper, we present a system based on the Constrained Local Model (CLM) which is a generic or person-independent face alignment algorithm which gains high accuracy. We show these results against the LBP feature extraction on the CK+ and GEMEP datasets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents an overview of the experiments conducted using Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. It also presents the experimental study using several data representations such as the structure-only, content-only and using both the structure and the content of XML documents for the purpose of clustering them. Unlike previous years, this year the XML documents were marked up using the Wiki tags and contains categories derived by using the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This technical report is concerned with one aspect of environmental monitoring—the detection and analysis of acoustic events in sound recordings of the environment. Sound recordings offer ecologists the advantage of cheaper and increased sampling but make available so much data that automated analysis becomes essential. The report describes a number of tools for automated analysis of recordings, including noise removal from spectrograms, acoustic event detection, event pattern recognition, spectral peak tracking, syntactic pattern recognition applied to call syllables, and oscillation detection. These algorithms are applied to a number of animal call recognition tasks, chosen because they illustrate quite different modes of analysis: (1) the detection of diffuse events caused by wind and rain, which are frequent contaminants of recordings of the terrestrial environment; (2) the detection of bird and calls; and (3) the preparation of acoustic maps for whole ecosystem analysis. This last task utilises the temporal distribution of events over a daily, monthly or yearly cycle.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that using non-redundant rules only is sufficient to solve real problems needless using the entire rule set.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Large scaled emerging user created information in web 2.0 such as tags, reviews, comments and blogs can be used to profile users’ interests and preferences to make personalized recommendations. To solve the scalability problem of the current user profiling and recommender systems, this paper proposes a parallel user profiling approach and a scalable recommender system. The current advanced cloud computing techniques including Hadoop, MapReduce and Cascading are employed to implement the proposed approaches. The experiments were conducted on Amazon EC2 Elastic MapReduce and S3 with a real world large scaled dataset from Del.icio.us website.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Social tags in web 2.0 are becoming another important information source to describe the content of items as well as to profile users’ topic preferences. However, as arbitrary words given by users, tags contains a lot of noise such as tag synonym and semantic ambiguity a large number personal tags that only used by one user, which brings challenges to effectively use tags to make item recommendations. To solve these problems, this paper proposes to use a set of related tags along with their weights to represent semantic meaning of each tag for each user individually. A hybrid recommendation generation approaches that based on the weighted tags are proposed. We have conducted experiments using the real world dataset obtained from Amazon.com. The experimental results show that the proposed approaches outperform the other state of the art approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Choi et al. recently proposed an efficient RFID authentication protocol for a ubiquitous computing environment, OHLCAP(One-Way Hash based Low-Cost Authentication Protocol). However, this paper reveals that the protocol has several security weaknesses : 1) traceability based on the leakage of counter information, 2) vulnerability to an impersonation attack by maliciously updating a random number, and 3) traceability based on a physically-attacked tag. Finally, a security enhanced group-based authentication protocol is presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.