954 resultados para educational data mining


Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Large scaled emerging user created information in web 2.0 such as tags, reviews, comments and blogs can be used to profile users’ interests and preferences to make personalized recommendations. To solve the scalability problem of the current user profiling and recommender systems, this paper proposes a parallel user profiling approach and a scalable recommender system. The current advanced cloud computing techniques including Hadoop, MapReduce and Cascading are employed to implement the proposed approaches. The experiments were conducted on Amazon EC2 Elastic MapReduce and S3 with a real world large scaled dataset from Del.icio.us website.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Social tags in web 2.0 are becoming another important information source to describe the content of items as well as to profile users’ topic preferences. However, as arbitrary words given by users, tags contains a lot of noise such as tag synonym and semantic ambiguity a large number personal tags that only used by one user, which brings challenges to effectively use tags to make item recommendations. To solve these problems, this paper proposes to use a set of related tags along with their weights to represent semantic meaning of each tag for each user individually. A hybrid recommendation generation approaches that based on the weighted tags are proposed. We have conducted experiments using the real world dataset obtained from Amazon.com. The experimental results show that the proposed approaches outperform the other state of the art approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Choi et al. recently proposed an efficient RFID authentication protocol for a ubiquitous computing environment, OHLCAP(One-Way Hash based Low-Cost Authentication Protocol). However, this paper reveals that the protocol has several security weaknesses : 1) traceability based on the leakage of counter information, 2) vulnerability to an impersonation attack by maliciously updating a random number, and 3) traceability based on a physically-attacked tag. Finally, a security enhanced group-based authentication protocol is presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Electrocardiogram (ECG) is an important bio-signal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. The HRV signal can be used as a base signal to observe the heart's functioning. These signals are non-linear and non-stationary in nature. So, higher order spectral (HOS) analysis, which is more suitable for non-linear systems and is robust to noise, was used. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, we have extracted seven features from the heart rate signals using HOS and fed them to a support vector machine (SVM) for classification. Our performance evaluation protocol uses 330 subjects consisting of five different kinds of cardiac disease conditions. We demonstrate a sensitivity of 90% for the classifier with a specificity of 87.93%. Our system is ready to run on larger data sets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and positions the file signatures model in the class of Vector Space retrieval models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the ocean science community, researchers have begun employing novel sensor platforms as integral pieces in oceanographic data collection, which have significantly advanced the study and prediction of complex and dynamic ocean phenomena. These innovative tools are able to provide scientists with data at unprecedented spatiotemporal resolutions. This paper focuses on the newly developed Wave Glider platform from Liquid Robotics. This vehicle produces forward motion by harvesting abundant natural energy from ocean waves, and provides a persistent ocean presence for detailed ocean observation. This study is targeted at determining a kinematic model for offline planning that provides an accurate estimation of the vehicle speed for a desired heading and set of environmental parameters. Given the significant wave height, ocean surface and subsurface currents, wind speed and direction, we present the formulation of a system identification to provide the vehicle’s speed over a range of possible directions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In automatic facial expression recognition, an increasing number of techniques had been proposed for in the literature that exploits the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers proves to be a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, then a certain threshold on image dimensionality must be enforced (unlike in facial recognition problems). Thirdly, we determined that recognition rates may be significantly influenced by the size of the projection matrix \Phi. To demonstrate these, a battery of experiments had been conducted on the CK+ dataset for the recognition of the seven prototypic expressions - anger, contempt, disgust, fear, happiness, sadness and surprise - and comparisons have been made between the proposed temporal-SR against the static-SR framework and state-of-the-art support vector machine.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and quicker time to market expectations. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. Growing self-organizing map (GSOM) is established as an efficient unsupervised datamining algorithm. In this study some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products and a filtering index to reduce noise from the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct the knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.