945 resultados para Data mining models


Relevância:

80.00% 80.00%

Publicador:

Resumo:

This technical report is concerned with one aspect of environmental monitoring—the detection and analysis of acoustic events in sound recordings of the environment. Sound recordings offer ecologists the advantage of cheaper and increased sampling but make available so much data that automated analysis becomes essential. The report describes a number of tools for automated analysis of recordings, including noise removal from spectrograms, acoustic event detection, event pattern recognition, spectral peak tracking, syntactic pattern recognition applied to call syllables, and oscillation detection. These algorithms are applied to a number of animal call recognition tasks, chosen because they illustrate quite different modes of analysis: (1) the detection of diffuse events caused by wind and rain, which are frequent contaminants of recordings of the terrestrial environment; (2) the detection of bird and calls; and (3) the preparation of acoustic maps for whole ecosystem analysis. This last task utilises the temporal distribution of events over a daily, monthly or yearly cycle.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that using non-redundant rules only is sufficient to solve real problems needless using the entire rule set.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Large scaled emerging user created information in web 2.0 such as tags, reviews, comments and blogs can be used to profile users’ interests and preferences to make personalized recommendations. To solve the scalability problem of the current user profiling and recommender systems, this paper proposes a parallel user profiling approach and a scalable recommender system. The current advanced cloud computing techniques including Hadoop, MapReduce and Cascading are employed to implement the proposed approaches. The experiments were conducted on Amazon EC2 Elastic MapReduce and S3 with a real world large scaled dataset from Del.icio.us website.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Social tags in web 2.0 are becoming another important information source to describe the content of items as well as to profile users’ topic preferences. However, as arbitrary words given by users, tags contains a lot of noise such as tag synonym and semantic ambiguity a large number personal tags that only used by one user, which brings challenges to effectively use tags to make item recommendations. To solve these problems, this paper proposes to use a set of related tags along with their weights to represent semantic meaning of each tag for each user individually. A hybrid recommendation generation approaches that based on the weighted tags are proposed. We have conducted experiments using the real world dataset obtained from Amazon.com. The experimental results show that the proposed approaches outperform the other state of the art approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Choi et al. recently proposed an efficient RFID authentication protocol for a ubiquitous computing environment, OHLCAP(One-Way Hash based Low-Cost Authentication Protocol). However, this paper reveals that the protocol has several security weaknesses : 1) traceability based on the leakage of counter information, 2) vulnerability to an impersonation attack by maliciously updating a random number, and 3) traceability based on a physically-attacked tag. Finally, a security enhanced group-based authentication protocol is presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Electrocardiogram (ECG) is an important bio-signal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. The HRV signal can be used as a base signal to observe the heart's functioning. These signals are non-linear and non-stationary in nature. So, higher order spectral (HOS) analysis, which is more suitable for non-linear systems and is robust to noise, was used. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, we have extracted seven features from the heart rate signals using HOS and fed them to a support vector machine (SVM) for classification. Our performance evaluation protocol uses 330 subjects consisting of five different kinds of cardiac disease conditions. We demonstrate a sensitivity of 90% for the classifier with a specificity of 87.93%. Our system is ready to run on larger data sets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the ocean science community, researchers have begun employing novel sensor platforms as integral pieces in oceanographic data collection, which have significantly advanced the study and prediction of complex and dynamic ocean phenomena. These innovative tools are able to provide scientists with data at unprecedented spatiotemporal resolutions. This paper focuses on the newly developed Wave Glider platform from Liquid Robotics. This vehicle produces forward motion by harvesting abundant natural energy from ocean waves, and provides a persistent ocean presence for detailed ocean observation. This study is targeted at determining a kinematic model for offline planning that provides an accurate estimation of the vehicle speed for a desired heading and set of environmental parameters. Given the significant wave height, ocean surface and subsurface currents, wind speed and direction, we present the formulation of a system identification to provide the vehicle’s speed over a range of possible directions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In automatic facial expression recognition, an increasing number of techniques had been proposed for in the literature that exploits the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers proves to be a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, then a certain threshold on image dimensionality must be enforced (unlike in facial recognition problems). Thirdly, we determined that recognition rates may be significantly influenced by the size of the projection matrix \Phi. To demonstrate these, a battery of experiments had been conducted on the CK+ dataset for the recognition of the seven prototypic expressions - anger, contempt, disgust, fear, happiness, sadness and surprise - and comparisons have been made between the proposed temporal-SR against the static-SR framework and state-of-the-art support vector machine.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and quicker time to market expectations. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. Growing self-organizing map (GSOM) is established as an efficient unsupervised datamining algorithm. In this study some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products and a filtering index to reduce noise from the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct the knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.