974 results for Pattern Mining
Abstract:
For many landholders in the South Pacific, weed control of Mikania micrantha Kunth is conducted by manual or mechanical means, leaving fragments on or below the ground to reshoot and grow. Effects of age, length (number of nodes), and pattern of burial on the survival of stem sections of M. micrantha were examined in the field in Viti Levu, Fiji. The experiment was arranged in a randomized factorial design, with number of nodes, age of stem sections, and pattern (depth and orientation) of stem burial as factors. Stem sections with two or three nodes had significantly greater survival (30% and 25%, respectively) than those with one node (12%). Mature stem sections had a significantly greater survival rate (31%) than young stem sections (13%) when buried in either the horizontal or the vertical position. Vertical plantings had significantly greater survival (43%) than horizontal plantings (10%), and for both orientations survival decreased with depth of burial. Only 8% of stem sections survived when cut into smaller (3 to 5 cm) sections and buried at a depth of 10 cm. This study revealed that cutting the M. micrantha stems into smaller sections (<3 cm) and burying them at depths of 10 cm or greater would improve the overall management of M. micrantha in crop and noncrop systems.
Abstract:
User-generated information such as product reviews has been booming since the advent of Web 2.0. In particular, rich information associated with reviewed products is buried in such big data. To facilitate identifying useful information from reviews of products (e.g., cameras), opinion mining has been proposed and widely used in recent years. Feature extraction, the most critical step of opinion mining, aims to extract significant product features from review texts. However, most existing approaches find only individual features rather than identifying the hierarchical relationships between product features. In this paper, we propose an approach that finds both features and feature relationships, structured as a feature hierarchy, referred to as a feature taxonomy in the remainder of the paper. Specifically, by making use of frequent patterns and association rules, we construct the feature taxonomy to profile the product at multiple levels instead of a single level, which provides more detailed information about the product. Experiments conducted on some real-world review datasets show that the proposed method is capable of identifying product features and relations effectively.
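As a rough illustration of how frequent patterns and association rules could suggest parent-child links between product features, the sketch below counts feature co-occurrence across reviews and keeps rules `child -> parent` whose support and confidence exceed thresholds. The function name `feature_rules`, the thresholds, and the toy reviews are all assumptions made for illustration; this is not the authors' taxonomy-construction procedure.

```python
from collections import Counter
from itertools import combinations

def feature_rules(reviews, min_support, min_confidence):
    """Return rules (x, y, confidence) meaning 'reviews mentioning x also mention y'.

    reviews: list of sets of feature names (hypothetical, already extracted).
    """
    n = len(reviews)
    # Number of reviews mentioning each feature and each unordered feature pair.
    single = Counter(f for r in reviews for f in set(r))
    pair = Counter(
        frozenset(p) for r in reviews for p in combinations(sorted(set(r)), 2)
    )
    rules = []
    for p, count in pair.items():
        if count / n < min_support:
            continue  # the pair is not a frequent pattern
        a, b = sorted(p)
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, confidence))
    return sorted(rules)

# Toy data (invented): each set holds the features mentioned in one review.
reviews = [
    {"camera", "lens"},
    {"camera", "lens", "zoom"},
    {"camera", "battery"},
    {"camera"},
]
rules = feature_rules(reviews, min_support=0.5, min_confidence=0.8)
print(rules)
```

Under these assumptions, a high-confidence rule from a specific feature to a general one (here `lens -> camera`) is one plausible signal that the specific feature sits below the general one in a hierarchy.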
Abstract:
In this paper we investigate the effectiveness of class-specific sparse codes in the context of discriminative action classification. The bag-of-words representation is widely used in activity recognition to encode features, and although it yields state-of-the-art performance with several feature descriptors, it still suffers from large quantization errors that reduce overall performance. Recently proposed sparse representation methods have been shown to effectively represent features as a linear combination of atoms from an overcomplete dictionary by minimizing the reconstruction error. In contrast to most sparse representation methods, which focus on Sparse-Reconstruction based Classification (SRC), this paper focuses on discriminative classification using an SVM by constructing class-specific sparse codes for motion and appearance separately. Experimental results demonstrate that separate motion-specific and appearance-specific sparse coefficients provide a more effective and discriminative representation for each class than a single set of class-specific sparse coefficients.
Abstract:
This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking, and 3D scene understanding. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real-world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular but simple bag-of-words approach. In our framework, a multiple-instance SVM (mi-SVM) is first used to identify positive features for each action category, and the k-means algorithm is used to generate a codebook. Locality-constrained linear coding is then used to encode the features with the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets of varying complexity demonstrate a significant performance improvement over the baseline bag-of-features method.
Abstract:
Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
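One common way to compute an optimal piecewise-constant segmentation is dynamic programming over block boundaries. The sketch below assumes a one-dimensional sequence, squared error as the cost, and the segment mean as the single value describing each block; the function name `segment` and the toy sequence are illustrative assumptions, not taken from the thesis.

```python
def segment(seq, k):
    """Split seq into k contiguous blocks minimising total squared error.

    Returns (blocks, error), where blocks is a list of half-open index
    ranges (start, end). Requires 1 <= k <= len(seq).
    """
    n = len(seq)
    # Prefix sums of x and x^2 give the cost of any block in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Squared error of seq[i:j] around its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[b][j]: minimal error when covering seq[:j] with b blocks.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = best[b - 1][i] + cost(i, j)
                if c < best[b][j]:
                    best[b][j] = c
                    back[b][j] = i

    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    j = n
    for b in range(k, 0, -1):
        i = back[b][j]
        blocks.append((i, j))
        j = i
    blocks.reverse()
    return blocks, best[k][n]

blocks, error = segment([1, 1, 1, 5, 5, 5, 9, 9], k=3)
print(blocks, error)
```

This sketch runs in O(k n^2) time. Choosing `k` itself is the model selection question the abstract mentions and is not addressed here.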
Abstract:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable, and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability (factor matrices are of the same type as the original matrix) and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
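To make the difference between Boolean and ordinary matrix multiplication concrete, the following minimal sketch multiplies two small binary matrices both ways. The matrices and function names are invented for illustration. In the Boolean product, entry (i, j) is 1 whenever some index l has both `B[i][l]` and `C[l][j]` equal to 1, so the result stays binary.

```python
def boolean_product(B, C):
    """Boolean product: entry (i, j) is OR over l of (B[i][l] AND C[l][j])."""
    inner = len(C)
    return [
        [int(any(B[i][l] and C[l][j] for l in range(inner)))
         for j in range(len(C[0]))]
        for i in range(len(B))
    ]

def integer_product(B, C):
    """Ordinary product over the integers, for comparison."""
    inner = len(C)
    return [
        [sum(B[i][l] * C[l][j] for l in range(inner))
         for j in range(len(C[0]))]
        for i in range(len(B))
    ]

# Invented binary factor matrices: 3 rows x 2 factors, 2 factors x 3 columns.
B = [[1, 1],
     [0, 1],
     [1, 0]]
C = [[1, 1, 0],
     [0, 1, 1]]

print(boolean_product(B, C))  # every entry is 0 or 1
print(integer_product(B, C))  # entries can exceed 1 where factors overlap
```

In this example the first row uses both factors, so the ordinary product contains a 2 where the factors overlap, while the Boolean product remains a binary matrix of the same type as the data.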
Abstract:
Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
Abstract:
Springsure Creek Coal (SCC) intends to develop a coal mine using the longwall mining process under grain farming land near Emerald in Central Queensland (CQ). While this technology will result in some subsidence of the land surface, SCC wishes to maintain productivity of the grain cropping land in the precinct after coal mining. However, the impact of the surface subsidence resulting from that mining process on productivity of cropping land in any Australian landscape is currently unclear. A research protocol is proposed to investigate the impacts of subsidence on grain productivity once the SCC project becomes operational. The protocol has wider application for other similar mining projects throughout the country. A copy of the full report is accessible on www.aginstitute.com.au.
Abstract:
A high temperature source has been developed and coupled to a high resolution Fourier transform spectrometer to record emission spectra of acetylene around 3 μm up to 1455 K under Doppler-limited resolution (0.015 cm⁻¹). The ν₃-ground state (GS) and ν₂+ν₄+ν₅ (Σᵤ⁺ and Δᵤ)-GS bands and 76 related hot bands, counting e and f parities separately, are assigned using semiautomatic methods based on a global model to reproduce all related vibration-rotation states. Significantly higher J-values than previously reported are observed for 40 known substates while 37 new e or f vibrational substates, up to about 6000 cm⁻¹, are identified and characterized by vibration-rotation parameters. The 3 811 new or improved data resulting from the analysis are merged into the database presented by Robert et al. [Mol. Phys. 106, 2581 (2008)], now including 15 562 lines accessing vibrational states up to 8600 cm⁻¹. A global model, updated as compared to the one in the previous paper, allows all lines in the database to be simultaneously fitted, successfully. The updates are discussed taking into account, in particular, the systematic inclusion of Coriolis interaction.
Abstract:
We present two discriminative language modelling techniques for a Lempel-Ziv-Welch (LZW) based language identification (LID) system. The previous approach to LID using the LZW algorithm was to use the LZW pattern tables directly for language modelling. However, since the patterns in one language's pattern table are shared by the pattern tables of other languages, confusability prevailed in the LID task. To overcome this, we present two pruning techniques: (i) Language Specific (LS-LZW), in which patterns common to more than one pattern table are removed; and (ii) Length-Frequency product based (LF-LZW), in which patterns having a length-frequency product below a threshold are removed. These approaches reduce the classification score (Compression Ratio [LZW-CR] or the weighted discriminant score [LZW-WDS]) for non-native languages and increase the LID performance considerably. The memory and computational requirements of these techniques are also much lower than those of the basic LZW techniques.
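The two pruning ideas can be sketched on toy data as follows. This is only an illustration under stated assumptions: patterns are built over characters of a training string, and a pattern's "frequency" is taken to be the number of times the LZW parser emits it. The abstract specifies neither of these, and all function names and training strings are hypothetical.

```python
from collections import Counter

def lzw_pattern_table(text):
    """Build an LZW dictionary over characters, with emission counts.

    Assumption: frequency = number of times the parser emits the pattern.
    """
    table = {ch: 0 for ch in set(text)}  # start from single symbols
    w = ""
    for ch in text:
        if w + ch in table:
            w += ch                # extend the current match
        else:
            table[w] += 1          # emit the longest known pattern
            table[w + ch] = 0      # learn a new, longer pattern
            w = ch
    if w:
        table[w] += 1              # emit the final pending pattern
    return table

def prune_language_specific(tables):
    """LS-style pruning: drop patterns found in more than one table."""
    seen = Counter(p for t in tables.values() for p in t)
    return {
        lang: {p: f for p, f in t.items() if seen[p] == 1}
        for lang, t in tables.items()
    }

def prune_length_frequency(table, threshold):
    """LF-style pruning: drop patterns with len * frequency below threshold."""
    return {p: f for p, f in table.items() if len(p) * f >= threshold}

# Invented training strings standing in for two languages.
tables = {
    "x": lzw_pattern_table("abababab"),
    "y": lzw_pattern_table("bcbcbcbc"),
}
ls_pruned = prune_language_specific(tables)
lf_pruned = prune_length_frequency(tables["x"], threshold=2)
print(ls_pruned)
print(lf_pruned)
```

Here the only shared pattern, `b`, is removed from both tables by the language-specific step, while the length-frequency step keeps only patterns that are long enough or emitted often enough.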