993 results for K-tree


Relevance:

100.00%

Publisher:

Abstract:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. Its tree structure allows for efficient disk-based implementations where space requirements exceed main memory.
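The abstract describes K-tree only as an approximation of k-means and does not give its details. For reference, here is a minimal sketch of plain k-means (Lloyd's algorithm), the method being approximated. It is not the K-tree itself; the function name `kmeans`, its parameters, and the choice of squared Euclidean distance are illustrative assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means (NOT K-tree). Returns (centroids, labels).

    points and centroids are lists of equal-length numeric lists.
    Squared Euclidean distance is an assumption made for this sketch.
    """
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
final_centroids, final_labels = kmeans(points, [[0, 0], [10, 10]])
```

Each iteration compares every point with every centroid. A tree of clusters, as the abstract describes, is one way to avoid comparing against all centroids at once.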

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.

Relevance:

100.00%

Publisher:

Abstract:

Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering.

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.

Relevance:

100.00%

Publisher:

Abstract:

Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result, Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. Therefore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n^2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (document, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classification are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a similarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
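The difference between intrinsic and extrinsic measures can be sketched in a few lines. The example below assumes distortion means the sum of squared Euclidean distances from each point to its assigned centroid, and uses purity as one example of an extrinsic measure. The abstract does not name purity or fix a definition of distortion, so both choices, along with the function names, are assumptions.

```python
from collections import Counter


def distortion(points, labels, centroids):
    """Intrinsic: sum of squared distances to assigned centroids (assumed definition)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
        for p, l in zip(points, labels)
    )


def purity(labels, truth):
    """Extrinsic: fraction of documents matching the majority truth label of their cluster."""
    by_cluster = {}
    for l, t in zip(labels, truth):
        by_cluster.setdefault(l, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


# Distortion depends on the vector space the points live in ...
d = distortion([[0, 0], [0, 2], [10, 10]], [0, 0, 1], [[0, 1], [10, 10]])
# ... while purity only needs cluster labels and human-assigned categories.
p = purity([0, 0, 1, 1], ["a", "a", "a", "b"])
```

Because `purity` never touches the document vectors, it can compare solutions built on different representations, which `distortion` cannot.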

Relevance:

100.00%

Publisher:

Abstract:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scale up to very large data sets. Pyktree is highly modular and well suited for rapid prototyping of novel distance measures and centroid representations. It is easy to install and provides a Python package for library use as well as command line tools.

Relevance:

70.00%

Publisher:

Abstract:

Contraction of an edge e merges its end points into a new single vertex, and each neighbor of one of the end points of e is a neighbor of the new vertex. An edge in a k-connected graph is contractible if its contraction does not result in a graph with lower connectivity; otherwise the edge is called non-contractible. In this paper, we present results on the structure of contractible edges in k-trees and k-connected partial k-trees. Firstly, we show that an edge e in a k-tree is contractible if and only if e belongs to exactly one (k + 1)-clique. We use this characterization to show that the graph formed by contractible edges is a 2-connected graph. We also show that there are at least |V(G)| + k - 2 contractible edges in a k-tree. Secondly, we show that if an edge e in a partial k-tree is contractible then e is contractible in any k-tree which contains the partial k-tree as an edge subgraph. We also construct a class of contraction critical 2k-connected partial 2k-trees.
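The contraction operation defined at the start of this abstract can be sketched as follows, assuming a simple undirected graph stored as a dict of neighbor sets. The function name `contract_edge` is hypothetical, and reusing the label `u` for the merged vertex (rather than creating a fresh label) is a convenience of this sketch.

```python
def contract_edge(adj, u, v):
    """Return a new adjacency dict in which edge (u, v) is contracted.

    adj maps each vertex to the set of its neighbors (simple undirected graph).
    The merged vertex keeps the label u; this naming is a choice of the sketch.
    """
    if v not in adj[u]:
        raise ValueError("u and v must be adjacent")
    # Every neighbor of either end point becomes a neighbor of the merged vertex.
    merged_neighbors = (adj[u] | adj[v]) - {u, v}
    new_adj = {}
    for w, nbrs in adj.items():
        if w in (u, v):
            continue
        # Redirect references to v towards u; the set removes duplicates.
        new_adj[w] = {u if x == v else x for x in nbrs}
    new_adj[u] = merged_neighbors
    return new_adj


# Triangle 1-2-3 with a pendant vertex 4 attached to 2.
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
contracted = contract_edge(graph, 1, 2)
```

Testing whether an edge is contractible would additionally require comparing the connectivity of the graph before and after, which this sketch does not attempt.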

Relevance:

70.00%

Publisher:

Abstract:

A (k, K) circuit is one which can be decomposed into nonintersecting blocks of gates, where each block has no more than K external inputs, such that the graph formed by letting each block be a node and inserting edges between blocks if they share a signal line is a partial k-tree. (k, K) circuits are special in that they have been shown to be testable in time polynomial in the number of gates in the circuit, and are useful if the constants k and K are small. We demonstrate a procedure to synthesise (k, K) circuits from a special class of Boolean expressions.

Relevance:

70.00%

Publisher:

Abstract:

Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experimental results indicate that our approach is comparable with exact methods, but is much more computationally efficient.

Relevance:

70.00%

Publisher:

Abstract:

Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative scores, which is a measure of quality for the k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.

Relevance:

70.00%

Publisher:

Abstract:

Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods \cite{korhonen2exact, nie2014advances} tackle the problem by using $k$-trees to learn the optimal Bayesian network with tree-width up to $k$. Finding the best $k$-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative $k$-trees by introducing an informative score function to characterize the quality of a $k$-tree. To further improve the quality of the $k$-trees, we propose a probabilistic hill climbing approach that locally refines the sampled $k$-trees. The proposed algorithm can efficiently learn a quality Bayesian network with tree-width at most $k$. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.

Relevance:

60.00%

Publisher:

Abstract:

For interactive systems, recognition, reproduction, and generalization of observed motion data are crucial for successful interaction. In this paper, we present a novel method for analysis of motion data that we refer to as K-OMM-trees. K-OMM-trees combine Ordered Means Models (OMMs), a model-based machine learning approach for time series, with a hierarchical analysis technique for very large data sets, the K-tree algorithm. The proposed K-OMM-trees enable unsupervised prototype extraction of motion time series data with hierarchical data representation. After introducing the algorithmic details, we apply the proposed method to a gesture data set that includes substantial inter-class variations. Results from our studies show that K-OMM-trees are able to substantially increase the recognition performance and to learn an inherent data hierarchy with meaningful gesture abstractions.

Relevance:

60.00%

Publisher:

Abstract:

Two reliable small-plant bioassays were developed using tissue-cultured banana, resulting in consistent symptom expression and infection by Fusarium oxysporum f. sp. cubense (Foc). One bioassay was based on providing a constant watertable within a closed pot and the second used free-draining pots. Culture medium for spore generation influenced infectivity of Foc. Inoculation of potted banana by drenching potting mix with a conidial suspension, consisting mostly of microconidia, few macroconidia and no chlamydospores, generated from one-quarter-strength potato dextrose agar + streptomycin sulfate, resulted in inconsistent infection. When a conidial suspension that consisted of all three spore types, microconidia, macroconidia and chlamydospores, prepared from spores generated on carnation leaf agar was used, all plants became infected, indicating that the spore type present in conidial suspensions may contribute to inconsistency of infection. Inconsistency of infection was not due to loss of virulence of the pathogen in culture. Millet grain precolonised by Foc as a source of inoculum resulted in consistent infection between replicate plants. Sorghum was not a suitable grain for preparation of inoculum as it was observed to discolour roots and has the potential to stunt root growth, possibly due to the release of phytotoxins. For the modified closed-pot system, a pasteurised potting mix consisting of equal parts of bedding sand, perlite and vermiculite plus 1 g/L Triabon slow release fertiliser was suitable for plant growth and promoted capillary movement of water through the potting mix profile. A suitable potting mix for the free-draining pot system was also developed.

Relevance:

60.00%

Publisher:

Abstract:

This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.
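To make the notion of a k-tree concrete, the sketch below builds one using the standard recursive construction: start from a (k + 1)-clique, then repeatedly attach a new vertex to all members of an existing k-clique. This is only an illustration of the graph class. It does not sample uniformly over k-trees and is not claimed to be the sampling procedure used in the paper; the function name `random_k_tree` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def random_k_tree(n, k, seed=None):
    """Build some k-tree on vertices 0..n-1 (k >= 1); returns a set of edges (i, j), i < j.

    Illustrative construction only: the result is a valid k-tree, but the
    distribution over k-trees is NOT uniform.
    """
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and at least k + 1 vertices")
    rng = random.Random(seed)
    # Start from a clique on the first k + 1 vertices.
    edges = set(combinations(range(k + 1), 2))
    k_cliques = list(combinations(range(k + 1), k))
    for v in range(k + 1, n):
        # Attach the new vertex v to every member of a randomly chosen k-clique.
        base = rng.choice(k_cliques)
        edges.update((u, v) for u in base)  # u < v holds because v is the newest vertex
        # v plus any (k - 1)-subset of the chosen clique forms a new k-clique.
        k_cliques.extend(sub + (v,) for sub in combinations(base, k - 1))
    return edges


tree_edges = random_k_tree(n=6, k=2, seed=0)
```

Every k-tree on n vertices has exactly kn - k(k + 1)/2 edges, so the result above has 9 edges whichever cliques are chosen. With k = 1 the construction yields an ordinary tree.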