999 resultados para search tree


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper the Binary Search Tree Imposed Growing Self Organizing Map (BSTGSOM) is presented as an extended version of the Growing Self Organizing Map (GSOM), which has proven advantages in knowledge discovery applications. A Binary Search Tree imposed on the GSOM is mainly used to investigate the dynamic perspectives of the GSOM based on the inputs and these generated temporal patterns are stored to further analyze the behavior of the GSOM based on the input sequence. Also, the performance advantages are discussed and compared with that of the original GSOM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The control part of the execution of a constraint logic program can be conceptually shown as a search-tree, where nodes correspond to calis, and whose branches represent conjunctions and disjunctions. This tree represents the search space traversed by the program, and has also a direct relationship with the amount of work performed by the program. The nodes of the tree can be used to display information regarding the state and origin of instantiation of the variables involved in each cali. This depiction can also be used for the enumeration process. These are the features implemented in APT, a tool which runs constraint logic programs while depicting a (modified) search-tree, keeping at the same time information about the state of the variables at every moment in the execution. This information can be used to replay the execution at will, both forwards and backwards in time. These views can be abstracted when the size of the execution requires it. The search-tree view is used as a framework onto which constraint-level visualizations (such as those presented in the following chapter) can be attached.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we present a distributed computing framework for problems characterized by a highly irregular search tree, whereby no reliable workload prediction is available. The framework is based on a peer-to-peer computing environment and dynamic load balancing. The system allows for dynamic resource aggregation, does not depend on any specific meta-computing middleware and is suitable for large-scale, multi-domain, heterogeneous environments, such as computational Grids. Dynamic load balancing policies based on global statistics are known to provide optimal load balancing performance, while randomized techniques provide high scalability. The proposed method combines both advantages and adopts distributed job-pools and a randomized polling technique. The framework has been successfully adopted in a parallel search algorithm for subgraph mining and evaluated on a molecular compounds dataset. The parallel application has shown good calability and close-to linear speedup in a distributed network of workstations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper discusses the application of hybrid model predictive control to control switching between different burner modes in a novel compact marine boiler design. A further purpose of the present work is to point out problems with finite horizon model predictive control applied to systems for which the optimal solution is a limit cycle. Regarding the marine boiler control the aim is to find an optimal control strategy which minimizes a trade-off between deviations in boiler pressure and water level from their respective setpoints while limiting burner switches.The approach taken is based on the Mixed Logic Dynamical framework. The whole boiler systems is modelled in this framework and a model predictive controller is designed. However to facilitate on-line implementation only a small part of the search tree in the mixed integer optimization is evaluated to find out whether a switch should occur or not. The strategy is verified on a simulation model of the compact marine boiler for control of low/high burner load switches. It is shown that even though performance is adequate for some disturbance levels it becomes deteriorated when the optimal solution is a limit cycle. Copyright © 2007 International Federation of Automatic Control All Rights Reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Relative (comparative) attributes are promising for thematic ranking of visual entities, which also aids in recognition tasks. However, attribute rank learning often requires a substantial amount of relational supervision, which is highly tedious, and apparently impractical for real-world applications. In this paper, we introduce the Semantic Transform, which under minimal supervision, adaptively finds a semantic feature space along with a class ordering that is related in the best possible way. Such a semantic space is found for every attribute category. To relate the classes under weak supervision, the class ordering needs to be refined according to a cost function in an iterative procedure. This problem is ideally NP-hard, and we thus propose a constrained search tree formulation for the same. Driven by the adaptive semantic feature space representation, our model achieves the best results to date for all of the tasks of relative, absolute and zero-shot classification on two popular datasets. © 2013 IEEE.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Choosing the right or the best option is often a demanding and challenging task for the user (e.g., a customer in an online retailer) when there are many available alternatives. In fact, the user rarely knows which offering will provide the highest value. To reduce the complexity of the choice process, automated recommender systems generate personalized recommendations. These recommendations take into account the preferences collected from the user in an explicit (e.g., letting users express their opinion about items) or implicit (e.g., studying some behavioral features) way. Such systems are widespread; research indicates that they increase the customers' satisfaction and lead to higher sales. Preference handling is one of the core issues in the design of every recommender system. This kind of system often aims at guiding users in a personalized way to interesting or useful options in a large space of possible options. Therefore, it is important for them to catch and model the user's preferences as accurately as possible. In this thesis, we develop a comparative preference-based user model to represent the user's preferences in conversational recommender systems. This type of user model allows the recommender system to capture several preference nuances from the user's feedback. We show that, when applied to conversational recommender systems, the comparative preference-based model is able to guide the user towards the best option while the system is interacting with her. We empirically test and validate the suitability and the practical computational aspects of the comparative preference-based user model and the related preference relations by comparing them to a sum of weights-based user model and the related preference relations. Product configuration, scheduling a meeting and the construction of autonomous agents are among several artificial intelligence tasks that involve a process of constrained optimization, that is, optimization of behavior or options subject to given constraints with regards to a set of preferences. When solving a constrained optimization problem, pruning techniques, such as the branch and bound technique, point at directing the search towards the best assignments, thus allowing the bounding functions to prune more branches in the search tree. Several constrained optimization problems may exhibit dominance relations. These dominance relations can be particularly useful in constrained optimization problems as they can instigate new ways (rules) of pruning non optimal solutions. Such pruning methods can achieve dramatic reductions in the search space while looking for optimal solutions. A number of constrained optimization problems can model the user's preferences using the comparative preferences. In this thesis, we develop a set of pruning rules used in the branch and bound technique to efficiently solve this kind of optimization problem. More specifically, we show how to generate newly defined pruning rules from a dominance algorithm that refers to a set of comparative preferences. These rules include pruning approaches (and combinations of them) which can drastically prune the search space. They mainly reduce the number of (expensive) pairwise comparisons performed during the search while guiding constrained optimization algorithms to find optimal solutions. Our experimental results show that the pruning rules that we have developed and their different combinations have varying impact on the performance of the branch and bound technique.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In many real world situations, we make decisions in the presence of multiple, often conflicting and non-commensurate objectives. The process of optimizing systematically and simultaneously over a set of objective functions is known as multi-objective optimization. In multi-objective optimization, we have a (possibly exponentially large) set of decisions and each decision has a set of alternatives. Each alternative depends on the state of the world, and is evaluated with respect to a number of criteria. In this thesis, we consider the decision making problems in two scenarios. In the first scenario, the current state of the world, under which the decisions are to be made, is known in advance. In the second scenario, the current state of the world is unknown at the time of making decisions. For decision making under certainty, we consider the framework of multiobjective constraint optimization and focus on extending the algorithms to solve these models to the case where there are additional trade-offs. We focus especially on branch-and-bound algorithms that use a mini-buckets algorithm for generating the upper bound at each node of the search tree (in the context of maximizing values of objectives). Since the size of the guiding upper bound sets can become very large during the search, we introduce efficient methods for reducing these sets, yet still maintaining the upper bound property. We define a formalism for imprecise trade-offs, which allows the decision maker during the elicitation stage, to specify a preference for one multi-objective utility vector over another, and use such preferences to infer other preferences. The induced preference relation then is used to eliminate the dominated utility vectors during the computation. For testing the dominance between multi-objective utility vectors, we present three different approaches. The first is based on a linear programming approach, the second is by use of distance-based algorithm (which uses a measure of the distance between a point and a convex cone); the third approach makes use of a matrix multiplication, which results in much faster dominance checks with respect to the preference relation induced by the trade-offs. Furthermore, we show that our trade-offs approach, which is based on a preference inference technique, can also be given an alternative semantics based on the well known Multi-Attribute Utility Theory. Our comprehensive experimental results on common multi-objective constraint optimization benchmarks demonstrate that the proposed enhancements allow the algorithms to scale up to much larger problems than before. For decision making problems under uncertainty, we describe multi-objective influence diagrams, based on a set of p objectives, where utility values are vectors in Rp, and are typically only partially ordered. These can be solved by a variable elimination algorithm, leading to a set of maximal values of expected utility. If the Pareto ordering is used this set can often be prohibitively large. We consider approximate representations of the Pareto set based on ϵ-coverings, allowing much larger problems to be solved. In addition, we define a method for incorporating user trade-offs, which also greatly improves the efficiency.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A novel implementation of a tag sorting circuit for a weighted fair queueing (WFQ) enabled Internet Protocol (IP) packet scheduler is presented. The design consists of a search tree, matching circuitry, and a custom memory layout. It is implemented using 130-nm silicon technology and supports quality of service (QoS) on networks at line speeds of 40 Gb/s, enabling next generation IP services to be deployed.