984 results for Search Tree


Relevance: 100.00%

Abstract:

The control part of the execution of a constraint logic program can be conceptually shown as a search tree, where nodes correspond to calls and branches represent conjunctions and disjunctions. This tree represents the search space traversed by the program, and also has a direct relationship with the amount of work the program performs. The nodes of the tree can be used to display information regarding the state and origin of instantiation of the variables involved in each call. This depiction can also be used for the enumeration process. These are the features implemented in APT, a tool which runs constraint logic programs while depicting a (modified) search tree, keeping at the same time information about the state of the variables at every moment in the execution. This information can be used to replay the execution at will, both forwards and backwards in time. These views can be abstracted when the size of the execution requires it. The search-tree view is used as a framework onto which constraint-level visualizations (such as those presented in the following chapter) can be attached.
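
A minimal sketch of the kind of annotated search tree such a tool maintains might look as follows. The node structure and replay logic here are illustrative assumptions, not APT's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class SearchNode:
    """One node of the search tree: a call plus a snapshot of variable state."""
    goal: str                                     # the call this node represents
    var_state: dict                               # variable -> instantiation here
    children: list = field(default_factory=list)  # conjunctions/disjunctions below

class Replay:
    """Replays an execution trace forwards and backwards in time."""
    def __init__(self, trace):                    # trace: SearchNodes in visit order
        self.trace, self.pos = trace, 0

    def forward(self):
        if self.pos < len(self.trace) - 1:
            self.pos += 1
        return self.trace[self.pos].var_state

    def backward(self):
        if self.pos > 0:
            self.pos -= 1
        return self.trace[self.pos].var_state
```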

Relevance: 70.00%

Abstract:

Formal Concept Analysis is an unsupervised machine learning technique that has been successfully applied to document organisation by treating documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea to a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
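
As an illustration of the underlying operation, the following toy sketch derives a formal concept from a document-keyword context (not the paper's RD-Tree-backed implementation):

```python
# Context: documents as objects, keywords as attributes.
context = {
    "doc1": {"tree", "index"},
    "doc2": {"tree", "cluster"},
    "doc3": {"index", "gist"},
}

def extent(keywords):
    """All documents containing every keyword in the query."""
    return {d for d, kws in context.items() if keywords <= kws}

def intent(docs):
    """All keywords shared by every document in the set."""
    kws = [context[d] for d in docs]
    return set.intersection(*kws) if kws else set()

# A formal concept is a fixed point: intent(extent(B)) == B.
B = {"tree"}
concept = (extent(B), intent(extent(B)))
print(concept)   # ({'doc1', 'doc2'}, {'tree'})
```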

Relevance: 70.00%

Abstract:

In this paper, we present a distributed computing framework for problems characterized by a highly irregular search tree, for which no reliable workload prediction is available. The framework is based on a peer-to-peer computing environment and dynamic load balancing. The system allows for dynamic resource aggregation, does not depend on any specific meta-computing middleware, and is suitable for large-scale, multi-domain, heterogeneous environments, such as computational Grids. Dynamic load balancing policies based on global statistics are known to provide optimal load balancing performance, while randomized techniques provide high scalability. The proposed method combines both advantages and adopts distributed job pools and a randomized polling technique. The framework has been successfully adopted in a parallel search algorithm for subgraph mining and evaluated on a molecular compounds dataset. The parallel application has shown good scalability and close-to-linear speedup in a distributed network of workstations.
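
A minimal sketch of the randomized-polling idea (distributed job pools plus random victim selection); the interfaces are illustrative assumptions, not the framework's API:

```python
import random
from collections import deque

class Worker:
    """One peer: a local job pool plus randomized polling for work."""

    def __init__(self, wid, peers):
        self.wid, self.peers = wid, peers
        self.pool = deque()               # unexplored subtrees, newest last

    def next_job(self):
        """Take local work if any; otherwise poll one random peer."""
        if not self.pool:
            self.poll_random_peer()
        return self.pool.pop() if self.pool else None

    def poll_random_peer(self):
        """Randomized polling: ask a single randomly chosen peer,
        avoiding the bottleneck of a central work queue."""
        others = [p for p in self.peers if p.wid != self.wid]
        victim = random.choice(others)
        if victim.pool:
            # steal the oldest job: it tends to root the largest subtree
            self.pool.append(victim.pool.popleft())
```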

Relevance: 70.00%

Abstract:

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored largely in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set, and a third solution incorporates a dynamic load balancing policy.
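
The core of the KD-Tree technique is candidate pruning: at each tree node, centroids that cannot be nearest to any point in the node's bounding box are discarded. The sketch below is a conservative variant of that test (published filtering algorithms use a sharper dominance check); `lo` and `hi` are assumed to be NumPy arrays bounding the node's box:

```python
import numpy as np

def farthest_corner(c, lo, hi):
    """Corner of the box [lo, hi] farthest from centroid c."""
    mid = (lo + hi) / 2
    return np.where(c > mid, lo, hi)

def prune_candidates(centroids, lo, hi):
    """Indices of centroids that might own some point inside the box.

    dmin[i]: closest any box point can be to centroid i
    dmax[i]: farthest any box point can be from centroid i
    If dmin[i] exceeds every centroid's dmax, centroid i can never be
    the nearest centre for any point of this KD-Tree node.
    """
    dmin = np.array([np.linalg.norm(c - np.clip(c, lo, hi)) for c in centroids])
    dmax = np.array([np.linalg.norm(c - farthest_corner(c, lo, hi))
                     for c in centroids])
    return [i for i in range(len(centroids)) if dmin[i] <= dmax.min()]
```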

Relevance: 60.00%

Abstract:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention in recent years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
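
A sketch of the dynamic search-space partitioning described here: when an idle worker requests work, a busy worker donates unexplored branches from near the root of its depth-first stack. This is illustrative pseudocode in Python, not the authors' implementation:

```python
def split_stack(dfs_stack):
    """dfs_stack: list of levels, shallowest first; each level is a dict
    {'pending': [...]} of unexplored sibling branches.

    Donate half of the pending branches at the shallowest splittable
    level: shallow branches root larger subtrees, so one donation
    transfers a meaningful amount of work.
    """
    for level in dfs_stack:
        if len(level['pending']) > 1:
            half = len(level['pending']) // 2
            donated, level['pending'] = level['pending'][:half], level['pending'][half:]
            return donated
    return []   # nothing splittable: the receiver must poll another peer
```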

Relevance: 60.00%

Abstract:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of both size and dimensionality, efficient clustering methods are necessary. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been explored largely in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree), to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.
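
For the parallel-processing direction, the standard data-parallel step is easy to state: each node computes partial cluster sums over its shard, and the partial sums are reduced globally into new centroids. A generic sketch, independent of the KD-Tree variant:

```python
import numpy as np

def local_step(shard, centroids):
    """Per-node work: assign each local point to its nearest centroid
    and accumulate per-cluster sums and counts."""
    k, dim = centroids.shape
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for x in shard:
        j = np.argmin(np.linalg.norm(centroids - x, axis=1))
        sums[j] += x
        counts[j] += 1
    return sums, counts

def global_step(partials):
    """Reduction: combine per-node sums/counts into new centroids."""
    sums = sum(p[0] for p in partials)
    counts = sum(p[1] for p in partials)
    counts = np.maximum(counts, 1)          # guard against empty clusters
    return sums / counts[:, None]
```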

Relevance: 60.00%

Abstract:

Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.

Relevance: 60.00%

Abstract:

This paper describes a branch-and-price algorithm for the p-median location problem. The objective is to locate p facilities (medians) such that the sum of the distances from each demand point to its nearest facility is minimized. The traditional column generation process is compared with a stabilized approach that combines column generation and Lagrangean/surrogate relaxation. The Lagrangean/surrogate multiplier modifies the reduced cost criterion, guiding the selection of new productive columns in the search tree. Computational experiments are conducted on instances that are especially difficult for traditional column generation, as well as on some large-scale instances.
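
For intuition, a standard p-median pricing subproblem scans each candidate median and keeps the clients with negative adjusted reduced cost; the Lagrangean/surrogate multiplier t rescales the duals. This is a hedged sketch of the general mechanism (the paper's exact stabilized criterion may differ, and the dual of the p-facility constraint is omitted for brevity):

```python
def price_columns(dist, duals, t=1.0):
    """Pricing sketch for p-median column generation.

    dist[j][i]: distance from client i to candidate median j
    duals[i]:   dual of client i's assignment constraint
    t:          Lagrangean/surrogate multiplier; t = 1 recovers the
                classical reduced-cost criterion
    """
    columns = []
    for j, row in enumerate(dist):
        # the best column rooted at median j keeps clients with negative terms
        picked = [i for i, d in enumerate(row) if d - t * duals[i] < 0]
        reduced_cost = sum(row[i] - t * duals[i] for i in picked)
        if picked:
            columns.append((j, picked, reduced_cost))
    return sorted(columns, key=lambda col: col[2])   # most negative first
```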

Relevance: 60.00%

Abstract:

The frenetic social and cultural evolution driven by humanity's growing, continuous need for knowledge has led us today to navigate a boundless ocean of data and information. Information acquires its own particular importance and value, both for the individual and within a specific, concrete social context and application domain. The resulting transformation of economic interaction and communication in society has given rise to what is now called the information economy. In a context where information is the main resource for growth and economic development, it is essential to have an adequate organizational strategy for managing raw data, so as to allow efficient storage, retrieval, and manipulation that increases the value of the organization using it. Incomplete or inaccurate information can lead to wrong or suboptimal decisions, hence the need to manage data according to specific criteria in order to build a competitive advantage. This survey analyzes techniques for optimizing access to databases. Their efficient implementation is essential for the support and correct operation of the applications that rely on them: they must guarantee performant behaviour in terms of speed, precision, and accuracy of the processed information. The focus is on hierarchical index structures: search trees. Both trees over one-dimensional data and trees used for multidimensional searches (for example, points in space) are described. The considerable effort needed to implement such structures has led developers to exploit the inheritance and abstraction principles of object-oriented programming in order to design a generalized tree that encapsulates the main features and operations of a hierarchical index structure, increasing its reusability for more specialized uses. This leads to the presentation of the GiST structure: the Generalized Search Tree. The survey concludes with an evaluation of the access methods discussed, summarizing their computational costs, advantages, and disadvantages.
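
GiST factors a search tree into a small set of extension methods that a specific index (B+-tree, R-tree, RD-tree, and so on) supplies, while the tree machinery handles search, insertion, and splitting generically. A minimal sketch of that interface, following the method names of the original GiST proposal (Compress/Decompress omitted):

```python
from abc import ABC, abstractmethod

class GistExtension(ABC):
    """User-supplied behaviour that specializes the generalized tree."""

    @abstractmethod
    def consistent(self, entry_key, query):
        """May the subtree under entry_key contain answers to query?"""

    @abstractmethod
    def union(self, keys):
        """A key describing all entries under `keys` (e.g. a bounding box)."""

    @abstractmethod
    def penalty(self, existing_key, new_key):
        """Cost of inserting new_key under existing_key; guides descent."""

    @abstractmethod
    def pick_split(self, entries):
        """Partition an overfull node's entries into two groups."""

def search(node, query, ext):
    """Generic GiST search: descend only into consistent subtrees."""
    if node.is_leaf:
        return [e.value for e in node.entries if ext.consistent(e.key, query)]
    results = []
    for e in node.entries:
        if ext.consistent(e.key, query):
            results.extend(search(e.child, query, ext))
    return results
```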

Relevance: 60.00%

Abstract:

Fuzzy community detection aims to identify fuzzy communities in a network: groups of vertices such that the membership of a vertex in a community lies in [0,1] and the memberships of each vertex across all communities sum to 1. Fuzzy communities are pervasive in social networks, but only a few works address fuzzy community detection. Recently, a one-step extension of Newman's Modularity, the most popular quality function for disjoint community detection, resulted in the Generalized Modularity (GM), which demonstrates good performance in finding well-known fuzzy communities. GM is therefore chosen as the quality function in our research. We first propose a generalized fuzzy t-norm modularity to investigate the effect of different fuzzy intersection operators on fuzzy community detection, since GM makes the introduction of a fuzzy intersection operation feasible. The experimental results show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure. We then focus on finding optimal fuzzy communities in a network by directly maximizing GM, which we call the Fuzzy Modularity Maximization (FMM) problem. Work on the FMM problem yields the major contribution of this thesis: an efficient and effective GM-based fuzzy community detection method that automatically discovers a fuzzy partition of a network when appropriate, which is much better than the fuzzy partitions found by existing fuzzy community detection methods, and a crisp partition when appropriate, which is competitive with the partitions produced by the best disjoint community detection methods to date. We address the FMM problem by iteratively solving a sub-problem called One-Step Modularity Maximization (OSMM). We present two approaches to this iterative procedure: a tree-based global optimizer called Find Best Leaf Node (FBLN) and a heuristic-based local optimizer. The OSMM problem reduces to a simplified quadratic knapsack problem that can be solved in linear time, so a solution of OSMM can be found in linear time. Since the OSMM algorithm is called recursively within FBLN and the structure of the search tree is non-deterministic, the FMM/FBLN algorithm runs in a time complexity of at least O(n^2). We therefore also propose several highly efficient and very effective heuristic algorithms, the FMM/H algorithms. We compared the proposed FMM/H algorithms with two state-of-the-art community detection methods, modified MULTICUT Spectral Fuzzy c-Means (MSFCM) and a Genetic Algorithm with a Local Search strategy (GALS), on 10 real-world data sets. The experimental results suggest that the H2 variant of FMM/H performs best. The H2 algorithm is very competitive with GALS in producing maximum-modularity partitions and performs much better than MSFCM; on all 10 data sets, H2 is also 2-3 orders of magnitude faster than GALS. Furthermore, by adopting a slightly modified version of the H2 algorithm as a mutation operator, we designed a genetic algorithm for fuzzy community detection, GAFCD, in which elite selection and early termination are applied. The crossover operator is designed to make GAFCD converge quickly and to enhance its ability to escape local minima. Experimental results on all the data sets show that GAFCD uncovers better community structure than GALS.
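
The quantity being maximized can be sketched concretely. Assuming the usual product-based form of fuzzy Generalized Modularity (Newman's null-model term weighted by pairwise membership similarity) and the standard Yager t-norm, a hedged Python sketch is (the thesis's exact definition may differ in details):

```python
import numpy as np

def yager(a, b, p=2.0):
    """Yager t-norm; p = 1 gives the Lukasiewicz t-norm, p -> inf gives min."""
    return np.maximum(0.0, 1.0 - ((1 - a) ** p + (1 - b) ** p) ** (1.0 / p))

def fuzzy_modularity(A, U, tnorm=np.multiply):
    """Generalized t-norm modularity of a fuzzy partition.

    A: (n, n) adjacency matrix; U: (n, c) memberships with rows summing to 1.
    tnorm: fuzzy intersection applied to membership pairs
           (np.multiply recovers the product-based GM).
    """
    k = A.sum(axis=1)
    two_m = A.sum()
    n, c = U.shape
    Q = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(tnorm(U[i, t], U[j, t]) for t in range(c))
            Q += (A[i, j] - k[i] * k[j] / two_m) * sim
    return Q / two_m
```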

Relevance: 60.00%

Abstract:

Graph automorphism (GA) is a classical problem in which the objective is to compute the automorphism group of an input graph. In this work we propose four novel techniques to speed up algorithms that solve the GA problem by exploring a search tree. They increase the performance of the algorithm by reducing the depth of the search tree and by effectively pruning it. We formally prove that a GA algorithm using these techniques correctly computes the automorphism group of the input graph. We also describe how the techniques have been incorporated into the GA algorithm conauto, as conauto-2.03, with at most an additive polynomial increase in its asymptotic time complexity. We have experimentally evaluated the impact of each technique on several graph families, and observed that each technique by itself significantly reduces the number of processed search-tree nodes on some subset of graphs, which justifies its use. When the techniques are applied together, their effects combine, reducing the number of processed nodes on most graphs. This is also reflected in a reduction of the running time, which is substantial for some graph families.
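
To illustrate the general flavour of search-tree pruning in GA solvers, here is one classical pruning idea, orbit pruning (not necessarily one of the four techniques this paper introduces): when an already-discovered automorphism maps one candidate branch to another, only one needs to be explored.

```python
def prune_by_orbits(candidates, generators):
    """Keep one candidate vertex per orbit of the automorphisms found so far.

    candidates: iterable of vertices to branch on
    generators: discovered automorphisms, each a dict vertex -> vertex
    If some generator maps candidate u to candidate v, the subtrees
    rooted at u and v are equivalent, so only one is explored.
    """
    remaining, kept = set(candidates), []
    while remaining:
        u = min(remaining)
        kept.append(u)
        # close the orbit of u under the known generators
        orbit, frontier = {u}, [u]
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = g[x]
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        remaining -= orbit
    return kept
```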

Relevance: 60.00%

Abstract:

Logic programming (LP) is a family of high-level programming languages with high expressive power. With LP, the programmer writes the properties of the result and/or executable specifications instead of detailed computation steps. Logic programming systems featuring tabled execution and constraint logic programming have been shown to increase the declarativeness and efficiency of Prolog, while making it possible to write very expressive programs. Tabled execution avoids infinite failure in some cases and improves efficiency in programs that repeat computations. CLP reduces the search tree and brings the power of solving (in)equations over arbitrary domains. As in the LP case, CLP systems can also benefit from the power of tabling. Previous implementations that take full advantage of the ideas behind tabling (e.g., forcing suspension, answer subsumption, etc., wherever necessary to avoid recomputation and to terminate whenever possible) did not offer a simple, well-documented, easy-to-understand interface, which would be necessary to make the integration of arbitrary CLP solvers into existing tabling systems possible. This clearly hinders more widespread usage of the combination of both facilities. In this thesis we examine the requirements that a constraint solver must fulfill in order to be interfaced with a tabling system. We propose and implement a framework, which we call Mod TCLP, with a minimal set of operations (e.g., entailment checking and projection) that the constraint solver has to provide to the tabling engine. We validate the design of Mod TCLP through a series of use cases: we re-engineer a previously existing tabled constraint domain (difference constraints) that was connected in an ad-hoc manner with the tabling engine in Ciao Prolog; we integrate Holzbaur's CLP(Q) implementation with Ciao Prolog's tabling engine; and we implement a constraint solver over (finite) lattices. We evaluate its performance with several benchmarks that implement a simple abstract interpreter whose fixpoint is reached by means of tabled execution, and whose domain operations are handled by the constraint solver over (finite) lattices, where TCLP avoids recomputing subsumed abstractions.
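
A minimal sketch of the kind of interface such a framework asks of a solver; the method names and shapes here are illustrative assumptions, and the thesis defines the precise operation set:

```python
from abc import ABC, abstractmethod

class TabledConstraintSolver(ABC):
    """Operations a constraint solver exposes to the tabling engine."""

    @abstractmethod
    def project(self, store, variables):
        """Restrict the constraint store to the given call variables,
        producing the store fragment saved in the answer table."""

    @abstractmethod
    def entails(self, new_store, tabled_store):
        """True if new_store is subsumed by an already-tabled store;
        the engine then suspends the call instead of recomputing it."""

def call_is_repeated(solver, store, variables, table):
    """Consumer check: reuse tabled answers when the new call's
    projected store is entailed by a stored one."""
    projected = solver.project(store, variables)
    return any(solver.entails(projected, s) for s in table)
```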

Relevance: 60.00%

Abstract:

Partially supported by the Bulgarian Science Fund contract with TU Varna, No 487.

Relevance: 60.00%

Abstract:

Constraint programming is a powerful technique for solving, among other things, large-scale scheduling problems. Scheduling aims to allocate tasks to resources over time; while it executes, a task consumes a resource at a constant rate. One generally seeks to optimize an objective function such as the total duration of a schedule. Solving a scheduling problem means deciding when each task starts and which resource executes it. Most scheduling problems are NP-hard, so no known algorithm solves them in polynomial time. However, there exist specializations of scheduling problems that are not NP-complete and can be solved in polynomial time by dedicated algorithms. Our objective is to explore such scheduling algorithms in several varied contexts. Filtering techniques in constraint-based scheduling have evolved considerably in recent years. The prominence of filtering algorithms rests on their ability to shrink the search tree by removing domain values that cannot participate in any solution. We propose improvements and present more efficient filtering algorithms for classical scheduling problems. We also present adaptations of filtering techniques for the case where tasks can be delayed. We further consider various properties of industrial problems and solve more efficiently problems whose optimization criterion is not necessarily the completion time of the last task. For example, we present polynomial-time algorithms for the case where the amount of available resource fluctuates over time, or where the cost of executing a task at time t depends on t.
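
To make the filtering idea concrete, here is a sketch of time-table filtering for a cumulative resource, one classical technique of this family (the thesis's algorithms are more refined); a single pass is shown, whereas a real propagator repeats until a fixpoint:

```python
def timetable_filter(tasks, capacity, horizon):
    """One pass of time-table filtering for a cumulative resource.

    tasks: dicts with 'est', 'lst', 'dur', 'demand'
           (earliest start, latest start, duration, resource demand).
    Returns tasks with possibly increased earliest start times.
    """
    # 1. Profile of compulsory parts: task i surely runs in [lst, est+dur).
    profile = [0] * horizon
    for t in tasks:
        for u in range(t['lst'], min(t['est'] + t['dur'], horizon)):
            profile[u] += t['demand']

    # 2. Push each task's earliest start past overloaded time points.
    for t in tasks:
        cp_lo, cp_hi = t['lst'], t['est'] + t['dur']   # its own compulsory part
        u = t['est']
        while u < min(t['est'] + t['dur'], horizon):
            load = profile[u] - (t['demand'] if cp_lo <= u < cp_hi else 0)
            if load + t['demand'] > capacity:
                t['est'] = u + 1      # the task cannot overlap time point u
            u += 1
    return tasks
```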

Relevance: 40.00%

Abstract:

Quest for Orthologs (QfO) is a community effort with the goal of improving and benchmarking orthology predictions. As quality assessment assumes prior knowledge of species phylogenies, we investigated the congruency between existing species trees by comparing the relationships of 147 QfO reference organisms from six Tree of Life (ToL)/species tree projects: the National Center for Biotechnology Information (NCBI) taxonomy, the Open Tree of Life, the sequenced-species ToL, the 16S ribosomal RNA (rRNA) database, and the trees published by Ciccarelli et al. (Ciccarelli FD, et al. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283-1287) and by Huerta-Cepas et al. (Huerta-Cepas J, Marcet-Houben M, Gabaldon T. 2014. A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree of Life. PeerJ PrePrints 2:223). Our study reveals that each species tree suggests a different phylogeny: 87 of the 146 (60%) possible splits of a dichotomous, rooted tree are congruent, while all other splits are incongruent in at least one of the species trees. Topological differences are observed not only at deep speciation events but also within younger clades, such as Hominidae, Rodentia, Laurasiatheria, or rosids. The evolutionary relationships of 27 archaea and bacteria are highly inconsistent. By assessing 458,108 gene trees from 65 genomes, we show that consistent species topologies are supported by gene phylogenies more often than contradicting ones. The largest concordant species tree includes at most 77 of the QfO reference organisms. The results are summarized in the form of a consensus ToL (http://swisstree.vital-it.ch/species_tree) that can serve different benchmarking purposes.
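
The congruence test underlying such counts can be sketched simply: a rooted tree induces one clade (split) per internal node, and a split is congruent with another tree if it conflicts with none of that tree's clades. A toy example on nested-tuple trees (the study's actual protocol involves rooting and taxon-set reconciliation):

```python
def clades(tree):
    """Leaf set and all internal-node leaf sets of a nested-tuple tree."""
    if not isinstance(tree, tuple):            # a leaf label
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def compatible(clade, other_clades):
    """A clade is congruent with a tree if it nests with every clade there."""
    return all(clade <= c or c <= clade or not (clade & c)
               for c in other_clades)

t1 = ((("human", "chimp"), "mouse"), "chicken")
t2 = ((("human", "mouse"), "chimp"), "chicken")
c1, c2 = clades(t1)[1], clades(t2)[1]
print([set(c) for c in c1 if compatible(c, c2)])   # the congruent splits of t1
```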