938 resultados para Directed acyclic graphs
Resumo:
23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2015). 4 to 6, Mar, 2015. Turku, Finland.
Resumo:
Los hipergrafos dirigidos se han empleado en problemas relacionados con lógica proposicional, bases de datos relacionales, linguística computacional y aprendizaje automático. Los hipergrafos dirigidos han sido también utilizados como alternativa a los grafos (bipartitos) dirigidos para facilitar el estudio de las interacciones entre componentes de sistemas complejos que no pueden ser fácilmente modelados usando exclusivamente relaciones binarias. En este contexto, este tipo de representación es conocida como hiper-redes. Un hipergrafo dirigido es una generalización de un grafo dirigido especialmente adecuado para la representación de relaciones de muchos a muchos. Mientras que una arista en un grafo dirigido define una relación entre dos de sus nodos, una hiperarista en un hipergrafo dirigido define una relación entre dos conjuntos de sus nodos. La conexión fuerte es una relación de equivalencia que divide el conjunto de nodos de un hipergrafo dirigido en particiones y cada partición define una clase de equivalencia conocida como componente fuertemente conexo. El estudio de los componentes fuertemente conexos de un hipergrafo dirigido puede ayudar a conseguir una mejor comprensión de la estructura de este tipo de hipergrafos cuando su tamaño es considerable. En el caso de grafo dirigidos, existen algoritmos muy eficientes para el cálculo de los componentes fuertemente conexos en grafos de gran tamaño. Gracias a estos algoritmos, se ha podido averiguar que la estructura de la WWW tiene forma de “pajarita”, donde más del 70% del los nodos están distribuidos en tres grandes conjuntos y uno de ellos es un componente fuertemente conexo. Este tipo de estructura ha sido también observada en redes complejas en otras áreas como la biología. Estudios de naturaleza similar no han podido ser realizados en hipergrafos dirigidos porque no existe algoritmos capaces de calcular los componentes fuertemente conexos de este tipo de hipergrafos. En esta tesis doctoral, hemos investigado como calcular los componentes fuertemente conexos de un hipergrafo dirigido. En concreto, hemos desarrollado dos algoritmos para este problema y hemos determinado que son correctos y cuál es su complejidad computacional. Ambos algoritmos han sido evaluados empíricamente para comparar sus tiempos de ejecución. Para la evaluación, hemos producido una selección de hipergrafos dirigidos generados de forma aleatoria inspirados en modelos muy conocidos de grafos aleatorios como Erdos-Renyi, Newman-Watts-Strogatz and Barabasi-Albert. Varias optimizaciones para ambos algoritmos han sido implementadas y analizadas en la tesis. En concreto, colapsar los componentes fuertemente conexos del grafo dirigido que se puede construir eliminando ciertas hiperaristas complejas del hipergrafo dirigido original, mejora notablemente los tiempos de ejecucion de los algoritmos para varios de los hipergrafos utilizados en la evaluación. Aparte de los ejemplos de aplicación mencionados anteriormente, los hipergrafos dirigidos han sido también empleados en el área de representación de conocimiento. En concreto, este tipo de hipergrafos se han usado para el cálculo de módulos de ontologías. Una ontología puede ser definida como un conjunto de axiomas que especifican formalmente un conjunto de símbolos y sus relaciones, mientras que un modulo puede ser entendido como un subconjunto de axiomas de la ontología que recoge todo el conocimiento que almacena la ontología sobre un conjunto especifico de símbolos y sus relaciones. En la tesis nos hemos centrado solamente en módulos que han sido calculados usando la técnica de localidad sintáctica. Debido a que las ontologías pueden ser muy grandes, el cálculo de módulos puede facilitar las tareas de re-utilización y mantenimiento de dichas ontologías. Sin embargo, analizar todos los posibles módulos de una ontología es, en general, muy costoso porque el numero de módulos crece de forma exponencial con respecto al número de símbolos y de axiomas de la ontología. Afortunadamente, los axiomas de una ontología pueden ser divididos en particiones conocidas como átomos. Cada átomo representa un conjunto máximo de axiomas que siempre aparecen juntos en un modulo. La decomposición atómica de una ontología es definida como un grafo dirigido de tal forma que cada nodo del grafo corresponde con un átomo y cada arista define una dependencia entre una pareja de átomos. En esta tesis introducimos el concepto de“axiom dependency hypergraph” que generaliza el concepto de descomposición atómica de una ontología. Un modulo en una ontología correspondería con un componente conexo en este tipo de hipergrafos y un átomo de una ontología con un componente fuertemente conexo. Hemos adaptado la implementación de nuestros algoritmos para que funcionen también con axiom dependency hypergraphs y poder de esa forma calcular los átomos de una ontología. Para demostrar la viabilidad de esta idea, hemos incorporado nuestros algoritmos en una aplicación que hemos desarrollado para la extracción de módulos y la descomposición atómica de ontologías. A la aplicación la hemos llamado HyS y hemos estudiado sus tiempos de ejecución usando una selección de ontologías muy conocidas del área biomédica, la mayoría disponibles en el portal de Internet NCBO. Los resultados de la evaluación muestran que los tiempos de ejecución de HyS son mucho mejores que las aplicaciones más rápidas conocidas. ABSTRACT Directed hypergraphs are an intuitive modelling formalism that have been used in problems related to propositional logic, relational databases, computational linguistic and machine learning. Directed hypergraphs are also presented as an alternative to directed (bipartite) graphs to facilitate the study of the interactions between components of complex systems that cannot naturally be modelled as binary relations. In this context, they are known as hyper-networks. A directed hypergraph is a generalization of a directed graph suitable for representing many-to-many relationships. While an edge in a directed graph defines a relation between two nodes of the graph, a hyperedge in a directed hypergraph defines a relation between two sets of nodes. Strong-connectivity is an equivalence relation that induces a partition of the set of nodes of a directed hypergraph into strongly-connected components. These components can be collapsed into single nodes. As result, the size of the original hypergraph can significantly be reduced if the strongly-connected components have many nodes. This approach might contribute to better understand how the nodes of a hypergraph are connected, in particular when the hypergraphs are large. In the case of directed graphs, there are efficient algorithms that can be used to compute the strongly-connected components of large graphs. For instance, it has been shown that the macroscopic structure of the World Wide Web can be represented as a “bow-tie” diagram where more than 70% of the nodes are distributed into three large sets and one of these sets is a large strongly-connected component. This particular structure has been also observed in complex networks in other fields such as, e.g., biology. Similar studies cannot be conducted in a directed hypergraph because there does not exist any algorithm for computing the strongly-connected components of the hypergraph. In this thesis, we investigate ways to compute the strongly-connected components of directed hypergraphs. We present two new algorithms and we show their correctness and computational complexity. One of these algorithms is inspired by Tarjan’s algorithm for directed graphs. The second algorithm follows a simple approach to compute the stronglyconnected components. This approach is based on the fact that two nodes of a graph that are strongly-connected can also reach the same nodes. In other words, the connected component of each node is the same. Both algorithms are empirically evaluated to compare their performances. To this end, we have produced a selection of random directed hypergraphs inspired by existent and well-known random graphs models like Erd˝os-Renyi and Newman-Watts-Strogatz. Besides the application examples that we mentioned earlier, directed hypergraphs have also been employed in the field of knowledge representation. In particular, they have been used to compute the modules of an ontology. An ontology is defined as a collection of axioms that provides a formal specification of a set of terms and their relationships; and a module is a subset of an ontology that completely captures the meaning of certain terms as defined in the ontology. In particular, we focus on the modules computed using the notion of syntactic locality. As ontologies can be very large, the computation of modules facilitates the reuse and maintenance of these ontologies. Analysing all modules of an ontology, however, is in general not feasible as the number of modules grows exponentially in the number of terms and axioms of the ontology. Nevertheless, the modules can succinctly be represented using the Atomic Decomposition of an ontology. Using this representation, an ontology can be partitioned into atoms, which are maximal sets of axioms that co-occur in every module. The Atomic Decomposition is then defined as a directed graph such that each node correspond to an atom and each edge represents a dependency relation between two atoms. In this thesis, we introduce the notion of an axiom dependency hypergraph which is a generalization of the atomic decomposition of an ontology. A module in the ontology corresponds to a connected component in the hypergraph, and the atoms of the ontology to the strongly-connected components. We apply our algorithms for directed hypergraphs to axiom dependency hypergraphs and in this manner, we compute the atoms of an ontology. To demonstrate the viability of this approach, we have implemented the algorithms in the application HyS which computes the modules of ontologies and calculate their atomic decomposition. In the thesis, we provide an experimental evaluation of HyS with a selection of large and prominent biomedical ontologies, most of which are available in the NCBO Bioportal. HyS outperforms state-of-the-art implementations in the tasks of extracting modules and computing the atomic decomposition of these ontologies.
Resumo:
In this dissertation, we apply mathematical programming techniques (i.e., integer programming and polyhedral combinatorics) to develop exact approaches for influence maximization on social networks. We study four combinatorial optimization problems that deal with maximizing influence at minimum cost over a social network. To our knowl- edge, all previous work to date involving influence maximization problems has focused on heuristics and approximation. We start with the following viral marketing problem that has attracted a significant amount of interest from the computer science literature. Given a social network, find a target set of customers to seed with a product. Then, a cascade will be caused by these initial adopters and other people start to adopt this product due to the influence they re- ceive from earlier adopters. The idea is to find the minimum cost that results in the entire network adopting the product. We first study a problem called the Weighted Target Set Selection (WTSS) Prob- lem. In the WTSS problem, the diffusion can take place over as many time periods as needed and a free product is given out to the individuals in the target set. Restricting the number of time periods that the diffusion takes place over to be one, we obtain a problem called the Positive Influence Dominating Set (PIDS) problem. Next, incorporating partial incentives, we consider a problem called the Least Cost Influence Problem (LCIP). The fourth problem studied is the One Time Period Least Cost Influence Problem (1TPLCIP) which is identical to the LCIP except that we restrict the number of time periods that the diffusion takes place over to be one. We apply a common research paradigm to each of these four problems. First, we work on special graphs: trees and cycles. Based on the insights we obtain from special graphs, we develop efficient methods for general graphs. On trees, first, we propose a polynomial time algorithm. More importantly, we present a tight and compact extended formulation. We also project the extended formulation onto the space of the natural vari- ables that gives the polytope on trees. Next, building upon the result for trees---we derive the polytope on cycles for the WTSS problem; as well as a polynomial time algorithm on cycles. This leads to our contribution on general graphs. For the WTSS problem and the LCIP, using the observation that the influence propagation network must be a directed acyclic graph (DAG), the strong formulation for trees can be embedded into a formulation on general graphs. We use this to design and implement a branch-and-cut approach for the WTSS problem and the LCIP. In our computational study, we are able to obtain high quality solutions for random graph instances with up to 10,000 nodes and 20,000 edges (40,000 arcs) within a reasonable amount of time.
Resumo:
Workflows have been successfully applied to express the decomposition of complex scientific applications. This has motivated many initiatives that have been developing scientific workflow tools. However the existing tools still lack adequate support to important aspects namely, decoupling the enactment engine from workflow tasks specification, decentralizing the control of workflow activities, and allowing their tasks to run autonomous in distributed infrastructures, for instance on Clouds. Furthermore many workflow tools only support the execution of Direct Acyclic Graphs (DAG) without the concept of iterations, where activities are executed millions of iterations during long periods of time and supporting dynamic workflow reconfigurations after certain iteration. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on the Process Networks model, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures, e. g. on Clouds. Each AWA executes a Task developed as a Java class that implements a generic interface allowing end-users to code their applications without concerns for low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables support to dynamic workflow reconfiguration and monitoring of the execution of workflows. We describe how AWARD supports dynamic reconfiguration and discuss typical workflow reconfiguration scenarios. For evaluation we describe experimental results of AWARD workflow executions in several application scenarios, mapped to a small dedicated cluster and the Amazon (Elastic Computing EC2) Cloud.
Resumo:
Most haptic environments are based on single point interactions whereas in practice, object manipulation requires multiple contact points between the object, fingers, thumb and palm. The Friction Cone Algorithm was developed specifically to work well in a multi-finger haptic environment where object manipulation would occur. However, the Friction Cone Algorithm has two shortcomings when applied to polygon meshes: there is no means of transitioning polygon boundaries or feeling non-convex edges. In order to overcome these deficiencies, Face Directed Connection Graphs have been developed as well as a robust method for applying friction to non-convex edges. Both these extensions are described herein, as well as the implementation issues associated with them.
Resumo:
The problem of scheduling a parallel program presented by a weighted directed acyclic graph (DAG) to the set of homogeneous processors for minimizing the completion time of the program has been extensively studied as academic optimization problem which occurs in optimizing the execution time of parallel algorithm with parallel computer.In this paper, we propose an application of the Ant Colony Optimization (ACO) to a multiprocessor scheduling problem (MPSP). In the MPSP, no preemption is allowed and each operation demands a setup time on the machines. The problem seeks to compose a schedule that minimizes the total completion time.We therefore rely on heuristics to find solutions since solution methods are not feasible for most problems as such. This novel heuristic searching approach to the multiprocessor based on the ACO algorithm a collection of agents cooperate to effectively explore the search space.A computational experiment is conducted on a suit of benchmark application. By comparing our algorithm result obtained to that of previous heuristic algorithm, it is evince that the ACO algorithm exhibits competitive performance with small error ratio.
Resumo:
In order to achieve the high performance, we need to have an efficient scheduling of a parallelprogram onto the processors in multiprocessor systems that minimizes the entire executiontime. This problem of multiprocessor scheduling can be stated as finding a schedule for ageneral task graph to be executed on a multiprocessor system so that the schedule length can be minimize [10]. This scheduling problem is known to be NP- Hard.In multi processor task scheduling, we have a number of CPU’s on which a number of tasksare to be scheduled that the program’s execution time is minimized. According to [10], thetasks scheduling problem is a key factor for a parallel multiprocessor system to gain betterperformance. A task can be partitioned into a group of subtasks and represented as a DAG(Directed Acyclic Graph), so the problem can be stated as finding a schedule for a DAG to beexecuted in a parallel multiprocessor system so that the schedule can be minimized. Thishelps to reduce processing time and increase processor utilization. The aim of this thesis workis to check and compare the results obtained by Bee Colony algorithm with already generatedbest known results in multi processor task scheduling domain.
Resumo:
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
Resumo:
Mathematical models of disease progression predict disease outcomes and are useful epidemiological tools for planners and evaluators of health interventions. The R package gems is a tool that simulates disease progression in patients and predicts the effect of different interventions on patient outcome. Disease progression is represented by a series of events (e.g., diagnosis, treatment and death), displayed in a directed acyclic graph. The vertices correspond to disease states and the directed edges represent events. The package gems allows simulations based on a generalized multistate model that can be described by a directed acyclic graph with continuous transition-specific hazard functions. The user can specify an arbitrary hazard function and its parameters. The model includes parameter uncertainty, does not need to be a Markov model, and may take the history of previous events into account. Applications are not limited to the medical field and extend to other areas where multistate simulation is of interest. We provide a technical explanation of the multistate models used by gems, explain the functions of gems and their arguments, and show a sample application.
Resumo:
We present a novel framework for the analysis and optimization of encoding latency for multiview video. Firstly, we characterize the elements that have an influence in the encoding latency performance: (i) the multiview prediction structure and (ii) the hardware encoder model. Then, we provide algorithms to find the encoding latency of any arbitrary multiview prediction structure. The proposed framework relies on the directed acyclic graph encoder latency (DAGEL) model, which provides an abstraction of the processing capacity of the encoder by considering an unbounded number of processors. Using graph theoretic algorithms, the DAGEL model allows us to compute the encoding latency of a given prediction structure, and determine the contribution of the prediction dependencies to it. As an example of DAGEL application, we propose an algorithm to reduce the encoding latency of a given multiview prediction structure up to a target value. In our approach, a minimum number of frame dependencies are pruned, until the latency target value is achieved, thus minimizing the degradation of the rate-distortion performance due to the removal of the prediction dependencies. Finally, we analyze the latency performance of the DAGEL derived prediction structures in multiview encoders with limited processing capacity.
Resumo:
Esta tesis presenta el diseño y la aplicación de una metodología que permite la determinación de los parámetros para la planificación de nodos e infraestructuras logísticas en un territorio, considerando además el impacto de estas en los diferentes componentes territoriales, así como en el desarrollo poblacional, el desarrollo económico y el medio ambiente, presentando así un avance en la planificación integral del territorio. La Metodología propuesta está basada en Minería de Datos, que permite el descubrimiento de patrones detrás de grandes volúmenes de datos previamente procesados. Las características propias de los datos sobre el territorio y los componentes que lo conforman hacen de los estudios territoriales un campo ideal para la aplicación de algunas de las técnicas de Minería de Datos, tales como los ´arboles decisión y las redes bayesianas. Los árboles de decisión permiten representar y categorizar de forma esquemática una serie de variables de predicción que ayudan al análisis de una variable objetivo. Las redes bayesianas representan en un grafo acíclico dirigido, un modelo probabilístico de variables distribuidas en padres e hijos, y la inferencia estadística que permite determinar la probabilidad de certeza de una hipótesis planteada, es decir, permiten construir modelos de probabilidad conjunta que presentan de manera gráfica las dependencias relevantes en un conjunto de datos. Al igual que con los árboles de decisión, la división del territorio en diferentes unidades administrativas hace de las redes bayesianas una herramienta potencial para definir las características físicas de alguna tipología especifica de infraestructura logística tomando en consideración las características territoriales, poblacionales y económicas del área donde se plantea su desarrollo y las posibles sinergias que se puedan presentar sobre otros nodos e infraestructuras logísticas. El caso de estudio seleccionado para la aplicación de la metodología ha sido la República de Panamá, considerando que este país presenta algunas características singulares, entra las que destacan su alta concentración de población en la Ciudad de Panamá; que a su vez a concentrado la actividad económica del país; su alto porcentaje de zonas protegidas, lo que ha limitado la vertebración del territorio; y el Canal de Panamá y los puertos de contenedores adyacentes al mismo. La metodología se divide en tres fases principales: Fase 1: Determinación del escenario de trabajo 1. Revisión del estado del arte. 2. Determinación y obtención de las variables de estudio. Fase 2: Desarrollo del modelo de inteligencia artificial 3. Construcción de los ´arboles de decisión. 4. Construcción de las redes bayesianas. Fase 3: Conclusiones 5. Determinación de las conclusiones. Con relación al modelo de planificación aplicado al caso de estudio, una vez aplicada la metodología, se estableció un modelo compuesto por 47 variables que definen la planificación logística de Panamá, el resto de variables se definen a partir de estas, es decir, conocidas estas, el resto se definen a través de ellas. Este modelo de planificación establecido a través de la red bayesiana considera los aspectos de una planificación sostenible: económica, social y ambiental; que crean sinergia con la planificación de nodos e infraestructuras logísticas. The thesis presents the design and application of a methodology that allows the determination of parameters for the planning of nodes and logistics infrastructure in a territory, besides considering the impact of these different territorial components, as well as the population growth, economic and environmental development. The proposed methodology is based on Data Mining, which allows the discovery of patterns behind large volumes of previously processed data. The own characteristics of the territorial data makes of territorial studies an ideal field of knowledge for the implementation of some of the Data Mining techniques, such as Decision Trees and Bayesian Networks. Decision trees categorize schematically a series of predictor variables of an analyzed objective variable. Bayesian Networks represent a directed acyclic graph, a probabilistic model of variables divided in fathers and sons, and statistical inference that allow determine the probability of certainty in a hypothesis. The case of study for the application of the methodology is the Republic of Panama. This country has some unique features: a high population density in the Panama City, a concentration of economic activity, a high percentage of protected areas, and the Panama Canal. The methodology is divided into three main phases: Phase 1: definition of the work stage. 1. Review of the State of the art. 2. Determination of the variables. Phase 2: Development of artificial intelligence model 3. Construction of decision trees. 4. Construction of Bayesian Networks. Phase 3: conclusions 5. Determination of the conclusions. The application of the methodology to the case study established a model composed of 47 variables that define the logistics planning for Panama. This model of planning established through the Bayesian network considers aspects of sustainable planning and simulates the synergies between the nodes and logistical infrastructure planning.
Resumo:
ACM Computing Classification System (1998): J.3.
Resumo:
In this paper, we propose a new edge-based matching kernel for graphs by using discrete-time quantum walks. To this end, we commence by transforming a graph into a directed line graph. The reasons of using the line graph structure are twofold. First, for a graph, its directed line graph is a dual representation and each vertex of the line graph represents a corresponding edge in the original graph. Second, we show that the discrete-time quantum walk can be seen as a walk on the line graph and the state space of the walk is the vertex set of the line graph, i.e., the state space of the walk is the edges of the original graph. As a result, the directed line graph provides an elegant way of developing new edge-based matching kernel based on discrete-time quantum walks. For a pair of graphs, we compute the h-layer depth-based representation for each vertex of their directed line graphs by computing entropic signatures (computed from discrete-time quantum walks on the line graphs) on the family of K-layer expansion subgraphs rooted at the vertex, i.e., we compute the depth-based representations for edges of the original graphs through their directed line graphs. Based on the new representations, we define an edge-based matching method for the pair of graphs by aligning the h-layer depth-based representations computed through the directed line graphs. The new edge-based matching kernel is thus computed by counting the number of matched vertices identified by the matching method on the directed line graphs. Experiments on standard graph datasets demonstrate the effectiveness of our new kernel.
Resumo:
This article presents the principal results of the Ph.D. thesis Intelligent systems in bioinformatics: mapping and merging anatomical ontologies by Peter Petrov, successfully defended at the St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics, Department of Information Technologies, on 26 April 2013.
Resumo:
A special class of preferences, given by a directed acyclic graph, is considered. They are represented by incomplete pairwise comparison matrices as only partial information is available: for some pairs no comparison is given in the graph. A weighting method satisfies the property linear order preservation if it always results in a ranking such that an alternative directly preferred to another does not have a lower rank. We study whether two procedures, the Eigenvector Method and the Logarithmic Least Squares Method meet this axiom. Both weighting methods break linear order preservation, moreover, the ranking according to the Eigenvector Method depends on the incomplete pairwise comparison representation chosen.