45 resultados para SUBGRAPHS
Resumo:
The reducionism method has helped in the clari cation of functioning of many biological process. However, such process are extremely complex and have emergent properties that can not be explained or even predicted by reducionism methods. To overcome these limits, researchers have been used a set of methods known as systems biology, a new area of biology aiming to understand the interactions between the multiple components of biological processes. These interactions can be represented by a mathematical object called graph or network, where the interacting elements are represented by a vertex and the interactions by edges that connect a pair of vertexes. Into graphs it is possible to nd subgraphs, occurring in complex networks at numbers that are signi cantly higher than those in randomized networks, they are de ned as motifs. As motifs in biological networks may represent the structural units of biological processess, their detection is important. Therefore, the aim of this present work was detect, count and classify motifs present in biological integrated networks of bacteria Escherichia coli and yeast Saccharomyces cere- visiae. For this purpose, we implemented codes in MathematicaR and Python environments for detecting, counting and classifying motifs in these networks. The composition and types of motifs detected in these integrated networks indicate that such networks are organized in three main bridged modules composed by motifs in which edges are all the same type. The connecting bridges are composed by motifs in which the types of edges are diferent
Resumo:
Consider the NP-hard problem of, given a simple graph G, to find a series-parallel subgraph of G with the maximum number of edges. The algorithm that, given a connected graph G, outputs a spanning tree of G, is a 1/2-approximation. Indeed, if n is the number of vertices in G, any spanning tree in G has n-1 edges and any series-parallel graph on n vertices has at most 2n-3 edges. We present a 7/12 -approximation for this problem and results showing the limits of our approach.
Resumo:
For fixed positive integers r, k and E with 1 <= l < r and an r-uniform hypergraph H, let kappa(H, k, l) denote the number of k-colorings of the set of hyperedges of H for which any two hyperedges in the same color class intersect in at least l elements. Consider the function KC(n, r, k, l) = max(H epsilon Hn) kappa(H, k, l), where the maximum runs over the family H-n of all r-uniform hypergraphs on n vertices. In this paper, we determine the asymptotic behavior of the function KC(n, r, k, l) for every fixed r, k and l and describe the extremal hypergraphs. This variant of a problem of Erdos and Rothschild, who considered edge colorings of graphs without a monochromatic triangle, is related to the Erdos-Ko-Rado Theorem (Erdos et al., 1961 [8]) on intersecting systems of sets. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
We prove that asymptotically (as n -> infinity) almost all graphs with n vertices and C(d)n(2-1/2d) log(1/d) n edges are universal with respect to the family of all graphs with maximum degree bounded by d. Moreover, we provide an efficient deterministic embedding algorithm for finding copies of bounded degree graphs in graphs satisfying certain pseudorandom properties. We also prove a counterpart result for random bipartite graphs, where the threshold number of edges is even smaller but the embedding is randomized.
Resumo:
We study quasi-random properties of k-uniform hypergraphs. Our central notion is uniform edge distribution with respect to large vertex sets. We will find several equivalent characterisations of this property and our work can be viewed as an extension of the well known Chung-Graham-Wilson theorem for quasi-random graphs. Moreover, let K(k) be the complete graph on k vertices and M(k) the line graph of the graph of the k-dimensional hypercube. We will show that the pair of graphs (K(k),M(k)) has the property that if the number of copies of both K(k) and M(k) in another graph G are as expected in the random graph of density d, then G is quasi-random (in the sense of the Chung-Graham-Wilson theorem) with density close to d. (C) 2011 Wiley Periodicals, Inc. Random Struct. Alg., 40, 1-38, 2012
Resumo:
addplot adds twoway plot objects to an existing twoway graph. This is useful if you want to add additional objects such as titles or extra data points to a twoway graph after it has been created. Most of what addplot can do, can also be done by rerunning the original graph command including additional options or plot statements. addplot, however, might be useful if you have to modify a graph for which you cannot rerun the original command, for example, because you only have the graph file but not the data that were used to create the graph. Furthermore, addplot can do certain things that would be difficult to achieve in a single graph command (e.g. customizing individual subgraphs within a by-graph). addplot also provides a substitute for some of the functionality of the graph editor.
Resumo:
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson’s patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson’s disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables.
Resumo:
Optimized structure of the educational program consisting of a set of the interconnected educational objects is offered by means of problem solution of optimum partition of the acyclic weighed graph. The condition of acyclicity preservation for subgraphs is formulated and the quantitative assessment of decision options is executed. The original algorithm of search of quasioptimum partition using the genetic algorithm scheme with coding chromosomes by permutation is offered. Object-oriented realization of algorithm in language C++ is described and results of numerical experiments are presented.
Resumo:
A K-t,K-t-design of order n is an edge-disjoint decomposition of K-n into copies of K-t,K-t. When t is odd, an extended metamorphosis of a K-t,K-t-design of order n into a 2t-cycle system of order n is obtained by taking (t - 1)/2 edge-disjoint cycles of length 2t from each K-t,K-t block, and rearranging all the remaining 1-factors in each K-t,K-t block into further 2t-cycles. The 'extended' refers to the fact that as many subgraphs isomorphic to a 2t-cycle as possible are removed from each K-t,K-t block, rather than merely one subgraph. In this paper an extended metamorphosis of a K-t,K-t-design of order congruent to 1 (mod 4t(2)) into a 2t-cycle system of the same order is given for all odd t > 3. A metamorphosis of a 2-fold K-t,K-t-design of any order congruent to 1 (mod 4t(2)) into a 2t-cycle system of the same order is also given, for all odd t > 3. (The case t = 3 appeared in Ars Combin. 64 (2002) 65-80.) When t is even, the graph K-t,K-t is easily seen to contain t/2 edge-disjoint cycles of length 2t, and so the metamorphosis in that case is straightforward. (C) 2004 Elsevier B.V. All rights reserved.
Resumo:
It is shown that there exists a triangle decomposition of the graph obtained from the complete graph of order v by removing the edges of two vertex disjoint complete subgraphs of orders u and w if and only if u, w, and v are odd, ((v)(2)) - ((u)(2)) - ((w)(2)) equivalent to 0 (mod 3), and v >= w + u + max {u, w}. Such decompositions are equivalent to group divisible designs with block size 3, one group of size u, one group of size w, and v - u - w groups of size 1. This result settles the existence problem for Steiner triple systems having two disjoint specified subsystems, thereby generalizing the well-known theorem of Doyen and Wilson on the existence of Steiner triple systems with a single specified subsystem. (c) 2005 Wiley Periodicals, Inc.
Resumo:
In this paper, we propose a new edge-based matching kernel for graphs by using discrete-time quantum walks. To this end, we commence by transforming a graph into a directed line graph. The reasons of using the line graph structure are twofold. First, for a graph, its directed line graph is a dual representation and each vertex of the line graph represents a corresponding edge in the original graph. Second, we show that the discrete-time quantum walk can be seen as a walk on the line graph and the state space of the walk is the vertex set of the line graph, i.e., the state space of the walk is the edges of the original graph. As a result, the directed line graph provides an elegant way of developing new edge-based matching kernel based on discrete-time quantum walks. For a pair of graphs, we compute the h-layer depth-based representation for each vertex of their directed line graphs by computing entropic signatures (computed from discrete-time quantum walks on the line graphs) on the family of K-layer expansion subgraphs rooted at the vertex, i.e., we compute the depth-based representations for edges of the original graphs through their directed line graphs. Based on the new representations, we define an edge-based matching method for the pair of graphs by aligning the h-layer depth-based representations computed through the directed line graphs. The new edge-based matching kernel is thus computed by counting the number of matched vertices identified by the matching method on the directed line graphs. Experiments on standard graph datasets demonstrate the effectiveness of our new kernel.
Resumo:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Resumo:
In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is a tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real-time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs could be specified using both graph structure as well as activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.
Resumo:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. The second model called Vertex Importance Query (VIQ) identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns top-k of them with a user-specified approximation terms and scoring functions. In the fourth model called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers such as the number of mapped blocks, vertices' properties in each block and so on and the most top-k probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers - irrespective of whether they are vertices or substitution, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.
Resumo:
In this dissertation I draw a connection between quantum adiabatic optimization, spectral graph theory, heat-diffusion, and sub-stochastic processes through the operators that govern these processes and their associated spectra. In particular, we study Hamiltonians which have recently become known as ``stoquastic'' or, equivalently, the generators of sub-stochastic processes. The operators corresponding to these Hamiltonians are of interest in all of the settings mentioned above. I predominantly explore the connection between the spectral gap of an operator, or the difference between the two lowest energies of that operator, and certain equilibrium behavior. In the context of adiabatic optimization, this corresponds to the likelihood of solving the optimization problem of interest. I will provide an instance of an optimization problem that is easy to solve classically, but leaves open the possibility to being difficult adiabatically. Aside from this concrete example, the work in this dissertation is predominantly mathematical and we focus on bounding the spectral gap. Our primary tool for doing this is spectral graph theory, which provides the most natural approach to this task by simply considering Dirichlet eigenvalues of subgraphs of host graphs. I will derive tight bounds for the gap of one-dimensional, hypercube, and general convex subgraphs. The techniques used will also adapt methods recently used by Andrews and Clutterbuck to prove the long-standing ``Fundamental Gap Conjecture''.