419 results for VERTICES
Abstract:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. 
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. 
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models restrict the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Abstract:
The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, how to combine them to improve on the performance of individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images while edges are similarities between image pairs. Our first approach employs a mixture Markov model based on a random walk model on multiple graphs to fuse graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by a greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to query images and resemble each other. The function also exploits the rank relationships of images from multiple ranked lists obtained by different features. We then study a more well-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low-level features and attributes. A low-rank attribute embedding is jointly learned within the multi-task learning formulation to embed original binary attributes to a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered. 
To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNN) in view of the emergence of deep networks. We improve a Fast RCNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). The SDP improves detection accuracy by exploiting appropriate convolutional features depending on the scale of input object proposals. The CRC effectively utilizes convolutional features and greatly eliminates negative proposals in a cascaded manner, while maintaining a high recall for true objects. The two strategies together improve the detection accuracy and reduce the computational cost.
Abstract:
This thesis presents approximation algorithms for some NP-Hard combinatorial optimization problems on graphs and networks; in particular, we study problems related to Network Design. Under the widely-believed complexity-theoretic assumption that P is not equal to NP, there are no efficient (i.e., polynomial-time) algorithms that solve these problems exactly. Hence, if one desires efficient algorithms for such problems, it is necessary to consider approximate solutions: An approximation algorithm for an NP-Hard problem is a polynomial time algorithm which, for any instance of the problem, finds a solution whose value is guaranteed to be within a multiplicative factor of the value of an optimal solution to that instance. We attempt to design algorithms for which this factor, referred to as the approximation ratio of the algorithm, is as small as possible. The field of Network Design comprises a large class of problems that deal with constructing networks of low cost and/or high capacity, routing data through existing networks, and many related issues. In this thesis, we focus chiefly on designing fault-tolerant networks. Two vertices u,v in a network are said to be k-edge-connected if deleting any set of k − 1 edges leaves u and v connected; similarly, they are k-vertex connected if deleting any set of k − 1 other vertices or edges leaves u and v connected. We focus on building networks that are highly connected, meaning that even if a small number of edges and nodes fail, the remaining nodes will still be able to communicate. A brief description of some of our results is given below. We study the problem of building 2-vertex-connected networks that are large and have low cost. Given an n-node graph with costs on its edges and any integer k, we give an O(log n log k) approximation for the problem of finding a minimum-cost 2-vertex-connected subgraph containing at least k nodes. 
We also give an algorithm of similar approximation ratio for maximizing the number of nodes in a 2-vertex-connected subgraph subject to a budget constraint on the total cost of its edges. Our algorithms are based on a pruning process that, given a 2-vertex-connected graph, finds a 2-vertex-connected subgraph of any desired size and of density comparable to the input graph, where the density of a graph is the ratio of its cost to the number of vertices it contains. This pruning algorithm is simple and efficient, and is likely to find additional applications. Recent breakthroughs on vertex-connectivity have made use of algorithms for element-connectivity problems. We develop an algorithm that, given a graph with some vertices marked as terminals, significantly simplifies the graph while preserving the pairwise element-connectivity of all terminals; in fact, the resulting graph is bipartite. We believe that our simplification/reduction algorithm will be a useful tool in many settings. We illustrate its applicability by giving algorithms to find many trees that each span a given terminal set, while being disjoint on edges and non-terminal vertices; such problems have applications in VLSI design and other areas. We also use this reduction algorithm to analyze simple algorithms for single-sink network design problems with high vertex-connectivity requirements; we give an O(k log n)-approximation for the problem of k-connecting a given set of terminals to a common sink. We study similar problems in which different types of links, of varying capacities and costs, can be used to connect nodes; assuming there are economies of scale, we give algorithms to construct low-cost networks with sufficient capacity or bandwidth to simultaneously support flow from each terminal to the common sink along many vertex-disjoint paths. We further investigate capacitated network design, where edges may have arbitrary costs and capacities. 
Given a connectivity requirement R_uv for each pair of vertices u,v, the goal is to find a low-cost network which, for each uv, can support a flow of R_uv units of traffic between u and v. We study several special cases of this problem, giving both algorithmic and hardness results. In addition to Network Design, we consider certain Traveling Salesperson-like problems, where the goal is to find short walks that visit many distinct vertices. We give a (2 + epsilon)-approximation for Orienteering in undirected graphs, achieving the best known approximation ratio, and the first approximation algorithm for Orienteering in directed graphs. We also give improved algorithms for Orienteering with time windows, in which vertices must be visited between specified release times and deadlines, and other related problems. These problems are motivated by applications in the fields of vehicle routing, delivery and transportation of goods, and robot path planning.
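The k-edge-connectivity definition used in this abstract (u and v stay connected after deleting any k − 1 edges) can be checked directly on small graphs. The following is a minimal brute-force sketch, not the thesis's method: the function names and the 4-cycle example are illustrative assumptions, and the running time is exponential in k, so it is only meant to make the definition concrete.

```python
from itertools import combinations


def connected(n, edges, u, v):
    """Return True if v is reachable from u in the undirected graph on vertices 0..n-1."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v in seen


def is_k_edge_connected(n, edges, u, v, k):
    """Brute force: u and v must stay connected after deleting any k - 1 edges."""
    # If the graph has fewer than k - 1 edges, delete all of them instead.
    r = min(k - 1, len(edges))
    for removed in combinations(range(len(edges)), r):
        kept = [e for i, e in enumerate(edges) if i not in removed]
        if not connected(n, kept, u, v):
            return False
    return True


# Hypothetical example: opposite vertices of a 4-cycle.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_k_edge_connected(4, cycle, 0, 2, 2))  # survives any single edge failure
print(is_k_edge_connected(4, cycle, 0, 2, 3))  # two failures can isolate vertex 0
```

On the 4-cycle, opposite vertices are 2-edge-connected but not 3-edge-connected, since deleting both edges at vertex 0 disconnects it.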
Abstract:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. 
The second model, called Vertex Importance Query (VIQ), identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns the top-k of them according to user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of the answers, such as the number of mapped blocks and the vertices' properties in each block, and the top-k most probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation the user considers important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.
Abstract:
Consider two graphs G and H. Let H^k[G] be the lexicographic product of H^k and G, where H^k is the lexicographic product of the graph H by itself k times. In this paper, we determine the spectrum of H^k[G] and H^k when G and H are regular, and the Laplacian spectrum of H^k[G] and H^k for G and H arbitrary. Particular emphasis is given to the least eigenvalue of the adjacency matrix in the case of lexicographic powers of regular graphs, and to the algebraic connectivity and the largest Laplacian eigenvalues in the case of lexicographic powers of arbitrary graphs. This approach allows the determination of the spectrum (in the case of regular graphs) and Laplacian spectrum (for arbitrary graphs) of huge graphs. As an example, the spectrum of the lexicographic power of the Petersen graph with the googol number (that is, 10^100) of vertices is determined. The paper finishes with the extension of some well-known spectral and combinatorial invariant properties of graphs to their lexicographic powers.
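The abstract does not state its formulas, so the sketch below only illustrates why such spectra are tractable, using a standard fact about lexicographic products. If the inner graph is r-regular on m vertices and the outer graph has n vertices, the product's adjacency matrix is A_outer ⊗ J_m + I_n ⊗ A_inner. Its eigenvalues are m·λ + r for each eigenvalue λ of the outer graph, together with n copies of every inner eigenvalue other than one copy of r. The helper names and the choice of C3 and C4 are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def cycle(n):
    """Adjacency matrix of the cycle C_n (n >= 3)."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1
    return a


def lexicographic_product(a_outer, a_inner):
    """Adjacency matrix of outer[inner]: A_outer (x) J + I (x) A_inner."""
    n, m = len(a_outer), len(a_inner)
    return np.kron(a_outer, np.ones((m, m))) + np.kron(np.eye(n), a_inner)


outer, inner = cycle(3), cycle(4)   # both regular; the inner graph is 2-regular
n, m, r = len(outer), len(inner), 2

# Spectrum computed directly from the 12 x 12 product matrix.
direct = np.sort(np.linalg.eigvalsh(lexicographic_product(outer, inner)))

# Spectrum predicted from the two small factors only.
inner_rest = np.sort(np.linalg.eigvalsh(inner))[:-1]   # drop one copy of r
predicted = np.sort(np.concatenate([
    m * np.linalg.eigvalsh(outer) + r,   # eigenvectors of the form x (x) 1
    np.tile(inner_rest, n),              # eigenvectors of the form x (x) y, y orthogonal to 1
]))

print(np.allclose(direct, predicted))
```

Because the prediction only needs the spectra of the factors, iterating it avoids ever building the adjacency matrix of a large power.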
Abstract:
Let $G$ be a simple graph on $n$ vertices and $e(G)$ edges. Consider the signless Laplacian, $Q(G) = D + A$, where $A$ is the adjacency matrix and $D$ is the diagonal matrix of the vertex degrees of $G$. Let $q_1(G)$ and $q_2(G)$ be the first and the second largest eigenvalues of $Q(G)$, respectively, and denote by $S_n^+$ the star graph with an additional edge. It is proved that the inequality $q_1(G) + q_2(G) \le e(G) + 3$ is tighter for the graph $S_n^+$ among all firefly graphs and also tighter to $S_n^+$ than to the graphs $K_k \vee K_{n-k}$ recently presented by Ashraf, Omidi and Tayfeh-Rezaie. Also, it is conjectured that $S_n^+$ minimizes $f(G) = e(G) - q_1(G) - q_2(G)$ among all graphs $G$ on $n$ vertices.
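The quantities in this abstract are easy to compute numerically. The sketch below builds the star with one additional edge, forms Q = D + A, and reports how far q1 + q2 falls below e + 3 for a few sizes. It is an illustration under assumptions: the vertex numbering, the function names and the chosen values of n are mine, NumPy is assumed to be available, and a numerical check proves nothing about the conjecture.

```python
import numpy as np


def star_plus(n):
    """Adjacency matrix of the star on n vertices (center 0) plus the edge {1, 2}."""
    a = np.zeros((n, n))
    for leaf in range(1, n):
        a[0, leaf] = a[leaf, 0] = 1
    a[1, 2] = a[2, 1] = 1
    return a


def top_two_signless_laplacian(a):
    """Return (q1, q2), the two largest eigenvalues of Q = D + A."""
    q = np.diag(a.sum(axis=1)) + a
    eig = np.sort(np.linalg.eigvalsh(q))
    return eig[-1], eig[-2]


for n in (4, 6, 10, 50):
    a = star_plus(n)
    e = int(a.sum() / 2)                  # number of edges
    q1, q2 = top_two_signless_laplacian(a)
    gap = e + 3 - (q1 + q2)               # slack in q1 + q2 <= e + 3
    print(n, e, round(q1 + q2, 4), round(gap, 4))
```

For n = 4 the characteristic polynomial factors by hand, giving q1 = (5 + √17)/2 and q2 = 2, which is a convenient sanity check on the code.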
Abstract:
The extreme sensitivity of the mass of the Higgs boson to quantum corrections from high-mass states makes it 'unnaturally' light in the standard model. This 'hierarchy problem' can be solved by symmetries, which predict new particles related, by the symmetry, to standard model fields. The Large Hadron Collider (LHC) can potentially discover these new particles, thereby finding the solution to the hierarchy problem. However, the dynamics of the Higgs boson is also sensitive to this new physics. We show that in many scenarios the Higgs can be a complementary and powerful probe of the hierarchy problem at the LHC and future colliders. If the top quark partners carry the color charge of the strong nuclear force, the production of Higgs pairs is affected. This effect is tightly correlated with single Higgs production, implying that only modest enhancements in di-Higgs production occur when the top partners are heavy. However, if the top partners are light, we show that di-Higgs production is a useful complementary probe to single Higgs production. We verify this result in the context of a simplified supersymmetric model. If the top partners do not carry color charge, their direct production is greatly reduced. Nevertheless, we show that such scenarios can be revealed through Higgs dynamics. We find that many color-neutral frameworks leave observable traces in Higgs couplings, which, in some cases, may be the only way to probe these theories at the LHC. Some realizations of the color-neutral framework also lead to exotic decays of the Higgs with displaced vertices. We show that these decays are so striking that the projected sensitivity for these searches, at hadron colliders, is comparable to that of searches for colored top partners. Taken together, these three case studies show the efficacy of the Higgs as a probe of naturalness.
Abstract:
Dissertation (master's), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Matemática, 2016.
Abstract:
(The Mark and Recapture Network: a Heliconius case study). The current pace of habitat destruction, especially in tropical landscapes, has increased the need for understanding minimum patch requirements and patch distance as tools for conserving species in forest remnants. Mark-recapture and tagging studies have been instrumental in providing parameters for functional models. Because of their popularity, ease of manipulation and well-known biology, butterflies have become a model in studies of spatial structure. Yet, most studies on butterfly movement have focused on temperate species that live in open habitats, in which forest patches are barriers to movement. This study aimed to view and review mark-recapture data as a network in two species of butterfly (Heliconius erato and Heliconius melpomene). Marking and recapture of the two species was carried out in an Atlantic forest reserve located about 20 km from the city of Natal (RN). Mark-recapture studies were conducted in 3 weekly visits during January-February and July-August in 2007 and 2008. Captures were more common in two sections of the dirt road, with minimal collection in the forest trail. The spatial spread of captures was similar in the two species. Yet, distances between recaptures seem to be greater for Heliconius erato than for Heliconius melpomene. In addition, the erato network is more disconnected, suggesting that this species has shorter traveling patches. Moving on to the network, both species have a similar number of links (N) and unweighted vertices (L). However, the weighted network of melpomene has 50% more connections than that of erato. These network metrics suggest that erato has a more compartmentalized network and more restricted movement than melpomene. Thus, erato has a larger number of disconnected components, nC, in the network, and a smaller network diameter. 
The frequency distribution of network connectivity for both species was better explained by a power law than by a random (Poisson) distribution. Moreover, the power-law fit is much better for erato than for melpomene, which should be linked to the small movements that erato makes in the network.
Abstract:
In this report, we survey results on distance magic graphs and some closely related graphs. A distance magic labeling of a graph $G$ with magic constant $k$ is a bijection $l$ from the vertex set to $\{1, 2, \dots, n\}$ such that, for every vertex $x$, $\sum_{y \in N_G(x)} l(y) = k$, where $N_G(x)$ is the set of vertices of $G$ adjacent to $x$. If the graph $G$ has a distance magic labeling, we say that $G$ is a distance magic graph. In Chapter 1, we explore the background of distance magic graphs by introducing examples of magic squares, magic graphs, and distance magic graphs. In Chapter 2, we begin by examining some basic results on distance magic graphs. We next look at results on different graph structures including regular graphs, multipartite graphs, graph products, join graphs, and splitting graphs. We conclude with other perspectives on distance magic graphs including embedding theorems, the matrix representation of distance magic graphs, lifted magic rectangles, and distance magic constants. In Chapter 3, we study graph labelings that retain the same labels as distance magic labelings, but alter the definition in some other way. These labelings include balanced distance magic labelings, closed distance magic labelings, D-distance magic labelings, and distance antimagic labelings. In Chapter 4, we examine results on neighborhood magic labelings, group distance magic labelings, and group distance antimagic labelings. These graph labelings change the label set, but are otherwise similar to distance magic graphs. In Chapter 5, we examine some applications of distance magic and distance antimagic labeling to the fair scheduling of tournaments. In Chapter 6, we conclude with some open problems.
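The definition of a distance magic labeling given in this abstract translates into a short verification routine. The sketch below is a hypothetical helper, not taken from the report: it checks that the labels are a bijection onto {1, ..., n} and that every vertex sees the same sum over its neighbors. The 4-cycle is used as a small worked example.

```python
def magic_constant(adj, label):
    """Return the magic constant k if `label` is a distance magic labeling, else None.

    adj maps each vertex to the list of its neighbors;
    label maps each vertex to an integer.
    """
    n = len(adj)
    # The labeling must be a bijection onto {1, ..., n}.
    if sorted(label.values()) != list(range(1, n + 1)):
        return None
    # Every vertex must see the same sum of labels over its neighborhood.
    sums = {sum(label[y] for y in adj[x]) for x in adj}
    return sums.pop() if len(sums) == 1 else None


# The 4-cycle 0-1-2-3-0: opposite vertices share the same neighborhood.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

print(magic_constant(c4, {0: 1, 1: 2, 2: 4, 3: 3}))  # 5
print(magic_constant(c4, {0: 1, 1: 2, 2: 3, 3: 4}))  # None
```

In the first labeling each pair of opposite vertices sums to 5 (1 + 4 and 2 + 3), so every neighborhood sum equals 5. The second labeling gives neighborhood sums of 6 and 4, so it is not distance magic.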
Abstract:
In the study of complex networks, vertex centrality measures are used to identify the most important vertices within a graph. A related problem is that of measuring the centrality of an edge. In this paper, we propose a novel edge centrality index rooted in quantum information. More specifically, we measure the importance of an edge in terms of the contribution that it gives to the Von Neumann entropy of the graph. We show that this can be computed in terms of the Holevo quantity, a well-known quantum information theoretical measure. While computing the Von Neumann entropy and hence the Holevo quantity requires computing the spectrum of the graph Laplacian, we show how to obtain a simplified measure through a quadratic approximation of the Shannon entropy. This in turn shows that the proposed centrality measure is strongly correlated with the negative degree centrality on the line graph. We evaluate our centrality measure through an extensive set of experiments on real-world as well as synthetic networks, and we compare it against commonly used alternative measures.
Abstract:
Laplacian-based descriptors, such as the Heat Kernel Signature and the Wave Kernel Signature, allow one to embed the vertices of a graph onto a vectorial space, and have been successfully used to find the optimal matching between a pair of input graphs. While the HKS uses a heat diffusion process to probe the local structure of a graph, the WKS attempts to do the same through wave propagation. In this paper, we propose an alternative structural descriptor that is based on continuous-time quantum walks. More specifically, we characterise the structure of a graph using its average mixing matrix. The average mixing matrix is a doubly-stochastic matrix that encodes the time-averaged behaviour of a continuous-time quantum walk on the graph. We propose to use the rows of the average mixing matrix for increasing stopping times to develop a novel signature, the Average Mixing Matrix Signature (AMMS). We perform an extensive range of experiments and we show that the proposed signature is robust under structural perturbations of the original graphs and it outperforms both the HKS and WKS when used as a node descriptor in a graph matching task.
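To make the average mixing matrix concrete, here is a minimal sketch of its long-time limit. It assumes the walk is driven by the adjacency matrix A with spectral decomposition A = Σ_r θ_r E_r, in which case the limit equals Σ_r E_r ∘ E_r (entrywise product). The choice of Hamiltonian, the function name and the tolerance are assumptions. The abstract's signature uses finite, increasing stopping times, which this sketch does not reproduce. NumPy is assumed to be available.

```python
import numpy as np


def average_mixing_matrix(a, tol=1e-9):
    """Long-time average mixing matrix: sum over eigenspaces of E_r * E_r (entrywise)."""
    a = np.asarray(a, dtype=float)
    vals, vecs = np.linalg.eigh(a)        # eigenvalues in ascending order
    m = np.zeros_like(a)
    start = 0
    for end in range(1, len(vals) + 1):
        # Close the current group when the next eigenvalue differs (or at the end).
        if end == len(vals) or vals[end] - vals[end - 1] > tol:
            block = vecs[:, start:end]    # orthonormal basis of one eigenspace
            proj = block @ block.T        # orthogonal projector E_r
            m += proj * proj              # entrywise (Schur) square
            start = end
    return m


# Path on three vertices: each row of the result describes one vertex.
p3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
m = average_mixing_matrix(p3)
print(np.round(m, 4))
print(np.allclose(m.sum(axis=0), 1), np.allclose(m.sum(axis=1), 1))
```

Rows and columns sum to 1 because each diagonal entry of the projectors sums to 1 across eigenspaces, which matches the doubly-stochastic property stated in the abstract.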
Abstract:
We generalize the Liapunov convexity theorem's version for vectorial control systems driven by linear ODEs of first order $p = 1$, in any dimension $d \in \mathbb{N}$, by including a pointwise state constraint. More precisely, given a $\bar{x}(\cdot) \in W^{p,1}([a,b], \mathbb{R}^d)$ solving the convexified $p$-th order differential inclusion $L_p \bar{x}(t) \in \operatorname{co}\{u_0(t), u_1(t), \dots, u_m(t)\}$ a.e., consider the general problem consisting in finding bang-bang solutions (i.e. $L_p \hat{x}(t) \in \{u_0(t), u_1(t), \dots, u_m(t)\}$ a.e.) under the same boundary data, $\hat{x}^{(k)}(a) = \bar{x}^{(k)}(a)$ and $\hat{x}^{(k)}(b) = \bar{x}^{(k)}(b)$ ($k = 0, 1, \dots, p-1$); but restricted, moreover, by a pointwise state constraint of the type $\langle \hat{x}(t), \omega \rangle \le \langle \bar{x}(t), \omega \rangle$ $\forall t \in [a,b]$ (e.g. $\omega = (1, 0, \dots, 0)$ yielding $\hat{x}_1(t) \le \bar{x}_1(t)$). Previous results in the scalar $d = 1$ case were the pioneering Amar & Cellina paper (dealing with $L_p x(\cdot) = x'(\cdot)$), followed by Cerf & Mariconda results, who solved the general case of linear differential operators $L_p$ of order $p \ge 2$ with $C^0([a,b])$-coefficients. This paper is dedicated to: focus on the missing case $p = 1$, i.e. using $L_p x(\cdot) = x'(\cdot) + A(\cdot) x(\cdot)$; generalize the dimension of $x(\cdot)$, from the scalar case $d = 1$ to the vectorial $d \in \mathbb{N}$ case; weaken the coefficients, from continuous to integrable, so that $A(\cdot)$ now becomes a $d \times d$ integrable matrix; and allow the directional vector $\omega$ to become a moving AC function $\omega(\cdot)$. Previous vectorial results had constant $\omega$, no matrix (i.e. $A(\cdot) \equiv 0$) and considered: constant control-vertices (Amar & Mariconda) and, more recently, integrable control-vertices (ourselves).