969 results for graph analysis
Abstract:
[EN] Based on the theoretical tools of complex networks, this work provides a basic descriptive study of a synonym dictionary, the Spanish Open Thesaurus, represented as a graph. We study the main structural measures of the network and compare them with those of a random graph. Numerical results show that Open-Thesaurus is a graph whose topological properties approximate those of a scale-free network, but it does not appear to exhibit the small-world property because of its sparse structure. We also found that the words with the highest betweenness centrality are terms that suggest the vocabulary of psychoanalysis: placer (pleasure), ayudante (in the sense of assistant or worker), and regular (to regulate).
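As an illustration of the betweenness centrality measure mentioned in this abstract, here is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph. The toy graph and its node names are made up for the example and are not taken from the thesaurus data; the abstract does not say which implementation the authors used.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes) for an undirected,
    unweighted graph given as {node: [neighbours]}."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical toy graph: a path a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(toy)
```

In the toy path, the two inner nodes lie on every shortest path between the ends, so they receive the highest scores; words such as those listed in the abstract would play the analogous bridging role in the synonym graph.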
Abstract:
The connections between convexity and submodularity are explored, for purposes of minimizing and learning submodular set functions.
First, we develop a novel method for minimizing a particular class of submodular functions, which can be expressed as a sum of concave functions composed with modular functions. The basic algorithm uses an accelerated first order method applied to a smoothed version of its convex extension. The smoothing algorithm is particularly novel as it allows us to treat general concave potentials without needing to construct a piecewise linear approximation as with graph-based techniques.
Second, we derive the general conditions under which it is possible to find a minimizer of a submodular function via a convex problem. This provides a framework for developing submodular minimization algorithms. The framework is then used to develop several algorithms that can be run in a distributed fashion. This is particularly useful for applications where the submodular objective function consists of a sum of many terms, each term dependent on a small part of a large data set.
Lastly, we approach the problem of learning set functions from an unorthodox perspective---sparse reconstruction. We demonstrate an explicit connection between the problem of learning set functions from random evaluations and that of sparse signals. Based on the observation that the Fourier transform for set functions satisfies exactly the conditions needed for sparse reconstruction algorithms to work, we examine some different function classes under which uniform reconstruction is possible.
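The function class named in this abstract, sums of concave functions composed with modular functions, can be checked for submodularity by brute force on a small ground set. The sketch below assumes non-negative modular weights; the weights and the concave functions (square root and log1p) are arbitrary choices for illustration. It only tests the diminishing-returns property and is not the minimization method described in the abstract.

```python
import math
from itertools import combinations

def make_f(weights_list, concave_fns):
    """Build f(S) = sum_j g_j(sum of w_j[i] for i in S)."""
    def f(subset):
        return sum(g(sum(w[i] for i in subset))
                   for w, g in zip(weights_list, concave_fns))
    return f

def is_submodular(f, n, tol=1e-9):
    """Brute-force diminishing-returns check on the ground set {0..n-1}:
    for all A subset of B and e not in B,
    f(A + e) - f(A) >= f(B + e) - f(B)."""
    ground = range(n)
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(ground, k)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for e in ground:
                if e in b:
                    continue
                gain_a = f(a | {e}) - f(a)
                gain_b = f(b | {e}) - f(b)
                if gain_a < gain_b - tol:
                    return False
    return True

# Illustrative (assumed) non-negative weights and concave functions.
weights = [[1.0, 2.0, 0.5, 3.0], [0.2, 0.1, 4.0, 1.0]]
concave = [math.sqrt, math.log1p]
f = make_f(weights, concave)
```

Calling `is_submodular(f, 4)` enumerates all nested pairs of subsets, so it is only practical for very small ground sets. A convex composition such as `lambda s: len(s) ** 2` fails the same check.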
Abstract:
We consider a large scale network of interconnected heterogeneous dynamical components. Scalable stability conditions are derived that involve the input/output properties of individual subsystems and the interconnection matrix. The analysis is based on the Davis-Wielandt shell, a higher dimensional version of the numerical range with important convexity properties. This can be used to allow heterogeneity in the agent dynamics while relaxing normality and symmetry assumptions on the interconnection matrix. The results include small gain and passivity approaches as special cases, with the three dimensional shell shown to be inherently connected with corresponding graph separation arguments. © 2012 Society for Industrial and Applied Mathematics.
Abstract:
Lee, M.H., Qualitative Circuit Models in Failure Analysis Reasoning, AI Journal, vol. 111, pp. 239-276, 1999.
Abstract:
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
Abstract:
The requirement for a very accurate dependence analysis to underpin software tools to aid the generation of efficient parallel implementations of scalar code is argued. The current status of dependence analysis is shown to be inadequate for the generation of efficient parallel code, causing too many conservative assumptions to be made. This paper summarises the limitations of conventional dependence analysis techniques, and then describes a series of extensions which enable the production of a much more accurate dependence graph. The extensions include analysis of symbolic variables; the development of a symbolic inequality disproof algorithm and its exploitation in a symbolic Banerjee inequality test; the use of inference engine proofs; the exploitation of exact dependence and dependence pre-domination attributes; interprocedural array analysis; conditional variable definition tracing; integer array tracing and division calculations. Analysis case studies on typical numerical codes show that the extensions reduce the total dependencies estimated by conventional analysis by up to 50%. The techniques described in this paper have been embedded within a suite of tools, CAPTools, which combines analysis with user knowledge to produce efficient parallel implementations of numerical mesh based codes.
Abstract:
In the present paper, we introduce a notion of a style representing abstract, complex objects having characteristics that can be represented as structured objects. Furthermore, we provide some mathematical properties of such styles. As a main result, we present a novel approach to perform a meaningful comparative analysis of such styles by defining and using graph-theoretic measures. We compare two styles by structurally comparing the underlying feature sets, which represent sets of graphs. To determine the structural similarity between the underlying graphs, we use graph similarity measures that are computationally efficient. More precisely, in order to compare styles, we map each feature set to a so-called median graph and compare the resulting median graphs. As an application, we perform an experimental study to compare special styles representing sets of undirected graphs and present numerical results thereof. (C) 2007 Elsevier Inc. All rights reserved.
Abstract:
Processor architecture has taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors due to the need to write parallel or concurrent programs with high degrees of parallelism. Moreover, many-cores have to operate in a mode of strong scaling because of memory bandwidth constraints. In strong scaling, increasingly finer-grain parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel programming because they relieve the programmer of identifying precisely all inter-task dependences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on the description of the memory each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel, taking into account the dependences specified in the task graph.
Several papers report important overheads for task dataflow systems, which severely limit the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for efficient management of task graphs. Then, we present three schemes to manage task graphs building on graph representations, hypergraphs and lists. We also consider a fourth edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces the least overhead in nearly all situations.
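To make the idea of a runtime-built task dataflow graph concrete, the following is a minimal sketch of how dependences could be derived from input, output and in/out annotations as tasks are submitted in program order. The class and method names are hypothetical, and the sketch covers none of the four schemes studied in the paper (it has no versions, generations, commutative accesses or reductions); it simply records read-after-write, write-after-read and write-after-write edges.

```python
from collections import defaultdict

class TaskGraph:
    """Hypothetical sketch: derive task dependences from access annotations."""

    def __init__(self):
        self.deps = defaultdict(set)      # task -> tasks it must wait for
        self.last_writer = {}             # object -> task that last wrote it
        self.readers = defaultdict(list)  # object -> readers since last write
        self.order = []                   # submission (program) order

    def submit(self, task, accesses):
        """accesses maps an object name to 'in', 'out' or 'inout'."""
        self.order.append(task)
        self.deps[task]  # make sure every task has an entry
        for obj, mode in accesses.items():
            writer = self.last_writer.get(obj)
            if mode in ("in", "inout") and writer is not None:
                self.deps[task].add(writer)              # read-after-write
            if mode in ("out", "inout"):
                self.deps[task].update(self.readers[obj])  # write-after-read
                if writer is not None:
                    self.deps[task].add(writer)          # write-after-write
                self.last_writer[obj] = task
                self.readers[obj] = []
            else:
                self.readers[obj].append(task)
        self.deps[task].discard(task)

    def waves(self):
        """Group tasks into waves that could run in parallel."""
        done, result = set(), []
        pending = list(self.order)
        while pending:
            ready = [t for t in pending if self.deps[t] <= done]
            result.append(ready)
            done.update(ready)
            pending = [t for t in pending if t not in done]
        return result

g = TaskGraph()
g.submit("A", {"x": "out"})
g.submit("B", {"x": "in", "y": "out"})
g.submit("C", {"x": "in", "z": "out"})
g.submit("D", {"y": "in", "z": "in", "x": "inout"})
```

Here `B` and `C` only read `x`, so they can run concurrently once `A` has written it, while `D` must wait for all three. Storing explicit dependence sets like this is one possible representation; the abstract reports that an edge-less scheme using integers had the least overhead in the authors' micro-benchmarks.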
Abstract:
Dissertation to obtain the degree of Doctor in Electrical and Computer Engineering, specialization of Collaborative Networks
Abstract:
A complex network is an abstract representation of an intricate system of interrelated elements where the patterns of connection hold significant meaning. One particular complex network is a social network whereby the vertices represent people and edges denote their daily interactions. Understanding social network dynamics can be vital to the mitigation of disease spread as these networks model the interactions, and thus avenues of spread, between individuals. To better understand complex networks, algorithms which generate graphs exhibiting observed properties of real-world networks, known as graph models, are often constructed. While various efforts to aid with the construction of graph models have been proposed using statistical and probabilistic methods, genetic programming (GP) has only recently been considered. However, determining that a graph model of a complex network accurately describes the target network(s) is not a trivial task as the graph models are often stochastic in nature and the notion of similarity is dependent upon the expected behavior of the network. This thesis examines a number of well-known network properties to determine which measures best allowed networks generated by different graph models, and thus the models themselves, to be distinguished. A proposed meta-analysis procedure was used to demonstrate how these network measures interact when used together as classifiers to determine network, and thus model, (dis)similarity. The analytical results form the basis of the fitness evaluation for a GP system used to automatically construct graph models for complex networks. The GP-based automatic inference system was used to reproduce existing, well-known graph models as well as a real-world network. Results indicated that the automatically inferred models exemplified functional similarity when compared to their respective target networks. This approach also showed promise when used to infer a model for a mammalian brain network.
Abstract:
Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarking them against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extensions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.
Abstract:
Accurately and reliably identifying the actual number of clusters present in a dataset of gene expression profiles, when no additional information on cluster structure is available, is a problem addressed by few algorithms. GeneMCL transforms microarray analysis data into a graph consisting of nodes connected by edges, where the nodes represent genes, and the edges represent the similarity in expression of those genes, as given by a proximity measurement. This measurement is taken to be the Pearson correlation coefficient combined with a local non-linear rescaling step. The resulting graph is input to the Markov Cluster (MCL) algorithm, which is an elegant, deterministic, non-specific and scalable method, which models stochastic flow through the graph. The algorithm is inherently affected by any cluster structure present, and rapidly decomposes a graph into cohesive clusters. The potential of the GeneMCL algorithm is demonstrated with a 5730 gene subset (IGS) of the Van't Veer breast cancer database, for which the clusterings are shown to reflect underlying biological mechanisms. (c) 2005 Elsevier Ltd. All rights reserved.
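The Markov Cluster step described in this abstract alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing columns). The sketch below is a small, dense, pure-Python version intended only to show that loop; the function names, the inflation value of 2.0 and the toy graph are assumptions. It takes a plain adjacency matrix and does not include GeneMCL's Pearson-correlation and rescaling preprocessing.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion: matrix square, i.e. flow along paths of length two."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation: elementwise power, then column renormalization."""
    return normalize_columns([[x ** r for x in row] for row in m])

def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Return clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then normalize.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep non-negligible mass act as attractors; the columns
    # they attract form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Assumed toy graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0
clusters = mcl(adj)
```

On this toy input the flow across the single bridging edge dies out under inflation, so the two triangles separate. A production implementation would use sparse matrices and pruning, and a fixed iteration count stands in here for a proper convergence test.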
Abstract:
The present work describes a new tool that helps bidders improve their competitive bidding strategies. This new tool consists of an easy-to-use graphical tool that allows the use of more complex decision analysis tools in the field of Competitive Bidding. The graphic tool described here tries to move away from previous bidding models which attempt to describe the result of an auction or a tender process by means of studying each possible bidder with probability density functions. As an illustration, the tool is applied to three practical cases. Theoretical and practical conclusions on the great potential breadth of application of the tool are also presented.
Abstract:
Cacao swollen shoot virus (CSSV) causes the Cacao swollen shoot virus disease (CSSVD) and significantly reduces production in West African cacao. This study characterised the current status of the disease in the major cacao growing States in Nigeria and attempted a clarification on the manner of CSSV transmission. Two separate field surveys and sample collections were conducted in Nigeria in summer 2012 and spring 2013. PCR-based screening of cacao leaf samples and subsequent DNA sequencing showed that the disease continues to persist in Ondo and Oyo States and in new cacao sites in Abia, Akwa Ibom, Cross River and Edo States. Mealybug samples collected were identified using a robust approach involving environmental scanning electron microscopy, histology and DNA barcoding, which highlighted the importance of integrative taxonomy in the study. The results show that the genus Planococcus (Planococcus citri (Risso) and/or Planococcus minor (Maskell)) was the most abundant vector (73.5%) at the sites examined, followed by Formicococcus njalensis (Laing) (19.0%). In a laboratory study, the feeding behaviour of Pl. citri, Pseudococcus longispinus (Targioni-Tozzetti) and Pseudococcus viburni (Signoret) on cacao was investigated using electrical penetration graph (EPG) analysis. EPG waveforms reflecting intercellular stylet penetration (C), extracellular salivation (E1e), salivation in sieve elements (E1), phloem ingestion (E2), derailed stylet mechanics (F), xylem ingestion (G) and non-probing phase (Np) were analysed. Individual mealybugs exhibited marked variation within species and significantly differed (p ≤ .05) between species for E1e and E1. PCR-based assessments of the retention time for CSSV in viruliferous Pl. citri, Ps. longispinus and Ps. viburni fed on a non-cacao diet showed that CSSV was still detectable after 144 hours.
These unusually long durations for a pathogen currently classified as a semi-persistent virus have implications for the design of non-malvaceous barrier crops currently being considered for the protection of new cacao plantings.