969 resultados para graph analysis
Resumo:
Background: Thermophilic proteins sustain themselves and function at higher temperatures. Despite their structural and functional similarities with their mesophilic homologues, they show enhanced stability. Various comparative studies at genomic, protein sequence and structure levels, and experimental works highlight the different factors and dominant interacting forces contributing to this increased stability. Methods: In this comparative structure based study, we have used interaction energies between amino acids, to generate structure networks called as Protein Energy Networks (PENs). These PENs are used to compute network, sub-graph, and node specific parameters. These parameters are then compared between the thermophile-mesophile homologues. Results: The results show an increased number of clusters and low energy cliques in thermophiles as the main contributing factors for their enhanced stability. Further more, we see an increase in the number of hubs in thermophiles. We also observe no community of electrostatic cliques forming in PENs. Conclusion: In this study we were able to take an energy based network approach, to identify the factors responsible for enhanced stability of thermophiles, by comparative analysis. We were able to point out that the sub-graph parameters are the prominent contributing factors. The thermophiles have a better-packed hydrophobic core. We have also discussed how thermophiles, although increasing stability through higher connectivity retains conformational flexibility, from a cliques and communities perspective.
Resumo:
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetical data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider a prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
Resumo:
Background. Several types of networks, such as transcriptional, metabolic or protein-protein interaction networks of various organisms have been constructed, that have provided a variety of insights into metabolism and regulation. Here, we seek to exploit the reaction-based networks of three organisms for comparative genomics. We use concepts from spectral graph theory to systematically determine how differences in basic metabolism of organisms are reflected at the systems level and in the overall topological structures of their metabolic networks. Methodology/Principal Findings. Metabolome-based reaction networks of Mycobacterium tuberculosis, Mycobacterium leprae and Escherichia coli have been constructed based on the KEGG LIGAND database, followed by graph spectral analysis of the network to identify hubs as well as the sub-clustering of reactions. The shortest and alternate paths in the reaction networks have also been examined. Sub-cluster profiling demonstrates that reactions of the mycolic acid pathway in mycobacteria form a tightly connected sub-cluster. Identification of hubs reveals reactions involving glutamate to be central to mycobacterial metabolism, and pyruvate to be at the centre of the E. coli metabolome. The analysis of shortest paths between reactions has revealed several paths that are shorter than well established pathways. Conclusions. We conclude that severe downsizing of the leprae genome has not significantly altered the global structure of its reaction network but has reduced the total number of alternate paths between its reactions while keeping the shortest paths between them intact. The hubs in the mycobacterial networks that are absent in the human metabolome can be explored as potential drug targets. This work demonstrates the usefulness of constructing metabolome based networks of organisms and the feasibility of their analyses through graph spectral methods. The insights obtained from such studies provide a broad overview of the similarities and differences between organisms, taking comparative genomics studies to a higher dimension.
Resumo:
An intelligent computer aided defect analysis (ICADA) system, based on artificial intelligence techniques, has been developed to identify design, process or material parameters which could be responsible for the occurrence of defective castings in a manufacturing campaign. The data on defective castings for a particular time frame, which is an input to the ICADA system, has been analysed. It was observed that a large proportion, i.e. 50-80% of all the defective castings produced in a foundry, have two, three or four types of defects occurring above a threshold proportion, say 10%. Also, a large number of defect types are either not found at all or found in a very small proportion, with a threshold value below 2%. An important feature of the ICADA system is the recognition of this pattern in the analysis. Thirty casting defect types and a large number of causes numbering between 50 and 70 for each, as identified in the AFS analysis of casting defects-the standard reference source for a casting process-constituted the foundation for building the knowledge base. Scientific rationale underlying the formation of a defect during the casting process was identified and 38 metacauses were coded. Process, material and design parameters which contribute to the metacauses were systematically examined and 112 were identified as rootcauses. The interconnections between defects, metacauses and rootcauses were represented as a three tier structured graph and the handling of uncertainty in the occurrence of events such as defects, metacauses and rootcauses was achieved by Bayesian analysis. The hill climbing search technique, associated with forward reasoning, was employed to recognize one or several root causes.
Resumo:
We give a detailed construction of a finite-state transition system for a com-connected Message Sequence Graph. Though this result is well-known in the literature and forms the basis for the solution to several analysis and verification problems concerning MSG specifications, the constructions given in the literature are either not amenable to implementation, or imprecise, or simply incorrect. In contrast we give a detailed construction along with a proof of its correctness. Our transition system is amenable to implementation, and can also be used for a bounded analysis of general (not necessarily com-connected) MSG specifications.
Resumo:
We consider evolving exponential RGGs in one dimension and characterize the time dependent behavior of some of their topological properties. We consider two evolution models and study one of them detail while providing a summary of the results for the other. In the first model, the inter-nodal gaps evolve according to an exponential AR(1) process that makes the stationary distribution of the node locations exponential. For this model we obtain the one-step conditional connectivity probabilities and extend it to the k-step case. Finite and asymptotic analysis are given. We then obtain the k-step connectivity probability conditioned on the network being disconnected. We also derive the pmf of the first passage time for a connected network to become disconnected. We then describe a random birth-death model where at each instant, the node locations evolve according to an AR(1) process. In addition, a random node is allowed to die while giving birth to a node at another location. We derive properties similar to those above.
Resumo:
A new method of network analysis, a generalization in several different senses of existing methods and applicable to all networks for which a branch-admittance (or impedance) matrix can be formed, is presented. The treatment of network determinants is very general and essentially four terminal rather than three terminal, and leads to simple expressions based on trees of a simple graph associated with the network and matrix, and involving products of low-order, usually(2 times 2)determinants of tree-branch admittances, in addition to tree-branch products as in existing methods. By comparison with existing methods, the total number of trees and of tree pairs is usually considerably reduced, and this fact, together with an easy method of tree-pair sign determination which is also presented, makes the new method simpler in general. The method can be very easily adapted, by the use of infinite parameters, to accommodate ideal transformers, operational amplifiers, and other forms of network constraint; in fact, is thought to be applicable to all linear networks.
Resumo:
The last few decades have witnessed application of graph theory and topological indices derived from molecular graph in structure-activity analysis. Such applications are based on regression and various multivariate analyses. Most of the topological indices are computed for the whole molecule and used as descriptors for explaining properties/activities of chemical compounds. However, some substructural descriptors in the form of topological distance based vertex indices have been found to be useful in identifying activity related substructures and in predicting pharmacological and toxicological activities of bioactive compounds. Another important aspect of drug discovery e. g. designing novel pharmaceutical candidates could also be done from the distance distribution associated with such vertex indices. In this article, we will review the development and applications of this approach both in activity prediction as well as in designing novel compounds.
Resumo:
Wireless sensor networks can often be viewed in terms of a uniform deployment of a large number of nodes in a region of Euclidean space. Following deployment, the nodes self-organize into a mesh topology with a key aspect being self-localization. Having obtained a mesh topology in a dense, homogeneous deployment, a frequently used approximation is to take the hop distance between nodes to be proportional to the Euclidean distance between them. In this work, we analyze this approximation through two complementary analyses. We assume that the mesh topology is a random geometric graph on the nodes; and that some nodes are designated as anchors with known locations. First, we obtain high probability bounds on the Euclidean distances of all nodes that are h hops away from a fixed anchor node. In the second analysis, we provide a heuristic argument that leads to a direct approximation for the density function of the Euclidean distance between two nodes that are separated by a hop distance h. This approximation is shown, through simulation, to very closely match the true density function. Localization algorithms that draw upon the preceding analyses are then proposed and shown to perform better than some of the well-known algorithms present in the literature. Belief-propagation-based message-passing is then used to further enhance the performance of the proposed localization algorithms. To our knowledge, this is the first usage of message-passing for hop-count-based self-localization.
Resumo:
Pervasive use of pointers in large-scale real-world applications continues to make points-to analysis an important optimization-enabler. Rapid growth of software systems demands a scalable pointer analysis algorithm. A typical inclusion-based points-to analysis iteratively evaluates constraints and computes a points-to solution until a fixpoint. In each iteration, (i) points-to information is propagated across directed edges in a constraint graph G and (ii) more edges are added by processing the points-to constraints. We observe that prioritizing the order in which the information is processed within each of the above two steps can lead to efficient execution of the points-to analysis. While earlier work in the literature focuses only on the propagation order, we argue that the other dimension, that is, prioritizing the constraint processing, can lead to even higher improvements on how fast the fixpoint of the points-to algorithm is reached. This becomes especially important as we prove that finding an optimal sequence for processing the points-to constraints is NP-Complete. The prioritization scheme proposed in this paper is general enough to be applied to any of the existing points-to analyses. Using the prioritization framework developed in this paper, we implement prioritized versions of Andersen's analysis, Deep Propagation, Hardekopf and Lin's Lazy Cycle Detection and Bloom Filter based points-to analysis. In each case, we report significant improvements in the analysis times (33%, 47%, 44%, 20% respectively) as well as the memory requirements for a large suite of programs, including SPEC 2000 benchmarks and five large open source programs.
Resumo:
Peer to peer networks are being used extensively nowadays for file sharing, video on demand and live streaming. For IPTV, delay deadlines are more stringent compared to file sharing. Coolstreaming was the first P2P IPTV system. In this paper, we model New Coolstreaming (newer version of Coolstreaming) via a queueing network. We use two time scale decomposition of Markov chains to compute the stationary distribution of number of peers and the expected number of substreams in the overlay which are not being received at the required rate due to parent overloading. We also characterize the end-to-end delay encountered by a video packet received by a user and originated at the server. Three factors contribute towards the delay. The first factor is the mean shortest path length between any two overlay peers in terms of overlay hops of the partnership graph which is shown to be O (log n) where n is the number of peers in the overlay. The second factor is the mean number of routers between any two overlay neighbours which is seen to be at most O (log N-I) where N-I is the number of routers in the internet. Third factor is the mean delay at a router in the internet. We provide an approximation of this mean delay E W]. Thus, the mean end to end delay in New Coolstreaming is shown to be upper bounded by O (log E N]) (log N-I) E (W)] where E N] is the mean number of peers at a channel.
Resumo:
The ability to perform strong updates is the main contributor to the precision of flow-sensitive pointer analysis algorithms. Traditional flow-sensitive pointer analyses cannot strongly update pointers residing in the heap. This is a severe restriction for Java programs. In this paper, we propose a new flow-sensitive pointer analysis algorithm for Java that can perform strong updates on heap-based pointers effectively. Instead of points-to graphs, we represent our points-to information as maps from access paths to sets of abstract objects. We have implemented our analysis and run it on several large Java benchmarks. The results show considerable improvement in precision over the points-to graph based flow-insensitive and flow-sensitive analyses, with reasonable running time.
Resumo:
In this paper, we study a problem of designing a multi-hop wireless network for interconnecting sensors (hereafter called source nodes) to a Base Station (BS), by deploying a minimum number of relay nodes at a subset of given potential locations, while meeting a quality of service (QoS) objective specified as a hop count bound for paths from the sources to the BS. The hop count bound suffices to ensure a certain probability of the data being delivered to the BS within a given maximum delay under a light traffic model. We observe that the problem is NP-Hard. For this problem, we propose a polynomial time approximation algorithm based on iteratively constructing shortest path trees and heuristically pruning away the relay nodes used until the hop count bound is violated. Results show that the algorithm performs efficiently in various randomly generated network scenarios; in over 90% of the tested scenarios, it gave solutions that were either optimal or were worse than optimal by just one relay. We then use random graph techniques to obtain, under a certain stochastic setting, an upper bound on the average case approximation ratio of a class of algorithms (including the proposed algorithm) for this problem as a function of the number of source nodes, and the hop count bound. To the best of our knowledge, the average case analysis is the first of its kind in the relay placement literature. Since the design is based on a light traffic model, we also provide simulation results (using models for the IEEE 802.15.4 physical layer and medium access control) to assess the traffic levels up to which the QoS objectives continue to be met. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
The transcriptional regulation of gene expression is orchestrated by complex networks of interacting genes. Increasing evidence indicates that these `transcriptional regulatory networks' (TRNs) in bacteria have an inherently hierarchical architecture, although the design principles and the specific advantages offered by this type of organization have not yet been fully elucidated. In this study, we focussed on the hierarchical structure of the TRN of the gram-positive bacterium Bacillus subtilis and performed a comparative analysis with the TRN of the gram-negative bacterium Escherichia coli. Using a graph-theoretic approach, we organized the transcription factors (TFs) and sigma-factors in the TRNs of B. subtilis and E. coli into three hierarchical levels (Top, Middle and Bottom) and studied several structural and functional properties across them. In addition to many similarities, we found also specific differences, explaining the majority of them with variations in the distribution of s-factors across the hierarchical levels in the two organisms. We then investigated the control of target metabolic genes by transcriptional regulators to characterize the differential regulation of three distinct metabolic subsystems (catabolism, anabolism and central energy metabolism). These results suggest that the hierarchical architecture that we observed in B. subtilis represents an effective organization of its TRN to achieve flexibility in response to a wide range of diverse stimuli.
Resumo:
The Jansen mechanism is a one degree-of-freedom, planar, 12-link, leg mechanism that can be used in mobile robotic applications and in gait analysis. This paper presents the kinematics and dynamics of the Jansen leg mechanism. The forward kinematics, accomplished using circle intersection method, determines the trajectories of various points on the mechanism in the chassis (stationary link) reference frame. From the foot point trajectory, the step length is shown to vary linearly while step height varies non-linearly with change in crank radius. A dynamic model for the Jansen leg mechanism is proposed using bond graph approach with modulated multiport transformers. For given ground reaction force pattern and crank angular speed, this model helps determine the motor torque profile as well as the link and joint stresses. The model can therefore be used to rate the actuator torque and in design of the hardware and controller for such a system. The kinematics of the mechanism can also be obtained from this dynamic model. The proposed model is thus a useful tool for analysis and design of systems based on the Jansen leg mechanism. (C) 2015 Elsevier B.V. All rights reserved.