977 results for AUTOMATION
Abstract:
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations.
In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
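The abstract does not spell out the linear program, so the following is only a generic sketch of how an l1-penalised fit can be written as an LP. It assumes absolute (not squared) residuals; the symbols $x_i$, $y_i$, $w$, $u$, $v$, $\xi$ and the trade-off constant $C$ are illustrative and not taken from the paper. Writing the weight vector as $w = u - v$ with $u, v \ge 0$ makes both the penalty and the residual bounds linear:

$$
\begin{aligned}
\min_{u,\, v,\, \xi} \quad & \sum_{j} (u_j + v_j) + C \sum_{i} \xi_i \\
\text{subject to} \quad & -\xi_i \le y_i - x_i^{\top}(u - v) \le \xi_i \quad \text{for all } i, \\
& u \ge 0, \quad v \ge 0, \quad \xi \ge 0 .
\end{aligned}
$$

At an optimum at most one of $u_j$, $v_j$ is nonzero for each $j$, so $\sum_j (u_j + v_j)$ equals $\lVert w \rVert_1$. In this reading, solving one such program per gene and keeping the nonzero entries of $w$ as edges would give a sparse graph.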
Abstract:
A unit cube in k dimensions (or a k-cube) is defined as the Cartesian product $R_1 \times R_2 \times \cdots \times R_k$, where each $R_i$ is a closed interval on the real line of the form $[a_i, a_i + 1]$. The cubicity of G, denoted as cub(G), is the minimum k such that G is the intersection graph of a collection of k-cubes. Many NP-complete graph problems can be solved efficiently or have good approximation ratios in graphs of low cubicity. In most of these cases the first step is to get a low-dimensional cube representation of the given graph. It is known that for a graph G, $\mathrm{cub}(G) \le \lfloor 2n/3 \rfloor$. Recently it has been shown that for a graph G, $\mathrm{cub}(G) \le 4(\Delta + 1) \ln n$, where n and $\Delta$ are the number of vertices and maximum degree of G, respectively. In this paper, we show that for a bipartite graph $G = (A \cup B, E)$ with $|A| = n_1$, $|B| = n_2$, $n_1 \le n_2$, and $\Delta' = \min\{\Delta_A, \Delta_B\}$, where $\Delta_A = \max_{a \in A} d(a)$ and $\Delta_B = \max_{b \in B} d(b)$, d(a) and d(b) being the degree of a and b in G, respectively, $\mathrm{cub}(G) \le 2(\Delta' + 2) \lceil \ln n_2 \rceil$. We also give an efficient randomized algorithm to construct the cube representation of G in $3(\Delta' + 2) \lceil \ln n_2 \rceil$ dimensions. The reader may note that in general $\Delta'$ can be much smaller than $\Delta$.
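To make the definition of a cube representation concrete, here is a minimal Python sketch that builds the intersection graph of a set of unit cubes from their lower corners. The function name and the example coordinates are illustrative assumptions, not part of the paper; it relies only on the fact that two closed unit intervals $[a, a+1]$ and $[b, b+1]$ meet exactly when $|a - b| \le 1$.

```python
from itertools import combinations


def unit_cube_intersection_graph(corners):
    """Return the edge set of the intersection graph of axis-parallel unit cubes.

    corners maps each vertex to the tuple (a_1, ..., a_k) of lower endpoints,
    so the cube of that vertex is [a_1, a_1 + 1] x ... x [a_k, a_k + 1].
    """
    edges = set()
    for u, v in combinations(corners, 2):
        # Two cubes intersect iff their intervals overlap in every dimension.
        if all(abs(a - b) <= 1 for a, b in zip(corners[u], corners[v])):
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical example: a star with centre "c" and four leaves, drawn in 2 dimensions.
star = {"c": (0, 0), "l1": (1, 1), "l2": (1, -1), "l3": (-1, 1), "l4": (-1, -1)}
edges = unit_cube_intersection_graph(star)
print(sorted(tuple(sorted(e)) for e in edges))
```

Each leaf cube touches the centre cube at a corner, while any two leaf cubes are separated in at least one coordinate, so the resulting graph is the star itself. This shows that the star on five vertices has cubicity at most 2.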
Abstract:
We show that the cubicity of a connected threshold graph is equal to $\lceil \log_2 \alpha \rceil$, where $\alpha$ is its independence number.
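As a quick numerical reading of this formula, the sketch below evaluates $\lceil \log_2 \alpha \rceil$ with integer arithmetic, which avoids floating-point rounding near powers of two. The function name is an assumption made for illustration.

```python
def threshold_cubicity(alpha: int) -> int:
    """Evaluate ceil(log2(alpha)) exactly for an independence number alpha >= 1."""
    if alpha < 1:
        raise ValueError("the independence number must be at least 1")
    # For alpha >= 1, (alpha - 1).bit_length() equals ceil(log2(alpha)).
    return (alpha - 1).bit_length()


for alpha in (1, 2, 3, 4, 5, 1024, 1025):
    print(alpha, threshold_cubicity(alpha))
```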
Abstract:
An axis-parallel b-dimensional box is a Cartesian product $R_1 \times R_2 \times \cdots \times R_b$ where each $R_i$ (for $1 \le i \le b$) is a closed interval of the form $[a_i, b_i]$ on the real line. The boxicity of any graph G, box(G), is the minimum positive integer b such that G can be represented as the intersection graph of axis-parallel b-dimensional boxes. A b-dimensional cube is a Cartesian product $R_1 \times R_2 \times \cdots \times R_b$, where each $R_i$ (for $1 \le i \le b$) is a closed interval of the form $[a_i, a_i + 1]$ on the real line. When the boxes are restricted to be axis-parallel cubes in b dimensions, the minimum dimension b required to represent the graph is called the cubicity of the graph (denoted by cub(G)). In this paper we prove that $\mathrm{cub}(G) \le \lceil \log_2 n \rceil \, \mathrm{box}(G)$, where n is the number of vertices in the graph. We also show that this upper bound is tight. Some immediate consequences of the above result are listed below: 1. Planar graphs have cubicity at most $3 \lceil \log_2 n \rceil$. 2. Outerplanar graphs have cubicity at most $2 \lceil \log_2 n \rceil$. 3. Any graph of treewidth tw has cubicity at most $(\mathrm{tw} + 2) \lceil \log_2 n \rceil$. Thus, chordal graphs have cubicity at most $(\omega + 1) \lceil \log_2 n \rceil$ and circular arc graphs have cubicity at most $(2\omega + 1) \lceil \log_2 n \rceil$, where $\omega$ is the clique number.
Abstract:
The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options to reduce dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection subject to a probabilistic length-preserving transformation is explored further as a computationally light preprocessing step. The experimental results obtained demonstrate the effectiveness of the proposed training process for handling high-dimensional large datasets.
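The random projection step can be illustrated with a minimal Python sketch. It assumes a dense Gaussian projection matrix scaled by $1/\sqrt{k}$, a common choice that preserves squared lengths in expectation. The abstract does not say which projection the authors use, and the function names and the seed parameter are hypothetical.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Build a k x d matrix of N(0, 1) entries scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions by a matrix-vector product."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]


# Reduce a 50-dimensional feature vector to 8 dimensions before training.
R = random_projection_matrix(d=50, k=8, seed=1)
x = [float(j % 5) for j in range(50)]
y = project(R, x)
print(len(x), "->", len(y))
```

The projected vectors, rather than the original high-dimensional ones, would then be passed to the boosting procedure. Because the same matrix is applied to every example, the projection has to be drawn only once.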
Abstract:
Let G = (V, E) be a finite, simple and undirected graph. For $S \subseteq V$, let $\delta(S, G) = \{(u, v) \in E : u \in S \text{ and } v \in V - S\}$ be the edge boundary of S. Given an integer i, $1 \le i \le |V|$, let the edge isoperimetric value of G at i be defined as $b_e(i, G) = \min_{S \subseteq V : |S| = i} |\delta(S, G)|$. The edge isoperimetric peak of G is defined as $b_e(G) = \max_{1 \le j \le |V|} b_e(j, G)$. Let $b_v(G)$ denote the vertex isoperimetric peak defined in a corresponding way. The problem of determining a lower bound for the vertex isoperimetric peak in complete t-ary trees was recently considered in [Y. Otachi, K. Yamazaki, A lower bound for the vertex boundary-width of complete k-ary trees, Discrete Mathematics, in press (doi: 10.1016/j.disc.2007.05.014)]. In this paper we provide bounds which improve those in the above cited paper. Our results can be generalized to arbitrary (rooted) trees. The depth d of a tree is the number of nodes on the longest path starting from the root and ending at a leaf. In this paper we show that for a complete binary tree of depth d (denoted as $T_d^2$), $c_1 d \le b_e(T_d^2) \le d$ and $c_2 d \le b_v(T_d^2) \le d$, where $c_1$, $c_2$ are constants. For a complete t-ary tree of depth d (denoted as $T_d^t$) and $d \ge c \log t$, where c is a constant, we show that $c_1 \sqrt{t}\, d \le b_e(T_d^t) \le td$ and $c_2 d / \sqrt{t} \le b_v(T_d^t) \le d$, where $c_1$, $c_2$ are constants. At the heart of our proof we have the following theorem which works for an arbitrary rooted tree and not just for a complete t-ary tree. Let T = (V, E, r) be a finite, connected and rooted tree, the root being the vertex r.
Define a weight function $w : V \to \mathbb{N}$ where the weight w(u) of a vertex u is the number of its successors (including itself), and let the weight index $\eta(T)$ be defined as the number of distinct weights in the tree, i.e. $\eta(T) = |\{w(u) : u \in V\}|$. For a positive integer k, let $\ell(k) = |\{i \in \mathbb{N} : 1 \le i \le |V|,\ b_e(i, G) \le k\}|$. We show that $\ell(k) \le 2 \binom{2\eta + k}{k}$.
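The definitions of the edge boundary, the edge isoperimetric value and the edge isoperimetric peak can be checked on small graphs by exhaustive search. The Python sketch below is only a brute-force illustration, exponential in the number of vertices, and is not the technique of the paper. The function names and the example tree are assumptions.

```python
from itertools import combinations


def edge_boundary_size(edges, subset):
    """Count edges with exactly one endpoint in subset."""
    s = set(subset)
    return sum(1 for u, v in edges if (u in s) != (v in s))


def edge_isoperimetric_value(vertices, edges, i):
    """Smallest edge boundary over all vertex subsets of size i."""
    return min(edge_boundary_size(edges, s) for s in combinations(vertices, i))


def edge_isoperimetric_peak(vertices, edges):
    """Largest edge isoperimetric value over all subset sizes 1..|V|."""
    return max(
        edge_isoperimetric_value(vertices, edges, i)
        for i in range(1, len(vertices) + 1)
    )


# Complete binary tree with 7 vertices, which has depth 3 under the abstract's definition.
vertices = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
print([edge_isoperimetric_value(vertices, edges, i) for i in range(1, 8)])
print(edge_isoperimetric_peak(vertices, edges))
```

For this tree a subset of size 3 such as {2, 4, 5} is cut off by a single edge, whereas no subset of size 2 can be separated with fewer than two edges.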
Abstract:
Computation of the dependency basis is the fundamental step in solving the membership problem for functional dependencies (FDs) and multivalued dependencies (MVDs) in relational database theory. We examine this problem from an algebraic perspective. We introduce the notion of the inference basis of a set M of MVDs and show that it contains the maximum information about the logical consequences of M. We propose the notion of a dependency-lattice and develop an algebraic characterization of inference basis using simple notions from lattice theory. We also establish several interesting properties of dependency-lattices related to the implication problem. Founded on our characterization, we synthesize efficient algorithms for (a): computing the inference basis of a given set M of MVDs; (b): computing the dependency basis of a given attribute set w.r.t. M; and (c): solving the membership problem for MVDs. We also show that our results naturally extend to incorporate FDs also in a way that enables the solution of the membership problem for both FDs and MVDs put together. We finally show that our algorithms are more efficient than existing ones, when used to solve what we term the ‘generalized membership problem’.
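For the FD part of the membership problem, the textbook approach is attribute-set closure. The sketch below shows that standard procedure in Python for orientation only. It is not the lattice-based algorithm of the paper, and the function names and the single-letter attribute encoding are assumptions.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under a list of FDs given as (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def fd_is_implied(fds, lhs, rhs):
    """Membership test: lhs -> rhs follows from fds iff rhs lies in the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)


# Attributes are single letters; "AB" stands for the attribute set {A, B}.
fds = [("A", "B"), ("B", "C")]
print(sorted(attribute_closure("A", fds)))
print(fd_is_implied(fds, "A", "C"), fd_is_implied(fds, "C", "A"))
```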
Abstract:
Computation of the dependency basis is the fundamental step in solving the implication problem for MVDs in relational database theory. We examine this problem from an algebraic perspective. We introduce the notion of the inference basis of a set M of MVDs and show that it contains the maximum information about the logical consequences of M. We propose the notion of an MVD-lattice and develop an algebraic characterization of the inference basis using simple notions from lattice theory. We also establish several properties of MVD-lattices related to the implication problem. Founded on our characterization, we synthesize efficient algorithms for (a) computing the inference basis of a given set M of MVDs; (b) computing the dependency basis of a given attribute set w.r.t. M; and (c) solving the implication problem for MVDs. Finally, we show that our results naturally extend to incorporate FDs also in a way that enables the solution of the implication problem for both FDs and MVDs put together.
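As background for the dependency basis mentioned here, the following Python sketch implements the classical partition-refinement procedure found in database textbooks. It starts from the single block U - X and splits a block Y whenever some MVD W ->> Z has W disjoint from Y and Z cuts Y properly. It is offered as an assumption-laden illustration and is not the inference-basis algorithm proposed in the paper; the function names and the single-letter attribute encoding are hypothetical.

```python
def dependency_basis(universe, x, mvds):
    """Refine {U - X} into the dependency basis of X (blocks outside X only)."""
    x = frozenset(x)
    rest = frozenset(universe) - x
    blocks = {rest} if rest else set()
    changed = True
    while changed:
        changed = False
        for w, z in mvds:
            w, z = frozenset(w), frozenset(z)
            for y in list(blocks):
                if y & w:
                    continue  # the left side must be disjoint from the block
                inside, outside = y & z, y - z
                if inside and outside:
                    # Z cuts Y properly, so split Y into two blocks.
                    blocks.remove(y)
                    blocks.update((inside, outside))
                    changed = True
    return blocks


def mvd_is_implied(universe, mvds, x, y):
    """X ->> Y is implied iff Y - X is a union of blocks of the basis of X."""
    target = frozenset(y) - frozenset(x)
    basis = dependency_basis(universe, x, mvds)
    return all(b <= target or not (b & target) for b in basis)


mvds = [("A", "B")]
basis = dependency_basis("ABCD", "A", mvds)
print(sorted("".join(sorted(b)) for b in basis))
print(mvd_is_implied("ABCD", mvds, "A", "CD"), mvd_is_implied("ABCD", mvds, "A", "C"))
```

In the example, A ->> CD follows from A ->> B by complementation, while A ->> C does not, because C and D stay in the same block.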
Abstract:
Simultaneous consideration of both performance and reliability issues is important in the choice of computer architectures for real-time aerospace applications. One of the requirements for such a fault-tolerant computer system is the characteristic of graceful degradation. A shared and replicated resources computing system represents such an architecture. In this paper, a combinatorial model is used for the evaluation of the instruction execution rate of a degradable, replicated resources computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system utilizing a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms-Architecture optimization, computation
Abstract:
Several techniques are known for searching an ordered collection of data. The techniques and analyses of retrieval methods based on primary attributes are straightforward. Retrieval using secondary attributes depends on several factors. For secondary attribute retrieval, the linear structures (inverted lists, multilists, doubly linked lists) and the recently proposed nonlinear tree structures, namely the multiple attribute tree (MAT) and the K-d tree (kdT), have their individual merits. It is shown in this paper that, of the two tree structures, MAT possesses several features of a systematic data structure for external file organisation which make it superior to kdT. Analytic estimates for the complexity of node searches in MAT and kdT, for several types of queries, are developed and compared.
Abstract:
This paper presents an overview of the issues in precisely defining, specifying and evaluating the dependability of software, particularly in the context of computer-controlled process systems. Dependability is intended to be a generic term embodying various quality factors and is useful for both software and hardware. While the developments in quality assurance and reliability theories have proceeded mostly in independent directions for hardware and software systems, we present here the case for developing a unified framework of dependability, a facet of the operational effectiveness of modern technological systems, and develop a hierarchical systems model helpful in clarifying this view. In the second half of the paper, we survey the models and methods available for measuring and improving software reliability. The nature of software “bugs”, the failure history of the software system in the various phases of its lifecycle, the reliability growth in the development phase, estimation of the number of errors remaining in the operational phase, and the complexity of the debugging process have all been considered to varying degrees of detail. We also discuss the notion of software fault-tolerance, methods of achieving the same, and the status of other measures of software dependability such as maintainability, availability and safety.
Abstract:
This paper is aimed at reviewing the notion of Byzantine-resilient distributed computing systems, the relevant protocols and their possible applications as reported in the literature. The three agreement problems, namely, the consensus problem, the interactive consistency problem, and the generals problem have been discussed. Various agreement protocols for the Byzantine generals problem have been summarized in terms of their performance and level of fault-tolerance. The three classes of Byzantine agreement protocols discussed are the deterministic, randomized, and approximate agreement protocols. Finally, application of the Byzantine agreement protocols to clock synchronization is highlighted.
Abstract:
This paper proposes new metrics and a performance-assessment framework for vision-based weed and fruit detection and classification algorithms. In order to compare algorithms, and make a decision on which one to use for a particular application, it is necessary to take into account that the performance obtained in a series of tests is subject to uncertainty. Such characterisation of uncertainty seems not to be captured by the performance metrics currently reported in the literature. Therefore, we pose the problem as a general problem of scientific inference, which arises out of incomplete information, and propose as a metric of performance the (posterior) predictive probabilities that the algorithms will provide a correct outcome for target and background detection. We detail the framework through which these predicted probabilities can be obtained, which is Bayesian in nature. As an illustrative example, we apply the framework to the assessment of performance of four algorithms that could potentially be used in the detection of capsicums (peppers).
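One simple way to see what a posterior predictive probability of a correct outcome looks like is the Beta-Bernoulli model sketched below in Python. The abstract does not state the paper's actual likelihood or prior, so the model, the uniform Beta(1, 1) default and the function name are all assumptions made for illustration.

```python
from fractions import Fraction


def posterior_predictive(successes, trials, prior_a=1, prior_b=1):
    """Probability that the next detection is correct under a Beta(prior_a, prior_b) prior.

    With s correct outcomes in n independent trials the posterior is
    Beta(prior_a + s, prior_b + n - s), whose mean is the predictive probability.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return Fraction(prior_a + successes, prior_a + prior_b + trials)


# Two hypothetical algorithms with the same raw accuracy but different test sizes.
print(posterior_predictive(9, 10))
print(posterior_predictive(90, 100))
```

Both hypothetical algorithms score 90% on their tests, yet the one evaluated on fewer images receives a predictive probability pulled further towards the prior. That difference is the kind of uncertainty a raw accuracy figure hides.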
Abstract:
The world is rich with information such as signage and maps to assist humans to navigate. We present a method to extract topological spatial information from a generic bitmap floor plan and build a topometric graph that can be used by a mobile robot for tasks such as path planning and guided exploration. The algorithm first detects and extracts text in an image of the floor plan. Using the locations of the extracted text, flood fill is used to find the rooms and hallways. Doors are found by matching SURF features and these form the connections between rooms, which are the edges of the topological graph. Our system is able to automatically detect doors and differentiate between hallways and rooms, which is important for effective navigation. We show that our method can extract a topometric graph from a floor plan and is robust against ambiguous cases most commonly seen in floor plans including elevators and stairwells.
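The flood-fill step that grows a room outward from the location of its extracted text label can be sketched as a breadth-first search over a grid. In this minimal Python illustration the floor plan is a list of strings with "#" marking walls. The grid encoding, the function name and the seed positions are assumptions and are not taken from the paper.

```python
from collections import deque


def flood_fill(grid, seed, wall="#"):
    """Return the set of free cells 4-connected to seed, or an empty set if seed is a wall."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = seed
    if grid[r0][c0] == wall:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and grid[nr][nc] != wall
                and (nr, nc) not in region
            ):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Two rooms separated by a wall; each seed stands for the position of a room label.
plan = [
    "#######",
    "#..#..#",
    "#..#..#",
    "#######",
]
room_a = flood_fill(plan, (1, 1))
room_b = flood_fill(plan, (1, 4))
print(len(room_a), len(room_b), room_a.isdisjoint(room_b))
```

Each filled region would become one node of the topological graph. Door detection, which the abstract handles with SURF feature matching, would then supply the edges between regions.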
Abstract:
The application of multilevel control strategies for load-frequency control of interconnected power systems is assuming importance. A large multiarea power system may be viewed as an interconnection of several lower-order subsystems, with possible change of interconnection pattern during operation. The solution of the control problem involves the design of a set of local optimal controllers for the individual areas, in a completely decentralised environment, plus a global controller to provide the corrective signal to account for interconnection effects. A global controller, based on the least-square-error principle suggested by Siljak and Sundareshan, has been applied for the LFC problem. A more recent work utilises certain possible beneficial aspects of interconnection to permit more desirable system performances. The paper reports the application of the latter strategy to LFC of a two-area power system. The power-system model studied includes the effects of excitation system and governor controls. A comparison of the two strategies is also made.