961 resultados para graph matching algorithms
Resumo:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
Resumo:
The metabolism of an organism consists of a network of biochemical reactions that transform small molecules, or metabolites, into others in order to produce energy and building blocks for essential macromolecules. The goal of metabolic flux analysis is to uncover the rates, or the fluxes, of those biochemical reactions. In a steady state, the sum of the fluxes that produce an internal metabolite is equal to the sum of the fluxes that consume the same molecule. Thus the steady state imposes linear balance constraints to the fluxes. In general, the balance constraints imposed by the steady state are not sufficient to uncover all the fluxes of a metabolic network. The fluxes through cycles and alternative pathways between the same source and target metabolites remain unknown. More information about the fluxes can be obtained from isotopic labelling experiments, where a cell population is fed with labelled nutrients, such as glucose that contains 13C atoms. Labels are then transferred by biochemical reactions to other metabolites. The relative abundances of different labelling patterns in internal metabolites depend on the fluxes of pathways producing them. Thus, the relative abundances of different labelling patterns contain information about the fluxes that cannot be uncovered from the balance constraints derived from the steady state. The field of research that estimates the fluxes utilizing the measured constraints to the relative abundances of different labelling patterns induced by 13C labelled nutrients is called 13C metabolic flux analysis. There exist two approaches of 13C metabolic flux analysis. In the optimization approach, a non-linear optimization task, where candidate fluxes are iteratively generated until they fit to the measured abundances of different labelling patterns, is constructed. In the direct approach, linear balance constraints given by the steady state are augmented with linear constraints derived from the abundances of different labelling patterns of metabolites. Thus, mathematically involved non-linear optimization methods that can get stuck to the local optima can be avoided. On the other hand, the direct approach may require more measurement data than the optimization approach to obtain the same flux information. Furthermore, the optimization framework can easily be applied regardless of the labelling measurement technology and with all network topologies. In this thesis we present a formal computational framework for direct 13C metabolic flux analysis. The aim of our study is to construct as many linear constraints to the fluxes from the 13C labelling measurements using only computational methods that avoid non-linear techniques and are independent from the type of measurement data, the labelling of external nutrients and the topology of the metabolic network. The presented framework is the first representative of the direct approach for 13C metabolic flux analysis that is free from restricting assumptions made about these parameters.In our framework, measurement data is first propagated from the measured metabolites to other metabolites. The propagation is facilitated by the flow analysis of metabolite fragments in the network. Then new linear constraints to the fluxes are derived from the propagated data by applying the techniques of linear algebra.Based on the results of the fragment flow analysis, we also present an experiment planning method that selects sets of metabolites whose relative abundances of different labelling patterns are most useful for 13C metabolic flux analysis. Furthermore, we give computational tools to process raw 13C labelling data produced by tandem mass spectrometry to a form suitable for 13C metabolic flux analysis.
Resumo:
The Printed Circuit Board (PCB) layout design is one of the most important and time consuming phases during equipment design process in all electronic industries. This paper is concerned with the development and implementation of a computer aided PCB design package. A set of programs which operate on a description of the circuit supplied by the user in the form of a data file and subsequently design the layout of a double-sided PCB has been developed. The algorithms used for the design of the PCB optimise the board area and the length of copper tracks used for the interconnections. The output of the package is the layout drawing of the PCB, drawn on a CALCOMP hard copy plotter and a Tektronix 4012 storage graphics display terminal. The routing density (the board area required for one component) achieved by this package is typically 0.8 sq. inch per IC. The package is implemented on a DEC 1090 system in Pascal and FORTRAN and SIGN(1) graphics package is used for display generation.
Resumo:
Web data can often be represented in free tree form; however, free tree mining methods seldom exist. In this paper, a computationally fast algorithm FreeS is presented to discover all frequently occurring free subtrees in a database of labelled free trees. FreeS is designed using an optimal canonical form, BOCF that can uniquely represent free trees even during the presence of isomorphism. To avoid enumeration of false positive candidates, it utilises the enumeration approach based on a tree-structure guided scheme. This paper presents lemmas that introduce conditions to conform the generation of free tree candidates during enumeration. Empirical study using both real and synthetic datasets shows that FreeS is scalable and significantly outperforms (i.e. few orders of magnitude faster than) the state-of-the-art frequent free tree mining algorithms, HybridTreeMiner and FreeTreeMiner.
Resumo:
The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. The so-called query-by-humming systems are a famous instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding to faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then give a literature review on traditional (e.g., string-matching-based) MIR algorithms and novel techniques based on geometry. We also introduce some concepts from digital image processing, namely the mathematical morphology, which we will use to develop and implement four algorithms for geometric music retrieval. The symbolic representation in the case of our algorithms is a binary 2-D image. We use various morphological pre- and post-processing operations on the query and the database images to perform template matching / pattern recognition for the images. The algorithms are basically extensions to classic image correlation and hit-or-miss transformation techniques used widely in template matching applications. They aim to be a future extension to the retrieval engine of C-BRAHMS, which is a research project of the Department of Computer Science at University of Helsinki.
Resumo:
An important question which has to be answered in evaluting the suitability of a microcomputer for a control application is the time it would take to execute the specified control algorithm. In this paper, we present a method of obtaining closed-form formulas to estimate this time. These formulas are applicable to control algorithms in which arithmetic operations and matrix manipulations dominate. The method does not require writing detailed programs for implementing the control algorithm. Using this method, the execution times of a variety of control algorithms on a range of 16-bit mini- and recently announced microcomputers are calculated. The formulas have been verified independently by an analysis program, which computes the execution time bounds of control algorithms coded in Pascal when they are run on a specified micro- or minicomputer.
Resumo:
A geodesic-based approach using Lamb waves is proposed to locate the acoustic emission (AE) source and damage in an isotropic metallic structure. In the case of the AE (passive) technique, the elastic waves take the shortest path from the source to the sensor array distributed in the structure. The geodesics are computed on the meshed surface of the structure using graph theory based on Dijkstra's algorithm. By propagating the waves in reverse virtually from these sensors along the geodesic path and by locating the first intersection point of these waves, one can get the AE source location. The same approach is extended for detection of damage in a structure. The wave response matrix of the given sensor configuration for the healthy and the damaged structure is obtained experimentally. The healthy and damage response matrix is compared and their difference gives the information about the reflection of waves from the damage. These waves are backpropagated from the sensors and the above method is used to locate the damage by finding the point where intersection of geodesics occurs. In this work, the geodesic approach is shown to be suitable to obtain a practicable source location solution in a more general set-up on any arbitrary surface containing finite discontinuities. Experiments were conducted on aluminum specimens of simple and complex geometry to validate this new method.
Resumo:
In [8], we recently presented two computationally efficient algorithms named B-RED and P-RED for random early detection. In this letter, we present the mathematical proof of convergence of these algorithms under general conditions to local minima.
Resumo:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.
Resumo:
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and functi approximation ideas,and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Resumo:
State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution. Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image-sets, we exploit Csiszar´ f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive definite kernels on the statistical manifold, which let us make use of more powerful classification schemes to match image-sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space where f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over the state-of-the-art image-set matching methods.
Resumo:
Diffuse large B-cell lymphoma (DLBCL) is the most common of the non-Hodgkin lymphomas. As DLBCL is characterized by heterogeneous clinical and biological features, its prognosis varies. To date, the International Prognostic Index has been the strongest predictor of outcome for DLBCL patients. However, no biological characters of the disease are taken into account. Gene expression profiling studies have identified two major cell-of-origin phenotypes in DLBCL with different prognoses, the favourable germinal centre B-cell-like (GCB) and the unfavourable activated B-cell-like (ABC) phenotypes. However, results of the prognostic impact of the immunohistochemically defined GCB and non-GCB distinction are controversial. Furthermore, since the addition of the CD20 antibody rituximab to chemotherapy has been established as the standard treatment of DLBCL, all molecular markers need to be evaluated in the post-rituximab era. In this study, we aimed to evaluate the predictive value of immunohistochemically defined cell-of-origin classification in DLBCL patients. The GCB and non-GCB phenotypes were defined according to the Hans algorithm (CD10, BCL6 and MUM1/IRF4) among 90 immunochemotherapy- and 104 chemotherapy-treated DLBCL patients. In the chemotherapy group, we observed a significant difference in survival between GCB and non-GCB patients, with a good and a poor prognosis, respectively. However, in the rituximab group, no prognostic value of the GCB phenotype was observed. Likewise, among 29 high-risk de novo DLBCL patients receiving high-dose chemotherapy and autologous stem cell transplantation, the survival of non-GCB patients was improved, but no difference in outcome was seen between GCB and non-GCB subgroups. Since the results suggested that the Hans algorithm was not applicable in immunochemotherapy-treated DLBCL patients, we aimed to further focus on algorithms based on ABC markers. We examined the modified activated B-cell-like algorithm based (MUM1/IRF4 and FOXP1), as well as a previously reported Muris algorithm (BCL2, CD10 and MUM1/IRF4) among 88 DLBCL patients uniformly treated with immunochemotherapy. Both algorithms distinguished the unfavourable ABC-like subgroup with a significantly inferior failure-free survival relative to the GCB-like DLBCL patients. Similarly, the results of the individual predictive molecular markers transcription factor FOXP1 and anti-apoptotic protein BCL2 have been inconsistent and should be assessed in immunochemotherapy-treated DLBCL patients. The markers were evaluated in a cohort of 117 patients treated with rituximab and chemotherapy. FOXP1 expression could not distinguish between patients, with favourable and those with poor outcomes. In contrast, BCL2-negative DLBCL patients had significantly superior survival relative to BCL2-positive patients. Our results indicate that the immunohistochemically defined cell-of-origin classification in DLBCL has a prognostic impact in the immunochemotherapy era, when the identifying algorithms are based on ABC-associated markers. We also propose that BCL2 negativity is predictive of a favourable outcome. Further investigational efforts are, however, warranted to identify the molecular features of DLBCL that could enable individualized cancer therapy in routine patient care.
Resumo:
This article analyzes the effect of devising a new failure envelope by the combination of the most commonly used failure criteria for the composite laminates, on the design of composite structures. The failure criteria considered for the study are maximum stress and Tsai-Wu criteria. In addition to these popular phenomenological-based failure criteria, a micromechanics-based failure criterion called failure mechanism-based failure criterion is also considered. The failure envelopes obtained by these failure criteria are superimposed over one another and a new failure envelope is constructed based on the lowest absolute values of the strengths predicted by these failure criteria. Thus, the new failure envelope so obtained is named as most conservative failure envelope. A minimum weight design of composite laminates is performed using genetic algorithms. In addition to this, the effect of stacking sequence on the minimum weight of the laminate is also studied. Results are compared for the different failure envelopes and the conservative design is evaluated, with respect to the designs obtained by using only one failure criteria. The design approach is recommended for structures where composites are the key load-carrying members such as helicopter rotor blades.