814 resultados para Hierarchical clustering model
Resumo:
Structural information over the entire course of binding interactions based on the analyses of energy landscapes is described, which provides a framework to understand the events involved during biomolecular recognition. Conformational dynamics of malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising of numerous dihedral torsion angles, are described as an example. For this purpose, a novel approach based on hierarchical sampling for acquiring metastable molecular conformations constituting low-energy minima for understanding the structural features involved in a biologic recognition is proposed. For this purpose, four variants of principal component analysis were employed recursively in both Cartesian space and dihedral angles space that are characterized by free energy landscapes to select the most stable conformational substates. Subsequently, k-means clustering algorithm was implemented for geometric separation of the major native state to acquire a final ensemble of metastable conformers. A comparison of malectin complexes was then performed to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, carbohydrate-aromatic interactions including CH-pi and stacking interactions involved in this recognition. Additionally, a striking structural transition from loop to beta-strands in malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of the above-mentioned binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.
Resumo:
We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained by using a state-of-the-art hierarchical segmentation algorithm, and connected from adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences. © 2011 IEEE.
Resumo:
Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest possible models subdivide nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignment. Currently available models can be classified by whether clusters are disjoint or are allowed to overlap. These models can explain a "flat" clustering structure. Hierarchical Bayesian models provide a natural approach to capture more complex dependencies. We propose a model in which objects are characterised by a latent feature vector. Each feature is itself partitioned into disjoint groups (subclusters), corresponding to a second layer of hierarchy. In experimental comparisons, the model achieves significantly improved predictive performance on social and biological link prediction tasks. The results indicate that models with a single layer hierarchy over-simplify real networks.
Resumo:
Struyf, J., Dzeroski, S. Blockeel, H. and Clare, A. (2005) Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics. In proceedings of the EPIA 2005 CMB Workshop
Resumo:
The need for the ability to cluster unknown data to better understand its relationship to know data is prevalent throughout science. Besides a better understanding of the data itself or learning about a new unknown object, cluster analysis can help with processing data, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition based, hierarchical, density-based, grid based, and model based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric, this metric needs no prior knowledge about the object. Their similarity metric is based on the Kolmogorov Complexity of objects, the objects minimal description. While the Kolmogorov Complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
Resumo:
In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.