795 resultados para hierarchical clustering
Resumo:
Modern non-invasive brain imaging technologies, such as diffusion weighted magnetic resonance imaging (DWI), enable the mapping of neural fiber tracts in the white matter, providing a basis to reconstruct a detailed map of brain structural connectivity networks. Brain connectivity networks differ from random networks in their topology, which can be measured using small worldness, modularity, and high-degree nodes (hubs). Still, little is known about how individual differences in structural brain network properties relate to age, sex, or genetic differences. Recently, some groups have reported brain network biomarkers that enable differentiation among individuals, pairs of individuals, and groups of individuals. In addition to studying new topological features, here we provide a unifying general method to investigate topological brain networks and connectivity differences between individuals, pairs of individuals, and groups of individuals at several levels of the data hierarchy, while appropriately controlling false discovery rate (FDR) errors. We apply our new method to a large dataset of high quality brain connectivity networks obtained from High Angular Resolution Diffusion Imaging (HARDI) tractography in 303 young adult twins, siblings, and unrelated people. Our proposed approach can accurately classify brain connectivity networks based on sex (93% accuracy) and kinship (88.5% accuracy). We find statistically significant differences associated with sex and kinship both in the brain connectivity networks and in derived topological metrics, such as the clustering coefficient and the communicability matrix.
Resumo:
A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. Here, the data set is divided randomly into a number of partitions. The samples of each such partition are clustered separately using hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to get the final classification. This algorithm leads to the same classification as that of hierarchical agglomerative clustering algorithm when the clusters are well separated. The advantages of this algorithm are short run time and small storage requirement. It is observed that the savings, in storage space and computation time, increase nonlinearly with the sample size.
Resumo:
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations including: lighting and background changes, occlusions, changes in expression and make up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus; and incorporate this into a novel two stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues; after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
Resumo:
A method for determining the mutual nearest neighbours (MNN) and mutual neighbourhood value (mnv) of a sample point, using the conventional nearest neighbours, is suggested. A nonparametric, hierarchical, agglomerative clustering algorithm is developed using the above concepts. The algorithm is simple, deterministic, noniterative, requires low storage and is able to discern spherical and nonspherical clusters. The method is applicable to a wide class of data of arbitrary shape, large size and high dimensionality. The algorithm can discern mutually homogenous clusters. Strong or weak patterns can be discerned by properly choosing the neighbourhood width.
Resumo:
A nonparametric, hierarchical, disaggregative clustering algorithm is developed using a novel similarity measure, called the mutual neighborhood value (MNV), which takes into account the conventional nearest neighbor ranks of two samples with respect to each other. The algorithm is simple, noniterative, requires low storage, and needs no specification of the expected number of clusters. The algorithm appears very versatile as it is capable of discerning spherical and nonspherical clusters, linearly nonseparable clusters, clusters with unequal populations, and clusters with lowdensity bridges. Changing of the neighborhood size enables discernment of strong or weak patterns.
Resumo:
Relative geometric arrangements of the sample points, with reference to the structure of the imbedding space, produce clusters. Hence, if each sample point is imagined to acquire a volume of a small M-cube (called pattern-cell), depending on the ranges of its (M) features and number (N) of samples; then overlapping pattern-cells would indicate naturally closer sample-points. A chain or blob of such overlapping cells would mean a cluster and separate clusters would not share a common pattern-cell between them. The conditions and an analytic method to find such an overlap are developed. A simple, intuitive, nonparametric clustering procedure, based on such overlapping pattern-cells is presented. It may be classified as an agglomerative, hierarchical, linkage-type clustering procedure. The algorithm is fast, requires low storage and can identify irregular clusters. Two extensions of the algorithm, to separate overlapping clusters and to estimate the nature of pattern distributions in the sample space, are also indicated.
Resumo:
Here we rederive the hierarchy of equations for the evolution of distribution functions of various orders using a convenient parameterization. We use this to obtain equations for two- and three-point correlation functions in powers of a small parameter, viz., the initial density contrast. The correspondence of the lowest order solutions of these equations to the results from the linear theory of density perturbations is shown for an OMEGA = 1 universe. These equations are then used to calculate, to the lowest order, the induced three-point correlation function that arises from Gaussian initial conditions in an OMEGA = 1 universe. We obtain an expression which explicitly exhibits the spatial structure of the induced three-point correlation function. It is seen that the spatial structure of this quantity is independent of the value of OMEGA. We also calculate the triplet momentum. We find that the induced three-point correlation function does not have the ''hierarchical'' form often assumed. We discuss possibilities of using the induced three-point correlation to interpret observational data. The formalism developed here can also be used to test a validity of different schemes to close the
Resumo:
In this paper, we develop a game theoretic approach for clustering features in a learning problem. Feature clustering can serve as an important preprocessing step in many problems such as feature selection, dimensionality reduction, etc. In this approach, we view features as rational players of a coalitional game where they form coalitions (or clusters) among themselves in order to maximize their individual payoffs. We show how Nash Stable Partition (NSP), a well known concept in the coalitional game theory, provides a natural way of clustering features. Through this approach, one can obtain some desirable properties of the clusters by choosing appropriate payoff functions. For a small number of features, the NSP based clustering can be found by solving an integer linear program (ILP). However, for large number of features, the ILP based approach does not scale well and hence we propose a hierarchical approach. Interestingly, a key result that we prove on the equivalence between a k-size NSP of a coalitional game and minimum k-cut of an appropriately constructed graph comes in handy for large scale problems. In this paper, we use feature selection problem (in a classification setting) as a running example to illustrate our approach. We conduct experiments to illustrate the efficacy of our approach.
Resumo:
Structural information over the entire course of binding interactions based on the analyses of energy landscapes is described, which provides a framework to understand the events involved during biomolecular recognition. Conformational dynamics of malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising of numerous dihedral torsion angles, are described as an example. For this purpose, a novel approach based on hierarchical sampling for acquiring metastable molecular conformations constituting low-energy minima for understanding the structural features involved in a biologic recognition is proposed. For this purpose, four variants of principal component analysis were employed recursively in both Cartesian space and dihedral angles space that are characterized by free energy landscapes to select the most stable conformational substates. Subsequently, k-means clustering algorithm was implemented for geometric separation of the major native state to acquire a final ensemble of metastable conformers. A comparison of malectin complexes was then performed to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, carbohydrate-aromatic interactions including CH-pi and stacking interactions involved in this recognition. Additionally, a striking structural transition from loop to beta-strands in malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of the above-mentioned binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.
Resumo:
T-cell responses in humans are initiated by the binding of a peptide antigen to a human leukocyte antigen (HLA) molecule. The peptide-HLA complex then recruits an appropriate T cell, leading to cell-mediated immunity. More than 2000 HLA class-I alleles are known in humans, and they vary only in their peptide-binding grooves. The polymorphism they exhibit enables them to bind a wide range of peptide antigens from diverse sources. HLA molecules and peptides present a complex molecular recognition pattern, as many peptides bind to a given allele and a given peptide can be recognized by many alleles. A powerful grouping scheme that not only provides an insightful classification, but is also capable of dissecting the physicochemical basis of recognition specificity is necessary to address this complexity. We present a hierarchical classification of 2010 class-I alleles by using a systematic divisive clustering method. All-pair distances of alleles were obtained by comparing binding pockets in the structural models. By varying the similarity thresholds, a multilevel classification was obtained, with 7 supergroups, each further subclassifying to yield 72 groups. An independent clustering performed based only on similarities in their epitope pools correlated highly with pocket-based clustering. Physicochemical feature combinations that best explain the basis of clustering are identified. Mutual information calculated for the set of peptide ligands enables identification of binding site residues contributing to peptide specificity. The grouping of HLA molecules achieved here will be useful for rational vaccine design, understanding disease susceptibilities and predicting risk of organ transplants.
Resumo:
We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained by using a state-of-the-art hierarchical segmentation algorithm, and connected from adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences. © 2011 IEEE.
Resumo:
A system is described that tracks moving objects in a video dataset so as to extract a representation of the objects' 3D trajectories. The system then finds hierarchical clusters of similar trajectories in the video dataset. Objects' motion trajectories are extracted via an EKF formulation that provides each object's 3D trajectory up to a constant factor. To increase accuracy when occlusions occur, multiple tracking hypotheses are followed. For trajectory-based clustering and retrieval, a modified version of edit distance, called longest common subsequence (LCSS) is employed. Similarities are computed between projections of trajectories on coordinate axes. Trajectories are grouped based, using an agglomerative clustering algorithm. To check the validity of the approach, experiments using real data were performed.
Resumo:
The need for the ability to cluster unknown data to better understand its relationship to know data is prevalent throughout science. Besides a better understanding of the data itself or learning about a new unknown object, cluster analysis can help with processing data, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition based, hierarchical, density-based, grid based, and model based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric, this metric needs no prior knowledge about the object. Their similarity metric is based on the Kolmogorov Complexity of objects, the objects minimal description. While the Kolmogorov Complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
Resumo:
Efficient searching is crucial for timely location of food and other resources. Recent studies show diverse living animals employ a theoretically optimal scale-free random search for sparse resources known as a Lévy walk, but little is known of the origins and evolution of foraging behaviour and the search strategies of extinct organisms. Here we show using simulations of self-avoiding trace fossil trails that randomly introduced strophotaxis (U-turns) – initiated by obstructions such as ¬¬¬self-trail avoidance or innate cueing – leads to random looping patterns with clustering across increasing scales that is consistent with the presence of Lévy walks. This predicts optimal Lévy searches can emerge from simple behaviours observed in fossil trails. We then analysed fossilized trails of benthic marine organisms using a novel path analysis technique and find the first evidence of Lévy-like search strategies in extinct animals. Our results show that simple search behaviours of extinct animals in heterogeneous environments give rise to hierarchically nested Brownian walk clusters that converge to optimal Lévy patterns. Primary productivity collapse and large-scale food scarcity characterising mass extinctions evident in the fossil record may have triggered adaptation of optimal Lévy-like searches. The findings suggest Lévy-like behaviour has been employed by foragers since at least the Eocene but may have a more ancient origin, which could explain recent widespread observations of such patterns among modern taxa.
Resumo:
In this paper, a hierarchical video structure summarization approach using Laplacian Eigenmap is proposed, where a small set of reference frames is selected from the video sequence to form a reference subspace to measure the dissimilarity between two arbitrary frames. In the proposed summarization scheme, the shot-level key frames are first detected from the continuity of inter-frame dissimilarity, and the sub-shot level and scene level representative frames are then summarized by using K-mean clustering. The experiment is carried on both test videos and movies, and the results show that in comparison with a similar approach using latent semantic analysis, the proposed approach using Laplacian Eigenmap can achieve a better recall rate in keyframe detection, and gives an efficient hierarchical summarization at sub shot, shot and scene levels subsequently.