786 results for Incremental Clustering
Abstract:
A nonparametric, hierarchical, disaggregative clustering algorithm is developed using a novel similarity measure, called the mutual neighborhood value (MNV), which takes into account the conventional nearest-neighbor ranks of two samples with respect to each other. The algorithm is simple and noniterative, requires low storage, and needs no specification of the expected number of clusters. The algorithm is very versatile: it can discern spherical and nonspherical clusters, linearly nonseparable clusters, clusters with unequal populations, and clusters with low-density bridges. Changing the neighborhood size enables discernment of strong or weak patterns.
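The MNV of a pair is the sum of the two nearest-neighbor ranks each sample has in the other's neighbor list, so mutual nearest neighbors score 2, the minimum. A minimal sketch of the measure follows; the neighborhood size K and the way scored pairs are then linked into a hierarchy are simplified relative to the paper.

```python
import numpy as np

def mutual_neighborhood_values(X, K):
    """MNV(a, b) = rank of b among a's nearest neighbors plus rank
    of a among b's, for pairs lying within each other's
    K-neighborhoods.  Values run from 2 (mutual nearest neighbors,
    strongest similarity) up to 2K."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    ranks = np.empty((n, n), dtype=int)
    for i, order in enumerate(np.argsort(D, axis=1)):
        ranks[i, order] = np.arange(1, n + 1)   # rank 1 = nearest
    return {(i, j): ranks[i, j] + ranks[j, i]
            for i in range(n) for j in range(i + 1, n)
            if ranks[i, j] <= K and ranks[j, i] <= K}
```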
Abstract:
The paper deals with a model-theoretic approach to clustering. The approach can be used to generate cluster descriptions from knowledge alone. Such a process of generating descriptions is extremely useful when clustering partially specified objects. A natural byproduct of the proposed approach is that missing attribute values of an object can be estimated easily and meaningfully. An important feature of the approach is that noisy objects can be detected effectively, leading to the formation of natural groups. The proposed algorithm is applied to a library database consisting of a collection of books.
Abstract:
Relative geometric arrangements of the sample points, with reference to the structure of the embedding space, produce clusters. Hence, if each sample point is imagined to acquire the volume of a small M-cube (called a pattern-cell), whose size depends on the ranges of its M features and the number N of samples, then overlapping pattern-cells indicate naturally close sample points. A chain or blob of such overlapping cells constitutes a cluster, and separate clusters do not share a common pattern-cell. The conditions and an analytic method to find such an overlap are developed. A simple, intuitive, nonparametric clustering procedure based on such overlapping pattern-cells is presented. It may be classified as an agglomerative, hierarchical, linkage-type clustering procedure. The algorithm is fast, requires low storage, and can identify irregular clusters. Two extensions of the algorithm, to separate overlapping clusters and to estimate the nature of pattern distributions in the sample space, are also indicated.
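Two axis-aligned cells overlap exactly when they overlap along every feature axis, so linking overlapping cells and taking connected components yields the chains and blobs described above. A sketch is below; the cell-sizing rule (feature range scaled by N^(1/M)) is an illustrative assumption, not necessarily the paper's.

```python
import numpy as np

def pattern_cell_clusters(X):
    """Link samples whose pattern-cells overlap; connected
    components of the resulting graph are the clusters."""
    N, M = X.shape
    side = (X.max(0) - X.min(0)) / N ** (1.0 / M)  # assumed sizing rule
    parent = list(range(N))

    def find(i):                                   # union-find root
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(N):
        for j in range(i + 1, N):
            # hyper-rectangles centred on the points overlap iff
            # they overlap along every one of the M axes
            if np.all(np.abs(X[i] - X[j]) < side):
                parent[find(i)] = find(j)

    return np.array([find(i) for i in range(N)])   # cluster labels
```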
Abstract:
Clustering is a process of partitioning a given set of patterns into meaningful groups. The clustering process can be viewed as consisting of three phases: (i) a feature selection phase, (ii) a classification phase, and (iii) a description generation phase. Conventional clustering algorithms implicitly use knowledge about the clustering environment largely in the feature selection phase. This reduces the need for environmental knowledge in the remaining two phases, permitting the use of a simple numerical measure of similarity in the classification phase. The conceptual clustering algorithms proposed by Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] and Stepp and Michalski [Artif. Intell., pp. 43–69 (1986)] use knowledge about the clustering environment, in the form of a set of predefined concepts, to compute conceptual cohesiveness during the classification phase. Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] have argued that the results obtained with conceptual clustering algorithms are superior to conventional methods of numerical classification. However, this claim was not supported by the experimental results obtained by Dale [IEEE Trans. PAMI, PAMI-7, 241–244 (1985)]. In this paper a theoretical framework, based on an intuitively appealing set of axioms, is developed to characterize the equivalence between conceptual clustering and conventional clustering. In other words, it is shown that any classification obtained using conceptual clustering can also be obtained using conventional clustering, and vice versa.
Abstract:
Here we rederive the hierarchy of equations for the evolution of distribution functions of various orders using a convenient parameterization. We use this to obtain equations for the two- and three-point correlation functions in powers of a small parameter, viz., the initial density contrast. The correspondence of the lowest-order solutions of these equations to the results from the linear theory of density perturbations is shown for an Ω = 1 universe. These equations are then used to calculate, to lowest order, the induced three-point correlation function that arises from Gaussian initial conditions in an Ω = 1 universe. We obtain an expression which explicitly exhibits the spatial structure of the induced three-point correlation function. It is seen that the spatial structure of this quantity is independent of the value of Ω. We also calculate the triplet momentum. We find that the induced three-point correlation function does not have the "hierarchical" form often assumed. We discuss possibilities of using the induced three-point correlation to interpret observational data. The formalism developed here can also be used to test the validity of different schemes for closing the BBGKY hierarchy.
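For reference, the "hierarchical" form that the induced three-point function is found not to obey is the standard ansatz relating it to products of two-point functions:

```latex
% Hierarchical ansatz for the three-point correlation function,
% with \xi the two-point function and Q a constant:
\zeta(r_1, r_2, r_3)
  = Q\,\bigl[\xi(r_{12})\,\xi(r_{23}) + \xi(r_{23})\,\xi(r_{31})
             + \xi(r_{31})\,\xi(r_{12})\bigr]
```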
Abstract:
In the knowledge-based clustering approaches reported in the literature, explicit knowledge, typically in the form of a set of concepts, is used in computing similarity or conceptual cohesiveness between objects and in grouping them. We propose a knowledge-based clustering approach in which the domain knowledge is also used in the pattern representation phase of clustering. We argue that such a knowledge-based pattern representation scheme reduces the complexity of the similarity computation and grouping phases. We present a knowledge-based clustering algorithm for grouping books in a library.
Abstract:
We use the BBGKY hierarchy equations to calculate, perturbatively, the lowest-order nonlinear correction to the two-point correlation and the pair velocity for Gaussian initial conditions in a critical-density matter-dominated cosmological model. We compare our results with the results obtained using the hydrodynamic equations that neglect pressure and find that the two match, indicating that there are no effects of multistreaming at this order of perturbation. We analytically study the effect of small scales on the large scales by calculating the nonlinear correction for a Dirac delta function initial two-point correlation. We find that the induced two-point correlation behaves as x^(-6) at large separations. We have considered a class of initial conditions where the initial power spectrum at small k has the form k^n with 0 < n <= 3, and have numerically calculated the nonlinear correction to the two-point correlation, its average over a sphere, and the pair velocity over a large dynamical range. We find that at small separations the effect of the nonlinear term is to enhance the clustering, whereas at intermediate scales it can act to either increase or decrease the clustering. At large scales we find a simple formula that gives a very good fit for the nonlinear correction in terms of the initial function. This formula explicitly exhibits the influence of small scales on large scales, and because of this coupling the perturbative treatment breaks down at large scales much before one would expect it to if the nonlinearity were local in real space. We physically interpret this formula in terms of a simple diffusion process. We have also investigated the case n = 0, and we find that it differs from the other cases in certain respects. We investigate a recently proposed scaling property of gravitational clustering, and we find that the lowest-order nonlinear terms cause deviations from the scaling relations that are strictly valid in the linear regime. The approximate validity of these relations in the nonlinear regime in N-body simulations cannot be understood at this order of evolution.
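In equation form, the two quantitative statements above are the class of initial spectra considered and the induced large-separation tail for delta-function initial conditions:

```latex
% Class of initial power spectra considered:
P(k) \propto k^{n}, \qquad 0 < n \le 3 \quad (k \to 0)
% Induced two-point correlation at large separations, for a
% Dirac-delta initial two-point correlation:
\xi^{(2)}(x) \propto x^{-6}
```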
Abstract:
In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations sampled by molecular dynamics simulations performed on different ligand-bound structures of a protein. We further portray each conformational population in terms of dynamically stable network parameters, which capture the ligand-induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand-bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment-induced redistribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic-level characterization of each population in the conformational ensemble, in terms of the re-orchestrated networks of amino acids, is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots that are similar at the side-chain interaction level. Although we have applied these methods to simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at longer time scales, and to problems related to protein folding, for any protein or protein-protein/RNA/DNA complex of interest with a known structure.
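The abstract does not specify which QC formulation was used; a common one (Horn and Gottlieb's) builds a Parzen-window "wavefunction" over the snapshots and locates cluster centers at minima of the associated Schrödinger potential. A minimal sketch under that assumption, with sigma as the only tuning parameter:

```python
import numpy as np

def qc_potential(X, sigma):
    """Quantum-clustering potential evaluated at the data points.
    X: one row per MD snapshot (e.g. side-chain interaction features).
    Minima of V mark cluster centers; snapshots are assigned by
    descending the potential.  Illustrative, not the paper's exact code."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    w = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian weights
    psi = w.sum(axis=1)                                  # Parzen wavefunction
    # V = (1 / (2 sigma^2 psi)) * sum_i ||x - x_i||^2 w_i, up to a constant
    return (d2 * w).sum(axis=1) / (2.0 * sigma ** 2 * psi)
```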
Abstract:
This paper looks at the complexity of four incremental problems: (1) interval partitioning of a flow graph; (2) breadth-first search (BFS) of a directed graph; (3) lexicographic depth-first search (DFS) of a directed graph; and (4) constructing the postorder listing of the nodes of a binary tree. The last problem arises from the need to incrementally recompute the Sethi-Ullman (SU) ordering [1] of the subtrees of a tree after it has undergone changes of a given type. These problems are among those that claimed our attention while we were designing algorithmic techniques for incremental code generation. BFS and DFS certainly have numerous other applications, but as far as our work is concerned, incremental code generation is the common thread linking these problems. The complexity of these problems is studied from two different perspectives. The theory of incremental relative lower bounds (IRLBs) is given in [2]. We use this theory to derive the IRLBs of the first three problems. Then we use the notion of a bounded incremental algorithm [4] to prove the unboundedness of the fourth problem with respect to the locally persistent model of computation. The lower bound result for lexicographic DFS is possibly the most interesting. In [5] the author considers lexicographic DFS to be a problem for which the incremental version may require recomputation of the entire solution from scratch. In that sense, our IRLB result provides further evidence for this possibility, with the proviso that the incremental DFS algorithms considered be ones that do not require too much preprocessing.
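For context, the Sethi-Ullman ordering that problem (4) maintains comes from labeling each subtree with the number of registers needed to evaluate it; the postorder then evaluates costlier subtrees first. A sketch of the static labeling (the incremental maintenance under tree updates is the paper's subject):

```python
class Node:
    """Binary expression-tree node; leaves have no children."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def su_label(node):
    """Sethi-Ullman (Ershov) label: registers needed to evaluate the
    subtree without spilling.  Evaluating the child with the larger
    label first keeps the count at max(l, r), or l + 1 on a tie."""
    if node.left is None:          # leaf
        return 1
    l, r = su_label(node.left), su_label(node.right)
    return max(l, r) if l != r else l + 1
```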
Abstract:
A method is described for estimating the incremental angle and angular velocity of a spacecraft using integrated rate parameters with the help of a star sensor alone. The chief advantage of this method is that the measured stars need not be identified, whereas identification of the stars is necessary in earlier methods. The proposed estimation can be carried out with all of the available measurements by a simple linear Kalman filter, albeit with a time-varying sensitivity matrix. The residuals of the angular velocity estimated by the proposed spacecraft incremental-angle and angular velocity estimation method are as accurate as those of earlier methods. This method also enables the spacecraft attitude to be reconstructed for mapping the stars onto an imaginary unit sphere in the body reference frame, which preserves the true angular separation of the stars. This paves the way for identification of the stars using any angular-separation or triangle-matching technique, applied even to a narrow field-of-view sensor that is made to sweep the sky. A numerical simulation for inertial as well as Earth-pointing spacecraft is carried out to establish the results.
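The filter itself is the standard linear Kalman recursion; the twist noted above is only that the sensitivity (measurement) matrix changes at every step. A generic sketch, with the state model left abstract since the paper's integrated-rate-parameter formulation is not reproduced here:

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter.
    H is the time-varying sensitivity matrix for the current
    star-sensor measurement z; F, Q, R are the usual state
    transition, process-noise and measurement-noise matrices."""
    x = F @ x                        # predict state
    P = F @ P @ F.T + Q              # predict covariance
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (z - H @ x)          # update with the residual
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```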
Abstract:
Delineation of homogeneous precipitation regions (regionalization) is necessary for investigating the frequency and spatial distribution of meteorological droughts. Conventional methods of regionalization use statistics of precipitation as attributes to establish homogeneous regions. Therefore, they cannot be used to form regions in ungauged areas, and they may not yield meaningful regions in areas with sparse rain-gauge density. Further, validation of the regions for homogeneity in precipitation is not possible, since using precipitation statistics both to form the regions and subsequently to test regional homogeneity is not appropriate. To alleviate this problem, an approach based on fuzzy cluster analysis is presented. It allows delineation of homogeneous precipitation regions in data-sparse areas using, as attributes, large-scale atmospheric variables (LSAV) that influence precipitation in the study area. The LSAV, location parameters (latitude, longitude, and altitude), and seasonality of precipitation are suggested as features for regionalization. The approach allows independent validation of the identified regions for homogeneity using statistics computed from observed precipitation. Further, it can form regions even in ungauged areas, owing to the use of attributes that can be reliably estimated even when no at-site precipitation data are available. The approach was applied to delineate homogeneous annual rainfall regions in India, and its effectiveness is illustrated by comparing the results with those obtained using rainfall statistics, regionalization based on hard cluster analysis, and the meteorological subdivisions of India.
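Fuzzy cluster analysis of the kind used here typically means fuzzy c-means: each site receives a graded membership in every region rather than a hard label. A minimal sketch, where X holds one feature vector (LSAV, location, seasonality) per site; the paper's exact algorithm, fuzzifier, and validation steps are not reproduced:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Alternate between updating region centroids and fuzzy
    memberships U (c regions x n sites, columns sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)
    for _ in range(iters):
        W = U ** m                                    # fuzzified weights
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None] - centers[:, None], axis=-1) + 1e-12
        U = d ** (-2.0 / (m - 1))                     # standard FCM update
        U /= U.sum(axis=0)
    return centers, U
```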
Abstract:
We present two online algorithms for maintaining a topological order of a directed acyclic graph as arcs are added, and detecting a cycle when one is created. Our first algorithm takes O(m^(1/2)) amortized time per arc and our second algorithm takes O(n^(2.5)/m) amortized time per arc, where n is the number of vertices and m is the total number of arcs. For sparse graphs, our O(m^(1/2)) bound improves the best previous bound by a factor of log n and is tight to within a constant factor for a natural class of algorithms that includes all the existing ones. Our main insight is that the two-way search method of previous algorithms does not require an ordered search, but can be more general, allowing us to avoid the use of heaps (priority queues). Instead, the deterministic version of our algorithm uses (approximate) median-finding; the randomized version of our algorithm uses uniform random sampling. For dense graphs, our O(n^(2.5)/m) bound improves the best previously published bound by a factor of n^(1/4) and a recent bound obtained independently of our work by a factor of log n. Our main insight is that graph search is wasteful when the graph is dense and can be avoided by searching the topological order space instead. Our algorithms extend to the maintenance of strong components, in the same asymptotic time bounds.
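To make the problem concrete, here is the simple one-way limited-search scheme (in the spirit of Marchetti-Spaccamela et al., not the faster two-way algorithms above): when a new arc u -> v violates the current order, only the window of positions between v and u is searched and repaired.

```python
def add_arc(adj, order, pos, u, v):
    """Insert arc u -> v into a DAG, keeping `order` (a list of
    vertices) topological; pos[x] is x's index in `order`.
    Raises if the new arc creates a cycle."""
    adj[u].append(v)
    if pos[u] < pos[v]:
        return                                  # order still valid
    lo, hi = pos[v], pos[u]
    reach, stack = {v}, [v]                     # forward search from v,
    while stack:                                # limited to the window
        for y in adj[stack.pop()]:
            if y == u:
                raise ValueError("cycle created")
            if y not in reach and pos[y] <= hi:
                reach.add(y)
                stack.append(y)
    window = order[lo:hi + 1]
    # unaffected vertices keep their relative order; the set reachable
    # from v is shifted, in order, to just after u
    window = [x for x in window if x not in reach] + \
             [x for x in window if x in reach]
    order[lo:hi + 1] = window
    for i in range(lo, hi + 1):
        pos[window[i - lo]] = i
```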
Abstract:
Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 K samples per second with a precision of 16 bits per sample. A few minutes of acquired data run into a few hundred megabytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering, and so on. In order to interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor, such as a desktop computer, may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing system. With 8 processors, the system is able to compute and map incoming signals, segmented over a period of 200 ms, into an action in a trained cluster system in real time.
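The numbers above fix the workload: 64 channels at 20 K samples/s means each 200 ms segment is a 64 x 4000 block. A toy sketch of farming such blocks out to 8 workers (standing in for the 8 DSP nodes; the MED64-specific processing steps are not reproduced):

```python
import numpy as np
from multiprocessing import Pool

SEG = int(0.2 * 20_000)                 # 200 ms at 20 K samples/s = 4000

def process_segment(block):
    """Placeholder per-segment pipeline: real noise removal, pattern
    matching and clustering would go here."""
    block = block - block.mean(axis=1, keepdims=True)   # baseline removal
    return np.abs(block).max(axis=1)                    # toy per-channel feature

if __name__ == "__main__":
    data = np.random.randn(64, 8 * SEG)         # stand-in for MED64 capture
    blocks = np.split(data, 8, axis=1)          # one 200 ms block per worker
    with Pool(8) as pool:                       # 8 workers ~ 8 DSP nodes
        features = pool.map(process_segment, blocks)
```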