969 resultados para pattern clustering
Resumo:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
Resumo:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Resumo:
We view association of concepts as a complex network and present a heuristic for clustering concepts by taking into account the underlying network structure of their associations. Clusters generated from our approach are qualitatively better than clusters generated from the conventional spectral clustering mechanism used for graph partitioning.
Resumo:
This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and the Engle and Russell (1998) model provide the framework for the analysis. The descriptive and the parametric analysis provide evidences that an important determinant of the famous U-shape pattern in the market is the rate of information arrivals as measured by large trading volumes and durations at the market open and close. Precisely, 1) the larger the trading volume, the greater the impact on prices both in the short and the long run, thus prices will differ across quantities. 2) Large trading volume is a non-linear function of price changes in the long run. 3) Arrival times are positively autocorrelated, indicating a clustering pattern and 4) Information arrivals as approximated by durations are negatively related to trading flow.
Resumo:
This paper discusses a method for scaling SVM with Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel induced feature space. These are then used for selective sampling of the training data for SVM to impart scalability to the training process. Empirical studies made on real world data sets show that the proposed strategy performs well on large data sets.
Resumo:
A method for determining the mutual nearest neighbours (MNN) and mutual neighbourhood value (mnv) of a sample point, using the conventional nearest neighbours, is suggested. A nonparametric, hierarchical, agglomerative clustering algorithm is developed using the above concepts. The algorithm is simple, deterministic, noniterative, requires low storage and is able to discern spherical and nonspherical clusters. The method is applicable to a wide class of data of arbitrary shape, large size and high dimensionality. The algorithm can discern mutually homogenous clusters. Strong or weak patterns can be discerned by properly choosing the neighbourhood width.
Resumo:
Clustering is a process of partitioning a given set of patterns into meaningful groups. The clustering process can be viewed as consisting of the following three phases: (i) feature selection phase, (ii) classification phase, and (iii) description generation phase. Conventional clustering algorithms implicitly use knowledge about the clustering environment to a large extent in the feature selection phase. This reduces the need for the environmental knowledge in the remaining two phases, permitting the usage of simple numerical measure of similarity in the classification phase. Conceptual clustering algorithms proposed by Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] and Stepp and Michalski [Artif. Intell., pp. 43–69 (1986)] make use of the knowledge about the clustering environment in the form of a set of predefined concepts to compute the conceptual cohesiveness during the classification phase. Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] have argued that the results obtained with the conceptual clustering algorithms are superior to conventional methods of numerical classification. However, this claim was not supported by the experimental results obtained by Dale [IEEE Trans. PAMI, PAMI-7, 241–244 (1985)]. In this paper a theoretical framework, based on an intuitively appealing set of axioms, is developed to characterize the equivalence between the conceptual clustering and conventional clustering. In other words, it is shown that any classification obtained using conceptual clustering can also be obtained using conventional clustering and vice versa.
Resumo:
In the knowledge-based clustering approaches reported in the literature, explicit know ledge, typically in the form of a set of concepts, is used in computing similarity or conceptual cohesiveness between objects and in grouping them. We propose a knowledge-based clustering approach in which the domain knowledge is also used in the pattern representation phase of clustering. We argue that such a knowledge-based pattern representation scheme reduces the complexity of similarity computation and grouping phases. We present a knowledge-based clustering algorithm for grouping hooks in a library.
Resumo:
Over past few years, the studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. The hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system MED64 (Alpha MED Sciences, Japan) at a sampling rate of 20 K samples with a precision of 16-bits per sample. A few minutes of acquired data runs in to a few hundreds of Mega Bytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering and so on. In order to interface a neuronal colony to a physical world, these computations need to be performed in real-time. A single processor such as a desk top computer may not be adequate to meet this computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node Digital Signal processing system. With 8 processors, the system is able to compute and map incoming signals segmented over a period of 200 ms in to an action in a trained cluster system in real time.
Resumo:
Applications in various domains often lead to very large and frequently high-dimensional data. Successful algorithms must avoid the curse of dimensionality but at the same time should be computationally efficient. Finding useful patterns in large datasets has attracted considerable interest recently. The primary goal of the paper is to implement an efficient Hybrid Tree based clustering method based on CF-Tree and KD-Tree, and combine the clustering methods with KNN-Classification. The implementation of the algorithm involves many issues like good accuracy, less space and less time. We will evaluate the time and space efficiency, data input order sensitivity, and clustering quality through several experiments.
Resumo:
Clustering techniques which can handle incomplete data have become increasingly important due to varied applications in marketing research, medical diagnosis and survey data analysis. Existing techniques cope up with missing values either by using data modification/imputation or by partial distance computation, often unreliable depending on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pair-wise similarity matrix, computed from the given incomplete data. To accomplish this, we define a novel similarity measure based on Average Overlap similarity metric which can effectively handle missing values without modification of data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. The experimental evaluation on real world datasets demonstrates that the proposed approach is efficient, scalable and shows significantly better performance compared to the existing techniques.
Resumo:
Motivated by multi-distribution divergences, which originate in information theory, we propose a notion of `multipoint' kernels, and study their applications. We study a class of kernels based on Jensen type divergences and show that these can be extended to measure similarity among multiple points. We study tensor flattening methods and develop a multi-point (kernel) spectral clustering (MSC) method. We further emphasize on a special case of the proposed kernels, which is a multi-point extension of the linear (dot-product) kernel and show the existence of cubic time tensor flattening algorithm in this case. Finally, we illustrate the usefulness of our contributions using standard data sets and image segmentation tasks.
Resumo:
Resumen: Este artículo analiza la relación entre la agrupación espacial de la distribución del ingreso y la desigualdad en las provincias de Argentina. El objetivo de este trabajo es usar técnicas espaciales para analizar hasta que punto la agrupación espacial de la distribución del ingreso afecta la desigualdad de la distribución del ingreso en un contexto regional de Argentina. En general, la literatura de desigualdad implícitamente considera a cada región o provincia como una entidad independiente y el potencial para la observación de la interacción a través del espacio a menudo se ha ignorado. Mientras tanto, la autocorrelación espacial ocurre cuando la distribución espacial de la variable de interés exhibe un patrón sistemático. Yo computo tres medidas de autocorrelación espacial global: La I de Moran, c de Geary, y G de Getis y Ord, como grado de CLUSTERING provincial entre 1991 y 2002. La principal conclusión del trabajo es que hay evidencia que provincias con desigualdad relativamente alta (baja) tienden a ser localizadas cerca de otras provincias con alta (baja) desigualdad más a menudo de lo esperado debido al azar. Por ende cada provincia no debería ser vista como una observación independiente, como ha sido supuesto implícitamente en estudios previos sobre la desigualdad de ingresos regional.
Resumo:
The objective of the work was to develop a non-invasive methodology for image acquisition, processing and nonlinear trajectory analysis of the collective fish response to a stochastic event. Object detection and motion estimation were performed by an optical flow algorithm in order to detect moving fish and simultaneously eliminate background, noise and artifacts. The Entropy and the Fractal Dimension (FD) of the trajectory followed by the centroids of the groups of fish were calculated using Shannon and permutation Entropy and the Katz, Higuchi and Katz-Castiglioni's FD algorithms respectively. The methodology was tested on three case groups of European sea bass (Dicentrarchus labrax), two of which were similar (C1 control and C2 tagged fish) and very different from the third (C3, tagged fish submerged in methylmercury contaminated water). The results indicate that Shannon entropy and Katz-Castiglioni were the most sensitive algorithms and proved to be promising tools for the non-invasive identification and quantification of differences in fish responses. In conclusion, we believe that this methodology has the potential to be embedded in online/real time architecture for contaminant monitoring programs in the aquaculture industry.