912 results for hierarchical clustering techniques
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/.
Abstract:
Training data for supervised learning neural networks can be clustered such that the input/output pairs in each cluster are redundant. Redundant training data can adversely affect training time. In this paper we apply two clustering algorithms, ART2-A and the Generalized Equality Classifier, to identify training data clusters and thus reduce the training data and training time. The approach is demonstrated for a high-dimensional nonlinear continuous-time mapping. The demonstration shows a six-fold decrease in training time at little or no loss of accuracy in the handling of evaluation data.
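The idea of thinning redundant training pairs can be illustrated with a minimal sketch. It does not implement ART2-A or the Generalized Equality Classifier; it substitutes a simple leader-style clustering as a stand-in. The function name `reduce_training_set`, the `radius` threshold and the sample data are all hypothetical.

```python
import math


def reduce_training_set(pairs, radius):
    """Keep one representative per group of near-identical input/output pairs.

    pairs  : list of (input_vector, output_vector) tuples
    radius : hypothetical distance threshold below which two pairs count as redundant
    """
    representatives = []
    for x, y in pairs:
        point = list(x) + list(y)  # cluster on the joint input/output vector
        # A pair is redundant if it lies within `radius` of a kept representative.
        is_redundant = any(
            math.dist(point, list(rx) + list(ry)) <= radius
            for rx, ry in representatives
        )
        if not is_redundant:
            representatives.append((x, y))
    return representatives


# Illustrative data only: two groups of near-duplicate pairs.
pairs = [
    ((0.0, 0.0), (1.0,)),
    ((0.01, 0.0), (1.0,)),  # near-duplicate of the first pair
    ((5.0, 5.0), (2.0,)),
    ((5.0, 5.02), (2.0,)),  # near-duplicate of the third pair
]
reduced = reduce_training_set(pairs, radius=0.1)
print(len(pairs), "->", len(reduced))
```

A network would then be trained on `reduced` rather than `pairs`; how much training time this saves depends on how redundant the original data are.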
Abstract:
In this paper, moving flock patterns are mined from spatio-temporal datasets by incorporating a clustering algorithm. A flock is defined as a set of data that move together for a certain continuous amount of time. Finding moving flock patterns with clustering algorithms is a potential method for discovering frequent patterns of movement in large trajectory datasets. In this approach, SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW) is the clustering algorithm used. The advantage of the SPARROW algorithm is that it can effectively discover clusters of widely varying sizes and shapes from large databases. Variations of the proposed method are discussed, and the experimental results show that the problems of scalability and duplicate pattern formation are addressed. This method also reduces the number of patterns produced.
Abstract:
In this work a method for building multiple-model structures is presented. A clustering algorithm that uses data from the system is employed to define the architecture of the multiple-model, including the size of the region covered by each model and the number of models. A heating, ventilation and air conditioning system is used as a testbed for the proposed method.
Abstract:
This paper presents a hierarchical clustering method for semantic Web service discovery. This method aims to improve the accuracy and efficiency of the traditional service discovery using vector space model. The Web service is converted into a standard vector format through the Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to make semantic expansion to meet the user’s service request. The process and algorithm of hierarchical clustering based semantic Web service discovery is discussed. Validation is carried out on the dataset.
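The vector space model that this kind of service discovery builds on can be sketched as follows. This is only an illustration under stated assumptions: the service descriptions are invented, the helper names `term_vector` and `cosine_similarity` are hypothetical, and the WordNet-based semantic reduction and expansion described in the abstract are not reproduced.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a service description."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical service descriptions standing in for Web service documents.
services = {
    "weather": "get weather forecast by city",
    "forecast": "hourly forecast report",
    "payment": "process credit card payment",
}
request = term_vector("weather forecast for a city")

# Rank services by similarity to the user's request, most similar first.
ranked = sorted(
    services,
    key=lambda name: cosine_similarity(request, term_vector(services[name])),
    reverse=True,
)
print(ranked)
```

The same similarity measure could serve as the basis for a distance between services when clustering them hierarchically, although the abstract does not state which measure the authors use.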
Abstract:
Cognitive experiments involving motor execution (ME) and motor imagery (MI) have been intensively studied using functional magnetic resonance imaging (fMRI). However, the functional networks of a multitask paradigm that includes both ME and MI have not been widely explored. In this article, we aimed to investigate the functional networks involved in MI and ME using a method combining hierarchical clustering analysis (HCA) and independent component analysis (ICA). Ten right-handed subjects were recruited to participate in a multitask experiment with conditions such as visual cue, MI, ME and rest. The results showed four activation clusters, including parts of the visual network, the ME network, the MI network and parts of the resting state network. Furthermore, the integration among these functional networks was also revealed. The findings further demonstrated that combining HCA with ICA is an effective method for analyzing multitask fMRI data.
Abstract:
One objective of the feeder reconfiguration problem in distribution systems is to minimize the power losses for a specific load. For this problem, the mathematical model is a nonlinear mixed-integer problem that is generally hard to solve. This paper proposes an algorithm based on artificial neural network theory. In this context, clustering techniques to determine the best training set for a single neural network with generalization ability are also presented. The proposed methodology was applied to two electrical systems and produced good results. Moreover, the methodology can be employed for large-scale systems in a real-time environment.
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could help uncover new cancer subtypes.
Abstract:
The area of Human-Machine Interface is growing fast due to its high importance in all technological systems. The basic idea behind designing human-machine interfaces is to enrich communication with technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must properly identify the action the user wants to perform, so proper gesture recognition is of the highest importance. However, most systems based on gesture recognition use complex methods requiring high-resource devices. In this work, we propose to model gestures by capturing their temporal properties, which significantly reduces storage requirements, and to use clustering techniques, namely self-organizing maps and an unsupervised genetic algorithm, for their classification. We further propose to train a certain number of algorithms with different parameters and combine their decisions using majority voting in order to decrease the false positive rate. The main advantage of the approach is its simplicity, which enables implementation on devices with limited resources, and therefore at low cost. The testing results demonstrate its high potential.
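The majority-voting step can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the function name `majority_vote` and the gesture labels are hypothetical, and rejecting the gesture when no label wins a strict majority is one possible way to keep false positives down.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by more than half of the classifiers, else None."""
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    # Require a strict majority; otherwise reject rather than risk a false positive.
    return label if count > len(predictions) / 2 else None


# Hypothetical outputs of three independently trained classifiers.
votes = ["swipe_left", "swipe_left", "circle"]
print(majority_vote(votes))
```

With an odd number of classifiers and only two candidate labels a strict majority always exists; with more labels, the `None` result signals that the classifiers disagree too much to accept the gesture.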
Abstract:
The study of the effectiveness of cognitive rehabilitation processes and the identification of cognitive profiles, in order to define comparable populations, is a controversial area, but it is also strongly needed in order to improve therapies. There is limited evidence about the efficacy of cognitive rehabilitation. Many trials conclude that, in spite of an apparently good clinical response, differences do not reach statistical significance. The common feature in all these trials is heterogeneity among populations. In this situation, observational studies on very well controlled cohorts, together with innovative methods in knowledge extraction, could provide methodological insights for the design of more accurate comparative trials. Some correlation studies between neuropsychological tests and patients' capacities have been carried out [1]-[2], as well as studies of the correlation between tests and morphological changes in the brain [3]. The efficacy of the procedures depends on three main factors: the affectation profile, the scheduled tasks and the execution results. The relationship between them makes up cognitive rehabilitation as a discipline, but its structure is not properly defined. In this work we present a clustering method used in Neuro Personal Trainer (NPT) to group patients into cognitive profiles using data mining techniques. The system uses these clusters to personalize treatments, using each patient's assigned cluster to select the tasks most suitable for that patient's specific needs by comparing the results obtained in the past by patients with the same profile.
Abstract:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method, the agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool called XLMiner™ for Microsoft Excel.
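Agglomerative hierarchical clustering can be illustrated with a short sketch. It makes several assumptions that do not come from the abstract: average linkage, a correlation-based distance, and four invented price series named `A` to `D`. It does not use XLMiner or real Bulgarian Stock Exchange data.

```python
import math


def correlation_distance(a, b):
    """1 - Pearson correlation: near 0 for series that move together, near 2 for opposite moves.

    Assumes neither series is constant (non-zero variance).
    """
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def agglomerative(series, n_clusters):
    """Average-linkage agglomerative clustering; returns a list of sets of series names."""
    clusters = [{name} for name in series]  # start with one cluster per series

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [correlation_distance(series[a], series[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


# Invented price series: A and B rise together, C and D fall together.
prices = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 6.0, 8.2, 10.0],
    "C": [5.0, 4.0, 3.0, 2.0, 1.0],
    "D": [9.0, 7.1, 5.0, 3.2, 1.0],
}
result = agglomerative(prices, n_clusters=2)
print(result)
```

Stopping at a fixed `n_clusters` is a simplification; recording each merge and its distance instead would yield the full dendrogram, which can then be cut at any level.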
Abstract:
Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on the merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
Abstract:
In this work we compare Grapholita molesta Busck (Lepidoptera: Tortricidae) populations originating from Brazil, Chile, Spain, Italy and Greece using power spectral density and phylogenetic analysis to detect any similarities between the population macro-level and the molecular micro-level. Log-transformed population data were normalized and AR(p) models were developed to generate population time series of equal length for each case. The time-frequency/scale properties of the population data were further analyzed using wavelet analysis to detect any frequency changes in the population dynamics and to cluster the populations. Based on the power spectra of each population time series and the hierarchical clustering schemes, populations originating from South America (Brazil and Chile) exhibit similar rhythmic properties and are both more closely related to populations originating from Greece. Populations from Spain, and especially Italy, are more distant in terms of periodic changes in their population dynamics. Moreover, the members within the same cluster share similar spectral information and are therefore presumed to participate in the same temporally regulated population process. In contrast, the phylogenetic approach revealed a less structured pattern that bears indications of panmixia, as the two clusters contain individuals from both Europe and South America. This preliminary outcome will be further assessed by incorporating more individuals and, probably, by employing a second molecular marker.