891 results for Hierarchical Clustering Model
Abstract:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. The term data mining refers to the process of performing exploratory analysis on the data and building a model from it. To infer patterns from data, data mining draws on approaches such as association rule mining, classification and clustering. Among the many data mining techniques, clustering plays a major role, since it groups related data so that properties can be assessed and conclusions drawn. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform format. The research study explores various techniques for converting mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied. The results of clustering mixed-category data after conversion to a numeric data type are demonstrated using a crime data set. The thesis also proposes an extension to a well-known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a breast cancer data set. Another issue with the clustering process is the visualization of the output. Geometric techniques such as scatter plots or projection plots are available, but none of them displays the result by projecting the whole database; instead, they show attribute-pair-wise analysis.
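The conversion of mixed attributes into a uniform numeric format can be illustrated with a minimal sketch. It assumes min-max scaling for numerical attributes and one-hot encoding for categorical ones, which is one common choice and not necessarily the conversion the thesis proposes. The function name `encode_mixed` and the toy records are hypothetical.

```python
def encode_mixed(records, numeric_keys, categorical_keys):
    """Min-max scale numeric attributes and one-hot encode categorical ones."""
    # Range of each numeric attribute, used for scaling to [0, 1].
    ranges = {}
    for key in numeric_keys:
        values = [r[key] for r in records]
        ranges[key] = (min(values), max(values))
    # Sorted category levels give a stable column order for the one-hot part.
    levels = {key: sorted({r[key] for r in records}) for key in categorical_keys}

    encoded = []
    for r in records:
        row = []
        for key in numeric_keys:
            lo, hi = ranges[key]
            row.append((r[key] - lo) / (hi - lo) if hi > lo else 0.0)
        for key in categorical_keys:
            row.extend(1.0 if r[key] == level else 0.0 for level in levels[key])
        encoded.append(row)
    return encoded


# Hypothetical toy records; not taken from the crime or breast cancer data sets.
records = [
    {"age": 20, "income": 1000, "region": "north"},
    {"age": 40, "income": 3000, "region": "south"},
    {"age": 30, "income": 2000, "region": "north"},
]
vectors = encode_mixed(records, ["age", "income"], ["region"])
```

Each row of `vectors` is purely numeric, so a distance-based clustering algorithm can be applied to it directly.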
Abstract:
This paper presents a hierarchical clustering method for semantic Web service discovery. The method aims to improve the accuracy and efficiency of traditional service discovery based on the vector space model. Each Web service is converted into a standard vector format from its Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to perform semantic expansion to meet the user's service request. The process and algorithm of semantic Web service discovery based on hierarchical clustering are discussed. Validation is carried out on the dataset.
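A minimal sketch of hierarchical clustering over vector-space representations is given here for illustration. It assumes cosine distance and average linkage, neither of which is confirmed by the abstract, and it omits the WordNet-based dimension reduction and expansion. The term-count vectors and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(cosine_distance(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical term counts over the vocabulary
# [weather, forecast, payment, invoice], one row per service description.
services = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 1, 3, 2],
]
groups = agglomerate(services, 2)
```

With these toy vectors, the two weather-related descriptions end up in one group and the two payment-related descriptions in the other.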
Abstract:
TPM Vol. 21, No. 4, December 2014, 435-447 – Special Issue © 2014 Cises.
Abstract:
We analyze and quantify co-movements in real effective exchange rates while considering the regional location of countries. More specifically, using the dynamic hierarchical factor model (Moench et al., 2011), we decompose exchange rate movements into several latent components: a worldwide factor and two regional factors, as well as country-specific elements. We then provide evidence that the worldwide common factor is closely related to monetary policies in large advanced countries, while regional common factors tend to be captured by those in the rest of the countries in a region. However, a substantial proportion of the variation in the real exchange rates is reported to be country-specific; even in Europe, country-specific movements exceed worldwide and regional common factors.
Abstract:
MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
Abstract:
The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used "paperclip" stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and "natural" stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position, for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.
Abstract:
In this paper we focus on the one-year-ahead prediction of the electricity peak-demand daily trajectory during the winter season in Central England and Wales. We define a Bayesian hierarchical model for predicting the winter trajectories and present results based on the past observed weather. Thanks to the flexibility of the Bayesian approach, we are able to produce the marginal posterior distributions of all the predictands of interest. This is a fundamental advance over the classical methods. The results are encouraging in both skill and representation of uncertainty. Further extensions are straightforward, at least in principle. The two main ones consist of conditioning the weather generator model on additional information, such as knowledge of the first part of the winter and/or the seasonal weather forecast. Copyright (C) 2006 John Wiley & Sons, Ltd.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Abstract:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with a BMARS model, a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Abstract:
Cognitive experiments involving motor execution (ME) and motor imagery (MI) have been intensively studied using functional magnetic resonance imaging (fMRI). However, the functional networks of a multitask paradigm that includes both ME and MI have not been widely explored. In this article, we aimed to investigate the functional networks involved in MI and ME using a method combining hierarchical clustering analysis (HCA) and independent component analysis (ICA). Ten right-handed subjects were recruited to participate in a multitask experiment with four conditions: visual cue, MI, ME and rest. Four activation clusters were found, including parts of the visual network, the ME network, the MI network and parts of the resting state network. Furthermore, the integration among these functional networks was also revealed. The findings further demonstrated that combining HCA with ICA is an effective method for analyzing multitask fMRI data.
Abstract:
A continuous version of the hierarchical spherical model at dimension d = 4 is investigated. Two limit distributions of the block spin variable X(γ), normalized with exponents γ = d + 2 and γ = d at and above the critical temperature, are established. These results are proven by solving certain evolution equations corresponding to the renormalization group (RG) transformation of the O(N) hierarchical spin model of block size L^d in the limit L ↓ 1 and N → ∞. Starting far away from the stationary Gaussian fixed point, the trajectories of this dynamical system pass through two different regimes with distinguishable crossover behavior. An interpretation of these trajectories is given by the geometric theory of functions, which describes precisely the motion of the Lee-Yang zeroes. The large-N limit of the RG transformation with L^d fixed equal to 2, at criticality, has recently been investigated in both weak and strong (coupling) regimes by Watanabe (J. Stat. Phys. 115:1669-1713, 2004). Although our analysis deals only with the N = ∞ case, it complements various aspects of that work.
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could help uncover new cancer subtypes.
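Two of the preprocessing steps named above, logarithmic transformation and z-score normalization, can be sketched as follows. This is only an illustration under stated assumptions: a log10(1 + x) transform and per-marker z-scores using the population standard deviation; the study's exact settings are not given in the abstract. The intensity values and function names are hypothetical.

```python
import math


def log_transform(rows):
    """Apply log10(1 + x) to every value to compress the dynamic range."""
    return [[math.log10(1.0 + x) for x in row] for row in rows]


def zscore_columns(rows):
    """Standardize each column (marker) to zero mean and unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Population standard deviation of each column.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(columns, means)]
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


# Hypothetical marker intensities: one row per sample, one column per marker.
intensities = [
    [9.0, 99.0],
    [99.0, 999.0],
    [999.0, 9.0],
]
normalized = zscore_columns(log_transform(intensities))
```

After this step every marker contributes on a comparable scale, so a distance metric is not dominated by the marker with the largest raw intensities.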
Abstract:
Previous research has shown that motion imagery draws on the same neural circuits that are involved in perception of motion, thus leading to a motion aftereffect (Winawer et al., 2010). Imagined stimuli can induce a similar shift in participants’ psychometric functions as neural adaptation due to a perceived stimulus. However, these studies have been criticized on the grounds that they fail to exclude the possibility that the subjects might have guessed the experimental hypothesis, and behaved accordingly (Morgan et al., 2012). In particular, the authors claim that participants can adopt arbitrary response criteria, which results in similar changes of the central tendency μ of psychometric curves as those shown by Winawer et al. (2010).
Abstract:
Many destination marketing organizations in the United States and elsewhere are facing budget retrenchment for tourism marketing, especially for advertising. This study evaluates a three-stage model using a Random Coefficient Logit (RCL) approach, which controls for correlations between non-independent alternatives and accounts for heterogeneity in individuals' responses to advertising. The results indicate that the proposed RCL model fits significantly better than traditional logit models, and that tourism advertising significantly influences tourist decisions, with several variables (age, income, distance and Internet access) moderating these decisions differently depending on decision stage and product type. These findings suggest that this approach provides a better foundation for assessing and, in turn, designing more effective advertising campaigns.