35 results for Hierarchical clustering


Relevance:

20.00%

Publisher:

Abstract:

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context-clustering-based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context-clustering-based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13 subtasks in the analogical reasoning task.

Relevance:

20.00%

Publisher:

Abstract:

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context-clustering-based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context-clustering-based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on half of the metrics in the word similarity task and on 6 out of 13 subtasks in the analogical reasoning task, and give the best overall accuracy in the word sense effect classification task, which shows the effectiveness of our proposed distributed representation learning model.
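The context-clustering step described above can be illustrated with a minimal sketch. It assumes that sense vectors have already been initialized (for instance from gloss embeddings) and that each context is available as a vector; the function names `cosine` and `assign_sense`, the sense keys, and the toy two-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_sense(context_vec, sense_vecs):
    """Return the key of the sense vector most similar to the context."""
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


# Hypothetical pre-initialized sense vectors for the word "bank".
sense_vecs = {
    "bank%finance": [1.0, 0.1],
    "bank%river": [0.1, 1.0],
}

# A context vector that points mostly along the first dimension.
chosen = assign_sense([0.9, 0.2], sense_vecs)
```

In a full model the chosen sense vector would then be updated from the context, so a good initialization matters most for senses that occur in few contexts.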

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we focus on the design of bivariate EDAs for discrete optimization problems and propose a new approach named HSMIEC. Current EDAs require much time in the statistical learning process because the relationships among the variables are too complicated. To address this, we employ the Selfish gene theory (SG) in this approach, together with a Mutual Information and Entropy based Cluster (MIEC) model that optimizes the probability distribution of the virtual population. This model uses a hybrid sampling method that considers both clustering accuracy and clustering diversity, and an incremental learning and resampling scheme optimizes the parameters of the correlations of the variables. On several benchmark problems, our experimental results demonstrate that HSMIEC often performs better than some other EDAs, such as BMDA, COMIT, MIMIC and ECGA. © 2009 Elsevier B.V. All rights reserved.
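The two quantities that the MIEC model is named after can be sketched as follows. This is only an illustration of empirical entropy and pairwise mutual information over a population of discrete individuals, not the HSMIEC algorithm itself; the function names and the toy population are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(xs):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical population of binary individuals with three variables each.
population = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
x0 = [ind[0] for ind in population]
x1 = [ind[1] for ind in population]
x2 = [ind[2] for ind in population]

mi_linked = mutual_information(x0, x1)       # x1 always equals x0
mi_independent = mutual_information(x0, x2)  # x2 varies independently of x0
```

Variables with high pairwise mutual information, such as `x0` and `x1` here, are natural candidates for being grouped into the same cluster, while independent variables can be modeled separately.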

Relevance:

20.00%

Publisher:

Abstract:

Chronic traumatic encephalopathy (CTE) is a neurodegenerative disorder which may result from repetitive brain injury. A variety of tau-immunoreactive pathologies are present, including neurofibrillary tangles (NFT), neuropil threads (NT), dot-like grains (DLG), astrocytic tangles (AT), and occasional neuritic plaques (NP). In tauopathies, cellular inclusions in the cortex are clustered within specific laminae, the clusters being regularly distributed parallel to the pia mater. To determine whether a similar spatial pattern is present in CTE, clustering of the tau-immunoreactive pathology was studied in the cortex, hippocampus, and dentate gyrus in 11 cases of CTE and 7 cases of Alzheimer’s disease neuropathologic change (ADNC) without CTE. In CTE: (1) all aspects of tau-immunoreactive pathology were clustered and the clusters were frequently regularly distributed parallel to the tissue boundary, (2) clustering was similar in two CTE cases with minimal co-pathology compared with cases with associated ADNC or TDP-43 proteinopathy, (3) in a proportion of cortical gyri, estimated cluster size was similar to that of cell columns of the cortico-cortical pathways, and (4) clusters of the tau-immunoreactive pathology were infrequently spatially correlated with blood vessels. The NFT and NP in ADNC without CTE were less frequently randomly or uniformly distributed and more frequently in defined clusters than in CTE. Hence, the spatial pattern of the tau-immunoreactive pathology observed in CTE is typical of the tauopathies but with some distinct differences compared to ADNC alone. The spread of pathogenic tau along anatomical pathways could be a factor in the pathogenesis of the disease.

Relevance:

20.00%

Publisher:

Abstract:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
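For reference, the baseline that the abstract contrasts against can be written in a few lines. The sketch below is plain K-means (Lloyd's algorithm) on 2-D points, not MAP-DP; the function name `kmeans`, the toy data, and the initial centroids are assumptions made for illustration. Note that K is fixed in advance by the number of initial centroids, which is the restriction MAP-DP is said to remove.

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on 2-D tuples; K is fixed by len(centroids)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                mean_x = sum(p[0] for p in members) / len(members)
                mean_y = sum(p[1] for p in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Two well-separated toy groups and one hypothetical starting centroid per group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

The squared Euclidean distance in the assignment step is also why this baseline suits continuous data only; handling binary, count, or ordinal data requires a different likelihood, as the abstract notes for MAP-DP.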