830 resultados para Labeling hierarchical clustering
Resumo:
We designed a straightforward biotinylated probe using the N-terminal substrate-like region of the inhibitory site of human cystatin C as a scaffold, linked to the thiol-specific reagent diazomethylketone group as a covalent warhead (i.e. Biot-(PEG)2-Ahx-LeuValGly-DMK). The irreversible activity-based probe bound readily to cysteine cathepsins B, L, S and K. Moreover affinity labeling is sensitive since active cathepsins were detected in the nM range using an ExtrAvidin®-peroxidase conjugate for disclosure. Biot-(PEG)2-Ahx-LeuValGly-DMK allowed a slightly more pronounced labeling for cathepsin S with a compelling second-order rate constant for association (kass = 2,320,000 M−1 s−1). Labeling of the active site is dose-dependent as observed using 6-cyclohexylamine-4-piperazinyl-1,3,5-triazine-2-carbonitrile, as competitive inhibitor of cathepsins. Finally we showed that Biot-(PEG)2-Ahx-LeuValGly-DMK may be a simple and convenient tool to label secreted and intracellular active cathepsins using a myelomonocytic cell line (THP-1 cells) as model.
Resumo:
Introduction
Mild cognitive impairment (MCI) has clinical value in its ability to predict later dementia. A better understanding of cognitive profiles can further help delineate who is most at risk of conversion to dementia. We aimed to (1) examine to what extent the usual MCI subtyping using core criteria corresponds to empirically defined clusters of patients (latent profile analysis [LPA] of continuous neuropsychological data) and (2) compare the two methods of subtyping memory clinic participants in their prediction of conversion to dementia.
Methods
Memory clinic participants (MCI, n = 139) and age-matched controls (n = 98) were recruited. Participants had a full cognitive assessment, and results were grouped (1) according to traditional MCI subtypes and (2) using LPA. MCI participants were followed over approximately 2 years after their initial assessment to monitor for conversion to dementia.
Results
Groups were well matched for age and education. Controls performed significantly better than MCI participants on all cognitive measures. With the traditional analysis, most MCI participants were in the amnestic multidomain subgroup (46.8%) and this group was most at risk of conversion to dementia (63%). From the LPA, a three-profile solution fit the data best. Profile 3 was the largest group (40.3%), the most cognitively impaired, and most at risk of conversion to dementia (68% of the group).
Discussion
LPA provides a useful adjunct in delineating MCI participants most at risk of conversion to dementia and adds confidence to standard categories of clinical inference.
Resumo:
Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules that are composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the format of the rules; RGC employs disjunctions and conjunctions in rule generation whereas RGC-D rules are simple disjunctions of conditions signifying presence of various words. In both the cases, each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former leads to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering and also an approach we provide for generating interpretable models for general clusterings, both by significant margins. We empirically show that the purity and f-measure losses to achieve interpretability can be as little as 3 and 5%, respectively using the algorithms presented herein.
Resumo:
Most traditional data mining algorithms struggle to cope with the sheer scale of data efficiently. In this paper, we propose a general framework to accelerate existing clustering algorithms to cluster large-scale datasets which contain large numbers of attributes, items, and clusters. Our framework makes use of locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also theoretically prove that our framework has a guaranteed error bound in terms of the clustering quality. This framework can be applied to a set of centroid-based clustering algorithms that assign an object to the most similar cluster, and we adopt the popular K-Modes categorical clustering algorithm to present how the framework can be applied. We validated our framework with five synthetic datasets and a real world Yahoo! Answers dataset. The experimental results demonstrate that our framework is able to speed up the existing clustering algorithm between factors of 2 and 6, while maintaining comparable cluster purity.
Resumo:
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
Resumo:
One of the most popular techniques of generating classifier ensembles is known as stacking which is based on a meta-learning approach. In this paper, we introduce an alternative method to stacking which is based on cluster analysis. Similar to stacking, instances from a validation set are initially classified by all base classifiers. The output of each classifier is subsequently considered as a new attribute of the instance. Following this, a validation set is divided into clusters according to the new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of centroids is considered as a meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets and it can be applied in semi-supervised learning problems. We provide a theoretical investigation regarding the proposed method. This demonstrates that for the method to be successful, the base classifiers applied in the ensemble should have greater than 50% accuracy levels.
Resumo:
Learning from visual representations is enhanced when learners appropriately integrate corresponding visual and verbal information. This study examined the effects of two methods of promoting integration, color coding and labeling, on learning about probabilistic reasoning from a table and text. Undergraduate students (N = 98) were randomly assigned to learn about probabilistic reasoning from one of 4 computer-based lessons generated from a 2 (color coding/no color coding) by 2 (labeling/no labeling) between-subjects design. Learners added the labels or color coding at their own pace by clicking buttons in a computer-based lesson. Participants' eye movements were recorded while viewing the lesson. Labeling was beneficial for learning, but color coding was not. In addition, labeling, but not color coding, increased attention to important information in the table and time with the lesson. Both labeling and color coding increased looks between the text and corresponding information in the table. The findings provide support for the multimedia principle, and they suggest that providing labeling enhances learning about probabilistic reasoning from text and tables
Resumo:
Social networks generally display a positively skewed degree distribution and higher values for clustering coefficient and degree assortativity than would be expected from the degree sequence. For some types of simulation studies, these properties need to be varied in the artificial networks over which simulations are to be conducted. Various algorithms to generate networks have been described in the literature but their ability to control all three of these network properties is limited. We introduce a spatially constructed algorithm that generates networks with constrained but arbitrary degree distribution, clustering coefficient and assortativity. Both a general approach and specific implementation are presented. The specific implementation is validated and used to generate networks with a constrained but broad range of property values. © Copyright JASSS.
Resumo:
Nos últimos anos temos vindo a assistir a uma mudança na forma como a informação é disponibilizada online. O surgimento da web para todos possibilitou a fácil edição, disponibilização e partilha da informação gerando um considerável aumento da mesma. Rapidamente surgiram sistemas que permitem a coleção e partilha dessa informação, que para além de possibilitarem a coleção dos recursos também permitem que os utilizadores a descrevam utilizando tags ou comentários. A organização automática dessa informação é um dos maiores desafios no contexto da web atual. Apesar de existirem vários algoritmos de clustering, o compromisso entre a eficácia (formação de grupos que fazem sentido) e a eficiência (execução em tempo aceitável) é difícil de encontrar. Neste sentido, esta investigação tem por problemática aferir se um sistema de agrupamento automático de documentos, melhora a sua eficácia quando se integra um sistema de classificação social. Analisámos e discutimos dois métodos baseados no algoritmo k-means para o clustering de documentos e que possibilitam a integração do tagging social nesse processo. O primeiro permite a integração das tags diretamente no Vector Space Model e o segundo propõe a integração das tags para a seleção das sementes iniciais. O primeiro método permite que as tags sejam pesadas em função da sua ocorrência no documento através do parâmetro Social Slider. Este método foi criado tendo por base um modelo de predição que sugere que, quando se utiliza a similaridade dos cossenos, documentos que partilham tags ficam mais próximos enquanto que, no caso de não partilharem, ficam mais distantes. O segundo método deu origem a um algoritmo que denominamos k-C. Este para além de permitir a seleção inicial das sementes através de uma rede de tags também altera a forma como os novos centróides em cada iteração são calculados. A alteração ao cálculo dos centróides teve em consideração uma reflexão sobre a utilização da distância euclidiana e similaridade dos cossenos no algoritmo de clustering k-means. No contexto da avaliação dos algoritmos foram propostos dois algoritmos, o algoritmo da “Ground truth automática” e o algoritmo MCI. O primeiro permite a deteção da estrutura dos dados, caso seja desconhecida, e o segundo é uma medida de avaliação interna baseada na similaridade dos cossenos entre o documento mais próximo de cada documento. A análise de resultados preliminares sugere que a utilização do primeiro método de integração das tags no VSM tem mais impacto no algoritmo k-means do que no algoritmo k-C. Além disso, os resultados obtidos evidenciam que não existe correlação entre a escolha do parâmetro SS e a qualidade dos clusters. Neste sentido, os restantes testes foram conduzidos utilizando apenas o algoritmo k-C (sem integração de tags no VSM), sendo que os resultados obtidos indicam que a utilização deste algoritmo tende a gerar clusters mais eficazes.
Resumo:
Background Clustering of lifestyle risk behaviours is very important in predicting premature mortality. Understanding the extent to which risk behaviours are clustered in deprived communities is vital to most effectively target public health interventions. Methods We examined co-occurrence and associations between risk behaviours (smoking, alcohol consumption, poor diet, low physical activity and high sedentary time) reported by adults living in deprived London neighbourhoods. Associations between sociodemographic characteristics and clustered risk behaviours were examined. Latent class analysis was used to identify underlying clustering of behaviours. Results Over 90% of respondents reported at least one risk behaviour. Reporting specific risk behaviours predicted reporting of further risk behaviours. Latent class analyses revealed four underlying classes. Membership of a maximal risk behaviour class was more likely for young, white males who were unable to work. Conclusions Compared with recent national level analysis, there was a weaker relationship between education and clustering of behaviours and a very high prevalence of clustering of risk behaviours in those unable to work. Young, white men who report difficulty managing on income were at high risk of reporting multiple risk behaviours. These groups may be an important target for interventions to reduce premature mortality caused by multiple risk behaviours.
Resumo:
This study explored views of 566 Italian psychology students about schizophrenia. The most frequently cited causes were psychological traumas (68%) and heredity (54%). Thirty-three percent of students firmly believed that people with the condition could recover. Reporting heredity among the causes, and identifying schizophrenia were both associated with prognostic pessimism, greater confidence in pharmacological treatments and lower confidence in psychological treatments. Schizophrenia labeling was also associated with higher perception of unpredictability and dangerousness. Compared to first year students, fourth/fifth year students more frequently reported heredity among the causes, and were more pessimistic about schizophrenia recovery. Stigma topics should be included in future psychologists’ education.
Resumo:
The high level of unemployment is one of the major problems in most European countries nowadays. Hence, the demand for small area labor market statistics has rapidly increased over the past few years. The Labour Force Survey (LFS) conducted by the Portuguese Statistical Office is the main source of official statistics on the labour market at the macro level (e.g. NUTS2 and national level). However, the LFS was not designed to produce reliable statistics at the micro level (e.g. NUTS3, municipalities or further disaggregate level) due to small sample sizes. Consequently, traditional design-based estimators are not appropriate. A solution to this problem is to consider model-based estimators that "borrow information" from related areas or past samples by using auxiliary information. This paper reviews, under the model-based approach, Best Linear Unbiased Predictors and an estimator based on the posterior predictive distribution of a Hierarchical Bayesian model. The goal of this paper is to analyze the possibility to produce accurate unemployment rate statistics at micro level from the Portuguese LFS using these kinds of stimators. This paper discusses the advantages of using each approach and the viability of its implementation.
Resumo:
Tese de doutoramento, Ciências Biomédicas (Neurociências), Universidade de Lisboa, Faculdade de Medicina, 2014