939 results for Incremental Clustering





Recent developments in characterisation techniques and computer simulation have extended our ability to access atomic-scale information on the evolution of materials microstructure. New results from such techniques have significantly advanced our knowledge of solute behaviour during the earliest stages of decomposition of the solid solution. This chapter updates the current understanding of solute clustering and discusses the effect of solute clustering and micro-alloying on precipitate microstructure evolution in aluminium alloys. In addition, a brief review is given of the effect of severe plastic deformation on precipitate evolution in Al alloys.





This paper reports a robustness comparison of clustering-based multi-label classification methods versus their non-clustering counterparts for multi-concept associated image and video annotations. In the experimental setting of this paper, we adopted six popular multi-label classification algorithms, two different base classifiers for problem-transformation-based multi-label classification, and three different clustering algorithms for pre-clustering of the training data. We conducted an experimental evaluation on two multi-label benchmark datasets: scene image data and mediamill video data. We also employed two multi-label classification evaluation metrics, namely micro F1-measure and Hamming loss, to present the predictive performance of the classifications. The results reveal that different base classifiers and clustering methods contribute differently to the performance of the multi-label classifications. Overall, the pre-clustering methods improve the effectiveness of multi-label classification in certain experimental settings. This provides vital information to users when deciding which multi-label classification method to choose for multiple-concept associated image and video annotations.
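For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how micro F1-measure and Hamming loss are commonly computed from binary label-indicator matrices. It is an illustration only, not the authors' evaluation code; the function name `micro_f1_and_hamming` and the toy labels are hypothetical.

```python
from typing import List, Tuple


def micro_f1_and_hamming(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (micro-averaged F1, Hamming loss) for 0/1 label matrices.

    Hypothetical helper for illustration; rows are samples, columns are labels.
    """
    tp = fp = fn = wrong = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            if t != p:
                wrong += 1  # every mismatched label slot counts towards Hamming loss
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    micro_f1 = 2 * tp / denom if denom else 0.0
    hamming = wrong / total if total else 0.0
    return micro_f1, hamming


# Toy example: 2 samples, 3 labels each (made-up data).
f1, hl = micro_f1_and_hamming([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
print(round(f1, 4), round(hl, 4))
```

Micro-averaging pools true positives, false positives and false negatives over all labels before computing F1, so frequent labels dominate the score; Hamming loss is simply the fraction of label slots predicted incorrectly.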





In this paper we propose the Incremental Sequential PAttern Discovery using Equivalence classes (IncSPADE) algorithm to mine a dynamic database without the need to re-scan the database. To evaluate this algorithm, we conducted experiments on three different artificial datasets. The results show that IncSPADE outperformed the benchmark algorithm, SPADE, by up to 20%.





Analysis of crowd behaviour in public places is an indispensable tool for video surveillance. Automated detection of anomalous crowd behaviour is a critical problem as the human population increases. Anomalous events may include a person loitering about a place for an unusual amount of time, people running and causing panic, the size of a group of people growing over time, etc. In this work, two types of feature coding are proposed to detect anomalous events and objects: spatial features and spatio-temporal features. The spatial features comprise contrast, correlation, energy and homogeneity, which are derived from the Gray Level Co-occurrence Matrix (GLCM). The spatio-temporal feature is the time spent by an object at different locations in the scene. Hyperspherical clustering is employed to detect the anomalies. The spatial features revealed the anomalous frames through the contrast and homogeneity measures. Loitering behaviour of people was detected as anomalous objects using the spatio-temporal coding.
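As a rough illustration of the GLCM-derived measures mentioned above, the sketch below builds a co-occurrence matrix for horizontally adjacent pixels and computes contrast and homogeneity. The offset, the lack of symmetrisation, and the homogeneity formula (weights of 1 / (1 + (i - j)^2)) are assumptions made here, since conventions differ between toolkits; the function name is hypothetical and this is not the authors' implementation.

```python
from typing import Sequence, Tuple


def glcm_contrast_homogeneity(
    image: Sequence[Sequence[int]], levels: int
) -> Tuple[float, float]:
    """Contrast and homogeneity from a GLCM with offset (0, 1).

    `image` holds integer grey levels in range(levels). Hypothetical helper.
    """
    # Count co-occurrences of (left pixel, right neighbour) grey levels.
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1

    total = sum(sum(r) for r in counts)
    if total == 0:
        return 0.0, 0.0

    contrast = 0.0
    homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total  # normalised co-occurrence probability
            contrast += p * (i - j) ** 2
            homogeneity += p / (1 + (i - j) ** 2)
    return contrast, homogeneity


# Tiny two-level image (made-up data).
print(glcm_contrast_homogeneity([[0, 0, 1], [1, 1, 0]], levels=2))
```

High contrast and low homogeneity indicate frames with many abrupt grey-level transitions; a perfectly uniform frame gives contrast 0 and homogeneity 1 under this convention.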





Multilevel clustering problems, where content and contextual information are jointly clustered, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering under a Bayesian nonparametric setting, extending the MC2 model proposed in (Nguyen et al., 2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the need to repeatedly pass through the corpus, as the Gibbs sampler does. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions of documents and groups.





OBJECTIVE: To characterise clusters of individuals based on adherence to dietary recommendations and to determine whether changes in Healthy Eating Index (HEI) scores in response to a personalised nutrition (PN) intervention varied between clusters.

DESIGN: Food4Me study participants were clustered according to whether their baseline dietary intakes met European dietary recommendations. Changes in HEI scores between baseline and month 6 were compared between clusters and stratified by whether individuals received generalised or PN advice.

SETTING: Pan-European, Internet-based, 6-month randomised controlled trial.

SUBJECTS: Adults aged 18-79 years (n 1480).

RESULTS: Individuals in cluster 1 (C1) met all recommended intakes except for red meat, those in cluster 2 (C2) met two recommendations, and those in cluster 3 (C3) and cluster 4 (C4) met one recommendation each. C1 had higher intakes of white fish, beans and lentils and low-fat dairy products and lower percentage energy intake from SFA (P<0·05). C2 consumed less chips and pizza and fried foods than C3 and C4 (P<0·05). C1 were lighter, had lower BMI and waist circumference than C3 and were more physically active than C4 (P<0·05). More individuals in C4 were smokers and wanted to lose weight than in C1 (P<0·05). Individuals who received PN advice in C4 reported greater improvements in HEI compared with C3 and C1 (P<0·05).

CONCLUSIONS: The cluster where the fewest recommendations were met (C4) reported greater improvements in HEI following a 6-month trial of PN whereas there was no difference between clusters for those randomised to the Control, non-personalised dietary intervention.





Anomaly detection, as a form of intrusion detection, is good at detecting unknown or new attacks and has attracted much attention in recent years. In this paper, a new hierarchical anomaly intrusion detection model is proposed that combines fuzzy c-means (FCM) based on a genetic algorithm with SVM. During intrusion detection, a membership function and a fuzzy interval are applied, extending the process from the previous hard classification to soft classification. A fuzzy error-correction sub-interval is then introduced: when the detection result for a data instance falls within this range, the instance is re-detected in order to improve the effectiveness of intrusion detection. Experimental results show that the proposed model can effectively detect the vast majority of network attack types, which provides a feasible solution to the problems of false alarm rate and detection rate in anomaly intrusion detection models.
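The soft-classification idea can be sketched with the standard fuzzy c-means membership formula, plus a simple check for whether a result falls inside an ambiguous interval. This is only an illustration under stated assumptions: the abstract does not give the interval bounds or the re-detection procedure, so `needs_recheck` and its `low`/`high` thresholds are hypothetical, and the genetic-algorithm and SVM stages are not shown.

```python
import math
from typing import List, Sequence


def fcm_memberships(
    x: Sequence[float], centers: Sequence[Sequence[float]], m: float = 2.0
) -> List[float]:
    """Standard FCM membership of point x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


def needs_recheck(memberships: Sequence[float], low: float = 0.4, high: float = 0.6) -> bool:
    """Hypothetical rule: flag the instance when its top membership is ambiguous."""
    return low <= max(memberships) <= high


centers = [(0.0, 0.0), (4.0, 0.0)]  # made-up cluster centers
print(fcm_memberships((1.0, 0.0), centers), needs_recheck(fcm_memberships((2.0, 0.0), centers)))
```

A point midway between two centers receives memberships near 0.5 each, which is the kind of case a re-detection interval would be expected to catch.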





This paper proposes a novel application of Visual Assessment of Tendency (VAT)-based hierarchical clustering algorithms (VAT, iVAT, and clusiVAT) for trajectory analysis. We introduce new clustering-based anomaly detection frameworks, named iVAT+ and clusiVAT+, and use them for trajectory anomaly detection. This approach partitions the VAT-generated Minimum Spanning Tree using an efficient thresholding scheme. The trajectories are classified as normal or anomalous based on the number of paths in the clusters. On synthetic datasets with fixed and variable numbers of clusters and anomalies, we achieve 98% classification accuracy. Our two-stage clusiVAT method is applied to 26,039 trajectories of vehicles and pedestrians from a parking lot scene in the real-life MIT trajectories dataset. The first stage clusters the trajectories ignoring directionality. The second stage divides the clusters obtained from the first stage by considering trajectory direction. We show that our novel two-stage clusiVAT approach can produce natural and informative trajectory clusters on this real-life dataset while finding representative anomalies.





Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well-known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the dataset using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real-world KDD-99 cup data (4 292 637 samples in 41 dimensions) in 76 s.
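Two of the steps listed above can be sketched briefly: the VAT-style reordering of a dissimilarity matrix (a Prim-like ordering that makes clusters appear as dark diagonal blocks when the matrix is imaged) and the nearest prototype rule for extending labels. This is a simplified illustration, not the clusiVAT implementation; the sampling scheme and the single-linkage cut are omitted, and the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def vat_order(dist: Sequence[Sequence[float]]) -> List[int]:
    """Return a VAT-style ordering of the indices of a square dissimilarity matrix."""
    n = len(dist)
    if n == 0:
        return []
    # Start from one endpoint of the largest pairwise dissimilarity.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j] = distance from j to the closest already-ordered point.
    best = {j: dist[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=lambda j: (best[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            best[j] = min(best[j], dist[nxt][j])
    return order


def nearest_prototype_labels(
    points: Sequence[Sequence[float]], prototypes: Sequence[Sequence[float]]
) -> List[int]:
    """Label each point with the index of its closest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: math.dist(p, prototypes[k]))
        for p in points
    ]


# Made-up 1-D sample with two well-separated groups.
sample = [(0.0,), (10.0,), (0.5,), (10.5,)]
d = [[math.dist(a, b) for b in sample] for a in sample]
print(vat_order(d))
print(nearest_prototype_labels([(0.2,), (9.0,)], [(0.25,), (10.25,)]))
```

In the ordering, points from the same group end up adjacent, which is what makes the block structure visible; the nearest prototype step needs only one pass over the unsampled data, consistent with the "noniterative" label extension described in the abstract.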





During the operational life of an aircraft, its structural components may be subjected to various types of loads. These loads can cause cracks to initiate and propagate; once a crack reaches a certain size, it can lead to failure of the component itself and cause serious accidents. In this respect, fatigue is one of the main causes of failure in aeronautical structures. The study and application of fatigue principles to aeroplanes are relatively recent, because aircraft were initially built of canvas and wood, a material that does not suffer from fatigue and absorbs vibrations. Aeronautical materials have evolved over time, up to the use of composite materials for aircraft construction in the 21st century. The link between crack initiation/propagation and residual stresses has led to the development of numerous techniques for measuring the latter, with the aim of counteracting fatigue failure. Several reference standards exist for measuring residual stresses in metallic components; for composite materials, by contrast, the reference standard is still under study. The aim of this thesis is to survey and study reference methods for measuring residual stresses in composite laminates, through an in-depth examination of a residual stress measurement technique known as Incremental Hole Drilling.





In this thesis we present a mathematical formulation of the interaction between microorganisms, such as bacteria or amoebae, and chemicals, often produced by the organisms themselves. This interaction is called chemotaxis and leads to cellular aggregation. We derive several models to describe chemotaxis. The first is the pioneering Keller-Segel parabolic-parabolic model, which is derived within two different frameworks: a macroscopic perspective and a microscopic perspective, in which we start with a stochastic differential equation and perform a mean-field approximation. This parabolic model may be generalized by introducing a degenerate diffusion parameter, which depends on the density itself via a power law. We then derive a model for chemotaxis based on Cattaneo's law of heat propagation with finite speed, which is a hyperbolic model. The last model proposed here is a hydrodynamic model, which takes into account the inertia of the system through a friction force. In the limit of strong friction, the model reduces to the parabolic model, whereas in the limit of weak friction, we recover a hyperbolic model. Finally, we analyze the instability condition, which is the condition that leads to aggregation, and we describe the different kinds of aggregates we may obtain: the parabolic models lead to clusters or peaks, whereas the hyperbolic models lead to the formation of network patterns or filaments. Moreover, we discuss the analogy between bacterial colonies and self-gravitating systems by comparing the chemotactic collapse and the gravitational collapse (Jeans instability).
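For orientation, one common textbook way of writing the parabolic-parabolic Keller-Segel system is given below. The notation is an assumption made here and may differ from the thesis: \rho is the cell density, c the chemoattractant concentration, D and D_c the diffusion coefficients, \chi the chemotactic sensitivity, and a, b the production and degradation rates of the chemical.

```latex
\begin{aligned}
\frac{\partial \rho}{\partial t} &= \nabla \cdot \left( D \,\nabla \rho - \chi \,\rho \,\nabla c \right), \\
\frac{\partial c}{\partial t} &= D_c \,\Delta c + a\,\rho - b\,c .
\end{aligned}
```

The first equation balances diffusion of the cells against their drift up the chemical gradient; aggregation occurs when the drift term dominates.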





Chronic traumatic encephalopathy (CTE) is a neurodegenerative disorder which may result from repetitive brain injury. A variety of tau-immunoreactive pathologies are present, including neurofibrillary tangles (NFT), neuropil threads (NT), dot-like grains (DLG), astrocytic tangles (AT), and occasional neuritic plaques (NP). In tauopathies, cellular inclusions in the cortex are clustered within specific laminae, the clusters being regularly distributed parallel to the pia mater. To determine whether a similar spatial pattern is present in CTE, clustering of the tau-immunoreactive pathology was studied in the cortex, hippocampus, and dentate gyrus in 11 cases of CTE and 7 cases of Alzheimer’s disease neuropathologic change (ADNC) without CTE. In CTE: (1) all aspects of tau-immunoreactive pathology were clustered and the clusters were frequently regularly distributed parallel to the tissue boundary, (2) clustering was similar in two CTE cases with minimal co-pathology compared with cases with associated ADNC or TDP-43 proteinopathy, (3) in a proportion of cortical gyri, estimated cluster size was similar to that of cell columns of the cortico-cortical pathways, and (4) clusters of the tau-immunoreactive pathology were infrequently spatially correlated with blood vessels. The NFT and NP in ADNC without CTE were less frequently randomly or uniformly distributed and more frequently in defined clusters than in CTE. Hence, the spatial pattern of the tau-immunoreactive pathology observed in CTE is typical of the tauopathies but with some distinct differences compared to ADNC alone. The spread of pathogenic tau along anatomical pathways could be a factor in the pathogenesis of the disease.





The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.





Influenza A virus is an important human pathogen that causes yearly epidemics and occasional pandemics. The ability to replicate within the host cell is a determinant of virulence, amplifying viral numbers for host-to-host transmission. This process requires multiple rounds of entry into permissive cells, replication, and virion assembly at the plasma membrane, the site of viral budding and release. The assembly of influenza A virus involves packaging of several viral (and host) proteins and of a segmented genome, composed of 8 distinct RNAs in the form of viral ribonucleoproteins (vRNPs). The selective assembly of the 8-segment core remains one of the most interesting unresolved problems in virology. The recycling endosome regulatory GTPase Rab11 was shown to contribute to the process by transporting vRNPs to the periphery, giving rise to enlarged cytosolic puncta rich in Rab11 and the 8 vRNPs. We recently reported that vRNP hotspots were formed of clustered vesicles harbouring protruding electron-dense structures that resembled vRNPs. Mechanistically, vRNP hotspots were formed as vRNPs outcompeted the cognate effectors of Rab11, the Rab11-Family-Interacting-Proteins (FIPs), for binding, and as a consequence impaired recycling sorting at an unknown step. Here, we speculate on the impact that such impairment might have on host immunity, membrane architecture and viral assembly.





Research on the use of social media among travelers has mainly focused on its impact on the travel planning process, and there is consensus that travel decisions are highly influenced by social media. Yet little attention has been paid to the differences among travelers in their use of social media for travel purposes. Based on the use of travel social media, cluster analysis was employed to identify different segments among travelers. Furthermore, the study profiles the clusters based on demographic and other travel-related characteristics. The findings of this study are important for online marketers seeking to better understand travelers' use of social media and their characteristics, in order to adapt online marketing strategies to the profile of each segment.