947 resultados para spatial clustering algorithms


Relevância:

90.00% 90.00%

Publicador:

Resumo:

The aim of this study was to describe spatial patterns of the distribution of leprosy and to investigate spatial clustering of incidence rates in the state of Ceará, Northeast Brazil. The average incidence rate of leprosy for the period of 1991 to 1999 was calculated for each municipality of Ceará. Maps were used to describe the spatial distribution of the disease, and spatial statistics were applied to explore large- and small-scale variations of incidence rates. Three regions were identified in which the incidence of leprosy was particularly high. A spatial gradient in the incidence rates was identified, with a tendency of high rates to be concentrated on the North-South axis in the middle region of the state. Moran's I statistic indicated that a significant spatial autocorrelation also existed. The spatial distribution of leprosy in Ceará is heterogeneous. The reasons for spatial clustering of disease rates are not known, but might be related to an heterogeneous distribution of other factors such as crowding, social inequality, and environmental characteristics which by themselves determine the transmission of Mycobacterium leprae.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Schistosomiasis prevalence and egg counts remained low one year after chemotherapy in most households in a hyperendemic rural area in northern Minas Gerais but several distinct spatial patterns could be observed in relation to IgE levels and to a lesser extent to exposure risk (TBM) and type of water supply. An inverse relationship between pre-treatment household prevalence and egg counts on the one hand and post-treatment IgE levels on the other were noted in two of the five communities. Low exposure risk was associated with the low pre-treatment infection rates in the central village but did not contribute to the decline of infection rates after chemotherapy in the study area, as indicated by the significant increase in water contact during the posttreatment period (p < 0.0001). Distance between households and the streams and socioeconomic factors were also unimportant in predicting the spatial distribution of infection. These results are consistent with the production and antiparasitic effect of high levels of IgE in Schistosoma mansoni infection.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Praziquantel chemotherapy has been the focus of the Schistosomiasis Control Program in Brazil for the past two decades. Nevertheless, information on the impact of selective chemotherapy against Schistosoma mansoni infection under the conditions confronted by the health teams in endemic municipalities remains scarce. This paper compares the spatial pattern of infection before and after treatment with either a 40 mg/kg or 60 mg/kg dose of praziquantel by determining the intensity of spatial cluster among patients at 180 and 360 days after treatment. The spatial-temporal distribution of egg-positive patients was analysed in a Geographic Information System using the kernel smoothing technique. While all patients became egg-negative after 21 days, 17.9% and 30.9% reverted to an egg-positive condition after 180 and 360 days, respectively. Both the prevalence and intensity of infection after treatment were significantly lower in the 60 mg/kg than in the 40 mg/kg treatment group. The higher intensity of the kernel in the 40 mg/kg group compared to the 60 mg/kg group, at both 180 and 360 days, reflects the higher number of reverted cases in the lower dose group. Auxiliary, preventive measures to control transmission should be integrated with chemotherapy to achieve a more enduring impact.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices

Relevância:

90.00% 90.00%

Publicador:

Resumo:

PURPOSE: To objectively characterize different heart tissues from functional and viability images provided by composite-strain-encoding (C-SENC) MRI. MATERIALS AND METHODS: C-SENC is a new MRI technique for simultaneously acquiring cardiac functional and viability images. In this work, an unsupervised multi-stage fuzzy clustering method is proposed to identify different heart tissues in the C-SENC images. The method is based on sequential application of the fuzzy c-means (FCM) and iterative self-organizing data (ISODATA) clustering algorithms. The proposed method is tested on simulated heart images and on images from nine patients with and without myocardial infarction (MI). The resulting clustered images are compared with MRI delayed-enhancement (DE) viability images for determining MI. Also, Bland-Altman analysis is conducted between the two methods. RESULTS: Normal myocardium, infarcted myocardium, and blood are correctly identified using the proposed method. The clustered images correctly identified 90 +/- 4% of the pixels defined as infarct in the DE images. In addition, 89 +/- 5% of the pixels defined as infarct in the clustered images were also defined as infarct in DE images. The Bland-Altman results show no bias between the two methods in identifying MI. CONCLUSION: The proposed technique allows for objectively identifying divergent heart tissues, which would be potentially important for clinical decision-making in patients with MI.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Vineyards vary over space and time, making geomatics technologies ideally suited to study terroir. This study applied geomatics technologies - GPS, remote sensing and GIS - to characterize the spatial variability at Stratus Vineyards in the Niagara Region. The concept of spatial terroir was used to visualize, monitor and analyze the spatial and temporal variability of variables that influence grape quality. Spatial interpolation and spatial autocorrelation were used to measure the pattern demonstrated by soil moisture, leaf water potential, vine vigour, soil composition and grape composition on two Cabernet Franc blocks and one Chardonnay block. All variables demonstrated some spatial variability within and between the vineyard block and over time. Soil moisture exhibited the most significant spatial clustering and was temporally stable. Geomatics technologies provided valuable spatial information related to the natural spatial variability at Stratus Vineyards and can be used to inform and influence vineyard management decisions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The goal of most clustering algorithms is to find the optimal number of clusters (i.e. fewest number of clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of the Cramérs V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the penta peptide Met-enkephalin (MET) and the 34 amino acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a huge impact on the reproducibility. Analysis of MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps due to the fact that toroidal maps do not have an edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classification for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs) which find similar partitionings to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions. The χ^2 values could relate the different SOM partitionings to each other.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for difference in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithm. The quality of the obtained results is assessed by means of several indices

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper is concerned with tensor clustering with the assistance of dimensionality reduction approaches. A class of formulation for tensor clustering is introduced based on tensor Tucker decomposition models. In this formulation, an extra tensor mode is formed by a collection of tensors of the same dimensions and then used to assist a Tucker decomposition in order to achieve data dimensionality reduction. We design two types of clustering models for the tensors: PCA Tensor Clustering model and Non-negative Tensor Clustering model, by utilizing different regularizations. The tensor clustering can thus be solved by the optimization method based on the alternative coordinate scheme. Interestingly, our experiments show that the proposed models yield comparable or even better performance compared to most recent clustering algorithms based on matrix factorization.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Little follow-up data on malaria transmission in communities originating from frontier settlements in Amazonia are available. Here we describe a cohort study in a frontier settlement in Acre, Brazil, where 509 subjects contributed 489.7 person-years of follow-up. The association between malaria morbidity during the follow-up and individual, household, and spatial covariates was explored with mixed-effects logistic regression models and spatial analysis. Incidence rates for Plasmodium vivax and Plasmodium falciparum malaria were 30.0/100 and 16.3/100 person-years at risk, respectively. Malaria morbidity was strongly associated with land clearing and farming, and decreased after five years of residence in the area, suggesting that clinical immunity develops among subjects exposed to low malaria endemicity. Significant spatial clustering of malaria was observed in the areas of most recent occupation, indicating that the continuous influx of nonimmune settlers to forest-fringe areas perpetuates the cycle of environmental change and colonization that favors malaria transmission in rural Amazonia.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.