948 resultados para Clustering Analysis
Resumo:
In many fields, the spatial clustering of sampled data points has many consequences. Therefore, several indices have been proposed to assess the level of clustering affecting datasets (e.g. the Morisita index, Ripley's Kfunction and Rényi's generalized entropy). The classical Morisita index measures how many times it is more likely to select two measurement points from the same quadrats (the data set is covered by a regular grid of changing size) than it would be in the case of a random distribution generated from a Poisson process. The multipoint version (k-Morisita) takes into account k points with k >= 2. The present research deals with a new development of the k-Morisita index for (1) monitoring network characterization and for (2) detection of patterns in monitored phenomena. From a theoretical perspective, a connection between the k-Morisita index and multifractality has also been found and highlighted on a mathematical multifractal set.
Resumo:
Specific properties emerge from the structure of large networks, such as that of worldwide air traffic, including a highly hierarchical node structure and multi-level small world sub-groups that strongly influence future dynamics. We have developed clustering methods to understand the form of these structures, to identify structural properties, and to evaluate the effects of these properties. Graph clustering methods are often constructed from different components: a metric, a clustering index, and a modularity measure to assess the quality of a clustering method. To understand the impact of each of these components on the clustering method, we explore and compare different combinations. These different combinations are used to compare multilevel clustering methods to delineate the effects of geographical distance, hubs, network densities, and bridges on worldwide air passenger traffic. The ultimate goal of this methodological research is to demonstrate evidence of combined effects in the development of an air traffic network. In fact, the network can be divided into different levels of âeurooecohesionâeuro, which can be qualified and measured by comparative studies (Newman, 2002; Guimera et al., 2005; Sales-Pardo et al., 2007).
Resumo:
Homologies of minicircle kDNA of 27 Mexican stocks were studied by cross-hybridization with four kDNA probes derived from three reference stocks belonging to groups Trypanosoma cruzi I (SO34 cl4 and Silvio) and T. cruzi II (MN) and one Mexican stock. High homologies were only observed with Silvio (six stocks) and Mexican probes (11 stocks). After 30 min exposure (low homology) additional stocks were recognized with SO34 cl4 (three stocks) and Silvio (six stocks) probes; with the Mexican probe only five stocks remained non-reactive. All the stocks were typed by isoenzyme (16 loci) and Mexican stocks belonged to T. cruzi I. Hybridization patterns were not strictly correlated with the observed clustering and cross-hybridization of kDNA minicircles is not available to distinct Mexican stocks.
Resumo:
The aim of this study was to describe spatial patterns of the distribution of leprosy and to investigate spatial clustering of incidence rates in the state of Ceará, Northeast Brazil. The average incidence rate of leprosy for the period of 1991 to 1999 was calculated for each municipality of Ceará. Maps were used to describe the spatial distribution of the disease, and spatial statistics were applied to explore large- and small-scale variations of incidence rates. Three regions were identified in which the incidence of leprosy was particularly high. A spatial gradient in the incidence rates was identified, with a tendency of high rates to be concentrated on the North-South axis in the middle region of the state. Moran's I statistic indicated that a significant spatial autocorrelation also existed. The spatial distribution of leprosy in Ceará is heterogeneous. The reasons for spatial clustering of disease rates are not known, but might be related to an heterogeneous distribution of other factors such as crowding, social inequality, and environmental characteristics which by themselves determine the transmission of Mycobacterium leprae.
Resumo:
Nuclear internal transcribed spacer 2 (ITS2) rDNA sequences were used for a molecular phylogenetics analysis of five Onchocerca species. The sister species of the human parasite O. volvulus was found to be the cattle parasite O. ochengi and not O. gibsoni, contrary to chromosomal evidence. The genetic differentiation of two African populations (representing the two African strains) and a Brazilian population of O. volvulus was also studied. Phylogenetic and network reconstruction did not show any clustering of ITS2 alleles on geographic or strain grounds. Furthermore, population genetics tests showed no indication of population differentiation but suggested gene flow among the three populations.
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the RodriguesTriple Junction in the Indian Ocean were studied applying classical statistical methods(fuzzy c-means clustering, linear mixing model, principal component analysis) for theextraction of endmembers and evaluating the spatial and temporal variation ofgeochemical signals. Three main factors of sedimentation were expected by the marinegeologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. Thedisplay of fuzzy membership values and/or factor scores versus depth providedconsistent results for two factors only; the ultra-basic component could not beidentified. The reason for this may be that only traditional statistical methods wereapplied, i.e. the untransformed components were used and the cosine-theta coefficient assimilarity measure.During the last decade considerable progress in compositional data analysis was madeand many case studies were published using new tools for exploratory analysis of thesedata. Therefore it makes sense to check if the application of suitable data transformations,reduction of the D-part simplex to two or three factors and visualinterpretation of the factor scores would lead to a revision of earlier results and toanswers to open questions . In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005) starting with a problem-oriented interpretation of the biplotscattergram, extracting compositional factors, ilr-transformation of the components andvisualization of the factor scores in a spatial context: The compositional factors will beplotted versus depth (time) of the core samples in order to facilitate the identification ofthe expected sources of the sedimentary process.Kew words: compositional data analysis, biplot, deep sea sediments
Resumo:
Patients with Temporal Lobe Epilepsy (TLE) suffer from widespread subtle white matter abnormalities and abnormal functional connectivity extending beyond the affected lobe, as revealed by Diffusion Tensor MR Imaging, volumetric and functional MRI studies. Diffusion Spectrum Imaging (DSI) is a diffusion imaging technique with high angular resolution for improving the mapping of white matter pathways. In this study, we used DSI, connectivity matrices and topological measures to investigate how the alteration in structural connectivity influences whole brain structural networks. Eleven patients with right-sided TLE and hippocampal sclerosis and 18 controls underwent our DSI protocol at 3T. The cortical and subcortical grey matters were parcellated into 86 regions of interest and the connectivity between every region pair was estimated using global tractography and a connectivity matrix (the adjacency matrix of the structural network). We then compared the networks of patients and controls using topological measures. In patients, we found a higher characteristic path length and a lower clustering coefficient compared to controls. Local measures at node level of the clustering and efficiency showed a significant difference after a multiple comparison correction (Bonferroni). These significant nodes were located within as well outside the temporal lobe, and the localisation of most of them was consistent with regions known to be part of epileptic networks in TLE. Our results show altered connectivity patterns that are concordant with the mapping of functional epileptic networks in patients with TLE. Further studies are needed to establish the relevance of these findings for the propagation of epileptic activity, cognitive deficits in medial TLE and outcome of epilepsy surgery in individual patients.
Resumo:
In image segmentation, clustering algorithms are very popular because they are intuitive and, some of them, easy to implement. For instance, the k-means is one of the most used in the literature, and many authors successfully compare their new proposal with the results achieved by the k-means. However, it is well known that clustering image segmentation has many problems. For instance, the number of regions of the image has to be known a priori, as well as different initial seed placement (initial clusters) could produce different segmentation results. Most of these algorithms could be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitative evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach
Resumo:
A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society .
Resumo:
OBJECTIVE: This study assessed clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruits/vegetables intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. 18,005 subjects (8052 men and 9953 women) aged 25 years old or more participated. RESULTS: Smokers more frequently had low leisure time physical activity, low fruits/vegetables intake, and high alcohol consumption than non- and ex-smokers. Frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (> or =2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (> or =20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women for corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and intervention with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
Resumo:
Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices
Resumo:
Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matter. Precisely, taking advantage of the distance property of the domain we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. Particularly, our motivation problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then we perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem applying simulated annealing algorithm to the clustered and non-clustered data in order to work out how profitable is the clustering and which of the presented methods is the most suitable
Resumo:
BACKGROUND: It is unknown why patients with extensive ulcerative colitis (UC) have a higher risk of colorectal cancer compared with patients with left-sided UC. This study characterizes the inflammatory processes in left-sided UC, pancolitis, and UC-associated dysplasia at the transcriptional level to identify potential biomarkers and transcripts of importance for the carcinogenic behavior of chronic inflammation. METHODS: The Affymetrix GeneChip Human Genome U133 Plus 2.0 was applied on colonic biopsies from UC patients with left-sided UC, pancolitis, dysplasia, and controls. Reverse transcription polymerase chain reaction and immunohistochemistry were performed for validating selected transcripts in the initial cohort and in 2 independent cohorts of patients with UC. Microarray data were analyzed by principal component analysis, and reverse transcription polymerase chain reaction and immunohistochemistry data by the Wilcoxon's rank-sum test. RESULTS: The principal component analysis results revealed separate clusters for left-sided UC, pancolitis, dysplasia, and controls. Close clustering of dysplastic and pancolitic samples indicated similarities in gene expression. Indeed, 101 and 656 parallel upregulated and downregulated transcripts, respectively, were identified in specimens from dysplasia and pancolitis. Validation of selected transcripts hereof identified insulin receptor alpha (INSRA) and MAP kinase interacting serine/threonine kinase 2 (MKNK2) with an enhanced expression in dysplasia compared with left-sided UC and controls, whereas laminin γ2 (LAMC2) was found with a lower expression in dysplasia compared with the remaining 3 groups. CONCLUSIONS: This study demonstrates pancolitis and left-sided UC as distinct inflammatory processes at the transcriptional level, and identifies INSRA, MKNK2, and LAMC2 as potential critical transcripts in the inflammation-driven preneoplastic process of UC.
Resumo:
In this study we investigated the effect of medial temporal lobe epilepsy (MTLE) on the global characteristics of brain connectivity estimated by topological measures. We used DSI (Diffusion Spectrum Imaging) to construct a connectivity matrix where the nodes represents the anatomical ROIs and the edges are the connections between any pair of ROIs weighted by the mean GFA/FA values. A significant difference was found between the patient group vs control group in characteristic path length, clustering coefficient and small-worldness. This suggests that the MTLE network is less efficient compared to the network of the control group.
Resumo:
BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.