979 results for clustering techniques
Abstract:
We develop a full theoretical approach to clustering in complex networks. A key concept is introduced, the edge multiplicity, that measures the number of triangles passing through an edge. This quantity extends the clustering coefficient in that it involves the properties of two vertices, and not just one. The formalism is completed with the definition of a three-vertex correlation function, which is the fundamental quantity describing the properties of clustered networks. The formalism suggests different metrics that are able to thoroughly characterize transitive relations. A rigorous analysis of several real networks, which makes use of this formalism and the metrics, is also provided. It is also found that clustered networks can be classified into two main groups: the weak and the strong transitivity classes. In the first class, edge multiplicity is small, with triangles being disjoint. In the second class, edge multiplicity is high and so triangles share many edges. As we shall see in the following paper, the class a network belongs to has strong implications in its percolation properties.
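As an illustration of the edge multiplicity defined in this abstract, the following is a minimal sketch, not the authors' code. It assumes an undirected simple graph given as a list of edges with comparable node labels; the function names `edge_multiplicity` and `clustering_from_multiplicity` and the toy graph are hypothetical. It also uses the standard fact that each triangle at a vertex contains two of that vertex's edges, so the vertex clustering coefficient can be recovered from the multiplicities of its incident edges.

```python
def edge_multiplicity(edges):
    """Map each undirected edge (u, v), u < v, to the number of triangles through it."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; a simple graph is assumed
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    multiplicity = {}
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if u < v:
                # every common neighbor of u and v closes one triangle on edge (u, v)
                multiplicity[(u, v)] = len(neighbors[u] & neighbors[v])
    return multiplicity


def clustering_from_multiplicity(multiplicity, node):
    """Local clustering coefficient of `node`, computed from edge multiplicities."""
    incident = [m for (u, v), m in multiplicity.items() if node in (u, v)]
    degree = len(incident)
    if degree < 2:
        return 0.0
    triangles = sum(incident) / 2  # each triangle is counted on two incident edges
    return 2 * triangles / (degree * (degree - 1))


# Hypothetical toy graph: two triangles sharing the edge (1, 2).
toy_edges = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
toy_multiplicity = edge_multiplicity(toy_edges)
print(toy_multiplicity)
print(clustering_from_multiplicity(toy_multiplicity, 1))
```

In the toy graph the shared edge (1, 2) has multiplicity 2 and every other edge has multiplicity 1. In the abstract's terms, a network whose triangles are disjoint would have no edge with multiplicity above 1.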
Abstract:
The percolation properties of clustered networks are analyzed in detail. In the case of weak clustering, we present an analytical approach that allows us to find the critical threshold and the size of the giant component. Numerical simulations confirm the accuracy of our results. In more general terms, we show that weak clustering hinders the onset of the giant component whereas strong clustering favors its appearance. This is a direct consequence of the differences in the k-core structure of the networks, which are found to be totally different depending on the level of clustering. An empirical analysis of a real social network confirms our predictions.
Abstract:
We present a generator of random networks where both the degree-dependent clustering coefficient and the degree distribution are tunable. Following the same philosophy as in the configuration model, the degree distribution and the clustering coefficient for each class of nodes of degree k are fixed ad hoc and a priori. The algorithm generates corresponding topologies by applying first a closure of triangles and second the classical closure of remaining free stubs. The procedure unveils a universal relation between clustering and degree-degree correlations for all networks, where the level of assortativity establishes an upper limit to the level of clustering. Maximum assortativity ensures no restriction on the decay of the clustering coefficient whereas disassortativity sets a stronger constraint on its behavior. Correlation measures in real networks are seen to observe this structural bound.
Abstract:
Principles: Surgeon's experience is crucial for proper application of sentinel node biopsy (SNB) in patients with breast cancer. A learning curve of 20-30 cases of sentinel node (SN) biopsy with axillary lymph node dissection (ALND) has been widely practiced. In order to speed up this learning curve, surgeons may be trained intraoperatively by an experienced surgeon. The purpose of this report is to evaluate the results of this procedure. Methods: Patients with one primary invasive breast cancer (cT1-T2[<3 cm]cN0) underwent SNB based on lymphoscintigraphy using technetium Tc 99m colloid, intraoperative gamma probe detection, with or without blue dye mapping. This was followed by completion ALND when the SN was positive or not found. SNB was performed by one experienced surgeon (teacher) or by 10 junior surgeons trained by the experienced surgeon (trainees). Four groups were defined: (i) SNB with immediate ALND for the teacher's learning curve, (ii) SNB by the teacher, (iii) SNB by the trainees under the teacher's supervision, and (iv) SNB by the trainees alone. Results: Between May 1999 and December 2007, a total of 808 evaluable patients underwent SNB. The SN identification rate was 98% in the teacher's group, and 99% in the trainees' group (p = 0.196). SNs were positive in 28% and 29% of patients, respectively (p = 0.196). The distribution of isolated tumor cells, micrometastases and metastases was not statistically different between the teacher's and the trainees' groups (p = 0.163). Conclusion: These comparable results confirm the success with which the SNB was taught. This strategy avoided the 20-30 SNB followed by immediate ALND previously required per surgeon.
Abstract:
PURPOSE: To objectively characterize different heart tissues from functional and viability images provided by composite-strain-encoding (C-SENC) MRI. MATERIALS AND METHODS: C-SENC is a new MRI technique for simultaneously acquiring cardiac functional and viability images. In this work, an unsupervised multi-stage fuzzy clustering method is proposed to identify different heart tissues in the C-SENC images. The method is based on sequential application of the fuzzy c-means (FCM) and iterative self-organizing data (ISODATA) clustering algorithms. The proposed method is tested on simulated heart images and on images from nine patients with and without myocardial infarction (MI). The resulting clustered images are compared with MRI delayed-enhancement (DE) viability images for determining MI. Also, Bland-Altman analysis is conducted between the two methods. RESULTS: Normal myocardium, infarcted myocardium, and blood are correctly identified using the proposed method. The clustered images correctly identified 90 +/- 4% of the pixels defined as infarct in the DE images. In addition, 89 +/- 5% of the pixels defined as infarct in the clustered images were also defined as infarct in DE images. The Bland-Altman results show no bias between the two methods in identifying MI. CONCLUSION: The proposed technique allows for objectively identifying divergent heart tissues, which would be potentially important for clinical decision-making in patients with MI.
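To make the fuzzy c-means (FCM) stage named in this abstract concrete, here is a minimal sketch of the standard FCM updates on one-dimensional intensities. It is not the authors' multi-stage method: the ISODATA stage is omitted, and the function names, the fuzzifier `m = 2`, the fixed iteration count, and the synthetic intensity values are all assumptions made for illustration.

```python
def fcm_memberships(values, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        # eps avoids division by zero when a value coincides with a center
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def fuzzy_c_means_1d(values, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(values, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # each center is the mean of the values weighted by u_ik^m
            weights = [row[i] ** m for row in memberships]
            total = sum(w * x for w, x in zip(weights, values))
            new_centers.append(total / sum(weights))
        centers = new_centers
    # memberships consistent with the final centers
    return centers, fcm_memberships(values, centers, m)


# Synthetic, hypothetical pixel intensities forming a dark and a bright group.
intensities = [0.10, 0.12, 0.09, 0.90, 0.88, 0.93]
centers, memberships = fuzzy_c_means_1d(intensities, initial_centers=[0.0, 1.0])
print(centers)
print([max(range(len(row)), key=row.__getitem__) for row in memberships])
```

Each pixel receives a graded membership in every cluster rather than a hard label, which is what distinguishes FCM from k-means. A hard segmentation, as in the last line, can be obtained by taking the cluster with the largest membership.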
Abstract:
Background: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. Results: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. Conclusion: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Abstract:
MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought-after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
Abstract:
Abstract: This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
Summary: This research concerns the development and application of so-called unsupervised learning methods. The target applications of these methods are the analysis of forensic data and the classification of hyperspectral images in remote sensing. First, an unsupervised classification methodology based on the symbolic optimization of an inter-sample distance measure is proposed. This measure is obtained by optimizing a cost function related to the preservation of a point's neighborhood structure between the space of the initial variables and the space of principal components. This method is applied to the analysis of forensic data and compared with a range of existing methods. Second, a method based on a joint optimization of the variable selection and classification tasks is implemented in a neural network and applied to various databases, including two hyperspectral images. The neural network is trained using a stochastic gradient algorithm, which makes this technique applicable to very high resolution images. The results of applying the latter show that such a technique can classify very large databases without difficulty and gives results that compare favorably with existing methods.
Abstract:
Free-living energy expenditure (EE) was assessed in 37 young pregnant Gambian women at the 12th (n = 11, 53.5 ± 1.7 kg), 24th (n = 14, 54.7 ± 2.1 kg), and 36th (n = 12, 65.0 ± 2.6 kg) wk of pregnancy and was compared with nonpregnant nonlactating (NPNL) control women (n = 12, 50.3 ± 1.6 kg). The following two methods were used to assess EE: 1) the heart rate (HR) method using individual regression lines (HR vs EE) established at different activity levels in a respiration chamber and 2) the doubly labeled water (²H₂¹⁸O) method in a subgroup of 25 pregnant and 7 control women. With the HR method the EE during the agricultural rainy season was found to be 2,408 ± 87, 2,293 ± 122, and 2,782 ± 130 kcal/day at 12, 24, and 36 wk of gestation and was not significantly different from that of the control group (2,502 ± 133 kcal/day). These findings were confirmed by the ²H₂¹⁸O measurements, which failed to show any effect of pregnancy on EE. Expressed per unit body weight, the free-living EE was found to be lower (P < 0.01 with the ²H₂¹⁸O method) at 36 wk of gestation than in the NPNL group. It is concluded that, in these Gambian women, energy-sparing mechanisms that help meet the additional energy stress of gestation are operating during pregnancy (e.g., diminished spontaneous physical activity).
Abstract:
Weathering steel is commonly used as a cost-effective alternative for bridge superstructures, as the costs and environmental impacts associated with the maintenance/replacement of paint coatings are theoretically eliminated. The performance of weathering steel depends on the proper formation of a surface patina, which consists of a dense layer of corrosion product used to protect the steel from further atmospheric corrosion. The development of the weathering steel patina may be hindered by environmental factors such as humid environments, wetting/drying cycles, sheltering, exposure to de-icing chlorides, and design details that permit water to pond on steel surfaces. Weathering steel bridges constructed over or adjacent to other roadways could be subjected to sufficient salt spray that would impede the development of an adequate patina. Addressing areas of corrosion on a weathering steel bridge superstructure where a protective patina has not formed is often costly and negates the anticipated cost savings for this type of steel superstructure. Early detection of weathering steel corrosion is important to extending the service life of the bridge structure; however, written inspection procedures are not available for inspectors to evaluate the performance or quality of the patina. This project focused on the evaluation of weathering steel bridge structures, including possible methods to assess the quality of the weathering steel patina and to properly maintain the quality of the patina. The objectives of this project are summarized as follows:
- Identify weathering steel bridge structures that would be most vulnerable to chloride contamination, based on location, exposure, environment, and other factors.
- Identify locations on an individual weathering steel bridge structure that would be most susceptible to chloride contamination, such as below joints, splash/spray zones, and areas of ponding water or debris.
- Identify possible testing methods and/or inspection techniques for inspectors to evaluate the quality of the weathering steel patina at locations discussed above.
- Identify possible methods to measure and evaluate the level of chloride contamination at the locations discussed above.
- Evaluate the effectiveness of water washing on removing chlorides from the weathering steel patina.
- Develop a general prioritization for the washing of bridge structures based on the structure’s location, environment, inspection observations, patina evaluation findings, and chloride test results.