814 resultados para clustering
Resumo:
In this paper, we propose a Layered Clustering Hierarchy (LCH) communication protocol for Wireless Sensor Networks (WSNs). The design of LCH has two goals: scalability and energy-efficiency. In LCH, the sensor nodes are organized as a layered clustering structure. Each layer runs a distributed clustering protocol. By randomizing the rotation of cluster heads in each layer, the energy load is distributed evenly across sensors in the network. Our simulations show that LCH is effective in densely deployed sensor networks. On average, 70% of live sensor nodes are involved directly in the clustering communication hierarchy. Moreover, the simulations also show that the energy load and dead nodes are distributed evenly over the network. As studies prove that the performance of LCH depends mainly on the distributed clustering protocol, the location of cluster heads and cluster size are two critical factors in the design of LCH.
Resumo:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to get rid of a user-defined parameter in a way that the clustering stage can be implemented more accurately while having reduced computational complexity
Resumo:
There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
Resumo:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. Interestingly, using our genetic data it was possible to calculate the number of retrotransposonscIvana_1 (similar to 60) copies in the sugarcane genome, confirming previously reported molecular results. In addition, this research possibly will have indirect implications in crop economics e. g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.
Resumo:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Resumo:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Resumo:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we e identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [ MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Resumo:
This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
Resumo:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
Resumo:
The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of datastructure is very common in consumer studies where a panel of consumers is asked to assess the global liking of a certain number of products and then, preference scores are arranged in a two-way table Y. External information on both products (physicalchemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z respectively. The aim of this method is to automatically provide a consumer segmentation where all the three matrices play an active role in the classification, getting homogeneous groups from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the product chemical compounds, sensory evaluation and consumer socio-demographic information, purchase behaviour and consumption habits.
Resumo:
The intensity of regional specialization in specific activities, and conversely, the level of industrial concentration in specific locations, has been used as a complementary evidence for the existence and significance of externalities. Additionally, economists have mainly focused the debate on disentangling the sources of specialization and concentration processes according to three vectors: natural advantages, internal, and external scale economies. The arbitrariness of partitions plays a key role in capturing these effects, while the selection of the partition would have to reflect the actual characteristics of the economy. Thus, the identification of spatial boundaries to measure specialization becomes critical, since most likely the model will be adapted to different scales of distance, and be influenced by different types of externalities or economies of agglomeration, which are based on the mechanisms of interaction with particular requirements of spatial proximity. This work is based on the analysis of the spatial aspect of economic specialization supported by the manufacturing industry case. The main objective is to propose, for discrete and continuous space: i) a measure of global specialization; ii) a local disaggregation of the global measure; and iii) a spatial clustering method for the identification of specialized agglomerations.