6 results for Clustering methods
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (Digital Library of Intellectual Production of the University of São Paulo)
Abstract:
Risers are flexible multilayered pipes formed by an inner flexible metal structure surrounded by polymer layers and spirally wound steel ligaments, also known as armor wires. Because these risers link subsea pipelines to floating oil and gas production installations, and their failure could have catastrophic consequences, several methods have been proposed to monitor armor integrity. However, until now there has been no practical method that allows automatic, non-destructive detection of the rupture of individual armor wires. In this work we present a method based on magnetic Barkhausen noise that has shown high efficiency in detecting armor wire rupture. The results are examined under cyclic and static load conditions of the riser. This work also analyzes the theory behind the singular dependence of the magnetic Barkhausen noise on the applied tension in riser armor wires.
Abstract:
The aim of this work was to compare three methods for determining the number of groups in studies that apply hierarchical clustering methods, using data obtained from the characterization of Capsicum accessions, in order to identify the method with the greatest discriminating power. Mojena's method, Tocher's method, and the RMSSTD method were applied to determine the optimal number of groups formed in the final stage of the clustering procedure with the UPGMA method. Forty-nine accessions of the species Capsicum chinense from the Vegetable Germplasm Bank (Banco de Germoplasma de Hortaliças) of the Universidade Federal de Viçosa were analyzed with respect to ten morphological traits, with the aim of identifying and grouping the most similar accessions and thereby making it possible to select superior genotypes, that is, those with the commercial traits of interest. The results showed that the RMSSTD method indicated the existence of seven groups, demonstrating greater discriminating power for this method than for Tocher's optimization method and Mojena's method, which formed four and three groups, respectively.
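To make two of the stopping criteria more concrete, here is a minimal Python sketch of UPGMA (average-linkage) agglomeration together with Mojena's rule and the RMSSTD statistic. It is not the authors' implementation: the toy data, the function names `upgma`, `mojena_groups`, and `rmsstd`, and the default k = 1.25 for Mojena's rule (a commonly cited value) are assumptions for illustration, and Tocher's optimization method is not shown. Mojena's rule cuts the dendrogram before the first fusion level that exceeds the mean plus k standard deviations of all fusion levels; RMSSTD is the pooled within-cluster standard deviation of the cluster formed at a merge, so a sharp jump suggests that dissimilar groups have just been joined.

```python
import math
import statistics
from itertools import combinations


def upgma(points):
    """Average-linkage (UPGMA) agglomeration.

    Returns one (height, members) tuple per merge, in merge order, where
    `members` lists the indices of the cluster formed by that merge.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # UPGMA distance: mean of all pairwise distances between the two clusters
            d = statistics.mean(
                math.dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        merges.append((d, sorted(clusters[a])))
    return merges


def mojena_groups(heights, k=1.25):
    """Mojena's rule: stop before the first fusion height above mean + k * sd."""
    n = len(heights) + 1  # number of objects being clustered
    threshold = statistics.mean(heights) + k * statistics.stdev(heights)
    for i, h in enumerate(heights):
        if h > threshold:
            return n - i  # i merges done, so n - i groups remain
    return 1


def rmsstd(points, members):
    """Root-mean-square standard deviation of one cluster, pooled over variables."""
    p = len(points[0])
    n = len(members)
    ss = 0.0
    for v in range(p):
        col = [points[i][v] for i in members]
        mu = sum(col) / n
        ss += sum((x - mu) ** 2 for x in col)
    return math.sqrt(ss / (p * (n - 1)))


# Hypothetical one-variable data with three obvious pairs
pts = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,), (31.0,)]
merges = upgma(pts)
heights = [h for h, _ in merges]
print(heights)                 # [1.0, 1.0, 1.0, 10.0, 25.0]
print(mojena_groups(heights))  # 2
print([round(rmsstd(pts, m), 2) for _, m in merges])
```

The choice of k controls how conservative Mojena's rule is: on the same toy data, k = 0 yields three groups instead of two, which is one reason different stopping rules can disagree on the number of groups.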
Abstract:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter complements that framework by showing that a user-defined parameter can be eliminated in such a way that the clustering stage is implemented more accurately and with reduced computational complexity.
Abstract:
There are several variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. These methods have been studied under different names, such as collaborative and parallel fuzzy clustering. In this study, we augment two FCM-based algorithms for clustering distributed data by developing constructive ways of determining their essential parameters (including the number of clusters) and by forming a set of systematically structured guidelines, such as how to select a specific algorithm depending on the nature of the data environment and the assumptions made about the number of clusters. A thorough complexity analysis covering space, time, and communication is reported. A series of detailed numerical experiments illustrates the main ideas discussed in the study.
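As a rough illustration of how FCM can run over data split across sites, the Python sketch below shows one common parallel scheme (an assumption, not necessarily either of the algorithms studied here): each site computes memberships for its own points against the current global centers and sends back only the per-cluster sums of u^m·x and u^m, from which a coordinator recomputes the centers exactly as centralized FCM would. The function names, the fuzzifier default m = 2, the tolerance, and the toy data are hypothetical choices for this sketch.

```python
import math


def memberships(x, centers, m):
    """Standard FCM membership of point x in each cluster (fuzzifier m > 1)."""
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:
        # x coincides with a center: give it crisp membership in that cluster
        j = d.index(0.0)
        return [1.0 if i == j else 0.0 for i in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(d))) for i in range(len(d))]


def local_stats(points, centers, m):
    """Per-site sufficient statistics: sum of u^m * x and sum of u^m per cluster."""
    c, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    for x in points:
        u = memberships(x, centers, m)
        for i in range(c):
            w = u[i] ** m
            den[i] += w
            for t in range(dim):
                num[i][t] += w * x[t]
    return num, den


def parallel_fcm(sites, centers, m=2.0, max_iter=100, tol=1e-9):
    """FCM over data split across sites; only aggregate statistics leave a site."""
    centers = [list(v) for v in centers]
    c, dim = len(centers), len(centers[0])
    for _ in range(max_iter):
        num = [[0.0] * dim for _ in range(c)]
        den = [0.0] * c
        for site in sites:  # in a real deployment this loop body runs at each site
            s_num, s_den = local_stats(site, centers, m)
            for i in range(c):
                den[i] += s_den[i]
                for t in range(dim):
                    num[i][t] += s_num[i][t]
        new = [[num[i][t] / den[i] for t in range(dim)] for i in range(c)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers


# Hypothetical example: two sites, each holding part of the data
sites = [[(0.0,), (1.0,)], [(10.0,), (11.0,)]]
print(parallel_fcm(sites, [(0.0,), (11.0,)]))  # centers converge near 0.5 and 10.5
```

Because the center update is a ratio of sums, splitting the sums by site does not change the result, only where the arithmetic happens; the number of clusters c and the fuzzifier m still have to be chosen, which is the kind of parameter determination the study addresses.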
Abstract:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have been used successfully to identify tumors through clustering of expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data because it handles highly skewed tumor marker expression well and weighs the contribution of each marker according to its relatedness to other tumor markers. In the present study, we identified biologically and clinically meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GCs. The data were analyzed using a random forest clustering method. Results: Proteins related to the cell cycle, growth factors, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expression patterns associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variable, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC but also better prediction of survival than the classic pathological grouping.
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data, exhibiting properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in regular Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis of data in the simplex space. We present Simcluster both as a stand-alone command-line C package and as a user-friendly online tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
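A small Python sketch can show why simplex-aware geometry matters for count data. It uses the centered log-ratio (clr) transform and the Aitchison distance from standard compositional data analysis; this is not Simcluster's code or interface, and whether Simcluster uses clr specifically is an assumption here. The function names, the optional pseudocount for zero counts, and the toy libraries are likewise hypothetical.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of one composition (e.g. tag counts)."""
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("clr needs strictly positive parts; set a pseudocount for zeros")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [lg - mean_log for lg in logs]


def aitchison_distance(x, y, pseudocount=0.0):
    """Euclidean distance between clr-transformed compositions."""
    return math.dist(clr(x, pseudocount), clr(y, pseudocount))


# Two hypothetical libraries: same relative expression, different sequencing depth
shallow = [10, 20, 40]
deep = [100, 200, 400]
print(math.dist(shallow, deep))           # large: raw Euclidean distance sees a big difference
print(aitchison_distance(shallow, deep))  # ~0: identical compositions on the simplex
```

The clr vectors always sum to zero and are unchanged by rescaling the counts, so clustering them with an ordinary Euclidean method respects the constraint that only relative abundances carry information. Adding a pseudocount to handle zeros breaks exact scale invariance, so its value is a modeling choice rather than a neutral fix.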