20 resultados para Fuzzy C-Means clustering

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective: To characterize the PI component of long latency auditory evoked potentials (LLAEPs) in cochlear implant users with auditory neuropathy spectrum disorder (ANSD) and determine firstly whether they correlate with speech perception performance and secondly whether they correlate with other variables related to cochlear implant use. Methods: This study was conducted at the Center for Audiological Research at the University of Sao Paulo. The sample included 14 pediatric (4-11 years of age) cochlear implant users with ANSD, of both sexes, with profound prelingual hearing loss. Patients with hypoplasia or agenesis of the auditory nerve were excluded from the study. LLAEPs produced in response to speech stimuli were recorded using a Smart EP USB Jr. system. The subjects' speech perception was evaluated using tests 5 and 6 of the Glendonald Auditory Screening Procedure (GASP). Results: The P-1 component was detected in 12/14 (85.7%) children with ANSD. Latency of the P-1 component correlated with duration of sensorial hearing deprivation (*p = 0.007, r = 0.7278), but not with duration of cochlear implant use. An analysis of groups assigned according to GASP performance (k-means clustering) revealed that aspects of prior central auditory system development reflected in the P-1 component are related to behavioral auditory skills. Conclusions: In children with ANSD using cochlear implants, the P-1 component can serve as a marker of central auditory cortical development and a predictor of the implanted child's speech perception performance. (c) 2012 Elsevier Ireland Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The clustering problem consists in finding patterns in a data set in order to divide it into clusters with high within-cluster similarity. This paper presents the study of a problem, here called MMD problem, which aims at finding a clustering with a predefined number of clusters that minimizes the largest within-cluster distance (diameter) among all clusters. There are two main objectives in this paper: to propose heuristics for the MMD and to evaluate the suitability of the best proposed heuristic results according to the real classification of some data sets. Regarding the first objective, the results obtained in the experiments indicate a good performance of the best proposed heuristic that outperformed the Complete Linkage algorithm (the most used method from the literature for this problem). Nevertheless, regarding the suitability of the results according to the real classification of the data sets, the proposed heuristic achieved better quality results than C-Means algorithm, but worse than Complete Linkage.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Backgrounds Ea aims: The boundaries between the categories of body composition provided by vectorial analysis of bioimpedance are not well defined. In this paper, fuzzy sets theory was used for modeling such uncertainty. Methods: An Italian database with 179 cases 18-70 years was divided randomly into developing (n = 20) and testing samples (n = 159). From the 159 registries of the testing sample, 99 contributed with unequivocal diagnosis. Resistance/height and reactance/height were the input variables in the model. Output variables were the seven categories of body composition of vectorial analysis. For each case the linguistic model estimated the membership degree of each impedance category. To compare such results to the previously established diagnoses Kappa statistics was used. This demanded singling out one among the output set of seven categories of membership degrees. This procedure (defuzzification rule) established that the category with the highest membership degree should be the most likely category for the case. Results: The fuzzy model showed a good fit to the development sample. Excellent agreement was achieved between the defuzzified impedance diagnoses and the clinical diagnoses in the testing sample (Kappa = 0.85, p < 0.001). Conclusions: fuzzy linguistic model was found in good agreement with clinical diagnoses. If the whole model output is considered, information on to which extent each BIVA category is present does better advise clinical practice with an enlarged nosological framework and diverse therapeutic strategies. (C) 2012 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Microplastics are omnipresent in the oceans and generally have negative impacts on the biota. However, flotsam may increase the availability of hard substrates, which are considered a limiting resource for some oceanic species, e.g. as oviposition sites for the ocean insect Halobates. This study describes the use of plastic pellets as an oviposition site for Halobates micans and discusses possible effects on its abundance and dispersion. Inspection of egg masses on stranded particles on beaches revealed that a mean of 24% (from 0% to 62%) of the pellets bore eggs (mean of 5 and max. of 48 eggs per pellet). Most eggs (63%) contained embryos, while 37% were empty egg shells. This shows that even small plastic particles are used as oviposition site by H. micans, and that marine litter may have a positive effect over the abundance and dispersion of this species. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. Interestingly, using our genetic data it was possible to calculate the number of retrotransposonscIvana_1 (similar to 60) copies in the sugarcane genome, confirming previously reported molecular results. In addition, this research possibly will have indirect implications in crop economics e. g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Increased uric acid (UA) is strongly linked to cardiovascular disease. However, the independent role of UA is still debated because it is associated with several cardiovascular risk factors including obesity and metabolic syndrome. This study assessed the association of UA with increased high-sensitivity C-reactive protein (hs-CRP), increased ratio of triglyceride to high-density lipoprotein cholesterol (TG/HDL), sonographically detected hepatic steatosis, and their clustering in the presence and absence of obesity and metabolic syndrome. We evaluated 3,518 employed subjects without clinical cardiovascular disease from November 2008 through July 2010. Prevalence of tis-CRP >= 3 mg/L was 19%, that of TG/HDL >= 3 was 44%, and that of hepatic steatosis was 43%. In multivariable logistic regression after adjusting for traditional cardiovascular risk factors and confounders, highest versus lowest UA quartile was associated with hs-CRP >= 3 mg/L (odds ratio [OR] 1.52, 95% confidence interval [CI] 1.01 to 2.28, p = 0.04), TG/HDL >= 3 (OR 3.29, 95% CI 2.36 to 4.60, p <0.001), and hepatic steatosis (OR 3.10, 95% CI 2.22 to 4.32, p <0.001) independently of obesity and metabolic syndrome. Association of UA with hs-CRP >= 3 mg/L became nonsignificant in analyses stratified by obesity. Ascending UA quartiles compared to the lowest UA quartile demonstrated a graded increase in the odds of having 2 or 3 of these risk conditions and a successive decrease in the odds of having none. In conclusion, high UA levels were associated with increased TG/HDL and hepatic steatosis independently of metabolic syndrome and obesity and with increased hs-CRP independently of metabolic syndrome. (C) 2012 Elsevier Inc. All rights reserved. (Am J Cardiol 2012;110:1787-1792)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Some atomic multipoles (charges, dipoles and quadrupoles) from the Quantum Theory of Atoms in Molecules (QTAIM) and CHELPG charges are used to investigate interactions between a proton and a molecule (F2, Cl2, BF, AlF, BeO, MgO, LiH, H2CO, NH3, PH3, BF3, and CO2). Calculations were done at the B3LYP/6-311G(3d,3p) level. The main aspect of this work is the investigation of polarization effects over electrostatic potentials and atomic multipoles along a medium to long range of interaction distances. Large electronic charge fluxes and polarization changes are induced by a proton mainly when this positive particle approaches the least electronegative atom of diatomic heteronuclear molecules. The search for simple equations to describe polarization on electrostatic potentials from QTAIM quantities resulted in linear relations with r-4 (r is the interaction distance) for many cases. Moreover, the contribution from atomic dipoles to these potentials is usually the most affected contribution by polarization what reinforces the need for these dipoles to a minimal description of purely electrostatic interactions. Finally, CHELPG charges provide a description of polarization effects on electrostatic potentials that is in disagreement with physical arguments for certain of these molecules. (c) 2012 Wiley Periodicals, Inc.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A deep theoretical analysis of the graph cut image segmentation framework presented in this paper simultaneously translates into important contributions in several directions. The most important practical contribution of this work is a full theoretical description, and implementation, of a novel powerful segmentation algorithm, GC(max). The output of GC(max) coincides with a version of a segmentation algorithm known as Iterative Relative Fuzzy Connectedness, IRFC. However, GC(max) is considerably faster than the classic IRFC algorithm, which we prove theoretically and show experimentally. Specifically, we prove that, in the worst case scenario, the GC(max) algorithm runs in linear time with respect to the variable M=|C|+|Z|, where |C| is the image scene size and |Z| is the size of the allowable range, Z, of the associated weight/affinity function. For most implementations, Z is identical to the set of allowable image intensity values, and its size can be treated as small with respect to |C|, meaning that O(M)=O(|C|). In such a situation, GC(max) runs in linear time with respect to the image size |C|. We show that the output of GC(max) constitutes a solution of a graph cut energy minimization problem, in which the energy is defined as the a"" (a) norm ayenF (P) ayen(a) of the map F (P) that associates, with every element e from the boundary of an object P, its weight w(e). This formulation brings IRFC algorithms to the realm of the graph cut energy minimizers, with energy functions ayenF (P) ayen (q) for qa[1,a]. Of these, the best known minimization problem is for the energy ayenF (P) ayen(1), which is solved by the classic min-cut/max-flow algorithm, referred to often as the Graph Cut algorithm. We notice that a minimization problem for ayenF (P) ayen (q) , qa[1,a), is identical to that for ayenF (P) ayen(1), when the original weight function w is replaced by w (q) . Thus, any algorithm GC(sum) solving the ayenF (P) ayen(1) minimization problem, solves also one for ayenF (P) ayen (q) with qa[1,a), so just two algorithms, GC(sum) and GC(max), are enough to solve all ayenF (P) ayen (q) -minimization problems. We also show that, for any fixed weight assignment, the solutions of the ayenF (P) ayen (q) -minimization problems converge to a solution of the ayenF (P) ayen(a)-minimization problem (ayenF (P) ayen(a)=lim (q -> a)ayenF (P) ayen (q) is not enough to deduce that). An experimental comparison of the performance of GC(max) and GC(sum) algorithms is included. This concentrates on comparing the actual (as opposed to provable worst scenario) algorithms' running time, as well as the influence of the choice of the seeds on the output.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a structural damage detection methodology based on genetic algorithms and dynamic parameters. Three chromosomes are used to codify an individual in the population. The first and second chromosomes locate and quantify damage, respectively. The third permits the self-adaptation of the genetic parameters. The natural frequencies and mode shapes are used to formulate the objective function. A numerical analysis was performed for several truss structures under different damage scenarios. The results have shown that the methodology can reliably identify damage scenarios using noisy measurements and that it results in only a few misidentified elements. (C) 2012 Civil-Comp Ltd and Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we e identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [ MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The analysis of spatial relations among objects in an image is an important vision problem that involves both shape analysis and structural pattern recognition. In this paper, we propose a new approach to characterize the spatial relation along, an important feature of spatial configurations in space that has been overlooked in the literature up to now. We propose a mathematical definition of the degree to which an object A is along an object B, based on the region between A and B and a degree of elongatedness of this region. In order to better fit the perceptual meaning of the relation, distance information is included as well. In order to cover a more wide range of potential applications, both the crisp and fuzzy cases are considered. In the crisp case, the objects are represented in terms of 2D regions or ID contours, and the definition of the alongness between them is derived from a visibility notion and from the region between the objects. However, the computational complexity of this approach leads us to the proposition of a new model to calculate the between region using the convex hull of the contours. On the fuzzy side, the region-based approach is extended. Experimental results obtained using synthetic shapes and brain structures in medical imaging corroborate the proposed model and the derived measures of alongness, thus showing that they agree with the common sense. (C) 2011 Elsevier Ltd. All rights reserved.