18 results for Traditional clustering
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
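As an illustration of the network-construction step described in this abstract, the sketch below builds a weighted similarity network from a small data set. It assumes that "the exponential of the Minkowski distance" means a weight of the form exp(-d), which the abstract does not spell out; the function names `minkowski` and `similarity_network` are hypothetical and are not taken from the paper. The community detection step is left out.

```python
import math

def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def similarity_network(points, p=2.0):
    """Dense weight matrix with w[i][j] = exp(-d_p(x_i, x_j)).

    Assumption: similarity decays exponentially with distance.
    The diagonal is left at 0.0 (no self-loops).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.exp(-minkowski(points[i], points[j], p))
            w[i][j] = w[j][i] = s  # undirected network, symmetric weights
    return w

points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
weights = similarity_network(points)
```

Any community detection routine that accepts weighted edges could then be run on `weights` to obtain the clusters.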
Abstract:
In this study, a total of 172 samples of minimally processed vegetables (MPV) were collected from supermarkets in the city of Campinas, Brazil. The MPV were analyzed using traditional and/or alternative methods for total aerobic mesophilic bacteria, total coliforms, Escherichia coli, coagulase-positive staphylococci, Salmonella and Listeria monocytogenes. In all the MPV analyzed, the populations of aerobic mesophilic microorganisms and total coliforms were >4 log10 CFU/g and 1.0-3.4 log10 CFU/g, respectively. E. coli was enumerated in only 10 of the 172 samples collected, while none of the 172 samples of MPV presented contamination by coagulase-positive Staphylococcus (<10^1 CFU/g). Among the four methods used for detection of Salmonella in MPV (Vidas, 1,2 Test, Reveal, and traditional), when Reveal was used, a total of 29 positive samples were reported. For L. monocytogenes, the four methods tested (Vidas, Vip, Reveal, and traditional) performed similarly. The presence of Salmonella and L. monocytogenes in MPV was confirmed in one sample (watercress) and two samples (watercress and escarole), respectively. In conclusion, the microbiological quality of MPV commercialized in Campinas is generally satisfactory. In addition, the choice of microbiological method should be based not only on resource and time issues, but also on parameters such as sensitivity and specificity for the specific foods under analysis. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to eliminate a user-defined parameter, so that the clustering stage can be implemented more accurately and with reduced computational complexity.
Abstract:
Strain ST211Ch, identified as a strain of Enterococcus faecium and isolated from Lombo, produced a bacteriocin that inhibited the growth of Enterococcus spp., Listeria spp., Klebsiella spp., Lactobacillus spp., Pseudomonas spp., Staphylococcus spp. and Streptococcus spp. The mode of action of the bacteriocin, named bacteriocin ST211Ch, was bactericidal against Enterococcus faecalis ATCC19443. As determined by Tricine-SDS-PAGE, the approximate molecular mass of the bacteriocin was 8.0 kDa. Loss of antimicrobial activity was recorded after treatment with proteolytic enzymes. Maximum activity of bacteriocin ST211Ch was measured in broth cultures of E. faecium strain ST211Ch after 24 h; thereafter, the activity was reduced. Bacteriocin ST211Ch remained active after exposure to various temperatures and pHs, as well as to Triton X-100, Tween-80, Tween-20, sodium dodecyl sulfate, NaCl, urea and EDTA. The effect of media components on production of bacteriocin ST211Ch was also studied. PCR reactions targeting different bacteriocin genes, i.e. enterocins, curvacins and sakacins, gave no evidence for the presence of these genes in the total DNA of E. faecium strain ST211Ch. The bacterium most probably produced a bacteriocin different from those mentioned above. Based on the antimicrobial spectrum, stability and mode of action of bacteriocin ST211Ch, E. faecium strain ST211Ch might be considered a potential candidate with beneficial properties for use in biopreservation to control food spoilage bacteria.
Abstract:
There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
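For readers unfamiliar with the base algorithm, the sketch below shows one iteration of standard, single-site Fuzzy C-Means: the membership update followed by the center update. It is not the collaborative or parallel variant studied in the paper, and the function names `fcm_memberships` and `fcm_centers` are hypothetical. The fuzzifier `m` defaults to 2, a common choice that is assumed here rather than taken from the abstract.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Membership u[j][i] of point j in cluster i (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(di == 0.0 for di in d):
            # Point coincides with one or more centers: split membership among them.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((di / dk) ** power for dk in d) for di in d]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Centers as means of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        wts = [row[i] ** m for row in memberships]
        total = sum(wts)
        centers.append(tuple(
            sum(w * x[k] for w, x in zip(wts, points)) / total
            for k in range(dim)
        ))
    return centers

points = [(0.0,), (1.0,), (2.0,)]
u = fcm_memberships(points, centers=[(0.0,), (2.0,)])
new_centers = fcm_centers(points, u)
```

In a distributed setting, each site would run updates like these on its local data and exchange only summary information. How that exchange is organized is what the collaborative and parallel variants specify, and it is not reproduced here.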
Abstract:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of an NBS-profiling technique was efficient for developing retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. Interestingly, using our genetic data it was possible to calculate the number of copies of the retrotransposon scIvana_1 (~60) in the sugarcane genome, confirming previously reported molecular results. In addition, this research may have indirect implications for crop economics, e.g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.
Abstract:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Abstract:
Increased uric acid (UA) is strongly linked to cardiovascular disease. However, the independent role of UA is still debated because it is associated with several cardiovascular risk factors including obesity and metabolic syndrome. This study assessed the association of UA with increased high-sensitivity C-reactive protein (hs-CRP), increased ratio of triglyceride to high-density lipoprotein cholesterol (TG/HDL), sonographically detected hepatic steatosis, and their clustering in the presence and absence of obesity and metabolic syndrome. We evaluated 3,518 employed subjects without clinical cardiovascular disease from November 2008 through July 2010. Prevalence of hs-CRP >= 3 mg/L was 19%, that of TG/HDL >= 3 was 44%, and that of hepatic steatosis was 43%. In multivariable logistic regression after adjusting for traditional cardiovascular risk factors and confounders, highest versus lowest UA quartile was associated with hs-CRP >= 3 mg/L (odds ratio [OR] 1.52, 95% confidence interval [CI] 1.01 to 2.28, p = 0.04), TG/HDL >= 3 (OR 3.29, 95% CI 2.36 to 4.60, p <0.001), and hepatic steatosis (OR 3.10, 95% CI 2.22 to 4.32, p <0.001) independently of obesity and metabolic syndrome. Association of UA with hs-CRP >= 3 mg/L became nonsignificant in analyses stratified by obesity. Ascending UA quartiles compared to the lowest UA quartile demonstrated a graded increase in the odds of having 2 or 3 of these risk conditions and a successive decrease in the odds of having none. In conclusion, high UA levels were associated with increased TG/HDL and hepatic steatosis independently of metabolic syndrome and obesity and with increased hs-CRP independently of metabolic syndrome. (C) 2012 Elsevier Inc. All rights reserved. (Am J Cardiol 2012;110:1787-1792)
Abstract:
This paper presents an analysis of the capacity of design-centric methodologies to prepare engineering students to succeed in the market. Gaps are brainstormed and analyzed with reference to their importance. Reasons that may lead newly graduated engineers not to succeed right from the beginning of their professional lives have also been evaluated. A comparison between the two subjects above was prepared, reviewed and analyzed. The influence of the multidisciplinary, multicultural and complex environment created in the current global business era is taken into account. Industry requirements, in terms of what companies expect to 'receive' from their engineers, are evaluated and compared to the remainder of the study above. An innovative approach to current engineering education that utilizes traditional design-centric methodologies is then proposed, aggregating new disciplines to supplement the traditional engineering education. The solution encompasses the inclusion of disciplines from the Human Sciences and Emotional Intelligence fields, aiming to better prepare the engineer of tomorrow to work in a multidisciplinary, globalized, complex and team-working environment. A pilot implementation of such an approach is reviewed and conclusions are drawn from this educational project.
Abstract:
This work aimed to study the characteristics of the fibres of the species Bactris setosa ('tucum') used by close-knit social groups, located in Sorocaba - São Paulo - Brazil, in basket-making techniques, for possible applications in textile activity. Optical microscopy (NBR 13 538:1995) and Tensile Properties (ASTM D 3 822-2001) were used to assess properties such as the fibre structure, linear density, breaking force, elongation at break and breaking tenacity of each species. Bactris setosa showed a longitudinal view similar to that of sisal; an average linear density of 41.2 tex, a tenacity average of 11.96 cN/tex, closer to fiberglass, and an elongation ranging between 1.35 and 3.87%. It is important to clarify the delicacy and detail of the tests, and from this we highlight the importance of carrying out these studies, based on which science and technology must be linked with socio-environmental aspects.
Abstract:
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
Abstract:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin [MUC] 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Abstract:
This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
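To make the point about the simplex space concrete, the sketch below computes a distance between two compositions with the centered log-ratio (clr) transform, a standard tool from compositional data analysis. It is offered only as an illustration of why Euclidean distance on raw counts can mislead. It is not claimed to be the procedure Simcluster implements, and the function names `clr` and `aitchison_distance` are hypothetical.

```python
import math

def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    if any(x <= 0 for x in composition):
        raise ValueError("clr requires strictly positive components")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(a, b):
    """Euclidean distance between the clr-transformed compositions."""
    return math.dist(clr(a), clr(b))

# Two libraries with the same relative abundances but different sequencing depth.
shallow = [1.0, 2.0, 3.0]
deep = [2.0, 4.0, 6.0]
d_same_profile = aitchison_distance(shallow, deep)
d_raw_euclidean = math.dist(shallow, deep)
```

Here `d_same_profile` is zero up to rounding, because only the proportions matter, whereas `d_raw_euclidean` is clearly nonzero. Zero counts would need separate handling (for example a pseudocount), which this sketch deliberately rejects rather than guesses at.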