853 results for height partition clustering
Abstract:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to eliminate a user-defined parameter in such a way that the clustering stage can be implemented more accurately while having reduced computational complexity.
Abstract:
Polymorphisms in the VDR gene were reported to be associated with variations in intrauterine and postnatal growth and with adult height, but also with other traits that are strongly correlated such as the BMI, insulin sensitivity, insulin secretion and hyperglycemia. Here, we assessed the impact of VDR polymorphisms on body height and its interactions with obesity- and glucose tolerance-related traits in obese children and adolescents. We studied 173 prepubertal (Tanner's stage 1) and 146 pubertal (Tanner's stages 2-5) obese children who were referred for a weight-loss program. Three single nucleotide polymorphisms were genotyped: rs1544410 (BsmI), rs7975232 (ApaI) and rs731236 (TaqI). BsmI and TaqI genotypes were significantly associated with height in pubertal children, but the associations did not reach statistical significance in prepubertal children. In stepwise regression analyses, the lean body mass, insulin secretion, BsmI or TaqI genotypes and the father's and the mother's height were independently and positively associated with height in pubertal children. These covariables accounted for 46% of the trait variance. The height of homozygous carriers of the minor allele of BsmI was 0.65 z-scores (4 cm) higher than the height of homozygous carriers of the major allele (P=.0006). Haplotype analyses confirmed the associations of the minor alleles of BsmI and TaqI with increased height. In conclusion, VDR genotypes were significantly associated with height in pubertal obese children. The associations were independent from the effects of confounding traits, such as the body fat mass, insulin secretion, insulin sensitivity and glucose tolerance. (C) 2012 Elsevier Inc. All rights reserved.
Abstract:
There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
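As background for the abstract above, the following is a minimal sketch of the classic single-site Fuzzy C-Means update steps, not of the collaborative or parallel variants the study discusses. The function names `fcm_memberships` and `fcm_centers` and the default fuzzifier `m = 2.0` are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Classic FCM membership update: u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers:
            # share full membership equally among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Classic FCM center update: mean of the points weighted by u^m."""
    dim = len(points[0])
    n_clusters = len(memberships[0])
    centers = []
    for k in range(n_clusters):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers
```

Alternating the two functions until the centers stop moving gives the usual FCM iteration; how the distributed variants exchange information between sites is not shown here.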
Abstract:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
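The network construction step described above could be sketched as follows. The abstract names only "the exponential of the Minkowski distance", so the decaying form `exp(-d)`, the zero diagonal (no self-loops), and the names `minkowski` and `similarity_matrix` are assumptions made for illustration; the community detection step is not shown.

```python
import math


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def similarity_matrix(data, p=2.0):
    """Weighted adjacency matrix with w_ij = exp(-d_ij) (assumed form)."""
    n = len(data)
    return [
        [0.0 if i == j else math.exp(-minkowski(data[i], data[j], p)) for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix can be handed to any community detection routine that accepts edge weights.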
Abstract:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. 
Interestingly, using our genetic data it was possible to calculate the number of copies of the retrotransposon scIvana_1 (~60) in the sugarcane genome, confirming previously reported molecular results. In addition, this research may have indirect implications for crop economics, e.g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.
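The map summary figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the average CG length is the total map length divided by the number of CGs, and that marker density means total length divided by the number of mapped markers. The second definition is an assumption, so the implied marker count is only indicative.

```python
# Figures taken from the abstract.
total_map_length_cm = 4843.19   # total map length in cM
n_cosegregation_groups = 92     # number of co-segregation groups (CGs)
marker_density_cm = 8.87        # reported marker density in cM

# Average CG length: total length spread over all CGs.
average_cg_length_cm = total_map_length_cm / n_cosegregation_groups

# Implied marker count, assuming density = total length / number of markers.
implied_marker_count = total_map_length_cm / marker_density_cm

print(f"average CG length: {average_cg_length_cm:.2f} cM")
print(f"implied marker count: {implied_marker_count:.0f}")
```

The first value agrees with the reported average length of 52.64 cM.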
Abstract:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Abstract:
Managed environments in the form of well watered and water stressed trials were performed to study the genetic basis of grain yield and stay green in sorghum with the objective of validating previously detected QTL. As variations in phenology and plant height may influence QTL detection for the target traits, QTL for flowering time and plant height were introduced as cofactors in QTL analyses for yield and stay green. All but one of the flowering time QTL were detected near yield and stay green QTL. Similar co-localization was observed for two plant height QTL. QTL analysis for yield, using flowering time/plant height cofactors, led to yield QTL on chromosomes 2, 3, 6, 8 and 10. For stay green, QTL on chromosomes 3, 4, 8 and 10 were not related to differences in flowering time/plant height. The physical positions for markers in QTL regions projected on the sorghum genome suggest that the previously detected plant height QTL, Sb-HT9-1, and Dw2, in addition to the maturity gene, Ma5, had a major confounding impact on the expression of yield and stay green QTL. Co-localization between an apparently novel stay green QTL and a yield QTL on chromosome 3 suggests there is potential for indirect selection based on stay green to improve drought tolerance in sorghum. Our QTL study was carried out with a moderately sized population and spanned a limited geographic range, but still the results strongly emphasize the necessity of corrections for phenology in QTL mapping for drought tolerance traits in sorghum.
Abstract:
Centrifugal countercurrent distribution (CCCD) in an aqueous two-phase system (TPS) is a technique capable of resolving sperm heterogeneity and of estimating the fertilizing potential of a given semen sample. However, separated sperm subpopulations have never yet been tested for their fertilizing ability. Here, we have compared sperm quality parameters and the fertilizing ability of sperm subpopulations separated by the CCCD process from ram semen samples maintained at 20 degrees C or cooled down to 5 degrees C. Total and progressive sperm motility was evaluated by computer-assisted analysis using a CASA system, and membrane integrity was evaluated by flow cytometry by staining with CFDA/PI. The capacitation state, assessed by staining with chlortetracycline, and apoptosis-related markers, such as phosphatidylserine (PS) translocation detected with Annexin V and DNA damage detected by the TUNEL assay, were determined by fluorescence microscopy. Additionally, the fertilizing ability of the fractionated subpopulations was comparatively assessed by zona binding assay (ZBA). CCCD analysis revealed that the number of spermatozoa displaying membrane and DNA alterations was higher in samples chilled at 5 degrees C than at 20 degrees C, which is reflected in the displacement to the left of the CCCD profiles. The spermatozoa located in the central and right chambers (more hydrophobic) presented higher values (P<0.01) of membrane integrity, lower PS translocation (P<0.05) and DNA damage (P<0.001) than those in the left part of the profile, where apoptotic markers were significantly increased and the proportion of viable non-capacitated sperm was reduced. 
We have developed a new protocol to recover spermatozoa from the CCCD fractions and we proved that these differences were related to the fertilizing ability determined by ZBA, because we found that the number of spermatozoa attached per oocyte was significantly higher for spermatozoa recovered from the central and right chambers, in both types of samples. This is the first time, to our knowledge, that sperm recovered from a two-phase partition procedure have been used for fertilization assays. These results open up new possibilities for using specific subpopulations of sperm for artificial insemination or in vitro fertilization, not only regarding better sperm quality but also certain characteristics such as subpopulations enriched in spermatozoa bearing X or Y chromosome that we have already isolated or any other feature. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
Site-specific height-diameter models may be used to improve biomass estimates for forest inventories where only diameter at breast height (DBH) measurements are available. In this study, we fit height-diameter models for vegetation types of a tropical Atlantic forest using field measurements of height across plots along an altitudinal gradient. To fit height-diameter models, we sampled trees by DBH class and measured tree height within 13 one-hectare permanent plots established at four altitude classes. To select the best model we tested the performance of 11 height-diameter models using the Akaike Information Criterion (AIC). The Weibull and Chapman-Richards height-diameter models performed better than other models, and regional site-specific models performed better than the general model. In addition, there is a slight variation of height-diameter relationships across the altitudinal gradient and an extensive difference in the stature between the Atlantic and Amazon forests. The results showed the effect of altitude on tree height estimates and emphasize the need for altitude-specific models that produce more accurate results than a general model that encompasses all altitudes. To improve biomass estimation, the development of regional height-diameter models that estimate tree height using a subset of randomly sampled trees presents an approach to supplement surveys where only diameter has been measured.
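The two best-performing model families named above, and the AIC comparison used to rank them, could be sketched as follows. The abstract does not give the exact parameterizations, so the three-parameter forms with a 1.3 m breast-height offset, the least-squares form of the AIC (up to an additive constant), and all function names are assumptions.

```python
import math


def chapman_richards(dbh, a, b, c):
    """Assumed Chapman-Richards form: H = 1.3 + a * (1 - exp(-b * D))^c."""
    return 1.3 + a * (1.0 - math.exp(-b * dbh)) ** c


def weibull(dbh, a, b, c):
    """Assumed Weibull form: H = 1.3 + a * (1 - exp(-b * D^c))."""
    return 1.3 + a * (1.0 - math.exp(-b * dbh ** c))


def aic_least_squares(observed, predicted, n_params):
    """AIC for a least-squares fit, up to a constant: n * ln(RSS / n) + 2k.

    Requires a non-zero residual sum of squares.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params
```

Given fitted parameters for each candidate, the model with the lowest AIC on the same set of trees would be preferred; fitting the parameters themselves is not shown.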
Abstract:
We have performed multicanonical simulations to study the critical behavior of the two-dimensional Ising model with dipole interactions. This study concerns the thermodynamic phase transitions in the range of the interaction delta where the phase characterized by striped configurations of width h = 1 is observed. Controversial results obtained from local update algorithms have been reported for this region, including the claimed existence of a second-order phase transition line that becomes first order above a tricritical point located somewhere between delta = 0.85 and 1. Our analysis relies on the complex partition function zeros obtained with high statistics from multicanonical simulations. Finite size scaling relations for the leading partition function zeros yield critical exponents that are clearly consistent with a single second-order phase transition line, thus excluding such a tricritical point in that region of the phase diagram. This conclusion is further supported by analysis of the specific heat and susceptibility of the orientational order parameter.
Abstract:
The aim of this study was to describe the distribution of waist circumference (WC) and WC to height (WCTH) values among Kaingang indigenous adolescents in order to estimate the prevalence of high WCTH values and evaluate the correlation between WC and WCTH and body mass index (BMI)-for-age. A total of 1,803 indigenous adolescents were evaluated using a school-based cross-sectional study. WCTH values > 0.5 were considered high. Higher mean WC and WCTH values were observed for girls in all age categories. WCTH values > 0.5 were observed in 25.68% of the overall sample of adolescents. Mean WC and WCTH values were significantly higher for adolescents with BMI/age z-scores > 2 than for those with normal z-scores. The correlation coefficients of WC and WCTH for BMI/age were r = 0.68 and 0.76, respectively, for boys, and r = 0.79 and 0.80, respectively, for girls. This study highlights elevated mean WC and WCTH values and high prevalence of abdominal obesity among Kaingang indigenous adolescents.
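The screening rule stated above (WCTH values > 0.5 considered high) is simple enough to sketch directly. The function names and the use of centimetres for both measurements are assumptions; any common unit gives the same ratio.

```python
def waist_to_height_ratio(waist_cm, height_cm):
    """Waist circumference divided by height, both in the same unit."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    return waist_cm / height_cm


def is_high_wcth(waist_cm, height_cm, cutoff=0.5):
    """True when the ratio strictly exceeds the cutoff, as in the abstract."""
    return waist_to_height_ratio(waist_cm, height_cm) > cutoff
```

For example, a waist of 80 cm at a height of 150 cm gives a ratio of about 0.53 and would be flagged, whereas a ratio of exactly 0.5 would not.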
Abstract:
In developed countries, children with intrauterine growth restriction (IUGR) or born preterm (PT) tend to achieve catch-up growth. There is little information about height catch-up in developing countries and about height catch-down in both developed and developing countries. We studied the effect of IUGR and PT birth on height catch-up and catch-down growth of children from two cohorts of liveborn singletons. Data from 1,463 children were collected at birth and at school age in Ribeirao Preto (RP), a more developed city, and in Sao Luis (SL), a less developed city. A change in z-score between schoolchild height z-score and birth length z-score >= 0.67 was considered catch-up; a change in z-score <= -0.67 indicated catch-down growth. The explanatory variables were: appropriate weight for gestational age/PT birth in four categories: term children without IUGR (normal), IUGR only (term with IUGR), PT only (preterm without IUGR) and preterm with IUGR; infant's sex; maternal parity, age, schooling and marital status; occupation of family head; family income and neonatal ponderal index (PI). The risk ratio for catch-up and catch-down was estimated by multinomial logistic regression for each city. In RP, preterms without IUGR (RR = 4.13) and thin children (PI < 10th percentile, RR = 14.39) had a higher risk of catch-down; catch-up was higher among terms with IUGR (RR = 5.53), preterms with IUGR (RR = 5.36) and children born to primiparous mothers (RR = 1.83). In SL, catch-down was higher among preterms without IUGR (RR = 5.19), girls (RR = 1.52) and children from low-income families (RR = 2.74); the lowest risk of catch-down (RR = 0.27) and the highest risk of catch-up (RR = 3.77) were observed among terms with IUGR. In both cities, terms with IUGR presented height catch-up growth whereas preterms with IUGR only had height catch-up growth in the more affluent setting. 
Preterms without IUGR presented height catch-down growth, suggesting that a better socioeconomic situation facilitates height catch-up and prevents height catch-down growth.
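The catch-up and catch-down definitions used in this abstract (a z-score change of at least 0.67 in either direction) can be expressed as a small classifier. The function name and the label "stable" for the intermediate case are assumptions; the abstract names only the two outer categories.

```python
def classify_growth(birth_length_z, school_height_z, threshold=0.67):
    """Classify height growth from the change in z-score between birth and school age."""
    delta = school_height_z - birth_length_z
    if delta >= threshold:
        return "catch-up"
    if delta <= -threshold:
        return "catch-down"
    return "stable"  # hypothetical label for changes smaller than the threshold
```

A child moving from a birth length z-score of -1.5 to a school-age height z-score of -0.5 gains 1.0 z-score and is classified as catch-up.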
Abstract:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin [MUC] 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Abstract:
This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.