22 results for Clustering Analysis
Abstract:
Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs much prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of ACA deteriorates rapidly, and specifying it correctly is very demanding because so little is known in advance. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of Cooperative Competition, the number of clusters can be found in the process of clustering. Experimental results on synthetic and gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and achieves excellent performance relative to competing methods.
Abstract:
BACKGROUND: We used four years of paediatric severe acute respiratory illness (SARI) sentinel surveillance in Blantyre, Malawi to identify factors associated with clinical severity and co-viral clustering.
METHODS: From January 2011 to December 2014, 2363 children aged 3 months to 14 years presenting to hospital with SARI were enrolled. Nasopharyngeal aspirates were tested for influenza and other respiratory viruses. We assessed risk factors for clinical severity and conducted clustering analysis to identify viral clusters in children with co-viral detection.
RESULTS: Hospital-attended influenza-positive SARI incidence was 2.0 cases per 10,000 children annually; it was highest in children aged under 1 year (6.3 cases per 10,000) and in HIV-infected children aged 5 to 9 years (6.0 cases per 10,000). 605 (26.8%) SARI cases had warning signs, which were positively associated with HIV infection (adjusted risk ratio [aRR]: 2.4, 95% CI: 1.4, 3.9), RSV infection (aRR: 1.9, 95% CI: 1.3, 3.0) and rainy season (aRR: 2.4, 95% CI: 1.6, 3.8). We identified six co-viral clusters; one cluster was associated with SARI with warning signs.
CONCLUSIONS: Influenza vaccination may benefit young children and HIV infected children in this setting. Viral clustering may be associated with SARI severity; its assessment should be included in routine SARI surveillance.
Abstract:
Single nucleotide polymorphisms (SNPs) are predicted to supersede microsatellites as the marker of choice for population genetic studies in the near future. To date, however, very few studies have directly compared both marker systems in natural populations, particularly in non-model organisms. In the present study, we compared the utility of SNPs and microsatellites for population genetic analysis of the red seaweed Chondrus crispus (Florideophyceae). Six SNP loci yielded very different patterns of intrapopulation genetic diversity compared to those obtained using seven moderately (mean 5.2 alleles) polymorphic microsatellite loci, although Bayesian clustering analysis gave largely congruent results between the two marker classes. A weak but significant pattern of isolation-by-distance was observed across scales from a few hundred metres to approximately 200 km using the combined SNP and microsatellite data set of 13 loci. Over larger scales, however, there was little correlation between genetic divergence and geographical distance. Our findings suggest that even a moderate number of SNPs is sufficient to determine patterns of genetic diversity across natural populations, and also highlight the fact that patterns of genetic variation in seaweeds arise through a complex interplay of short- and long-term natural processes, as well as anthropogenic influence.
Abstract:
Aim: Retrospective genetic monitoring, comparing genetic diversity of extant populations with historical samples, can provide valuable and often unique insights into evolutionary processes informing conservation strategies. The Yellow marsh saxifrage (Saxifraga hirculus) is listed as ‘critically endangered’ in Ireland with only two extant populations. We quantified genetic changes over time and identified genotypes in extant populations that could be used as founders for reintroductions to sites where the species is extinct.
Location: Ireland.
Methods: Samples were obtained from both locations where the species is currently found, including the most threatened site at the Garron Plateau, Co. Antrim, which held only 13 individuals during 2011. Herbarium samples covering the period from 1886 to 1957 were obtained including plants from the same area as the most threatened population, as well as three extinct populations. In total, 422 individuals (319 present-day and 103 historical) were genotyped at six microsatellite loci. Species distribution modelling was used to identify areas of potentially suitable habitat for reintroductions.
Results: Level of phenotypic diversity within the most threatened population was significantly lower in the present-day compared with historical samples but levels of observed heterozygosity and number of alleles, whilst reduced, did not differ significantly. However, Bayesian clustering analysis suggested gradual lineage replacement over time. All three measures of genetic diversity were generally lower at the most threatened population compared with the more substantial extant populations in Co. Mayo. Species distribution modelling suggested that habitat at one site where the species is extinct may be suitable for reintroduction.
Main conclusions: The dominant genetic lineage in the most threatened population is rare elsewhere; thus, care needs to be taken when formulating any potential reintroduction programme. Our findings highlight not only the need for genetic monitoring of threatened populations but also the need for its swift implementation before levels of diversity become critically low.
Abstract:
Traditional Chinese Medicines (TCMs) derived from animal horns are one of the most important types of Chinese medicine. In the present study, a fast and sensitive analytical method was established for qualitative and quantitative determination of 14 nucleosides and nucleobases in animal horns using hydrophilic interaction ultra-high performance liquid chromatography coupled with triple-quadrupole tandem mass spectrometry (HILIC-UPLC-QQQ-MS/MS) in selective reaction monitoring (SRM) mode. The method was optimized and validated, and showed good linearity, precision, repeatability, and accuracy. The method was successfully used to determine contents of the 14 nucleosides and nucleobases in 25 animal horn samples. Hierarchical clustering analysis (HCA) and principal component analysis (PCA) were performed and the 25 samples were thereby divided into two groups, which agreed with taxonomy. The method may enable quick and effective search of substitutes for precious horns.
Abstract:
In this paper we study the classification of spatiotemporal patterns of one-dimensional cellular automata (CA), where the classification comprises CA rules including their initial conditions. We propose an exploratory analysis method based on the normalized compression distance (NCD) of spatiotemporal patterns, which is used as a dissimilarity measure for hierarchical clustering. Our approach differs in the following respects. First, the classification of spatiotemporal patterns is comparative because the NCD explicitly evaluates the difference in compressibility between two objects, e.g., strings corresponding to spatiotemporal patterns. This is in contrast to all other measures applied so far in a similar context because they are essentially univariate. Second, Kolmogorov complexity, which underlies the NCD, was used in the classification of CA with respect to their spatiotemporal pattern. Third, our method is semiautomatic, allowing us to investigate hundreds or thousands of CA rules or initial conditions simultaneously to gain insights into their organizational structure. Our numerical results are not only plausible, confirming previous classification attempts, but also shed light on the intricate influence of random initial conditions on the classification results.
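As a rough illustration of the dissimilarity measure named in this abstract, the sketch below computes the normalized compression distance between spatiotemporal patterns of elementary cellular automata. It is not the authors' implementation: the use of zlib as the compressor, the single-cell initial condition, the 64 x 64 pattern size, and the helper names `run_eca`, `ncd`, and `ncd_matrix` are all assumptions made for this example.

```python
import zlib


def run_eca(rule: int, width: int = 64, steps: int = 64) -> bytes:
    """Spatiotemporal pattern of an elementary CA, flattened row by row."""
    row = [0] * width
    row[width // 2] = 1  # assumed initial condition: a single live cell
    cells = []
    for _ in range(steps):
        cells.extend(row)
        # Look up each cell's next state in the bits of the rule number
        # (periodic boundary conditions).
        row = [
            (rule >> (4 * row[(i - 1) % width] + 2 * row[i] + row[(i + 1) % width])) & 1
            for i in range(width)
        ]
    return bytes(cells)


def compressed_size(data: bytes) -> int:
    # zlib stands in for the (uncomputable) Kolmogorov complexity.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(patterns):
    """Pairwise NCD values as a list of lists."""
    return [[ncd(a, b) for b in patterns] for a in patterns]


if __name__ == "__main__":
    rules = [30, 90, 110]
    patterns = [run_eca(r) for r in rules]
    for rule, row in zip(rules, ncd_matrix(patterns)):
        print(rule, [round(v, 3) for v in row])
```

With a real compressor the matrix is only approximately symmetric and its diagonal is not exactly zero, so it would likely need to be symmetrized before being passed to a hierarchical clustering routine.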
Abstract:
In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. 
The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.
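The clustered breakage idea described in this abstract can be sketched in a few lines. This is only a toy version under stated assumptions, not the published model: every cluster is given exactly the same number of DSBs (the abstract specifies only an average), positions are integer base pairs, breaks are placed uniformly within the radius, and the function name `simulate_fragments` and its parameters are hypothetical. Rejoining kinetics and background damage are not modelled.

```python
import random


def simulate_fragments(genome_bp, n_clusters, dsbs_per_cluster, radius_bp, seed=0):
    """Return fragment sizes (in bp) after clustered double-strand breakage.

    Assumptions for this sketch: cluster centres are uniform along the DNA,
    each cluster holds exactly `dsbs_per_cluster` breaks, and each break lies
    uniformly within `radius_bp` of its cluster centre.
    """
    rng = random.Random(seed)
    breaks = []
    for _ in range(n_clusters):
        center = rng.randint(0, genome_bp)
        for _ in range(dsbs_per_cluster):
            pos = center + rng.randint(-radius_bp, radius_bp)
            # Keep breaks inside the molecule.
            breaks.append(min(max(pos, 0), genome_bp))
    cuts = [0] + sorted(breaks) + [genome_bp]
    return [b - a for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    clustered = simulate_fragments(1_000_000, n_clusters=20, dsbs_per_cluster=3, radius_bp=500)
    # Random breakage as a special case: one break per cluster, zero radius.
    random_breaks = simulate_fragments(1_000_000, n_clusters=60, dsbs_per_cluster=1, radius_bp=0)
    print(len(clustered), min(clustered), max(clustered))
    print(len(random_breaks), min(random_breaks), max(random_breaks))
```

Both runs place 60 breaks, so comparing their fragment-size distributions shows how clustering shifts mass toward very short fragments.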
Abstract:
Introduction
Mild cognitive impairment (MCI) has clinical value in its ability to predict later dementia. A better understanding of cognitive profiles can further help delineate who is most at risk of conversion to dementia. We aimed to (1) examine to what extent the usual MCI subtyping using core criteria corresponds to empirically defined clusters of patients (latent profile analysis [LPA] of continuous neuropsychological data) and (2) compare the two methods of subtyping memory clinic participants in their prediction of conversion to dementia.
Methods
Memory clinic participants (MCI, n = 139) and age-matched controls (n = 98) were recruited. Participants had a full cognitive assessment, and results were grouped (1) according to traditional MCI subtypes and (2) using LPA. MCI participants were followed over approximately 2 years after their initial assessment to monitor for conversion to dementia.
Results
Groups were well matched for age and education. Controls performed significantly better than MCI participants on all cognitive measures. With the traditional analysis, most MCI participants were in the amnestic multidomain subgroup (46.8%) and this group was most at risk of conversion to dementia (63%). From the LPA, a three-profile solution fit the data best. Profile 3 was the largest group (40.3%), the most cognitively impaired, and most at risk of conversion to dementia (68% of the group).
Discussion
LPA provides a useful adjunct in delineating MCI participants most at risk of conversion to dementia and adds confidence to standard categories of clinical inference.
Abstract:
This paper examines the use of trajectory distance measures and clustering techniques to define normal
and abnormal trajectories in the context of pedestrian tracking in public spaces. In order to detect abnormal
trajectories, what is meant by a normal trajectory in a given scene is first defined. Then every trajectory
that deviates from this normality is classified as abnormal. By combining Dynamic Time Warping and a
modified K-Means algorithm for arbitrary-length data series, we have developed an algorithm for trajectory
clustering and abnormality detection. The final system performs with an overall accuracy of 83% and 75%
when tested on two different standard datasets.
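To make the distance measure concrete, here is a minimal sketch of Dynamic Time Warping between two 2-D trajectories of different lengths, followed by a simple threshold test against prototype trajectories. The abstract does not describe the modified K-Means step, so none is reproduced here; the functions `dtw_distance` and `is_abnormal`, the prototype list, and the threshold value are illustrative assumptions only.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            # Extend the cheapest of: insertion, deletion, match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def is_abnormal(trajectory, prototypes, threshold):
    """Flag a trajectory whose DTW distance to every prototype exceeds the threshold.

    `prototypes` plays the role of cluster centres describing normal motion;
    how they are obtained is outside the scope of this sketch.
    """
    return min(dtw_distance(trajectory, p) for p in prototypes) > threshold


if __name__ == "__main__":
    normal = [(0, 0), (1, 0), (2, 0), (3, 0)]
    slow = [(0, 0), (0, 0), (1, 0), (2, 0), (2, 0), (3, 0)]  # same path, different timing
    detour = [(0, 0), (1, 2), (2, 2), (3, 0)]
    print(dtw_distance(normal, slow))
    print(dtw_distance(normal, detour))
    print(is_abnormal(detour, [normal], threshold=1.0))
```

Because DTW aligns points before summing distances, the slower walker along the same path is not penalised for timing, whereas the detour accumulates cost.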
Abstract:
Query processing over the Internet involving autonomous data sources is a major task in data integration. It requires the estimated costs of possible queries in order to select the best one that has the minimum cost. In this context, the cost of a query is affected by three factors: network congestion, server contention state, and complexity of the query. In this paper, we study the effects of both the network congestion and server contention state on the cost of a query. We refer to these two factors together as system contention states. We present a new approach to determining the system contention states by clustering the costs of a sample query. For each system contention state, we construct two cost formulas for unary and join queries respectively using the multiple regression process. When a new query is submitted, its system contention state is estimated first using either the time slides method or the statistical method. The cost of the query is then calculated using the corresponding cost formulas. The estimated cost of the query is further adjusted to improve its accuracy. Our experiments show that our methods can produce quite accurate cost estimates of the submitted queries to remote data sources over the Internet.
Abstract:
Juvenile idiopathic arthritis (JIA) comprises a poorly understood group of chronic, childhood onset, autoimmune diseases with variable clinical outcomes. We investigated whether profiling of the synovial fluid (SF) proteome by a fluorescent dye based, two-dimensional gel (DIGE) approach could distinguish patients in whom inflammation extends to affect a large number of joints, early in the disease process. SF samples from 22 JIA patients were analyzed: 10 with oligoarticular arthritis, 5 extended oligoarticular and 7 polyarticular disease. SF samples were labeled with Cy dyes and separated by two-dimensional electrophoresis. Multivariate analyses were used to isolate a panel of proteins which distinguish patient subgroups. Proteins were identified using MALDI-TOF mass spectrometry with expression further verified by Western immunoblotting and immunohistochemistry. Hierarchical clustering based on the expression levels of a set of 40 proteins segregated the extended oligoarticular from the oligoarticular patients (p < 0.05). Expression patterns of the isolated protein panel have also been observed over time, as disease spreads to multiple joints. The data indicate that synovial fluid proteome profiles could be used to stratify patients based on risk of disease extension. These protein profiles may also assist in monitoring therapeutic responses over time and help predict joint damage. © 2009 American Chemical Society.
Abstract:
RNA polymerase I (Pol I) produces large ribosomal RNAs (rRNAs). In this study, we show that the Rpa49 and Rpa34 Pol I subunits, which do not have counterparts in Pol II and Pol III complexes, are functionally conserved using heterospecific complementation of the human and Schizosaccharomyces pombe orthologues in Saccharomyces cerevisiae. Deletion of RPA49 leads to the disappearance of nucleolar structure, but nucleolar assembly can be restored by decreasing ribosomal gene copy number from 190 to 25. Statistical analysis of Miller spreads in the absence of Rpa49 demonstrates a fourfold decrease in Pol I loading rate per gene and decreased contact between adjacent Pol I complexes. Therefore, the Rpa34 and Rpa49 Pol I–specific subunits are essential for nucleolar assembly and for the high polymerase loading rate associated with frequent contact between adjacent enzymes. Together our data suggest that localized rRNA production results in spatially constrained rRNA production, which is instrumental for nucleolar assembly.
Abstract:
Aim: We carried out a phylogeographic study across the range of the herbaceous plant species Monotropa hypopitys L. in North America to determine whether its current disjunct distribution is due to recolonization from separate eastern and western refugia after the Last Glacial Maximum (LGM).
Location: North America: Pacific Northwest and north-eastern USA/south-eastern Canada.
Methods: Palaeodistribution modelling was carried out to determine suitable climatic regions for M. hypopitys at the LGM. We analysed between 155 and 176 individuals from 39 locations spanning the species' entire range in North America. Sequence data were obtained for the chloroplast rps2 gene (n=168) and for the nuclear ITS region (n=158). Individuals were also genotyped for eight microsatellite loci (n=176). Interpolation of diversity values was used to visualize the range-wide distribution of genetic diversity for each of the three marker classes. Minimum spanning networks were constructed showing the relationships between the rps2 and ITS haplotypes, and the geographical distributions of these haplotypes were plotted. The numbers of genetic clusters based on the microsatellite data were estimated using Bayesian clustering approaches.
Results: The palaeodistribution modelling indicated suitable climate envelopes for M. hypopitys at the LGM in both the Pacific Northwest and south-eastern USA. High levels of genetic diversity and endemic haplotypes were found in Oregon, the Alexander Archipelago, Wisconsin, and in the south-eastern part of the species' distribution range.
Main conclusions: Our results suggest a complex recolonization history for M. hypopitys in North America, involving persistence in separate eastern and western refugia. A generally high degree of congruence between the different marker classes analysed indicated the presence of multiple refugia, with at least two refugia in each area. In the west, putative refugia were identified in Oregon and the Alexander Archipelago, whereas eastern refugia may have been located in the southern part of the species' current distribution, as well as in the 'Driftless Area'. These findings are in contrast to a previous study on the related species Orthilia secunda, which has a similar disjunct distribution to M. hypopitys, but which appears to have recolonized solely from western refugia. © 2011 Blackwell Publishing Ltd.
Abstract:
The recent emergence of high-throughput arrays for methylation analysis has made the influence of tumor content on the interpretation of methylation levels increasingly pertinent. However, the degree to which tumor content has an influence, and the degree of tumor content that makes a specimen acceptable for accurate analysis, remain unclear. Taking a systematic approach, we analyzed 98 unselected formalin-fixed and paraffin-embedded gastric tumors and matched normal tissue samples using the Illumina GoldenGate methylation assay. Unsupervised hierarchical clustering showed 2 separate clusters with a significant difference in average tumor content levels. The probes identified to be significantly differentially methylated between the tumors and normals also differed according to the tumor content of the samples included, with the sensitivity of identifying the