925 results for: Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, and several approaches to data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped into two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but also clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes.
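The workflow described above (log transformation, z-score normalization, then systematic evaluation of several linkage algorithms and distance metrics) can be sketched as follows. This is a minimal illustration on synthetic data; the sample count, linkage methods, and parameters are assumptions, not the study's actual configuration.

```python
# Minimal sketch of preprocessing + hierarchical clustering of marker data.
# The data here are synthetic stand-ins for flow cytometry measurements.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# 20 samples x 11 surface markers (synthetic, positive-valued intensities)
expression = rng.lognormal(mean=2.0, sigma=1.0, size=(20, 11))

# Preprocessing: logarithmic transformation, then z-score normalization per marker
log_expr = np.log10(expression)
z_expr = (log_expr - log_expr.mean(axis=0)) / log_expr.std(axis=0)

# Systematically evaluate several linkage algorithms and distance metrics
for method in ("average", "complete", "ward"):
    metric = "euclidean" if method == "ward" else "correlation"
    dists = pdist(z_expr, metric=metric)   # condensed pairwise distance matrix
    tree = linkage(dists, method=method)   # agglomerative cluster tree
    labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 groups
    print(method, np.bincount(labels)[1:])
```

Comparing the label assignments across methods (e.g. with an index such as adjusted Rand) is one way to select the reproducible approaches the abstract mentions.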
Abstract:
Rock magnetic, biochemical and inorganic records of the sediment cores PG1351 and Lz1024 from Lake El’gygytgyn, Chukotka peninsula, Far East Russian Arctic, were subjected to a hierarchical agglomerative cluster analysis in order to refine and extend the pattern of climate modes as defined by Melles et al. (2007). Cluster analysis of the data obtained from both cores yielded similar results, differentiating clearly between the four climate modes warm, peak warm, cold and dry, and cold and moist. In addition, two transitional phases were identified, representing the early stages of a cold phase and slightly colder conditions during a warm phase. The statistical approach can thus be used to resolve gradual changes in the sedimentary units, as an indicator of available oxygen in the hypolimnion, in greater detail. Based upon cluster analyses on core Lz1024, the published succession of climate modes in core PG1351, covering the last 250 ka, was modified and extended back to 350 ka. Comparison of the extended Lake El’gygytgyn parameter records of magnetic susceptibility (χLF), total organic carbon content (TOC) and the chemical index of alteration (CIA; Minyuk et al., 2007) to the marine oxygen isotope (δ18O) stack LR04 (Lisiecki and Raymo, 2005) and the summer insolation at 67.5° N revealed that all stages back to marine isotope stage (MIS) 10 and most of the substages are clearly reflected in the pattern derived from the cluster analysis.
Abstract:
Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Finally, we present an implementation of our inductive fuzzy grassroots ontology. Thus, this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.
Abstract:
Online reputation management deals with monitoring and influencing the online record of a person, an organization or a product. The Social Web offers increasingly simple ways to publish and disseminate personal or opinionated information, which can rapidly have a disastrous influence on the online reputation of some of these entities. This dissertation can be split into three parts: in the first part, possible fuzzy clustering applications for the Social Semantic Web are investigated. The second part explores promising Social Semantic Web elements for organizational applications, while in the third part the former two parts are brought together and a fuzzy online reputation analysis framework is introduced and evaluated. The entire PhD thesis is based on literature reviews as well as on argumentative-deductive analyses. The possible applications of Social Semantic Web elements within organizations have been researched using a scenario and an additional case study, together with two ancillary case studies based on qualitative interviews. For the conception and implementation of the online reputation analysis application, a conceptual framework was developed. Employing test installations and prototyping, the essential parts of the framework have been implemented. By following a design science research approach, this PhD has created two artifacts: a framework and a prototype as proof of concept. Both artifacts hinge on two core elements: a (cluster analysis-based) translation of tags used in the Social Web into a computer-understandable fuzzy grassroots ontology for the Semantic Web, and a (Topic Maps-based) knowledge representation system, which facilitates natural interaction with the fuzzy grassroots ontology. This is beneficial to the identification of unknown but essential Web data that could not be achieved through conventional online reputation analysis.
The inherent structure of natural language supports humans not only in communication but also in the perception of the world. Fuzziness is a promising tool for transforming those human perceptions into computer artifacts. Through fuzzy grassroots ontologies, the Social Semantic Web becomes more natural and can thus streamline online reputation management.
Abstract:
The Social Web offers increasingly simple ways to publish and disseminate personal or opinionated information, which can rapidly exhibit a disastrous influence on the online reputation of organizations. Based on social Web data, this study describes the building of an ontology based on fuzzy sets. At the end of a recurring harvesting of folksonomies by Web agents, the aggregated tags are purified, linked, and transformed into a so-called fuzzy grassroots ontology by means of a fuzzy clustering algorithm. This self-updating ontology is used for online reputation analysis, a crucial task of reputation management, with the goal of following the online conversation going on around an organization in order to discover and monitor its reputation. In addition, an application of the Fuzzy Online Reputation Analysis (FORA) framework, lessons learned, and potential extensions are discussed in this article.
Abstract:
SUMMARY There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.
Abstract:
Background. Injecting drug users (IDUs) are at risk of infection with Hepatitis C Virus (HCV) and Human Immunodeficiency Virus (HIV). Independently, each of these viruses is a serious threat to health, with HIV ravaging the body’s immune system, and HCV causing cirrhosis, liver cancer and liver failure. Co-infection with HIV/HCV weakens the response to antiretroviral therapy in HIV patients. IDUs with HIV/HCV co-infection are at a 20 times higher risk of liver-related morbidity and mortality than IDUs with HIV alone. In Vietnam, studies to ascertain the prevalence of HIV have found high rates, but little is known about IDUs' HCV status. Purpose. To measure the prevalence of HCV and HIV infection and identify factors associated with these viruses among IDUs at drug treatment centers in northern Vietnam. Methods. A cross-sectional study was conducted from November 2007 to February 2008 with 455 injecting drug users aged 18 to 39 years, admitted no more than two months earlier to one of four treatment centers in northern Vietnam (Hatay Province) (response rate = 95%). Participants, all of whom had completed detoxification and provided informed consent, completed a risk assessment questionnaire and had their blood drawn to test for the presence of anti-HCV and anti-HIV antibodies with enzyme immunoassays. Univariate and multivariable logistic regression models were used to explore the strength of association, with HIV infection, HCV infection and HIV/HCV co-infection as outcomes and demographic characteristics, drug use and sexual behaviors as factors associated with these outcomes. Unadjusted and adjusted odds ratios and 95% confidence intervals were calculated. Results. Among all IDU study participants, the prevalence of HCV was 76.9% and the prevalence of HIV was 19.8%. HIV/HCV co-infection was present in 92.2% of HIV-positive and 23.7% of HCV-positive respondents.
No sexual risk behaviors over the lifetime, the six months or the 30 days prior to admission were significantly associated with HCV or HIV infection among these IDUs. Only duration of injection drug use was independently associated with HCV and HIV infection, respectively; longer duration was associated with higher prevalence. Nevertheless, while the prevalence of HCV infection among IDUs who reported being in their first year of injecting drugs was lower than among longer-term injectors, it was still substantial at 67.5%. Compared with either HCV mono-infection or HIV/HCV non-infection, HIV/HCV co-infection was associated with the length of drug injection history but was not associated with sexual behaviors. Higher education was associated with a lower prevalence of HIV/HCV co-infection. When compared with HIV/HCV non-infection, current marriage was associated with a lower prevalence of HIV/HCV co-infection. Conclusions. HCV was prevalent among IDUs aged 18 to 39 years at four drug treatment centers in northern Vietnam. Co-infection with HCV was predominant among HIV-positive IDUs. HCV and HIV co-infection was closely associated with the length of injection drug history. Further research regarding HCV/HIV co-infection should include non-injecting drug users to assess the magnitude of sexual risk behaviors on HIV and HCV infection. (At these treatment centers non-IDUs constituted 10-20% of the population.) The high prevalence of HCV among IDUs, especially among HIV-infected IDUs, suggests that drug treatment centers serving IDUs should include not only HIV prevention education but also viral hepatitis prevention. In addition, IDUs who are HIV-positive need to be tested for HCV to receive the best course of therapy and achieve the best response to HIV treatment.
These data also suggest that because many IDUs become infected with HCV in the first year of their injection drug career, and because they also engage in high-risk sexual behaviors, outreach programs should focus on harm reduction and on safer drug use and sexual practices, both to prevent infection among drug users who have not yet begun injecting and to prevent further spread of HCV, HIV and co-infection.
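The unadjusted odds ratios and 95% confidence intervals reported in analyses like the one above can be computed from a 2×2 table. A small sketch, using purely hypothetical counts (exposure = more than one year of injecting, outcome = HCV-positive), not the study's data:

```python
# Unadjusted odds ratio and 95% CI from a hypothetical 2x2 table.
# Counts below are illustrative only, not from the study.
import math

a, b = 320, 80   # exposed:   HCV-positive, HCV-negative
c, d = 55, 27    # unexposed: HCV-positive, HCV-negative

odds_ratio = (a * d) / (b * c)
# Standard error of log(OR) via the Woolf method
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

An interval excluding 1 would indicate a statistically significant association at the 5% level; the adjusted ratios in the study come instead from multivariable logistic regression.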
Abstract:
We have performed quantitative X-ray diffraction (qXRD) analysis of 157 grab or core-top samples from the western Nordic Seas (WNS) between ~57° and 75° N and 5° and 45° W. The RockJock v6 analysis includes non-clay (20) and clay (10) mineral species in the <2 mm size fraction that sum to 100 weight %. The data matrix was reduced to 9 and 6 variables respectively by excluding minerals with low weight % and by grouping minerals into larger groups, such as the alkali and plagioclase feldspars. Because of its potential dual origins, calcite was placed outside of the sum. We initially hypothesized that a combination of regional bedrock outcrops and transport associated with drift-ice, meltwater plumes, and bottom currents would result in 6 clusters defined by "similar" mineral compositions. The hypothesis was tested by use of a fuzzy k-means clustering algorithm, and key minerals were identified by step-wise Discriminant Function Analysis. Key minerals in defining the clusters include quartz, pyroxene, muscovite, and amphibole. With 5 clusters, 87.5% of the observations are correctly classified. The geographic distributions of the five k-means clusters compare reasonably well with the original hypothesis. The close spatial relationship between bedrock geology and discrete cluster membership stresses the importance of this variable both at the WNS scale and at a more local scale in NE Greenland.
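The fuzzy k-means (fuzzy c-means) idea used above assigns each sample a graded membership in every cluster rather than a single hard label. A compact NumPy sketch under assumed parameters (fuzzifier m = 2, synthetic stand-ins for the mineral composition data):

```python
# Fuzzy c-means: alternate between center updates and membership updates.
# Data and parameters are illustrative, not the WNS qXRD measurements.
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Return cluster centers and a fuzzy membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1
    for _ in range(n_iter):
        W = U ** m                                # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        p = 2.0 / (m - 1.0)
        # U_ik = 1 / sum_j (d_ik / d_jk)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return centers, U

# Two well-separated synthetic "composition" groups in 4 variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 4)), rng.normal(3, 0.3, (30, 4))])
centers, U = fuzzy_cmeans(X, c=2)
hard = U.argmax(axis=1)   # defuzzify for a crisp cluster map
```

Inspecting the rows of `U` shows how confidently each sample belongs to its cluster, which is what makes the fuzzy variant attractive for gradational sediment compositions.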
Abstract:
The data acquired by remote sensing systems make it possible to obtain thematic maps of the Earth's surface by classifying the registered images. This implies the identification and categorization of all pixels into land cover classes. Traditionally, methods based on statistical parameters have been widely used, although they show some disadvantages. Some authors indicate that methods based on artificial intelligence may be a good alternative. Thus, fuzzy classifiers, which are based on fuzzy logic, include additional information in the classification process through rule-based systems. In this work, we propose the use of a genetic algorithm (GA) to select the optimal and minimal set of fuzzy rules to classify remotely sensed images. The input information for the GA was obtained from the training space determined by two uncorrelated spectral bands (2-D scatter diagrams), which was irregularly divided by five linguistic terms defined in each band. The proposed methodology has been applied to Landsat-TM images and has shown that this set of rules provides a higher level of accuracy in the classification process.
Abstract:
Energy consumption in data centers is nowadays a critical concern because of its dramatic environmental and economic impact. Over the last years, several approaches have been proposed to tackle the energy/cost optimization problem, but most of them have failed to provide an analytical model targeting both the static and dynamic optimization domains of complex heterogeneous data centers. This paper proposes and solves an optimization problem for the energy-driven configuration of a heterogeneous data center. It also proposes a new mechanism for task allocation and workload distribution. The combination of both approaches outperforms previously published results in the field of energy minimization in heterogeneous data centers and opens a promising area of research.
Abstract:
Reducing the energy consumption for computation and cooling in servers is a major challenge considering the data center energy costs today. To ensure energy-efficient operation of servers in data centers, the relationship among computational power, temperature, leakage, and cooling power needs to be analyzed. By means of an innovative setup that enables monitoring and controlling the computing and cooling power consumption separately on a commercial enterprise server, this paper studies temperature-leakage-energy tradeoffs, obtaining an empirical model for the leakage component. Using this model, we design a controller that continuously seeks and settles at the optimal fan speed to minimize the energy consumption for a given workload. We run a customized dynamic load-synthesis tool to stress the system. Our proposed cooling controller achieves up to 9% energy savings and a 30 W reduction in peak power in comparison to the default cooling control scheme.
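The "seek and settle at the optimal fan speed" behavior can be illustrated with a toy hill-climbing controller. The power model below is an assumption for illustration only (leakage falls as the fan cools the chip, while fan power grows with speed), not the paper's empirical leakage model:

```python
# Toy extremum-seeking controller over an assumed convex power curve.
def total_power(fan_speed):
    leakage = 60.0 / (1.0 + 0.5 * fan_speed)  # cooler chip -> less leakage
    cooling = 2.0 * fan_speed ** 2            # fan power grows with speed
    return leakage + cooling

def seek_optimum(speed=1.0, step=0.1, n_steps=200):
    """Probe neighboring speeds; move toward lower total power, then settle."""
    for _ in range(n_steps):
        if total_power(speed + step) < total_power(speed):
            speed += step
        elif speed - step > 0 and total_power(speed - step) < total_power(speed):
            speed -= step
        else:
            step *= 0.5   # settle: shrink the search step near the optimum
    return speed

optimal = seek_optimum()
```

The real controller operates on measured power rather than a model, but the same logic applies: because leakage and cooling pull in opposite directions, total power has an interior minimum worth tracking at runtime.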
Abstract:
Presented here are femtosecond pump-probe studies on the water-solvated 7-azaindole dimer, a model DNA base pair. In particular, studies are presented that further elucidate the nature of the reactive and nonreactive dimers and also provide new insights establishing that the excited state double-proton transfer in the dimer occurs in a stepwise rather than a concerted manner. A major question addressed is whether the incorporation of a water molecule with the dimer results in the formation of species that are unable to undergo excited state double-proton transfer, as suggested by a recent study reported in the literature [Nakajima, A., Hirano, M., Hasumi, R., Kaya, K., Watanabe, H., Carter, C. C., Williamson, J. M. & Miller, T. (1997) J. Phys. Chem. 101, 392–398]. In contrast to this earlier work, our present findings reveal that both reactive and nonreactive dimers can coexist in the molecular beam under the same experimental conditions and definitively show that the clustering of water does not induce the formation of the nonreactive dimer. Rather, when present with a species already determined to be a nonreactive dimer, the addition of water can actually facilitate the occurrence of the proton transfer reaction. Furthermore, on attaining a critical hydration number, the data for the nonreactive dimer suggest a solvation-induced conformational structure change leading to proton transfer on the photoexcited half of the 7-azaindole dimer.
Abstract:
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data efficiently groups together genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of the many genes for which information is not currently available.
Abstract:
This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The proposed distance function for clustering c.p.d.f.’s is based on the Kolmogorov–Smirnov two-sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive, as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix the number of clusters in advance. The outputs of the algorithm are, for each cluster, the common c.p.d.f. of all observed data in the cluster (the centroid) and the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been applied to a large data set of solar global irradiation spectral distributions. The results obtained make it possible to reduce all the information of more than 270,000 c.p.d.f.’s to only 6 different clusters, corresponding to 6 different c.p.d.f.’s.
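The core idea, using the two-sample Kolmogorov–Smirnov statistic as a distance and opening a new cluster whenever no existing centroid is close enough, can be sketched as below. The threshold value and the data are illustrative assumptions, not the paper's parameters or irradiation spectra:

```python
# Adaptive clustering of empirical distributions via the KS statistic.
# Each "sample" is a 1-D array of observations from one population.
import numpy as np
from scipy.stats import ks_2samp

def adaptive_ks_clustering(samples, threshold=0.2):
    """Assign each sample to the cluster (pooled member data) it matches
    within `threshold` on the KS statistic; otherwise open a new cluster."""
    clusters, labels = [], []
    for s in samples:
        best, best_d = None, np.inf
        for i, members in enumerate(clusters):
            d = ks_2samp(s, np.concatenate(members)).statistic
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:   # no close centroid: add cluster
            clusters.append([s])
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(s)
        if best is not None and best_d <= threshold:
            labels.append(best)
    return labels, clusters

rng = np.random.default_rng(2)
data = [rng.normal(0, 1, 300) for _ in range(5)] + \
       [rng.normal(4, 1, 300) for _ in range(5)]
labels, clusters = adaptive_ks_clustering(data)
```

Because the KS statistic responds to differences in position, dispersion and shape alike, the same distance works without any parametric assumption about the underlying distributions, which is the property the paper exploits.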