936 results for Cluster Analysis of Variables
Abstract:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Abstract:
We describe a network module detection approach which combines a rapid and robust clustering algorithm with an objective measure of the coherence of the modules identified. The approach is applied to the network of genetic regulatory interactions surrounding the tumor suppressor gene p53. This algorithm identifies ten clusters in the p53 network, which are visually coherent and biologically plausible.
Abstract:
This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions from k-means analysis because objects may belong to more than one cluster. Traditional clustering methods generate extensional descriptions of groups, which show which objects are members of each cluster. Clustering techniques based on rough set theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.
Abstract:
The paper addresses the cluster analysis of a given set of objects on the basis of the information contained in the description table of these objects. Various methods of cluster analysis are briefly considered. A heuristic method and rules for classifying the given set of objects are presented for cases in which neither the division into classes nor the number of classes is known. The algorithm is checked on a test example and on two program products (PP): learning systems and software for company management. An analysis of the results is presented.
Abstract:
Background: The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. New method: We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). Results: After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership.
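The final clustering step of this pipeline, k-means started from externally supplied centroids, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy data are hypothetical, the seed centroids simply stand in for whatever the GA stage would produce, and the EMD denoising, Stability Index and PCA steps are not reproduced.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means started from the given centroids (assumed to come
    from an earlier seeding stage, e.g. a genetic algorithm)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups and two seed centroids.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
```

Because k-means only finds a local optimum, the quality of the seeds matters, which is presumably why the pipeline dedicates a separate stage to choosing them.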
Abstract:
Rock magnetic, biochemical and inorganic records of the sediment cores PG1351 and Lz1024 from Lake El’gygytgyn, Chukotka peninsula, Far East Russian Arctic, were subject to a hierarchical agglomerative cluster analysis in order to refine and extend the pattern of climate modes as defined by Melles et al. (2007). Cluster analysis of the data obtained from both cores yielded similar results, differentiating clearly between the four climate modes warm, peak warm, cold and dry, and cold and moist. In addition, two transitional phases were identified, representing the early stages of a cold phase and slightly colder conditions during a warm phase. The statistical approach can thus be used to resolve gradual changes in the sedimentary units as an indicator of available oxygen in the hypolimnion in greater detail. Based upon cluster analyses on core Lz1024, the published succession of climate modes in core PG1351, covering the last 250 ka, was modified and extended back to 350 ka. Comparison to the marine oxygen isotope (δ18O) stack LR04 (Lisiecki and Raymo, 2005) and the summer insolation at 67.5° N, with the extended Lake El’gygytgyn parameter records of magnetic susceptibility (κLF), total organic carbon content (TOC) and the chemical index of alteration (CIA; Minyuk et al., 2007), revealed that all stages back to marine isotope stage (MIS) 10 and most of the substages are clearly reflected in the pattern derived from the cluster analysis.
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
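The assignment rule described above can be sketched for the simplest case, a univariate mixture of g normal components fitted by EM. This is only an illustration under stated assumptions: the function names, the toy data and the initial means are hypothetical, and the variance floor is an ad hoc safeguard rather than part of the approach in the abstract.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(data, weights, means, variances):
    """Estimated posterior probability of each component for each observation."""
    result = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        result.append([d / total for d in dens])
    return result


def fit_normal_mixture(data, initial_means, n_iter=30):
    """EM for a mixture with g = len(initial_means) normal components."""
    g = len(initial_means)
    weights = [1.0 / g] * g
    means = list(initial_means)
    variances = [1.0] * g
    for _ in range(n_iter):
        post = posteriors(data, weights, means, variances)  # E-step
        for i in range(g):  # M-step
            n_i = sum(p[i] for p in post)
            if n_i == 0:
                continue  # component received no mass; leave it unchanged
            weights[i] = n_i / len(data)
            means[i] = sum(p[i] * x for p, x in zip(post, data)) / n_i
            var = sum(p[i] * (x - means[i]) ** 2 for p, x in zip(post, data)) / n_i
            variances[i] = max(var, 1e-3)  # ad hoc floor to avoid degenerate components
    return weights, means, variances


def assign_clusters(data, weights, means, variances):
    """Outright clustering: each observation goes to the component with the
    highest estimated posterior probability."""
    post = posteriors(data, weights, means, variances)
    return [max(range(len(p)), key=lambda i: p[i]) for p in post]


# Hypothetical toy data with two obvious groups.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = fit_normal_mixture(data, initial_means=[0.0, 6.0])
labels = assign_clusters(data, weights, means, variances)
```

The posterior probabilities themselves give a soft clustering; the outright partition is obtained only at the last step, by taking the most probable component for each observation.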
Abstract:
The cork stopper manufacturing process includes an operation, known as stabilisation, during which humid cork slabs are extensively colonised by fungi. The effects of fungal growth on cork are not yet completely understood, and they are considered to be involved in the so-called “cork taint” of bottled wine. It is essential to identify the environmental constraints that determine which colonising fungal species appear, and to trace their origin to the forest and/or to residents of the manufacturing space. The present article correlates two sets of data, from consecutive years and the same season, obtained by systematic biological sampling of two manufacturing units located in the North and South of Portugal. Chrysonilia sitophila dominance was identified, followed by a high diversity of Penicillium species. Penicillium glabrum, found in all samples, was the most frequently isolated species. P. glabrum intra-species variability was investigated using DNA fingerprinting techniques, revealing highly discriminative polymorphic markers in the genome. Cluster analysis of P. glabrum data was discussed in relation to the geographical location of strains, and results suggest that P. glabrum arises predominantly from the manufacturing space, although cork resident fungi can also contrib
Abstract:
Case-crossover is one of the most used designs for analyzing the health-related effects of air pollution. Nevertheless, no one has reviewed its application and methodology in this context. Objective: We conducted a systematic review of case-crossover (CCO) designs used to study the relationship between air pollution and morbidity and mortality, from the standpoint of methodology and application. Data sources and extraction: A search was made of the MEDLINE and EMBASE databases. Reports were classified as methodologic or applied. From the latter, the following information was extracted: author, study location, year, type of population (general or patients), dependent variable(s), independent variable(s), type of CCO design, and whether effect modification was analyzed for variables at the individual level. Data synthesis: The review covered 105 reports that fulfilled the inclusion criteria. Of these, 24 addressed methodological aspects, and the remainder involved the design’s application. In the methodological reports, the designs that yielded the best results in simulation were symmetric bidirectional CCO and time-stratified CCO. Furthermore, we observed an increase across time in the use of certain CCO designs, mainly symmetric bidirectional and time-stratified CCO. The dependent variables most frequently analyzed were those relating to hospital morbidity; the pollutants most often studied were those linked to particulate matter. Among the CCO-application reports, 13.6% studied effect modification for variables at the individual level. Conclusions: The use of CCO designs has undergone considerable growth; the most widely used designs were those that yielded better results in simulation studies: symmetric bidirectional and time-stratified CCO. However, the advantages of CCO as a method of analysis of variables at the individual level are put to little use.
Abstract:
The transcriptional effects of deregulated myc gene overexpression are implicated in tumorigenesis in a spectrum of experimental and naturally occurring neoplasms. In follicles of the chicken bursa of Fabricius, myc induction of B-cell neoplasia requires a target cell population present during early bursal development and progresses through preneoplastic transformed follicles to metastatic lymphomas. We developed a chicken immune system cDNA microarray to analyze broad changes in gene expression that occur during normal embryonic B-cell development and during myc-induced neoplastic transformation in the bursa. The number of mRNAs showing at least 3-fold change was greater during myc-induced lymphomagenesis than during normal development, and hierarchical cluster analysis of expression patterns revealed that levels of several hundred mRNAs varied in concert with levels of myc overexpression. A set of 41 mRNAs were most consistently elevated in myc-overexpressing preneoplastic and neoplastic cells, most involved in processes thought to be subject to regulation by Myc. The mRNAs for another cluster of genes were overexpressed in neoplasia independent of myc expression level, including a small subset with the expression signature of embryonic bursal lymphocytes. Overexpression of myc, and some of the genes overexpressed with myc, may be important for generation of preneoplastic transformed follicles. However, expression profiles of late metastatic tumors showed a large variation in concert with myc expression levels, and some showed minimal myc overexpression. Therefore, high-level myc overexpression may be more important in the early induction of these lymphomas than in maintenance of late-stage metastases.
Abstract:
The urban heat island effect is often associated with large metropolises. However, in the Netherlands even small cities will be affected by the phenomenon in the future (Hove et al., 2011), owing to the dispersed or mosaic urbanisation patterns, particularly in the southern part of the country: the province of North Brabant. This study analyses the average nighttime land surface temperature (LST) of 21 North Brabant urban areas through 22 satellite images retrieved by MODIS 11A1 during the 2006 heat wave and uses Landsat 5 Thematic Mapper to map albedo and normalized difference vegetation index (NDVI) values. Albedo, NDVI and imperviousness are found to play the most relevant role in the increase of nighttime LST. The surface cover cluster analysis of these three parameters reveals that the 12 “urban living environment” categories used in the region of North Brabant can actually be reduced to 7 categories. This simplifies the design guidelines for improving the surface thermal behaviour of the different neighbourhoods, thus reducing the Urban Heat Island (UHI) effect in existing medium-size cities and in future developments adjacent to those cities.