783 resultados para grid, clustering, statistical, clustering
Resumo:
Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
Resumo:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
Resumo:
The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of datastructure is very common in consumer studies where a panel of consumers is asked to assess the global liking of a certain number of products and then, preference scores are arranged in a two-way table Y. External information on both products (physicalchemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z respectively. The aim of this method is to automatically provide a consumer segmentation where all the three matrices play an active role in the classification, getting homogeneous groups from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the product chemical compounds, sensory evaluation and consumer socio-demographic information, purchase behaviour and consumption habits.
Resumo:
The intensity of regional specialization in specific activities, and conversely, the level of industrial concentration in specific locations, has been used as a complementary evidence for the existence and significance of externalities. Additionally, economists have mainly focused the debate on disentangling the sources of specialization and concentration processes according to three vectors: natural advantages, internal, and external scale economies. The arbitrariness of partitions plays a key role in capturing these effects, while the selection of the partition would have to reflect the actual characteristics of the economy. Thus, the identification of spatial boundaries to measure specialization becomes critical, since most likely the model will be adapted to different scales of distance, and be influenced by different types of externalities or economies of agglomeration, which are based on the mechanisms of interaction with particular requirements of spatial proximity. This work is based on the analysis of the spatial aspect of economic specialization supported by the manufacturing industry case. The main objective is to propose, for discrete and continuous space: i) a measure of global specialization; ii) a local disaggregation of the global measure; and iii) a spatial clustering method for the identification of specialized agglomerations.
Resumo:
Hybrid vehicles represent the future for automakers, since they allow to improve the fuel economy and to reduce the pollutant emissions. A key component of the hybrid powertrain is the Energy Storage System, that determines the ability of the vehicle to store and reuse energy. Though electrified Energy Storage Systems (ESS), based on batteries and ultracapacitors, are a proven technology, Alternative Energy Storage Systems (AESS), based on mechanical, hydraulic and pneumatic devices, are gaining interest because they give the possibility of realizing low-cost mild-hybrid vehicles. Currently, most literature of design methodologies focuses on electric ESS, which are not suitable for AESS design. In this contest, The Ohio State University has developed an Alternative Energy Storage System design methodology. This work focuses on the development of driving cycle analysis methodology that is a key component of Alternative Energy Storage System design procedure. The proposed methodology is based on a statistical approach to analyzing driving schedules that represent the vehicle typical use. Driving data are broken up into power events sequence, namely traction and braking events, and for each of them, energy-related and dynamic metrics are calculated. By means of a clustering process and statistical synthesis methods, statistically-relevant metrics are determined. These metrics define cycle representative braking events. By using these events as inputs for the Alternative Energy Storage System design methodology, different system designs are obtained. Each of them is characterized by attributes, namely system volume and weight. In the last part the work, the designs are evaluated in simulation by introducing and calculating a metric related to the energy conversion efficiency. Finally, the designs are compared accounting for attributes and efficiency values. In order to automate the driving data extraction and synthesis process, a specific script Matlab based has been developed. Results show that the driving cycle analysis methodology, based on the statistical approach, allows to extract and synthesize cycle representative data. The designs based on cycle statistically-relevant metrics are properly sized and have satisfying efficiency values with respect to the expectations. An exception is the design based on the cycle worst-case scenario, corresponding to same approach adopted by the conventional electric ESS design methodologies. In this case, a heavy system with poor efficiency is produced. The proposed new methodology seems to be a valid and consistent support for Alternative Energy Storage System design.
Resumo:
There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints. Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.
Resumo:
Il task del data mining si pone come obiettivo l'estrazione automatica di schemi significativi da grandi quantità di dati. Un esempio di schemi che possono essere cercati sono raggruppamenti significativi dei dati, si parla in questo caso di clustering. Gli algoritmi di clustering tradizionali mostrano grossi limiti in caso di dataset ad alta dimensionalità, composti cioè da oggetti descritti da un numero consistente di attributi. Di fronte a queste tipologie di dataset è necessario quindi adottare una diversa metodologia di analisi: il subspace clustering. Il subspace clustering consiste nella visita del reticolo di tutti i possibili sottospazi alla ricerca di gruppi signicativi (cluster). Una ricerca di questo tipo è un'operazione particolarmente costosa dal punto di vista computazionale. Diverse ottimizzazioni sono state proposte al fine di rendere gli algoritmi di subspace clustering più efficienti. In questo lavoro di tesi si è affrontato il problema da un punto di vista diverso: l'utilizzo della parallelizzazione al fine di ridurre il costo computazionale di un algoritmo di subspace clustering.
Resumo:
In questo lavoro di tesi si è studiato il clustering degli ammassi di galassie e la determinazione della posizione del picco BAO per ottenere vincoli sui parametri cosmologici. A tale scopo si è implementato un codice per la stima dell'errore tramite i metodi di jackknife e bootstrap. La misura del picco BAO confrontata con i modelli cosmologici, grazie all'errore stimato molto piccolo, è risultato in accordo con il modelli LambdaCDM, e permette di ottenere vincoli su alcuni parametri dei modelli cosmologici.
Resumo:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines from chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosacoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines from epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to uncover new cancer subtypes.
Resumo:
In recent years, enamel matrix derivative (EMD) has garnered much interest in the dental field for its apparent bioactivity that stimulates regeneration of periodontal tissues including periodontal ligament, cementum and alveolar bone. Despite its widespread use, the underlying cellular mechanisms remain unclear and an understanding of its biological interactions could identify new strategies for tissue engineering. Previous in vitro research has demonstrated that EMD promotes premature osteoblast clustering at early time points. The aim of the present study was to evaluate the influence of cell clustering on vital osteoblast cell-cell communication and adhesion molecules, connexin 43 (cx43) and N-cadherin (N-cad) as assessed by immunofluorescence imaging, real-time PCR and Western blot analysis. In addition, differentiation markers of osteoblasts were quantified using alkaline phosphatase, osteocalcin and von Kossa staining. EMD significantly increased the expression of connexin 43 and N-cadherin at early time points ranging from 2 to 5 days. Protein expression was localized to cell membranes when compared to control groups. Alkaline phosphatase activity was also significantly increased on EMD-coated samples at 3, 5 and 7 days post seeding. Interestingly, higher activity was localized to cell cluster regions. There was a 3 fold increase in osteocalcin and bone sialoprotein mRNA levels for osteoblasts cultured on EMD-coated culture dishes. Moreover, EMD significantly increased extracellular mineral deposition in cell clusters as assessed through von Kossa staining at 5, 7, 10 and 14 days post seeding. We conclude that EMD up-regulates the expression of vital osteoblast cell-cell communication and adhesion molecules, which enhances the differentiation and mineralization activity of osteoblasts. These findings provide further support for the clinical evidence that EMD increases the speed and quality of new bone formation in vivo.