971 results for Genetic clustering analysis
Abstract:
This paper proposes a methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks. The methodology uses clustering algorithms to group the buses into typical classes, each comprising a set of buses with similar LMP values. Two different clustering algorithms are used to determine the LMP clusters: the two-step and K-means algorithms. Adequacy measurement indices are used to evaluate the quality of the partition and to identify the best-performing algorithm. The paper includes a case study that uses a Locational Marginal Prices (LMP) database from the California ISO (CAISO) to identify zonal prices.
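As a rough illustration of the K-means step mentioned in this abstract, the sketch below groups buses by a single LMP value each. The bus names, the prices, the choice of two clusters and the deterministic initialisation are all assumptions made for the example; they do not come from the paper or from CAISO data.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread the starting
    # centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical LMP values in $/MWh, one per bus.
lmp = {"bus_a": 30.1, "bus_b": 31.0, "bus_c": 29.5,
       "bus_d": 55.2, "bus_e": 54.0, "bus_f": 56.3}
centroids, labels = kmeans_1d(list(lmp.values()), k=2)
zones = dict(zip(lmp, labels))  # bus name -> cluster index
```

Each cluster centroid can then be read as a candidate zonal price for the buses assigned to it.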
Abstract:
This paper studies the relationships between the chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the method.
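The first two stages described in this abstract (word frequency histograms, then a pairwise measure) can be sketched as follows. This is a minimal example under stated assumptions: words are taken as overlapping substrings, only the cosine measure is shown, the MDS stage is omitted, and the sequences are made-up toy strings.

```python
from collections import Counter
from math import sqrt


def word_histogram(sequence, word_length):
    """Relative frequencies of overlapping words of a fixed length."""
    words = [sequence[i:i + word_length]
             for i in range(len(sequence) - word_length + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms treated as sparse vectors."""
    dot = sum(h1.get(word, 0.0) * h2.get(word, 0.0) for word in set(h1) | set(h2))
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (norm1 * norm2)


# Toy sequences standing in for chromosomes (hypothetical data).
hist_a = word_histogram("ACGTACGTAC", 2)
hist_b = word_histogram("ACGTTTGTAC", 2)
similarity = cosine_similarity(hist_a, hist_b)
```

Computing this similarity for every pair of sequences gives the matrix that a multidimensional scaling routine would take as input.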
Abstract:
This paper studies musical works from the point of view of three mathematical tools: entropy, pseudo phase plane (PPP), and multidimensional scaling (MDS). The experiments analyze ten sets of different musical styles. First, for each musical composition, the PPP is produced using the time series lags captured by the average mutual information. Second, to unravel hidden relationships between the musical styles, the MDS technique is used. The MDS is calculated based on two alternative metrics obtained from the PPP, namely, the average mutual information and the fractal dimension. The results reveal significant differences in the musical styles, demonstrating the feasibility of the proposed strategy and motivating further developments towards a dynamical analysis of musical sounds.
Abstract:
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
Abstract:
Six open reading frames (ORFs) located on chromosome VII of Saccharomyces cerevisiae (YGR205w, YGR210c, YGR211w, YGR241c, YGR243w and YGR244c) were disrupted in two different genetic backgrounds using short-flanking homology (SFH) gene replacement. Sporulation and tetrad analysis showed that YGR211w, recently identified as the yeast ZPR1 gene, is an essential gene. The other five genes are non-essential, and no phenotypes could be associated with their inactivation. Two of these genes have recently been further characterized: YGR241c (YAP1802) encodes a yeast adaptor protein and YGR244c (LSC2) encodes the β-subunit of the succinyl-CoA ligase. For each ORF, a replacement cassette with long flanking regions homologous to the target locus was cloned in pUG7, and the cognate wild-type gene was cloned in pRS416.
Abstract:
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon the work non-commercially; the new works must also acknowledge the original work and be non-commercial.
Abstract:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon Portugal.
Abstract:
TPM Vol. 21, No. 4, December 2014, 435-447 – Special Issue © 2014 Cises.
Abstract:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, drawn from official statistics, shows its usefulness.
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
Abstract:
In data clustering, selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal focus on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, drawn from official statistics, shows its usefulness.
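To make the modelling assumption in this abstract concrete, here is a minimal sketch of plain EM for a finite mixture of categorical (single-draw multinomial) features. It is not the authors' integrated algorithm: it fixes the number of clusters, performs no feature selection and uses no message-length criterion. The data, the initial responsibilities and the smoothing constant `eps` are assumptions for illustration.

```python
from math import prod


def em_categorical_mixture(data, n_levels, responsibilities, iterations=50, eps=1e-6):
    """Plain EM for a mixture of independent categorical features.

    data: rows of integer category codes; n_levels[j]: number of categories
    of feature j; responsibilities: initial soft assignments (n rows, K columns).
    """
    n, d, k = len(data), len(n_levels), len(responsibilities[0])
    resp = [row[:] for row in responsibilities]
    for _ in range(iterations):
        # M-step: mixing weights and per-cluster category probabilities.
        weights = [sum(r[c] for r in resp) / n for c in range(k)]
        theta = []
        for c in range(k):
            per_feature = []
            for j in range(d):
                counts = [eps] * n_levels[j]  # small smoothing avoids zeros
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                total = sum(counts)
                per_feature.append([x / total for x in counts])
            theta.append(per_feature)
        # E-step: posterior probability of each cluster for each row.
        for i, row in enumerate(data):
            joint = [weights[c] * prod(theta[c][j][row[j]] for j in range(d))
                     for c in range(k)]
            total = sum(joint)
            resp[i] = [x / total for x in joint]
    return weights, theta, resp


# Hypothetical data: two binary features, two well-separated groups.
data = [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
initial = [[0.6, 0.4]] * 3 + [[0.4, 0.6]] * 3
weights, theta, resp = em_categorical_mixture(data, [2, 2], initial)
```

In the approach described above, the feature-relevance variables and the selection criterion would be estimated inside the same loop; this sketch shows only the clustering core.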
Abstract:
The paper formulates a genetic algorithm that evolves two types of objects in a plane. The fitness function promotes a relationship between the objects that is optimal when some kind of interface between them occurs. Furthermore, the algorithm adopts a hexagonal tessellation of the two-dimensional space to model neighbours efficiently. The genetic algorithm produces special patterns that resemble those revealed in percolation phenomena or in the symbiosis found in lichens. Besides the analysis of the spatial layout, the time evolution is modelled by adopting a distance measure and by modelling in the Fourier domain from the perspective of fractional calculus. The results reveal a consistent and easy-to-interpret set of model parameters for distinct operating conditions.
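One common way to model neighbours on a hexagonal tessellation is with axial coordinates, where every cell has exactly six neighbours. The sketch below uses that convention and counts adjacent pairs of cells holding different object types. The coordinate system, the `interface_count` helper and the toy grid are assumptions for illustration; the abstract does not specify the paper's fitness function.

```python
# Axial-coordinate offsets of the six neighbours of a hexagonal cell.
HEX_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbours(cell):
    """Return the six cells adjacent to `cell`, given as (q, r)."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_OFFSETS]


def interface_count(grid):
    """Count adjacent cell pairs that hold different object types.

    grid maps (q, r) -> object type; cells absent from grid are empty.
    """
    count = 0
    for cell, kind in grid.items():
        for other in neighbours(cell):
            if other in grid and grid[other] != kind:
                count += 1
    return count // 2  # every mixed pair was visited from both sides


# Hypothetical layout with two object types, "A" and "B".
grid = {(0, 0): "A", (1, 0): "B", (0, 1): "B", (-1, 0): "A"}
mixed_pairs = interface_count(grid)
```

A count of this kind could serve as one ingredient of a fitness function that rewards contact between the two object types.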
Abstract:
This paper focuses on the analysis of a demand response model in a smart grid context, considering a contingency scenario. A fuzzy clustering technique is applied to the developed demand response model, and an analysis is performed for the contingency scenario. Model considerations and architecture are described. The developed demand response model aims to support consumers' decisions regarding their consumption needs and possible economic benefits.
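The abstract does not name the fuzzy clustering technique. As one plausible instance, the sketch below implements standard fuzzy c-means on scalar values, in which each consumer belongs to every cluster with a degree of membership between 0 and 1. The load values, the starting centers and the fuzzifier `m = 2` are assumptions for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=50):
    """Fuzzy c-means on scalar data; returns (centers, memberships)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership step: closer centers receive a larger share; rows sum to 1.
        memberships = []
        for x in values:
            dists = [max(abs(x - c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                 for d_i in dists]
            )
        # Center step: mean of the data weighted by membership ** m.
        for i in range(len(centers)):
            w = [row[i] ** m for row in memberships]
            centers[i] = sum(wi * x for wi, x in zip(w, values)) / sum(w)
    return centers, memberships


# Hypothetical consumer loads in kW, with two assumed starting centers.
loads = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
centers, memberships = fuzzy_c_means(loads, centers=[0.0, 6.0])
```

The soft memberships are what distinguish this from K-means: a consumer with an intermediate load would be split between both groups rather than forced into one.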