113 results for evolutionary genetics


Relevance: 20.00%

Abstract:

One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around these drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
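The systematic (repetitive) baseline described in this abstract can be illustrated with a minimal sketch: run k-means for every candidate number of clusters and several random initializations, then keep the best partition. The abstract does not name the quality criterion, so the silhouette score used here is an assumption, as are the function names `kmeans`, `silhouette`, and `systematic_search`.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means from a random initialization; returns labels."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Average silhouette width (assumed criterion, not taken from the paper)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def systematic_search(points, k_values, restarts, seed=0):
    """Try every k and several restarts; return (score, k, labels) of the best run."""
    rng = random.Random(seed)
    best = (-2.0, None, None)
    for k in k_values:
        for _ in range(restarts):
            labels = kmeans(points, k, rng)
            score = silhouette(points, labels)
            if score > best[0]:
                best = (score, k, labels)
    return best
```

The cost of this baseline grows with the number of candidate values of k times the number of restarts, which is the overhead the evolutionary approach in the abstract aims to reduce.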

Relevance: 20.00%

Abstract:

This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

This paper sets out to show that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e., repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms of interest are provided. Examples with computational experiments and statistical analyses are also presented.
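The abstract does not say which fuzzy local search is used. As an illustration of what a fuzzy counterpart of k-means looks like, here is a minimal sketch of the standard fuzzy c-means updates, where each point receives a membership degree in every cluster. The function name `fuzzy_c_means` and its parameters (`m`, `tol`, `seed`) are assumptions for this sketch, not details from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j,
    and each row sums to 1. The fuzzifier m must be greater than 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships with rows normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append(
                tuple(sum(w[i] * points[i][d] for i in range(n)) / tot
                      for d in range(dim))
            )
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # The point coincides with a center: full membership there.
                new = [1.0 if x == 0.0 else 0.0 for x in d]
            else:
                new = [x ** (-2.0 / (m - 1.0)) for x in d]
            s = sum(new)
            new = [v / s for v in new]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:
            break
    return centers, u
```

When the number of clusters is unknown, a systematic approach would call such a routine for every candidate `c` and compare the results with a fuzzy validity criterion.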

Relevance: 20.00%

Abstract:

Support vector machines (SVMs) were originally formulated for the solution of binary classification problems. In multiclass problems, a decomposition approach is often employed, in which the multiclass problem is divided into multiple binary subproblems, whose results are combined. Generally, the performance of SVM classifiers is affected by the selection of values for their parameters. This paper investigates the use of genetic algorithms (GAs) to tune the parameters of the binary SVMs in common multiclass decompositions. The developed GA may search for a set of parameter values common to all binary classifiers or for differentiated values for each binary classifier. (C) 2008 Elsevier B.V. All rights reserved.
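A minimal sketch of how a GA can search a parameter space is given below. It is not the GA developed in the paper: the operators (tournament selection, blend crossover, Gaussian mutation, elitism) and all names are assumptions. The `fitness` callable is a placeholder; in the setting of the abstract it would train the binary SVMs with the candidate parameters and return a validation score. A chromosome holds one gene per tuned parameter, so a common setting for all binary classifiers needs a short chromosome, while differentiated values need one group of genes per binary classifier.

```python
import random


def tune_with_ga(fitness, bounds, pop_size=20, generations=40,
                 mutation_sigma=0.3, seed=0):
    """Maximize `fitness` over real-valued genes restricted to `bounds`.

    Returns (best_individual, history), where history holds the best
    fitness seen in each generation plus the final one.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                a = rng.random()
                # Blend crossover followed by Gaussian mutation.
                gene = a * g1 + (1 - a) * g2 + rng.gauss(0.0, mutation_sigma)
                child.append(max(lo, min(hi, gene)))  # clip to the bounds
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical stand-in for a validation score, peaking at
# log10(C) = 1 and log10(gamma) = -2. A real fitness would train SVMs.
def toy_fitness(genes):
    log_c, log_gamma = genes
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best, history = tune_with_ga(toy_fitness, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

Searching in log space is a common choice for SVM parameters such as C and the kernel width, because useful values span several orders of magnitude.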

Relevance: 20.00%

Abstract:

There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined, creating a pool of rules, and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was analysed experimentally, and the results show an improvement in classification performance, measured as the area under the receiver operating characteristic (ROC) curve.
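The first step of the approach, building several balanced data sets from all minority cases plus a random sample of majority cases, can be sketched as follows. The function name `balanced_subsets` and the choice to draw as many majority cases as there are minority cases are assumptions; the abstract does not give the sampling ratio.

```python
import random


def balanced_subsets(examples, labels, minority_label, n_subsets, seed=0):
    """Build n_subsets balanced data sets as lists of (example, label) pairs.

    Every subset contains all minority cases and an equally sized random
    sample (without replacement) of the majority cases. Assumes there are
    at least as many majority cases as minority cases.
    """
    rng = random.Random(seed)
    pairs = list(zip(examples, labels))
    minority = [pair for pair in pairs if pair[1] == minority_label]
    majority = [pair for pair in pairs if pair[1] != minority_label]
    subsets = []
    for _ in range(n_subsets):
        subset = minority + rng.sample(majority, len(minority))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets
```

Because each subset draws a different majority sample, the union of the subsets covers more majority cases than a single undersampled data set would, which is the reduction in discarded information the abstract refers to.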

Relevance: 20.00%

Abstract:

Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to gain confidence in the prediction and providing the basis for new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the globally optimal solution. In this paper, we propose a novel algorithm based on the evolutionary algorithms paradigm as an alternative heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
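To make the notion of a model tree concrete, here is a minimal sketch of the data structure: internal nodes test one feature against a threshold, and each leaf holds a linear regression model instead of a constant. The class names `Leaf` and `Split` and the example coefficients are hypothetical; this shows only how such a tree predicts, not how E-Motion evolves it.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Leaf of a model tree: a linear model over the input features."""
    intercept: float
    coefs: List[float]

    def predict(self, x):
        return self.intercept + sum(c * v for c, v in zip(self.coefs, x))


@dataclass
class Split:
    """Internal node: route x to the left or right subtree by one feature."""
    feature: int
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x):
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)


Node = Union[Leaf, Split]

# Hypothetical tree: y = 1 + 2*x0 when x0 <= 5, otherwise y = -x0.
tree = Split(feature=0, threshold=5.0,
             left=Leaf(intercept=1.0, coefs=[2.0]),
             right=Leaf(intercept=0.0, coefs=[-1.0]))

low = tree.predict([3.0])   # follows the left branch
high = tree.predict([6.0])  # follows the right branch
```

The path of tests leading to a leaf, together with that leaf's linear model, is what makes the prediction interpretable to an end-user.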

Relevance: 20.00%

Abstract:

Schistosoma mansoni is a well-adapted blood-dwelling parasitic helminth, persisting for decades in its human host despite being continually exposed to potential immune attack. Here, we describe in detail micro-exon genes (MEG) in S. mansoni, some present in multiple copies, which represent a novel molecular system for creating protein variation through the alternate splicing of short (<= 36 bp) symmetric exons organized in tandem. Analysis of three closely related copies of one MEG family allowed us to trace several evolutionary events and propose a mechanism for micro-exon generation and diversification. Microarray experiments show that the majority of MEGs are up-regulated in life cycle stages associated with establishment in the mammalian host after skin penetration. Sequencing of RT-PCR products allowed the description of several alternate splice forms of micro-exon genes, highlighting the potential use of these transcripts to generate a complex pool of protein variants. We obtained direct evidence for the existence of such pools by proteomic analysis of secretions from migrating schistosomula and mature eggs. Whole-mount in situ hybridization and immunolocalization showed that MEG transcripts and proteins were restricted to glands or epithelia exposed to the external environment. The ability of schistosomes to produce a complex pool of variant proteins aligns them with the other major groups of blood parasites, but using a completely different mechanism. We believe that our data open a new chapter in the study of immune evasion by schistosomes, and their ability to generate variant proteins could represent a significant obstacle to vaccine development.

Relevance: 20.00%

Abstract:

It has been postulated that noncoding RNAs (ncRNAs) are involved in the posttranscriptional control of gene expression, and may have contributed to the emergence of the complex attributes observed in mammals. We show here that the complement of ncRNAs expressed from intronic regions of the human and mouse genomes comprises at least 78,147 and 39,660 transcriptional units, respectively. To identify conserved intronic sequences expressed in both humans and mice, we used custom-designed human cDNA microarrays to separately interrogate RNA from mouse and human liver, kidney, and prostate tissues. An overlapping tissue expression signature was detected for both species, comprising 198 transcripts; among these, 22 RNAs map to intronic regions with evidence of evolutionary conservation in humans and mice. Transcription of selected human-mouse intronic ncRNAs was confirmed using strand-specific RT-PCR. Altogether, these results support an evolutionarily conserved role of intronic ncRNAs in human and mouse, which are likely to be involved in the fine-tuning of gene expression regulation in different mammalian tissues. (C) 2008 Elsevier Inc. All rights reserved.