940 results for genetic selection


Relevance:

40.00% 40.00%

Publisher:

Abstract:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as genetic programming (GP) and genetic algorithms (GA) are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey's HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloated expressions. FSALPS significantly outperformed canonical GP, ALPS, and some feature selection strategies reported in related literature on dimensionality reduction.
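The frequency-count idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the representation of a tree as a plain list of feature indices, and the additive `smoothing` term (which keeps unused features selectable) are all assumptions made for illustration, and raw counts are used in place of the ranking step the abstract mentions.

```python
import random
from collections import Counter


def feature_probabilities(population, n_features, smoothing=1.0):
    """Turn feature usage counts in a population into selection probabilities.

    population: list of trees, each given as a list of the feature
    indices (terminals) it uses. This flat representation is an
    assumption; real GP trees would be traversed to collect terminals.
    smoothing: hypothetical additive constant so that features absent
    from the population keep a non-zero probability.
    """
    counts = Counter(f for tree in population for f in tree)
    weights = [counts.get(i, 0) + smoothing for i in range(n_features)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_terminal(probs, rng):
    """Sample one feature index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    population = [[0, 0, 2], [2, 3], [2]]
    probs = feature_probabilities(population, n_features=4)
    rng = random.Random(0)
    print(probs, pick_terminal(probs, rng))
```

Sampling terminals this way biases newly built trees and sub-trees toward features that already appear often in the population, which is the feedback loop the abstract describes.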

Relevance:

40.00% 40.00%

Publisher:

Abstract:

A completely effective vaccine for malaria (one of the major infectious diseases worldwide) is not yet available; different membrane proteins involved in parasite-host interactions have been proposed as candidates for designing it. It has been found that proteins encoded by the merozoite surface protein (msp)-7 multigene family are antibody targets in natural infection; the nucleotide diversity of three Pvmsp-7 genes was thus analyzed in a Colombian parasite population. By contrast with P. falciparum msp-7 loci and ancestral P. vivax msp-7 genes, species-specific duplicates of the latter species display high genetic variability, generated by single nucleotide polymorphisms, repeat regions, and recombination. At least three major allele types are present in Pvmsp-7C, Pvmsp-7H and Pvmsp-7I, and positive selection seems to be operating on the central region of these msp-7 genes. Although this region has high genetic polymorphism, the C-terminus (Pfam domain ID: PF12948) is conserved and could be an important candidate when designing a subunit-based antimalarial vaccine.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Indirect and direct models of sexual selection make different predictions regarding the quantitative genetic relationships between sexual ornaments and fitness. Indirect models predict that ornaments should have a high heritability and that strong positive genetic covariance should exist between fitness and the ornament. Direct models, on the other hand, make no such assumptions about the level of genetic variance in fitness and the ornament, and are therefore likely to be more important when environmental sources of variation are large. Here we test these predictions in a wild population of the blue tit (Parus caeruleus), a species in which plumage coloration has been shown to be under sexual selection. Using 3 years of cross-fostering data from over 250 breeding attempts, we partition the covariance between parental coloration and aspects of nestling fitness into genetic and environmental components. Contrary to indirect models of sexual selection, but in agreement with direct models, we show that variation in coloration is only weakly heritable (h² < 0.11), and that two components of offspring fitness, nestling size and fledgling recruitment, are strongly dependent on parental effects rather than genetic effects. Furthermore, there was no evidence of significant positive genetic covariation between parental colour and offspring traits. Contrary to direct benefit models, however, we find little evidence that variation in colour reliably indicates the level of parental care provided by either males or females. Taken together, these results indicate that the assumptions of indirect models of sexual selection are not supported by the genetic basis of the traits reported on here.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper is concerned with the use of a genetic algorithm to select financial ratios for corporate distress classification models. For this purpose, the fitness value associated with a set of ratios is made to reflect the requirements of maximizing the amount of information available to the model and minimizing the collinearity between the model inputs. A case study involving 60 failed and continuing British firms in the period 1997-2000 is used for illustration. The classification model based on ratios selected by the genetic algorithm compares favorably with a model employing ratios usually found in the financial distress literature.
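The abstract does not give the fitness formula, so the following is only a sketch of one way such a trade-off could be scored. It assumes, purely for illustration, that "information" is measured as the mean absolute Pearson correlation between each selected ratio and the class label, that collinearity is the mean absolute pairwise correlation among the selected ratios, and that a hypothetical `penalty` weight balances the two.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def fitness(mask, columns, labels, penalty=1.0):
    """Score a chromosome: reward relevance, penalize collinearity.

    mask: 0/1 flags, one per candidate ratio (the GA chromosome).
    columns: one list of values per candidate ratio.
    labels: class label per firm (for example 1 = failed, 0 = continuing).
    penalty: hypothetical weight on the collinearity term.
    """
    chosen = [col for keep, col in zip(mask, columns) if keep]
    if not chosen:
        return float("-inf")  # an empty ratio set is never acceptable
    relevance = sum(abs(pearson(c, labels)) for c in chosen) / len(chosen)
    pairs = list(combinations(chosen, 2))
    collinearity = (
        sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    return relevance - penalty * collinearity
```

Under these assumptions, a chromosome that selects two perfectly correlated ratios scores lower than one that keeps only one of them, which is the behaviour the stated fitness requirements call for.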

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This work investigates the problem of feature selection for neuroimaging features from structural MRI brain images, for the classification of subjects as healthy controls or as suffering from Mild Cognitive Impairment or Alzheimer's Disease. A Genetic Algorithm wrapper method for feature selection is adopted in conjunction with a Support Vector Machine classifier. In very large feature sets, feature selection is found to be redundant, as the accuracy is often worse than that of a Support Vector Machine with no feature selection. However, when just the hippocampal subfields are used, feature selection shows a significant improvement in classification accuracy. Three-class Support Vector Machines and two-class Support Vector Machines combined with weighted voting are also compared with the former and found more useful. The highest accuracy achieved at classifying the test data was 65.5%, using a genetic algorithm for feature selection with a three-class Support Vector Machine classifier.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Selecting a set of features that is optimal for a given task is a problem that plays an important role in a wide variety of contexts, including pattern recognition, image understanding and machine learning. The concept of reduction of the decision table based on rough sets is very useful for feature selection. In this paper, a genetic algorithm based approach is presented to search for the relative reduct of the rough set decision table. This approach can accommodate multiple criteria, such as accuracy and cost of classification, in the feature selection process and finds an effective feature subset for texture classification. On the basis of the selected feature subset, this paper presents a method to extract objects that are higher than their surroundings, such as trees or forest, in color aerial images. The experimental results show that the selected feature subset and the object extraction method presented in this paper are practical and effective.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Different data classification algorithms have been developed and applied in various areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them performs consistently well over all datasets. To this end, ensemble methods have been suggested as a promising approach. This paper proposes a novel hybrid algorithm, which combines a multi-objective Genetic Algorithm (GA) and an ensemble classifier. The ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier, and a Support Vector Machine (SVM) classifier, is used as the classification committee, while the multi-objective Genetic Algorithm is employed as the feature selector, helping the ensemble classifier improve the overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets, and compared with each individual classifier as well as with methods based on mutual information theory, bagging and boosting. The results suggest that the GA-Ensemble method outperforms the other algorithms in the comparison, and is a useful method for classification and feature selection problems.
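How a GA-chosen feature mask and a classification committee fit together can be sketched as follows. The abstract does not say how the committee combines its members' outputs, so simple majority voting is assumed here; `committee_predict` and the callable-classifier interface are hypothetical names introduced for illustration.

```python
from collections import Counter


def committee_predict(classifiers, sample, mask):
    """Predict a label by majority vote over a committee of classifiers.

    classifiers: callables that map a reduced feature list to a label
    (stand-ins for the decision tree, ANN and SVM members).
    sample: full feature vector of one sample.
    mask: 0/1 flags from the GA chromosome marking the selected features.
    Majority voting is an assumption; ties go to the label seen first.
    """
    reduced = [value for keep, value in zip(mask, sample) if keep]
    votes = [clf(reduced) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```

In a GA loop, the accuracy of `committee_predict` over a validation set would serve as one objective for each candidate mask, with the number of selected features as a natural second objective.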

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Here, we report for the first time, to our knowledge, a strong correlation between a measure of individual genetic diversity and song complexity, a sexually selected male trait in sedge warblers, Acrocephalus schoenobaenus. We also find that females prefer to mate with males who will maximize this diversity in individual progeny. The genetic diversity of each offspring is further increased by means of nonrandom fertilization, as we also show that the fertilizing sperm contains a haplotype more genetically distant to that of the egg than expected by chance. These findings suggest that species' mating preferences may be subject to fine tuning aimed at increasing offspring viability through increased genetic diversity. This includes external and internal mechanisms of selection, even within the ejaculate of a single male.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Because selection is often sex-dependent, alleles can have positive effects on fitness in one sex and negative effects in the other, resulting in intralocus sexual conflict. Evolutionary theory predicts that intralocus sexual conflict can drive the evolution of sex limitation, sex-linkage, and sex chromosome differentiation. However, evidence that sex-dependent selection results in sex-linkage is limited. Here, we formally partition the contribution of Y-linked and non-Y-linked quantitative genetic variation in coloration, tail, and body size of male guppies (Poecilia reticulata), traits previously implicated as sexually antagonistic. We show that these traits are strongly genetically correlated, both on and off the Y chromosome, but that these correlations differ in sign and magnitude between the two parts of the genome. As predicted, variation in attractiveness was found to be associated with the Y-linked, rather than with the non-Y-linked, component of genetic variation in male ornamentation. These findings show how the evolution of Y-linkage may be able to resolve sexual conflict. More generally, they provide unique insight into how sex-specific selection has the potential to differentially shape the genetic architecture of fitness traits across different parts of the genome.