4 resultados para Population-based survey

em Brock University, Canada


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis introduces the Salmon Algorithm, a search meta-heuristic which can be used for a variety of combinatorial optimization problems. This algorithm is loosely based on the path finding behaviour of salmon swimming upstream to spawn. There are a number of tunable parameters in the algorithm, so experiments were conducted to find the optimum parameter settings for different search spaces. The algorithm was tested on one instance of the Traveling Salesman Problem and found to have superior performance to an Ant Colony Algorithm and a Genetic Algorithm. It was then tested on three coding theory problems - optimal edit codes, optimal Hamming distance codes, and optimal covering codes. The algorithm produced improvements on the best known values for five of six of the test cases using edit codes. It matched the best known results on four out of seven of the Hamming codes as well as three out of three of the covering codes. The results suggest the Salmon Algorithm is competitive with established guided random search techniques, and may be superior in some search spaces.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Human endogenous retroviruses (HERVs) are the result of ancient germ cell infections of human germ cells by exogenous retroviruses. HERVs belong to the long terminal repeat (LTR) group of retrotransposons that comprise ~8% of the human genome. The majority of the HERVs documented have been truncated and/or incurred lethal mutations and no longer encode functional genes; however a very small number of HERVs seem to maintain functional in making new copies by retrotranspositon as suggested by the identification of a handful of polymorphic HERV insertions in human populations. The objectives of this study were to identify novel insertion of HERVs via analysis of personal genomic data and survey the polymorphism levels of new and known HERV insertions in the human genome. Specifically, this study involves the experimental validation of polymorphic HERV insertion candidates predicted by personal genome-based computation prediction and survey the polymorphism level within the human population based on a set of 30 diverse human DNA samples. Based on computational analysis of a limited number of personal genome sequences, PCR genotyping aided in the identification of 15 dimorphic, 2 trimorphic and 5 fixed full-length HERV-K insertions not previously investigated. These results suggest that the proliferation rate of HERVKs, perhaps also other ERVs, in the human genome may be much higher than we previously appreciated and the recently inserted HERVs exhibit a high level of instability. Throughout this study we have observed the frequent presence of additional forms of genotypes for these HERV insertions, and we propose for the first time the establishment of new genotype reporting nomenclature to reflect all possible combinations of the pre-integration site, solo-LTR and full-length HERV alleles.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and determinis- tic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel meta–heuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS meta–heuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.