21 results for Selection Algorithms
at Brock University, Canada
Abstract:
The curse of dimensionality is a major problem in machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as genetic programming (GP) and genetic algorithms (GA) are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloat expressions. 
FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.
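The frequency-count idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the flat list-of-terminals tree representation, the additive `smoothing` term, and all function names below are assumptions made for illustration only.

```python
import random
from collections import Counter


def feature_frequencies(population):
    """Count how often each feature appears as a terminal in the population.

    Assumption: each GP tree is represented only by the flat list of
    feature names found at its leaves.
    """
    counts = Counter()
    for tree_terminals in population:
        counts.update(tree_terminals)
    return counts


def selection_probabilities(counts, all_features, smoothing=1.0):
    """Translate feature counts into selection probabilities.

    Assumption: additive smoothing keeps features that currently do not
    appear in any tree selectable with a small probability.
    """
    weights = {f: counts.get(f, 0) + smoothing for f in all_features}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


def pick_terminal(probs, rng):
    """Draw one feature for a new tree leaf, biased by the probabilities."""
    features = list(probs)
    return rng.choices(features, weights=[probs[f] for f in features], k=1)[0]


# Hypothetical population of two trees over four candidate features.
population = [["x1", "x2", "x1"], ["x1", "x3"]]
all_features = ["x1", "x2", "x3", "x4"]
probs = selection_probabilities(feature_frequencies(population), all_features)
leaf = pick_terminal(probs, random.Random(0))
```

In this sketch, features that occur often in evolved trees become more likely to be chosen when new trees or sub-trees are built, which mirrors the feedback loop the abstract describes.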
Abstract:
Many real-world optimization problems contain multiple (often conflicting) goals to be optimized concurrently, commonly referred to as multi-objective problems (MOPs). Over the past few decades, a plethora of multi-objective algorithms have been proposed, often tested on MOPs possessing two or three objectives. Unfortunately, when tasked with solving MOPs with four or more objectives, referred to as many-objective problems (MaOPs), a large majority of optimizers experience significant performance degradation. The downfall of these optimizers is that simultaneously maintaining a well-spread set of solutions along with appropriate selection pressure to converge becomes difficult as the number of objectives increases. This difficulty is further compounded for large-scale MaOPs, i.e., MaOPs possessing large numbers of decision variables. In this thesis, we explore the challenges of many-objective optimization and propose three new algorithms designed to efficiently solve MaOPs. Experimental results demonstrate that the proposed optimizers perform very well, often outperforming state-of-the-art many-objective algorithms.
Abstract:
This research attempted to address the question of the role of explicit algorithms and episodic contexts in the acquisition of computational procedures for regrouping in subtraction. Three groups of students having difficulty learning to subtract with regrouping were taught procedures for doing so through either an explicit algorithm, an episodic content or an examples approach. It was hypothesized that the use of an explicit algorithm represented in a flow chart format would facilitate the acquisition and retention of specific procedural steps relative to the other two conditions. On the other hand, the use of paragraph stories to create episodic content was expected to facilitate the retrieval of algorithms, particularly in a mixed presentation format. The subjects were tested on similar, near, and far transfer questions over a four-day period. Near and far transfer algorithms were also introduced on Day Two. The results suggested that both explicit and episodic context facilitate performance on questions requiring subtraction with regrouping. However, the differential effects of these two approaches on near and far transfer questions were not as easy to identify. Explicit algorithms may facilitate the acquisition of specific procedural steps while at the same time inhibiting the application of such steps to transfer questions. Similarly, the value of episodic context in cuing the retrieval of an algorithm may be limited by the ability of a subject to identify and classify a new question as an exemplar of a particular episodically defined problem type or category. The implications of these findings in relation to the procedures employed in the teaching of mathematics to students with learning problems are discussed in detail.
Abstract:
How does fire affect the plant and animal community of the boreal forest? This study attempted to examine the changes in plant composition and productivity, and small mammal demography brought about by fire in the northern boreal environment at Chick Lake, N.W.T. (65°53′N, 128°14′W). Two 5.6 ha plots measuring 375 m x 150 m were selected for study during the summers of 1973 and 1974. One had been unburned for 120 years; the other was part of a fire which burned in the spring of 1969. Grids of 15 m x 15 m were established in each plot and meter-square quadrats taken at each of the 250 grid intersections in order to determine plant composition and density. Aerial primary production was assessed by clipping and drying 80 samples of terminal new production for each species under investigation. Small mammal populations were sampled by placing a Sherman live trap at each grid intersection for ten days in every month. The two plots were similar in plant species composition, which suggested that most regrowth in the burned area was from rootstocks which survived the fire. The plant data were submitted to a cluster analysis that revealed nine separate species associations, six of which occurred in the burned area and eight of which occurred in the control. These were subsequently treated as habitats for purposes of comparison with small mammal distributions. The burned area showed a greater productivity in flowers and fruits, although total productivity in the control area was higher due to a large contribution from the non-vascular component. Maximum aerial productivity as dry weight was measured at 157.1 g/m² and 207.8 g/m² for the burn and control respectively. Microtus pennsylvanicus and Clethrionomys rutilus were the two most common small mammals encountered; Microtus xanthognathus, Synaptomys borealis, and Phenacomys intermedius also occurred in the area. Populations of M. pennsylvanicus and C. rutilus were high during the summer of 1973; however, M. pennsylvanicus was rare on the control but abundant on the burn, while C. rutilus was rare on the burn but abundant in the control. During the summer of 1974 populations declined, with the result that few voles of any species were caught in the burn while equal numbers of the two species were caught in the control. During the summer of 1973 M. pennsylvanicus showed a positive association to the most productive habitat type in the burn, which was avoided by C. rutilus. In the control C. rutilus showed a similar positive association to the most productive habitat type, which was avoided by M. pennsylvanicus. In all cases for the high population year of 1973, the two species never overlapped in habitat preference. When populations declined in 1974, both species showed a strong association for the most productive habitat in the control. This would suggest that during a high population year, an abundant species can exclude competitors from a chosen habitat, but that this dominance decreases as population levels decrease. It is possible that M. pennsylvanicus is a more efficient competitor in a recently burned environment, while C. rutilus assumes this role once non-vascular regrowth becomes extensive.
Abstract:
A strain of Drosophila melanogaster deficient in amylase activity (Amylase^null) was isolated from a wild population of flies. The survivorship of Amylase^null homozygous flies is very low when the principal dietary carbohydrate source is starch. However, the survivorship of the Amylase^null genotype is comparable to the wild type when the dietary starch is replaced by glucose. In addition, the viability of the amylase-producing and Amylase^null strains is comparable, and very low, on a medium with no carbohydrates. Furthermore, amylase-producing genotypes were shown to excrete enzymatically active amylase protein into the food medium. The excreted amylase causes the external breakdown of dietary starch to sugar. These results led to the following prediction: the viability of the Amylase^null genotype (fed on a starch-rich diet) might increase in the presence of individuals which were amylase-producing. It was shown experimentally that such an increase in viability did in fact occur and that this increase was proportional to the number of amylase-producing flies present. These results provide a unique example of a non-competitive inter-genotype interaction, and one where the underlying physiological and biochemical mechanism has been fully understood.
Abstract:
The (n, k)-star interconnection network was proposed in 1995 as an attractive alternative to the n-star topology in parallel computation. The (n, k)-star has significant advantages over the n-star, which itself was proposed as an attractive alternative to the popular hypercube. The major advantage of the (n, k)-star network is its scalability, which makes it more flexible than the n-star as an interconnection network. In this thesis, we focus on finding graph-theoretical properties of the (n, k)-star as well as developing parallel algorithms that run on this network. The basic topological properties of the (n, k)-star are studied first. These are useful since they can be used to develop efficient algorithms on this network. We then study the (n, k)-star network from an algorithmic point of view. Specifically, we investigate both fundamental and application algorithms, including basic communication, prefix computation, and sorting. A literature review of the state of the art in relation to the (n, k)-star network, as well as some open problems in this area, is also provided.
Abstract:
Bioinformatics applies computers to problems in molecular biology. Previous research has not addressed edit metric decoders. Decoders for quaternary edit metric codes are finding use in bioinformatics problems with applications to DNA. By using side effect machines we hope to be able to provide efficient decoding algorithms for this open problem. Two ideas for decoding algorithms are presented and examined. Both decoders use Side Effect Machines (SEMs), which are generalizations of finite state automata. Single Classifier Machines (SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines (LSEMs) use multiple side effect machines to create a tree structure of subclassification. The goal is to examine these techniques and provide new decoders for existing codes. Ideas for best practices for the creation of these two types of new edit metric decoders are also presented.
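For orientation, the decoding problem itself can be sketched as a brute-force baseline: compute the edit (Levenshtein) distance from a received word to every codeword and return the closest one. This is not the SEM-based approach of the thesis, only a plain reference point that faster decoders would aim to improve on; the function names and the toy code over the DNA alphabet are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # deleting the first i characters of a reaches the empty string
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def decode_nearest(word, code):
    """Exhaustive decoder: return the codeword closest to word.

    Ties are resolved in favour of the first such codeword in code.
    """
    return min(code, key=lambda c: edit_distance(word, c))


# Hypothetical quaternary code over the DNA alphabet {A, C, G, T}.
code = ["AAAA", "ACGG", "TTTT"]
decoded = decode_nearest("ACGT", code)
```

The exhaustive decoder needs one distance computation per codeword, which is the cost that motivates looking for more efficient decoding schemes.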
Abstract:
The (n, k)-arrangement interconnection topology was first introduced in 1992. The (n, k)-arrangement graph is a class of generalized star graphs. Compared with the well-known n-star, the (n, k)-arrangement graph is more flexible in degree and diameter. However, few algorithms have been designed for the (n, k)-arrangement graph to date. In this thesis, we focus on finding graph-theoretical properties of the (n, k)-arrangement graph and developing parallel algorithms that run on this network. The topological properties of the arrangement graph, including its cyclic properties, are studied first. We then study the communication problems of broadcasting and routing, followed by embedding problems. These results are very useful for developing efficient algorithms on this network. We then study the (n, k)-arrangement network from the algorithmic point of view. Specifically, we investigate both fundamental and application algorithms such as prefix sums computation, sorting, merging and a basic geometric computation, finding the convex hull, on the (n, k)-arrangement graph. A literature review of the state of the art in relation to the (n, k)-arrangement network is also provided, as well as some open problems in this area.
Abstract:
The hyper-star interconnection network was proposed in 2002 to overcome the drawbacks of the hypercube and its variations concerning the network cost, which is defined as the product of the degree and the diameter. Some properties of the graph, such as connectivity, symmetry and embedding properties, have been studied by other researchers, and routing and broadcasting algorithms have also been designed. This thesis studies the hyper-star graph from both the topological and the algorithmic point of view. For the topological properties, we try to establish relationships between hyper-star graphs and other known graphs. We also give a formal equation for the surface area of the graph. Another topological property we are interested in is the Hamiltonicity problem of this graph. For the algorithms, we design an all-port broadcasting algorithm and a single-port neighbourhood broadcasting algorithm for the regular form of the hyper-star graphs. These algorithms are both optimal time-wise. Furthermore, we prove that the folded hyper-star, a variation of the hyper-star, is maximally fault-tolerant.
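The network cost mentioned in this abstract, degree times diameter, can be computed directly for a small graph. The sketch below uses the ordinary hypercube as the example (not the hyper-star, whose construction the abstract does not give); the adjacency-dictionary representation and the function names are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque


def hypercube(n):
    """Adjacency lists of the n-dimensional hypercube: vertices are the
    integers 0 .. 2**n - 1, and neighbours differ in exactly one bit."""
    return {v: [v ^ (1 << i) for i in range(n)] for v in range(1 << n)}


def diameter(graph):
    """Longest shortest-path length, found by a breadth-first search
    from every vertex (assumes the graph is connected)."""
    best = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


def network_cost(graph):
    """Network cost as defined in the abstract: degree times diameter,
    taking the maximum vertex degree as the degree of the graph."""
    degree = max(len(neighbours) for neighbours in graph.values())
    return degree * diameter(graph)


cost_q4 = network_cost(hypercube(4))  # degree 4, diameter 4
```

For the n-dimensional hypercube both the degree and the diameter equal n, so the cost grows as n squared, which is the kind of drawback that alternative topologies try to reduce.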
Abstract:
The hub location problem is an NP-hard problem that frequently arises in the design of transportation and distribution systems, postal delivery networks, and airline passenger flow. This work focuses on the Single Allocation Hub Location Problem (SAHLP). Genetic Algorithms (GAs) for the capacitated and uncapacitated variants of the SAHLP, based on new chromosome representations and crossover operators, are explored. The GAs are tested on two well-known sets of real-world problems with up to 200 nodes. The obtained results are very promising. For most of the test problems the GAs obtain improved or best-known solutions, and the computational time remains low. The proposed GAs can easily be extended to other variants of location problems arising in network design planning in transportation systems.
Abstract:
One of the most common bee genera in the Niagara Region, the genus Ceratina (Hymenoptera: Apidae) is composed of four species, C. dupla, C. calcarata, the very rare C. strenua, and a previously unknown species provisionally named C. near dupla. The primary goal of this thesis was to investigate how these closely related species coexist with one another in the Niagara bee community. The first necessary step was to describe and compare the nesting biologies and life histories of the three most common species, C. dupla, C. calcarata and the new C. near dupla, which was conducted in 2008 via nest collections and pan trapping. Ceratina dupla and C. calcarata were common, each comprising 49% of the population, while C. near dupla was rare, comprising only 2% of the population. Ceratina dupla and C. near dupla both nested more commonly in teasel (Dipsacus sp.) in the sun, occasionally in raspberry (Rubus sp.) in the shade, and never in shady sumac (Rhus sp.), while C. calcarata nested most commonly in raspberry and sumac (shaded) and occasionally in teasel (sunny). Ceratina near dupla differed from both C. dupla and C. calcarata in that it appeared to be partially bivoltine, with some females founding nests very early and then again very late in the season. To examine the interactions and possible competition for nests that may be taking place between C. dupla and C. calcarata, a nest choice experiment was conducted in 2009. This experiment allowed both species to choose among twigs from all three substrates in the sun and in the shade. I then compared the results from 2008 (where bees chose from what was available) to where they nested when given all options (2009 experiment). Both C. dupla and C. calcarata had the same preferences for microhabitat and nest substrate in 2009, that being raspberry and sumac twigs in the sun. As that microhabitat and nest substrate combination is extremely rare in nature, both species must make a choice. 
In nature Ceratina dupla nests more often in the preferred microhabitat (sun), while C. calcarata nests in the preferred substrate (raspberry). Nesting in the shade also leads to smaller clutch sizes, higher parasitism and lower numbers of live brood in C. calcarata, suggesting that C. dupla may be outcompeting C. calcarata for the sunny nesting sites. The development and host preferences of Ceratina parasitoids were also examined. Ceratina species in Niagara were parasitized by no fewer than eight species of arthropod. Six of these were wasps from the superfamily Chalcidoidea (Hymenoptera), one was a wasp from the family Ichneumonidae (Hymenoptera) and one was a physogastric mite from the family Pyemotidae (Acari). Parasites showed a wide range of developmental strategies, from ichneumonid larvae that needed to consume multiple Ceratina immatures to complete development, to the species from the Eulophidae (Baryscapus) and Encyrtidae (Coelopencyrtus), in which multiple individuals completed development inside a single Ceratina host. Biological data on parasitoids are scarce in the scientific literature, and this chapter documents these interactions for future research.
Abstract:
The main focus of this thesis is to evaluate and compare the Hyperball learning algorithm (HBL) to other learning algorithms. In this work HBL is compared to feed-forward artificial neural networks using back-propagation learning, K-nearest neighbor and ID3 algorithms. In order to evaluate the similarity of these algorithms, we carried out three experiments using nine benchmark data sets from the UCI machine learning repository. The first experiment compares HBL to other algorithms when the sample size of the dataset changes. The second experiment compares HBL to other algorithms when the dimensionality of the data changes. The last experiment compares HBL to other algorithms according to the level of agreement with data target values. In general, our observations showed that, taking classification accuracy as the measure, HBL performs as well as most ANN variants. Additionally, we deduced that HBL's classification accuracy exceeds that of ID3 and K-nearest neighbour for the selected data sets.
Abstract:
Hub Location Problems play vital economic roles in transportation and telecommunication networks where goods or people must be efficiently transferred from an origin to a destination point whilst direct origin-destination links are impractical. This work investigates the single allocation hub location problem, and proposes a genetic algorithm (GA) approach for it. The effectiveness of using a single-objective criterion measure for the problem is first explored. Next, a multi-objective GA employing various fitness evaluation strategies such as Pareto ranking, sum of ranks, and weighted sum strategies is presented. The effectiveness of the multi-objective GA is shown by comparison with an Integer Programming strategy, the only other multi-objective approach found in the literature for this problem. Lastly, two new crossover operators are proposed and an empirical study is done using small to large problem instances of the Civil Aeronautics Board (CAB) and Australian Post (AP) data sets.
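The three fitness evaluation strategies named in this abstract can be sketched for a minimization problem as follows. This is an illustration under stated assumptions, not the thesis's implementation: Pareto ranking has several variants, and the one used here ranks each solution as one plus the number of solutions that dominate it; ties in the per-objective ranking are broken by list order; all function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_ranks(points):
    """Rank = 1 + number of solutions that dominate this one, so
    non-dominated solutions receive rank 1 (one common variant)."""
    return [1 + sum(dominates(q, p) for q in points) for p in points]


def sum_of_ranks(points):
    """Rank the solutions separately on each objective (1 = best) and
    add up the ranks. Ties are broken by position in the list."""
    totals = [0] * len(points)
    for k in range(len(points[0])):
        order = sorted(range(len(points)), key=lambda i: points[i][k])
        for rank, i in enumerate(order, 1):
            totals[i] += rank
    return totals


def weighted_sum(points, weights):
    """Collapse the objectives into one score with fixed weights."""
    return [sum(w * x for w, x in zip(weights, p)) for p in points]


# Hypothetical two-objective scores for four candidate solutions.
points = [(1, 4), (2, 2), (3, 3), (4, 1)]
ranks = pareto_ranks(points)
rank_sums = sum_of_ranks(points)
scores = weighted_sum(points, (0.5, 0.5))
```

In all three cases a lower value is better. Pareto ranking keeps every non-dominated trade-off on an equal footing, whereas the sum of ranks and the weighted sum each reduce the comparison to a single number.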