5 results for correlation-based feature selection

at Brock University, Canada


Relevance:

100.00%

Abstract:

The curse of dimensionality is a major problem in machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features in a high-dimensional dataset is NP-hard. Suboptimal population-based stochastic algorithms such as genetic programming (GP) and genetic algorithms (GA) are good choices for searching large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre suboptimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming premature convergence in evolutionary algorithms and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy, called Feature Selection ALPS (FSALPS), for feature subset selection and classification in varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloated expressions. FSALPS significantly outperformed canonical GP and ALPS, as well as some feature selection strategies reported in the related literature on dimensionality reduction.
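As an illustration of the frequency-count idea described in this abstract, the following is a minimal sketch, not the thesis's implementation. It assumes each GP individual can be summarized by the list of feature indices its tree uses; the function names, the additive smoothing term, and the toy population are all hypothetical.

```python
import random
from collections import Counter


def feature_probabilities(population, n_features, smoothing=1.0):
    """Convert feature usage counts in a population into selection probabilities.

    `population` is a list of individuals, each given as the list of feature
    indices appearing in its tree (an assumed simplification of a GP tree).
    `smoothing` (an assumption, not from the abstract) keeps unused features
    selectable so the search does not discard them permanently.
    """
    counts = Counter(f for individual in population for f in individual)
    weights = [counts.get(i, 0) + smoothing for i in range(n_features)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_terminal(probabilities, rng):
    """Sample one feature index to use as a terminal symbol."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Toy population: feature 2 is used most often, feature 1 never.
population = [[0, 2, 2], [2, 3], [0, 2]]
probs = feature_probabilities(population, n_features=4)

rng = random.Random(0)
terminal = pick_terminal(probs, rng)
```

Frequently evolved features receive a higher chance of being chosen when new trees or sub-trees are built, while the smoothing term leaves rarely used features a small nonzero probability.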

Relevance:

100.00%

Abstract:

Feature selection plays an important role in knowledge discovery and data mining. In traditional rough set theory, feature selection using reducts (minimal discerning sets of attributes) is an important area. Nevertheless, the original definition of a reduct is restrictive, so earlier research proposed taking into account not only the horizontal reduction of information by feature selection, but also a vertical reduction that considers suitable subsets of the original set of objects. Following that work, a new approach to generating bireducts using a multi-objective genetic algorithm is proposed. Although genetic algorithms have been used to calculate reducts in previous works, we did not find any work in which genetic algorithms were adopted to calculate bireducts. Compared with earlier work in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated the quality of each bireduct by the values of two objective functions as evolution progressed, so that a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine whether the differences between the results were significant. The experiment showed that the proposed method was able to reduce the number of bireducts needed to achieve good prediction accuracy. The influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was also analyzed. The prediction accuracies of the proposed method were shown to be comparable with the best results in the machine learning literature, and in some cases to exceed them.
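To make the notion of a bireduct more concrete, here is a minimal sketch under stated assumptions. It uses the standard definition of a decision bireduct as a pair (B, X) in which the attribute subset B discerns every pair of objects in X that have different decisions. The abstract does not say which two objectives were optimized; the pair used here (few attributes, many covered objects) is an assumption, as are the function names and the toy decision table.

```python
from itertools import combinations


def discerns(table, decisions, attrs, objects):
    """Return True if `attrs` distinguish every pair of objects in `objects`
    whose decision values differ (the consistency condition of a bireduct)."""
    for i, j in combinations(sorted(objects), 2):
        same_on_attrs = all(table[i][a] == table[j][a] for a in attrs)
        if decisions[i] != decisions[j] and same_on_attrs:
            return False
    return True


def objectives(attrs, objects):
    """Assumed objective pair, both to be minimized:
    number of attributes kept, and negated number of objects covered."""
    return (len(attrs), -len(objects))


# Toy decision table: rows are objects, columns are attributes 0 and 1.
# Objects 0 and 3 are identical but have different decisions, so no
# attribute subset can be consistent on the full object set.
table = [
    (1, 0),
    (1, 1),
    (0, 1),
    (1, 0),
]
decisions = ["yes", "no", "no", "no"]

full_set_ok = discerns(table, decisions, {0, 1}, {0, 1, 2, 3})
candidate_ok = discerns(table, decisions, {1}, {0, 1, 2})
candidate_score = objectives({1}, {0, 1, 2})
```

Dropping object 3 (the vertical reduction) lets the single attribute 1 (the horizontal reduction) stay consistent on the remaining objects. A complete check would also verify that B is minimal and X is maximal, which this sketch omits.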

Relevance:

100.00%

Abstract:

Sweat bees exhibit a range of social behaviours, from solitary nesting, in which no workers are produced, to strong eusociality, in which workers exhibit a high degree of altruism, a behaviour measured by the degree of personal reproductive sacrifice. Field studies were carried out for seven weeks during May-June 2000 in southern Greece to investigate intraspecific social variation and to test the hypothesis of a north-south cline of decreasing eusociality in the obligately eusocial sweat bee Lasioglossum (Evylaeus) malachurum. A comparative study, using principal components analysis, was performed to determine whether patterns of intraspecific social variation in L. malachurum reflect the patterns of social variation within the subgenus Evylaeus as a whole. The results of the field study reveal that, in Greece, two worker broods were produced, followed by a third brood consisting of gynes, males and some workers, indicating an overlap in worker and gyne production. There was strong caste distinction between queens and workers. Workers actively foraged and participated in nest construction, as most workers (58%, n=303) had a high degree of mandibular wear. Workers did not participate in the oviposition of Brood 3 gynes, since only 0.7% (n=278) of workers were mated. Furthermore, queen survival until the end of Brood 3 and a substantial size differential of 10.6% between queens and workers suggested that queen domination over worker behaviour during the early to middle part of the colony cycle was plausible. Male production in Brood 3 by some workers was likely, since the timing of worker ovarian development corresponded with the timing of male production. These findings suggest that workers of the first two broods were primarily altruistic, but that some workers (28%; 9% from Brood 1 and 19% from Brood 2) produced males, indicating that the degree of altruistic behaviour declined during the lifetime of the colony. In comparison with other L. malachurum populations in Europe, the Greek population had a weaker social level as a result of the higher proportion of workers potentially involved in male production, thus supporting the hypothesis of a southerly cline of decreasing eusociality. Furthermore, intraspecific variation in social level across Europe appears to be due to longer breeding seasons in more southerly locations, which would promote the production of larger colonies and provide opportunities for workers to evade queen control. The comparative study using principal components analysis on 20 solitary (of the subgenera Evylaeus and Lasioglossum), eusocial and socially polymorphic Evylaeus species and populations reveals that six traits are closely associated with stronger eusociality in Evylaeus. These traits are: (1) a reduction in the proportion of males in the early brood(s); (2) a reduction in the proportion of females that mate; (3) an increase in the mean number of first brood workers; (4) a reduction in the proportion of females with developed ovaries; (5) an increase in size dimorphism between castes; and (6) nest guarding. These are the traits that most significantly define principal component one and therefore distinguish social type, as indicated by a clear separation of the eusocial and the solitary populations, with a socially polymorphic species falling in between. Furthermore, most of these traits are under foundress control, which may suggest that the evolutionary loss or gain of eusociality is based on selection pressures on a founding female. Colony size and female ovarian development are common factors distinguishing social variation in L. malachurum and within the subgenus as a whole. The principal components analysis excluding the solitary species and the socially aberrant L. marginatum populations shows the L. malachurum populations separated by an increasing proportion of workers with developed ovaries as populations are found farther south, lending further support to the hypothesis of a north-south cline of decreasing eusociality.

Relevance:

40.00%

Abstract:

A feature-based fitness function is applied in a genetic programming system to synthesize stochastic gene regulatory network models whose behaviour is defined by a time course of protein expression levels. Typically, when targeting time series data, the fitness function is based on a sum of errors involving the values of the fluctuating signal. While this approach is successful in many instances, its performance can deteriorate in the presence of noise. This thesis explores a fitness measure determined from a set of statistical features characterizing the time series' sequence of values, rather than the actual values themselves. Through a series of experiments involving symbolic regression with added noise and gene regulatory network models based on the stochastic π-calculus, the measure is shown to successfully target oscillating and non-oscillating signals. This practical and versatile fitness function offers an alternative approach, worthy of consideration for use in algorithms that evaluate noisy or stochastic behaviour.
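The contrast between a sum-of-errors fitness and a feature-based fitness can be sketched as follows. This is an illustrative example only: the abstract does not list the statistical features used, so the three chosen here (mean, standard deviation, and rate of crossings of the mean) are assumptions, as are all function names and the test signals.

```python
import math
import statistics


def features(series):
    """Summarize a time series by a few statistics (an assumed feature set)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Fraction of consecutive samples that fall on opposite sides of the mean;
    # a rough indicator of how strongly the signal oscillates.
    crossings = sum(
        1 for a, b in zip(series, series[1:]) if (a - mean) * (b - mean) < 0
    )
    return (mean, std, crossings / (len(series) - 1))


def feature_fitness(candidate, target):
    """Squared distance between feature vectors (lower is better)."""
    return sum((c - t) ** 2 for c, t in zip(features(candidate), features(target)))


def sum_of_errors(candidate, target):
    """Conventional pointwise fitness (lower is better)."""
    return sum(abs(c - t) for c, t in zip(candidate, target))


n = 200
target = [math.sin(2 * math.pi * k / 50 + 0.3) for k in range(n)]
# Same oscillation as the target, but half a period out of phase.
shifted = [math.sin(2 * math.pi * k / 50 + 0.3 + math.pi) for k in range(n)]
# A constant signal with no oscillation at all.
flat = [0.0] * n
```

With these signals, the pointwise measure scores the flat line better than the phase-shifted oscillation, while the feature-based measure prefers the oscillating candidate, since it matches the target's overall behaviour rather than its exact values.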