923 results for "Classification errors"


Relevance: 20.00%

Abstract:

Genetic Programming (GP) is a widely used methodology for solving various computational problems, but its problem-solving ability is usually hindered by long execution times. In this thesis, GP is applied to real-time computer vision; in particular, object classification and tracking using a parallel GP system are discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches to visual pattern classification, block-classifiers and pixel-classifiers, were studied, and the results showed that pixel-classifiers generally performed better. Based on these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments, whose goal was to evolve a unique classifier for each texture pattern present in the video. The experiments revealed that the system correctly tracked the textures in the video, with performance on par with real-time requirements.
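A pixel-classifier in this sense evaluates an evolved expression at every pixel, using local image values as terminals, and thresholds the result into a class label. Below is a minimal illustrative sketch; the terminal set, the evolved expression, and all names are hypothetical stand-ins, since the abstract does not specify the thesis's GP language:

# Minimal sketch of a GP pixel-classifier: an evolved expression is
# evaluated at every pixel, and its sign assigns the pixel to a texture
# class. The expression and terminals are illustrative assumptions.
import numpy as np

def neighborhood_mean(img, y, x, k=1):
    """Mean intensity of the (2k+1)x(2k+1) window around (y, x)."""
    h, w = img.shape
    y0, y1 = max(0, y - k), min(h, y + k + 1)
    x0, x1 = max(0, x - k), min(w, x + k + 1)
    return img[y0:y1, x0:x1].mean()

def evolved_classifier(img, y, x):
    """Stand-in for a GP-evolved tree: arithmetic over pixel terminals."""
    p = img[y, x]                       # terminal: pixel intensity
    m = neighborhood_mean(img, y, x)    # terminal: local mean
    return (p - m) * 2.0 + (p * m) - 0.5  # illustrative evolved expression

def classify(img):
    """Label each pixel 1 if the evolved expression is positive, else 0."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1 if evolved_classifier(img, y, x) > 0 else 0
    return out

labels = classify(np.random.rand(32, 32))  # toy synthetic frame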

Relevance: 20.00%

Abstract:

The curse of dimensionality is a major problem in machine learning, data mining, and knowledge discovery. Exhaustive search for the optimal subset of relevant features in a high-dimensional dataset is NP-hard. Population-based stochastic algorithms such as GP and GA are good choices for searching large search spaces and are usually more feasible than exhaustive, deterministic search algorithms; on the other hand, they often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a metaheuristic for overcoming premature convergence in evolutionary algorithms and for improving search in the fitness landscape; it uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification across varied supervised learning tasks. FSALPS uses a novel frequency-count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and subtrees. The FSALPS metaheuristic continuously refines the feature subset while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS, and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey's HSD post-ANOVA test at the 95% confidence level, ALPS and FSALPS dominated canonical GP, evolving smaller but efficient trees with less bloat; FSALPS significantly outperformed canonical GP, ALPS, and some feature selection strategies reported in the related dimensionality-reduction literature.
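The frequency-count mechanism described in the abstract can be pictured as follows: count how often each feature terminal occurs in the current population, convert the counts into probabilities, and bias terminal selection for new trees accordingly. A minimal sketch under those assumptions (function names, the tree representation, and the smoothing step are illustrative, not taken from the thesis):

# Sketch of FSALPS-style terminal biasing: feature frequencies in the
# evolved population are turned into selection probabilities used when
# constructing new GP trees. Details beyond the abstract are assumptions.
import random
from collections import Counter

def feature_probabilities(population, n_features, smoothing=1.0):
    """Rank features by how often they occur as terminals in the population,
    then normalize the (smoothed) counts into selection probabilities."""
    counts = Counter(f for tree in population for f in tree)  # trees reduced to terminal lists
    weights = [counts.get(i, 0) + smoothing for i in range(n_features)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_terminal(probs):
    """Draw a feature terminal for a new tree node, biased by the ranking."""
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy population: each 'tree' is represented by its multiset of feature indices.
population = [[0, 0, 3], [0, 2], [3, 3, 3, 1]]
probs = feature_probabilities(population, n_features=5)
new_terminal = sample_terminal(probs)  # features 0 and 3 are favored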

Relevance: 20.00%

Abstract:

This note investigates the adequacy of the finite-sample approximation provided by the Functional Central Limit Theorem (FCLT) when the errors are allowed to be dependent. We compare the distribution of the scaled partial sums of the data with that of the Wiener process to which they converge. Our setup is deliberately simple in that it considers data generated from an ARMA(1,1) process, yet this is sufficient to bring out interesting conclusions about the particular elements that cause the approximation to be inadequate even in quite large samples.
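For concreteness, the object under study takes the following standard form (a generic statement of the FCLT; the note's exact normalization is an assumption here):

\[
X_T(r) \;=\; \frac{1}{\lambda\sqrt{T}} \sum_{t=1}^{\lfloor Tr \rfloor} u_t \;\Rightarrow\; W(r), \qquad r \in [0,1],
\]

where \(W\) is a standard Wiener process and, for ARMA(1,1) errors \(u_t = \phi\, u_{t-1} + e_t + \theta\, e_{t-1}\) with \(e_t\) i.i.d. \((0, \sigma_e^2)\), the long-run standard deviation is \(\lambda = \sigma_e (1+\theta)/(1-\phi)\). Strong dependence, e.g. \(\phi\) close to one or \(\theta\) close to \(-1\), is the kind of element that can make this approximation poor even at large \(T\).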

Relevance: 20.00%

Abstract:

In this paper we propose exact likelihood-based mean-variance efficiency tests of the market portfolio in the context of the Capital Asset Pricing Model (CAPM), allowing for a wide class of error distributions that includes normality as a special case. These tests are developed in the framework of multivariate linear regressions (MLR). It is well known, however, that despite their simple statistical structure, standard asymptotically justified MLR-based tests are unreliable. In financial econometrics, exact tests have been proposed for a few specific hypotheses [Jobson and Korkie (Journal of Financial Economics, 1982), MacKinlay (Journal of Financial Economics, 1987), Gibbons, Ross and Shanken (Econometrica, 1989), Zhou (Journal of Finance, 1993)], most of which depend on normality. For the Gaussian model, our tests correspond to Gibbons, Ross and Shanken's mean-variance efficiency tests. In non-Gaussian contexts, we reconsider mean-variance efficiency tests allowing for multivariate Student-t and Gaussian mixture errors. Our framework allows us to shed more light on whether the normality assumption is too restrictive when testing the CAPM. We also propose exact multivariate diagnostic checks (including tests for multivariate GARCH and a multivariate generalization of the well-known variance ratio tests) and goodness-of-fit tests, as well as a set estimate for the intervening nuisance parameters. Our results, over five-year subperiods, show the following: (i) multivariate normality is rejected in most subperiods; (ii) residual checks reveal no significant departures from the multivariate i.i.d. assumption; and (iii) mean-variance efficiency of the market portfolio is not rejected as frequently once the possibility of non-normal errors is allowed for.
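The MLR framework behind such tests has a familiar form (standard notation, assumed here rather than taken from the paper):

\[
r_{it} = a_i + b_i\, r_{Mt} + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,
\]

where \(r_{it}\) and \(r_{Mt}\) are returns in excess of the risk-free rate. Mean-variance efficiency of the market portfolio corresponds to the joint zero-intercept restriction \(H_0 : a_1 = \cdots = a_N = 0\); under Gaussian errors this is the hypothesis addressed by the Gibbons-Ross-Shanken test, to which the paper's exact tests correspond in that case.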

Relevance: 20.00%

Abstract:

This note develops general model-free adjustment procedures for the calculation of unbiased volatility loss functions based on practically feasible realized volatility benchmarks. The procedures, which exploit the recent asymptotic distributional results in Barndorff-Nielsen and Shephard (2002a), are both easy to implement and highly accurate in empirically realistic situations. On properly accounting for the measurement errors in the volatility forecast evaluations reported in Andersen, Bollerslev, Diebold and Labys (2003), the adjustments result in markedly higher estimates for the true degree of return-volatility predictability.
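The distributional result being exploited is of the following flavor (a simplified statement; the paper's precise conditions and scaling are not reproduced here). With \(M\) intraday returns \(r_{t,j}\) per period, realized volatility \(RV_t^{(M)} = \sum_{j=1}^{M} r_{t,j}^2\) estimates the latent integrated variance \(IV_t\) with a measurement error that shrinks as \(M\) grows:

\[
\sqrt{M}\,\bigl(RV_t^{(M)} - IV_t\bigr) \;\xrightarrow{d}\; \mathcal{MN}\!\bigl(0,\; 2\, IQ_t\bigr),
\]

where \(IQ_t\) denotes the integrated quarticity. Because feasible benchmarks use finite \(M\), loss functions computed against \(RV_t^{(M)}\) rather than \(IV_t\) are biased measures of forecast accuracy, which is what the adjustment procedures correct for.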

Relevance: 20.00%

Abstract:

Affiliation: Robert Cedergren Centre for Bioinformatics and Genomics & Department of Biochemistry, Université de Montréal

Relevance: 20.00%

Abstract:

Employees of an organization often use a personal classification scheme to organize the electronic documents under their direct control, which suggests that other employees may have difficulty locating these documents and that documentation may be lost to the organization. No empirical study had been conducted to date to verify the extent to which personal classification schemes allow, or even facilitate, the retrieval of electronic documents by third parties, for example in collaborative work or when a file must be reconstructed. The first objective of our research was to describe the characteristics of personal classification schemes used to organize and classify electronic administrative documents. The second objective was to verify, in a controlled environment, differences in the effectiveness of electronic document retrieval as a function of the classification scheme used. We wanted to verify whether a document could be retrieved with the same effectiveness regardless of the classification scheme used. Data collection proceeded in two stages to meet these objectives. We first identified the structural, logical, and semantic characteristics of 21 classification schemes used by employees of the Université de Montréal to organize and classify the electronic documents under their direct control. We then compared, in a controlled experiment, the ability of a group of 70 participants to retrieve electronic documents using five classification schemes with varied structural, logical, and semantic characteristics. Three variables were used to measure retrieval effectiveness: the proportion of documents retrieved, the average time (in seconds) required to retrieve the documents, and the proportion of documents retrieved on the first attempt. The results reveal several structural, logical, and semantic characteristics common to a majority of personal classification schemes: a broad macro-structure; a shallow, complex, and unbalanced structure; grouping by subject; alphabetical ordering of classes; and so on. Analysis-of-variance tests reveal significant differences in the effectiveness of electronic document retrieval as a function of the structural, logical, and semantic characteristics of the classification scheme used. A classification scheme with a narrow macro-structure and a logic based partly on a division into activity classes increases the probability of retrieving documents more quickly. Semantically, explicit naming of classes (for example, through the use of definitions, or by avoiding acronyms and abbreviations) increases the probability of successful retrieval. Finally, a classification scheme with a narrow macro-structure, a logic based partly on a division into activity classes, and a semantics that uses few abbreviations increases the probability of retrieving documents on the first attempt.