952 resultados para Data recovery (Computer science)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hub Location Problems play vital economic roles in transportation and telecommunication networks where goods or people must be efficiently transferred from an origin to a destination point whilst direct origin-destination links are impractical. This work investigates the single allocation hub location problem, and proposes a genetic algorithm (GA) approach for it. The effectiveness of using a single-objective criterion measure for the problem is first explored. Next, a multi-objective GA employing various fitness evaluation strategies such as Pareto ranking, sum of ranks, and weighted sum strategies is presented. The effectiveness of the multi-objective GA is shown by comparison with an Integer Programming strategy, the only other multi-objective approach found in the literature for this problem. Lastly, two new crossover operators are proposed and an empirical study is done using small to large problem instances of the Civil Aeronautics Board (CAB) and Australian Post (AP) data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Remote sensing techniques involving hyperspectral imagery have applications in a number of sciences that study some aspects of the surface of the planet. The analysis of hyperspectral images is complex because of the large amount of information involved and the noise within that data. Investigating images with regard to identify minerals, rocks, vegetation and other materials is an application of hyperspectral remote sensing in the earth sciences. This thesis evaluates the performance of two classification and clustering techniques on hyperspectral images for mineral identification. Support Vector Machines (SVM) and Self-Organizing Maps (SOM) are applied as classification and clustering techniques, respectively. Principal Component Analysis (PCA) is used to prepare the data to be analyzed. The purpose of using PCA is to reduce the amount of data that needs to be processed by identifying the most important components within the data. A well-studied dataset from Cuprite, Nevada and a dataset of more complex data from Baffin Island were used to assess the performance of these techniques. The main goal of this research study is to evaluate the advantage of training a classifier based on a small amount of data compared to an unsupervised method. Determining the effect of feature extraction on the accuracy of the clustering and classification method is another goal of this research. This thesis concludes that using PCA increases the learning accuracy, and especially so in classification. SVM classifies Cuprite data with a high precision and the SOM challenges SVM on datasets with high level of noise (like Baffin Island).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near optimal assembly solutions are available for bacterial and small genomes, however assembling large and complex genomes especially the human genome using Next-Generation-Sequencing (NGS) technologies is shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage and tools that are not specifically built for human genomes. Moreover, many algorithms are not even scalable to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems including DNA data error detection and correction, contig creation, scaffolding and contigs orientation; each can be seen as a distinct research area. This thesis specifically focuses on creating contigs from the short reads and combining them with outputs from other tools in order to obtain better results. Three different assemblers including SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11] are selected for comparative purposes in this thesis. Obtained results show that this thesis’ work produces comparable results to other assemblers and combining our contigs to outputs from other tools, produces the best results outperforming all other investigated assemblers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on-par with real-time requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Understanding the relationship between genetic diseases and the genes associated with them is an important problem regarding human health. The vast amount of data created from a large number of high-throughput experiments performed in the last few years has resulted in an unprecedented growth in computational methods to tackle the disease gene association problem. Nowadays, it is clear that a genetic disease is not a consequence of a defect in a single gene. Instead, the disease phenotype is a reflection of various genetic components interacting in a complex network. In fact, genetic diseases, like any other phenotype, occur as a result of various genes working in sync with each other in a single or several biological module(s). Using a genetic algorithm, our method tries to evolve communities containing the set of potential disease genes likely to be involved in a given genetic disease. Having a set of known disease genes, we first obtain a protein-protein interaction (PPI) network containing all the known disease genes. All the other genes inside the procured PPI network are then considered as candidate disease genes as they lie in the vicinity of the known disease genes in the network. Our method attempts to find communities of potential disease genes strongly working with one another and with the set of known disease genes. As a proof of concept, we tested our approach on 16 breast cancer genes and 15 Parkinson's Disease genes. We obtained comparable or better results than CIPHER, ENDEAVOUR and GPEC, three of the most reliable and frequently used disease-gene ranking frameworks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a result of mutation in genes, which is a simple change in our DNA, we will have undesirable phenotypes which are known as genetic diseases or disorders. These small changes, which happen frequently, can have extreme results. Understanding and identifying these changes and associating these mutated genes with genetic diseases can play an important role in our health, by making us able to find better diagnosis and therapeutic strategies for these genetic diseases. As a result of years of experiments, there is a vast amount of data regarding human genome and different genetic diseases that they still need to be processed properly to extract useful information. This work is an effort to analyze some useful datasets and to apply different techniques to associate genes with genetic diseases. Two genetic diseases were studied here: Parkinson’s disease and breast cancer. Using genetic programming, we analyzed the complex network around known disease genes of the aforementioned diseases, and based on that we generated a ranking for genes, based on their relevance to these diseases. In order to generate these rankings, centrality measures of all nodes in the complex network surrounding the known disease genes of the given genetic disease were calculated. Using genetic programming, all the nodes were assigned scores based on the similarity of their centrality measures to those of the known disease genes. Obtained results showed that this method is successful at finding these patterns in centrality measures and the highly ranked genes are worthy as good candidate disease genes for being studied. Using standard benchmark tests, we tested our approach against ENDEAVOUR and CIPHER - two well known disease gene ranking frameworks - and we obtained comparable results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classical relational databases lack proper ways to manage certain real-world situations including imprecise or uncertain data. Fuzzy databases overcome this limitation by allowing each entry in the table to be a fuzzy set where each element of the corresponding domain is assigned a membership degree from the real interval [0…1]. But this fuzzy mechanism becomes inappropriate in modelling scenarios where data might be incomparable. Therefore, we become interested in further generalization of fuzzy database into L-fuzzy database. In such a database, the characteristic function for a fuzzy set maps to an arbitrary complete Brouwerian lattice L. From the query language perspectives, the language of fuzzy database, FSQL extends the regular Structured Query Language (SQL) by adding fuzzy specific constructions. In addition to that, L-fuzzy query language LFSQL introduces appropriate linguistic operations to define and manipulate inexact data in an L-fuzzy database. This research mainly focuses on defining the semantics of LFSQL. However, it requires an abstract algebraic theory which can be used to prove all the properties of, and operations on, L-fuzzy relations. In our study, we show that the theory of arrow categories forms a suitable framework for that. Therefore, we define the semantics of LFSQL in the abstract notion of an arrow category. In addition, we implement the operations of L-fuzzy relations in Haskell and develop a parser that translates algebraic expressions into our implementation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lattice valued fuzziness is more general than crispness or fuzziness based on the unit interval. In this work, we present a query language for a lattice based fuzzy database. We define a Lattice Fuzzy Structured Query Language (LFSQL) taking its membership values from an arbitrary lattice L. LFSQL can handle, manage and represent crisp values, linear ordered membership degrees and also allows membership degrees from lattices with non-comparable values. This gives richer membership degrees, and hence makes LFSQL more flexible than FSQL or SQL. In order to handle vagueness or imprecise information, every entry into an L-fuzzy database is an L-fuzzy set instead of crisp values. All of this makes LFSQL an ideal query language to handle imprecise data where some factors are non-comparable. After defining the syntax of the language formally, we provide its semantics using L-fuzzy sets and relations. The semantics can be used in future work to investigate concepts such as functional dependencies. Last but not least, we present a parser for LFSQL implemented in Haskell.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and determinis- tic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel meta–heuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS meta–heuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection plays an important role in knowledge discovery and data mining nowadays. In traditional rough set theory, feature selection using reduct - the minimal discerning set of attributes - is an important area. Nevertheless, the original definition of a reduct is restrictive, so in one of the previous research it was proposed to take into account not only the horizontal reduction of information by feature selection, but also a vertical reduction considering suitable subsets of the original set of objects. Following the work mentioned above, a new approach to generate bireducts using a multi--objective genetic algorithm was proposed. Although the genetic algorithms were used to calculate reduct in some previous works, we did not find any work where genetic algorithms were adopted to calculate bireducts. Compared to the works done before in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated a quality of each bireduct by values of two objective functions as evolution progresses, so consequently a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine the significant difference between the results. The experiment showed that the proposed method was able to reduce the number of bireducts necessary in order to receive a good prediction accuracy. Also, the influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was analyzed. It was shown that the prediction accuracies of the proposed method are comparable with the best results in machine learning literature, and some of them outperformed it.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Un certain nombre de théories pédagogiques ont été établies depuis plus de 20 ans. Elles font appel aux réactions de l’apprenant en situation d’apprentissage, mais aucune théorie pédagogique n’a pu décrire complètement un processus d’enseignement en tenant compte de toutes les réactions émotionnelles de l’apprenant. Nous souhaitons intégrer les émotions de l’apprenant dans ces processus d’apprentissage, car elles sont importantes dans les mécanismes d’acquisition de connaissances et dans la mémorisation. Récemment on a vu que le facteur émotionnel est considéré jouer un rôle très important dans les processus cognitifs. Modéliser les réactions émotionnelles d’un apprenant en cours du processus d’apprentissage est une nouveauté pour un Système Tutoriel Intelligent. Pour réaliser notre recherche, nous examinerons les théories pédagogiques qui n’ont pas considéré les émotions de l’apprenant. Jusqu’à maintenant, aucun Système Tutoriel Intelligent destiné à l’enseignement n’a incorporé la notion de facteur émotionnel pour un apprenant humain. Notre premier objectif est d’analyser quelques stratégies pédagogiques et de détecter les composantes émotionnelles qui peuvent y être ou non. Nous cherchons à déterminer dans cette analyse quel type de méthode didactique est utilisé, autrement dit, que fait le tuteur pour prévoir et aider l’apprenant à accomplir sa tâche d’apprentissage dans des conditions optimales. Le deuxième objectif est de proposer l’amélioration de ces méthodes en ajoutant les facteurs émotionnels. On les nommera des « méthodes émotionnelles ». Le dernier objectif vise à expérimenter le modèle d’une théorie pédagogique améliorée en ajoutant les facteurs émotionnels. Dans le cadre de cette recherche nous analyserons un certain nombre de théories pédagogiques, parmi lesquelles les théories de Robert Gagné, Jerome Bruner, Herbert J. Klausmeier et David Merrill, pour chercher à identifier les composantes émotionnelles. Aucune théorie pédagogique n’a mis l’accent sur les émotions au cours du processus d’apprentissage. Ces théories pédagogiques sont développées en tenant compte de plusieurs facteurs externes qui peuvent influencer le processus d’apprentissage. Nous proposons une approche basée sur la prédiction d’émotions qui est liée à de potentielles causes déclenchées par différents facteurs déterminants au cours du processus d’apprentissage. Nous voulons développer une technique qui permette au tuteur de traiter la réaction émotionnelle de l’apprenant à un moment donné au cours de son processus d’apprentissage et de l’inclure dans une méthode pédagogique. Pour atteindre le deuxième objectif de notre recherche, nous utiliserons un module tuteur apprenant basé sur le principe de l’éducation des émotions de l’apprenant, modèle qui vise premièrement sa personnalité et deuxièmement ses connaissances. Si on défini l’apprenant, on peut prédire ses réactions émotionnelles (positives ou négatives) et on peut s’assurer de la bonne disposition de l’apprenant, de sa coopération, sa communication et l’optimisme nécessaires à régler les problèmes émotionnels. Pour atteindre le troisième objectif, nous proposons une technique qui permet au tuteur de résoudre un problème de réaction émotionnelle de l’apprenant à un moment donné du processus d’apprentissage. Nous appliquerons cette technique à une théorie pédagogique. Pour cette première théorie, nous étudierons l’effet produit par certaines stratégies pédagogiques d’un tuteur virtuel au sujet de l’état émotionnel de l’apprenant, et pour ce faire, nous développerons une structure de données en ligne qu’un agent tuteur virtuel peut induire à l’apprenant des émotions positives. Nous analyserons les résultats expérimentaux en utilisant la première théorie et nous les comparerons ensuite avec trois autres théories que nous avons proposées d’étudier. En procédant de la sorte, nous atteindrons le troisième objectif de notre recherche, celui d’expérimenter un modèle d’une théorie pédagogique et de le comparer ensuite avec d’autres théories dans le but de développer ou d’améliorer les méthodes émotionnelles. Nous analyserons les avantages, mais aussi les insuffisances de ces théories par rapport au comportement émotionnel de l’apprenant. En guise de conclusion de cette recherche, nous retiendrons de meilleures théories pédagogiques ou bien nous suggérerons un moyen de les améliorer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Réalisé en cotutelle avec l'Université Bordeaux 1 (France)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

De nos jours, les logiciels doivent continuellement évoluer et intégrer toujours plus de fonctionnalités pour ne pas devenir obsolètes. C'est pourquoi, la maintenance représente plus de 60% du coût d'un logiciel. Pour réduire les coûts de programmation, les fonctionnalités sont programmées plus rapidement, ce qui induit inévitablement une baisse de qualité. Comprendre l’évolution du logiciel est donc devenu nécessaire pour garantir un bon niveau de qualité et retarder le dépérissement du code. En analysant à la fois les données sur l’évolution du code contenues dans un système de gestion de versions et les données quantitatives que nous pouvons déduire du code, nous sommes en mesure de mieux comprendre l'évolution du logiciel. Cependant, la quantité de données générées par une telle analyse est trop importante pour être étudiées manuellement et les méthodes d’analyses automatiques sont peu précises. Dans ce mémoire, nous proposons d'analyser ces données avec une méthode semi automatique : la visualisation. Eyes Of Darwin, notre système de visualisation en 3D, utilise une métaphore avec des quartiers et des bâtiments d'une ville pour visualiser toute l'évolution du logiciel sur une seule vue. De plus, il intègre un système de réduction de l'occlusion qui transforme l'écran de l'utilisateur en une fenêtre ouverte sur la scène en 3D qu'il affiche. Pour finir, ce mémoire présente une étude exploratoire qui valide notre approche.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La théorie de l'information quantique étudie les limites fondamentales qu'imposent les lois de la physique sur les tâches de traitement de données comme la compression et la transmission de données sur un canal bruité. Cette thèse présente des techniques générales permettant de résoudre plusieurs problèmes fondamentaux de la théorie de l'information quantique dans un seul et même cadre. Le théorème central de cette thèse énonce l'existence d'un protocole permettant de transmettre des données quantiques que le receveur connaît déjà partiellement à l'aide d'une seule utilisation d'un canal quantique bruité. Ce théorème a de plus comme corollaires immédiats plusieurs théorèmes centraux de la théorie de l'information quantique. Les chapitres suivants utilisent ce théorème pour prouver l'existence de nouveaux protocoles pour deux autres types de canaux quantiques, soit les canaux de diffusion quantiques et les canaux quantiques avec information supplémentaire fournie au transmetteur. Ces protocoles traitent aussi de la transmission de données quantiques partiellement connues du receveur à l'aide d'une seule utilisation du canal, et ont comme corollaires des versions asymptotiques avec et sans intrication auxiliaire. Les versions asymptotiques avec intrication auxiliaire peuvent, dans les deux cas, être considérées comme des versions quantiques des meilleurs théorèmes de codage connus pour les versions classiques de ces problèmes. Le dernier chapitre traite d'un phénomène purement quantique appelé verrouillage: il est possible d'encoder un message classique dans un état quantique de sorte qu'en lui enlevant un sous-système de taille logarithmique par rapport à sa taille totale, on puisse s'assurer qu'aucune mesure ne puisse avoir de corrélation significative avec le message. Le message se trouve donc «verrouillé» par une clé de taille logarithmique. Cette thèse présente le premier protocole de verrouillage dont le critère de succès est que la distance trace entre la distribution jointe du message et du résultat de la mesure et le produit de leur marginales soit suffisamment petite.