744 resultados para 380305 Knowledge Representation and Machine Learning
Resumo:
Cette thèse envisage un ensemble de méthodes permettant aux algorithmes d'apprentissage statistique de mieux traiter la nature séquentielle des problèmes de gestion de portefeuilles financiers. Nous débutons par une considération du problème général de la composition d'algorithmes d'apprentissage devant gérer des tâches séquentielles, en particulier celui de la mise-à-jour efficace des ensembles d'apprentissage dans un cadre de validation séquentielle. Nous énumérons les desiderata que des primitives de composition doivent satisfaire, et faisons ressortir la difficulté de les atteindre de façon rigoureuse et efficace. Nous poursuivons en présentant un ensemble d'algorithmes qui atteignent ces objectifs et présentons une étude de cas d'un système complexe de prise de décision financière utilisant ces techniques. Nous décrivons ensuite une méthode générale permettant de transformer un problème de décision séquentielle non-Markovien en un problème d'apprentissage supervisé en employant un algorithme de recherche basé sur les K meilleurs chemins. Nous traitons d'une application en gestion de portefeuille où nous entraînons un algorithme d'apprentissage à optimiser directement un ratio de Sharpe (ou autre critère non-additif incorporant une aversion au risque). Nous illustrons l'approche par une étude expérimentale approfondie, proposant une architecture de réseaux de neurones spécialisée à la gestion de portefeuille et la comparant à plusieurs alternatives. Finalement, nous introduisons une représentation fonctionnelle de séries chronologiques permettant à des prévisions d'être effectuées sur un horizon variable, tout en utilisant un ensemble informationnel révélé de manière progressive. L'approche est basée sur l'utilisation des processus Gaussiens, lesquels fournissent une matrice de covariance complète entre tous les points pour lesquels une prévision est demandée. Cette information est utilisée à bon escient par un algorithme qui transige activement des écarts de cours (price spreads) entre des contrats à terme sur commodités. L'approche proposée produit, hors échantillon, un rendement ajusté pour le risque significatif, après frais de transactions, sur un portefeuille de 30 actifs.
Resumo:
Quand le E-learning a émergé il ya 20 ans, cela consistait simplement en un texte affiché sur un écran d'ordinateur, comme un livre. Avec les changements et les progrès dans la technologie, le E-learning a parcouru un long chemin, maintenant offrant un matériel éducatif personnalisé, interactif et riche en contenu. Aujourd'hui, le E-learning se transforme de nouveau. En effet, avec la prolifération des systèmes d'apprentissage électronique et des outils d'édition de contenu éducatif, ainsi que les normes établies, c’est devenu plus facile de partager et de réutiliser le contenu d'apprentissage. En outre, avec le passage à des méthodes d'enseignement centrées sur l'apprenant, en plus de l'effet des techniques et technologies Web2.0, les apprenants ne sont plus seulement les récipiendaires du contenu d'apprentissage, mais peuvent jouer un rôle plus actif dans l'enrichissement de ce contenu. Par ailleurs, avec la quantité d'informations que les systèmes E-learning peuvent accumuler sur les apprenants, et l'impact que cela peut avoir sur leur vie privée, des préoccupations sont soulevées afin de protéger la vie privée des apprenants. Au meilleur de nos connaissances, il n'existe pas de solutions existantes qui prennent en charge les différents problèmes soulevés par ces changements. Dans ce travail, nous abordons ces questions en présentant Cadmus, SHAREK, et le E-learning préservant la vie privée. Plus précisément, Cadmus est une plateforme web, conforme au standard IMS QTI, offrant un cadre et des outils adéquats pour permettre à des tuteurs de créer et partager des questions de tests et des examens. Plus précisément, Cadmus fournit des modules telles que EQRS (Exam Question Recommender System) pour aider les tuteurs à localiser des questions appropriées pour leur examens, ICE (Identification of Conflits in Exams) pour aider à résoudre les conflits entre les questions contenu dans un même examen, et le Topic Tree, conçu pour aider les tuteurs à mieux organiser leurs questions d'examen et à assurer facilement la couverture des différent sujets contenus dans les examens. D'autre part, SHAREK (Sharing REsources and Knowledge) fournit un cadre pour pouvoir profiter du meilleur des deux mondes : la solidité des systèmes E-learning et la flexibilité de PLE (Personal Learning Environment) tout en permettant aux apprenants d'enrichir le contenu d'apprentissage, et les aider à localiser nouvelles ressources d'apprentissage. Plus précisément, SHAREK combine un système recommandation multicritères, ainsi que des techniques et des technologies Web2.0, tels que le RSS et le web social, pour promouvoir de nouvelles ressources d'apprentissage et aider les apprenants à localiser du contenu adapté. Finalement, afin de répondre aux divers besoins de la vie privée dans le E-learning, nous proposons un cadre avec quatre niveaux de vie privée, ainsi que quatre niveaux de traçabilité. De plus, nous présentons ACES (Anonymous Credentials for E-learning Systems), un ensemble de protocoles, basés sur des techniques cryptographiques bien établies, afin d'aider les apprenants à atteindre leur niveau de vie privée désiré.
Resumo:
Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.
Predicting sense of community and participation by applying machine learning to open government data
Resumo:
Community capacity is used to monitor socio-economic development. It is composed of a number of dimensions, which can be measured to understand the possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time consuming, requiring locally organised surveys. Therefore, we investigate a technique to estimate them by applying the Random Forests algorithm on secondary open government data. This research focuses on the prediction of measures for two dimensions: sense of community and participation. The most important variables for this prediction were determined. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability; sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 77% of the sense of community measures and 63% of participation. Due to the low geographic detail of the outcome measures available, further research is required to apply the predictive models to a neighbourhood level. The variables that were found to be more determinant for prediction were only partially in agreement with the factors that, according to the social science literature consulted, are the most influential for sense of community and participation. This finding should be further investigated from a social science perspective, in order to be understood in depth.
Resumo:
The proposal presented in this thesis is to provide designers of knowledge based supervisory systems of dynamic systems with a framework to facilitate their tasks avoiding interface problems among tools, data flow and management. The approach is thought to be useful to both control and process engineers in assisting their tasks. The use of AI technologies to diagnose and perform control loops and, of course, assist process supervisory tasks such as fault detection and diagnose, are in the scope of this work. Special effort has been put in integration of tools for assisting expert supervisory systems design. With this aim the experience of Computer Aided Control Systems Design (CACSD) frameworks have been analysed and used to design a Computer Aided Supervisory Systems (CASSD) framework. In this sense, some basic facilities are required to be available in this proposed framework: ·
Resumo:
Negative correlations between task performance in dynamic control tasks and verbalizable knowledge, as assessed by a post-task questionnaire, have been interpreted as dissociations that indicate two antagonistic modes of learning, one being “explicit”, the other “implicit”. This paper views the control tasks as finite-state automata and offers an alternative interpretation of these negative correlations. It is argued that “good controllers” observe fewer different state transitions and, consequently, can answer fewer post-task questions about system transitions than can “bad controllers”. Two experiments demonstrate the validity of the argument by showing the predicted negative relationship between control performance and the number of explored state transitions, and the predicted positive relationship between the number of explored state transitions and questionnaire scores. However, the experiments also elucidate important boundary conditions for the critical effects. We discuss the implications of these findings, and of other problems arising from the process control paradigm, for conclusions about implicit versus explicit learning processes.
Resumo:
Automatic indexing and retrieval of digital data poses major challenges. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special effects video clips as an exemplar application domain. This paper provides its performance and usability results and highlights the scope for future enhancements of the DREAM architecture which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test bed Partners' creative processes. (C) 2009 Published by Elsevier B.V.
Resumo:
One of the most pervading concepts underlying computational models of information processing in the brain is linear input integration of rate coded uni-variate information by neurons. After a suitable learning process this results in neuronal structures that statically represent knowledge as a vector of real valued synaptic weights. Although this general framework has contributed to the many successes of connectionism, in this paper we argue that for all but the most basic of cognitive processes, a more complex, multi-variate dynamic neural coding mechanism is required - knowledge should not be spacially bound to a particular neuron or group of neurons. We conclude the paper with discussion of a simple experiment that illustrates dynamic knowledge representation in a spiking neuron connectionist system.
Resumo:
Species` potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species` potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
The main purpose of this thesis project is to prediction of symptom severity and cause in data from test battery of the Parkinson’s disease patient, which is based on data mining. The collection of the data is from test battery on a hand in computer. We use the Chi-Square method and check which variables are important and which are not important. Then we apply different data mining techniques on our normalize data and check which technique or method gives good results.The implementation of this thesis is in WEKA. We normalize our data and then apply different methods on this data. The methods which we used are Naïve Bayes, CART and KNN. We draw the Bland Altman and Spearman’s Correlation for checking the final results and prediction of data. The Bland Altman tells how the percentage of our confident level in this data is correct and Spearman’s Correlation tells us our relationship is strong. On the basis of results and analysis we see all three methods give nearly same results. But if we see our CART (J48 Decision Tree) it gives good result of under predicted and over predicted values that’s lies between -2 to +2. The correlation between the Actual and Predicted values is 0,794in CART. Cause gives the better percentage classification result then disability because it can use two classes.
Resumo:
Developing successful navigation and mapping strategies is an essential part of autonomous robot research. However, hardware limitations often make for inaccurate systems. This project serves to investigate efficient alternatives to mapping an environment, by first creating a mobile robot, and then applying machine learning to the robot and controlling systems to increase the robustness of the robot system. My mapping system consists of a semi-autonomous robot drone in communication with a stationary Linux computer system. There are learning systems running on both the robot and the more powerful Linux system. The first stage of this project was devoted to designing and building an inexpensive robot. Utilizing my prior experience from independent studies in robotics, I designed a small mobile robot that was well suited for simple navigation and mapping research. When the major components of the robot base were designed, I began to implement my design. This involved physically constructing the base of the robot, as well as researching and acquiring components such as sensors. Implementing the more complex sensors became a time-consuming task, involving much research and assistance from a variety of sources. A concurrent stage of the project involved researching and experimenting with different types of machine learning systems. I finally settled on using neural networks as the machine learning system to incorporate into my project. Neural nets can be thought of as a structure of interconnected nodes, through which information filters. The type of neural net that I chose to use is a type that requires a known set of data that serves to train the net to produce the desired output. Neural nets are particularly well suited for use with robotic systems as they can handle cases that lie at the extreme edges of the training set, such as may be produced by "noisy" sensor data. Through experimenting with available neural net code, I became familiar with the code and its function, and modified it to be more generic and reusable for multiple applications of neural nets.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was than used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
Resumo:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.