868 resultados para Data Mining


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thèse numérisée par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Les quatre principales activités de la gestion de risque thérapeutique comportent l’identification, l’évaluation, la minimisation, et la communication du risque. Ce mémoire aborde les problématiques liées à l’identification et à la minimisation du risque par la réalisation de deux études dont les objectifs sont de: 1) Développer et valider un outil de « data mining » pour la détection des signaux à partir des banques de données de soins de santé du Québec; 2) Effectuer une revue systématique afin de caractériser les interventions de minimisation de risque (IMR) ayant été implantées. L’outil de détection de signaux repose sur la méthode analytique du quotient séquentiel de probabilité (MaxSPRT) en utilisant des données de médicaments délivrés et de soins médicaux recueillis dans une cohorte rétrospective de 87 389 personnes âgées vivant à domicile et membres du régime d’assurance maladie du Québec entre les années 2000 et 2009. Quatre associations « médicament-événement indésirable (EI) » connues et deux contrôles « négatifs » ont été utilisés. La revue systématique a été faite à partir d’une revue de la littérature ainsi que des sites web de six principales agences réglementaires. La nature des RMIs ont été décrites et des lacunes de leur implémentation ont été soulevées. La méthode analytique a mené à la détection de signaux dans l'une des quatre combinaisons médicament-EI. Les principales contributions sont: a) Le premier outil de détection de signaux à partir des banques de données administratives canadiennes; b) Contributions méthodologiques par la prise en compte de l'effet de déplétion des sujets à risque et le contrôle pour l'état de santé du patient. La revue a identifié 119 IMRs dans la littérature et 1,112 IMRs dans les sites web des agences réglementaires. La revue a démontré qu’il existe une augmentation des IMRs depuis l’introduction des guides réglementaires en 2005 mais leur efficacité demeure peu démontrée.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

S’insérant dans les domaines de la Lecture et de l’Analyse de Textes Assistées par Ordinateur (LATAO), de la Gestion Électronique des Documents (GÉD), de la visualisation de l’information et, en partie, de l’anthropologie, cette recherche exploratoire propose l’expérimentation d’une méthodologie descriptive en fouille de textes afin de cartographier thématiquement un corpus de textes anthropologiques. Plus précisément, nous souhaitons éprouver la méthode de classification hiérarchique ascendante (CHA) pour extraire et analyser les thèmes issus de résumés de mémoires et de thèses octroyés de 1985 à 2009 (1240 résumés), par les départements d’anthropologie de l’Université de Montréal et de l’Université Laval, ainsi que le département d’histoire de l’Université Laval (pour les résumés archéologiques et ethnologiques). En première partie de mémoire, nous présentons notre cadre théorique, c'est-à-dire que nous expliquons ce qu’est la fouille de textes, ses origines, ses applications, les étapes méthodologiques puis, nous complétons avec une revue des principales publications. La deuxième partie est consacrée au cadre méthodologique et ainsi, nous abordons les différentes étapes par lesquelles ce projet fut conduit; la collecte des données, le filtrage linguistique, la classification automatique, pour en nommer que quelques-unes. Finalement, en dernière partie, nous présentons les résultats de notre recherche, en nous attardant plus particulièrement sur deux expérimentations. Nous abordons également la navigation thématique et les approches conceptuelles en thématisation, par exemple, en anthropologie, la dichotomie culture ̸ biologie. Nous terminons avec les limites de ce projet et les pistes d’intérêts pour de futures recherches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nous présentons dans cette thèse notre travail dans le domaine de la visualisation. Nous nous sommes intéressés au problème de la génération des bulletins météorologiques. Étant donné une masse énorme d’information générée par Environnement Canada et un utilisateur, il faut lui générer une visualisation personnalisée qui répond à ses besoins et à ses préférences. Nous avons développé MeteoVis, un générateur de bulletin météorologique. Comme nous avons peu d’information sur le profil de l’utilisateur, nous nous sommes basés sur les utilisateurs similaires pour lui calculer ses besoins et ses préférences. Nous utilisons l'apprentissage non supervisé pour regrouper les utilisateurs similaires. Nous calculons le taux de similarité des profils utilisateurs dans le même cluster pour pondérer les besoins et les préférences. Nous avons mené, avec l’aide d'utilisateurs n’ayant aucun rapport avec le projet, des expériences d'évaluation et de comparaison de notre outil par rapport à celui utilisé actuellement par Environnement Canada. Les résultats de cette évaluation montrent que les visualisation générées par MeteoVis sont de loin meilleures que les bulletins actuels préparés par EC.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Triple quadrupole mass spectrometers coupled with high performance liquid chromatography are workhorses in quantitative bioanalyses. It provides substantial benefits including reproducibility, sensitivity and selectivity for trace analysis. Selected Reaction Monitoring allows targeted assay development but data sets generated contain very limited information. Data mining and analysis of non-targeted high-resolution mass spectrometry profiles of biological samples offer the opportunity to perform more exhaustive assessments, including quantitative and qualitative analysis. The objectives of this study was to test method precision and accuracy, statistically compare bupivacaine drug concentration in real study samples and verify if high resolution and accurate mass data collected in scan mode can actually permit retrospective data analysis, more specifically, extract metabolite related information. The precision and accuracy data presented using both instruments provided equivalent results. Overall, the accuracy was ranging from 106.2 to 113.2% and the precision observed was from 1.0 to 3.7%. Statistical comparisons using a linear regression between both methods reveal a coefficient of determination (R2) of 0.9996 and a slope of 1.02 demonstrating a very strong correlation between both methods. Individual sample comparison showed differences from -4.5% to 1.6% well within the accepted analytical error. Moreover, post acquisition extracted ion chromatograms at m/z 233.1648 ± 5 ppm (M-56) and m/z 305.2224 ± 5 ppm (M+16) revealed the presence of desbutyl-bupivacaine and three distinct hydroxylated bupivacaine metabolites. Post acquisition analysis allowed us to produce semiquantitative evaluations of the concentration-time profiles for bupicavaine metabolites.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Les positions des évènements de recombinaison s’agrègent ensemble, formant des hotspots déterminés en partie par la protéine à évolution rapide PRDM9. En particulier, ces positions de hotspots sont déterminées par le domaine de doigts de zinc (ZnF) de PRDM9 qui reconnait certains motifs d’ADN. Les allèles de PRDM9 contenant le ZnF de type k ont été préalablement associés avec une cohorte de patients affectés par la leucémie aigüe lymphoblastique. Les allèles de PRDM9 sont difficiles à identifier à partir de données de séquençage de nouvelle génération (NGS), en raison de leur nature répétitive. Dans ce projet, nous proposons une méthode permettant la caractérisation d’allèles de PRDM9 à partir de données de NGS, qui identifie le nombre d’allèles contenant un type spécifique de ZnF. Cette méthode est basée sur la corrélation entre les profils représentant le nombre de séquences nucléotidiques uniques à chaque ZnF retrouvés chez les lectures de NGS simulées sans erreur d’une paire d’allèles et chez les lectures d’un échantillon. La validité des prédictions obtenues par notre méthode est confirmée grâce à analyse basée sur les simulations. Nous confirmons également que la méthode peut correctement identifier le génotype d’allèles de PRDM9 qui n’ont pas encore été identifiés. Nous conduisons une analyse préliminaire identifiant le génotype des allèles de PRDM9 contenant un certain type de ZnF dans une cohorte de patients atteints de glioblastomes multiforme pédiatrique, un cancer du cerveau caractérisé par les mutations récurrentes dans le gène codant pour l’histone H3, la cible de l’activité épigénétique de PRDM9. Cette méthode ouvre la possibilité d’identifier des associations entre certains allèles de PRDM9 et d’autres types de cancers pédiatriques, via l’utilisation de bases de données de NGS de cellules tumorales.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An Overview of known spatial clustering algorithms The space of interest can be the two-dimensional abstraction of the surface of the earth or a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3d-space representing the arrangement of chains of protein molecules. The data consists of geometric information and can be either discrete or continuous. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations) which are used by spatial data mining algorithms. Therefore, spatial data mining algorithms are required for spatial characterization and spatial trend analysis. Spatial data mining or knowledge discovery in spatial databases differs from regular data mining in analogous with the differences between non-spatial data and spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of the spatial neighbors of that object. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that can be extracted through spatial data mining

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper highlights the prediction of learning disabilities (LD) in school-age children using rough set theory (RST) with an emphasis on application of data mining. In rough sets, data analysis start from a data table called an information system, which contains data about objects of interest, characterized in terms of attributes. These attributes consist of the properties of learning disabilities. By finding the relationship between these attributes, the redundant attributes can be eliminated and core attributes determined. Also, rule mining is performed in rough sets using the algorithm LEM1. The prediction of LD is accurately done by using Rosetta, the rough set tool kit for analysis of data. The result obtained from this study is compared with the output of a similar study conducted by us using Support Vector Machine (SVM) with Sequential Minimal Optimisation (SMO) algorithm. It is found that, using the concepts of reduct and global covering, we can easily predict the learning disabilities in children

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Learning disability prediction in school age children is a very complicated task because it tends to be identified in elementary school where there is no one sign to be identified. By using any of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. Also, we can determine the merits and demerits of these two classifiers and the best one can be selected for the use in the relevant field. In this study, Sequential Minimal Optimization (SMO) algorithm is used in performing SVM and J48 algorithm is used in constructing decision trees.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LDs, there are a variety of tests that may be done to pinpoint the problem The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for child. This paper proposes a new approach in artificial neural network (ANN) for identifying LD in children at early stages so as to solve the problems faced by them and to get the benefits to the students, their parents and school authorities. In this study, we propose a closest fit algorithm data preprocessing with ANN classification to handle missing attribute values. This algorithm imputes the missing values in the preprocessing stage. Ignoring of missing attribute values is a common trend in all classifying algorithms. But, in this paper, we use an algorithm in a systematic approach for classification, which gives a satisfactory result in the prediction of LD. It acts as a tool for predicting the LD accurately, and good information of the child is made available to the concerned

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Learning Disability (LD) is a classification including several disorders in which a child has difficulty in learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. The prediction of learning disability is a complicated task since the identification of LD from diverse features or signs is a complicated problem. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and dimensionality reduction method in the performance of fuzzy and neuro fuzzy classifiers with specific emphasis on prediction of learning disabilities in school age children. In the basic assessment method for prediction of LD, checklists are generally used and the data cases thus collected fully depends on the mood of children and may have also contain redundant as well as missing values. Therefore, in this study, we are proposing a new algorithm, viz. the correlation based new algorithm for imputing the missing values and Principal Component Analysis (PCA) for reducing the irrelevant attributes. After the study, it is found that, the preprocessing methods applied by us improves the quality of data and thereby increases the accuracy of the classifiers. The system is implemented in Math works Software Mat Lab 7.10. The results obtained from this study have illustrated that the developed missing value imputation method is very good contribution in prediction system and is capable of improving the performance of a classifier.