788 resultados para dINSCY, subspace clustering, data mining, parallelo, distribuito, algoritmo


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Les positions des évènements de recombinaison s’agrègent ensemble, formant des hotspots déterminés en partie par la protéine à évolution rapide PRDM9. En particulier, ces positions de hotspots sont déterminées par le domaine de doigts de zinc (ZnF) de PRDM9 qui reconnait certains motifs d’ADN. Les allèles de PRDM9 contenant le ZnF de type k ont été préalablement associés avec une cohorte de patients affectés par la leucémie aigüe lymphoblastique. Les allèles de PRDM9 sont difficiles à identifier à partir de données de séquençage de nouvelle génération (NGS), en raison de leur nature répétitive. Dans ce projet, nous proposons une méthode permettant la caractérisation d’allèles de PRDM9 à partir de données de NGS, qui identifie le nombre d’allèles contenant un type spécifique de ZnF. Cette méthode est basée sur la corrélation entre les profils représentant le nombre de séquences nucléotidiques uniques à chaque ZnF retrouvés chez les lectures de NGS simulées sans erreur d’une paire d’allèles et chez les lectures d’un échantillon. La validité des prédictions obtenues par notre méthode est confirmée grâce à analyse basée sur les simulations. Nous confirmons également que la méthode peut correctement identifier le génotype d’allèles de PRDM9 qui n’ont pas encore été identifiés. Nous conduisons une analyse préliminaire identifiant le génotype des allèles de PRDM9 contenant un certain type de ZnF dans une cohorte de patients atteints de glioblastomes multiforme pédiatrique, un cancer du cerveau caractérisé par les mutations récurrentes dans le gène codant pour l’histone H3, la cible de l’activité épigénétique de PRDM9. Cette méthode ouvre la possibilité d’identifier des associations entre certains allèles de PRDM9 et d’autres types de cancers pédiatriques, via l’utilisation de bases de données de NGS de cellules tumorales.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper highlights the prediction of learning disabilities (LD) in school-age children using rough set theory (RST) with an emphasis on application of data mining. In rough sets, data analysis start from a data table called an information system, which contains data about objects of interest, characterized in terms of attributes. These attributes consist of the properties of learning disabilities. By finding the relationship between these attributes, the redundant attributes can be eliminated and core attributes determined. Also, rule mining is performed in rough sets using the algorithm LEM1. The prediction of LD is accurately done by using Rosetta, the rough set tool kit for analysis of data. The result obtained from this study is compared with the output of a similar study conducted by us using Support Vector Machine (SVM) with Sequential Minimal Optimisation (SMO) algorithm. It is found that, using the concepts of reduct and global covering, we can easily predict the learning disabilities in children

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Learning disability prediction in school age children is a very complicated task because it tends to be identified in elementary school where there is no one sign to be identified. By using any of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. Also, we can determine the merits and demerits of these two classifiers and the best one can be selected for the use in the relevant field. In this study, Sequential Minimal Optimization (SMO) algorithm is used in performing SVM and J48 algorithm is used in constructing decision trees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LDs, there are a variety of tests that may be done to pinpoint the problem The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for child. This paper proposes a new approach in artificial neural network (ANN) for identifying LD in children at early stages so as to solve the problems faced by them and to get the benefits to the students, their parents and school authorities. In this study, we propose a closest fit algorithm data preprocessing with ANN classification to handle missing attribute values. This algorithm imputes the missing values in the preprocessing stage. Ignoring of missing attribute values is a common trend in all classifying algorithms. But, in this paper, we use an algorithm in a systematic approach for classification, which gives a satisfactory result in the prediction of LD. It acts as a tool for predicting the LD accurately, and good information of the child is made available to the concerned

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning Disability (LD) is a classification including several disorders in which a child has difficulty in learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. The prediction of learning disability is a complicated task since the identification of LD from diverse features or signs is a complicated problem. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and dimensionality reduction method in the performance of fuzzy and neuro fuzzy classifiers with specific emphasis on prediction of learning disabilities in school age children. In the basic assessment method for prediction of LD, checklists are generally used and the data cases thus collected fully depends on the mood of children and may have also contain redundant as well as missing values. Therefore, in this study, we are proposing a new algorithm, viz. the correlation based new algorithm for imputing the missing values and Principal Component Analysis (PCA) for reducing the irrelevant attributes. After the study, it is found that, the preprocessing methods applied by us improves the quality of data and thereby increases the accuracy of the classifiers. The system is implemented in Math works Software Mat Lab 7.10. The results obtained from this study have illustrated that the developed missing value imputation method is very good contribution in prediction system and is capable of improving the performance of a classifier.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning Disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 15 % of children enrolled in schools. The prediction of LD is a vital and intricate job. The aim of this paper is to design an effective and powerful tool, using the two intelligent methods viz., Artificial Neural Network and Adaptive Neuro-Fuzzy Inference System, for measuring the percentage of LD that affected in school-age children. In this study, we are proposing some soft computing methods in data preprocessing for improving the accuracy of the tool as well as the classifier. The data preprocessing is performed through Principal Component Analysis for attribute reduction and closest fit algorithm is used for imputing missing values. The main idea in developing the LD prediction tool is not only to predict the LD present in children but also to measure its percentage along with its class like low or minor or major. The system is implemented in Mathworks Software MatLab 7.10. The results obtained from this study have illustrated that the designed prediction system or tool is capable of measuring the LD effectively

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In our study we use a kernel based classification technique, Support Vector Machine Regression for predicting the Melting Point of Drug – like compounds in terms of Topological Descriptors, Topological Charge Indices, Connectivity Indices and 2D Auto Correlations. The Machine Learning model was designed, trained and tested using a dataset of 100 compounds and it was found that an SVMReg model with RBF Kernel could predict the Melting Point with a mean absolute error 15.5854 and Root Mean Squared Error 19.7576

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Knowledge discovery support environments include beside classical data analysis tools also data mining tools. For supporting both kinds of tools, a unified knowledge representation is needed. We show that concept lattices which are used as knowledge representation in Conceptual Information Systems can also be used for structuring the results of mining association rules. Vice versa, we use ideas of association rules for reducing the complexity of the visualization of Conceptual Information Systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we study two orthogonal extensions of the classical data mining problem of mining association rules, and show how they naturally interact. The first is the extension from a propositional representation to datalog, and the second is the condensed representation of frequent itemsets by means of Formal Concept Analysis (FCA). We combine the notion of frequent datalog queries with iceberg concept lattices (also called closed itemsets) of FCA and introduce two kinds of iceberg query lattices as condensed representations of frequent datalog queries. We demonstrate that iceberg query lattices provide a natural way to visualize relational association rules in a non-redundant way.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Einhergehend mit der Entwicklung und zunehmenden Verfügbarkeit des Internets hat sich die Art der Informationsbereitstellung und der Informationsbeschaffung deutlich geändert. Die einstmalige Trennung zwischen Publizist und Konsument wird durch kollaborative Anwendungen des sogenannten Web 2.0 aufgehoben, wo jeder Teilnehmer gleichsam Informationen bereitstellen und konsumieren kann. Zudem können Einträge anderer Teilnehmer erweitert, kommentiert oder diskutiert werden. Mit dem Social Web treten schließlich die sozialen Beziehungen und Interaktionen der Teilnehmer in den Vordergrund. Dank mobiler Endgeräte können zu jeder Zeit und an nahezu jedem Ort Nachrichten verschickt und gelesen werden, neue Bekannschaften gemacht oder der aktuelle Status dem virtuellen Freundeskreis mitgeteilt werden. Mit jeder Aktivität innerhalb einer solchen Applikation setzt sich ein Teilnehmer in Beziehung zu Datenobjekten und/oder anderen Teilnehmern. Dies kann explizit geschehen, indem z.B. ein Artikel geschrieben wird und per E-Mail an Freunde verschickt wird. Beziehungen zwischen Datenobjekten und Nutzern fallen aber auch implizit an, wenn z.B. die Profilseite eines anderen Teilnehmers aufgerufen wird oder wenn verschiedene Teilnehmer einen Artikel ähnlich bewerten. Im Rahmen dieser Arbeit wird ein formaler Ansatz zur Analyse und Nutzbarmachung von Beziehungsstrukturen entwickelt, welcher auf solchen expliziten und impliziten Datenspuren aufbaut. In einem ersten Teil widmet sich diese Arbeit der Analyse von Beziehungen zwischen Nutzern in Applikationen des Social Web unter Anwendung von Methoden der sozialen Netzwerkanalyse. Innerhalb einer typischen sozialen Webanwendung haben Nutzer verschiedene Möglichkeiten zu interagieren. Aus jedem Interaktionsmuster werden Beziehungsstrukturen zwischen Nutzern abgeleitet. Der Vorteil der impliziten Nutzer-Interaktionen besteht darin, dass diese häufig vorkommen und quasi nebenbei im Betrieb des Systems abfallen. Jedoch ist anzunehmen, dass eine explizit angegebene Freundschaftsbeziehung eine stärkere Aussagekraft hat, als entsprechende implizite Interaktionen. Ein erster Schwerpunkt dieser Arbeit ist entsprechend der Vergleich verschiedener Beziehungsstrukturen innerhalb einer sozialen Webanwendung. Der zweite Teil dieser Arbeit widmet sich der Analyse eines der weit verbreitetsten Profil-Attributen von Nutzern in sozialen Webanwendungen, dem Vornamen. Hierbei finden die im ersten Teil vorgestellten Verfahren und Analysen Anwendung, d.h. es werden Beziehungsnetzwerke für Namen aus Daten von sozialen Webanwendungen gewonnen und mit Methoden der sozialen Netzwerkanalyse untersucht. Mithilfe externer Beschreibungen von Vornamen werden semantische Ähnlichkeiten zwischen Namen bestimmt und mit jeweiligen strukturellen Ähnlichkeiten in den verschiedenen Beziehungsnetzwerken verglichen. Die Bestimmung von ähnlichen Namen entspricht in einer praktischen Anwendung der Suche von werdenden Eltern nach einem passenden Vornamen. Die Ergebnisse zu der Analyse von Namensbeziehungen sind die Grundlage für die Implementierung der Namenssuchmaschine Nameling, welche im Rahmen dieser Arbeit entwickelt wurde. Mehr als 35.000 Nutzer griffen innerhalb der ersten sechs Monate nach Inbetriebnahme auf Nameling zu. Die hierbei anfallenden Nutzungsdaten wiederum geben Aufschluss über individuelle Vornamenspräferenzen der Anwender. Im Rahmen dieser Arbeit werden diese Nutzungsdaten vorgestellt und zur Bestimmung sowie Bewertung von personalisierten Vornamensempfehlungen verwendet. Abschließend werden Ansätze zur Diversifizierung von personalisierten Vornamensempfehlungen vorgestellt, welche statische Beziehungsnetzwerke für Namen mit den individuellen Nutzungsdaten verknüpft.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditional independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in the form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of the redundancy when it is present.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recommender systems attempt to predict items in which a user might be interested, given some information about the user's and items' profiles. Most existing recommender systems use content-based or collaborative filtering methods or hybrid methods that combine both techniques (see the sidebar for more details). We created Informed Recommender to address the problem of using consumer opinion about products, expressed online in free-form text, to generate product recommendations. Informed recommender uses prioritized consumer product reviews to make recommendations. Using text-mining techniques, it maps each piece of each review comment automatically into an ontology