782 resultados para Spatial data mining
                                
Resumo:
Triple quadrupole mass spectrometers coupled with high performance liquid chromatography are workhorses in quantitative bioanalyses. It provides substantial benefits including reproducibility, sensitivity and selectivity for trace analysis. Selected Reaction Monitoring allows targeted assay development but data sets generated contain very limited information. Data mining and analysis of non-targeted high-resolution mass spectrometry profiles of biological samples offer the opportunity to perform more exhaustive assessments, including quantitative and qualitative analysis. The objectives of this study was to test method precision and accuracy, statistically compare bupivacaine drug concentration in real study samples and verify if high resolution and accurate mass data collected in scan mode can actually permit retrospective data analysis, more specifically, extract metabolite related information. The precision and accuracy data presented using both instruments provided equivalent results. Overall, the accuracy was ranging from 106.2 to 113.2% and the precision observed was from 1.0 to 3.7%. Statistical comparisons using a linear regression between both methods reveal a coefficient of determination (R2) of 0.9996 and a slope of 1.02 demonstrating a very strong correlation between both methods. Individual sample comparison showed differences from -4.5% to 1.6% well within the accepted analytical error. Moreover, post acquisition extracted ion chromatograms at m/z 233.1648 ± 5 ppm (M-56) and m/z 305.2224 ± 5 ppm (M+16) revealed the presence of desbutyl-bupivacaine and three distinct hydroxylated bupivacaine metabolites. Post acquisition analysis allowed us to produce semiquantitative evaluations of the concentration-time profiles for bupicavaine metabolites.
                                
Resumo:
Les positions des évènements de recombinaison s’agrègent ensemble, formant des hotspots déterminés en partie par la protéine à évolution rapide PRDM9. En particulier, ces positions de hotspots sont déterminées par le domaine de doigts de zinc (ZnF) de PRDM9 qui reconnait certains motifs d’ADN. Les allèles de PRDM9 contenant le ZnF de type k ont été préalablement associés avec une cohorte de patients affectés par la leucémie aigüe lymphoblastique. Les allèles de PRDM9 sont difficiles à identifier à partir de données de séquençage de nouvelle génération (NGS), en raison de leur nature répétitive. Dans ce projet, nous proposons une méthode permettant la caractérisation d’allèles de PRDM9 à partir de données de NGS, qui identifie le nombre d’allèles contenant un type spécifique de ZnF. Cette méthode est basée sur la corrélation entre les profils représentant le nombre de séquences nucléotidiques uniques à chaque ZnF retrouvés chez les lectures de NGS simulées sans erreur d’une paire d’allèles et chez les lectures d’un échantillon. La validité des prédictions obtenues par notre méthode est confirmée grâce à analyse basée sur les simulations. Nous confirmons également que la méthode peut correctement identifier le génotype d’allèles de PRDM9 qui n’ont pas encore été identifiés. Nous conduisons une analyse préliminaire identifiant le génotype des allèles de PRDM9 contenant un certain type de ZnF dans une cohorte de patients atteints de glioblastomes multiforme pédiatrique, un cancer du cerveau caractérisé par les mutations récurrentes dans le gène codant pour l’histone H3, la cible de l’activité épigénétique de PRDM9. Cette méthode ouvre la possibilité d’identifier des associations entre certains allèles de PRDM9 et d’autres types de cancers pédiatriques, via l’utilisation de bases de données de NGS de cellules tumorales.
                                
Resumo:
The present investigation on the Muvattupuzha river basin is an integrated approach based on hydrogeological, geophysical, hydrogeochemical parameters and the results are interpreted using satellite data. GIS also been used to combine the various spatial and non-spatial data. The salient finding of the present study are accounted below to provide a holistic picture on the groundwaters of the Muvattupuzha river basin. In the Muvattupuzha river basin the groundwaters are drawn from the weathered and fractured zones. The groundwater level fluctuations of the basin from 1992 to 2001 reveal that the water level varies between a minimum of 0.003 m and a maximum of 3.45 m. The groundwater fluctuation is affected by rainfall. Various aquifer parameters like transmissivity, storage coefficient, optimum yield, time for full recovery and specific capacity indices are analyzed. The depth to the bedrock of the basin varies widely from 1.5 to 17 mbgl. A ground water prospective map of phreatic aquifer has been prepared based on thickness of the weathered zone and low resistivity values (<500 ohm-m) and accordingly the basin is classified in three phreatic potential zones as good, moderate and poor. The groundwater of the Muvattupuzha river basin, the pH value ranges from 5.5 to 8.1, in acidic nature. Hydrochemical facies diagram reveals that most of the samples in both the seasons fall in mixing and dissolution facies and a few in static and dynamic natures. Further study is needed on impact of dykes on the occurrence and movement of groundwater, impact of seapages from irrigation canals on the groundwater quality and resources of this basin, and influence of inter-basin transfer of surface water on groundwater.
                                
Resumo:
Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.
                                
Resumo:
Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining
                                
Resumo:
The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified
                                
Resumo:
This paper highlights the prediction of learning disabilities (LD) in school-age children using rough set theory (RST) with an emphasis on application of data mining. In rough sets, data analysis start from a data table called an information system, which contains data about objects of interest, characterized in terms of attributes. These attributes consist of the properties of learning disabilities. By finding the relationship between these attributes, the redundant attributes can be eliminated and core attributes determined. Also, rule mining is performed in rough sets using the algorithm LEM1. The prediction of LD is accurately done by using Rosetta, the rough set tool kit for analysis of data. The result obtained from this study is compared with the output of a similar study conducted by us using Support Vector Machine (SVM) with Sequential Minimal Optimisation (SMO) algorithm. It is found that, using the concepts of reduct and global covering, we can easily predict the learning disabilities in children
                                
Resumo:
This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Learning disability prediction in school age children is a very complicated task because it tends to be identified in elementary school where there is no one sign to be identified. By using any of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. Also, we can determine the merits and demerits of these two classifiers and the best one can be selected for the use in the relevant field. In this study, Sequential Minimal Optimization (SMO) algorithm is used in performing SVM and J48 algorithm is used in constructing decision trees.
                                
Resumo:
Learning disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LDs, there are a variety of tests that may be done to pinpoint the problem The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for child. This paper proposes a new approach in artificial neural network (ANN) for identifying LD in children at early stages so as to solve the problems faced by them and to get the benefits to the students, their parents and school authorities. In this study, we propose a closest fit algorithm data preprocessing with ANN classification to handle missing attribute values. This algorithm imputes the missing values in the preprocessing stage. Ignoring of missing attribute values is a common trend in all classifying algorithms. But, in this paper, we use an algorithm in a systematic approach for classification, which gives a satisfactory result in the prediction of LD. It acts as a tool for predicting the LD accurately, and good information of the child is made available to the concerned
                                
Resumo:
Learning Disability (LD) is a classification including several disorders in which a child has difficulty in learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. The prediction of learning disability is a complicated task since the identification of LD from diverse features or signs is a complicated problem. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and dimensionality reduction method in the performance of fuzzy and neuro fuzzy classifiers with specific emphasis on prediction of learning disabilities in school age children. In the basic assessment method for prediction of LD, checklists are generally used and the data cases thus collected fully depends on the mood of children and may have also contain redundant as well as missing values. Therefore, in this study, we are proposing a new algorithm, viz. the correlation based new algorithm for imputing the missing values and Principal Component Analysis (PCA) for reducing the irrelevant attributes. After the study, it is found that, the preprocessing methods applied by us improves the quality of data and thereby increases the accuracy of the classifiers. The system is implemented in Math works Software Mat Lab 7.10. The results obtained from this study have illustrated that the developed missing value imputation method is very good contribution in prediction system and is capable of improving the performance of a classifier.
                                
Resumo:
Learning Disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 15 % of children enrolled in schools. The prediction of LD is a vital and intricate job. The aim of this paper is to design an effective and powerful tool, using the two intelligent methods viz., Artificial Neural Network and Adaptive Neuro-Fuzzy Inference System, for measuring the percentage of LD that affected in school-age children. In this study, we are proposing some soft computing methods in data preprocessing for improving the accuracy of the tool as well as the classifier. The data preprocessing is performed through Principal Component Analysis for attribute reduction and closest fit algorithm is used for imputing missing values. The main idea in developing the LD prediction tool is not only to predict the LD present in children but also to measure its percentage along with its class like low or minor or major. The system is implemented in Mathworks Software MatLab 7.10. The results obtained from this study have illustrated that the designed prediction system or tool is capable of measuring the LD effectively
                                
Resumo:
In our study we use a kernel based classification technique, Support Vector Machine Regression for predicting the Melting Point of Drug – like compounds in terms of Topological Descriptors, Topological Charge Indices, Connectivity Indices and 2D Auto Correlations. The Machine Learning model was designed, trained and tested using a dataset of 100 compounds and it was found that an SVMReg model with RBF Kernel could predict the Melting Point with a mean absolute error 15.5854 and Root Mean Squared Error 19.7576
                                
Resumo:
A GIS has been designed with limited Functionalities; but with a novel approach in Aits design. The spatial data model adopted in the design of KBGIS is the unlinked vector model. Each map entity is encoded separately in vector fonn, without referencing any of its neighbouring entities. Spatial relations, in other words, are not encoded. This approach is adequate for routine analysis of geographic data represented on a planar map, and their display (Pages 105-106). Even though spatial relations are not encoded explicitly, they can be extracted through the specially designed queries. This work was undertaken as an experiment to study the feasibility of developing a GIS using a knowledge base in place of a relational database. The source of input spatial data was accurate sheet maps that were manually digitised. Each identifiable geographic primitive was represented as a distinct object, with its spatial properties and attributes defined. Composite spatial objects, made up of primitive objects, were formulated, based on production rules defining such compositions. The facts and rules were then organised into a production system, using OPS5
                                
Resumo:
Die gegenwärtige Entwicklung der internationalen Klimapolitik verlangt von Deutschland eine Reduktion seiner Treibhausgasemissionen. Wichtigstes Treibhausgas ist Kohlendioxid, das durch die Verbrennung fossiler Energieträger in die Atmosphäre freigesetzt wird. Die Reduktionsziele können prinzipiell durch eine Verminderung der Emissionen sowie durch die Schaffung von Kohlenstoffsenken erreicht werden. Senken beschreiben dabei die biologische Speicherung von Kohlenstoff in Böden und Wäldern. Eine wichtige Einflussgröße auf diese Prozesse stellt die räumliche Dynamik der Landnutzung einer Region dar. In dieser Arbeit wird das Modellsystem HILLS entwickelt und zur Simulation dieser komplexen Wirkbeziehungen im Bundesland Hessen genutzt. Ziel ist es, mit HILLS über eine Analyse des aktuellen Zustands hinaus auch Szenarien über Wege der zukünftigen regionalen Entwicklung von Landnutzung und ihrer Wirkung auf den Kohlenstoffhaushalt bis 2020 zu untersuchen. Für die Abbildung der räumlichen und zeitlichen Dynamik von Landnutzung in Hessen wird das Modell LUCHesse entwickelt. Seine Aufgabe ist die Simulation der relevanten Prozesse auf einem 1 km2 Raster, wobei die Raten der Änderung exogen als Flächentrends auf Ebene der hessischen Landkreise vorgegeben werden. LUCHesse besteht aus Teilmodellen für die Prozesse: (A) Ausbreitung von Siedlungs- und Gewerbefläche, (B) Strukturwandel im Agrarsektor sowie (C) Neuanlage von Waldflächen (Aufforstung). Jedes Teilmodell umfasst Methoden zur Bewertung der Standorteignung der Rasterzellen für unterschiedliche Landnutzungsklassen und zur Zuordnung der Trendvorgaben zu solchen Rasterzellen, die jeweils am besten für eine Landnutzungsklasse geeignet sind. Eine Validierung der Teilmodelle erfolgt anhand von statistischen Daten für den Zeitraum von 1990 bis 2000. Als Ergebnis eines Simulationslaufs werden für diskrete Zeitschritte digitale Karten der Landnutzugsverteilung in Hessen erzeugt. Zur Simulation der Kohlenstoffspeicherung wird eine modifizierte Version des Ökosystemmodells Century entwickelt (GIS-Century). Sie erlaubt einen gesteuerten Simulationslauf in Jahresschritten und unterstützt die Integration des Modells als Komponente in das HILLS Modellsystem. Es werden verschiedene Anwendungsschemata für GIS-Century entwickelt, mit denen die Wirkung der Stilllegung von Ackerflächen, der Aufforstung sowie der Bewirtschaftung bereits bestehender Wälder auf die Kohlenstoffspeicherung untersucht werden kann. Eine Validierung des Modells und der Anwendungsschemata erfolgt anhand von Feld- und Literaturdaten. HILLS implementiert eine sequentielle Kopplung von LUCHesse mit GIS-Century. Die räumliche Kopplung geschieht dabei auf dem 1 km2 Raster, die zeitliche Kopplung über die Einführung eines Landnutzungsvektors, der die Beschreibung der Landnutzungsänderung einer Rasterzelle während des Simulationszeitraums enthält. Außerdem integriert HILLS beide Modelle über ein dienste- und datenbankorientiertes Konzept in ein Geografisches Informationssystem (GIS). Auf diesem Wege können die GIS-Funktionen zur räumlichen Datenhaltung und Datenverarbeitung genutzt werden. Als Anwendung des Modellsystems wird ein Referenzszenario für Hessen mit dem Zeithorizont 2020 berechnet. Das Szenario setzt im Agrarsektor eine Umsetzung der AGENDA 2000 Politik voraus, die in großem Maße zu Stilllegung von Ackerflächen führt, während für den Bereich Siedlung und Gewerbe sowie Aufforstung die aktuellen Trends der Flächenausdehnung fortgeschrieben werden. Mit HILLS ist es nun möglich, die Wirkung dieser Landnutzungsänderungen auf die biologische Kohlenstoffspeicherung zu quantifizieren. Während die Ausdehnung von Siedlungsflächen als Kohlenstoffquelle identifiziert werden kann (37 kt C/a), findet sich die wichtigste Senke in der Bewirtschaftung bestehender Waldflächen (794 kt C/a). Weiterhin führen die Stilllegung von Ackerfläche (26 kt C/a) sowie Aufforstung (29 kt C/a) zu einer zusätzlichen Speicherung von Kohlenstoff. Für die Kohlenstoffspeicherung in Böden zeigen die Simulationsexperimente sehr klar, dass diese Senke nur von beschränkter Dauer ist.
                                
Resumo:
Many recent Web 2.0 resource sharing applications can be subsumed under the "folksonomy" moniker. Regardless of the type of resource shared, all of these share a common structure describing the assignment of tags to resources by users. In this report, we generalize the notions of clustering and characteristic path length which play a major role in the current research on networks, where they are used to describe the small-world effects on many observable network datasets. To that end, we show that the notion of clustering has two facets which are not equivalent in the generalized setting. The new measures are evaluated on two large-scale folksonomy datasets from resource sharing systems on the web.
 
                    