864 results for Data mining methods
Abstract:
This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed which, compared with the conventional adaptive predictive coding (APC) system, reduces the computational burden and complexity without sacrificing speech quality. To achieve this, changes have been made to the evaluation methods. The method differs from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the high-order predictor in the transmitter section of a predictive coding system to be replaced by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in processing the speech signal is adjusted relative to the pitch period of the signal being processed, rather than being fixed at a constant length as other researchers have done. The efficiency of the proposed coder is supported by computer simulation results using real speech data. Three methods for voiced/unvoiced/silent/transition classification are presented. The first is based on energy, zero-crossing rate and the periodicity of the waveform. The second uses the normalised correlation coefficient as the main parameter, while the third utilizes a pitch-dependent correlation factor. The third algorithm, which gives the minimum error probability, is chosen in a later chapter to design the modified coder. The thesis also presents a comparative study between the autocorrelation and covariance methods used in evaluating the predictor parameters. It is shown that the autocorrelation method is superior to the covariance method with respect to filter stability and also in an SNR sense, though the gain is only small.
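The first classification method combines short-time energy, zero-crossing rate and waveform periodicity. A minimal sketch of the first two features follows; the periodicity test is omitted, and the thresholds and decision order in `classify_frame` are hypothetical illustrations, not values taken from the thesis.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one analysis frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, silence_energy=1e-3, voiced_energy=1.0, unvoiced_zcr=0.3):
    """Hypothetical rule-based labelling; the thresholds are illustrative only."""
    energy = short_time_energy(frame)
    zcr = zero_crossing_rate(frame)
    if energy < silence_energy:
        return "silent"
    if zcr > unvoiced_zcr:
        return "unvoiced"  # noise-like: many sign changes
    if energy >= voiced_energy:
        return "voiced"    # strong, slowly oscillating signal
    return "transition"

if __name__ == "__main__":
    # Synthetic 30 ms frame at 8 kHz: a 100 Hz tone standing in for voiced speech
    tone = [2 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(240)]
    print(classify_frame(tone))  # voiced
```

In practice the thresholds would have to be tuned on labelled speech, and frames would normally be windowed before the features are computed.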
The Modified Block Adaptive Coder switches from pitch prediction to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments conducted in coding, transmission and simulation used speech samples from Malayalam and English phrases. Proposals for a speaker recognition system and a phoneme identification system are also outlined towards the end of the thesis.
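The abstract compares the autocorrelation and covariance methods for computing the predictor coefficients. As a sketch of the textbook autocorrelation method (not necessarily the thesis's exact implementation), the normal equations can be solved with the Levinson-Durbin recursion. With the autocorrelation method every reflection coefficient has magnitude below 1 for a nonzero frame, which is what guarantees a stable synthesis filter. Windowing of the frame is left out for brevity.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one frame (signal taken as zero outside it)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations of the given order.

    Returns (a, ks, err): predictor coefficients with
    x_hat[n] = sum(a[j] * x[n - 1 - j] for j in range(order)),
    the reflection coefficients, and the final prediction-error energy.
    """
    a = [0.0] * order
    ks = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for step i + 1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        ks.append(k)
        # Order update: the new last coefficient is k, earlier ones are corrected
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every step while |k| < 1
    return a, ks, err
```

Checking `all(abs(k) < 1 for k in ks)` on a frame is a quick way to confirm the stability property that the comparison with the covariance method rests on.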
Abstract:
An overview of known spatial clustering algorithms.
The space of interest can be the two-dimensional abstraction of the surface of the earth, a man-made space such as the layout of a VLSI design, a volume containing a model of the human brain, or another 3D space representing the arrangement of chains of protein molecules. The data consist of geometric information and can be either discrete or continuous. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations), which are used by spatial data mining algorithms. Therefore, spatial data mining algorithms are required for spatial characterization and spatial trend analysis. Spatial data mining, or knowledge discovery in spatial databases, differs from regular data mining in a way analogous to the differences between non-spatial and spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of the object's spatial neighbors. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that spatial data mining can extract.
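As one concrete example of a spatial clustering algorithm, here is a compact sketch of DBSCAN. The choice is ours for illustration; the abstract does not list which algorithms the overview covers. Its eps-neighborhood is an instance of the distance relation mentioned above, and `eps` and `min_pts` are user-chosen parameters.

```python
import math

def dbscan(points, eps, min_pts):
    """Density-based clustering; returns one label per point, -1 meaning noise."""
    def neighbors(i):
        # Distance relation: all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(nbrs)
    return labels
```

The neighborhood query here is a linear scan; spatial databases typically answer it with an index such as an R-tree, which is where the spatial structure pays off.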
Abstract:
Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. When handling numeric data sets, the data are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful concept that tells whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets. Also, popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean as the split point, for attributes in completely numerical data sets. Using the ANOVA test, the new algorithm has been shown to be competitive with its information-gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmark datasets. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The work also blends two closely related fields, statistics and data mining.
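The abstract does not spell out the exact split criterion, so the sketch below rests on our own assumptions: each numeric attribute is split at its mean, the class label is coded as a number, and the attribute whose split most reduces the size-weighted variance of the class codes is chosen. It illustrates why no categorical conversion or entropy computation is needed; it is not the paper's published algorithm.

```python
from statistics import mean, pvariance

def weighted_variance(groups):
    """Size-weighted population variance of the target values in each group."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pvariance(g) for g in groups if g)

def best_mean_split(rows, target):
    """Pick the attribute whose split at its mean most reduces target variance.

    rows: list of equal-length numeric feature vectors
    target: numeric class codes, one per row (an assumption of this sketch)
    Returns (attribute_index, threshold, variance_reduction), or None.
    """
    base = pvariance(target)
    best = None
    for a in range(len(rows[0])):
        threshold = mean([r[a] for r in rows])
        left = [y for r, y in zip(rows, target) if r[a] <= threshold]
        right = [y for r, y in zip(rows, target) if r[a] > threshold]
        if not left or not right:
            continue  # constant attribute: no split possible
        gain = base - weighted_variance([left, right])
        if best is None or gain > best[2]:
            best = (a, threshold, gain)
    return best
```

Applying `best_mean_split` recursively to the left and right subsets would grow a full tree; a stopping rule such as a minimum node size would also be needed.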
Abstract:
The aim of this study is to show the importance of two classification techniques, viz. decision trees and clustering, in predicting learning disabilities (LD) in school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Rules extracted from the decision tree are used to predict learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful for finding the different signs and symptoms (attributes) present in a child affected by LD. In this paper, the J48 algorithm is used to construct the decision tree and the K-means algorithm is used to create the clusters. By applying these classification techniques, LD in any child can be identified.
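J48 is the Weka toolkit's implementation of C4.5, and K-means is a standard clustering algorithm. A minimal pure-Python sketch of Lloyd's K-means shows its assignment and update steps; it is not Weka's implementation, and the numeric "symptom scores" in the usage are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of point tuples."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k starting points from the data
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the three low-score children from the three high-score ones. The result depends on the starting centroids, so several seeds are usually tried.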
Abstract:
This paper addresses the prediction of learning disabilities (LD) in school-age children using rough set theory (RST), with an emphasis on the application of data mining. In rough sets, data analysis starts from a data table called an information system, which contains data about objects of interest characterized in terms of attributes. Here, these attributes are the properties of learning disabilities. By finding the relationships between these attributes, redundant attributes can be eliminated and the core attributes determined. Rule mining is also performed in rough sets using the LEM1 algorithm. LD is predicted accurately using Rosetta, the rough set toolkit for data analysis. The results of this study are compared with the output of a similar study we conducted using a Support Vector Machine (SVM) with the Sequential Minimal Optimisation (SMO) algorithm. It is found that, using the concepts of reduct and global covering, learning disabilities in children can easily be predicted.
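A global covering, as computed by LEM1, is a minimal set of attributes whose indiscernibility partition still refines the partition induced by the decision. A sketch of that step drops attributes one at a time and keeps a removal only if the remaining attributes still separate every decision class. The attribute names and data in the usage are hypothetical, and neither Rosetta's implementation nor the paper's data is reproduced here.

```python
def partition(table, attrs):
    """Blocks of the indiscernibility relation IND(attrs): rows with equal values."""
    blocks = {}
    for i, row in enumerate(table):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(blocks.values())

def refines(p, q):
    """True if every block of partition p lies inside some block of partition q."""
    return all(any(block <= other for other in q) for block in p)

def global_covering(table, attributes, decision):
    """Greedy elimination in the spirit of LEM1; None if the table is inconsistent."""
    d_star = partition(table, [decision])
    if not refines(partition(table, attributes), d_star):
        return None  # even all attributes together cannot separate the decisions
    kept = list(attributes)
    for a in attributes:
        candidate = [x for x in kept if x != a]
        if refines(partition(table, candidate), d_star):
            kept = candidate  # a is redundant given the remaining attributes
    return kept
```

With rows given as dicts such as `{"reading": "poor", "spelling": "good", "memory": "good", "LD": "yes"}`, calling `global_covering(table, ["reading", "spelling", "memory"], "LD")` returns the attributes that must be kept. Because removing attributes can only coarsen the partition, an attribute found necessary early stays necessary, so the result is minimal.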
Abstract:
Learning disability (LD) is a neurological condition that affects a child's brain and impairs his or her ability to carry out one or many specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities, and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LD, there are a variety of tests that may be done to pinpoint the problem. The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for the child. This paper proposes a new approach using an artificial neural network (ANN) to identify LD in children at an early stage, so as to address the problems they face and benefit the students, their parents and the school authorities. In this study, we propose data preprocessing with a closest fit algorithm, followed by ANN classification, to handle missing attribute values. This algorithm imputes the missing values in the preprocessing stage. Ignoring missing attribute values is a common practice in classification algorithms. In this paper, however, we use the algorithm within a systematic classification approach, which gives satisfactory results in predicting LD. It acts as a tool for predicting LD accurately, and useful information about the child is made available to those concerned.
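The closest fit idea is to fill a missing value from the most similar other case. The sketch below rests on our own assumptions: numeric attributes only, range-normalised absolute differences, and a distance contribution of 1 wherever either value is missing. It is the global variant, which searches all cases rather than only those of the same class, and it is not necessarily the paper's exact preprocessing.

```python
def closest_fit_impute(rows):
    """Fill None entries by copying from the nearest row that knows the value."""
    n_attr = len(rows[0])
    ranges = []
    for a in range(n_attr):
        known = [r[a] for r in rows if r[a] is not None]
        span = max(known) - min(known) if known else 0
        ranges.append(span or 1)  # avoid division by zero for constant columns

    def distance(x, y):
        # Per attribute: normalised difference if both known, else the maximum of 1
        return sum(
            1.0 if x[a] is None or y[a] is None else abs(x[a] - y[a]) / ranges[a]
            for a in range(n_attr)
        )

    filled = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for a in range(n_attr):
            if row[a] is None:
                donors = [r for j, r in enumerate(rows) if j != i and r[a] is not None]
                if donors:
                    nearest = min(donors, key=lambda r: distance(row, r))
                    filled[i][a] = nearest[a]
    return filled
```

The imputed table can then be passed to any classifier, such as the ANN in the study, without discarding incomplete cases.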
Abstract:
In our study we use a kernel-based technique, Support Vector Machine Regression, to predict the melting point of drug-like compounds in terms of topological descriptors, topological charge indices, connectivity indices and 2D autocorrelations. The machine learning model was designed, trained and tested on a dataset of 100 compounds, and an SVMReg model with an RBF kernel was found to predict the melting point with a mean absolute error of 15.5854 and a root mean squared error of 19.7576.
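For reference, the two reported error measures and the RBF kernel can be written down directly. The sketch uses made-up melting points, not the study's data, and `gamma` is the kernel width parameter, whose value in the study is not given.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2) between two descriptor vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and grows faster when a few predictions are far off, which is consistent with the reported 19.7576 versus 15.5854.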
Abstract:
Many recent Web 2.0 resource sharing applications can be subsumed under the "folksonomy" moniker. Regardless of the type of resource shared, all of these share a common structure describing the assignment of tags to resources by users. In this report, we generalize the notions of clustering and characteristic path length, which play a major role in current research on networks, where they are used to describe the small-world effects observed in many network datasets. To that end, we show that the notion of clustering has two facets that are not equivalent in the generalized setting. The new measures are evaluated on two large-scale folksonomy datasets from resource sharing systems on the web.
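The report generalizes two classical measures to the three-part structure of folksonomies (users, tags and resources); that generalization is not reproduced here. For orientation, a sketch of the ordinary one-mode versions on an undirected graph given as an adjacency dict, taking the characteristic path length as the mean shortest-path length over connected pairs (one common definition):

```python
from collections import deque

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances in unweighted graphs
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A small-world network combines a high average clustering coefficient with a short characteristic path length; the report's contribution is defining analogues of both for tag assignments.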
Abstract:
Knowledge discovery support environments include data mining tools in addition to classical data analysis tools. To support both kinds of tools, a unified knowledge representation is needed. We show that concept lattices, which are used as the knowledge representation in Conceptual Information Systems, can also be used to structure the results of mining association rules. Conversely, we use ideas from association rules to reduce the complexity of visualizing Conceptual Information Systems.
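The association rules being structured are scored by support and confidence. A minimal sketch of both measures on a hypothetical transaction list:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: supp(A and C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)
```

For the transactions `[{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"}, {"milk"}]`, the rule bread -> milk has support 0.5 and confidence 2/3. A rule is usually reported only when both values exceed user-chosen thresholds.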
Abstract:
In this paper we study two orthogonal extensions of the classical data mining problem of mining association rules, and show how they naturally interact. The first is the extension from a propositional representation to datalog, and the second is the condensed representation of frequent itemsets by means of Formal Concept Analysis (FCA). We combine the notion of frequent datalog queries with the iceberg concept lattices of FCA (whose concept intents are the frequent closed itemsets) and introduce two kinds of iceberg query lattices as condensed representations of frequent datalog queries. We demonstrate that iceberg query lattices provide a natural way to visualize relational association rules in a non-redundant way.
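For the propositional case only (the datalog extension is beyond a short example), the nodes of an iceberg concept lattice correspond to the closed itemsets whose support meets a threshold. The brute-force sketch below lists them, without the lattice order, by enumerating every itemset, so it is only suitable for toy data; the transactions are hypothetical.

```python
from itertools import combinations

def frequent_closed_itemsets(transactions, min_support):
    """Map each frequent closed itemset to its support (brute force, toy data only)."""
    tsets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tsets))
    n = len(tsets)

    def closure(itemset):
        # Items shared by every transaction containing itemset (the concept intent)
        covering = [t for t in tsets if itemset <= t]
        return frozenset.intersection(*covering) if covering else frozenset(items)

    result = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            supp = sum(1 for t in tsets if x <= t) / n
            if supp >= min_support:
                result[closure(x)] = supp  # an itemset and its closure share support
    return result
```

With the transactions `[{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"}, {"milk"}]` and a threshold of 0.5, six frequent itemsets collapse to five closed ones: `{butter}` does not appear on its own, because every transaction containing butter also contains bread, so its closure `{bread, butter}` carries the same support. That collapsing is what makes the representation condensed.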
Abstract:
Along with the development and increasing availability of the Internet, the way information is provided and obtained has changed considerably. The former separation between publisher and consumer is dissolved by collaborative applications of the so-called Web 2.0, in which every participant can both provide and consume information. In addition, entries by other participants can be extended, commented on or discussed. With the Social Web, the social relationships and interactions of the participants finally come to the fore. Thanks to mobile devices, messages can be sent and read, new acquaintances made, or one's current status shared with a virtual circle of friends at any time and almost anywhere. With every activity within such an application, a participant enters into a relationship with data objects and/or other participants. This can happen explicitly, for example when an article is written and sent to friends by e-mail. However, relationships between data objects and users also arise implicitly, for example when another participant's profile page is viewed or when different participants rate an article similarly. This thesis develops a formal approach to analyzing and exploiting relationship structures that builds on such explicit and implicit data traces. The first part of the thesis is devoted to analyzing relationships between users in Social Web applications using methods of social network analysis. Within a typical social web application, users have various ways of interacting. Relationship structures between users are derived from each interaction pattern. The advantage of implicit user interactions is that they occur frequently and arise more or less as a by-product of operating the system.
However, it can be assumed that an explicitly stated friendship relation is more meaningful than corresponding implicit interactions. Accordingly, a first focus of this thesis is the comparison of different relationship structures within a social web application. The second part of the thesis is devoted to analyzing one of the most widespread profile attributes of users in social web applications: the given name. Here the methods and analyses presented in the first part are applied, i.e., relationship networks for names are obtained from social web application data and examined with methods of social network analysis. Using external descriptions of given names, semantic similarities between names are determined and compared with the corresponding structural similarities in the various relationship networks. In a practical application, determining similar names corresponds to expectant parents searching for a suitable given name. The results of the analysis of name relationships form the basis for the implementation of the name search engine Nameling, which was developed as part of this thesis. More than 35,000 users accessed Nameling within the first six months after its launch. The resulting usage data, in turn, provide insight into users' individual given-name preferences. This thesis presents these usage data and uses them to compute and evaluate personalized given-name recommendations. Finally, approaches to diversifying personalized given-name recommendations are presented, which combine static relationship networks for names with individual usage data.
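The abstract compares semantic similarity with structural similarity in name networks but does not name the structural measure. One common choice, assumed here purely for illustration, is the cosine similarity of two names' weighted neighborhood vectors; the co-occurrence counts in the demo are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse weight vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(network, name, top=3):
    """Rank other names by the similarity of their neighborhoods to `name`'s."""
    scores = {other: cosine_similarity(network[name], vec)
              for other, vec in network.items() if other != name}
    return sorted(scores, key=lambda n: (-scores[n], n))[:top]

if __name__ == "__main__":
    # Hypothetical co-occurrence counts between given names
    network = {
        "Anna": {"Maria": 3, "Lena": 2},
        "Hanna": {"Maria": 3, "Lena": 2},
        "Lena": {"Anna": 2, "Hanna": 2},
        "Maria": {"Anna": 3, "Hanna": 3},
        "Paul": {"Max": 4},
        "Max": {"Paul": 4},
    }
    print(most_similar(network, "Anna", top=1))  # ['Hanna']
```

Two names score highly when they co-occur with the same other names, even if they never co-occur with each other, which matches the idea of structural (rather than direct) similarity.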
Abstract:
Consumer reviews, opinions and shared experiences in using a product are a powerful source of information about consumer preferences that can be used in recommender systems. Despite the importance and value of such information, there is no comprehensive mechanism that formalizes the selection and retrieval of opinions and the use of the retrieved opinions, owing to the difficulty of extracting information from text data. In this paper, a new recommender system built on consumer product reviews is proposed, and a prioritizing mechanism is developed for it. The proposed approach is illustrated with a case study of a recommender system for digital cameras.
Abstract:
Monitoring a distribution network involves working with a huge amount of data coming from the different elements that interact in the network. This paper presents a visualization tool that simplifies searching the database for useful information applicable to fault management or preventive maintenance of the network.
Abstract:
What is programming?
A useful definition
Object orientation (and its counterparts)
Thinking OO
Programming blocks
Variables
Logic
Data structures
Methods