5 resultados para Statistical mean
em Cochin University of Science
Resumo:
Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining
Resumo:
The standard models for statistical signal extraction assume that the signal and noise are generated by linear Gaussian processes. The optimum filter weights for those models are derived using the method of minimum mean square error. In the present work we study the properties of signal extraction models under the assumption that signal/noise are generated by symmetric stable processes. The optimum filter is obtained by the method of minimum dispersion. The performance of the new filter is compared with their Gaussian counterparts by simulation.
Resumo:
Learning Disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs his ability to carry out one or many specific tasks. The learning disabled children are neither slow nor mentally retarded. This disorder can make it problematic for a child to learn as quickly or in the same way as some child who isn't affected by a learning disability. An affected child can have normal or above average intelligence. They may have difficulty paying attention, with reading or letter recognition, or with mathematics. It does not mean that children who have learning disabilities are less intelligent. In fact, many children who have learning disabilities are more intelligent than an average child. Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities and they are life-long. However, children with LD can be high achievers and can be taught ways to get around the learning disability. In this research work, data mining using machine learning techniques are used to analyze the symptoms of LD, establish interrelationships between them and evaluate the relative importance of these symptoms. To increase the diagnostic accuracy of learning disability prediction, a knowledge based tool based on statistical machine learning or data mining techniques, with high accuracy,according to the knowledge obtained from the clinical information, is proposed. The basic idea of the developed knowledge based tool is to increase the accuracy of the learning disability assessment and reduce the time used for the same. Different statistical machine learning techniques in data mining are used in the study. Identifying the important parameters of LD prediction using the data mining techniques, identifying the hidden relationship between the symptoms of LD and estimating the relative significance of each symptoms of LD are also the parts of the objectives of this research work. The developed tool has many advantages compared to the traditional methods of using check lists in determination of learning disabilities. For improving the performance of various classifiers, we developed some preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models are also developed for LD prediction. Here also the importance of pre-processing is studied. A Graphical User Interface (GUI) is designed for developing an integrated knowledge based tool for prediction of LD as well as its degree. The designed tool stores the details of the children in the student database and retrieves their LD report as and when required. The present study undoubtedly proves the effectiveness of the tool developed based on various machine learning techniques. It also identifies the important parameters of LD and accurately predicts the learning disability in school age children. This thesis makes several major contributions in technical, general and social areas. The results are found very beneficial to the parents, teachers and the institutions. They are able to diagnose the child’s problem at an early stage and can go for the proper treatments/counseling at the correct time so as to avoid the academic and social losses.
Resumo:
In this paper, we study the relationship between the failure rate and the mean residual life of doubly truncated random variables. Accordingly, we develop characterizations for exponential, Pareto 11 and beta distributions. Further, we generalize the identities for fire Pearson and the exponential family of distributions given respectively in Nair and Sankaran (1991) and Consul (1995). Applications of these measures in file context of lengthbiased models are also explored
Resumo:
Adaptive filter is a primary method to filter Electrocardiogram (ECG), because it does not need the signal statistical characteristics. In this paper, an adaptive filtering technique for denoising the ECG based on Genetic Algorithm (GA) tuned Sign-Data Least Mean Square (SD-LMS) algorithm is proposed. This technique minimizes the mean-squared error between the primary input, which is a noisy ECG, and a reference input which can be either noise that is correlated in some way with the noise in the primary input or a signal that is correlated only with ECG in the primary input. Noise is used as the reference signal in this work. The algorithm was applied to the records from the MIT -BIH Arrhythmia database for removing the baseline wander and 60Hz power line interference. The proposed algorithm gave an average signal to noise ratio improvement of 10.75 dB for baseline wander and 24.26 dB for power line interference which is better than the previous reported works