113 results for machine learning algorithms


Relevance:

90.00%

Publisher:

Abstract:

Microarray data provide quantitative information about the transcription profile of cells. Machine learning methodology has increasingly attracted bioinformatics researchers as a way to analyse microarray datasets, and several machine learning approaches are widely used to classify and mine biological datasets. However, many gene expression datasets are extremely high-dimensional, so traditional machine learning methods cannot be applied to them effectively and efficiently. This paper proposes a robust algorithm that finds rule groups for classifying gene expression datasets. Unlike most classification algorithms, which select dimensions (genes) heuristically to form rule groups that identify classes such as cancerous and normal tissues, our algorithm guarantees finding the best-k dimensions (genes) to form rule groups for the classification of expression datasets. Our experiments show that the rule groups obtained by our algorithm achieve higher accuracy than other classification approaches.

Relevance:

90.00%

Publisher:

Abstract:

In the last decade, Internet email has become one of the primary methods of communication, used by everyone for the exchange of ideas and information. In recent years, however, the rapid growth of the Internet and email has been accompanied by a dramatic growth in spam. Classification algorithms have been used successfully to filter spam, but at the cost of a certain number of false positives. This problem is mainly caused by the dynamic nature of spam content and spam delivery strategies, as well as the diversification of the classification algorithms. This paper presents an email classification approach that reduces the burden of the analyzing technique of the GL (grey list) analyser, as a further refinement of our previous multi-classifier based email classification [10]. In this approach, we introduce a “majority voting grey list (MVGL)” analyzing technique, with two different variations, which analyzes only the GL emails. Our empirical evidence demonstrates the improvements of this approach, in terms of complexity and cost, over the existing GL analyser. The approach also overcomes the existing analyzing technique's reliance on human interaction.
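
The voting step mentioned in this abstract can be illustrated with a minimal sketch of plain majority voting over the labels that several classifiers assign to one grey-list email. The abstract does not describe the two MVGL variations, so the function name `majority_vote`, the "spam"/"ham" labels, and the tie rule are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter


def majority_vote(labels, tie_label="ham"):
    """Return the label chosen by most classifiers.

    `tie_label` is an assumed fallback used when the two most frequent
    labels tie; defaulting to "ham" avoids flagging legitimate mail on
    a split vote.
    """
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels).most_common()
    # A tie exists when the two most frequent labels have equal counts.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


# Hypothetical votes from three classifiers on one grey-list email.
print(majority_vote(["spam", "ham", "spam"]))
```

Resolving ties toward "ham" is a design choice that favours fewer false positives; a real filter might instead defer tied emails to a further analysis stage.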

Relevance:

90.00%

Publisher:

Abstract:

The thesis investigates various machine learning approaches to reducing data dimensionality, and studies the impact of asymmetric data on learning in image retrieval. Efficient algorithms are proposed to reduce the data dimensionality. Integration strategies for one-class classification are designed to address the asymmetric data issue and improve retrieval effectiveness.

Relevance:

90.00%

Publisher:

Abstract:

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed that deal successfully with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation: imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications fall into this setting, no estimator has been designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators, for discrete and continuous missing target values respectively. A mixture-kernel-based iterative estimator is then proposed to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against some typical algorithms, and the results demonstrate that it outperforms these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
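
For readers unfamiliar with the RMSE metric named in this abstract, the following is a minimal sketch of how it could be computed between the true values and the imputed values of a continuous attribute. The function name `rmse` and its arguments are illustrative; the paper's evaluation protocol is not given in the abstract.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean square error between true and imputed values."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared differences, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true values and imputed estimates for three missing cells.
print(rmse([2.0, 4.0, 6.0], [2.5, 3.5, 6.0]))
```

A lower RMSE means the imputed values lie closer to the held-out true values.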

Relevance:

90.00%

Publisher:

Abstract:

The purpose of instance selection is to identify which instances (examples, patterns) in a large dataset should be selected as representatives of the entire dataset, without significant loss of information. When a machine learning method is applied to the reduced dataset, the accuracy of the model should not be significantly worse than if the same method were applied to the entire dataset. The reducibility of any dataset, and hence the success of instance selection methods, depends on the characteristics of the dataset as well as on the machine learning method. This paper adopts a meta-learning approach, via an empirical study of 112 classification datasets from the UCI Repository [1], to explore the relationship between data characteristics, machine learning methods, and the success of instance selection methods.

Relevance:

90.00%

Publisher:

Abstract:

This paper focuses on the choice of a supervised learning algorithm and possible data preprocessing in the domain of data-driven haptic simulation, by comparing the performance of different supervised learning techniques with and without data preprocessing. The simulation of haptic interactions with deformable objects using data-driven methods has emerged as an alternative to parametric methods. The accuracy of the simulation depends on the empirical data and the learning method. Several methods have been suggested in the literature, and here we compare their performance and applicability to this domain. We selected four examples to compare: a single learning mechanism, namely an artificial neural network (ANN); attribute selection followed by ANN learning; an ensemble of multiple learning techniques; and attribute selection followed by the learning ensemble. The performance of these methods was compared in simulating multiple interactions with a deformable object with nonlinear material behavior.

Relevance:

90.00%

Publisher:

Abstract:

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that, in the network domain, a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. We therefore describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints in the course of clustering can significantly improve the overall accuracy and cluster purity.
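
The must-link idea in this abstract can be sketched as follows: flows that share the same host address and port are grouped, and each group is expanded into pair-wise constraints. This is a minimal illustration only. The function names, the dictionary layout, and the sample addresses are assumptions, and the "simultaneously" condition is left out (a time window could be added to the grouping key).

```python
from collections import defaultdict


def must_link_groups(flows):
    """Group flow ids that share the same (host, port) endpoint.

    `flows` maps a flow id to a (host, port) tuple. Flows in one group
    are treated as pair-wise must-linked.
    """
    groups = defaultdict(list)
    for flow_id, endpoint in flows.items():
        groups[endpoint].append(flow_id)
    # Only groups with two or more flows yield constraints.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def must_link_pairs(groups):
    """Expand each group into explicit pair-wise constraints."""
    pairs = []
    for ids in groups:
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.append((ids[i], ids[j]))
    return pairs


# Hypothetical flows: f1 and f2 visit the same host and port.
flows = {
    "f1": ("10.0.0.5", 443),
    "f2": ("10.0.0.5", 443),
    "f3": ("10.0.0.9", 80),
}
print(must_link_pairs(must_link_groups(flows)))
```

A constrained K-Means variant would then consume these pairs, either refusing assignments that split a pair (hard constraints) or penalising them (soft constraints).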

Relevance:

90.00%

Publisher:

Abstract:

In multi-agent systems, most of the time, an agent does not have complete information about the preferences and decision making processes of other agents. This prevents even the cooperative agents from making coordinated choices, purely due to their ignorance of what others want. To overcome this problem, traditional coordination methods rely heavily on inter-agent communication, and thus become very inefficient when communication is costly or simply not desirable (e.g. to preserve privacy). In this paper, we propose the use of learning to complement communication in acquiring knowledge about other agents. We augment the communication-intensive negotiating agent architecture with a learning module, implemented as a Bayesian classifier. This allows our agents to incrementally update models of other agents' preferences from past negotiations with them. Based on these models, the agents can make sound predictions about others' preferences, thus reducing the need for communication in their future interactions.
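
One way such a learning module could work is sketched below as an incremental naive Bayes classifier over categorical features of past proposals. The abstract says only that the module is a Bayesian classifier, so the naive independence assumption, the Laplace smoothing, the class name `PreferenceModel`, and the meeting-scheduling features are all illustrative assumptions.

```python
from collections import defaultdict


class PreferenceModel:
    """Incremental naive Bayes over categorical proposal features."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # key: (label, index, value)
        self.values_seen = defaultdict(set)     # key: feature index

    def update(self, features, label):
        """Record one observed negotiation outcome."""
        self.class_counts[label] += 1
        for index, value in enumerate(features):
            self.feature_counts[(label, index, value)] += 1
            self.values_seen[index].add(value)

    def predict(self, features):
        """Return the most probable label for a new proposal."""
        total = sum(self.class_counts.values())
        if total == 0:
            raise ValueError("model has not seen any negotiations yet")
        best_label, best_score = None, -1.0
        for label, count in self.class_counts.items():
            score = count / total  # class prior
            for index, value in enumerate(features):
                seen = self.values_seen.get(index, set()) | {value}
                hits = self.feature_counts.get((label, index, value), 0)
                # Laplace smoothing keeps unseen values from zeroing the score.
                score *= (hits + 1) / (count + len(seen))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical history: another agent's replies to (day, time) proposals.
model = PreferenceModel()
model.update(("mon", "am"), "accept")
model.update(("mon", "pm"), "accept")
model.update(("fri", "pm"), "reject")
model.update(("fri", "am"), "reject")
print(model.predict(("mon", "pm")))
```

Because `update` only increments counts, the model can be refreshed after every negotiation without retraining from scratch, which matches the incremental updating described in the abstract.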

Relevance:

90.00%

Publisher:

Abstract:

Rapid technical development has created huge challenges for microphone forensics, a sub-category of audio forensic science, because of the availability of numerous digital recording devices and massive amounts of recording data. Fast and efficient methods to assure the integrity and authenticity of information are becoming more and more important in criminal investigation. Machine learning has emerged as an important technique to support the audio analysis processes of microphone forensic practitioners. However, applying supervised learning to real-life situations still faces great challenges because collecting data and updating the system are expensive. In this paper, we introduce a new machine learning approach, called One-class Classification (OCC), to microphone forensics, and we demonstrate its capability on a corpus of audio samples collected from several microphones. In addition, we propose a representative instance classification framework (RICF) that can effectively improve the performance of OCC algorithms on noisy recording signals. Experimental results and analysis indicate that OCC has the potential to benefit microphone forensic practitioners in developing new tools and techniques for effective and efficient analysis.

Relevance:

90.00%

Publisher:

Abstract:

In this paper, an Evolutionary Artificial Neural Network (EANN), which combines the Fuzzy ARTMAP (FAM) neural network and a hybrid Chaos Genetic Algorithm (CGA), is proposed for undertaking pattern classification tasks. The hybrid CGA is a modified version of the hybrid real-coded genetic algorithms that includes a Chaotic Mapping Operator (CMO) in its search and adaptation process. It is used to evolve the connection weights in FAM, and the resulting EANN is known as FAM-hybrid CGA. The CMO in the hybrid CGA is used to generate a group of chromosomes that incorporates the characteristics of chaos. The chromosomes are then adapted with an arbitrarily small amount of variation in every generation. As the evolution procedure proceeds, chromosomes with considerable differences are produced. Such chromosomes, which are located at different regions of interest in the solution space, are able to provide good solutions to search and adaptation problems. The effectiveness of the proposed FAM-hybrid CGA model is first evaluated using benchmark medical data sets from the UCI machine learning repository. Its applicability to medical decision support is then demonstrated using a real database of records of patients with suspected Acute Coronary Syndrome. The results indicate that FAM-hybrid CGA is able to outperform its neural network counterpart (i.e., FAM), and that it can be employed as a useful pattern classification tool for tackling medical decision support tasks.
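
To give a feel for how a chaotic mapping operator could seed chromosomes, here is a minimal sketch based on the logistic map x -> r·x·(1 - x) with r = 4. The abstract does not say which chaotic map the CMO uses, so the logistic map, the weight range, and both function names are assumptions made purely for illustration.

```python
def logistic_map_sequence(seed, length, r=4.0):
    """Generate `length` values in [0, 1] by iterating x -> r*x*(1-x)."""
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    values = []
    x = seed
    for _ in range(length):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_chromosome(seed, n_genes, low=-1.0, high=1.0):
    """Scale a chaotic sequence into the assumed weight range [low, high]."""
    return [low + (high - low) * v for v in logistic_map_sequence(seed, n_genes)]


# Two nearby seeds diverge quickly, giving dissimilar chromosomes.
print(chaotic_chromosome(0.300, 5))
print(chaotic_chromosome(0.301, 5))
```

The sensitivity to the seed is what makes chaotic sequences attractive for spreading candidate solutions across different regions of the search space.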

Relevance:

90.00%

Publisher:

Abstract:

In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated large performance improvements over a single (best) classifier. This is mainly due to a sufficient level of diversity among the classifiers, which is a selective property of the classifier set. Given a large number of classifiers, how to select the classifiers to put into a classifier ensemble is a crucial issue in multiple classifier-ensemble design. With this observation in mind, we propose a generic genetic classifier-ensemble method for classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct the individual classifiers. A basic type of individual classifier, the Support Vector Machine (SVM), is adopted to form an SVM-classifier committee. A multi-objective genetic algorithm (GA) is employed as the classifier selector to help the ensemble classifier improve the overall sample classification accuracy. The proposed approach is tested on a benchmark dataset, the GENIA version 3.02 corpus, and compared with the best individual SVM classifier and the SVM-classifier ensemble algorithm, as well as other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms the other classification algorithms and can be a useful method for the biomedical NER problem.

Relevance:

90.00%

Publisher:

Abstract:

This chapter addresses the use of a supervised machine learning technique to automatically induce Arabic-to-English transfer rules from chunks of parallel aligned linguistic resources. The induced structural transfer rules encode the linguistic translation knowledge for converting an Arabic syntactic structure into a target English syntactic structure. These rules will be an integral part of an Arabic-English transfer-based machine translation system. In addition, a novel morphological rule induction method is employed for learning Arabic morphological rules that are applied in our Arabic morphological analyzer. To demonstrate the capability of the automated rule induction technique, we conducted rule-based translation experiments that use rules induced from a relatively small data set. The hybrid translation experiments achieved good translation quality in terms of word error rate (WER).
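
WER, the metric cited in this abstract, is conventionally the word-level edit distance between a system translation and a reference, divided by the reference length. The sketch below assumes whitespace tokenisation and a single reference translation; the function name and the sample sentences are illustrative and are not taken from the chapter.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference and system output.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower values are better, and a WER of 0 means the output matches the reference word for word.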

Relevance:

90.00%

Publisher:

Abstract:

This paper studies efficient learning with respect to mind changes. Our starting point is the idea that a learner that is efficient with respect to mind changes minimizes mind changes not only globally in the entire learning problem, but also locally in subproblems after receiving some evidence. Formalizing this idea leads to the notion of uniform mind change optimality. We characterize the structure of language classes that can be identified with at most α mind changes by some learner (not necessarily effective): A language class L is identifiable with α mind changes iff the accumulation order of L is at most α. Accumulation order is a classic concept from point-set topology. To aid the construction of learning algorithms, we show that the characteristic property of uniformly mind change optimal learners is that they output conjectures (languages) with maximal accumulation order. We illustrate the theory by describing mind change optimal learners for various problems such as identifying linear subspaces and one-variable patterns.

Relevance:

90.00%

Publisher:

Abstract:

Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The challenge is to aggregate highly heterogeneous sources, including demographics, diagnoses, pathologies and treatments, and to extract groups of similar patients so that care plans can be designed. To this end, we extend our recent model, the mixed-variate restricted Boltzmann machine (MV.RBM), as it seamlessly integrates multiple data types for each patient aggregated over time and outputs a homogeneous representation, called a "latent profile", that can be used for patient clustering, visualisation, disease correlation analysis and prediction. We demonstrate that the method outperforms all baselines on these tasks: the primary characteristics of patients in the same group can be identified, and good results are achieved for diagnosis code prediction.