113 results for machine learning algorithms


Relevance: 90.00%

Abstract:

Computational Intelligence (CI) models comprise robust computing methodologies with a high level of machine learning quotient. CI models are, in general, useful for designing computerized intelligent systems/machines that possess characteristics mimicking human behaviors and capabilities in solving complex tasks, e.g., learning, adaptation, and evolution. Examples of popular CI models include fuzzy systems, artificial neural networks, evolutionary algorithms, multi-agent systems, decision trees, rough set theory, knowledge-based systems, and hybrids of these models. This special issue highlights how different computational intelligence models, coupled with other complementary techniques, can be used to handle problems encountered in image processing and information reasoning.

Relevance: 90.00%

Abstract:

Artificial neural networks are an effective means of allowing software agents to learn about and filter aspects of their domain. In this paper we explore the use of artificial neural networks in the context of dance performance. The software agent’s neural network is presented with movement in the form of motion capture streams, both pre-recorded and live. Learning can be viewed as analogous to rehearsal, recognition and response to performance. The interrelationship between the software agent and dancer throughout the process is considered as a potential means of allowing the agent to function beyond its limited self-contained capability.

Relevance: 90.00%

Abstract:

Monotonicity-preserving interpolation and approximation have received substantial attention in the last thirty years because of their numerous applications in computer-aided design, statistics, and machine learning [9, 10, 19]. Constrained splines are particularly popular because of their flexibility in modeling different geometrical shapes, sound theoretical properties, and the availability of numerically stable algorithms [9, 10, 26]. In this work we examine the parallelization and adaptation for GPUs of several monotone spline interpolation and data smoothing algorithms, which arose in the context of estimating probability distributions.
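
A minimal sketch of the monotonicity-preserving interpolation idea, here using SciPy's standard PCHIP interpolant rather than the constrained-spline algorithms the abstract parallelizes for GPUs; the CDF-style data points are illustrative.

```python
# Monotone interpolation sketch using SciPy's PCHIP interpolant (a standard
# monotonicity-preserving cubic), not the paper's GPU-adapted algorithms.
import numpy as np
from scipy.interpolate import PchipInterpolator

# Illustrative monotone samples of an empirical CDF
# (x strictly increasing, y non-decreasing).
x = np.array([0.0, 0.5, 1.0, 2.0, 3.5, 5.0])
y = np.array([0.00, 0.10, 0.35, 0.70, 0.90, 1.00])

# PCHIP preserves the monotonicity of the data, so the interpolant never
# overshoots the way an unconstrained cubic spline can.
cdf = PchipInterpolator(x, y)

xs = np.linspace(x[0], x[-1], 200)
vals = cdf(xs)
assert np.all(np.diff(vals) >= -1e-12)  # interpolant stays non-decreasing
print(np.round(vals[:5], 3))
```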

Relevance: 90.00%

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g., K-Means and EM) have been applied, but the reported results are unsatisfactory in terms of accuracy. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the second step, we perform a K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces, and the experimental results indicate that the proposed approach is more accurate than previous methods.
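
A minimal sketch of the two-step idea under assumed details: an "unsupervised" Random Forest separates the real flows from column-permuted synthetic flows, leaf co-occurrence gives a proximity matrix, and a plain k-medoids pass clusters on the resulting dissimilarities. The feature matrix and parameters are illustrative stand-ins for flow statistics.

```python
# RF-proximity clustering sketch: unsupervised RF (real vs. permuted data),
# proximity from shared leaves, then a simple k-medoids on 1 - proximity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 8))                      # stand-in for flow statistics

# Synthetic data: permute each column independently to destroy dependencies.
X_syn = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
X_all = np.vstack([X, X_syn])
y_all = np.r_[np.ones(len(X)), np.zeros(len(X_syn))]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_all, y_all)
leaves = rf.apply(X)                          # (n_samples, n_trees) leaf ids
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
D = 1.0 - prox                                # RF dissimilarity matrix

def k_medoids(D, k, iters=20, seed=0):
    """Plain alternating k-medoids on a precomputed dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new = np.array([np.where(labels == c)[0][
            np.argmin(D[np.ix_(labels == c, labels == c)].sum(axis=1))]
            for c in range(k)])
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1), medoids

labels, medoids = k_medoids(D, k=4)
print(np.bincount(labels))                    # cluster sizes
```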

Relevance: 90.00%

Abstract:

Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task groups and the subspace dimensionality are automatically inferred from the data. To realize our framework, we introduce a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model to multi-task regression and classification applications. Experimental results using several synthetic and real datasets show the superiority of our model over other recent multi-task learning methods.
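
A minimal sketch of the subspace-sharing idea only, not the paper's Bayesian nonparametric model: each task's weight vector is constrained to a shared low-dimensional basis (w_t = U s_t) and fitted by alternating least squares. The task count, dimensions, regularizer, and synthetic data are illustrative assumptions, and the subspace dimensionality is fixed rather than inferred.

```python
# Shared-subspace multi-task regression sketch via alternating least squares.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_tasks, n = 20, 3, 6, 50
U_true = rng.normal(size=(d, r))
tasks = []
for _ in range(n_tasks):
    X = rng.normal(size=(n, d))
    w = U_true @ rng.normal(size=r)           # true weights live in the subspace
    tasks.append((X, X @ w + 0.1 * rng.normal(size=n)))

U = rng.normal(size=(d, r))                   # shared basis to be learned
lam = 1e-2
for _ in range(50):
    # Per-task coefficients given the shared basis (ridge closed form).
    S = []
    for X, y in tasks:
        A = X @ U
        S.append(np.linalg.solve(A.T @ A + lam * np.eye(r), A.T @ y))
    # Shared basis given coefficients: least squares over vec(U).
    rows = np.vstack([np.kron(s, X) for (X, _), s in zip(tasks, S)])
    rhs = np.concatenate([y for _, y in tasks])
    U = np.linalg.lstsq(rows, rhs, rcond=None)[0].reshape(r, d).T

for (X, y), s in zip(tasks, S):
    print(round(float(np.mean((y - X @ U @ s) ** 2)), 4))  # per-task training MSE
```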

Relevance: 90.00%

Abstract:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
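
A minimal sketch of clustering with equivalence-set ("must-link") constraints in a Gaussian mixture, in the spirit described above but not the paper's exact approximate estimator: flows in the same set share one component assignment, so the E-step computes a single responsibility vector per set. The synthetic 2-D data, the triple-based set structure, and the diagonal covariances are illustrative assumptions.

```python
# Constrained GMM clustering sketch: one shared responsibility per
# equivalence set, standard weighted M-step.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(60, 2)) for m in (0, 3, 6)])
# Equivalence sets: here simply consecutive triples of flows assumed to share
# an application-layer protocol (inferred from packet headers in the paper).
sets = [list(range(i, i + 3)) for i in range(0, len(X), 3)]

K, d = 3, X.shape[1]
pi = np.full(K, 1.0 / K)
mu = X[rng.choice(len(X), K, replace=False)]
var = np.ones((K, d))

for _ in range(50):
    # Per-point log-likelihoods under each component (diagonal Gaussians).
    logp = np.stack([norm.logpdf(X, mu[k], np.sqrt(var[k])).sum(axis=1)
                     for k in range(K)], axis=1)          # (n, K)
    R = np.zeros_like(logp)
    for S in sets:
        ll = np.log(pi) + logp[S].sum(axis=0)             # one assignment per set
        ll -= ll.max()
        r = np.exp(ll); r /= r.sum()
        R[S] = r                                          # shared responsibilities
    # M-step: standard weighted updates.
    Nk = R.sum(axis=0)
    pi = Nk / Nk.sum()
    mu = (R.T @ X) / Nk[:, None]
    var = np.stack([(R[:, k:k+1] * (X - mu[k]) ** 2).sum(axis=0) / Nk[k]
                    for k in range(K)]) + 1e-6

print(np.round(mu, 2))    # recovered component means
```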

Relevance: 90.00%

Abstract:

Privacy preservation in data release and data mining is currently a hot topic in the information security field. As a new privacy notion, differential privacy (DP) has grown in popularity recently due to its rigorous and provable privacy guarantee. After analyzing the advantages of the differential privacy model relative to traditional ones, this paper surveys the theory of differential privacy and its application in two areas: privacy-preserving data release (PPDR) and privacy-preserving data mining (PPDM). In PPDR, we introduce the DP-based data release methodologies in interactive/non-interactive settings and compare them in terms of accuracy and sample complexity. In PPDM, we mainly summarize the implementation of DP in various data mining algorithms with interface-based/fully access-based modes, as well as evaluating the performance of the algorithms. We finally review other applications of DP in various fields and discuss future research directions.
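
A minimal sketch of the Laplace mechanism, the basic building block behind many of the DP data-release and data-mining algorithms such a survey covers; the dataset, query, and epsilon values below are illustrative.

```python
# Laplace mechanism sketch for an epsilon-DP counting query.
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a counting query with epsilon-DP; a count has sensitivity 1."""
    true_count = sum(predicate(x) for x in data)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=10_000)        # toy sensitive dataset
for eps in (0.1, 1.0):
    released = laplace_count(ages, lambda a: a >= 65, epsilon=eps, rng=rng)
    print(f"epsilon={eps}: noisy count of seniors = {released:.1f}")
```

Smaller epsilon means stronger privacy but noisier released counts, which is the accuracy/privacy trade-off the survey compares across release methodologies.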

Relevance: 90.00%

Abstract:

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications arise on the Internet, and traffic classification is introduced to help manage these applications for security monitoring and quality-of-service purposes. A large number of Machine Learning (ML) algorithms have been introduced to deal with traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which has the capability of improving the classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefits from both theoretical and empirical perspectives. A large number of experiments were conducted on real-world traffic datasets. The results show that the performance on “small classes” can be improved significantly, even with only a small amount of training data, while the performance on “large classes” remains stable.
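
The exact ODNN distance optimisation is not reproduced here; the following is only a generic class-weighted, distance-weighted kNN sketch illustrating the kind of imbalance correction the abstract targets, with illustrative data and weights.

```python
# Class-weighted, distance-weighted kNN sketch for imbalanced data
# (a generic baseline, not the paper's ODNN method).
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, x, k=5):
    """Vote with inverse-distance weights scaled by inverse class frequency."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    freq = Counter(y_train)                         # class frequencies in training set
    votes = {}
    for i in nn:
        w = 1.0 / (d[i] + 1e-9) / freq[y_train[i]]  # boost rare ("small") classes
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

rng = np.random.default_rng(0)
X_big = rng.normal(0.0, 1.0, size=(500, 4))
X_small = rng.normal(1.5, 1.0, size=(20, 4))
X_train = np.vstack([X_big, X_small])
y_train = np.array(["large"] * 500 + ["small"] * 20)
print(weighted_knn_predict(X_train, y_train, x=np.full(4, 1.5)))
```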

Relevance: 90.00%

Abstract:

Urban traffic, as one of the most important challenges of modern city life, needs practically effective and efficient solutions. Artificial intelligence methods have gained popularity for optimal traffic light control. In this paper, a review of the most important works in the field of traffic signal timing control is presented, in particular studies focusing on Q-learning, neural networks, and fuzzy logic systems. According to the existing literature, the intelligent methods show higher performance compared to traditional control methods. However, a study that compares the performance of these different learning methods has not yet been published. In this paper, the aforementioned computational intelligence methods and a fixed-time method are implemented to set signal times and minimize total delay for an isolated intersection. These methods are developed and compared on the same platform. The intersection is treated as an intelligent agent that learns to propose an appropriate green time for each phase. The appropriate green time for all the intelligent controllers is estimated based on the received traffic information. A comprehensive comparison is made between the performance of the Q-learning, neural network, and fuzzy logic system controllers for two different scenarios. The three intelligent learning controllers show similar performance across multiple replications in the two scenarios. On average, Q-learning has 66%, the neural network 71%, and the fuzzy logic system 74% higher performance compared to the fixed-time controller.
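
A minimal tabular Q-learning sketch of the "intersection as an agent" idea. The state (bucketed queue lengths), actions (which phase gets green), reward (negative total queue as a delay proxy), and toy traffic dynamics are simplified assumptions, not the paper's exact formulation.

```python
# Tabular Q-learning sketch for a toy isolated intersection with two phases.
import numpy as np

rng = np.random.default_rng(0)
n_buckets, n_actions = 5, 2            # 2 phases: NS green or EW green
Q = np.zeros((n_buckets, n_buckets, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def bucket(q):                          # discretise a queue length into 0..4
    return min(q // 3, n_buckets - 1)

queues = np.array([0, 0])               # cars waiting on NS and EW approaches
state = (bucket(queues[0]), bucket(queues[1]))
for step in range(20_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    queues += rng.poisson(lam=[1.0, 2.0])            # arrivals during the phase
    queues[a] = max(queues[a] - 4, 0)                # green phase serves up to 4 cars
    reward = -queues.sum()                           # delay proxy: total queue
    nxt = (bucket(queues[0]), bucket(queues[1]))
    Q[state + (a,)] += alpha * (reward + gamma * Q[nxt].max() - Q[state + (a,)])
    state = nxt

print(np.argmax(Q, axis=2))             # learned phase choice per queue state
```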

Relevance: 90.00%

Abstract:

BACKGROUND: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental, and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. METHODS: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
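
A minimal sketch of the modelling step only: regularised multiple linear regression mapping per-state search-activity features to measured prevalence, evaluated on a held-out year with Pearson and Spearman correlations. The arrays are synthetic stand-ins; the actual study used web search volumes and CDC survey prevalence, which are not reproduced here.

```python
# Regularised regression sketch: search-activity features -> prevalence,
# trained on earlier years and scored on a held-out year.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n_states, n_terms = 50, 12
true_w = rng.normal(size=n_terms)

def year(noise):                         # synthetic (search features, prevalence)
    X = rng.random((n_states, n_terms))
    return X, X @ true_w + noise * rng.normal(size=n_states)

X_train, y_train = year(noise=0.3)       # stand-in for pooled derivation years
X_test, y_test = year(noise=0.3)         # stand-in for a held-out target year

model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)
print("Pearson r:", round(pearsonr(pred, y_test)[0], 2))
print("Spearman rho:", round(spearmanr(pred, y_test)[0], 2))
```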

Relevance: 90.00%

Abstract:

This thesis advances several theoretical and practical aspects of the recently introduced restricted Boltzmann machine, a powerful probabilistic and generative framework for modelling data and learning representations. The contributions of this study follow a systematic and common theme: learning structured representations from complex data.
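
For orientation only, a minimal sketch of a Bernoulli-Bernoulli restricted Boltzmann machine trained with one step of contrastive divergence (CD-1); the thesis's specific extensions are not reproduced, and the data and sizes are illustrative.

```python
# CD-1 training sketch for a small Bernoulli-Bernoulli RBM.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 16, 8, 0.05
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)        # visible and hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = (rng.random((500, n_vis)) < 0.3).astype(float)   # toy binary data

for epoch in range(20):
    for v0 in X:
        ph0 = sigmoid(v0 @ W + c)                    # P(h=1 | v0)
        h0 = (rng.random(n_hid) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)                  # reconstruction
        v1 = (rng.random(n_vis) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + c)
        # CD-1 gradient estimate: positive phase minus negative phase.
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)

print(np.round(W[:2], 3))                            # a peek at learned weights
```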

Relevance: 90.00%

Abstract:

The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL-encoding gene. However, the discovery of HbL proteins has been limited to only a small number of bacteria. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based upon amino acid composition (AC), dipeptide composition (DC), a hybrid method (AC + DC), and position-specific scoring matrices (PSSM). In addition, we introduce for the first time a new prediction method based on max-to-min amino acid residue (MM) profiles. The average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) were analyzed. We also compared the performance of our proposed models on homology detection databases. The performance of the different approaches was estimated using fivefold cross-validation. Prediction accuracy was further investigated through confusion matrix and ROC curve analysis. All experimental results indicate that the proposed BacHbpred can be a prospective predictor for the determination of HbL-related proteins. BacHbpred, a web tool, has been developed for HbL prediction.
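
A minimal sketch of the amino-acid-composition (AC) model only: each sequence becomes a 20-dimensional composition vector and an RBF SVM is scored with fivefold cross-validation. The sequences below are toy stand-ins; the DC, PSSM, and MM feature sets described above are not reproduced.

```python
# AC-feature SVM sketch with fivefold cross-validation on toy sequences.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard residues in the sequence."""
    counts = np.array([seq.count(a) for a in AMINO], dtype=float)
    return counts / max(len(seq), 1)

rng = np.random.default_rng(0)
def random_seq(bias):                    # toy positive/negative sequence generator
    probs = np.full(20, 1.0)
    probs[:5] += bias
    probs /= probs.sum()
    return "".join(rng.choice(list(AMINO), size=120, p=probs))

seqs = [random_seq(2.0) for _ in range(60)] + [random_seq(0.0) for _ in range(60)]
y = np.array([1] * 60 + [0] * 60)        # 1 = "HbL-like", 0 = other (toy labels)
X = np.array([aa_composition(s) for s in seqs])

scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"), X, y, cv=5)
print("fivefold CV accuracy:", np.round(scores, 2))
```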

Relevance: 90.00%

Abstract:

We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. Our motivation rests on the Thurstonian view that many discrete data types can be considered as being generated from a subset of underlying latent continuous variables, and on the observation that each realisation of a discrete type imposes certain inequalities on those variables. Thus, learning and inference in TBM reduce to making sense of a set of inequalities. Our proposed TBM naturally supports the following types: Gaussian, interval, censored, binary, categorical, multicategorical, ordinal, and (in)complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering, and complex social survey analysis.
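
A minimal sketch of the Thurstonian view for the binary type only: each observed bit constrains a latent Gaussian "utility" to be positive (1) or non-positive (0), and inference alternates between sampling those truncated latents and re-estimating their mean. This illustrates the inequality idea, not the full TBM architecture; the data and parameters are illustrative.

```python
# Thurstonian link sketch: binary observations as sign constraints on
# latent Gaussian utilities, sampled with truncated normals.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
true_mu = 0.7
x = (true_mu + rng.normal(size=2000) > 0).astype(int)   # observed binary data

mu = 0.0
for it in range(50):
    # Sample each latent u_i ~ N(mu, 1) truncated by the inequality x_i imposes
    # (truncnorm bounds are given in standardised units, hence the -mu shift).
    lo = np.where(x == 1, -mu, -np.inf)    # u > 0  when x = 1
    hi = np.where(x == 1, np.inf, -mu)     # u <= 0 when x = 0
    u = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    mu = u.mean()                          # simple mean update given the latents

print(round(float(mu), 2), "vs true", true_mu)
```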

Relevance: 90.00%

Abstract:

This thesis develops machine learning techniques to discover activities and contexts from pervasive sensor data. These techniques are especially suitable for streaming sensor data, as they can infer the context space automatically. They are applicable in many real-world applications, such as activity monitoring and organisation management.