178 results for machine learning


Relevance: 60.00%

Abstract:

Monotonicity-preserving interpolation and approximation have received substantial attention in the last thirty years because of their numerous applications in computer-aided design, statistics, and machine learning [9, 10, 19]. Constrained splines are particularly popular because of their flexibility in modeling different geometrical shapes, sound theoretical properties, and the availability of numerically stable algorithms [9, 10, 26]. In this work we examine the parallelization and GPU adaptation of several algorithms for monotone spline interpolation and data smoothing, which arose in the context of estimating probability distributions.

Relevance: 60.00%

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported accuracy is unsatisfactorily low. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the second step, we perform K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces, and the experimental results indicate that the proposed approach is more accurate than previous methods.
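The second step described above, K-Medoids on a proximity matrix, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `k_medoids`, the simple alternating update rule, and the hand-made 6×6 proximity matrix are all assumptions, and the RF-based derivation of the proximities (the first step) is not shown.

```python
import random


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed dissimilarity matrix (hypothetical helper)."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)

    def assign(current):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid minimises total dissimilarity to the cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy proximity matrix (made-up values): points 0-2 and 3-5 form two tight groups.
prox = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
        for i in range(6)]
# Convert proximity to dissimilarity before clustering.
dist = [[1.0 - p for p in row] for row in prox]

medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because the algorithm only reads the dissimilarity matrix, any proximity measure can be substituted without changing the clustering code.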

Relevance: 60.00%

Abstract:

With the arrival of the Big Data era, properly utilizing the power of big data is becoming increasingly essential for the strength and competitiveness of businesses and organizations. Big data poses grand challenges from several perspectives, such as processing, communication, security, and privacy. In this talk, we discuss the big data challenges in network traffic classification and our solutions to them. The significance of the research lies in the fact that network traffic on the current Internet increases exponentially each year. Traffic classification has wide applications in network management, from security monitoring to quality-of-service measurements. Recent research tends to apply machine learning techniques to classification methods based on statistical flow features. In this talk, we propose a series of novel approaches for traffic classification, which improve classification performance by incorporating correlated information into the classification process. We analyze the new classification approaches and their performance benefits from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic datasets to validate the proposed approaches. The results show that traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples. Our work has significant impact on security applications.

Relevance: 60.00%

Abstract:

The track cycling Omnium is a multi-event competition that has recently been expanded to include the Elimination Race (ER), which presents a unique set of physical and tactical demands. The purpose of this research was to characterise the performance attributes of successful and unsuccessful cyclists in the ER that are also predictive of performance. Video recordings of four international-level ERs were analysed. The performance attributes measured related to the cyclists’ velocity and two-dimensional position in the peloton. The average velocity of the peloton up to lap 30 (of 50) was relatively high and consistent (52.2±1.5 km/h). After lap 30, there was a significant (p<0.001) change in velocity (49.9±2.4 km/h), characterised by more fluctuations in lap-to-lap velocity. Successful ER cyclists adopted a tactic of remaining in the middle of the peloton, in the lower lanes of the velodrome, thus avoiding the risk of elimination at the rear and the extra effort required to remain on the front of the peloton. Unsuccessful cyclists tended to reside in the rear and upper (higher) portions of the peloton, risking elimination more often and having to ride faster than those in the lower lanes of the velodrome. The physiological demands of the Elimination Race, which are determined by velocity, vary throughout the race, and the pattern of movement within the peloton differs between successful and unsuccessful cyclists. The findings of the present study may confirm some aspects of race tactics that are currently thought to be optimal, but they also reveal novel information that is useful to coaches and cyclists who compete in the Elimination Race.

Relevance: 60.00%

Abstract:

The proposed approach, based on physiological characteristics of sitting behaviours and sophisticated machine learning techniques, would enable an effective and practical solution to driver fatigue prognosis, since it is insensitive to the illumination of the driving environment, non-obtrusive to the driver, does not violate the driver’s privacy, and is more acceptable to drivers.

Relevance: 60.00%

Abstract:

Probabilistic topic models have become a standard in modern machine learning, with wide applications in organizing and summarizing ‘documents’ in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by the reduced-dimension mixture proportions extracted from topic models is not only richer in semantics than the bag-of-words interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high-dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of the proposed kernel is demonstrated in several classification tasks on real-world datasets. The kernel outperforms existing kernels on distributional features, and we also report comparative results on non-probabilistic data types.
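To make the idea of a kernel on topic proportions concrete, the sketch below compares two probability vectors with an exponentiated Jensen-Shannon divergence. The abstract does not state the form of the TMK, so this choice, the function names, the parameter `lam`, and the example vectors are all illustrative assumptions rather than the paper's definition.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_kernel(p, q, lam=1.0):
    """Similarity in (0, 1]: equals 1 when p == q and decays as the vectors diverge."""
    return math.exp(-lam * js_divergence(p, q))


# Made-up topic proportions for two documents over three topics.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.3, 0.6]
print(js_kernel(doc_a, doc_a), js_kernel(doc_a, doc_b))
```

A function like this could be evaluated on every pair of documents to build the Gram matrix that a precomputed-kernel SVM expects.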

Relevance: 60.00%

Abstract:

This thesis focuses on the data sparsity issue and the temporal dynamics issue in the context of collaborative filtering, and addresses them with imputation techniques, low-rank subspace techniques and optimization techniques from the machine learning perspective. A comprehensive survey of the development of collaborative filtering techniques is also included.

Relevance: 60.00%

Abstract:

The thesis addresses a number of critical problems in fully automating the process of network traffic classification and protocol identification. Several effective solutions based on statistical analysis and machine learning techniques are proposed, which significantly reduce the need for human intervention in network traffic classification systems.

Relevance: 60.00%

Abstract:

Automatic causal discovery is a challenging research area of extraordinary significance in scientific research and in many real-world problems where recovering causes, effects and their causal relationships is an essential task. This paper first introduces causality and perspectives on causal discovery. It then provides an analysis of the three major approaches that have been proposed in recent decades for the automatic discovery of causal models from given data. Afterwards it presents an analysis of the capability and applicability of the different proposed approaches, followed by a conclusion on their potential and on future research. © 2013 IEEE.

Relevance: 60.00%

Abstract:

Healthcare plays an important role in promoting the general health and well-being of people around the world. The difficulty in healthcare data classification arises from the uncertainty and the high-dimensional nature of the medical data collected. This paper proposes an integration of the fuzzy standard additive model (SAM) with a genetic algorithm (GA), called GSAM, to deal with uncertainty and computational challenges. The GSAM learning process comprises three consecutive steps: rule initialization by unsupervised learning using adaptive vector quantization clustering, evolutionary rule optimization by GA, and parameter tuning by gradient descent supervised learning. Wavelet transformation is employed to extract discriminative features from high-dimensional datasets. GSAM becomes highly capable when deployed with a small number of wavelet features, as its computational burden is remarkably reduced. The proposed method is evaluated using two frequently used medical datasets: the Wisconsin breast cancer and Cleveland heart disease datasets from the UCI Repository for machine learning. Experiments are organized with five-fold cross-validation, and the performance of the classification techniques is measured by a number of important metrics: accuracy, F-measure, mutual information and area under the receiver operating characteristic curve. Results demonstrate the superiority of GSAM compared to other machine learning methods including probabilistic neural network, support vector machine, fuzzy ARTMAP, and adaptive neuro-fuzzy inference system. The proposed approach is thus helpful as a decision support system for medical practitioners in healthcare practice.
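The evaluation protocol mentioned above, five-fold cross-validation scored by accuracy and F-measure, can be sketched in plain Python. This is a generic illustration, not the paper's code: the helper names, the unshuffled contiguous folds, and the toy label vectors are assumptions, and mutual information and AUC are omitted.

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels and predictions, only to exercise the metric functions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
folds = list(k_fold_indices(len(y_true), k=5))
print(len(folds), accuracy(y_true, y_pred), f_measure(y_true, y_pred))
```

In practice the data would normally be shuffled (or stratified by class) before splitting, and the metrics would be computed on each held-out fold and then averaged.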

Relevance: 60.00%

Abstract:

Previous efforts in the prospective evaluation of individuals who experience attenuated psychotic symptoms have attempted to isolate mechanisms underlying the onset of full-threshold psychotic illness. In contrast, there has been little research investigating specific predictors of positive outcomes. In this study, we sought to determine biological and clinical factors associated with treatment response, here indexed by functional improvement in a pre-post examination of a 12-week randomized controlled intervention in individuals at ultra-high risk (UHR) for psychosis. Participants received either long-chain omega-3 (ω-3) polyunsaturated fatty acids (PUFAs) or placebo. To allow the determination of factors specifically relevant to each intervention, and to be able to contrast them, both treatment groups were investigated in parallel. Univariate linear regression analysis indicated that higher levels of erythrocyte membrane α-linolenic acid (ALA; the parent fatty acid of the ω-3 family) and more severe negative symptoms at baseline predicted subsequent functional improvement in the treatment group, whereas less severe positive symptoms and lower functioning at baseline were predictive in the placebo group. A multivariate machine learning analysis, known as Gaussian Process Classification (GPC), confirmed that baseline fatty acids predicted response to treatment in the ω-3 PUFA group with high levels of sensitivity, specificity and accuracy. In addition, GPC revealed that baseline fatty acids were predictive in the placebo group. In conclusion, our investigation indicates that UHR patients with higher levels of ALA may specifically benefit from ω-3 PUFA supplementation. In addition, multivariate machine learning analysis suggests that fatty acids could potentially be used to inform prognostic evaluations and treatment decisions at the level of the individual. 
Notably, multiple statistical analyses were conducted in a relatively small sample, limiting the conclusions that can be drawn from what we believe to be a first-of-its-kind study. Additional studies with larger samples are therefore needed to evaluate the generalizability of these findings.

Relevance: 60.00%

Abstract:

This paper proposes a combination of the fuzzy standard additive model (SAM) with wavelet features for medical diagnosis. Wavelet transformation is used to reduce the dimension of high-dimensional datasets. This helps to improve the convergence speed of the supervised learning process of the fuzzy SAM, which has a heavy computational burden on high-dimensional data. Fuzzy SAM becomes highly capable when deployed with wavelet features, and the combination remarkably reduces its computational training burden. The performance of the proposed methodology is examined on two frequently used medical datasets: the lump breast cancer and heart disease datasets. Experiments are conducted with five-fold cross-validation. Results demonstrate the superiority of the proposed method compared to other machine learning methods including probabilistic neural network, support vector machine, fuzzy ARTMAP, and adaptive neuro-fuzzy inference system. Achieving faster convergence together with higher accuracy makes the proposed approach a win-win solution.
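As a rough illustration of how a wavelet transform can shrink a feature vector, the sketch below keeps only the approximation coefficients of a Haar decomposition, halving the length at each level. The abstract does not name the wavelet family or the number of levels, so the Haar choice, the function name, and the padding rule are assumptions.

```python
import math


def haar_approximation(signal, levels=1):
    """Return the Haar approximation coefficients after `levels` decompositions."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # pad odd-length input by repeating the last value
        # Orthonormal Haar low-pass step: scaled sum of each adjacent pair.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


# An 8-dimensional toy feature vector reduced to 2 wavelet features.
features = [1, 2, 3, 4, 5, 6, 7, 8]
reduced = haar_approximation(features, levels=2)
print(reduced)
```

Each level discards the detail coefficients, so the reduced vector is a smoothed summary of the original features rather than a lossless encoding.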

Relevance: 60.00%

Abstract:

Multibeam echosounders (MBES) are increasingly becoming the tool of choice for marine habitat mapping applications. In turn, the rapid expansion of habitat mapping studies has resulted in a need for automated classification techniques to efficiently map benthic habitats, assess confidence in model outputs, and evaluate the importance of variables driving the patterns observed. The benthic habitat characterisation process often involves the analysis of MBES bathymetry, backscatter mosaic or angular response, with observation data providing ground truth. However, studies that make use of the full range of MBES outputs within a single classification process are limited. We present an approach that integrates backscatter angular response with MBES bathymetry, backscatter mosaic and their derivatives in a classification process using a Random Forests (RF) machine-learning algorithm to predict the distribution of benthic biological habitats. This approach includes a method of deriving statistical features from backscatter angular response curves created from MBES data collated within homogeneous regions of a backscatter mosaic. Using the RF algorithm, we assess the relative importance of each variable in order to optimise the classification process and simplify the models applied. The results showed that the inclusion of the angular response features in the classification process improved the accuracy of the final habitat maps from 88.5% to 93.6%. The RF algorithm identified bathymetry and the angular response mean as the two most important predictors. However, the highest classification rates were only obtained after incorporating additional features derived from bathymetry and the backscatter mosaic. The angular response features were found to be more important to the classification process than the backscatter mosaic features.
This analysis indicates that integrating angular response information with bathymetry and the backscatter mosaic, along with their derivatives, constitutes an important improvement for studying the distribution of benthic habitats, which is necessary for effective marine spatial planning and resource management.
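Deriving statistical features from an angular response curve might look like the sketch below. Only the angular response mean is named in the abstract; the standard deviation and least-squares slope, the function name, and the toy curve values are assumptions added for illustration.

```python
import statistics


def angular_response_features(angles_deg, backscatter_db):
    """Summarise one angular response curve as a small feature dictionary."""
    mean = statistics.fmean(backscatter_db)
    std = statistics.pstdev(backscatter_db)
    angle_mean = statistics.fmean(angles_deg)
    # Least-squares slope of backscatter against incidence angle.
    sxx = sum((a - angle_mean) ** 2 for a in angles_deg)
    sxy = sum((a - angle_mean) * (b - mean) for a, b in zip(angles_deg, backscatter_db))
    return {"mean": mean, "std": std, "slope": sxy / sxx}


# Made-up curve: backscatter strength (dB) falling off linearly with incidence angle.
angles = [10, 20, 30, 40, 50]
backscatter = [-10.0, -12.0, -14.0, -16.0, -18.0]
feats = angular_response_features(angles, backscatter)
print(feats)
```

Feature dictionaries like this, one per homogeneous region, could then be joined with bathymetry and mosaic derivatives as inputs to a classifier.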

Relevance: 60.00%

Abstract:

Accurate Named Entity Recognition (NER) is important for knowledge discovery in text mining. This paper proposes an ensemble machine learning approach to recognise Named Entities (NEs) in unstructured and informal medical text. Specifically, Conditional Random Field (CRF) and Maximum Entropy (ME) classifiers are applied individually to the test data set from the i2b2 2010 medication challenge. Each classifier is trained using a different set of features. The first set focuses on the contextual features of the data, while the second concentrates on the linguistic features of each word. The results of the two classifiers are then combined. The proposed approach achieves an f-score of 81.8%, a considerable improvement over the individual CRF and ME classifiers, which achieve f-scores of 76% and 66.3%, respectively, on the same data set.

Relevance: 60.00%

Abstract:

Hidden patterns and contexts play an important part in intelligent pervasive systems. Most existing work has focused on simple forms of context derived directly from raw signals. High-level constructs and patterns have been largely neglected or have remained under-explored in pervasive computing, mainly due to their growing complexity over time and the lack of efficient, principled methods to extract them. Traditional parametric modeling approaches from machine learning find it difficult to discover new, unseen patterns and contexts arising from the continuous growth of data streams, because they follow a train-then-predict paradigm. In this work, we propose to apply Bayesian nonparametric models as a systematic and rigorous paradigm to continuously learn hidden patterns and contexts from raw social signals and so provide basic building blocks for context-aware applications. Bayesian nonparametric models allow the model complexity to grow with the data, fitting naturally to several problems encountered in pervasive computing. Under this framework, we use nonparametric prior distributions to model the data-generative process, which helps in learning the number of latent patterns automatically, adapting to changes in data and discovering never-seen-before patterns, contexts and activities. The proposed methods are agnostic to data types; however, our work demonstrates them on two types of signals: accelerometer activity data and Bluetooth proximal data. © 2014 IEEE.
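The statement that model complexity can grow with the data can be illustrated with the Chinese restaurant process, a standard nonparametric prior over partitions. The abstract does not say which prior the authors use, so this choice, the function name, and the parameter values are assumptions for illustration only.

```python
import random


def chinese_restaurant_process(n, alpha=1.0, seed=0):
    """Sample cluster assignments for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[t] = number of items already in cluster t
    assignments = []
    for _ in range(n):
        # An existing cluster is chosen in proportion to its size;
        # a new cluster is opened with weight alpha.
        weights = table_sizes + [alpha]
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[t] += 1
        assignments.append(t)
    return assignments, table_sizes


assignments, sizes = chinese_restaurant_process(200, alpha=2.0)
print(len(sizes), sizes)
```

The number of clusters is not fixed in advance: every new item has some probability of opening a new cluster, which is the property that lets such models accommodate previously unseen patterns.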