145 resultados para Machine learning methods

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traffic congestion is one of the major problems in modern cities. This study applies machine learning methods to determine green times in order to minimize in an isolated intersection. Q-learning and neural networks are applied here to set signal light times and minimize total delays. It is assumed that an intersection behaves in a similar fashion to an intelligent agent learning how to set green times in each cycle based on traffic information. Here, a comparison between Q-learning and neural network is presented. In Q-learning, considering continuous green time requires a large state space, making the learning process practically impossible. In contrast to Q-learning methods, the neural network model can easily set the appropriate green time to fit the traffic demand. The performance of the proposed neural network is compared with two traditional alternatives for controlling traffic lights. Simulation results indicate that the application of the proposed method greatly reduces the total delay in the network compared to the alternative methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs.

OBJECTIVE: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence.

METHODS: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method.

RESULTS: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models.

CONCLUSIONS: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the increasing unreliability of traditional port-based methods, Internet traffic classification has attracted a lot of research efforts in recent years. Quite a lot of previous papers have focused on using statistical characteristics as discriminators and applying machine learning techniques to classify the traffic flows. In this paper, we propose a novel machine learning based approach where the features are extracted from packet payload instead of flow statistics. Specifically, every flow is represented by a feature vector, in which each item indicates the occurrence of a particular token, i.e.; a common substring, in the payload. We have applied various machine learning algorithms to evaluate the idea and used different feature selection schemes to identify the critical tokens. Experimental result based on a real-world traffic data set shows that the approach can achieve high accuracy with low overhead.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Graph matching is an important class of methods in pattern recognition. Typically, a graph representing an unknown pattern is matched with a database of models. If the database of model graphs is large, an additional factor in induced into the overall complexity of the matching process. Various techniques for reducing the influence of this additional factor have been described in the literature. In this paper we propose to extract simple features from a graph and use them to eliminate candidate graphs from the database. The most powerful set of features and a decision tree useful for candidate elimination are found by means of the C4.5 algorithm, which was originally proposed for inductive learning of classication rules. Experimental results are reported demonstrating that effcient candidate elimination can be achieved by the proposed procedure.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An understanding of the distribution and extent of marine habitats is essential for the implementation of ecosystem-based management strategies. Historically this had been difficult in marine environments until the advancement of acoustic sensors. This study demonstrates the applicability of supervised learning techniques for benthic habitat characterization using angular backscatter response data. With the advancement of multibeam echo-sounder (MBES) technology, full coverage datasets of physical structure over vast regions of the seafloor are now achievable. Supervised learning methods typically applied to terrestrial remote sensing provide a cost-effective approach for habitat characterization in marine systems. However the comparison of the relative performance of different classifiers using acoustic data is limited. Characterization of acoustic backscatter data from MBES using four different supervised learning methods to generate benthic habitat maps is presented. Maximum Likelihood Classifier (MLC), Quick, Unbiased, Efficient Statistical Tree (QUEST), Random Forest (RF) and Support Vector Machine (SVM) were evaluated to classify angular backscatter response into habitat classes using training data acquired from underwater video observations. Results for biota classifications indicated that SVM and RF produced the highest accuracies, followed by QUEST and MLC, respectively. The most important backscatter data were from the moderate incidence angles between 30° and 50°. This study presents initial results for understanding how acoustic backscatter from MBES can be optimized for the characterization of marine benthic biological habitats. © 2012 by the authors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Structural MRI offers anatomical details and high sensitivity to pathological changes. It can demonstrate certain patterns of brain changes present at a structural level. Research to date has shown that volumetric analysis of brain regions has importance in depression detection. However, such analysis has had very minimal use in depression detection studies at individual level. Optimally combining various brain volumetric features/attributes, and summarizing the data into a distinctive set of variables remain difficult. This study investigates machine learning algorithms that automatically identify relevant data attributes for depression detection. Different machine learning techniques are studied for depression classification based on attributes extracted from structural MRI (sMRI) data. The attributes include volume calculated from whole brain, white matter, grey matter and hippocampus. Attributes subset selection is performed aiming to remove redundant attributes using three filtering methods and one hybrid method, in combination with ranker search algorithms. The highest average classification accuracy, obtained by using a combination of both SVM-EM and IG-Random Tree algorithms, is 85.23%. The classification approach implemented in this study can achieve higher accuracy than most reported studies using sMRI data, specifically for detection of depression.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using the prediction of cancer outcome as a model, we have tested the hypothesis that through analysing routinely collected digital data contained in an electronic administrative record (EAR), using machine-learning techniques, we could enhance conventional methods in predicting clinical outcomes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets to Twitter users to promote websites or services, which are harmful to normal users. In order to stop spammers, researchers have proposed a number of mechanisms. The focus of recent works is on the application of machine learning techniques into Twitter spam detection. However, tweets are retrieved in a streaming way, and Twitter provides the Streaming API for developers and researchers to access public tweets in real time. There lacks a performance evaluation of existing machine learning-based streaming spam detection methods. In this paper, we bridged the gap by carrying out a performance evaluation, which was from three different aspects of data, feature, and model. A big ground-truth of over 600 million public tweets was created by using a commercial URL-based security tool. For real-time spam detection, we further extracted 12 lightweight features for tweet representation. Spam detection was then transformed to a binary classification problem in the feature space and can be solved by conventional machine learning algorithms. We evaluated the impact of different factors to the spam detection performance, which included spam to nonspam ratio, feature discretization, training data size, data sampling, time-related data, and machine learning algorithms. The results show the streaming spam tweet detection is still a big challenge and a robust detection technique should take into account the three aspects of data, feature, and model.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

At first blush, user modeling appears to be a prime candidate for straightforward application of standard machine learning techniques. Observations of the user's behavior can provide training examples that a machine learning system can use to form a model designed to predict future actions. However, user modeling poses a number of challenges for machine learning that have hindered its application in user modeling, including: the need for large data sets; the need for labeled data; concept drift; and computational complexity. This paper examines each of these issues and reviews approaches to resolving them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Working in the UK, Sadler-Smith, Down and Lean, in their article “‘Modern’ learning methods: rhetoric and reality”, Personnel Review, Vol. 29 No. 4, 2000, pp. 474-90, have shown that distance learning methods are neither favoured nor perceived as effective by enterprises pursuing training that yields a competitive edge. They have suggested that these methods need to be integrated with other more conventional on-job training methods. This paper, based on Australian research, shows a tension between the requirements of flexible training methods based on distance learning methods, and the characteristics that typify learners and their workplaces. That identified tension is used to suggest how an integration of training methods may be effected in workplaces.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spam is commonly defined as unsolicited email messages and the goal of spam filtering is to distinguish between spam and legitimate email messages. Much work has been done to filter spam from legitimate emails using machine learning algorithm and substantial performance has been achieved with some amount of false positive (FP) tradeoffs. In the case of spam detection FP problem is unacceptable sometimes. In this paper, an adaptive spam filtering model has been proposed based on Machine learning (ML) algorithms which will get better accuracy by reducing FP problems. This model consists of individual and combined filtering approach from existing well known ML algorithms. The proposed model considers both individual and collective output and analyzes them by an analyzer. A dynamic feature selection (DFS) technique also proposed in this paper for getting better accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spam is commonly known as unsolicited or unwanted email messages in the Internet causing potential threat to Internet Security. Users spend a valuable amount of time deleting spam emails. More importantly, ever increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful to model spammer behavior as the spammer constantly changes their tricks to circumvent these filters. The evasive tactics that the spammer uses are patterns and these patterns can be modeled to combat spam. This paper investigates the possibilities of modeling spammer behavioral patterns by well-known classification algorithms such as Naïve Bayesian classifier (Naive Bayes), Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, which is considerably an enhancement of performance compared to similar spammer behavior modeling research.