11 results for regression algorithm

in Deakin Research Online - Australia


Relevance: 60.00%

Abstract:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and both the Mexican American/Hispanic group (p = 0.016) and current smokers (p < 0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
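The abstract reports odds ratios estimated across 20 imputed data sets. One common way to combine per-imputation logistic regression results is Rubin's rules; the sketch below illustrates that pooling step only. The abstract does not say which pooling procedure the study used, the coefficients and standard errors are hypothetical, survey weighting is ignored, and the normal quantile 1.96 stands in for the t-based interval that Rubin's rules normally prescribe.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, math.sqrt(total)


# Hypothetical log-odds coefficients and standard errors from m = 5 imputed data sets
betas = [0.12, 0.15, 0.14, 0.13, 0.16]
ses = [0.06, 0.07, 0.06, 0.065, 0.06]

beta, se = pool_rubin(betas, [s ** 2 for s in ses])

# Back-transform from the log-odds scale to an odds ratio with an approximate 95% CI
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"OR {odds_ratio:.2f}; 95% CI {ci[0]:.2f}, {ci[1]:.2f}")
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty caused by the missing data.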

Relevance: 60.00%

Abstract:

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm; it aims to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner’s choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, because the number of examples is small, the estimates of task parameters are usually poor, and we show that this leads, with high probability, to an under-estimation of the relatedness between any two tasks. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, Random Forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.

Relevance: 40.00%

Abstract:

In this paper, a fuzzy linear regression (FLR) model integrated with a genetic algorithm (GA) is proposed. The proposed GA-FLR model is applied to the modeling of a stereo vision system. A set of empirical data from stereo vision object measurement is collected using the full factorial design technique. Three regression models, namely ordinary least-squares regression (OLS), FLR, and GA-FLR, are developed and their performances compared. The results show that the proposed GA-FLR model performs better than OLS and FLR in modeling a stereo vision system.

Relevance: 30.00%

Abstract:

Two Dimensional Locality Preserving Projection (2D-LPP) is a recent extension of LPP, a popular face recognition algorithm. It has been shown that 2D-LPP performs better than PCA, 2D-PCA and LPP. However, the computational cost of 2D-LPP is high. This paper proposes a novel algorithm called Ridge Regression for Two Dimensional Locality Preserving Projection (RR-2DLPP), which is an extension of 2D-LPP with the use of ridge regression. RR-2DLPP is comparable to 2D-LPP in performance whilst having a lower computational cost. The experimental results on three benchmark face data sets - the ORL, Yale and FERET databases - demonstrate the effectiveness and efficiency of RR-2DLPP compared with other face recognition algorithms such as PCA, LPP, SR, 2D-PCA and 2D-LPP.

Relevance: 30.00%

Abstract:

In this paper, we present novel ridge regression (RR) and kernel ridge regression (KRR) techniques for multivariate labels and apply the methods to the problem of face recognition. Motivated by the fact that the vertices of a regular simplex are separate points with the highest degree of symmetry, we choose such vertices as the targets for the distinct individuals in recognition and apply RR or KRR to map the training face images into a face subspace where the training images from each individual lie near their individual targets. We identify a new face image by mapping it into this face subspace and comparing its distance to all individual targets. An efficient cross-validation algorithm is also provided for selecting the regularization and kernel parameters. Experiments were conducted on two face databases and the results demonstrate that the proposed algorithm significantly outperforms three popular linear face recognition techniques (Eigenfaces, Fisherfaces and Laplacianfaces) and also performs comparably with the recently developed Orthogonal Laplacianfaces, with the advantage of computational speed. Experimental results also demonstrate that KRR outperforms RR, as expected, since KRR can utilize the nonlinear structure of the face images. Although we concentrate on face recognition in this paper, the proposed method is general and may be applied to general multi-category classification problems.
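The target construction and the nearest-target decision rule described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it builds the simplex vertices in R^c (the paper may use a different coordinate system), it omits the RR/KRR mapping itself, and the "mapped image" is a hypothetical point made by perturbing one vertex.

```python
import math


def simplex_targets(c):
    """Return c unit-norm vertices of a regular simplex centred at the origin (in R^c)."""
    scale = math.sqrt(c / (c - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / c) for j in range(c)]
            for i in range(c)]


def nearest_target(point, targets):
    """Index of the target closest to point in Euclidean distance."""
    return min(range(len(targets)), key=lambda i: math.dist(point, targets[i]))


targets = simplex_targets(4)

# Every pair of distinct vertices is the same distance apart.
dists = [math.dist(targets[i], targets[j])
         for i in range(4) for j in range(i + 1, 4)]

# Hypothetical output of the regression map: vertex 2 plus a small perturbation.
mapped = [t + e for t, e in zip(targets[2], [0.05, -0.02, 0.03, 0.01])]
predicted = nearest_target(mapped, targets)
```

Because all targets are equidistant, no pair of individuals is treated as more similar than any other, which is the symmetry argument the abstract makes for choosing simplex vertices.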

Relevance: 30.00%

Abstract:

A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
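For orientation, the standard GRNN prediction is a Gaussian-kernel weighted average of the training targets, with the kernel width as its main tunable parameter. The sketch below shows only that plain form; it does not include the fuzzy ART clustering or the gradient-based width optimization of GRNNFA, and the data and the value of `sigma` are made up for illustration.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of training targets (Gaussian kernel of width sigma)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical one-dimensional training data
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 4.0, 9.0]

pred = grnn_predict([1.5], train_x, train_y, sigma=0.5)
```

Every training sample acts as a kernel centre here, which is why the abstract stresses reducing the cost of calculating and storing kernel information; clustering the samples first reduces the number of kernels that must be kept.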

Relevance: 30.00%

Abstract:

Robust regression in statistics leads to challenging optimization problems. Here, we study one such problem, in which the objective is non-smooth, non-convex and expensive to calculate. We study the numerical performance of several derivative-free optimization algorithms with the aim of computing robust multivariate estimators. Our experiences demonstrate that the existing algorithms often fail to deliver optimal solutions. We introduce three new methods that use Powell's derivative-free algorithm. The proposed methods are reliable and can be used when processing very large data sets containing outliers.

Relevance: 30.00%

Abstract:

The Karnik-Mendel (KM) algorithm is the most widely used and researched type reduction (TR) algorithm in the literature. The algorithm is iterative in nature and, despite consistent long-term effort, no general closed-form formula has been found to replace this computationally expensive algorithm. In this research work, we demonstrate that the outcome of the KM algorithm can be approximated by simple linear regression techniques. Since most applications will have a fixed range of inputs with small-scale variations, it is possible to handle those complexities in the design phase and build a fuzzy logic system (FLS) with a low run-time computational burden. This objective can be well served by the application of regression techniques. This work presents an overview of the feasibility of regression techniques for the design of data-driven type reducers while keeping the uncertainty bound in the FLS intact. Simulation results demonstrate that the approximation error is less than 2%. Thus our work preserves the essence of the Karnik-Mendel algorithm and serves the requirement of low computational complexity.
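To make the iterative nature of the KM procedure concrete, here is a minimal sketch of the textbook iteration for one end point of the type-reduced interval. It assumes crisp rule centroids sorted in ascending order and strictly positive firing-interval bounds; the numbers are hypothetical, and this is the baseline KM iteration, not the regression approximation the paper proposes.

```python
def km_endpoint(y, f_lower, f_upper, left=True, max_iter=100):
    """Karnik-Mendel iteration for one end point of the type-reduced interval.

    y: rule centroids sorted ascending; f_lower, f_upper: firing-interval bounds.
    Returns (end point, k), where k is the number of rules below the switch point.
    """
    n = len(y)
    # Start from the midpoints of the firing intervals.
    f = [(lo + hi) / 2 for lo, hi in zip(f_lower, f_upper)]
    centroid = sum(fi * yi for fi, yi in zip(f, y)) / sum(f)
    for _ in range(max_iter):
        # Switch point: count of rules whose centroid is at or below the estimate.
        k = sum(1 for yi in y if yi <= centroid)
        if left:
            # Left end point: upper firing below the switch, lower above it.
            f = [f_upper[i] if i < k else f_lower[i] for i in range(n)]
        else:
            # Right end point: lower firing below the switch, upper above it.
            f = [f_lower[i] if i < k else f_upper[i] for i in range(n)]
        new_centroid = sum(fi * yi for fi, yi in zip(f, y)) / sum(f)
        if abs(new_centroid - centroid) < 1e-12:
            return new_centroid, k
        centroid = new_centroid
    return centroid, k


# Hypothetical rule centroids and firing intervals
y = [1.0, 2.0, 3.0, 4.0, 5.0]
f_lower = [0.2, 0.3, 0.1, 0.4, 0.2]
f_upper = [0.6, 0.7, 0.5, 0.9, 0.6]

y_left, k_left = km_endpoint(y, f_lower, f_upper, left=True)
y_right, k_right = km_endpoint(y, f_lower, f_upper, left=False)
```

Each pass recomputes a weighted average over all rules, and the number of passes depends on the inputs, which is the run-time cost that a fitted regression model would replace with a fixed number of multiplications.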

Relevance: 30.00%

Abstract:

The Karnik-Mendel (KM) algorithm is the most widely used type reduction (TR) method in the literature for the design of interval type-2 fuzzy logic systems (IT2FLS). Its iterative nature for finding the left and right switch points is its Achilles heel. Despite a decade of research, none of the alternative TR methods offers uncertainty measures equivalent to those of the KM algorithm. This paper takes a data-driven approach to tackle the computational burden of this algorithm while keeping its key features. We propose a regression method to approximate the left and right switch points found by the KM algorithm. The approximator uses only the firing intervals, rule centroids, and FLS structural features as inputs. Once training is done, it can precisely approximate the left and right switch points through basic vector multiplications. Comprehensive simulation results demonstrate that the approximation accuracy for a wide variety of FLSs is 100%. Flexibility, ease of implementation, and speed are other features of the proposed method.