217 resultados para regression algorithm
Resumo:
A novel two-stage construction algorithm for linear-in-the-parameters classifier is proposed, aiming at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage to construct a sparse linear-in-the-parameters classifier. For the first stage learning of generating the prefiltered signal, a two-level algorithm is introduced to maximise the model's generalisation capability, in which an elastic net model identification algorithm using singular value decomposition is employed at the lower level while the two regularisation parameters are selected by maximising the Bayesian evidence using a particle swarm optimization algorithm. Analysis is provided to demonstrate how “Occam's razor” is embodied in this approach. The second stage of sparse classifier construction is based on an orthogonal forward regression with the D-optimality algorithm. Extensive experimental results demonstrate that the proposed approach is effective and yields competitive results for noisy data sets.
Resumo:
A new sparse kernel density estimator is introduced. Our main contribution is to develop a recursive algorithm for the selection of significant kernels one at time using the minimum integrated square error (MISE) criterion for both kernel selection. The proposed approach is simple to implement and the associated computational cost is very low. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with competitive accuracy to existing kernel density estimators.
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
AIMS: The objective of the present investigation was to examine the relationship of three polymorphisms, Thr394Thr, Gly482Ser and +A2962G, of the peroxisome proliferator activated receptor-gamma co-activator-1 alpha (PGC-1alpha) gene with Type 2 diabetes in Asian Indians. METHODS: The study group comprised 515 Type 2 diabetic and 882 normal glucose tolerant subjects chosen from the Chennai Urban Rural Epidemiology Study, an ongoing population-based study in southern India. The three polymorphisms were genotyped using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). Haplotype frequencies were estimated using an expectation-maximization (EM) algorithm. Linkage disequilibrium was estimated from the estimates of haplotypic frequencies. RESULTS: The three polymorphisms studied were not in linkage disequilibrium. With respect to the Thr394Thr polymorphism, 20% of the Type 2 diabetic patients (103/515) had the GA genotype compared with 12% of the normal glucose tolerance (NGT) subjects (108/882) (P = 0.0004). The frequency of the A allele was also higher in Type 2 diabetic subjects (0.11) compared with NGT subjects (0.07) (P = 0.002). Regression analysis revealed the odds ratio for Type 2 diabetes for the susceptible genotype (XA) to be 1.683 (95% confidence intervals: 1.264-2.241, P = 0.0004). Age adjusted glycated haemoglobin (P = 0.003), serum cholesterol (P = 0.001) and low-density lipoprotein (LDL) cholesterol (P = 0.001) levels and systolic blood pressure (P = 0.001) were higher in the NGT subjects with the XA genotype compared with GG genotype. There were no differences in genotype or allelic distribution between the Type 2 diabetic and NGT subjects with respect to the Gly482Ser and +A2962G polymorphisms. CONCLUSIONS: The A allele of Thr394Thr (G --> A) polymorphism of the PGC-1 gene is associated with Type 2 diabetes in Asian Indian subjects and the XA genotype confers 1.6 times higher risk for Type 2 diabetes compared with the GG genotype in this population.
Resumo:
In this article, we illustrate experimentally an important consequence of the stochastic component in choice behaviour which has not been acknowledged so far. Namely, its potential to produce ‘regression to the mean’ (RTM) effects. We employ a novel approach to individual choice under risk, based on repeated multiple-lottery choices (i.e. choices among many lotteries), to show how the high degree of stochastic variability present in individual decisions can distort crucially certain results through RTM effects. We demonstrate the point in the context of a social comparison experiment.
Resumo:
Reinforcing the Low Voltage (LV) distribution network will become essential to ensure it remains within its operating constraints as demand on the network increases. The deployment of energy storage in the distribution network provides an alternative to conventional reinforcement. This paper presents a control methodology for energy storage to reduce peak demand in a distribution network based on day-ahead demand forecasts and historical demand data. The control methodology pre-processes the forecast data prior to a planning phase to build in resilience to the inevitable errors between the forecasted and actual demand. The algorithm uses no real time adjustment so has an economical advantage over traditional storage control algorithms. Results show that peak demand on a single phase of a feeder can be reduced even when there are differences between the forecasted and the actual demand. In particular, results are presented that demonstrate when the algorithm is applied to a large number of single phase demand aggregations that it is possible to identify which of these aggregations are the most suitable candidates for the control methodology.
Resumo:
The equations of Milsom are evaluated, giving the ground range and group delay of radio waves propagated via the horizontally stratified model ionosphere proposed by Bradley and Dudeney. Expressions for the ground range which allow for the effects of the underlying E- and F1-regions are used to evaluate the basic maximum usable frequency or M-factors for single F-layer hops. An algorithm for the rapid calculation of the M-factor at a given range is developed, and shown to be accurate to within 5%. The results reveal that the M(3000)F2-factor scaled from vertical-incidence ionograms using the standard URSI procedure can be up to 7.5% in error. A simple addition to the algorithm effects a correction to ionogram values to make these accurate to 0.5%.
Resumo:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Resumo:
This paper describes a fast integer sorting algorithm, herein referred as Bit-index sort, which is a non-comparison sorting algorithm for partial per-mutations, with linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers supported by machine hardware to retrieve the ordered output sequence. Results show that Bit-index sort outperforms in execution time to quicksort and counting sort algorithms. A parallel approach for Bit-index sort using two simultaneous threads is included, which obtains speedups up to 1.6.
Resumo:
This article is concerned with the liability of search engines for algorithmically produced search suggestions, such as through Google’s ‘autocomplete’ function. Liability in this context may arise when automatically generated associations have an offensive or defamatory meaning, or may even induce infringement of intellectual property rights. The increasing number of cases that have been brought before courts all over the world puts forward questions on the conflict of fundamental freedoms of speech and access to information on the one hand, and personality rights of individuals— under a broader right of informational self-determination—on the other. In the light of the recent judgment of the Court of Justice of the European Union (EU) in Google Spain v AEPD, this article concludes that many requests for removal of suggestions including private individuals’ information will be successful on the basis of EU data protection law, even absent prejudice to the person concerned.
Resumo:
Observations from the Heliospheric Imager (HI) instruments aboard the twin STEREO spacecraft have enabled the compilation of several catalogues of coronal mass ejections (CMEs), each characterizing the propagation of CMEs through the inner heliosphere. Three such catalogues are the Rutherford Appleton Laboratory (RAL)-HI event list, the Solar Stormwatch CME catalogue, and, presented here, the J-tracker catalogue. Each catalogue uses a different method to characterize the location of CME fronts in the HI images: manual identification by an expert, the statistical reduction of the manual identifications of many citizen scientists, and an automated algorithm. We provide a quantitative comparison of the differences between these catalogues and techniques, using 51 CMEs common to each catalogue. The time-elongation profiles of these CME fronts are compared, as are the estimates of the CME kinematics derived from application of three widely used single-spacecraft-fitting techniques. The J-tracker and RAL-HI profiles are most similar, while the Solar Stormwatch profiles display a small systematic offset. Evidence is presented that these differences arise because the RAL-HI and J-tracker profiles follow the sunward edge of CME density enhancements, while Solar Stormwatch profiles track closer to the antisunward (leading) edge. We demonstrate that the method used to produce the time-elongation profile typically introduces more variability into the kinematic estimates than differences between the various single-spacecraft-fitting techniques. This has implications for the repeatability and robustness of these types of analyses, arguably especially so in the context of space weather forecasting, where it could make the results strongly dependent on the methods used by the forecaster.
Resumo:
Sclera segmentation is shown to be of significant importance for eye and iris biometrics. However, sclera segmentation has not been extensively researched as a separate topic, but mainly summarized as a component of a broader task. This paper proposes a novel sclera segmentation algorithm for colour images which operates at pixel-level. Exploring various colour spaces, the proposed approach is robust to image noise and different gaze directions. The algorithm’s robustness is enhanced by a two-stage classifier. At the first stage, a set of simple classifiers is employed, while at the second stage, a neural network classifier operates on the probabilities’ space generated by the classifiers at stage 1. The proposed method was ranked the 1st in Sclera Segmentation Benchmarking Competition 2015, part of BTAS 2015, with a precision of 95.05% corresponding to a recall of 94.56%.
Resumo:
Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems that complicate this approach are the non-linear relationship between wind speed and power production and the limited range of power production between zero and nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the power curve of the turbine's manufacturer. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models. In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform equally or better than non-linear models with or without the frequently used wind-to-power transformation.