12 resultados para Selection Algorithms
em Aston University Research Archive
Resumo:
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
Resumo:
The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, UPDRS is determined by the subjective clinical evaluation of the patient's ability to adequately cope with a range of tasks. In this study, we extend recent findings that UPDRS can be objectively assessed to clinically useful accuracy using simple, self-administered speech tests, without requiring the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients, recruited to a six-month, multi-centre trial) and propose a number of novel, nonlinear signal processing algorithms which reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, which is fed into non-parametric regression and classification algorithms, mapping the signal processing algorithm outputs to UPDRS. We demonstrate rapid, accurate replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points difference from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials into novel PD treatments.
Resumo:
A formalism recently introduced by Prugel-Bennett and Shapiro uses the methods of statistical mechanics to model the dynamics of genetic algorithms. To be of more general interest than the test cases they consider. In this paper, the technique is applied to the subset sum problem, which is a combinatorial optimization problem with a strongly non-linear energy (fitness) function and many local minima under single spin flip dynamics. It is a problem which exhibits an interesting dynamics, reminiscent of stabilizing selection in population biology. The dynamics are solved under certain simplifying assumptions and are reduced to a set of difference equations for a small number of relevant quantities. The quantities used are the population's cumulants, which describe its shape, and the mean correlation within the population, which measures the microscopic similarity of population members. Including the mean correlation allows a better description of the population than the cumulants alone would provide and represents a new and important extension of the technique. The formalism includes finite population effects and describes problems of realistic size. The theory is shown to agree closely to simulations of a real genetic algorithm and the mean best energy is accurately predicted.
Resumo:
A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.
Resumo:
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vectors machines, simulation results for three small benchmark data sets are presented. They show 1. that one may get state of the art performance by using the leave-one-out estimator for model selection and 2. the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is a taken as a strong support for the internal consistency of the mean field approach.
Resumo:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.
Resumo:
Automatically generating maps of a measured variable of interest can be problematic. In this work we focus on the monitoring network context where observations are collected and reported by a network of sensors, and are then transformed into interpolated maps for use in decision making. Using traditional geostatistical methods, estimating the covariance structure of data collected in an emergency situation can be difficult. Variogram determination, whether by method-of-moment estimators or by maximum likelihood, is very sensitive to extreme values. Even when a monitoring network is in a routine mode of operation, sensors can sporadically malfunction and report extreme values. If this extreme data destabilises the model, causing the covariance structure of the observed data to be incorrectly estimated, the generated maps will be of little value, and the uncertainty estimates in particular will be misleading. Marchant and Lark [2007] propose a REML estimator for the covariance, which is shown to work on small data sets with a manual selection of the damping parameter in the robust likelihood. We show how this can be extended to allow treatment of large data sets together with an automated approach to all parameter estimation. The projected process kriging framework of Ingram et al. [2007] is extended to allow the use of robust likelihood functions, including the two component Gaussian and the Huber function. We show how our algorithm is further refined to reduce the computational complexity while at the same time minimising any loss of information. To show the benefits of this method, we use data collected from radiation monitoring networks across Europe. We compare our results to those obtained from traditional kriging methodologies and include comparisons with Box-Cox transformations of the data. We discuss the issue of whether to treat or ignore extreme values, making the distinction between the robust methods which ignore outliers and transformation methods which treat them as part of the (transformed) process. Using a case study, based on an extreme radiological events over a large area, we show how radiation data collected from monitoring networks can be analysed automatically and then used to generate reliable maps to inform decision making. We show the limitations of the methods and discuss potential extensions to remedy these.
Resumo:
Many of the applications of geometric modelling are concerned with the computation of well-defined properties of the model. The applications which have received less attention are those which address questions to which there is no unique answer. This thesis describes such an application: the automatic production of a dimensioned engineering drawing. One distinctive feature of this operation is the requirement for sophisticated decision-making algorithms at each stage in the processing of the geometric model. Hence, the thesis is focussed upon the design, development and implementation of such algorithms. Various techniques for geometric modelling are briefly examined and then details are given of the modelling package that was developed for this project, The principles of orthographic projection and dimensioning are treated and some published work on the theory of dimensioning is examined. A new theoretical approach to dimensioning is presented and discussed. The existing body of knowledge on decision-making is sampled and the author then shows how methods which were originally developed for management decisions may be adapted to serve the purposes of this project. The remainder of the thesis is devoted to reports on the development of decision-making algorithms for orthographic view selection, sectioning and crosshatching, the preparation of orthographic views with essential hidden detail, and two approaches to the actual insertion of dimension lines and text. The thesis concludes that the theories of decision-making can be applied to work of this kind. It may be possible to generate computer solutions that are closer to the optimum than some man-made dimensioning schemes. Further work on important details is required before a commercially acceptable package could be produced.
Resumo:
This article presents a potential method to assist developers of future bioenergy schemes when selecting from available suppliers of biomass materials. The method aims to allow tacit requirements made on biomass suppliers to be considered at the design stage of new developments. The method used is a combination of the Analytical Hierarchy Process and the Quality Function Deployment methods (AHP-QFD). The output of the method is a ranking and relative weighting of the available suppliers which could be used to improve optimization algorithms such as linear and goal programming. The paper is at a conceptual stage and no results have been obtained. The aim is to use the AHP-QFD method to bridge the gap between treatment of explicit and tacit requirements of bioenergy schemes; allowing decision makers to identify the most successful supply strategy available.
Resumo:
We describe a novel and potentially important tool for candidate subunit vaccine selection through in silico reverse-vaccinology. A set of Bayesian networks able to make individual predictions for specific subcellular locations is implemented in three pipelines with different architectures: a parallel implementation with a confidence level-based decision engine and two serial implementations with a hierarchical decision structure, one initially rooted by prediction between membrane types and another rooted by soluble versus membrane prediction. The parallel pipeline outperformed the serial pipeline, but took twice as long to execute. The soluble-rooted serial pipeline outperformed the membrane-rooted predictor. Assessment using genomic test sets was more equivocal, as many more predictions are made by the parallel pipeline, yet the serial pipeline identifies 22 more of the 74 proteins of known location.
Resumo:
One of the major drawbacks for mobile nodes in wireless networks is power management. Our goal is to evaluate the performance power control scheme to be used to reduce network congestion, improve quality of service and collision avoidance in vehicular network and road safety application. Some of the importance of power control (PC) are improving spatial reuse, and increasing network capacity in mobile wireless communications. In this simulation we have evaluated the performance of existing rate algorithms compared with context Aware Rate selection algorithm (ACARS) and also seen the performance of ACARS and how it can be applied to road safety, improve network control and power management. Result shows that ACARS is able to minimize the total transmit power in the presence of propagation processes and mobility of vehicles, by adapting to the fast varying channels conditions with the Path loss exponent values that was used for that environment which is shown in the network simulation parameter. Our results have shown that ACARS is a very robust algorithm which performs very well with the effect of propagation processes that is prone to every transmitted signal in mobile networks. © 2013 IEEE.
Resumo:
A segment selection method controlled by Quality of Experience (QoE) factors for Dynamic Adaptive Streaming over HTTP (DASH) is presented in this paper. Current rate adaption algorithms aim to eliminate buffer underrun events by significantly reducing the code rate when experiencing pauses in replay. In reality, however, viewers may choose to accept a level of buffer underrun in order to achieve an improved level of picture fidelity or to accept the degradation in picture fidelity in order to maintain the service continuity. The proposed rate adaption scheme in our work can maximize the user QoE in terms of both continuity and fidelity (picture quality) in DASH applications. It is shown that using this scheme a high level of quality for streaming services, especially at low packet loss rates, can be achieved. Our scheme can also maintain a best trade-off between continuity-based quality and fidelity-based quality, by determining proper threshold values for the level of quality intended by clients with different quality requirements. In addition, the integration of the rate adaptation mechanism with the scheduling process is investigated in the context of a mobile communication network and related performances are analyzed.