26 results for model selection in binary regression

in Aston University Research Archive


Relevance: 100.00%

Abstract:

This paper suggests a data envelopment analysis (DEA) model for selecting the most efficient alternative in advanced manufacturing technology in the presence of both cardinal and ordinal data. The paper explains the drawbacks of using an iterative method to find the most efficient alternative and proposes a new DEA model that does not require solving a series of LPs. A numerical example illustrates the model, and an application in technology selection with multiple inputs and outputs shows the usefulness of the proposed approach. © 2012 Springer-Verlag London Limited.
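
For context, the step that iterative DEA approaches repeat for every alternative is a standard efficiency LP. The sketch below solves the classical input-oriented CCR envelopment LP with scipy on a small synthetic dataset; the data and the CCR formulation are illustrative assumptions, not the model proposed in the paper.

```python
# A minimal sketch of the classical input-oriented CCR efficiency LP that
# iterative DEA approaches solve once per alternative; data are synthetic
# placeholders, not the paper's example.
import numpy as np
from scipy.optimize import linprog

X = np.array([[4.0, 2.0], [6.0, 3.0], [5.0, 5.0]])  # inputs, one row per alternative
Y = np.array([[60.0], [70.0], [80.0]])              # outputs, one row per alternative
n, m, s = X.shape[0], X.shape[1], Y.shape[1]

def ccr_efficiency(o):
    """min theta  s.t.  sum_j lam_j x_j <= theta * x_o,  sum_j lam_j y_j >= y_o."""
    c = np.r_[1.0, np.zeros(n)]            # decision variables: [theta, lam_1..lam_n]
    A_in = np.c_[-X[o], X.T]               # input constraints
    A_out = np.c_[np.zeros(s), -Y.T]       # output constraints (flipped to <=)
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

for o in range(n):
    print(f"alternative {o}: efficiency = {ccr_efficiency(o):.3f}")
```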

Relevance: 100.00%

Abstract:

We compare spot patterns generated by Turing mechanisms with those generated by replication cascades, in a model one-dimensional reaction-diffusion system. We determine the stability region of spot solutions in parameter space as a function of a natural control parameter, the feed-rate, in a regime where degenerate patterns with different numbers of spots coexist at a fixed feed-rate. While it is possible to generate identical patterns via both mechanisms, we show that replication cascades lead to a wider choice of pattern profiles that can be selected by tuning the feed-rate, exploiting hysteresis and directionality effects of the different pattern pathways.
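
The abstract does not name the specific reaction-diffusion system; the Gray-Scott model is a standard one-dimensional system with an explicit feed-rate parameter F, so the sketch below uses it to show spots developing from a local perturbation. All parameter values are illustrative.

```python
# A 1-D Gray-Scott simulation sketch: u is the substrate fed at rate F,
# v the activator; spots appear and can replicate as the run proceeds.
import numpy as np

N, L = 256, 2.5                                 # grid points, domain length
dx = L / N
Du, Dv, F, k = 2e-5, 1e-5, 0.037, 0.06          # diffusion, feed-rate, kill-rate
dt = 0.2 * dx**2 / max(Du, Dv)                  # stable explicit time step

u, v = np.ones(N), np.zeros(N)
u[N//2 - 5:N//2 + 5] = 0.5                      # local perturbation seeds a spot
v[N//2 - 5:N//2 + 5] = 0.25

def lap(f):                                     # periodic 1-D Laplacian
    return (np.roll(f, 1) - 2 * f + np.roll(f, -1)) / dx**2

for _ in range(100_000):                        # integrate long enough to replicate
    uvv = u * v * v
    u += dt * (Du * lap(u) - uvv + F * (1 - u))
    v += dt * (Dv * lap(v) + uvv - (F + k) * v)

print("spots:", int(np.sum((v[1:] > 0.2) & (v[:-1] <= 0.2))))  # upward crossings
```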

Relevance: 100.00%

Abstract:

Relay selection has been considered an effective method to improve the performance of cooperative communication. However, the Channel State Information (CSI) used in relay selection can be outdated, yielding severe performance degradation of cooperative communication systems. In this paper, we investigate relay selection under outdated CSI in a Decode-and-Forward (DF) cooperative system to improve its outage performance. We formulate an optimization problem in which the set of relays that forward data is optimized to minimize the outage probability conditioned on the outdated CSI of all the decodable relays' links. We then propose a novel multiple-relay selection strategy based on the solution of the optimization problem. Simulation results show that the proposed relay selection strategy achieves a large improvement in outage performance compared with existing relay selection strategies in the literature that combat outdated CSI.
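
As a simple illustration of the underlying problem (not the proposed multiple-relay strategy), the Monte Carlo sketch below measures how best-relay selection degrades when selection is based on an outdated Rayleigh-fading estimate with time correlation rho to the true channel; all parameter values are assumptions.

```python
# Monte Carlo sketch: outage of best-relay selection with outdated vs. perfect CSI.
import numpy as np

rng = np.random.default_rng(0)
K, snr, rate, rho = 4, 10.0, 1.0, 0.7       # relays, mean SNR, target rate, correlation
T = 200_000                                 # trials

h_old = (rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))) / np.sqrt(2)
e     = (rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))) / np.sqrt(2)
h_now = rho * h_old + np.sqrt(1 - rho**2) * e      # true channel at transmission time

best = np.argmax(np.abs(h_old)**2, axis=1)         # selection uses outdated CSI
g_sel = np.abs(h_now[np.arange(T), best])**2
g_opt = np.max(np.abs(h_now)**2, axis=1)           # genie-aided, current CSI

print("outage, outdated CSI:", np.mean(np.log2(1 + snr * g_sel) < rate))
print("outage, perfect  CSI:", np.mean(np.log2(1 + snr * g_opt) < rate))
```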

Relevance: 100.00%

Abstract:

Markovian models are widely used to analyse quality-of-service properties of both system designs and deployed systems. Thanks to the emergence of probabilistic model checkers, this analysis can be performed with high accuracy. However, its usefulness is heavily dependent on how well the model captures the actual behaviour of the analysed system. Our work addresses this problem for a class of Markovian models termed discrete-time Markov chains (DTMCs). We propose a new Bayesian technique for learning the state transition probabilities of DTMCs based on observations of the modelled system. Unlike existing approaches, our technique weighs observations based on their age, to account for the fact that older observations are less relevant than more recent ones. A case study from the area of bioinformatics workflows demonstrates the effectiveness of the technique in scenarios where the model parameters change over time.
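
A minimal sketch of the general idea, assuming a simple exponential decay of transition counts per observation (the paper's weighting scheme may differ): each row of the DTMC gets a Dirichlet posterior whose counts are dominated by recent behaviour.

```python
# Age-weighted Bayesian estimation of DTMC transition probabilities:
# decayed transition counts plus a Dirichlet prior give a posterior mean
# that tracks parameter changes over time.
import numpy as np

S, gamma = 3, 0.99                      # states, per-observation decay factor
prior = np.ones((S, S))                 # Dirichlet(1,...,1) prior per row
counts = np.zeros((S, S))

def observe(transitions):
    for s, t in transitions:            # (from_state, to_state) pairs, oldest first
        counts[:] *= gamma              # older observations lose weight
        counts[s, t] += 1.0

def posterior_mean():
    alpha = prior + counts
    return alpha / alpha.sum(axis=1, keepdims=True)

observe([(0, 1)] * 300)                 # the system initially favours 0 -> 1
print(posterior_mean()[0])              # most mass on state 1
observe([(0, 2)] * 300)                 # behaviour changes to 0 -> 2
print(posterior_mean()[0])              # mass has migrated to state 2
```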

Relevance: 100.00%

Abstract:

Softeam has over 20 years of experience providing UML-based modelling solutions, such as its Modelio modelling tool, and its Constellation enterprise model management and collaboration environment. Due to the increasing number and size of the models used by Softeam’s clients, Softeam joined the MONDO FP7 EU research project, which worked on solutions for these scalability challenges and produced the Hawk model indexer among other results. This paper presents the technical details and several case studies on the integration of Hawk into Softeam’s toolset. The first case study measured the performance of Hawk’s Modelio support using varying amounts of memory for the Neo4j backend. In another case study, Hawk was integrated into Constellation to provide scalable global querying of model repositories. Finally, the combination of Hawk and the Epsilon Generation Language was compared against Modelio for document generation: for the largest model, Hawk was two orders of magnitude faster.

Relevance: 100.00%

Abstract:

This thesis is a study of low-dimensional visualisation methods for data visualisation under uncertainty of the input data. It focuses on the two main feed-forward neural network algorithms, NeuroScale and Generative Topographic Mapping (GTM), and extends both to accommodate that uncertainty. The two models are shown not to work well under high levels of noise within the data, and so both are modified; the modified NeuroScale and GTM are verified on synthetic data to show their ability to accommodate the noise. The thesis then addresses the controversy surrounding the non-uniqueness of predictive gene lists (PGLs) for predicting the prognosis of breast cancer patients from DNA microarray experiments. Many of these studies have ignored the uncertainty issue, resulting in random correlations of sparse model selection in high-dimensional spaces. The visualisation techniques are used to confirm that the patients involved in such medical studies are intrinsically unclassifiable on the basis of the provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.

Relevance: 100.00%

Abstract:

Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGLs), small selected subsets of genes from the very large pool of candidates available in DNA microarray experiments, is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high-dimensional spaces. In this work we outline a different approach based on an unsupervised, patient-specific, nonlinear topographic projection of predictive gene lists.

Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The NeuroScale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques are used to construct two-dimensional projective visualisation plots of the 70-dimensional PGL of each patient. Classifiers are also constructed to identify the prognosis indicator of each patient from the resulting projections, to investigate whether the two prognosis groups are separable a posteriori on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of NeuroScale to visualise the follow-up study, based on the projections derived from the original dataset.

Results: The results indicate that small patient-specific PGL subsets have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous, or even confident, patient grouping. Comparative projections across different PGLs give similar results.

Conclusion: The random correlations with an arbitrary outcome induced by selecting small subsets from very high-dimensional, interrelated gene expression profiles lead to outcomes with associated uncertainty. This continuum of uncertainty precludes any attempt at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of the provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
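
A hedged sketch of the projection-then-classify protocol using scikit-learn stand-ins (t-SNE in place of SNE, standard LLE) on synthetic placeholder data for the 70-dimensional per-patient PGLs; when the groups are intrinsically unseparable, accuracy on the 2-D projections stays near chance, mirroring the reported result.

```python
# Project 70-D per-patient gene-list vectors to 2-D, then test whether a
# classifier can recover the prognosis groups from the projection.
import numpy as np
from sklearn.manifold import TSNE, LocallyLinearEmbedding
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 70))      # placeholder: 100 patients x 70-gene PGL
y = rng.integers(0, 2, size=100)        # placeholder good/poor prognosis labels

for name, proj in [("t-SNE", TSNE(n_components=2, perplexity=20, random_state=0)),
                   ("LLE", LocallyLinearEmbedding(n_components=2, n_neighbors=12))]:
    Z = proj.fit_transform(X)
    acc = cross_val_score(KNeighborsClassifier(5), Z, y, cv=5).mean()
    print(f"{name}: CV accuracy on 2-D projection = {acc:.2f}")  # ~0.5 here
```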

Relevance: 100.00%

Abstract:

A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.
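
A minimal sketch of the noisy one-max setup: Boltzmann selection weights each string by exp(beta * fitness), the fitness being the number of ones corrupted by additive Gaussian noise. Population size, noise level and mutation rate are illustrative; per the theory, raising the population size counteracts the noise.

```python
# GA with Boltzmann selection on one-max corrupted by Gaussian noise.
import numpy as np

rng = np.random.default_rng(1)
P, L, beta, sigma = 200, 64, 0.5, 2.0       # population, string length, selection, noise
pop = rng.integers(0, 2, size=(P, L))

for gen in range(100):
    fitness = pop.sum(axis=1) + sigma * rng.standard_normal(P)  # noisy evaluation
    w = np.exp(beta * (fitness - fitness.max()))                # Boltzmann weights
    pop = pop[rng.choice(P, size=P, p=w / w.sum())]             # selection
    flip = rng.random(pop.shape) < 1.0 / L                      # bit-flip mutation
    pop = np.where(flip, 1 - pop, pop)

print("mean true fitness:", pop.sum(axis=1).mean(), "of", L)
```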

Relevance: 100.00%

Abstract:

We discuss aggregation of data from neuropsychological patients and the process of evaluating models using data from a series of patients. We argue that aggregation can be misleading but not aggregating can also result in information loss. The basis for combining data needs to be theoretically defined, and the particular method of aggregation depends on the theoretical question and characteristics of the data. We present examples, often drawn from our own research, to illustrate these points. We also argue that statistical models and formal methods of model selection are a useful way to test theoretical accounts using data from several patients in multiple-case studies or case series. Statistical models can often measure fit in a way that explicitly captures what a theory allows; the parameter values that result from model fitting often measure theoretically important dimensions and can lead to more constrained theories or new predictions; and model selection allows the strength of evidence for models to be quantified without forcing this into the artificial binary choice that characterizes hypothesis testing methods. Methods that aggregate and then formally model patient data, however, are not automatically preferred to other methods. Which method is preferred depends on the question to be addressed, characteristics of the data, and practical issues like availability of suitable patients, but case series, multiple-case studies, single-case studies, statistical models, and process models should be complementary methods when guided by theory development.
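
As a generic illustration of the final point (the models here are invented for the example, not the authors'), Akaike weights quantify the relative evidence for two theoretical accounts per patient without forcing a binary accept/reject decision:

```python
# Fit two candidate accounts (linear vs. logarithmic) to each patient's data
# and report Akaike weights as graded evidence rather than a binary verdict.
import numpy as np

rng = np.random.default_rng(2)

def aic(sse, n, k):
    """Gaussian AIC (up to a constant) from residual sum of squares."""
    return n * np.log(sse / n) + 2 * k

x = np.linspace(1, 10, 20)
for patient in range(3):
    y = 2.0 * np.log(x) + 0.3 * rng.standard_normal(20)   # data favour the log account
    sse_lin = np.sum((y - np.polyval(np.polyfit(x, y, 1), x))**2)
    sse_log = np.sum((y - np.polyval(np.polyfit(np.log(x), y, 1), np.log(x)))**2)
    a = np.array([aic(sse_lin, 20, 2), aic(sse_log, 20, 2)])
    w = np.exp(-(a - a.min()) / 2)
    w /= w.sum()                                          # Akaike weights
    print(f"patient {patient}: evidence linear={w[0]:.2f}, log={w[1]:.2f}")
```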

Relevance: 100.00%

Abstract:

A nature-inspired decentralised multi-agent algorithm is proposed to solve a problem of distributed task selection in which cities produce and store batches of different mail types. Agents must collect and process the mail batches, without a priori knowledge of the available mail at the cities and without inter-agent communication. To process a mail type different from the previous one, an agent must undergo a change-over, during which it remains inactive. We propose a threshold-based algorithm in order to maximise the overall efficiency (the average amount of mail collected). We show that memory, i.e. the possibility for agents to develop preferences for certain cities, not only leads to emergent cooperation between agents, but also to a significant increase in efficiency (above the theoretical upper limit for any memoryless algorithm), and we systematically investigate the influence of the various model parameters. Finally, we demonstrate the flexibility of the algorithm to changes in circumstances, and its excellent scalability.
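
A much-simplified single-agent sketch of the threshold rule with memory; batches, change-over costs and multi-agent interactions are omitted, and every update rule and parameter below is an illustrative assumption rather than the model in the paper.

```python
# One agent repeatedly picks the city with the highest preference-weighted
# stimulus, acts only when it exceeds a threshold, and reinforces visited cities.
import numpy as np

rng = np.random.default_rng(3)
n_cities, theta, memory_gain = 5, 8.0, 0.2

stock = rng.uniform(0, 15, size=n_cities)      # mail waiting at each city
preference = np.ones(n_cities)                 # learned city preferences (memory)

for step in range(50):
    stimulus = stock * preference
    city = int(np.argmax(stimulus))
    if stimulus[city] > theta:                 # threshold rule
        stock[city] -= min(stock[city], 5.0)   # collect up to one load
        preference[city] += memory_gain        # memory: reinforce this city
    stock += rng.uniform(0, 1.5, size=n_cities)  # cities keep producing mail

print("final preferences:", np.round(preference, 2))
```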

Relevance: 100.00%

Abstract:

Data fluctuation across multiple measurements in Laser-Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance analysis accuracy is to improve the quality and consistency of the emission signal, for example by averaging the spectral signals or standardizing the spectra over a number of laser shots. The proposed method focuses instead on enhancing the robustness of the quantitative analysis regression model. The RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation based on the statistical distribution of the measured spectral data. Through the improved segmented weighting function, information on spectral data within the normal distribution is retained in the regression model while information on outliers is suppressed or removed. Copper elemental concentration analysis experiments on 16 certified standard brass samples were carried out. The average relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness than quantitative analysis methods based on Partial Least Squares (PLS) regression, the standard Support Vector Machine (SVM) and the WLS-SVM. It was also demonstrated that the improved weighting function offered better overall performance, in terms of model robustness and convergence speed, than four known weighting functions.
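
A minimal sketch of weighted LS-SVM regression with a segmented weighting function in the style of Suykens et al.; the improved segmentation and residual calculation of the RLS-SVM differ in detail, and the data here are synthetic with injected outliers.

```python
# Two-pass weighted LS-SVM: fit, compute robustly scaled residuals, assign
# segmented weights that down-weight outliers, then refit.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 60)[:, None]
y = np.sin(2 * x[:, 0]) + 0.05 * rng.standard_normal(60)
y[::15] += 1.5                                   # inject outliers

gamma, width = 50.0, 0.5
K = np.exp(-((x - x.T) ** 2) / (2 * width**2))   # RBF kernel matrix

def solve(weights):
    """Solve [[0, 1^T], [1, K + diag(1/(gamma*w))]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * weights))
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return sol[0], sol[1:]                       # bias, dual coefficients

b, alpha = solve(np.ones(len(y)))                # first, unweighted pass
e = alpha / gamma                                # LS-SVM residuals
s = 1.483 * np.median(np.abs(e - np.median(e)))  # robust scale (MAD)
z = np.abs(e / s)
w = np.where(z <= 2.5, 1.0, np.where(z <= 3.0, (3.0 - z) / 0.5, 1e-4))
b, alpha = solve(w)                              # robust refit
print("fit at outlier points:", np.round((K @ alpha + b)[::15], 2))
```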

Relevance: 100.00%

Abstract:

A formalism recently introduced by Prugel-Bennett and Shapiro uses the methods of statistical mechanics to model the dynamics of genetic algorithms. To show that the technique is of more general interest than the test cases they considered, in this paper it is applied to the subset sum problem, a combinatorial optimization problem with a strongly non-linear energy (fitness) function and many local minima under single spin-flip dynamics. It is a problem which exhibits interesting dynamics, reminiscent of stabilizing selection in population biology. The dynamics are solved under certain simplifying assumptions and are reduced to a set of difference equations for a small number of relevant quantities. The quantities used are the population's cumulants, which describe its shape, and the mean correlation within the population, which measures the microscopic similarity of population members. Including the mean correlation allows a better description of the population than the cumulants alone would provide and represents a new and important extension of the technique. The formalism includes finite population effects and describes problems of realistic size. The theory is shown to agree closely with simulations of a real genetic algorithm and the mean best energy is accurately predicted.
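
The quantities the formalism tracks can be measured directly in simulation. The sketch below runs a simple GA (its operators and parameters are illustrative, not the paper's) on a random subset sum instance and records the population's energy mean and variance together with the mean pairwise correlation.

```python
# Track energy cumulants and mean correlation of a GA population on subset sum.
import numpy as np

rng = np.random.default_rng(5)
n, P = 40, 100
a = rng.integers(1, 1000, size=n)               # item values
target = a.sum() // 2

pop = rng.integers(0, 2, size=(P, n))
for gen in range(61):
    E = np.abs(pop @ a - target).astype(float)  # energy: 0 means target hit
    if gen % 20 == 0:
        spins = 2 * pop - 1                     # +/-1 encoding
        q = (spins @ spins.T / n)[np.triu_indices(P, 1)].mean()
        print(f"gen {gen:2d}: mean={E.mean():8.1f}  var={E.var():10.1f}  corr={q:.2f}")
    w = np.exp(-(E - E.min()) / (E.std() + 1e-9))   # Boltzmann-style selection
    pop = pop[rng.choice(P, size=P, p=w / w.sum())]
    flip = rng.random(pop.shape) < 1.0 / n          # mutation
    pop = np.where(flip, 1 - pop, pop)
```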

Relevance: 100.00%

Abstract:

We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in the statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error, which is computed with no extra computational cost. We show that from the TAP approach it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVMs) as limiting cases. For both the mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may achieve state-of-the-art performance by using the leave-one-out estimator for model selection, and (2) that the built-in leave-one-out estimators are extremely precise when compared with the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
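
The TAP equations themselves are involved; as a generic stand-in, the sketch below shows a leave-one-out estimate driving model selection (the kernel width of an RBF SVM, via scikit-learn), which is the role the built-in estimator plays in the paper at no extra computational cost.

```python
# Pick the SVM kernel width by exact leave-one-out accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=5, random_state=0)
for g in [0.01, 0.1, 1.0, 10.0]:
    acc = cross_val_score(SVC(kernel="rbf", gamma=g), X, y, cv=LeaveOneOut()).mean()
    print(f"gamma={g:5.2f}: leave-one-out accuracy = {acc:.3f}")
```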

Relevance: 100.00%

Abstract:

Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.