32 results for Model selection
in Aston University Research Archive
Abstract:
We discuss aggregation of data from neuropsychological patients and the process of evaluating models using data from a series of patients. We argue that aggregation can be misleading but not aggregating can also result in information loss. The basis for combining data needs to be theoretically defined, and the particular method of aggregation depends on the theoretical question and characteristics of the data. We present examples, often drawn from our own research, to illustrate these points. We also argue that statistical models and formal methods of model selection are a useful way to test theoretical accounts using data from several patients in multiple-case studies or case series. Statistical models can often measure fit in a way that explicitly captures what a theory allows; the parameter values that result from model fitting often measure theoretically important dimensions and can lead to more constrained theories or new predictions; and model selection allows the strength of evidence for models to be quantified without forcing this into the artificial binary choice that characterizes hypothesis testing methods. Methods that aggregate and then formally model patient data, however, are not automatically preferred to other methods. Which method is preferred depends on the question to be addressed, characteristics of the data, and practical issues like availability of suitable patients, but case series, multiple-case studies, single-case studies, statistical models, and process models should be complementary methods when guided by theory development.
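One common way to quantify graded evidence for competing models, rather than making a binary accept/reject decision, is to convert information-criterion scores into relative weights. The sketch below uses the Akaike information criterion (AIC) and Akaike weights; the abstract does not say which criterion the authors use, so the choice of AIC is an assumption, and the log-likelihoods and parameter counts are invented purely for illustration.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Turn AIC scores into relative weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best-scoring one.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical fits of two models to the same patient data
# (illustrative numbers only, not taken from the paper).
scores = [aic(-120.0, 2), aic(-118.5, 4)]
weights = akaike_weights(scores)
print(weights)
```

The weights express how strongly the data favour each model on a continuous scale, so a reader can see that one model is preferred without the other being declared "rejected".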
Abstract:
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state-of-the-art performance by using the leave-one-out estimator for model selection and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
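To illustrate what "using the leave-one-out estimator for model selection" means in practice, the sketch below computes the exact, brute-force leave-one-out error and uses it to pick a kernel length scale. It does not implement the TAP mean field algorithm or its built-in estimator; the kernel-weighted vote classifier, the toy data and the candidate length scales are all assumptions chosen to keep the example short.

```python
import math

def rbf(x, y, length_scale):
    """Squared-exponential kernel between two points given as tuples."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))

def predict(x, train_x, train_y, length_scale):
    """Kernel-weighted vote over the training set; labels are +1 / -1."""
    score = sum(y * rbf(x, xi, length_scale) for xi, y in zip(train_x, train_y))
    return 1 if score >= 0 else -1

def loo_error(xs, ys, length_scale):
    """Exact leave-one-out error: hold out each point in turn and predict it."""
    mistakes = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if predict(xs[i], rest_x, rest_y, length_scale) != ys[i]:
            mistakes += 1
    return mistakes / len(xs)

def select_length_scale(xs, ys, candidates):
    """Model selection: keep the candidate with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_error(xs, ys, s))

# Toy one-dimensional data with two well-separated classes (illustrative only).
xs = [(0.0,), (0.5,), (1.0,), (4.0,), (4.5,), (5.0,)]
ys = [-1, -1, -1, 1, 1, 1]
best = select_length_scale(xs, ys, [0.5, 50.0])
print(best)
```

The brute-force loop refits the predictor once per data point, which is the cost that an estimator obtained as a by-product of training avoids.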
Abstract:
We report the case of a neologistic jargonaphasic and ask whether her target-related and abstruse neologisms are the result of a single deficit, which affects some items more severely than others, or two deficits: one to lexical access and the other to phonological encoding. We analyse both correct/incorrect performance and errors and apply both traditional and formal methods (maximum-likelihood estimation and model selection). All evidence points to a single deficit at the level of phonological encoding. Further characteristics are used to constrain the locus still further. V.S. does not show the type of length effect expected of a memory component, nor the pattern of errors associated with an articulatory deficit. We conclude that her neologistic errors can result from a single deficit at a level of phonological encoding that immediately follows lexical access where segments are represented in terms of their features. We do not conclude, however, that this is the only possible locus that will produce phonological errors in aphasia, or, indeed, jargonaphasia.
Abstract:
The thesis presents an experimentally validated modelling study of the flow of combustion air in an industrial radiant tube burner (RTB). The RTB is used typically in industrial heat treating furnaces. The work has been initiated because of the need for improvements in burner lifetime and performance which are related to the fluid mechanics of the combusting flow, and a fundamental understanding of this is therefore necessary. To achieve this, a detailed three-dimensional Computational Fluid Dynamics (CFD) model has been used, validated with experimental air flow, temperature and flue gas measurements. Initially, the work programme is presented, together with the theory behind RTB design and operation and the theory behind swirling flows and methane combustion. NOx reduction techniques are discussed and numerical modelling of combusting flows is detailed in this section. The importance of turbulence, radiation and combustion modelling is highlighted, as well as the numerical schemes that incorporate discretization, finite volume theory and convergence. The study first focuses on the combustion air flow and its delivery to the combustion zone. An isothermal computational model was developed to allow the examination of the flow characteristics as it enters the burner and progresses through the various sections prior to the discharge face in the combustion area. Important features identified include the air recuperator swirler coil, the step ring, the primary/secondary air splitting flame tube and the fuel nozzle. It was revealed that the effectiveness of the air recuperator swirler is significantly compromised by the need for a generous assembly tolerance. Also, there is a substantial circumferential flow maldistribution introduced by the swirler, but this is effectively removed by the positioning of a ring constriction in the downstream passage.
Computations using the k-ε turbulence model show good agreement with experimentally measured velocity profiles in the combustion zone and justified the use of the modelling strategy prior to the combustion study. Reasonable mesh independence was obtained with 200,000 nodes. Agreement was poorer with the RNG k-ε and Reynolds Stress models. The study continues to address the combustion process itself and the heat transfer process internal to the RTB. A series of combustion and radiation model configurations were developed and the optimum combination of the Eddy Dissipation (ED) combustion model and the Discrete Transfer (DT) radiation model was used successfully to validate a burner experimental test. The previously cold flow validated k-ε turbulence model was used and reasonable mesh independence was obtained with 300,000 nodes. The combination showed good agreement with temperature measurements in the inner and outer walls of the burner, as well as with flue gas composition measured at the exhaust. The inner tube wall temperature predictions agreed with the experimental measurements at most of the thermocouple locations, highlighting a small flame bias to one side, although the model slightly overpredicts the temperatures towards the downstream end of the inner tube. NOx emissions were initially overpredicted; however, the use of a combustion flame temperature limiting subroutine allowed convergence to the experimental value of 451 ppmv. With the validated model, the effectiveness of certain RTB features identified previously is analysed, and an analysis of the energy transfers throughout the burner is presented, to identify the dominant mechanisms in each region. The optimum turbulence-combustion-radiation model selection was then the baseline for further model development. One of these models, an eccentrically positioned flame tube model, highlights the failure mode of the RTB during long term operation.
Other models were developed to address NOx reduction and improvement of the flame profile in the burner combustion zone. These included a modified fuel nozzle design, with 12 circular section fuel ports, which demonstrates a longer and more symmetric flame, although with limited success in NOx reduction. In addition, a zero bypass swirler coil model was developed that highlights the effect of the stronger swirling combustion flow. A reduced diameter flame tube model and a 20 mm forward displaced flame tube model show limited success in NOx reduction, although the latter demonstrated improvements in the discharge face heat distribution and in the flame symmetry. Finally, Flue Gas Recirculation (FGR) modelling attempts indicate the difficulty of applying this NOx reduction technique in the Wellman RTB. Recommendations for further work are made that include design mitigations for the fuel nozzle, and further burner modelling is suggested to improve computational validation. The introduction of fuel staging is proposed, as well as a modification in the inner tube to enhance the effect of FGR.
Abstract:
This thesis is a study of low-dimensional visualisation methods for data visualisation under uncertainty of the input data. It focuses on the two main feed-forward neural network algorithms, NeuroScale and Generative Topographic Mapping (GTM), and tries to make both algorithms able to accommodate the uncertainty. The two models are shown not to work well under high levels of noise within the data and need to be modified. The modifications of both models, NeuroScale and GTM, are verified by using synthetic data to show their ability to accommodate the noise. The thesis also addresses the controversy surrounding the non-uniqueness of predictive gene lists (PGL) for predicting the prognosis outcome of breast cancer patients, as available in DNA microarray experiments. Many of these studies have ignored the uncertainty issue, resulting in random correlations of sparse model selection in high dimensional spaces. The visualisation techniques are used to confirm that the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of ‘unclassifiable’ should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
Abstract:
Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding (SNE) and the Locally Linear Embedding (LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70 dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and to investigate whether, a posteriori, the two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between patients in the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results.
Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty preclude any attempts at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
Abstract:
Emotional lability and mood dysregulation characterize bipolar disorder (BD), yet no study has examined effective connectivity between parahippocampal gyrus and prefrontal cortical regions in ventromedial and dorsal/lateral neural systems subserving mood regulation in BD. Participants comprised 46 individuals (age range: 18-56 years): 21 with a DSM-IV diagnosis of BD, type I currently remitted; and 25 age- and gender-matched healthy controls (HC). Participants performed an event-related functional magnetic resonance imaging paradigm, viewing mild and intense happy and neutral faces. We employed dynamic causal modeling (DCM) to identify significant alterations in effective connectivity between BD and HC. Bayesian model selection was used to determine the best model. The right parahippocampal gyrus (PHG) and right subgenual cingulate gyrus (sgCG) were included as representative regions of the ventromedial neural system. The right dorsolateral prefrontal cortex (DLPFC) region was included as representative of the dorsal/lateral neural system. Right PHG-sgCG effective connectivity was significantly greater in BD than HC, reflecting more rapid, forward PHG-sgCG signaling in BD than HC. There was no between-group difference in sgCG-DLPFC effective connectivity. In BD, abnormally increased right PHG-sgCG effective connectivity and reduced right PHG activity to emotional stimuli suggest a dysfunctional ventromedial neural system implicated in early stimulus appraisal, encoding and automatic regulation of emotion that may represent a pathophysiological functional neural mechanism for mood dysregulation in BD.
Abstract:
This paper suggests a data envelopment analysis (DEA) model for selecting the most efficient alternative in advanced manufacturing technology in the presence of both cardinal and ordinal data. The paper explains the problem of using an iterative method for finding the most efficient alternative and proposes a new DEA model without the need of solving a series of LPs. A numerical example illustrates the model, and an application in technology selection with multi-inputs/multi-outputs shows the usefulness of the proposed approach. © 2012 Springer-Verlag London Limited.
Abstract:
Suboptimal maternal nutrition during gestation results in the establishment of long-term phenotypic changes and an increased disease risk in the offspring. To elucidate how such environmental sensitivity results in physiological outcomes, the molecular characterisation of these offspring has become the focus of many studies. However, the likely modification of key cellular processes such as metabolism in response to maternal undernutrition raises the question of whether the genes typically used as reference constants in gene expression studies are suitable controls. Using a mouse model of maternal protein undernutrition, we have investigated the stability of seven commonly used reference genes (18s, Hprt1, Pgk1, Ppib, Sdha, Tbp and Tuba1) in a variety of offspring tissues including liver, kidney, heart, retro-peritoneal and inter-scapular fat, extra-embryonic placenta and yolk sac, as well as in the preimplantation blastocyst and blastocyst-derived embryonic stem cells. We find that although the selected reference genes are all highly stable within this system, they show tissue, treatment and sex-specific variation. Furthermore, software-based selection approaches rank reference genes differently and do not always identify genes which differ between conditions. Therefore, we recommend that reference gene selection for gene expression studies should be thoroughly validated for each tissue of interest. © 2011 Elsevier Inc.
Abstract:
The existing method of pipeline monitoring, which requires an entire pipeline to be inspected periodically, wastes time and is expensive. A risk-based model that reduces the amount of time spent on inspection has been developed. This model not only reduces the cost of maintaining petroleum pipelines, but also suggests an efficient design and operation philosophy, construction method and logical insurance plans. The risk-based model uses the analytic hierarchy process, a multiple attribute decision-making technique, to identify factors that influence failure on specific segments and analyze their effects by determining the probabilities of risk factors. The severity of failure is determined through consequence analysis, which establishes the effect of a failure in terms of cost caused by each risk factor and determines the cumulative effect of failure through probability analysis.
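The core computation in the analytic hierarchy process is turning a matrix of pairwise comparisons into a vector of priority weights and checking how consistent the judgements are. The sketch below uses the row geometric-mean approximation to the priority vector, which is one standard shortcut and not necessarily the method used in the paper. The three risk factors and the comparison values are hypothetical.

```python
import math

def ahp_priorities(matrix):
    """Approximate the AHP priority vector using row geometric means."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]

def consistency_index(matrix, weights):
    """Consistency index (lambda_max - n) / (n - 1); 0 means fully consistent."""
    n = len(matrix)
    # Estimate lambda_max as the average of (A w)_i / w_i over all rows.
    lam = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    return (lam - n) / (n - 1)

# Hypothetical pairwise comparisons of three risk factors for one segment:
# corrosion vs. external interference vs. construction defects.
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_priorities(comparisons)
ci = consistency_index(comparisons, weights)
print(weights, ci)
```

In this sketch the resulting weights play the role of relative likelihoods of each risk factor; combining them with a cost estimate per factor would give the kind of expected-consequence figure the abstract describes.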
Abstract:
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.
Abstract:
We compare spot patterns generated by Turing mechanisms with those generated by replication cascades, in a model one-dimensional reaction-diffusion system. We determine the stability region of spot solutions in parameter space as a function of a natural control parameter (feed-rate) where degenerate patterns with different numbers of spots coexist for a fixed feed-rate. While it is possible to generate identical patterns via both mechanisms, we show that replication cascades lead to a wider choice of pattern profiles that can be selected through a tuning of the feed-rate, exploiting hysteresis and directionality effects of the different pattern pathways.
Abstract:
Aircraft manufacturing industries are looking for solutions in order to increase their productivity. One solution is to apply metrology systems during the production and assembly processes. The Metrology Process Model (MPM) (Maropoulos et al., 2007) has been introduced, which emphasises metrology applications with assembly planning, manufacturing processes and product design. Measurability analysis is part of the MPM and the aim of this analysis is to check the feasibility of measuring the designed large scale components. Measurability analysis has been integrated in order to provide an efficient matching system. The metrology database is structured by developing the Metrology Classification Model. Furthermore, the feature-based selection model is also explained. By combining two classification models, a novel approach and selection processes for an integrated measurability analysis system (MAS) are introduced, and such an integrated MAS could provide much more meaningful matching results for the operators. © Springer-Verlag Berlin Heidelberg 2010.
Abstract:
A formalism recently introduced by Prugel-Bennett and Shapiro uses the methods of statistical mechanics to model the dynamics of genetic algorithms. To show that the formalism is of more general interest than the test cases they consider, in this paper the technique is applied to the subset sum problem, which is a combinatorial optimization problem with a strongly non-linear energy (fitness) function and many local minima under single spin flip dynamics. It is a problem which exhibits an interesting dynamics, reminiscent of stabilizing selection in population biology. The dynamics are solved under certain simplifying assumptions and are reduced to a set of difference equations for a small number of relevant quantities. The quantities used are the population's cumulants, which describe its shape, and the mean correlation within the population, which measures the microscopic similarity of population members. Including the mean correlation allows a better description of the population than the cumulants alone would provide and represents a new and important extension of the technique. The formalism includes finite population effects and describes problems of realistic size. The theory is shown to agree closely with simulations of a real genetic algorithm and the mean best energy is accurately predicted.
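The macroscopic quantities named in this abstract can be measured directly on a simulated population. The sketch below computes the first four cumulants of the energy distribution and the mean pairwise correlation for a tiny subset sum instance. The energy definition (absolute deviation from the target), the 0/1 to ±1 spin mapping, and the numbers are assumptions for illustration and may differ from the definitions in the paper.

```python
from itertools import combinations

def energy(bits, numbers, target):
    """Assumed energy: absolute gap between the chosen subset's sum and the target."""
    return abs(sum(n for b, n in zip(bits, numbers) if b) - target)

def cumulants(values):
    """First four cumulants of a sample (population moments, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3, m4 - 3 * m2 ** 2

def mean_correlation(population):
    """Average overlap between distinct members, with bits mapped to spins +1/-1."""
    def overlap(a, b):
        return sum((2 * x - 1) * (2 * y - 1) for x, y in zip(a, b)) / len(a)
    pairs = list(combinations(population, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

# Illustrative instance: pick numbers whose sum is as close as possible to 16.
numbers = [3, 5, 8, 13]
target = 16
population = [(1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)]

energies = [energy(p, numbers, target) for p in population]
k1, k2, k3, k4 = cumulants(energies)
q = mean_correlation(population)
print(energies, k1, k2, q)
```

Tracking a handful of numbers such as these from one generation to the next, instead of the full population, is the kind of reduction to "a small number of relevant quantities" that the abstract refers to.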
Abstract:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.