34 results for regression discrete models
Abstract:
Homology modeling was used to build 3D models of the N-methyl-D-aspartate (NMDA) receptor glycine binding site on the basis of an X-ray structure of the water-soluble AMPA-sensitive receptor. The docking of agonists and antagonists to these models was used to reveal binding modes of ligands and to explain known structure-activity relationships. Two types of quantitative models, 3D-QSAR/CoMFA and a regression model based on docking energies, were built for antagonists (derivatives of 4-hydroxy-2-quinolone, quinoxaline-2,3-dione, and related compounds). The CoMFA steric and electrostatic maps were superimposed on the homology-based model, and a close correspondence was noted. The derived computational models have permitted the evaluation of the structural features crucial for high glycine binding site affinity and are important for the design of new ligands.
Abstract:
Conditional Gaussian (CG) distributions allow the inclusion of both discrete and continuous variables in a model assuming that the continuous variable is normally distributed. However, the CG distributions have proved to be unsuitable for survival data which tends to be highly skewed. A new method of analysis is required to take into account continuous variables which are not normally distributed. The aim of this paper is to introduce the more appropriate conditional phase-type (C-Ph) distribution for representing a continuous non-normal variable while also incorporating the causal information in the form of a Bayesian network.
Abstract:
To develop real-time simulations of wind instruments, digital waveguide filters can be used as an efficient representation of the air column. Many aerophones are shaped as horns, which can be approximated using conical sections. Therefore, the derivation of conical waveguide filters is of special interest. When these filters are used in combination with a generalized reed excitation, several classes of wind instruments can be simulated. In this paper we present methods for transforming a continuous description of conical tube segments into a discrete filter representation. The coupling of the reed model with the conical waveguide and a simplified model of the termination at the open end are described in the same way. It turns out that the complete lossless conical waveguide requires only one type of filter. Furthermore, we developed a digital reed excitation model which is based purely on numerical integration methods, i.e., without the use of a look-up table.
Abstract:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective of inferring the ecological interactions.
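The LASSO baseline mentioned above can be sketched independently of the BRAM model. The following is a minimal illustration, not the paper's code or data: an L1-penalised regression, solved here by iterative soft-thresholding rather than a library call, recovering which two of ten candidate species drive a focal species' abundance in synthetic data. All sizes, coefficients and the penalty value are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "species distribution" data: abundance of a focal species
# driven by only two of ten candidate interacting species.
n, p = 200, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[1], w_true[4] = 2.0, -1.5          # the true (sparse) interactions
y = X @ w_true + 0.1 * rng.normal(size=n)

def lasso_ista(X, y, alpha, n_iter=500):
    """L1-penalised least squares via iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)
    return w

w_hat = lasso_ista(X, y, alpha=20.0)
support = np.flatnonzero(np.abs(w_hat) > 1e-3)
```

With the penalty chosen large enough, coefficients of non-interacting species are shrunk exactly to zero, which is the sparsity property exploited for network reconstruction.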
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models, used to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures, together with statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but its poorly controlled Type I error rates meant that it did not show the improvements in performance under model selection achieved by the methods above. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
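As a toy illustration of one point above, that accounting for spatial autocorrelation matters for regression coefficients, the sketch below (not the paper's simulation design; the correlation range and coefficients are arbitrary) compares ordinary and generalised least squares on a one-dimensional transect with exponentially correlated errors and a covariance assumed known:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-dimensional "transect" with exponentially decaying error correlation,
# standing in for spatial autocorrelation.
n = 150
coords = np.arange(n, dtype=float)
Sigma = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 5.0)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
# Correlated errors generated via a Cholesky factor of Sigma
eps = np.linalg.cholesky(Sigma) @ rng.normal(size=n)
y = X @ beta_true + eps

# Ordinary least squares ignores the error correlation ...
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# ... generalised least squares whitens with the (here, known) covariance:
# beta = (X' S^-1 X)^-1 X' S^-1 y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
```

In real analyses the covariance parameters are unknown and are estimated, e.g. by restricted maximum likelihood; assuming Sigma known keeps the sketch short.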
Abstract:
Purpose – Under investigation is Prosecco wine, a sparkling white wine from North-East Italy.
Information collection on consumer perceptions is particularly relevant when developing market
strategies for wine, especially so when local production and certification of origin play an important
role in the wine market of a given district, as in the case at hand. Investigating and characterizing the
structure of preference heterogeneity become crucial steps in every successful marketing strategy. The
purpose of this paper is to investigate the sources of systematic differences in consumer preferences.
Design/methodology/approach – The paper explores the effect of inclusion of answers to
attitudinal questions in a latent class regression model of stated willingness to pay (WTP) for this
specialty wine. These additional variables were included in the membership equations to investigate
whether they could be of help in the identification of latent classes. The individual specific WTPs from
the sampled respondents were then derived from the best fitting model and examined for consistency.
Findings – The use of answers to attitudinal questions in the latent class regression model is found to
improve model fit, thereby helping in the identification of latent classes. The best performing model
obtained makes use of both attitudinal scores and socio-economic covariates identifying five latent
classes. A reasonable pattern of differences in WTP for Prosecco between CDO and TGI types was
derived from this model.
Originality/value – The approach appears informative and promising: attitudes emerge as
important ancillary indicators of taste differences for specialty wines. This might be of interest per se
and of practical use in market segmentation. If future research shows that these variables can be of use
in other contexts, it is quite possible that more attitudinal questions will be routinely incorporated in
structural latent class hedonic models.
Abstract:
Increasingly, semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value-added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced set of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge Regression, LASSO and Forward Selection Ridge Regression (FSRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-the-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models.
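A hedged sketch of the forward stepwise stage on a correlated input space. This is not the MSC algorithm or the benchmark etch dataset: the correlated "channels" are simulated as noisy copies of three latent sources, and a greedy residual-reduction rule stands in for FSR; all sizes and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Highly correlated inputs: three latent sources, each observed as five
# noisy copies (mimicking redundant optical emission channels).
n, reps = 300, 5
latent = rng.normal(size=(n, 3))
X = np.repeat(latent, reps, axis=1) + 0.05 * rng.normal(size=(n, 3 * reps))
y = 1.5 * latent[:, 0] - 2.0 * latent[:, 2] + 0.1 * rng.normal(size=n)

def forward_selection(X, y, k):
    """Greedy forward stepwise regression: k times, add the variable most
    correlated with the current residual, then refit by least squares."""
    selected, resid = [], y.copy()
    for _ in range(k):
        score = [abs(X[:, j] @ resid) / np.linalg.norm(X[:, j])
                 for j in range(X.shape[1])]
        selected.append(int(np.argmax(score)))
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        resid = y - X[:, selected] @ coef
    return selected

picks = forward_selection(X, y, k=2)
clusters = [j // reps for j in picks]   # which latent source each pick represents
```

Each pick lands in a distinct latent cluster here; reducing the input set to one representative per cluster up front, as MSC aims to do, hands the stepwise search a far better-conditioned problem.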
Abstract:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be drawn from a large variety of linear and nonlinear regression methods, and the selection of a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is of high dimension, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.
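Two of the four methods can be contrasted in a few lines: a straight-line linear regression and a from-scratch Gaussian process posterior mean with an RBF kernel. The data below are a synthetic nonlinear response, not the benchmark process data, and the kernel length-scale and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# A nonlinear single-sensor "response" that a straight-line fit cannot track.
x = np.linspace(0.0, 4.0, 60)
y = np.sin(2.0 * x) + 0.05 * rng.normal(size=60)

# GP regression with an RBF kernel; posterior mean at the training inputs:
# mean = K (K + sigma^2 I)^{-1} y
def rbf_kernel(a, b, length=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

K = rbf_kernel(x, x)
gp_pred = K @ np.linalg.solve(K + 0.05**2 * np.eye(len(x)), y)

# Linear regression reduced here to a single-input straight line
A = np.column_stack([np.ones_like(x), x])
mlr_pred = A @ np.linalg.lstsq(A, y, rcond=None)[0]

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y) ** 2)))
```

On this deliberately nonlinear response the GP fits closely while the straight line cannot; the abstract's observation is that, with a large enough set of well-chosen inputs, the gap to linear models narrows.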
Abstract:
We study the sensitivity of a MAP configuration of a discrete probabilistic graphical model with respect to perturbations of its parameters. These perturbations are global, in the sense that simultaneous perturbations of all the parameters (or any chosen subset of them) are allowed. Our main contribution is an exact algorithm that can check whether the MAP configuration is robust with respect to given perturbations. Its complexity is essentially the same as that of obtaining the MAP configuration itself, so it can be promptly used with minimal effort. We use our algorithm to identify the largest global perturbation that does not induce a change in the MAP configuration, and we successfully apply this robustness measure in two practical scenarios: the prediction of facial action units with posed images and the classification of multiple real public data sets. A strong correlation between the proposed robustness measure and accuracy is verified in both scenarios.
Abstract:
A forward and backward least angle regression (LAR) algorithm is proposed to construct the nonlinear autoregressive model with exogenous inputs (NARX) that is widely used to describe a large class of nonlinear dynamic systems. The main objective of this paper is to improve the model sparsity and generalization performance of the original forward LAR algorithm. This is achieved by introducing a replacement scheme using an additional backward LAR stage. The backward stage replaces insignificant model terms selected by forward LAR with more significant ones, leading to an improved model in terms of compactness and performance. A numerical example constructing four types of NARX models, namely polynomials, radial basis function (RBF) networks, neuro-fuzzy and wavelet networks, is presented to illustrate the effectiveness of the proposed technique in comparison with some popular methods.
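A sketch of the forward stage only, with greedy forward selection used as a simple stand-in for LAR and without the paper's backward replacement step. The NARX system, lag structure and polynomial term dictionary below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated NARX system: y[t] = 0.5*y[t-1] + u[t-1] + 0.4*u[t-1]^2 + noise
N = 400
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t-1] + u[t-1] + 0.4 * u[t-1]**2 + 0.01 * rng.normal()

# Candidate polynomial NARX terms from one output lag and one input lag
y1, u1 = y[:-1], u[:-1]
terms = {"y1": y1, "u1": u1, "y1^2": y1**2, "u1^2": u1**2, "y1*u1": y1 * u1}
names = list(terms)
Phi = np.column_stack([terms[k] for k in names])
target = y[1:]

# Greedy forward selection of 3 terms (a stand-in for the forward LAR stage)
selected, resid = [], target.copy()
for _ in range(3):
    score = [abs(Phi[:, j] @ resid) / np.linalg.norm(Phi[:, j])
             for j in range(Phi.shape[1])]
    selected.append(int(np.argmax(score)))
    theta, *_ = np.linalg.lstsq(Phi[:, selected], target, rcond=None)
    resid = target - Phi[:, selected] @ theta

picked = {names[j] for j in selected}
```

A backward stage would now revisit `selected` and try swapping each chosen term for an unselected one whenever the swap lowers the residual sum of squares, which is the replacement idea the paper adds.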
Abstract:
While the repeated nature of Discrete Choice Experiments is advantageous from a sampling efficiency perspective, patterns of choice may differ across the tasks due, in part, to learning and fatigue. Using probabilistic decision process models, we find in a field study that learning and fatigue behavior may be exhibited by only a small subset of respondents. Most respondents in our sample show preference and variance stability consistent with rational, pre-existent and well-formed preferences. Nearly all of the remainder exhibit both learning and fatigue effects. An important aspect of our approach is that it enables learning and fatigue effects to be explored even though they were not envisaged during survey design or data collection.
Abstract:
Retinopathy of prematurity (ROP) is a rare disease in which retinal blood vessels of premature infants fail to develop normally, and is one of the major causes of childhood blindness throughout the world. The Discrete Conditional Phase-type (DC-Ph) model consists of two components: the conditional component, measuring the inter-relationships between covariates, and the survival component, which models the survival distribution using a Coxian phase-type distribution. This paper expands the DC-Ph models by introducing a support vector machine (SVM) in the role of the conditional component. The SVM is capable of classifying multiple outcomes and is used to identify the infant's risk of developing ROP. Class imbalance makes predicting rare events difficult. A new class decomposition technique, which deals with the problem of multiclass imbalance, is introduced. Based on the SVM classification, the length of stay in the neonatal ward is modelled using a 5, 8 or 9 phase Coxian distribution.
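The Coxian phase-type survival component can be illustrated on its own, independent of the SVM classification. The sketch below simulates length of stay from a 3-phase Coxian distribution and checks the empirical mean against the closed form E[T] = -pi S^-1 1, where pi is the initial distribution and S the sub-generator; the rates are invented, not fitted to neonatal data.

```python
import numpy as np

rng = np.random.default_rng(5)

# A 3-phase Coxian distribution: from phase i, either progress to phase i+1
# (rate lam[i]) or be absorbed, i.e. discharged (rate mu[i]).
lam = np.array([1.0, 0.5, 0.0])   # no onward transition from the last phase
mu = np.array([0.2, 0.3, 1.0])

def coxian_sample(rng):
    t, phase = 0.0, 0
    while True:
        total = lam[phase] + mu[phase]
        t += rng.exponential(1.0 / total)       # sojourn time in this phase
        if rng.uniform() < mu[phase] / total:
            return t                            # absorbed: total length of stay
        phase += 1

stays = np.array([coxian_sample(rng) for _ in range(20000)])

# Closed-form mean from the sub-generator S, starting in phase 1:
S = np.array([[-(lam[0] + mu[0]), lam[0], 0.0],
              [0.0, -(lam[1] + mu[1]), lam[1]],
              [0.0, 0.0, -(lam[2] + mu[2])]])
e1 = np.array([1.0, 0.0, 0.0])
mean_T = float(-e1 @ np.linalg.inv(S) @ np.ones(3))
```

The same sub-generator algebra gives the density and higher moments, which is what makes Coxian distributions convenient for skewed length-of-stay data.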
Abstract:
This paper contributes to the understanding of lime-mortar masonry strength and deformation (which determine durability and allowable stresses/stiffness in design codes) by measuring the mechanical properties of brick bound with lime and lime-cement mortars. Based on the regression analysis of experimental results, models to estimate lime-mortar masonry compressive strength are proposed (less accurate for hydrated lime (CL90s) masonry due to the disparity between mortar and brick strengths). Also, three relationships between masonry elastic modulus and its compressive strength are proposed for cement-lime; hydraulic lime (NHL3.5 and 5); and hydrated/feebly hydraulic lime masonries respectively.
Disagreement between the experimental results and earlier mathematical prediction models (proposed primarily for cement masonry) is caused by a lack of provision for the significant deformation of lime masonry and the relative changes in strength and stiffness between mortar and brick over time (at 6 months and 1 year, the NHL 3.5 and 5 mortars are often stronger than the brick). Eurocode 6 provided the best predictions for the compressive strength of lime and cement-lime masonry based on the strength of their components. All models vastly overestimated the strength of CL90s masonry at 28 days; however, Eurocode 6 became an accurate predictor after 6 months, when the mortar had acquired most of its final strength and stiffness.
The experimental results agreed with earlier stress-strain curves. It was evident that mortar strongly impacts masonry deformation, and that the masonry stress-strain relationship becomes increasingly non-linear as mortar strength decreases. It was also noted that the influence of masonry stiffness on its compressive strength becomes smaller as mortar hydraulicity increases.
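The paper's regression models are not reproduced here, but the Eurocode 6 power-law form f_k = K * f_b^a * f_m^b can be fitted by log-linear least squares, the standard route for this kind of strength model. The constants and strength ranges below are illustrative, not the paper's experimental values.

```python
import numpy as np

rng = np.random.default_rng(6)

# Eurocode-6-style power law  f_k = K * f_brick^a * f_mortar^b,
# with illustrative "true" constants used to generate synthetic data.
K_true, a_true, b_true = 0.55, 0.7, 0.3
f_b = rng.uniform(10.0, 50.0, 40)        # brick strengths, N/mm^2
f_m = rng.uniform(0.5, 12.0, 40)         # mortar strengths, N/mm^2
f_k = K_true * f_b**a_true * f_m**b_true * np.exp(0.02 * rng.normal(size=40))

# The model is linear in the logs: log f_k = log K + a*log f_b + b*log f_m
A = np.column_stack([np.ones(40), np.log(f_b), np.log(f_m)])
coef, *_ = np.linalg.lstsq(A, np.log(f_k), rcond=None)
K_hat, a_hat, b_hat = float(np.exp(coef[0])), float(coef[1]), float(coef[2])
```

The multiplicative noise term models the observed scatter; fitting in log space turns the power law into ordinary multiple regression.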
Abstract:
In many applications, and especially those where batch processes are involved, a target scalar output of interest is often dependent on one or more time series of data. With the exponential growth in data logging in modern industries, such time series are increasingly available for statistical modelling in soft sensing applications. In order to exploit time series data for predictive modelling, it is necessary to summarise the information they contain as a set of features to use as model regressors. Typically this is done in an unsupervised fashion using simple techniques such as computing statistical moments, principal components or wavelet decompositions, often leading to significant information loss and hence suboptimal predictive models. In this paper, a functional learning paradigm is exploited in a supervised fashion to derive continuous, smooth estimates of time series data (yielding aggregated local information), while simultaneously estimating a continuous shape function yielding optimal predictions. The proposed Supervised Aggregative Feature Extraction (SAFE) methodology can be extended to support nonlinear predictive models by embedding the functional learning framework in a Reproducing Kernel Hilbert Space setting. SAFE has a number of attractive features, including a closed-form solution and the ability to explicitly incorporate first- and second-order derivative information. Using simulation studies and a practical semiconductor manufacturing case study, we highlight the strengths of the new methodology with respect to standard unsupervised feature extraction approaches.
Abstract:
Numerical sound synthesis is often carried out using the finite difference time domain method. In order to analyse the stability of the derived models, energy methods can be used for both linear and nonlinear settings. For Hamiltonian systems the existence of a conserved numerical energy-like quantity can be used to guarantee the stability of the simulations. In this paper it is shown how to derive similar discrete conservation laws in cases where energy is dissipated due to friction or in the presence of an energy source due to an external force. A damped harmonic oscillator (for which an analytic solution is available) is used to present the proposed methodology. After showing how to arrive at a conserved quantity, the simulation of a nonlinear single reed shows an example of an application in the context of musical acoustics.
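The proposed methodology can be reproduced in a few lines for the damped harmonic oscillator itself (the parameter values below are arbitrary): integrate m u'' = -k u - c u' with centred differences and verify that the discrete energy drops by exactly the dissipated term at every step.

```python
import numpy as np

# Damped harmonic oscillator  m u'' = -k u - c u',  discretised with
# centred differences:  m*dtt(u) = -k*u - c*dtc(u)
m, k, c = 1.0, 40.0, 0.3
dt = 0.001
steps = 5000

# Explicit update solved from the centred scheme
denom = m / dt**2 + c / (2 * dt)
u_prev, u_now = 1.0, 1.0                 # start at rest, unit displacement
energies, losses = [], []
for _ in range(steps):
    u_next = ((2 * m / dt**2 - k) * u_now
              - (m / dt**2 - c / (2 * dt)) * u_prev) / denom
    # discrete energy between time levels n and n+1
    v = (u_next - u_now) / dt
    energies.append(0.5 * m * v**2 + 0.5 * k * u_next * u_now)
    # dissipated power, using the centred velocity at level n
    losses.append(c * ((u_next - u_prev) / (2 * dt)) ** 2)
    u_prev, u_now = u_now, u_next

# Discrete energy balance: h^{n+1/2} - h^{n-1/2} = -c*dt*(centred velocity)^2
balance = [energies[i] - energies[i - 1] + dt * losses[i]
           for i in range(1, len(energies))]
```

The balance residual sits at rounding level, so any growth in it flags an implementation or stability error; this is the practical use of such conserved (here, monotonically dissipated) quantities.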