78 results for model selection in binary regression

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used, owing to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best generalisation performance from observational data only. The important concepts for achieving good model generalisation in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means of identifying kernel models based on the structural risk minimisation principle. Developments in convex optimisation-based model construction algorithms, including support vector regression algorithms, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
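As an illustration of selecting a linear-in-the-parameter model by cross-validation with parameter regularisation, the following minimal sketch may help. It is not taken from the article: the polynomial basis, the ridge penalty value, and all function names are assumptions chosen for the example, and the leave-one-out error is computed by explicit refitting rather than by any specialised formula.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(phi, y, lam):
    """Weights minimising ||y - phi w||^2 + lam * ||w||^2 (the intercept is penalised too)."""
    p = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * t for row, t in zip(phi, y)) for i in range(p)]
    return solve(gram, rhs)


def loo_mse(phi, y, lam):
    """Leave-one-out mean squared prediction error, refitting once per held-out point."""
    errors = []
    for k in range(len(y)):
        w = ridge_fit(phi[:k] + phi[k + 1:], y[:k] + y[k + 1:], lam)
        prediction = sum(wi * xi for wi, xi in zip(w, phi[k]))
        errors.append((y[k] - prediction) ** 2)
    return sum(errors) / len(errors)


def polynomial_basis(xs, degree):
    """Hypothetical basis for the example: 1, x, x^2, ..., x^degree."""
    return [[x ** d for d in range(degree + 1)] for x in xs]


def select_degree(xs, y, degrees, lam=1e-6):
    """Return the candidate degree with the smallest leave-one-out error, plus all scores."""
    scores = {d: loo_mse(polynomial_basis(xs, d), y, lam) for d in degrees}
    return min(scores, key=scores.get), scores
```

Calling `select_degree(xs, ys, [0, 1, 2])` on data generated from a quadratic would be expected to favour degree 2, since the smaller models cannot predict the held-out points.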

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the continuing debate over the impact of genetically modified (GM) crops on farmers of developing countries, it is important to accurately measure magnitudes such as farm-level yield gains from GM crop adoption. Yet most farm-level studies in the literature do not control for farmer self-selection, a potentially important source of bias in such estimates. We use farm-level panel data from Indian cotton farmers to investigate the yield effect of GM insect-resistant cotton. We explicitly take into account the fact that the choice of crop variety is an endogenous variable which might lead to bias from self-selection. A production function is estimated using a fixed-effects model to control for selection bias. Our results show that efficient farmers adopt Bacillus thuringiensis (Bt) cotton at a higher rate than their less efficient peers. This suggests that cross-sectional estimates of the yield effect of Bt cotton, which do not control for self-selection effects, are likely to be biased upwards. However, after controlling for selection bias, we still find that there is a significant positive yield effect from adoption of Bt cotton that more than offsets the additional cost of Bt seed.
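The upward bias of a cross-sectional estimate, and its removal by a fixed-effects (within) estimator, can be sketched with a toy example. Everything here is an assumption made for illustration: the function names, the single binary regressor, and the synthetic panel are hypothetical and are not the study's data or its production function.

```python
def ols_slope(x, y):
    """Pooled OLS slope of y on x (with intercept), ignoring the panel structure."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def within_slope(ids, x, y):
    """Fixed-effects slope: demean x and y within each unit, then regress."""
    groups = {}
    for unit, a, b in zip(ids, x, y):
        groups.setdefault(unit, []).append((a, b))
    x_dev, y_dev = [], []
    for observations in groups.values():
        mean_x = sum(a for a, _ in observations) / len(observations)
        mean_y = sum(b for _, b in observations) / len(observations)
        x_dev += [a - mean_x for a, _ in observations]
        y_dev += [b - mean_y for _, b in observations]
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)


# Synthetic panel (hypothetical): yield = farm efficiency + 2.0 * adoption.
# The more efficient farms adopt earlier or always, mimicking self-selection.
TRUE_EFFECT = 2.0
efficiency = {"A": 10.0, "B": 5.0, "C": 1.0, "D": 12.0}
adoption = {"A": [0, 1, 1], "B": [0, 0, 1], "C": [0, 0, 0], "D": [1, 1, 1]}

ids, x, y = [], [], []
for farm, history in adoption.items():
    for adopted in history:
        ids.append(farm)
        x.append(float(adopted))
        y.append(efficiency[farm] + TRUE_EFFECT * adopted)

pooled = ols_slope(x, y)
within = within_slope(ids, x, y)
```

In this toy panel the pooled slope exceeds the true effect because efficient farms are over-represented among adopters, while the within slope uses only changes over time on the same farm.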

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, the mixed logit (ML), estimated using Bayesian methods, was employed to examine willingness-to-pay (WTP) to consume bread produced with reduced levels of pesticides so as to ameliorate environmental quality, using data generated by a choice experiment. Model comparison used the marginal likelihood, which is preferable for Bayesian model comparison and testing. Models containing constant and random parameters for a number of distributions were considered, along with models in ‘preference space’ and ‘WTP space’ as well as those allowing for misreporting. We found: strong support for the ML estimated in WTP space; little support for fixing the price coefficient, a common practice advocated and adopted in the environmental economics literature; and weak evidence for misreporting.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper derives some exact power properties of tests for spatial autocorrelation in the context of a linear regression model. In particular, we characterize the circumstances in which the power vanishes as the autocorrelation increases, thus extending the work of Krämer (2005). More generally, the analysis in the paper sheds new light on how the power of tests for spatial autocorrelation is affected by the matrix of regressors and by the spatial structure. We mainly focus on the problem of residual spatial autocorrelation, in which case it is appropriate to restrict attention to the class of invariant tests, but we also consider the case when the autocorrelation is due to the presence of a spatially lagged dependent variable among the regressors. A numerical study aimed at assessing the practical relevance of the theoretical results is included.
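For readers unfamiliar with tests for residual spatial autocorrelation, one widely used statistic is Moran's I applied to regression residuals. The abstract does not name a particular statistic, so its use here is an assumption; the function names and the chain-shaped neighbourhood structure are likewise hypothetical and only serve the example.

```python
def morans_i(residuals, weights):
    """Moran's I for regression residuals: (n / S0) * (e' W e) / (e' e).

    Assumes the residuals come from a regression with an intercept,
    so they already have zero mean.
    """
    n = len(residuals)
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * residuals[i] * residuals[j]
                for i in range(n) for j in range(n))
    sum_sq = sum(e * e for e in residuals)
    return (n / s0) * (cross / sum_sq)


def chain_weights(n):
    """Binary contiguity weights for n units on a line (each unit neighbours the next)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain_weights(4))
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights(4))
```

Residuals that cluster by sign among neighbours give a positive value, while residuals that alternate in sign give a negative one. How the power of a test built on such a statistic depends on the regressors and the weights matrix is the subject of the paper itself and is not reproduced here.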

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) to predict the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server: ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model machine learning based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0, an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of the global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper investigates the characteristics of unaccusative verbs in Italian with respect to the consistency with which these verbs select the auxiliaries ‘be’ (essere) and ‘have’ (avere) in compound tense forms. The study builds on the gradient approach to split intransitivity (Sorace 2000) by exploring the behaviour of 29 intransitive Italian verbs with respect to their core-peripheral features: auxiliary selection acceptability ratings and associated variance measures. Although there is clear support for the gradient approach in relation to the general order of semantic categories along the unaccusativity gradient, the results reveal that the ordering of subclasses within the Change group conflicts with that currently proposed in the literature. In addition, the findings demonstrate that the aspectual and lexical semantic characteristics of internally-caused change-of-state verbs in Italian require further investigation before their auxiliary selection behaviour can be properly understood. Furthermore, contrary to the gradient account, Existence verbs, the most stative and therefore the most peripheral subclass in the unaccusativity hierarchy, exhibit behaviour more characteristic of core unaccusative verbs. This study examines a wider range of semantic subclasses of unaccusative verbs than has hitherto been reported and identifies the core-peripheral boundary for Italian.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and to quantify the misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low-level details of differing file formats or the physical location of the data. Scientific and operational benefits of this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations.
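To give a sense of what access through a Web Map Service looks like in practice, the sketch below builds a WMS 1.3.0 `GetMap` request URL using the standard request parameters. The endpoint, the layer name, and the helper function names are hypothetical placeholders, not the portal's actual services, and the optional `TIME` dimension is included only on the assumption that the server advertises it for the layer.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(endpoint, layer, bbox, width, height,
                     time=None, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so that
    the axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    if time is not None:
        params["TIME"] = time  # optional dimension, e.g. an ISO 8601 timestamp
    return endpoint + "?" + urlencode(params)


def query_of(url):
    """Parse the query string back into a dict of lists, keeping blank values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    (-10, 45, 5, 60),
    width=512,
    height=512,
    time="2009-01-01T00:00:00Z",
)
```

Because every compliant server accepts the same parameter names, a client written this way could switch between model data feeds by changing only the endpoint and layer, which is the interoperability benefit the abstract describes.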