78 resultados para model selection in binary regression


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aimed at reducing deficiencies in representing the Madden-Julian oscillation (MJO) in general circulation models (GCMs), a global model evaluation project on vertical structure and physical processes of the MJO was coordinated. In this paper, results from the climate simulation component of this project are reported. It is shown that the MJO remains a great challenge in these latest generation GCMs. The systematic eastward propagation of the MJO is only well simulated in about one-fourth of the total participating models. The observed vertical westward tilt with altitude of the MJO is well simulated in good MJO models, but not in the poor ones. Damped Kelvin wave responses to the east of convection in the lower troposphere could be responsible for the missing MJO preconditioning process in these poor MJO models. Several process-oriented diagnostics were conducted to discriminate key processes for realistic MJO simulations. While large-scale rainfall partition and low-level mean zonal winds over the Indo-Pacific in a model are not found to be closely associated with its MJO skill, two metrics, including the low-level relative humidity difference between high and low rain events and seasonal mean gross moist stability, exhibit statistically significant correlations with the MJO performance. It is further indicated that increased cloud-radiative feedback tends to be associated with reduced amplitude of intraseasonal variability, which is incompatible with the radiative instability theory previously proposed for the MJO. Results in this study confirm that inclusion of air-sea interaction can lead to significant improvement in simulating the MJO.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

During the development of new therapies, it is not uncommon to test whether a new treatment works better than the existing treatment for all patients who suffer from a condition (full population) or for a subset of the full population (subpopulation). One approach that may be used for this objective is to have two separate trials, where in the first trial, data are collected to determine if the new treatment benefits the full population or the subpopulation. The second trial is a confirmatory trial to test the new treatment in the population selected in the first trial. In this paper, we consider the more efficient two-stage adaptive seamless designs (ASDs), where in stage 1, data are collected to select the population to test in stage 2. In stage 2, additional data are collected to perform confirmatory analysis for the selected population. Unlike the approach that uses two separate trials, for ASDs, stage 1 data are also used in the confirmatory analysis. Although ASDs are efficient, using stage 1 data both for selection and confirmatory analysis introduces selection bias and consequently statistical challenges in making inference. We will focus on point estimation for such trials. In this paper, we describe the extent of bias for estimators that ignore multiple hypotheses and selecting the population that is most likely to give positive trial results based on observed stage 1 data. We then derive conditionally unbiased estimators and examine their mean squared errors for different scenarios.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spanish captures the difference between eventive and stative passives via an obligatory choice between two copula; verbal passives take the copula ser and adjectival passives take the copula estar. In this study, we compare and contrast US and Canadian heritage speakers of Spanish on their knowledge of this difference in relation to copula choice in Spanish. The backgrounds of the target groups differ significantly from each other in that only one of them, the Canadian group, has grown up in a societal multilingual environment. We discuss the results as being supportive of two non-mutually exclusive explanation factors: (a) French facilitates (bootstraps) the acquisition of eventive and stative passives and/or (b) the US/Canadian HS differences (e.g. status of bilingualism and the languages at stake) is a reflection of the uniqueness of the language contact situations and the effects this has on the input HSS receive.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An efficient model identification algorithm for a large class of linear-in-the-parameters models is introduced that simultaneously optimises the model approximation ability, sparsity and robustness. The derived model parameters in each forward regression step are initially estimated via the orthogonal least squares (OLS), followed by being tuned with a new gradient-descent learning algorithm based on the basis pursuit that minimises the l(1) norm of the parameter estimate vector. The model subset selection cost function includes a D-optimality design criterion that maximises the determinant of the design matrix of the subset to ensure model robustness and to enable the model selection procedure to automatically terminate at a sparse model. The proposed approach is based on the forward OLS algorithm using the modified Gram-Schmidt procedure. Both the parameter tuning procedure, based on basis pursuit, and the model selection criterion, based on the D-optimality that is effective in ensuring model robustness, are integrated with the forward regression. As a consequence the inherent computational efficiency associated with the conventional forward OLS approach is maintained in the proposed algorithm. Examples demonstrate the effectiveness of the new approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this correspondence new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria based on either experimental design criteria that optimizes model adequacy, or the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability, respectively. Three robust identification algorithms are introduced, i.e., combined A- and D-optimality with regularized orthogonal least squares algorithm, respectively; and combined PRESS statistic with regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computation efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate effectiveness of the algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest that the long-run drivers of Brazilian sugar prices are oil prices and that there are nonlinearities in the adjustment processes of sugar and ethanol prices to oil price but linear adjustment between ethanol and sugar prices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The note proposes an efficient nonlinear identification algorithm by combining a locally regularized orthogonal least squares (LROLS) model selection with a D-optimality experimental design. The proposed algorithm aims to achieve maximized model robustness and sparsity via two effective and complementary approaches. The LROLS method alone is capable of producing a very parsimonious model with excellent generalization performance. The D-optimality design criterion further enhances the model efficiency and robustness. An added advantage is that the user only needs to specify a weighting for the D-optimality cost in the combined model selecting criterion and the entire model construction procedure becomes automatic. The value of this weighting does not influence the model selection procedure critically and it can be chosen with ease from a wide range of values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models based on an approach of directly optimizing model generalization capability. This is achieved by utilizing the delete-1 cross validation concept and the associated leave-one-out test error also known as the predicted residual sums of squares (PRESS) statistic, without resorting to any other validation data set for model evaluation in the model construction process. Computational efficiency is ensured using an orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of the squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic, and the user is not required to specify any criterion to terminate the model construction procedure. Comparisons with some of the existing state-of-art modeling methods are given, and several examples are included to demonstrate the ability of the proposed algorithm to effectively construct sparse models that generalize well.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An automatic nonlinear predictive model-construction algorithm is introduced based on forward regression and the predicted-residual-sums-of-squares (PRESS) statistic. The proposed algorithm is based on the fundamental concept of evaluating a model's generalisation capability through crossvalidation. This is achieved by using the PRESS statistic as a cost function to optimise model structure. In particular, the proposed algorithm is developed with the aim of achieving computational efficiency, such that the computational effort, which would usually be extensive in the computation of the PRESS statistic, is reduced or minimised. The computation of PRESS is simplified by avoiding a matrix inversion through the use of the orthogonalisation procedure inherent in forward regression, and is further reduced significantly by the introduction of a forward-recursive formula. Based on the properties of the PRESS statistic, the proposed algorithm can achieve a fully automated procedure without resort to any other validation data set for iterative model evaluation. Numerical examples are used to demonstrate the efficacy of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized using a particle swarm optimization (PSO) algorithm at the upper level by minimizing the leave one out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the calibration and evaluation of flood inundation models are a prerequisite for their successful application, there is a clear need to ensure that the performance measures that quantify how well models match the available observations are fit for purpose. This paper evaluates the binary pattern performance measures that are frequently used to compare flood inundation models with observations of flood extent. This evaluation considers whether these measures are able to calibrate and evaluate model predictions in a credible and consistent way, i.e. identifying the underlying model behaviour for a number of different purposes such as comparing models of floods of different magnitudes or on different catchments. Through theoretical examples, it is shown that the binary pattern measures are not consistent for floods of different sizes, such that for the same vertical error in water level, a model of a flood of large magnitude appears to perform better than a model of a smaller magnitude flood. Further, the commonly used Critical Success Index (usually referred to as F<2 >) is biased in favour of overprediction of the flood extent, and is also biased towards correctly predicting areas of the domain with smaller topographic gradients. Consequently, it is recommended that future studies consider carefully the implications of reporting conclusions using these performance measures. Additionally, future research should consider whether a more robust and consistent analysis could be achieved by using elevation comparison methods instead.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An efficient two-level model identification method aiming at maximising a model׳s generalisation capability is proposed for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave one out (LOO) mean square error (LOOMSE). There are two elements of original contributions. Firstly an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates the automatic model structure selection process with no need of using a predetermined error tolerance to terminate the forward selection process. Secondly it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associate computation cost is small due to the ENOFR procedure. Consequently a fully automated procedure is achieved without resort to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many studies evaluating model boundary-layer schemes focus either on near-surface parameters or on short-term observational campaigns. This reflects the observational datasets that are widely available for use in model evaluation. In this paper we show how surface and long-term Doppler lidar observations, combined in a way to match model representation of the boundary layer as closely as possible, can be used to evaluate the skill of boundary-layer forecasts. We use a 2-year observational dataset from a rural site in the UK to evaluate a climatology of boundary layer type forecast by the UK Met Office Unified Model. In addition, we demonstrate the use of a binary skill score (Symmetric Extremal Dependence Index) to investigate the dependence of forecast skill on season, horizontal resolution and forecast leadtime. A clear diurnal and seasonal cycle can be seen in the climatology of both the model and observations, with the main discrepancies being the model overpredicting cumulus capped and decoupled stratocumulus capped boundary-layers and underpredicting well mixed boundary-layers. Using the SEDI skill score the model is most skillful at predicting the surface stability. The skill of the model in predicting cumulus capped and stratocumulus capped stable boundary layer forecasts is low but greater than a 24 hr persistence forecast. In contrast, the prediction of decoupled boundary-layers and boundary-layers with multiple cloud layers is lower than persistence. This process based evaluation approach has the potential to be applied to other boundary-layer parameterisation schemes with similar decision structures.