912 resultados para Model selection
Resumo:
A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it is possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation
Big Decisions and Sparse Data: Adapting Scientific Publishing to the Needs of Practical Conservation
Resumo:
The biggest challenge in conservation biology is breaking down the gap between research and practical management. A major obstacle is the fact that many researchers are unwilling to tackle projects likely to produce sparse or messy data because the results would be difficult to publish in refereed journals. The obvious solution to sparse data is to build up results from multiple studies. Consequently, we suggest that there needs to be greater emphasis in conservation biology on publishing papers that can be built on by subsequent research rather than on papers that produce clear results individually. This building approach requires: (1) a stronger theoretical framework, in which researchers attempt to anticipate models that will be relevant in future studies and incorporate expected differences among studies into those models; (2) use of modern methods for model selection and multi-model inference, and publication of parameter estimates under a range of plausible models; (3) explicit incorporation of prior information into each case study; and (4) planning management treatments in an adaptive framework that considers treatments applied in other studies. We encourage journals to publish papers that promote this building approach rather than expecting papers to conform to traditional standards of rigor as stand-alone papers, and believe that this shift in publishing philosophy would better encourage researchers to tackle the most urgent conservation problems.
Resumo:
Understanding the effect of habitat fragmentation is a fundamental yet complicated aim of many ecological studies. Beni savanna is a naturally fragmented forest habitat, where forest islands exhibit variation in resources and threats. To understand how the availability of resources and threats affect the use of forest islands by parrots, we applied occupancy modeling to quantify use and detection probabilities for 12 parrot species on 60 forest islands. The presence of urucuri (Attalea phalerata) and macaw (Acrocomia aculeata) palms, the number of tree cavities on the islands, and the presence of selective logging,and fire were included as covariates associated with availability of resources and threats. The model-selection analysis indicated that both resources and threats variables explained the use of forest islands by parrots. For most species, the best models confirmed predictions. The number of cavities was positively associated with use of forest islands by 11 species. The area of the island and the presence of macaw palm showed a positive association with the probability of use by seven and five species, respectively, while selective logging and fire showed a negative association with five and six species, respectively. The Blue-throated Macaw (Ara glaucogularis), the critically endangered parrot species endemic to our study area, was the only species that showed a negative association with both threats. Monitoring continues to be essential to evaluate conservation and management actions of parrot populations. Understanding of how species are using this natural fragmented habitat will help determine which fragments should be preserved and which conservation actions are needed.
Resumo:
A construction algorithm for multioutput radial basis function (RBF) network modelling is introduced by combining a locally regularised orthogonal least squares (LROLS) model selection with a D-optimality experimental design. The proposed algorithm aims to achieve maximised model robustness and sparsity via two effective and complementary approaches. The LROLS method alone is capable of producing a very parsimonious RBF network model with excellent generalisation performance. The D-optimality design criterion enhances the model efficiency and robustness. A further advantage of the combined approach is that the user only needs to specify a weighting for the D-optimality cost in the combined RBF model selecting criterion and the entire model construction procedure becomes automatic. The value of this weighting does not influence the model selection procedure critically and it can be chosen with ease from a wide range of values.
Resumo:
The note proposes an efficient nonlinear identification algorithm by combining a locally regularized orthogonal least squares (LROLS) model selection with a D-optimality experimental design. The proposed algorithm aims to achieve maximized model robustness and sparsity via two effective and complementary approaches. The LROLS method alone is capable of producing a very parsimonious model with excellent generalization performance. The D-optimality design criterion further enhances the model efficiency and robustness. An added advantage is that the user only needs to specify a weighting for the D-optimality cost in the combined model selecting criterion and the entire model construction procedure becomes automatic. The value of this weighting does not influence the model selection procedure critically and it can be chosen with ease from a wide range of values.
Resumo:
The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models based on an approach of directly optimizing model generalization capability. This is achieved by utilizing the delete-1 cross validation concept and the associated leave-one-out test error also known as the predicted residual sums of squares (PRESS) statistic, without resorting to any other validation data set for model evaluation in the model construction process. Computational efficiency is ensured using an orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of the squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic, and the user is not required to specify any criterion to terminate the model construction procedure. Comparisons with some of the existing state-of-art modeling methods are given, and several examples are included to demonstrate the ability of the proposed algorithm to effectively construct sparse models that generalize well.
Resumo:
Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
Resumo:
We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non–nested data generating processes (DGPs) with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as best or a close competitor, the parsimonious GARCH(1, 1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
Resumo:
In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized using a particle swarm optimization (PSO) algorithm at the upper level by minimizing the leave one out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Resumo:
We analyse by simulation the impact of model-selection strategies (sometimes called pre-testing) on forecast performance in both constant-and non-constant-parameter processes. Restricted, unrestricted and selected models are compared when either of the first two might generate the data. We find little evidence that strategies such as general-to-specific induce significant over-fitting, or thereby cause forecast-failure rejection rates to greatly exceed nominal sizes. Parameter non-constancies put a premium on correct specification, but in general, model-selection effects appear to be relatively small, and progressive research is able to detect the mis-specifications.
Resumo:
This paper describes some recent advances and contributions to our understanding of economic forecasting. The framework we develop helps explain the findings of forecasting competitions and the prevalence of forecast failure. It constitutes a general theoretical background against which recent results can be judged. We compare this framework to a previous formulation, which was silent on the very issues of most concern to the forecaster. We describe a number of aspects which it illuminates, and draw out the implications for model selection. Finally, we discuss the areas where research remains needed to clarify empirical findings which lack theoretical explanations.
Resumo:
In this paper we discuss the current state-of-the-art in estimating, evaluating, and selecting among non-linear forecasting models for economic and financial time series. We review theoretical and empirical issues, including predictive density, interval and point evaluation and model selection, loss functions, data-mining, and aggregation. In addition, we argue that although the evidence in favor of constructing forecasts using non-linear models is rather sparse, there is reason to be optimistic. However, much remains to be done. Finally, we outline a variety of topics for future research, and discuss a number of areas which have received considerable attention in the recent literature, but where many questions remain.
Resumo:
An efficient two-level model identification method aiming at maximising a model׳s generalisation capability is proposed for a large class of linear-in-the-parameters models from the observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave one out (LOO) mean square error (LOOMSE). There are two elements of original contributions. Firstly an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates the automatic model structure selection process with no need of using a predetermined error tolerance to terminate the forward selection process. Secondly it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associate computation cost is small due to the ENOFR procedure. Consequently a fully automated procedure is achieved without resort to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Resumo:
An efficient data based-modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each of the RBF kernels has its own kernel width parameter and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each of which is associated with a kernel, one at a time within the orthogonal forward regression (OFR) procedure. Thus, each OFR step consists of one model term selection based on the LOO mean square error (LOOMSE), followed by the optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Since like our previous state-of-the-art local regularization assisted orthogonal least squares (LROLS) algorithm, the same LOOMSE is adopted for model selection, our proposed new OFR algorithm is also capable of producing a very sparse RBF model with excellent generalization performance. Unlike our previous LROLS algorithm which requires an additional iterative loop to optimize the regularization parameters as well as an additional procedure to optimize the kernel width, the proposed new OFR algorithm optimizes both the kernel widths and regularization parameters within the single OFR procedure, and consequently the required computational complexity is dramatically reduced. Nonlinear system identification examples are included to demonstrate the effectiveness of this new approach in comparison to the well-known approaches of support vector machine and least absolute shrinkage and selection operator as well as the LROLS algorithm.
Resumo:
This paper investigates the feasibility of using approximate Bayesian computation (ABC) to calibrate and evaluate complex individual-based models (IBMs). As ABC evolves, various versions are emerging, but here we only explore the most accessible version, rejection-ABC. Rejection-ABC involves running models a large number of times, with parameters drawn randomly from their prior distributions, and then retaining the simulations closest to the observations. Although well-established in some fields, whether ABC will work with ecological IBMs is still uncertain. Rejection-ABC was applied to an existing 14-parameter earthworm energy budget IBM for which the available data consist of body mass growth and cocoon production in four experiments. ABC was able to narrow the posterior distributions of seven parameters, estimating credible intervals for each. ABC’s accepted values produced slightly better fits than literature values do. The accuracy of the analysis was assessed using cross-validation and coverage, currently the best available tests. Of the seven unnarrowed parameters, ABC revealed that three were correlated with other parameters, while the remaining four were found to be not estimable given the data available. It is often desirable to compare models to see whether all component modules are necessary. Here we used ABC model selection to compare the full model with a simplified version which removed the earthworm’s movement and much of the energy budget. We are able to show that inclusion of the energy budget is necessary for a good fit to the data. We show how our methodology can inform future modelling cycles, and briefly discuss how more advanced versions of ABC may be applicable to IBMs. We conclude that ABC has the potential to represent uncertainty in model structure, parameters and predictions, and to embed the often complex process of optimizing an IBM’s structure and parameters within an established statistical framework, thereby making the process more transparent and objective.