882 results for "Bayesian model selection"

Relevance: 80.00%

Publisher:

Abstract:

In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models built from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized at the upper level by a particle swarm optimization (PSO) algorithm that minimizes the leave-one-out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approach.
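The two-level scheme can be illustrated with a deliberately tiny sketch: the inner level fits a one-regressor elastic net in closed form, and the outer level picks the two regularization parameters by minimizing a brute-force leave-one-out MSE. A small grid stands in for the PSO search, and all data and grid values are illustrative, not from the paper.

```python
# Two-level sketch: inner elastic-net fit (closed form for a single
# regressor), outer search over the two regularization parameters by
# leave-one-out MSE. A grid replaces the paper's PSO for brevity.

def elastic_net_1d(xs, ys, lam1, lam2):
    """Closed-form elastic net for y ~ theta * x (single regressor):
    soft-thresholding handles the L1 part, ridge shrinkage the L2 part."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if abs(sxy) <= lam1:
        return 0.0
    shrunk = sxy - lam1 if sxy > 0 else sxy + lam1
    return shrunk / (sxx + lam2)

def loomse(xs, ys, lam1, lam2):
    """Brute-force leave-one-out mean square error."""
    errs = []
    for i in range(len(xs)):
        xt, yt = xs[:i] + xs[i+1:], ys[:i] + ys[i+1:]
        theta = elastic_net_1d(xt, yt, lam1, lam2)
        errs.append((ys[i] - theta * xs[i]) ** 2)
    return sum(errs) / len(errs)

# toy data: y is roughly 2x with a little noise
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9]

# upper level: pick (lam1, lam2) minimizing the LOOMSE over a small grid
grid = [(l1, l2) for l1 in (0.0, 0.1, 1.0) for l2 in (0.0, 0.1, 1.0)]
best = min(grid, key=lambda p: loomse(xs, ys, *p))
theta = elastic_net_1d(xs, ys, *best)
```

In the real algorithm the outer level is a PSO search over a continuous parameter space and the inner level handles many candidate terms at once; the grid and single regressor here only show how the two levels interact.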

We analyse by simulation the impact of model-selection strategies (sometimes called pre-testing) on forecast performance in both constant- and non-constant-parameter processes. Restricted, unrestricted and selected models are compared when either of the first two might generate the data. We find little evidence that strategies such as general-to-specific induce significant over-fitting, or thereby cause forecast-failure rejection rates to greatly exceed nominal sizes. Parameter non-constancies put a premium on correct specification, but in general, model-selection effects appear to be relatively small, and progressive research is able to detect the mis-specifications.

This paper describes some recent advances and contributions to our understanding of economic forecasting. The framework we develop helps explain the findings of forecasting competitions and the prevalence of forecast failure. It constitutes a general theoretical background against which recent results can be judged. We compare this framework to a previous formulation, which was silent on the very issues of most concern to the forecaster. We describe a number of aspects which it illuminates, and draw out the implications for model selection. Finally, we discuss the areas where research remains needed to clarify empirical findings which lack theoretical explanations.

In this paper we discuss the current state-of-the-art in estimating, evaluating, and selecting among non-linear forecasting models for economic and financial time series. We review theoretical and empirical issues, including predictive density, interval and point evaluation and model selection, loss functions, data-mining, and aggregation. In addition, we argue that although the evidence in favor of constructing forecasts using non-linear models is rather sparse, there is reason to be optimistic. However, much remains to be done. Finally, we outline a variety of topics for future research, and discuss a number of areas which have received considerable attention in the recent literature, but where many questions remain.

An efficient two-level model identification method aiming at maximising a model's generalisation capability is proposed for a large class of linear-in-the-parameters models built from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised at the upper level by a particle swarm optimisation (PSO) algorithm that minimises the leave-one-out (LOO) mean square error (LOOMSE). There are two original contributions. First, an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates automatic model structure selection without the need for a predetermined error tolerance to terminate the forward selection process. Second, it is shown that the LOOMSE of the resultant ENOFR models can be computed analytically, without actually splitting the data set, and at small additional cost thanks to the ENOFR procedure. Consequently a fully automated procedure is achieved without resorting to a separate validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approach.
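The claim that the LOOMSE can be computed "without actually splitting the data set" rests on a standard least-squares identity: the leave-one-out residual equals the full-sample residual divided by one minus that observation's leverage. A minimal single-regressor sketch with illustrative data (the paper applies the same idea inside the ENOFR orthogonal decomposition):

```python
# For any linear least-squares fit, the leave-one-out residual of
# observation i equals e_i / (1 - h_i), where e_i is the full-fit
# residual and h_i the i-th diagonal of the hat matrix. Here both sides
# of the identity are computed for a single-regressor model.

def fit(xs, ys):
    """Least-squares slope for y ~ theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 4.2, 4.8]

theta = fit(xs, ys)
sxx = sum(x * x for x in xs)

analytic, brute = [], []
for i, (x, y) in enumerate(zip(xs, ys)):
    h = x * x / sxx                                  # leverage of observation i
    analytic.append((y - theta * x) / (1 - h))       # no refitting needed
    t = fit(xs[:i] + xs[i+1:], ys[:i] + ys[i+1:])    # brute force: refit without i
    brute.append(y - t * x)

loomse_analytic = sum(e * e for e in analytic) / len(xs)
```

The two residual lists agree to rounding error, which is why a single fit suffices to score the model by LOOMSE.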

An efficient data-based modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks, with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each RBF kernel has its own width parameter, and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each pair associated with one kernel, one at a time within the orthogonal forward regression (OFR) procedure. Each OFR step thus consists of one model term selection based on the LOO mean square error (LOOMSE), followed by optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Because the same LOOMSE is adopted for model selection, as in our previous local regularization assisted orthogonal least squares (LROLS) algorithm, the proposed OFR algorithm is likewise capable of producing a very sparse RBF model with excellent generalization performance. Unlike the LROLS algorithm, which requires an additional iterative loop to optimize the regularization parameters and a further procedure to optimize the kernel width, the proposed OFR algorithm optimizes both the kernel widths and the regularization parameters within a single OFR procedure, so the required computational complexity is dramatically reduced. Nonlinear system identification examples demonstrate the effectiveness of this new approach in comparison with the well-known support vector machine and least absolute shrinkage and selection operator approaches, as well as the LROLS algorithm.
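A stripped-down sketch of the forward-selection loop: candidate regressors are added one at a time, each step keeping the term that most reduces a leave-one-out MSE, and the loop stops when no remaining term helps. The per-step kernel-width and regularization tuning of the actual algorithm is omitted, and the polynomial candidates and data are illustrative.

```python
# Greedy forward selection of model terms scored by leave-one-out MSE.
# The LOO error is computed by brute-force refitting for clarity.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(cols, ys):
    """Least-squares coefficients for y ~ sum_j theta_j * cols[j]."""
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(a * y for a, y in zip(c, ys)) for c in cols]
    return solve(gram, rhs)

def loo_mse(names, cand, ys):
    """Brute-force leave-one-out MSE for the model using the named terms."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        cols = [[cand[m][j] for j in range(n) if j != i] for m in names]
        yt = [ys[j] for j in range(n) if j != i]
        theta = ols(cols, yt)
        pred = sum(t * cand[m][i] for t, m in zip(theta, names))
        total += (ys[i] - pred) ** 2
    return total / n

xs = list(range(1, 9))
cand = {"1": [1.0] * 8,
        "x": [float(v) for v in xs],
        "x2": [float(v * v) for v in xs],
        "x3": [float(v ** 3) for v in xs]}
ys = [5.1, 6.9, 9.2, 10.8, 13.1, 14.9, 17.2, 18.8]  # roughly 3 + 2x plus noise

selected, best_err = [], float("inf")
while True:
    remaining = [m for m in cand if m not in selected]
    if not remaining:
        break
    trials = {m: loo_mse(selected + [m], cand, ys) for m in remaining}
    name = min(trials, key=trials.get)
    if trials[name] >= best_err:
        break  # no remaining term lowers the LOO criterion: stop
    selected.append(name)
    best_err = trials[name]
```

Because the stopping rule is the LOO criterion itself rather than a training-error tolerance, spurious terms (here the higher-order polynomials) are naturally rejected.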

It is known that large fragment sizes and high connectivity levels are key to maintaining species in fragments; however, their relative effects are poorly understood, especially in tropical areas. To test these effects, we built models explaining understory bird occurrence in a fragmented Atlantic Rain Forest landscape with intermediate habitat cover (3%). Data from over 9000 mist-net hours in 17 fragments differing in size (2-175 ha) and connectivity (considering corridor linkages and distance to nearby fragments) were ranked under a model selection approach. A total of 1293 individuals of 62 species were recorded. Species richness, abundance and compositional variation were mainly affected by connectivity indices that consider the capacity of species to use corridors and/or to cross short distances (up to 30 m) through the matrix. Bird functional groups were differently affected by area and connectivity: while terrestrial insectivores, omnivores and frugivores were affected by both area and connectivity, the other groups (understory insectivores, nectarivores, and others) were affected only by connectivity. In the studied landscape, well-connected fragments can sustain a large number of species and individuals. Connectivity gives individuals the opportunity to use multiple fragments, reducing the influence of fragment size. While preserving large fragments is, and should remain, a conservation target worldwide, our results indicate that connectivity between fragments can enlarge the functionally connected area, benefits all functional groups, and should therefore be a conservation priority. (C) 2008 Elsevier Ltd. All rights reserved.

Time-lagged responses of biological variables to landscape modifications are widely recognized, but rarely considered in ecological studies. To test for time-lags in the response of trees, small mammals, birds and frogs to changes in fragment area and connectivity, we studied a fragmented and highly dynamic landscape in the Atlantic forest region. We also investigated the biological correlates associated with differential responses among taxonomic groups. Species richness and abundance for the four taxonomic groups were measured in 21 secondary forest fragments during the same period (2000-2002), following a standardized protocol. Data analyses were based on power regressions and model selection procedures. The model inputs included present (2000) and past (1962, 1981) fragment areas and connectivity, as well as observed changes in these parameters. Although past landscape structure was particularly relevant for trees, all taxonomic groups except small mammals were affected by landscape dynamics, exhibiting a time-lagged response. Furthermore, fragment area was more important for species groups with lower dispersal capacity, while species with higher dispersal ability responded more strongly to connectivity measures. Although these secondary forest fragments still maintain a large fraction of their original biodiversity, the delay in biological response, combined with high rates of deforestation and fast forest regeneration, implies a reduction in the average age of the forest. This also indicates that future species losses are likely, especially of strict forest dwellers. Conservation actions should be implemented to reduce species extinction, to maintain old-growth forests and to favour the regeneration process. Our results demonstrate that landscape history can strongly affect the present distribution pattern of species in fragmented landscapes, and should be considered in conservation planning. (C) 2009 Elsevier Ltd. All rights reserved.

Spiders are considered conservative with regard to their resting metabolic rate, presenting the same allometric relation with body mass as the majority of land arthropods. Nevertheless, web-building is thought to have a great impact on energy metabolism, and any modification that affects this complex behavior is expected to affect the daily energy budget. We analyzed whether the presence of the cribellum affects the allometric relation between resting metabolic rate and body mass in a cribellate species (Zosis geniculata) and an ecribellate one (Metazygia rogenhoferi), and employed a model selection approach to test whether these species follow the same allometric relationship as other land arthropods. Our results show that M. rogenhoferi has a higher resting metabolic rate, while Z. geniculata fits the allometric prediction for land arthropods. This indicates that the absence of the cribellum is associated with a higher resting metabolic rate, which explains the greater readiness for activity found in the ecribellate species. If this result proves to be a general rule among spiders, the radiation of the Araneoidea could be connected to a more energy-consuming lifestyle. We therefore briefly outline an alternative model of diversification of the Araneoidea that accounts for this possibility. (C) 2011 Elsevier Ltd. All rights reserved.
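The allometric comparison rests on the fact that a power law RMR = a * M^b is linear on log-log axes, so the scaling exponent b is the slope of an ordinary least-squares fit to the logged data. The masses and rates below are synthetic (generated near b = 0.75 with mild noise), not the spider measurements from the study.

```python
# Recover the allometric scaling exponent b from a power law
# RMR = a * M^b by OLS on log-transformed data.
import math

mass = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]             # body mass, illustrative units
rmr = [0.021, 0.035, 0.059, 0.101, 0.168, 0.289]   # resting metabolic rate, synthetic

lx = [math.log(m) for m in mass]
ly = [math.log(r) for r in rmr]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n

# slope of the log-log regression is the scaling exponent b
b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / \
    sum((x - mx) ** 2 for x in lx)
log_a = my - b * mx        # intercept is the log of the allometric coefficient
```

Testing whether two species share "the same allometric relationship" then amounts to comparing models with common versus separate slopes and intercepts, e.g. by an information criterion.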

1. Analyses of species association have major implications for selecting indicators for freshwater biomonitoring and conservation, because they allow the elimination of redundant information and a focus on taxa that can be easily handled and identified. These analyses are particularly relevant in the debate about using speciose groups (such as the Chironomidae) as indicators in the tropics, because such groups require difficult and time-consuming analysis, and their responses to environmental gradients, including anthropogenic stressors, are poorly known. 2. Our objective was to show whether chironomid assemblages in Neotropical streams include clear associations of taxa and, if so, how well these associations could be explained by a set of models containing information from different spatial scales. For this, we formulated a priori models that allowed for the influence of local, landscape and spatial factors on chironomid taxon associations (CTA). These models represented biological hypotheses capable of explaining associations between chironomid taxa. For instance, CTA could be best explained by local variables (e.g. pH, conductivity and water temperature) or by processes acting at wider landscape scales (e.g. percentage of forest cover). 3. Biological data were taken from 61 streams in Southeastern Brazil, 47 of which were in well-preserved regions and 14 of which drained areas severely affected by anthropogenic activities. We adopted a model selection procedure using Akaike's information criterion to determine the most parsimonious models for explaining CTA. 4. Applying Kendall's coefficient of concordance, seven genera (Tanytarsus/Caladomyia, Ablabesmyia, Parametriocnemus, Pentaneura, Nanocladius, Polypedilum and Rheotanytarsus) were identified as associated taxa. The best-supported model explained 42.6% of the total variance in the abundance of associated taxa. This model combined local and landscape environmental filters and spatial variables (derived from eigenfunction analysis). However, the model with local filters and spatial variables also had a good chance of being selected as the best model. 5. Standardised partial regression coefficients of local and landscape filters, including spatial variables, derived from model averaging allowed an estimation of which variables were best correlated with the abundance of associated taxa. In general, the abundance of the associated genera tended to be lower in streams characterised by a high percentage of forest cover (landscape scale), a lower proportion of muddy substrata, and high pH and conductivity (local scale). 6. Overall, our main result adds to the increasing number of studies indicating the importance of local and landscape variables, as well as the spatial relationships among sampling sites, for explaining aquatic insect community patterns in streams. Furthermore, our findings open new possibilities for the elimination of redundant data in the assessment of anthropogenic impacts on tropical streams.
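A minimal sketch of the AIC-based ranking used in studies like this one: AIC = 2k - 2 ln L, differences are taken to the best model, and Akaike weights express each model's relative support. Model names and log-likelihoods below are illustrative, not values from the study.

```python
# Rank candidate models by Akaike's information criterion and compute
# Akaike weights (relative support; they sum to one).
import math

candidates = {                      # name: (log-likelihood, n. of parameters)
    "local + landscape + spatial": (-119.0, 9),
    "local + spatial":             (-122.1, 6),
    "landscape only":              (-130.8, 4),
    "null":                        (-141.5, 1),
}

aic = {m: 2 * k - 2 * ll for m, (ll, k) in candidates.items()}
best = min(aic, key=aic.get)
delta = {m: a - aic[best] for m, a in aic.items()}        # distance to best model
rel = {m: math.exp(-d / 2) for m, d in delta.items()}     # relative likelihoods
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}          # Akaike weights
```

A delta below about 2 is conventionally read as substantial support, which is how a runner-up model (as in the abstract) can still have "a good chance of being selected".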

In this paper, we present an algorithm for cluster analysis that integrates aspects of cluster ensembles and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without requiring expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, on gene expression data sets, to those achieved with multi-objective clustering with automatic k-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
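The Pareto-based selection at the heart of multi-objective clustering can be sketched as a non-dominated filter over candidate partitions scored by two validation measures (both minimised here); the partition labels and scores are made up for illustration.

```python
# Keep only the non-dominated candidates (the Pareto front) among
# partitions scored by two clustering validation measures.

def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scored):
    """Candidates not dominated by any other candidate."""
    return {name: obj for name, obj in scored.items()
            if not any(dominates(other, obj)
                       for o_name, other in scored.items() if o_name != name)}

# candidate partitions scored by (compactness, connectivity), lower is better
scored = {
    "k=2": (4.1, 0.9),
    "k=3": (2.6, 1.4),
    "k=4": (2.0, 2.2),
    "k=5": (2.1, 2.9),   # worse than k=4 on both objectives
}
front = pareto_front(scored)
```

In the genetic algorithm this filter is applied generation after generation, so the final population approximates the trade-off set of partitions the abstract describes.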

In this paper we extend the long-term survival model proposed by Chen et al. [Chen, M.-H., Ibrahim, J.G., Sinha, D., 1999. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association 94, 909-919] via the generating function of a real sequence introduced by Feller [Feller, W., 1968. An Introduction to Probability Theory and its Applications, third ed., vol. 1. Wiley, New York]. A direct consequence of this new formulation is the unification of the long-term survival models proposed by Berkson and Gage [Berkson, J., Gage, R.P., 1952. Survival curve for cancer patients following treatment. Journal of the American Statistical Association 47, 501-515] and Chen et al. (see citation above). We also show that the long-term survival function formulated in this paper satisfies the proportional hazards property if, and only if, the number of competing causes related to the occurrence of an event of interest follows a Poisson distribution. Furthermore, a more flexible model than the one proposed by Yin and Ibrahim [Yin, G., Ibrahim, J.G., 2005. Cure rate models: a unified approach. The Canadian Journal of Statistics 33, 559-570] is introduced and, motivated by Feller's results, a very useful competing index is defined. (c) 2008 Elsevier B.V. All rights reserved.
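The generating-function unification described above can be written compactly; the notation below (pgf A, baseline survival S_0, distribution function F_0, independent latent times) is the standard competing-causes construction and is assumed for illustration rather than taken verbatim from the paper.

```latex
% Let N be the number of competing causes of the event, with probability
% generating function A(s) = E[s^N], and let each latent cause have
% survival function S_0(t) = 1 - F_0(t). Assuming independent latent
% times, the population (long-term) survival function is
S_{\mathrm{pop}}(t) \;=\; E\!\left[S_0(t)^{N}\right] \;=\; A\!\left(S_0(t)\right).
% N ~ Poisson(theta) recovers the Chen-Ibrahim-Sinha model, the only
% case satisfying proportional hazards:
A(s) = e^{-\theta(1-s)} \quad\Rightarrow\quad
S_{\mathrm{pop}}(t) = e^{-\theta F_0(t)},
% while N ~ Bernoulli, with cure probability p = P(N = 0), recovers the
% Berkson-Gage mixture model:
A(s) = p + (1-p)\,s \quad\Rightarrow\quad
S_{\mathrm{pop}}(t) = p + (1-p)\,S_0(t).
```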

Background: Genetic variation in environmental sensitivity indicates that animals differ genetically in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature), termed macro-environmental, or unknown, termed micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate the bias and precision of the resulting estimates of genetic parameters, and to develop and evaluate a model selection criterion based on Akaike's information criterion using h-likelihood. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and in the environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for the residual variance to estimate genetic variance for micro-environmental sensitivity, using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as a model selection criterion using the approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate the bias and precision of the estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required for the standard deviations of estimated variances to be lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for the genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically no bias was observed in the estimates of any of the parameters. Using Akaike's information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance exists for both micro- and macro-environmental sensitivities. Conclusion: The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires, each with 100 offspring.

Using vector autoregressive (VAR) models and Monte-Carlo simulation methods we investigate the potential gains for forecasting accuracy and estimation uncertainty of two commonly used restrictions arising from economic relationships. The first reduces the parameter space by imposing long-term restrictions on the behavior of economic variables, as discussed in the literature on cointegration, and the second reduces the parameter space by imposing short-term restrictions, as discussed in the literature on serial-correlation common features (SCCF). Our simulations cover three important issues in model building, estimation, and forecasting. First, we examine the performance of standard and modified information criteria in choosing the lag length for cointegrated VARs with SCCF restrictions. Second, we compare the forecasting accuracy of fitted VARs when only cointegration restrictions are imposed and when cointegration and SCCF restrictions are jointly imposed. Third, we propose a new estimation algorithm in which short- and long-term restrictions interact to estimate the cointegrating and cofeature spaces, respectively. We have three basic results. First, ignoring SCCF restrictions has a high cost in terms of model selection, because standard information criteria too frequently choose inconsistent models with too small a lag length; criteria selecting lag and rank simultaneously perform better in this case. Second, this translates into superior forecasting performance of the restricted VECM over the VECM, with important improvements in forecasting accuracy, reaching more than 100% in extreme cases. Third, the new algorithm proposed here fares very well in terms of parameter estimation, even for long-term parameters, opening up the discussion of joint estimation of short- and long-term parameters in VAR models.

Despite the commonly held belief that aggregate data display short-run comovement, there has been little discussion of the econometric consequences of this feature of the data. We use exhaustive Monte-Carlo simulations to investigate the importance of the restrictions implied by common cyclical features for estimates and forecasts based on vector autoregressive models. First, we show that the "best" empirical model developed without common cycle restrictions need not nest the "best" model developed with those restrictions, owing to possible differences in the lag lengths chosen by model selection criteria for the two alternative models. Second, we show that the costs of ignoring common cyclical features in vector autoregressive modelling can be high, both in terms of forecast accuracy and efficient estimation of variance decomposition coefficients. Third, we find that the Hannan-Quinn criterion performs best among model selection criteria in simultaneously selecting the lag length and rank of vector autoregressions.
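The role of the Hannan-Quinn criterion can be sketched numerically: each criterion adds a different parameter penalty to the log residual variance, with HQ's penalty sitting between AIC's and BIC's, so the chosen lag length can differ across criteria. The residual variances below are illustrative, not simulation output from the paper.

```python
# Lag-length selection for an autoregression by three information
# criteria: AIC, Hannan-Quinn (HQ) and BIC differ only in how strongly
# they penalise the number of lags p.
import math

T = 100                                              # sample size, illustrative
sigma2 = {1: 1.80, 2: 1.30, 3: 1.27, 4: 1.265}       # residual variance per lag p

def aic(p): return math.log(sigma2[p]) + 2 * p / T
def hq(p):  return math.log(sigma2[p]) + 2 * p * math.log(math.log(T)) / T
def bic(p): return math.log(sigma2[p]) + p * math.log(T) / T

choice = {name: min(sigma2, key=crit)
          for name, crit in [("AIC", aic), ("HQ", hq), ("BIC", bic)]}
```

With these numbers AIC's lighter penalty admits a marginal third lag while HQ and BIC stop at two, which is the kind of disagreement the simulation studies above exploit when comparing criteria.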